Expectation-Maximization (EM) algorithm

1-dimensional simple case

What's the objective: given two populations of numbers drawn from 2 gaussian distributions, estimate the 2 gaussian parameters (means and the variances).

Let's see how this works.

Suppose our experimental setup registers numbers from 2 gaussian sources, let's call them red and blue. Then we have then 2 sets of numbers distributed according to two 1D gaussian distributions (in the exercise case we have generated them synthetically).

If we knew which numbers belongs to which distribution, we could easily estimate the means and the variances of the two distributions by calculating:

for the set (x_1, x_2, ..., x_nb) of the blue distribution and similarly for the red distribution. Then we are happy and completed the task.

But let's say we don't know which numbers belong to which distribution. Then the problem looks challenging. However, let's assume we have an initial guess of the distribution parameters (means and variances). Then for each point x_i in our dataset we could calculate the probability b_i of that point belonging to the blue (Px_i|b) distribution, by using the Bayes Theorem:

and similarly for red. We have also assumed some value for the prior P(b) and P(r). As mentioned above, the distribution parameters (means and variances) we use in the likelihood P(x_i|b) and P(x_i|r) are our initial guesses. Each point x_i is therefore associated with a probability b_i and r_i to belong to the blue distribution and to the red distribution. This is a 'soft clustering' technique since each point it's given some probability to belong to both distributions (unlike K-mean clustering, which is a 'hard clustering' technique).

The next step is to re-estimate the means (and the variances) of the two distributions given the b_i's and the r_i's we have just calculated. We use them as weights in a weighted mean. For example, the new estimates of the blue parameters are:

and similarly for the red parameters. The procedure can be iterated again by calculating at each step the new b_i's and r_i's using the parameters of the previous iteration and estimating the new parameters. Eventually, after some iterations the new parameters will converge to some final values. These will be our final estimates of the distribution paramters.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
EM_algorithm.py		EM_algorithm.py
EM_gaussians1D_20iter.png		EM_gaussians1D_20iter.png
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Expectation-Maximization (EM) algorithm

1-dimensional simple case

About

Uh oh!

Releases

Packages

Languages

nmaffe/expectation-maximization-EM-algorithm

Folders and files

Latest commit

History

Repository files navigation

Expectation-Maximization (EM) algorithm

1-dimensional simple case

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages