Chapter 14 Expectation Maximization (EM Algorithm) Flashcards

1
Q

Why is expectation maximization used? P 128

A

Maximum likelihood becomes intractable if there are variables that interact with those in the dataset but are hidden or not observed, so-called latent variables. The expectation-maximization algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables.

In the presence of latent variables, maximum likelihood becomes problematic, so expectation-maximization is used instead.

2
Q

What is a latent variable? Give an example. External

A

In statistics, latent variables are variables that can only be inferred indirectly, through a mathematical model, from other variables that are directly observed or measured. Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness, and conservatism: these are all variables that cannot be measured directly.

3
Q

Expectation maximization is an effective and general approach and is most commonly used for ____ , such as clustering algorithms like ____ Model. P 128

A

Density estimation with missing data (latent variables), the Gaussian Mixture

4
Q

A limitation of maximum likelihood estimation is that it assumes that the dataset is complete, or fully observed. True/False P 129

A

True

5
Q

Many real-world problems have hidden variables (sometimes called latent variables), which are not observable in the data that are available for learning. True/False P 129

A

True

6
Q

If we have ____ and/or ____, then computing the [maximum likelihood] estimate becomes hard. P 129

A

missing data, latent variables

7
Q

What are the two iterative steps of the EM algorithm? P 130

A

  • E-Step. Estimate the missing variables in the dataset.
  • M-Step. Maximize the parameters of the model in the presence of the data.

The EM algorithm is an iterative approach that cycles between two modes.

  • The first mode attempts to estimate the missing or latent variables, called the estimation-step or E-step.
  • The second mode attempts to optimize the parameters of the model to best explain the data, called the maximization-step or M-step.
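
To make the two steps concrete, here is a minimal NumPy sketch of EM for a two-component, one-dimensional Gaussian mixture. The data, initial guesses, and names (mu, sigma, weights) are illustrative assumptions, not taken from the book:

```python
# A minimal sketch of the two EM steps for a two-component, 1-D Gaussian
# mixture, using only NumPy. Data and initial guesses are illustrative.
import numpy as np

rng = np.random.default_rng(0)
# Points from two hidden processes, combined into one dataset.
X = np.concatenate([rng.normal(20, 5, 3000), rng.normal(40, 5, 7000)])

mu = np.array([10.0, 50.0])      # initial guesses for the means
sigma = np.array([5.0, 5.0])     # initial guesses for the std devs
weights = np.array([0.5, 0.5])   # initial guesses for the mixing weights

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: estimate the latent variable, i.e. the posterior probability
    # (responsibility) that each point was generated by each process.
    dens = np.stack([weights[k] * gaussian_pdf(X, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)

    # M-step: re-estimate the parameters to maximize the likelihood,
    # weighting each point by its responsibilities.
    nk = resp.sum(axis=1)
    mu = (resp * X).sum(axis=1) / nk
    sigma = np.sqrt((resp * (X - mu[:, None]) ** 2).sum(axis=1) / nk)
    weights = nk / len(X)

print(mu, sigma, weights)  # roughly (20, 40), (5, 5), (0.3, 0.7)
```

Each pass through the loop performs one E-step (computing responsibilities) and one M-step (re-estimating means, standard deviations, and mixing weights).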
8
Q

The EM algorithm can be applied quite widely, although it is perhaps best known in machine learning for use in unsupervised learning problems, such as ____ and ____. P 130

A

density estimation (with latent variables), clustering

9
Q

What’s the Gaussian mixture model? P 130

A

The Gaussian Mixture Model, or GMM for short, is a mixture model that uses a combination of Gaussian (Normal) probability distributions and requires the estimation of the mean and standard deviation parameters for each distribution.
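
Written as a formula (the notation here is assumed, not from the book), a K-component Gaussian mixture models the density of a point x as a weighted sum of Gaussians:

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^{2}\right),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```

where \mu_k and \sigma_k are the mean and standard deviation of component k, and \pi_k is its mixing weight.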

10
Q

How does the Gaussian Mixture Model (GMM) work? P 130

worked example P 130

A
  1. Consider the case where a dataset is comprised of many points that happen to be generated by two different processes.
  2. The points for each process have a Gaussian probability distribution,
  3. But the data is combined and the distributions are similar enough that it is not obvious to which distribution a given point may belong.
  4. The process used to generate each data point represents a latent variable, e.g. process 0 and process 1. It influences the data but is not observable.
  5. As such, the EM algorithm is an appropriate approach to use to estimate the parameters of the distributions.

To measure latent variables in research, we use the observed variables and then mathematically infer the unseen ones.
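
A minimal sketch of this worked example using sklearn's GaussianMixture; the specific means, spreads, and sample sizes are illustrative assumptions:

```python
# Combine points from two Gaussian processes, then let GaussianMixture's
# EM recover the latent process behind each point.
from numpy import hstack
from numpy.random import normal
from sklearn.mixture import GaussianMixture

# Process 0 and process 1, combined into a single dataset.
X1 = normal(loc=20, scale=5, size=3000)
X2 = normal(loc=40, scale=5, size=7000)
X = hstack((X1, X2)).reshape(-1, 1)  # sklearn expects (n_samples, n_features)

# Fit a two-component mixture; EM runs under the hood.
model = GaussianMixture(n_components=2, init_params='random', random_state=1)
model.fit(X)

# Predict the latent variable (which process) for each point.
yhat = model.predict(X)
print(yhat[:5])   # points drawn from the first process
print(yhat[-5:])  # points drawn from the second process
```

Note that the fitted labels are arbitrary: cluster 0 in the output may correspond to either generating process.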

11
Q

We know the GaussianMixture model uses expectation-maximization to detect different underlying Gaussian distributions in a dataset. In sklearn, we need to provide “n_components” for the GMM class; what should we do if we don’t know the number of distributions? P 132

A

If the number of processes is not known, a range of different numbers of components could be tested, and the model with the best fit could be chosen.
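
A short sketch of that model-selection loop, scoring each candidate with BIC (see the next card); the data and the candidate range are illustrative assumptions:

```python
# Fit a range of candidate component counts and keep the best-scoring one.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(20, 5, 300), rng.normal(40, 5, 700)]).reshape(-1, 1)

bics = {}
for k in range(1, 7):
    model = GaussianMixture(n_components=k, random_state=1).fit(X)
    bics[k] = model.bic(X)  # Bayesian Information Criterion; lower is better
best_k = min(bics, key=bics.get)
print('best n_components by BIC:', best_k)  # expected: 2
```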

12
Q

What are some of the metrics to evaluate GMM’s performance with? P 132

A

Scores such as the Akaike or Bayesian Information Criterion (AIC or BIC).

13
Q

There are also many ways we can configure the model to incorporate other information we may know about the data, give an example of this in Sklearn’s GMM class. P 132

A

An example would be how to estimate initial values for the distributions: we can randomly guess the initial parameters by setting the init_params argument to ‘random’ (the default is ‘kmeans’).
See the sklearn GaussianMixture documentation.
## Footnote

Also, once the model is fit, we can access the learned parameters via attributes on the model, such as the means, covariances, mixing weights, and more. More usefully, we can use the fit model to estimate the latent variables for existing and new data points (predict).
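
A brief sketch of both uses on a fitted model; the data is an illustrative assumption, while the attribute and method names (means_, covariances_, weights_, predict, predict_proba) follow sklearn's API:

```python
# Inspect a fitted GaussianMixture and apply it to new points.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(20, 5, 300), rng.normal(40, 5, 700)]).reshape(-1, 1)
model = GaussianMixture(n_components=2, init_params='random', random_state=1).fit(X)

# Learned parameters are exposed as attributes on the fitted model.
print(model.means_)        # component means
print(model.covariances_)  # component covariances
print(model.weights_)      # mixing weights

# Estimate the latent variable for existing or new data points.
new_points = np.array([[18.0], [30.0], [42.0]])
print(model.predict(new_points))        # hard assignments
print(model.predict_proba(new_points))  # soft responsibilities
```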

14
Q

When using EM in GMM, it is expected that the points between the peaks of the distribution will be assigned with great precision. True/False P 133

A

False. This is a generally challenging problem, and it is expected that the points between the peaks of the distributions will remain ambiguous and be assigned to one process or the other somewhat arbitrarily.

15
Q

Summary

  • Maximum likelihood estimation is challenging on data in the presence of ____
  • Expectation maximization provides an iterative solution to ____
  • Gaussian mixture models are an approach to density estimation where the parameters of the distributions are fit using the ____ algorithm.
A

latent variables.
maximum likelihood estimation with latent variables.
expectation-maximization.
