# Chapter 14 Expectation Maximization (EM Algorithm) Flashcards
Why is expectation maximization used? P 128
Maximum likelihood becomes intractable if there are variables that interact with those in the dataset but are hidden or not observed, so-called latent variables. The expectation-maximization algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables.
In the presence of latent variables, maximum likelihood becomes problematic, so expectation maximization is used to address it.
What is a latent variable? Give an example External
In statistics, latent variables are variables that are not directly observed but are inferred through a mathematical model from other variables that can be directly observed or measured. Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness, and conservatism: these are all variables which cannot be measured directly.
Expectation maximization is an effective and general approach and is most commonly used for ____ , such as clustering algorithms like ____ Model. P 128
Density estimation with missing data (latent variables), the Gaussian Mixture
A limitation of maximum likelihood estimation is that it assumes that the dataset is complete, or fully observed. True/False P 129
True
Many real-world problems have hidden variables (sometimes called latent variables), which are not observable in the data that are available for learning. True/False P 129
True
If we have ____ and/or ____, then computing the [maximum likelihood] estimate becomes hard. P 129
missing data, latent variables
What are the two iterative steps of the EM algorithm? P 130
E-Step. Estimate the missing variables in the dataset.
M-Step. Maximize the parameters of the model in the presence of the data.
The EM algorithm is an iterative approach that cycles between two modes.
- The first mode attempts to estimate the missing or latent variables, called the estimation-step or E-step.
- The second mode attempts to optimize the parameters of the model to best explain the data, called the maximization-step or M-step.
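A minimal sketch of these two steps for a one-dimensional mixture of two Gaussians (my own illustrative code and starting values, not the book's):

```python
# EM for a 1-D mixture of two Gaussians, written out by hand so the
# E-step and M-step are visible. Data and initial guesses are made up.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Synthetic data drawn from two hidden processes.
X = np.concatenate([rng.normal(20, 5, 3000), rng.normal(40, 5, 7000)])

# Initial guesses for the parameters of each component.
mu = np.array([15.0, 45.0])
sigma = np.array([5.0, 5.0])
weight = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: estimate the latent variable, i.e. the responsibility
    # (posterior probability) of each component for each point.
    dens = weight * norm.pdf(X[:, None], mu, sigma)  # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters to maximize the expected likelihood.
    nk = resp.sum(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (X[:, None] - mu) ** 2).sum(axis=0) / nk)
    weight = nk / len(X)

print(mu, sigma, weight)  # should approach (20, 40), (5, 5), (0.3, 0.7)
```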
The EM algorithm can be applied quite widely, although it is perhaps best known in machine learning for use in unsupervised learning problems, such as ____ and ____ . P 130
density estimation (with latent variables), clustering
What’s the Gaussian mixture model? P 130
The Gaussian Mixture Model, or GMM for short, is a mixture model that uses a combination of Gaussian (Normal) probability distributions and requires the estimation of the mean and standard deviation parameters for each distribution.
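As a sketch of this definition, the mixture density is just a weighted sum of Gaussian densities (the weights and component parameters below are made up for illustration):

```python
# The GMM density is a weighted sum of Gaussian densities.
from scipy.stats import norm

def gmm_pdf(x, weights, means, stds):
    # p(x) = sum_k weight_k * N(x | mean_k, std_k)
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds))

print(gmm_pdf(1.0, weights=[0.3, 0.7], means=[0.0, 3.0], stds=[1.0, 1.0]))
```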
How does Gaussian Mixture Model (GMM) work? P 130
Worked example (P 130):
- Consider the case where a dataset is comprised of many points that happen to be generated by two different processes.
- The points for each process have a Gaussian probability distribution,
- But the data is combined and the distributions are similar enough that it is not obvious to which distribution a given point may belong.
- The process used to generate each data point represents a latent variable, e.g. process 0 or process 1. It influences the data but is not observable.
- As such, the EM algorithm is an appropriate approach to use to estimate the parameters of the distributions (see the sketch below).
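A minimal sketch of this scenario with sklearn's GaussianMixture (the means, standard deviation, and sample sizes are illustrative choices):

```python
# Two hidden processes generate the data; the process label is the
# latent variable, and EM recovers the distribution parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
process0 = rng.normal(20, 5, size=(3000, 1))
process1 = rng.normal(40, 5, size=(7000, 1))
X = np.vstack([process0, process1])  # combined, without labels

model = GaussianMixture(n_components=2, random_state=7)
model.fit(X)
print(model.means_)  # approximately 20 and 40, recovered without labels
```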
To measure latent variables in research, we use the observed variables and then mathematically infer the unseen ones.
We know that sklearn's GaussianMixture uses expectation maximization to detect different underlying Gaussian distributions in a dataset. In sklearn we need to provide “n_components” for the GMM class; what should we do if we don’t know the number of distributions? P 132
If the number of processes was not known, a range of different numbers of components could be tested and the model with the best fit could be chosen.
What are some of the metrics to evaluate GMM’s performance with? P 132
Scores such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
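A sketch of that selection loop, assuming the same kind of synthetic two-process data as above (in sklearn, gm.bic(X) and gm.aic(X) compute these scores on fitted models):

```python
# Fit a range of component counts and keep the lowest-BIC model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (3000, 1)), rng.normal(40, 5, (7000, 1))])

best_k, best_bic = None, float("inf")
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=7).fit(X)
    bic = gm.bic(X)  # gm.aic(X) gives the corresponding AIC score
    if bic < best_bic:
        best_k, best_bic = k, bic
print(best_k)  # expected: 2 for this two-process data
```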
There are also many ways we can configure the model to incorporate other information we may know about the data, give an example of this in Sklearn’s GMM class. P 132
An example is how the initial values for the distributions are estimated. We can guess the initial parameters randomly by setting the init_params argument to ‘random’ (the default is ‘kmeans’).
See the sklearn GaussianMixture documentation for the full list of options.
## Footnote
Also, once the model is fit, we can access the learned parameters via attributes of the model, such as the means, covariances, mixing weights, and more. More usefully, we can use the fitted model to estimate the latent variable for existing and new data points (predict).
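A sketch tying these together on the same synthetic data; weights_, means_, covariances_, and predict are sklearn's names, while the data parameters are my own illustrative choices:

```python
# Random initialization plus reading the learned parameters back.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (3000, 1)), rng.normal(40, 5, (7000, 1))])

# init_params='random' guesses the initial parameters randomly
# (the default, 'kmeans', seeds them with a k-means run).
model = GaussianMixture(n_components=2, init_params="random", random_state=7)
model.fit(X)

print(model.weights_)      # mixing weights, roughly [0.3, 0.7]
print(model.means_)        # component means, roughly 20 and 40
print(model.covariances_)  # component covariances

# Estimate the latent process for existing or new points.
print(model.predict([[25.0], [42.0]]))
```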
When using EM in GMM, it is expected that the points between the peaks of the distribution will be assigned with great precision. True/False P 133
False. Separating overlapping distributions is a generally challenging problem, and it is expected that points between the peaks of the distributions will remain ambiguous and may be assigned to one process or the other with low confidence, as the sketch below shows.
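One way to see this ambiguity is with predict_proba, which returns the posterior probability of each process for each point. A sketch (equal-sized synthetic components are my choice, so the midpoint is maximally ambiguous):

```python
# Points between the peaks get posterior probabilities near 0.5 for
# each process, while points near a peak are assigned confidently.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (5000, 1)), rng.normal(40, 5, (5000, 1))])
model = GaussianMixture(n_components=2, random_state=7).fit(X)

print(model.predict_proba([[20.0], [30.0]]))
# roughly [[1.0, 0.0], [0.5, 0.5]] (component order may be swapped)
```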
## Summary
- Maximum likelihood estimation is challenging on data in the presence of ____
- Expectation maximization provides an iterative solution to ____
- Gaussian mixture models are an approach to density estimation where the parameters of the distributions are fit using the ____ algorithm.
latent variables.
maximum likelihood estimation with latent variables.
expectation-maximization.