# Chapter 14 Expectation Maximization (EM Algorithm) Flashcards
Why is expectation maximization used? P 128
Maximum likelihood becomes intractable if there are variables that interact with those in the dataset but are hidden or not observed, so-called latent variables. The expectation-maximization algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables.
In the presence of latent variables, maximum likelihood becomes problematic, so expectation maximization is used to address it.
What is a latent variable? Give an example External
In statistics, latent variables are variables that are not directly observed but are inferred through a mathematical model from other variables that can be directly observed or measured. Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness, and conservatism: these are all variables which cannot be measured directly.
Expectation maximization is an effective and general approach and is most commonly used for ____ , such as clustering algorithms like ____ Model. P 128
Density estimation with missing data (latent variables), the Gaussian Mixture
A limitation of maximum likelihood estimation is that it assumes that the dataset is complete, or fully observed. True/False P 129
True
Many real-world problems have hidden variables (sometimes called latent variables), which are not observable in the data that are available for learning. True/False P 129
True
If we have ____ and/or ____, then computing the [maximum likelihood] estimate becomes hard. P 129
missing data, latent variables
What are the two iterative steps of the EM algorithm? P 130
E-Step. Estimate the missing variables in the dataset.
M-Step. Maximize the parameters of the model in the presence of the data.
The EM algorithm is an iterative approach that cycles between two modes.
- The first mode attempts to estimate the missing or latent variables, called the estimation-step or E-step.
- The second mode attempts to optimize the parameters of the model to best explain the data, called the maximization-step or M-step.
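A minimal sketch of these two steps for a one-dimensional mixture of two Gaussians (my own illustrative code and starting values, not the book's):

```python
# EM for a 1-D mixture of two Gaussians, written out by hand so the
# E-step and M-step are visible. Data and initial guesses are made up.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Synthetic data drawn from two hidden processes.
X = np.concatenate([rng.normal(20, 5, 3000), rng.normal(40, 5, 7000)])

# Initial guesses for the parameters of each component.
mu = np.array([15.0, 45.0])
sigma = np.array([5.0, 5.0])
weight = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: estimate the latent variable, i.e. the responsibility
    # (posterior probability) of each component for each point.
    dens = weight * norm.pdf(X[:, None], mu, sigma)  # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters to maximize the expected likelihood.
    nk = resp.sum(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (X[:, None] - mu) ** 2).sum(axis=0) / nk)
    weight = nk / len(X)

print(mu, sigma, weight)  # should approach (20, 40), (5, 5), (0.3, 0.7)
```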
The EM algorithm can be applied quite widely, although it is perhaps best known in machine learning for use in unsupervised learning problems, such as ____ and ____ . P 130
density estimation (with latent variables), clustering
What’s the Gaussian mixture model? P 130
The Gaussian Mixture Model, or GMM for short, is a mixture model that uses a combination of Gaussian (Normal) probability distributions and requires the estimation of the mean and standard deviation parameters for each distribution.
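As a sketch of this definition, the mixture density is just a weighted sum of Gaussian densities (the weights and component parameters below are made up for illustration):

```python
# The GMM density is a weighted sum of Gaussian densities.
from scipy.stats import norm

def gmm_pdf(x, weights, means, stds):
    # p(x) = sum_k weight_k * N(x | mean_k, std_k)
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds))

print(gmm_pdf(1.0, weights=[0.3, 0.7], means=[0.0, 3.0], stds=[1.0, 1.0]))
```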
How does Gaussian Mixture Model (GMM) work? P 130
Worked example (P 130):
- Consider the case where a dataset is comprised of many points that happen to be generated by two different processes.
- The points for each process have a Gaussian probability distribution,
- But the data is combined and the distributions are similar enough that it is not obvious to which distribution a given point may belong.
- The process used to generate each data point represents a latent variable, e.g. process 0 or process 1. It influences the data but is not observable.
- As such, the EM algorithm is an appropriate approach to use to estimate the parameters of the distributions (see the sketch below).
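A minimal sketch of this scenario with sklearn's GaussianMixture (the means, standard deviation, and sample sizes are illustrative choices):

```python
# Two hidden processes generate the data; the process label is the
# latent variable, and EM recovers the distribution parameters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
process0 = rng.normal(20, 5, size=(3000, 1))
process1 = rng.normal(40, 5, size=(7000, 1))
X = np.vstack([process0, process1])  # combined, without labels

model = GaussianMixture(n_components=2, random_state=7)
model.fit(X)
print(model.means_)  # approximately 20 and 40, recovered without labels
```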
To measure latent variables in research, we use the observed variables and then mathematically infer the unseen ones.
We know that sklearn's GaussianMixture uses expectation maximization to detect different underlying Gaussian distributions in a dataset. In sklearn we need to provide “n_components” for the GMM class; what should we do if we don’t know the number of distributions? P 132
If the number of processes was not known, a range of different numbers of components could be tested and the model with the best fit could be chosen.
What are some of the metrics to evaluate GMM’s performance with? P 132
Scores such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
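A sketch of that selection loop, assuming the same kind of synthetic two-process data as above (in sklearn, gm.bic(X) and gm.aic(X) compute these scores on fitted models):

```python
# Fit a range of component counts and keep the lowest-BIC model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (3000, 1)), rng.normal(40, 5, (7000, 1))])

best_k, best_bic = None, float("inf")
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=7).fit(X)
    bic = gm.bic(X)  # gm.aic(X) gives the corresponding AIC score
    if bic < best_bic:
        best_k, best_bic = k, bic
print(best_k)  # expected: 2 for this two-process data
```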
There are also many ways we can configure the model to incorporate other information we may know about the data, give an example of this in Sklearn’s GMM class. P 132
An example is how the initial values for the distributions are estimated. We can guess the initial parameters randomly by setting the init_params argument to ‘random’ (the default is ‘kmeans’).
See the sklearn GaussianMixture documentation for the full list of options.
## Footnote
Also, once the model is fit, we can access the learned parameters via attributes of the model, such as the means, covariances, mixing weights, and more. More usefully, we can use the fitted model to estimate the latent variable for existing and new data points (predict).
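A sketch tying these together on the same synthetic data; weights_, means_, covariances_, and predict are sklearn's names, while the data parameters are my own illustrative choices:

```python
# Random initialization plus reading the learned parameters back.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (3000, 1)), rng.normal(40, 5, (7000, 1))])

# init_params='random' guesses the initial parameters randomly
# (the default, 'kmeans', seeds them with a k-means run).
model = GaussianMixture(n_components=2, init_params="random", random_state=7)
model.fit(X)

print(model.weights_)      # mixing weights, roughly [0.3, 0.7]
print(model.means_)        # component means, roughly 20 and 40
print(model.covariances_)  # component covariances

# Estimate the latent process for existing or new points.
print(model.predict([[25.0], [42.0]]))
```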
When using EM in GMM, it is expected that the points between the peaks of the distribution will be assigned with great precision. True/False P 133
False. Separating overlapping distributions is a generally challenging problem, and it is expected that points between the peaks of the distributions will remain ambiguous and may be assigned to one process or the other with low confidence, as the sketch below shows.
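One way to see this ambiguity is with predict_proba, which returns the posterior probability of each process for each point. A sketch (equal-sized synthetic components are my choice, so the midpoint is maximally ambiguous):

```python
# Points between the peaks get posterior probabilities near 0.5 for
# each process, while points near a peak are assigned confidently.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(20, 5, (5000, 1)), rng.normal(40, 5, (5000, 1))])
model = GaussianMixture(n_components=2, random_state=7).fit(X)

print(model.predict_proba([[20.0], [30.0]]))
# roughly [[1.0, 0.0], [0.5, 0.5]] (component order may be swapped)
```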
## Summary
- Maximum likelihood estimation is challenging on data in the presence of ____
- Expectation maximization provides an iterative solution to ____
- Gaussian mixture models are an approach to density estimation where the parameters of the distributions are fit using the ____ algorithm.
latent variables.
maximum likelihood estimation with latent variables.
expectation-maximization.