Chapter 11 Maximum Likelihood Estimation Flashcards
There are many techniques for solving the density estimation problem, although a common framework used throughout the field of machine learning is …. P 106
Maximum likelihood estimation
How does Maximum Likelihood Estimation work? P 106
1- It defines a likelihood function
2- This function calculates: the conditional probability of observing the data sample GIVEN a probability distribution and its parameters
A probability density function (pdf) is a non-negative function that integrates to 1.
The likelihood is defined as the joint probability of the observed data as a function of the pdf parameter.
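The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a Gaussian pdf and i.i.d. observations; the sample values and candidate parameters are made up for the example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Gaussian probability density function evaluated at x.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    # Joint probability of the observed sample, assuming i.i.d. observations:
    # the product of the pdf evaluated at each data point.
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mu, sigma)
    return result

data = [4.8, 5.1, 5.3, 4.9]
# A candidate distribution centered on the data scores a higher likelihood
# than one centered far away from it.
print(likelihood(data, mu=5.0, sigma=0.5) > likelihood(data, mu=2.0, sigma=0.5))  # True
```

Multiplying many small probabilities underflows quickly, which is why the log-likelihood is used in practice (see the NLL card below in this chapter).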
For what purpose is Maximum Likelihood Estimation used?(Generally and in Machine Learning-example models) P 106
This approach can be used to search a space of possible distributions and parameters. This flexible probabilistic framework also provides the foundation for many machine learning algorithms, including important methods such as linear regression and logistic regression for predicting numeric values and class labels respectively, and, more generally, deep learning artificial neural networks.
Density estimation involves selecting a ____ and the parameters of that distribution that best explain the ____ of the observed data (X).
P 107
probability distribution function, joint probability distribution
For what kind of sample data is non-parametric (kernel) density estimation more erroneous? What are 2 common techniques for solving this problem? P 107
If the sample (X) drawn from the population is small and noisy, any evaluation of an estimated probability density function and its parameters will have some error.
There are many techniques for solving this problem, although two common approaches are:
Maximum a Posteriori (MAP), a Bayesian method.
Maximum Likelihood Estimation (MLE), a frequentist method.
What’s the difference between Maximum a Posteriori (MAP) and Maximum Likelihood Estimation (MLE) for density estimation? P 107
The main difference is that MLE assumes that all solutions are equally likely beforehand, whereas MAP allows prior information about the form of the solution to be harnessed.
When is the likelihood maximized? External
When the parameters of the distribution are set so that the fitted density matches both the dense and the sparse regions of the data. The denser regions then receive higher density values, so the joint probability of the sample reaches its highest value and the likelihood is maximized. In short, a maximized likelihood means a good fit.
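A small sketch of this idea: fitting a Gaussian mean by grid search and checking that the likelihood peaks where the model best matches the data. The data, grid, and fixed sigma are illustrative assumptions; logs are used to avoid underflow.

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    # Sum of Gaussian log-pdf values; equivalent to the log of the joint probability.
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [2.0, 2.5, 3.0, 3.5, 4.0]
# Grid search over candidate means from 0.0 to 6.0 in steps of 0.1.
candidates = [mu / 10 for mu in range(0, 61)]
best = max(candidates, key=lambda mu: log_likelihood(data, mu))
print(best)  # 3.0 — the sample mean, where the Gaussian likelihood is maximized
```

The winning parameter is the sample mean, which for a Gaussian is exactly the maximum likelihood estimate of the mean.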
What is the definition of likelihood function? External
The likelihood function is defined as the joint probability of the observed data, treated as a function of the parameter theta, where theta is the parameter of the probability density function whose fit to the data we are testing.
How is the Negative Log-Likelihood (NLL) function calculated? what does minimum of NLL mean? P 108
The negative logarithm of the likelihood function, which is stated as below:
NLL(θ) = -sum(log P(xi ; θ))
The minimum of the NLL means that, for a certain value of the parameter θ, we have found the probability density function that best fits the sample data.
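The NLL can be evaluated directly. Below is a minimal sketch assuming an exponential pdf P(x; θ) = θ·exp(-θx) and a made-up sample; the analytical MLE for the exponential rate is 1 / mean(data), so nearby values of θ should give a higher NLL.

```python
import math

def nll(data, theta):
    # Negative log-likelihood of an exponential pdf: -sum(log(θ) - θ·x) over the sample.
    return -sum(math.log(theta) - theta * x for x in data)

data = [0.4, 0.6, 0.5, 0.5]
# mean(data) = 0.5, so the MLE rate is θ = 2.0; it should minimize the NLL.
print(nll(data, 2.0) < nll(data, 1.0) and nll(data, 2.0) < nll(data, 3.0))  # True
```

In practice this minimization is handed to a numerical optimizer rather than checked by hand.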
When do we use Maximum Likelihood Estimation and not Kernel Density Estimation? External
KDE is a fundamental data-smoothing technique in which inferences about the population are made from a finite data sample. When the sample is small and noisy, this method is not useful, and we use MLE instead.
What’s the difference between parametric probability density estimation and non-parametric probability density estimation? External
Parametric probability density estimation involves selecting a common distribution and estimating the parameters for the density function from a data sample. Non-parametric probability density estimation involves using a technique to fit a model to the arbitrary distribution of the data, like kernel density estimation.
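To contrast the two, here is a small sketch fitting the same sample both ways: a parametric Gaussian fit by MLE versus a hand-rolled Gaussian kernel density estimate. The sample (with an outlier), the bandwidth h, and the evaluation point are all illustrative assumptions.

```python
import math

data = [1.0, 1.2, 0.9, 1.1, 5.0]  # illustrative sample with one outlier

# Parametric: fit a single Gaussian by MLE (sample mean and variance).
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)

def parametric_pdf(x):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Non-parametric: Gaussian kernel density estimate with an assumed bandwidth h.
def kde_pdf(x, h=0.3):
    kernel = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

# The KDE tracks the tight cluster near 1.0 more closely than the single
# Gaussian, whose shape is distorted by the outlier at 5.0.
print(kde_pdf(1.0) > parametric_pdf(1.0))  # True
```

This is the trade-off in miniature: the parametric model commits to one distributional form, while the KDE adapts to the arbitrary shape of the data.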
Maximum Likelihood Estimation provides the basis for estimating the probability density function of a dataset. This can be used in unsupervised machine learning algorithms, such as: …. P 109
Clustering algorithms
The Maximum Likelihood Estimation framework is also a useful tool for supervised machine learning. How is MLE defined in case of supervised machine learning algorithms? P 109
We can state it as the conditional probability of the output (y) given the input (X), given the modeling hypothesis (h).
max sum(log P(yi | xi ; h))
The maximum likelihood estimator can readily be generalized to the case where our goal is to estimate a conditional probability P(y|x; θ) in order to predict y given x.
This means that the same Maximum Likelihood Estimation framework that is generally used for density estimation can be used to find a supervised learning model and parameters.
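As a concrete sketch of conditional MLE, the snippet below fits a toy one-dimensional logistic regression by maximizing sum(log P(yi | xi; w, b)) with plain gradient ascent. The data, learning rate, and step count are all assumptions chosen for illustration.

```python
import math

# Toy separable dataset: negative inputs labeled 0, positive inputs labeled 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # Gradient of the conditional log-likelihood with respect to w and b:
    # sum over the sample of (y - P(y=1 | x)) times x (or 1 for the bias).
    gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * gw
    b += lr * gb

# The fitted model assigns the two classes to opposite sides of 0.5.
print(sigmoid(w * -2.0 + b) < 0.5 < sigmoid(w * 2.0 + b))  # True
```

Maximizing this conditional log-likelihood is exactly the criterion that standard logistic regression solvers optimize, which is how the density estimation framework carries over to supervised learning.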
An important benefit of the maximum likelihood estimator in machine learning is that as the size of the dataset increases, the quality of the estimator continues to improve. True/False P 110
True