Chapter 11 Maximum Likelihood Estimation Flashcards
There are many techniques for solving the density estimation problem, although a common framework used throughout the field of machine learning is …. P 106
Maximum likelihood estimation
How does Maximum Likelihood Estimation work? P 106
1- It defines a likelihood function
2- This function calculates: the conditional probability of observing the data sample GIVEN a probability distribution and its parameters
A probability density function (pdf) is a non-negative function that integrates to 1.
The likelihood is defined as the joint probability of the observed data as a function of the pdf parameter.
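The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a Gaussian pdf and i.i.d. observations; the sample values and candidate parameters are made up for the example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Gaussian probability density function evaluated at x.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    # Joint probability of the observed sample, assuming i.i.d. observations:
    # the product of the pdf evaluated at each data point.
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mu, sigma)
    return result

data = [4.8, 5.1, 5.3, 4.9]
# A candidate distribution centered on the data scores a higher likelihood
# than one centered far away from it.
print(likelihood(data, mu=5.0, sigma=0.5) > likelihood(data, mu=2.0, sigma=0.5))  # True
```

Multiplying many small probabilities underflows quickly, which is why the log-likelihood is used in practice (see the NLL card below in this chapter).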
For what purpose is Maximum Likelihood Estimation used?(Generally and in Machine Learning-example models) P 106
This approach can be used to search a space of possible distributions and parameters. This flexible probabilistic framework also provides the foundation for many machine learning algorithms, including important methods such as linear regression and logistic regression for predicting numeric values and class labels respectively, and, more generally, deep learning artificial neural networks.
Density estimation involves selecting a ____ and the parameters of that distribution that best explain the ____ of the observed data (X).
P 107
probability distribution function, joint probability distribution
For what kind of sample data is non-parametric (kernel) density estimation more erroneous? What are 2 common techniques for solving this problem? P 107
If the sample (X) drawn from the population is small and noisy, any evaluation of an estimated probability density function and its parameters will have some error.
There are many techniques for solving this problem, although two common approaches are:
Maximum a Posteriori (MAP), a Bayesian method.
Maximum Likelihood Estimation (MLE), a frequentist method.
What’s the difference between Maximum a Posteriori (MAP) and Maximum Likelihood Estimation (MLE) for density estimation? P 107
The main difference is that MLE assumes that all solutions are equally likely beforehand, whereas MAP allows prior information about the form of the solution to be harnessed.
When is the likelihood maximized? External
When the parameters of the distribution are set so that the fitted density matches both the dense and the sparse regions of the data. The denser regions then receive higher density values, so the joint probability of the sample reaches its highest value and the likelihood is maximized. In short, a maximized likelihood means a good fit.
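A small sketch of this idea: fitting a Gaussian mean by grid search and checking that the likelihood peaks where the model best matches the data. The data, grid, and fixed sigma are illustrative assumptions; logs are used to avoid underflow.

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    # Sum of Gaussian log-pdf values; equivalent to the log of the joint probability.
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [2.0, 2.5, 3.0, 3.5, 4.0]
# Grid search over candidate means from 0.0 to 6.0 in steps of 0.1.
candidates = [mu / 10 for mu in range(0, 61)]
best = max(candidates, key=lambda mu: log_likelihood(data, mu))
print(best)  # 3.0 — the sample mean, where the Gaussian likelihood is maximized
```

The winning parameter is the sample mean, which for a Gaussian is exactly the maximum likelihood estimate of the mean.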
What is the definition of likelihood function? External
The likelihood function is defined as the joint probability of the observed data, treated as a function of the parameter theta, where theta is the parameter of the probability density function whose fit to the data we are testing.
How is the Negative Log-Likelihood (NLL) function calculated? what does minimum of NLL mean? P 108
The negative logarithm of the likelihood function, which is stated as below:
NLL(θ) = -sum(log P(xi ; θ))
The minimum of the NLL means that, for a certain value of the parameter θ, we have found the probability density function that best fits the sample data.
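The NLL can be evaluated directly. Below is a minimal sketch assuming an exponential pdf P(x; θ) = θ·exp(-θx) and a made-up sample; the analytical MLE for the exponential rate is 1 / mean(data), so nearby values of θ should give a higher NLL.

```python
import math

def nll(data, theta):
    # Negative log-likelihood of an exponential pdf: -sum(log(θ) - θ·x) over the sample.
    return -sum(math.log(theta) - theta * x for x in data)

data = [0.4, 0.6, 0.5, 0.5]
# mean(data) = 0.5, so the MLE rate is θ = 2.0; it should minimize the NLL.
print(nll(data, 2.0) < nll(data, 1.0) and nll(data, 2.0) < nll(data, 3.0))  # True
```

In practice this minimization is handed to a numerical optimizer rather than checked by hand.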
When do we use Maximum Likelihood Estimation and not Kernel Density Estimation? External
KDE is a fundamental data-smoothing technique in which inferences about the population are made from a finite data sample. When the sample is small and noisy, this method is not useful, and we use MLE instead.
What’s the difference between parametric probability density estimation and non-parametric probability density estimation? External
Parametric probability density estimation involves selecting a common distribution and estimating the parameters for the density function from a data sample. Non-parametric probability density estimation involves using a technique to fit a model to the arbitrary distribution of the data, like kernel density estimation.
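To contrast the two, here is a small sketch fitting the same sample both ways: a parametric Gaussian fit by MLE versus a hand-rolled Gaussian kernel density estimate. The sample (with an outlier), the bandwidth h, and the evaluation point are all illustrative assumptions.

```python
import math

data = [1.0, 1.2, 0.9, 1.1, 5.0]  # illustrative sample with one outlier

# Parametric: fit a single Gaussian by MLE (sample mean and variance).
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)

def parametric_pdf(x):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Non-parametric: Gaussian kernel density estimate with an assumed bandwidth h.
def kde_pdf(x, h=0.3):
    kernel = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

# The KDE tracks the tight cluster near 1.0 more closely than the single
# Gaussian, whose shape is distorted by the outlier at 5.0.
print(kde_pdf(1.0) > parametric_pdf(1.0))  # True
```

This is the trade-off in miniature: the parametric model commits to one distributional form, while the KDE adapts to the arbitrary shape of the data.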
Maximum Likelihood Estimation provides the basis for estimating the probability density function of a dataset. This can be used in unsupervised machine learning algorithms, such as: …. P 109
Clustering algorithms
The Maximum Likelihood Estimation framework is also a useful tool for supervised machine learning. How is MLE defined in case of supervised machine learning algorithms? P 109
We can state it as the conditional probability of the output (y) given the input (X), given the modeling hypothesis (h).
max sum(log P(yi | xi ; h))
The maximum likelihood estimator can readily be generalized to the case where our goal is to estimate a conditional probability P(y|x; θ) in order to predict y given x.
This means that the same Maximum Likelihood Estimation framework that is generally used for density estimation can be used to find a supervised learning model and parameters.
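As a concrete sketch of conditional MLE, the snippet below fits a toy one-dimensional logistic regression by maximizing sum(log P(yi | xi; w, b)) with plain gradient ascent. The data, learning rate, and step count are all assumptions chosen for illustration.

```python
import math

# Toy separable dataset: negative inputs labeled 0, positive inputs labeled 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # Gradient of the conditional log-likelihood with respect to w and b:
    # sum over the sample of (y - P(y=1 | x)) times x (or 1 for the bias).
    gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * gw
    b += lr * gb

# The fitted model assigns the two classes to opposite sides of 0.5.
print(sigmoid(w * -2.0 + b) < 0.5 < sigmoid(w * 2.0 + b))  # True
```

Maximizing this conditional log-likelihood is exactly the criterion that standard logistic regression solvers optimize, which is how the density estimation framework carries over to supervised learning.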
An important benefit of the maximum likelihood estimator in machine learning is that as the size of the dataset increases, the quality of the estimator continues to improve. True/False P 110
True