Chapter 11 Maximum Likelihood Estimation Flashcards

1
Q

There are many techniques for solving density estimation, although a common framework used throughout the field of machine learning is …. P 106

A

Maximum likelihood estimation

2
Q

How does Maximum Likelihood Estimation work? P 106

A

1- It defines a likelihood function.
2- This function calculates the conditional probability of observing the data sample GIVEN a probability distribution and its parameters.

A probability density function (pdf) is a non-negative function that integrates to 1.

The likelihood is defined as the joint probability of the observed data as a function of the pdf parameter.
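As a minimal sketch of the idea (not from the book), the joint probability of the sample under a Gaussian pdf can be treated as a function of the candidate parameters; the data, `mu`, and `sigma` below are illustrative assumptions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density: non-negative and integrates to 1 over the real line."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Joint probability of the observed data, viewed as a function of (mu, sigma)."""
    prod = 1.0
    for x in data:
        prod *= gaussian_pdf(x, mu, sigma)
    return prod

data = [4.8, 5.1, 5.3, 4.9, 5.0]
# Parameters near the sample mean give a higher likelihood than distant ones.
print(likelihood(data, mu=5.0, sigma=0.5) > likelihood(data, mu=2.0, sigma=0.5))  # True
```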

3
Q

For what purpose is Maximum Likelihood Estimation used? (Generally, and in machine learning: example models) P 106

A

This approach can be used to search a space of possible distributions and parameters. This flexible probabilistic framework also provides the foundation for many machine learning algorithms, including linear regression for predicting numeric values and logistic regression for predicting class labels, as well as, more generally, deep learning artificial neural networks.

4
Q

Density estimation involves selecting a ____ and the parameters of that distribution that best explain the ____ of the observed data (X).

P 107

A

probability distribution function, joint probability distribution

5
Q

For what kind of sample data is non-parametric (kernel) density estimation more erroneous? What are 2 common techniques for solving this problem? P 107

A

If the sample (X) drawn from the population is small and noisy, any evaluation of an estimated probability density function and its parameters will have some error.
There are many techniques for solving this problem, although two common approaches are:
- Maximum a Posteriori (MAP), a Bayesian method.
- Maximum Likelihood Estimation (MLE), a frequentist method.

6
Q

What’s the difference between Maximum a Posteriori (MAP) and Maximum Likelihood Estimation (MLE) for density estimation? P 107

A

The main difference is that MLE assumes that all solutions are equally likely beforehand, whereas MAP allows prior information about the form of the solution to be harnessed.
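The difference can be sketched numerically (not from the book): MAP adds a negative log-prior term to the same negative log-likelihood that MLE minimizes, so a prior centered away from the data pulls the MAP estimate toward it. The Gaussian data, the sigma values, and the grid below are all illustrative assumptions:

```python
import math

def nll(data, mu, sigma=0.5):
    """Negative log-likelihood of the sample under a Gaussian pdf."""
    return -sum(
        math.log(math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
                 / (sigma * math.sqrt(2 * math.pi)))
        for x in data
    )

def neg_log_prior(mu, prior_mu=0.0, prior_sigma=1.0):
    """Gaussian prior over mu; MLE implicitly uses a flat (uniform) prior instead."""
    return ((mu - prior_mu) ** 2) / (2 * prior_sigma ** 2)

data = [4.8, 5.1, 5.3, 4.9, 5.0]
candidates = [i / 10 for i in range(0, 81)]  # coarse grid: 0.0 .. 8.0
mle = min(candidates, key=lambda m: nll(data, m))
map_ = min(candidates, key=lambda m: nll(data, m) + neg_log_prior(m))
print(mle, map_)  # the prior at 0 pulls the MAP estimate below the MLE
```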

7
Q

When is the likelihood maximized? External

A

When the parameters of the distribution are set so that it best fits both the dense and sparse parts of the data. The denser parts then receive higher density values, so the joint probability of the sample is at its highest and the likelihood is maximized. In short, a maximized likelihood corresponds to a good fit.
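To illustrate (a sketch not taken from the book): for a Gaussian, the likelihood-maximizing parameters are the sample mean and (population) standard deviation, which place the most density over the densest part of the data; shifting the mean away can only lower the likelihood. The data values are an illustrative assumption:

```python
import math
from statistics import mean, pstdev

def log_likelihood(data, mu, sigma):
    """Sum of log-densities under a Gaussian: higher means a better fit."""
    return sum(
        -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))
        for x in data
    )

data = [4.8, 5.1, 5.3, 4.9, 5.0]
# MLE solution for a Gaussian: sample mean and population standard deviation.
mu_hat, sigma_hat = mean(data), pstdev(data)
# Moving the mean away from the dense region lowers the likelihood.
print(log_likelihood(data, mu_hat, sigma_hat) > log_likelihood(data, mu_hat + 1.0, sigma_hat))  # True
```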

8
Q

What is the definition of likelihood function? External

A

The likelihood function is defined as the joint probability of the observed data, treated as a function of the parameter theta, where theta is the parameter of the probability density function whose fit to the data we are testing.

9
Q

How is the Negative Log-Likelihood (NLL) function calculated? What does the minimum of the NLL mean? P 108

A

The negative logarithm of the likelihood function, which is stated as:
NLL(θ) = - sum(log P(xi ; θ))
The minimum of the NLL means that, for a certain value of the parameter θ, we have found the best-fitting probability density function for the sample data.
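The formula above can be sketched directly in code (a minimal illustration, assuming a Gaussian pdf and made-up data):

```python
import math

def nll(data, mu, sigma):
    """Negative log-likelihood: -sum(log P(xi; theta)) for a Gaussian pdf."""
    return -sum(
        math.log(
            math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi))
        )
        for x in data
    )

data = [4.8, 5.1, 5.3, 4.9, 5.0]
# Coarse grid search over mu: the minimum NLL sits at the best-fitting mean.
candidates = [3.0, 4.0, 5.0, 6.0]
best_mu = min(candidates, key=lambda mu: nll(data, mu, sigma=0.5))
print(best_mu)  # 5.0
```

Note the sum of logs replaces the product of probabilities, which avoids numerical underflow for larger samples.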

10
Q

When do we use Maximum Likelihood Estimation and not Kernel Density Estimation? External

A

KDE addresses a fundamental data-smoothing problem: making inferences about the population from a finite data sample. When the sample is small and noisy, this method is not useful, and we use MLE instead.

11
Q

What’s the difference between parametric probability density estimation and non-parametric probability density estimation? External

A

Parametric probability density estimation involves selecting a common distribution and estimating the parameters for the density function from a data sample. Non-parametric probability density estimation involves using a technique to fit a model to the arbitrary distribution of the data, like kernel density estimation.
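The two approaches can be contrasted in a short sketch (not from the book; the data, bandwidth, and Gaussian kernel choice are illustrative assumptions):

```python
import math
from statistics import mean, pstdev

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

data = [4.8, 5.1, 5.3, 4.9, 5.0]

# Parametric: choose a common distribution (normal) and estimate its parameters.
mu_hat, sigma_hat = mean(data), pstdev(data)

def parametric(x):
    return normal_pdf(x, mu_hat, sigma_hat)

# Non-parametric: kernel density estimate -- average a small kernel over every sample.
def kde(x, bandwidth=0.3):
    return sum(normal_pdf(x, xi, bandwidth) for xi in data) / len(data)

# Both estimators put much more density near the data than far from it.
print(round(parametric(5.0), 3), round(kde(5.0), 3))
```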

12
Q

Maximum Likelihood Estimation provides the basis for estimating the probability density function of a dataset. This can be used in unsupervised machine learning algorithms, such as: …. P 109

A

Clustering algorithms

13
Q

The Maximum Likelihood Estimation framework is also a useful tool for supervised machine learning. How is MLE defined in case of supervised machine learning algorithms? P 109

A

We can state it as the conditional probability of the output (y) given the input (X), given the modeling hypothesis (h):
max sum(log P(yi | xi ; h))
The maximum likelihood estimator can readily be generalized to the case where our goal is to estimate a conditional probability P(y|x; θ) in order to predict y given x.
This means that the same Maximum Likelihood Estimation framework that is generally used for density estimation can be used to find a supervised learning model and parameters.
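As a concrete sketch of the supervised case (not from the book), a logistic-regression hypothesis h = (w, b) can be scored by the negative conditional log-likelihood -sum(log P(yi | xi; h)); the toy 1-D data and the two candidate weights below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_nll(X, y, w, b):
    """-sum(log P(yi | xi; h)) for a logistic-regression hypothesis h = (w, b)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w * xi + b)          # P(y=1 | x; h)
        total -= math.log(p if yi == 1 else 1.0 - p)
    return total

# Toy 1-D data: positives sit at larger x, so a positive slope should fit better.
X = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
print(conditional_nll(X, y, w=2.0, b=0.0) < conditional_nll(X, y, w=-2.0, b=0.0))  # True
```

Minimizing this quantity over (w, b) is exactly the MLE framework applied to a supervised model.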

14
Q

An important benefit of the maximum likelihood estimator in machine learning is that as the size of the dataset increases, the quality of the estimator continues to improve. True/False P 110

A

True
