Statistical concepts Flashcards

1
Q

What is the goal of Maximum Likelihood Estimation (MLE)?

A

The goal of MLE is to find the parameters of a probability distribution that make the observed data most probable. It ‘fits’ a distribution to the data by maximizing the likelihood function.

2
Q

What is the likelihood function?

A

The likelihood function measures how probable the observed data is given a set of model parameters. It is denoted as L(θ) = P(X | θ).
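
A minimal sketch of the likelihood in practice, using i.i.d. Bernoulli coin flips (the data and candidate θ values are made up for illustration):

```python
import numpy as np

# Observed coin flips: 1 = heads, 0 = tails (illustrative data).
data = np.array([1, 1, 0, 1, 0, 1, 1, 1])

def likelihood(theta, x):
    """L(theta) = P(X | theta) for i.i.d. Bernoulli observations."""
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

for theta in [0.25, 0.5, 0.75]:
    print(f"L({theta}) = {likelihood(theta, data):.6f}")
# The theta with the highest likelihood is the MLE; here it is the
# sample proportion of heads, 6/8 = 0.75.
```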

3
Q

Why do we use the log-likelihood function instead of the likelihood function?

A

Because the log turns the product of per-observation probabilities (for i.i.d. data, the likelihood is a product) into a sum, which simplifies differentiation and avoids numerical underflow.
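
A quick illustration of the underflow point (the probabilities are made up): the raw product of many small values underflows to zero, while the sum of logs stays well-behaved:

```python
import numpy as np

# 1,000 i.i.d. observations, each with probability 0.01 (illustrative).
probs = np.full(1000, 0.01)

print(np.prod(probs))         # 0.0 -- the product underflows
print(np.sum(np.log(probs)))  # about -4605.2 -- the log-likelihood is fine
```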

4
Q

How do we find the MLE of a model with parameters θ?

A

To find the MLE, we take the derivative of the log-likelihood function with respect to the parameters, set it to zero, and solve for the parameters. When no closed-form solution exists, the log-likelihood is maximized numerically instead.
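
A sketch of both routes for an Exponential(λ) model, with simulated data: setting the derivative of the log-likelihood to zero gives the closed form λ̂ = 1/x̄, and a numerical optimizer (scipy.optimize.minimize_scalar here) recovers the same value:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)  # true lambda = 0.5

def neg_log_likelihood(lam):
    # Exponential log-likelihood: n*log(lam) - lam * sum(x)
    return -(len(data) * np.log(lam) - lam * data.sum())

# Closed form: d/d(lam) log L = n/lam - sum(x) = 0  =>  lam = 1/mean(x)
print("closed form:", 1 / data.mean())

# Numerical maximization (by minimizing the negative log-likelihood)
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10), method="bounded")
print("numerical:  ", res.x)
```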

5
Q

What does it mean to fit a distribution to data using MLE?

A

It means estimating the best parameters for a probability distribution that most likely generated the observed data.

6
Q

What is an example of a distribution where MLE is used?

A

MLE is commonly used to estimate the parameters of the Normal distribution, but it applies equally to the Poisson, Exponential, and other probability distributions.
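
For instance, SciPy's continuous distributions expose MLE fitting through their .fit() method (the data here is simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# stats.norm.fit performs MLE, returning (mean, standard deviation).
mu_hat, sigma_hat = stats.norm.fit(data)
print(mu_hat, sigma_hat)  # close to the true 5.0 and 2.0
```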

7
Q

What are the MLE estimates for observed data, assuming a normal distribution?

A

For a normal distribution, the MLE of the mean is the sample average, and the MLE of the variance is the sample variance with divisor n (the biased version, not the unbiased estimator that divides by n − 1).
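
A minimal check in NumPy (illustrative sample); note that np.var with its default ddof=0 is exactly the divide-by-n MLE variance:

```python
import numpy as np

data = np.array([4.1, 5.3, 4.8, 6.0, 5.1])  # illustrative sample

mu_hat = data.mean()        # MLE of the mean
var_hat = data.var(ddof=0)  # MLE of the variance (divides by n)
print(mu_hat, var_hat)
print(np.mean((data - mu_hat) ** 2))  # the same value, computed explicitly
```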

8
Q

What are some applications of MLE?

A

MLE is used in machine learning, statistical inference, financial modeling, and natural language processing.

9
Q

What assumption does MLE rely on?

A

MLE assumes that the observed data follows a known probability distribution with unknown parameters.

10
Q

How is MLE related to Bayesian estimation?

A

MLE finds the most likely parameters without prior knowledge, while Bayesian estimation incorporates prior beliefs through Bayes’ theorem.

11
Q

What is the goal of Maximum Likelihood Estimation (MLE) in classification?

A

MLE aims to maximize the likelihood of the correct class label given the input data.

12
Q

What is the loss function derived from the negative log-likelihood in a classification problem?

A

Cross-entropy loss.

13
Q

What does the cross-entropy loss measure?

A

It measures how different the predicted probability distribution is from the true one, penalizing incorrect predictions more strongly.
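
A small sketch with hypothetical predicted probabilities and a one-hot target: against a one-hot distribution, the loss reduces to the negative log-probability of the true class, so confident wrong predictions are penalized heavily:

```python
import numpy as np

def cross_entropy(p_true, p_pred):
    """H(p_true, p_pred) = -sum_c p_true[c] * log(p_pred[c])."""
    return -np.sum(p_true * np.log(p_pred))

target = np.array([0.0, 1.0, 0.0])  # one-hot: the true class is 1

print(cross_entropy(target, np.array([0.1, 0.8, 0.1])))  # ~0.22 (confident, right)
print(cross_entropy(target, np.array([0.8, 0.1, 0.1])))  # ~2.30 (confident, wrong)
```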

14
Q

Why is softmax used in classification neural networks?

A

Softmax converts raw scores (logits) into a valid probability distribution, making them interpretable for categorical classification.

15
Q

How is softmax mathematically defined?

A

Softmax for class c: p_c = exp(z_c) / Σ_j exp(z_j), ensuring all outputs sum to 1.
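
A direct NumPy translation of the formula; subtracting the maximum logit before exponentiating is a standard numerical-stability trick that leaves the result unchanged:

```python
import numpy as np

def softmax(z):
    """p_c = exp(z_c) / sum_j exp(z_j), computed stably."""
    z = z - np.max(z)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # a valid probability distribution summing to 1
```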

16
Q

What is the relationship between softmax and cross-entropy?

A

Softmax outputs probabilities, and cross-entropy measures the difference between these and the true labels, forming the standard classification loss.

17
Q

What assumption about data leads to using softmax + cross-entropy?

A

We assume a categorical distribution over classes, making softmax + cross-entropy the natural choice under MLE.

18
Q

Why does softmax ensure a valid probability distribution?

A

Softmax ensures all outputs are positive and sum to 1, making them interpretable as probabilities.

19
Q

Break down the steps for MLE in a classification problem.

A

1. Assume a categorical distribution: given data x, the true label follows a one-hot probability distribution (a Dirac delta), i.e., 1 for the correct class and 0 for all others.
2. Under MLE, we want to maximize the probability the model assigns to the correct label across all data points.
3. We build a model p(y | x, θ) (e.g., a neural network) that outputs a raw score (logit) for each class.
4. We apply softmax so the outputs can be interpreted as a probability distribution over the classes.
5. Since maximizing the log-likelihood is equivalent to minimizing the cross-entropy loss, we minimize cross-entropy to push the predicted distribution as close as possible to the one-hot target (see the sketch below).
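
A minimal NumPy sketch of these steps (the logits and true label are made up; in practice a trained network produces the logits):

```python
import numpy as np

logits = np.array([1.5, 0.3, -0.8])  # step 3: raw model outputs (illustrative)
true_class = 0                       # step 1: one-hot target is [1, 0, 0]

# Step 4: softmax turns the logits into a categorical distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Step 5: cross-entropy against a one-hot target reduces to the negative
# log-probability of the true class; minimizing it maximizes the
# log-likelihood from step 2.
loss = -np.log(probs[true_class])
print(probs, loss)
```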