Statistical Concepts Flashcards
What is the goal of Maximum Likelihood Estimation (MLE)?
The goal of MLE is to find the parameters of a probability distribution that make the observed data most probable. It ‘fits’ a distribution to the data by maximizing the likelihood function.
What is the likelihood function?
The likelihood function measures how probable the observed data is under a given set of model parameters. It is written L(θ) = P(X | θ) and is read as a function of θ, with the data X held fixed.
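A minimal sketch of this, assuming i.i.d. Bernoulli coin flips (the data and names here are hypothetical):

    import numpy as np

    # Hypothetical data: 10 coin flips, 1 = heads
    flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

    def likelihood(theta, data):
        # L(theta) = product over i of P(x_i | theta) for a Bernoulli model
        return np.prod(theta ** data * (1 - theta) ** (1 - data))

    print(likelihood(0.5, flips))  # probability of the data if the coin is fair
    print(likelihood(0.7, flips))  # higher: 7/10 heads makes theta = 0.7 more likely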
Why do we use the log-likelihood function instead of the likelihood function?
Because the likelihood of independent observations is a product of probabilities, taking the log turns that product into a sum, which is easier to differentiate and avoids numerical underflow. Since the logarithm is monotonic, maximizing the log-likelihood yields the same parameters as maximizing the likelihood.
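A quick numerical sketch of the underflow point, assuming 1,000 i.i.d. observations each with probability 0.01:

    import numpy as np

    probs = np.full(1000, 0.01)    # 1000 observations, each with probability 0.01
    print(np.prod(probs))          # 0.0 -- the product underflows to zero
    print(np.sum(np.log(probs)))   # about -4605.17 -- the log-likelihood stays finite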
How do we find the MLE of a model with parameters θ?
To find the MLE, we take the derivative of the log-likelihood with respect to each parameter, set it to zero, and solve. When no closed-form solution exists, we maximize the log-likelihood numerically.
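A minimal sketch of the numerical route, assuming i.i.d. exponential data (the closed-form answer, rate = 1/mean, comes from setting the derivative to zero):

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=1000)   # hypothetical sample, true rate = 0.5

    def neg_log_likelihood(lam):
        # Exponential log-likelihood: n*log(lam) - lam*sum(x)
        return -(len(data) * np.log(lam) - lam * data.sum())

    result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10), method="bounded")
    print(result.x)          # numeric MLE
    print(1 / data.mean())   # closed form from setting the derivative to zero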
What does it mean to fit a distribution to data using MLE?
It means estimating the best parameters for a probability distribution that most likely generated the observed data.
What is an example of a distribution where MLE is used?
MLE is commonly used to estimate the parameters of the Normal distribution, but it applies equally to the Poisson, Exponential, and many other probability distributions.
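For the Poisson distribution, for instance, setting the derivative of the log-likelihood to zero gives an estimate of λ equal to the sample mean. A quick check on hypothetical count data:

    import numpy as np

    rng = np.random.default_rng(1)
    counts = rng.poisson(lam=3.0, size=10_000)  # hypothetical count data
    print(counts.mean())  # the Poisson MLE for lambda is simply the sample mean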
What are the MLE estimates for observed data assumed to follow a normal distribution?
For a normal distribution, the MLE of the mean is the sample average, and the MLE of the variance is the average squared deviation from that mean (dividing by n, not n - 1, so it is slightly biased).
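A minimal sketch on hypothetical data (note ddof=0: the MLE divides by n):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=5.0, scale=2.0, size=10_000)  # hypothetical sample

    mu_hat = x.mean()          # MLE for the mean
    var_hat = x.var(ddof=0)    # MLE for the variance: divides by n, not n - 1
    print(mu_hat, var_hat)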
What are some applications of MLE?
MLE is used in machine learning, statistical inference, financial modeling, and natural language processing.
What assumption does MLE rely on?
MLE assumes that the observed data follows a known probability distribution with unknown parameters.
How is MLE related to Bayesian estimation?
MLE finds the most likely parameters without prior knowledge, while Bayesian estimation incorporates prior beliefs through Bayes’ theorem.
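In symbols: θ_MLE = argmax_θ P(X | θ), while θ_MAP = argmax_θ P(X | θ) P(θ); the extra factor P(θ) is the prior.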
What is the goal of Maximum Likelihood Estimation (MLE) in classification?
MLE aims to maximize the likelihood of the correct class label given the input data.
What is the loss function derived from the negative log-likelihood in a classification problem?
Cross-entropy loss.
What does the cross-entropy loss measure?
It measures how different the predicted probability distribution is from the true one, penalizing incorrect predictions more strongly.
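A minimal sketch with a one-hot target (all values hypothetical):

    import numpy as np

    y_true = np.array([0.0, 1.0, 0.0])   # one-hot true label, class 1
    y_pred = np.array([0.1, 0.7, 0.2])   # predicted probabilities

    cross_entropy = -np.sum(y_true * np.log(y_pred))
    print(cross_entropy)  # about 0.357; grows sharply as y_pred[1] shrinks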
Why is softmax used in classification neural networks?
Softmax converts raw scores (logits) into a valid probability distribution, making them interpretable for categorical classification.
How is softmax mathematically defined?
Softmax for class c: p_c = exp(z_c) / Σ_j exp(z_j), where the z values are the logits; the outputs are positive and sum to 1.
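A minimal implementation sketch; subtracting the maximum logit before exponentiating is a standard stability trick that leaves the result unchanged:

    import numpy as np

    def softmax(z):
        # Shift by the max for numerical stability; the ratio is unaffected
        e = np.exp(z - np.max(z))
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])  # hypothetical logits
    p = softmax(logits)
    print(p, p.sum())  # probabilities that sum to 1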
What is the relationship between softmax and cross-entropy?
Softmax outputs probabilities, and cross-entropy measures the difference between these and the true labels, forming the standard classification loss.
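Composing the two also simplifies algebraically: the cross-entropy of softmax(z) against true class c is -z_c + log Σ_j exp(z_j), which is why libraries typically fuse the two operations for numerical stability. A quick check with hypothetical logits:

    import numpy as np
    from scipy.special import logsumexp

    z = np.array([2.0, 1.0, 0.1])    # hypothetical logits
    c = 0                            # true class index
    loss = -z[c] + logsumexp(z)      # cross-entropy of softmax(z) against class c
    print(loss)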
What assumption about data leads to using softmax + cross-entropy?
We assume a categorical distribution over classes, making softmax + cross-entropy the natural choice under MLE.
Why does softmax ensure a valid probability distribution?
Softmax ensures all outputs are positive and sum to 1, making them interpretable as probabilities.
Break down the steps for MLE in a classification problem.
1. Assume a categorical distribution: given input x, the true label is represented as a one-hot probability distribution (a point mass, sometimes loosely called a Dirac delta), i.e., probability 1 for the correct class and 0 for all others.
2. Under MLE, we want to maximize the joint probability of the correct labels across all data points.
3. We build a model p(y | x, θ) (e.g., a neural network) that outputs a raw score (logit) for each class.
4. We apply softmax so the outputs can be interpreted as a probability distribution over the classes.
5. Maximizing the log-likelihood is then equivalent to minimizing the cross-entropy loss, which pulls the predicted distribution as close as possible to the one-hot target; see the sketch after this list.
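A minimal end-to-end sketch of these five steps in NumPy (all data and names hypothetical; a real model would learn θ by gradient descent on this loss):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # Step 1: one-hot targets for 4 hypothetical samples, 3 classes
    labels = np.array([0, 2, 1, 0])
    y = np.eye(3)[labels]

    # Step 3: a stand-in "model": random logits instead of a trained network
    rng = np.random.default_rng(3)
    logits = rng.normal(size=(4, 3))

    # Step 4: softmax turns logits into per-class probabilities
    p = softmax(logits)

    # Steps 2 and 5: the negative log-likelihood of the correct labels
    # is exactly the mean cross-entropy loss we would minimize
    loss = -np.mean(np.sum(y * np.log(p), axis=1))
    print(loss)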