Maximum Likelihood Estimation Flashcards
Which steps does the optimization process follow?
- Take the first derivative of the log-likelihood with respect to the parameter(s) theta
- First-order condition: set the derivative equal to 0 and solve for theta (at the maximum, the slope of the tangent line is 0)
- Second-order condition: ensure that the second derivative at the estimator is negative.
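A minimal symbolic sketch of the three steps, assuming an i.i.d. Poisson sample (the names loglik, lam, and sum_x are illustrative, not from the cards):

```python
import sympy as sp

# Symbols: lam = Poisson rate, n = sample size, sum_x = sum of the observations.
lam, n, sum_x = sp.symbols("lam n sum_x", positive=True)

# Poisson log-likelihood, up to an additive constant that does not involve lam.
loglik = sum_x * sp.log(lam) - n * lam

# Step 1: first derivative with respect to the parameter.
score = sp.diff(loglik, lam)

# Step 2: first-order condition -- set the derivative to 0 and solve for lam.
lam_hat = sp.solve(sp.Eq(score, 0), lam)[0]        # sum_x / n, the sample mean

# Step 3: second-order condition -- second derivative at the estimator must be negative.
soc = sp.diff(loglik, lam, 2).subs(lam, lam_hat)   # -n**2 / sum_x < 0
print(lam_hat, sp.simplify(soc))
```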
What is the kernel of the Log-Likelihood?
It is the log-likelihood stripped of the parts that do not involve the parameters of interest; dropping those parts does not change where the maximum lies.
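For example, for an i.i.d. normal sample the log-likelihood splits into a constant and the kernel (standard result, shown for illustration):

```latex
\ell(\mu,\sigma^2)
= \underbrace{-\tfrac{n}{2}\ln(2\pi)}_{\text{constant: dropped}}
  \underbrace{-\tfrac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}_{\text{kernel}}
```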
What is the gradient?
It consists of the first (partial) derivative(s) of the log-likelihood with respect to the parameters.
At the maximum it must equal 0.
(=slope)
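For instance, the gradient (score) of the normal log-likelihood with respect to mu is a standard result; setting it to 0 yields the sample mean:

```latex
\frac{\partial \ell}{\partial \mu}
= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \bar{x}
```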
What is the Hessian?
It consists of the second (partial) derivative(s) of the log-likelihood with respect to the parameters.
It captures how precise our estimates are.
At a maximum it must be negative (negative definite in the multi-parameter case).
(= curvature; the Hessian is larger in magnitude where the curvature is steep, because the first derivatives change rapidly there)
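A small numerical sketch, assuming simulated normal data: the curvature of the log-likelihood in mu is approximated by finite differences and comes out negative at the maximum (function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data, purely illustrative

def loglik_mu(mu, s2=x.var()):
    """Normal log-likelihood as a function of mu, with sigma^2 held fixed."""
    return -0.5 * x.size * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

mu_hat, h = x.mean(), 1e-4
# Central finite-difference approximation of the second derivative (the curvature).
curvature = (loglik_mu(mu_hat + h) - 2 * loglik_mu(mu_hat) + loglik_mu(mu_hat - h)) / h**2
print(curvature)   # approximately -n / sigma^2_hat, i.e. negative at the maximum
```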
What happens to the maximum with a small n?
The smaller the n, the flatter (wider) the curvature and the more difficult it is to pinpoint the maximum. Larger n -> greater precision.
The Hessian is much smaller in magnitude with a smaller n.
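This is easy to see for the normal mean, where the curvature is exactly proportional to n (standard result):

```latex
\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}
```

Doubling n doubles the magnitude of the Hessian, so the peak of the log-likelihood becomes correspondingly sharper.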
What happens to the gradient and the hessian in a model with multiple parameters?
Instead of a single derivative, the gradient now consists of partial derivatives, one for each parameter, collected into a gradient vector.
The Hessian becomes a matrix of second partial derivatives (e.g., with respect to mu and sigma^2, the variance).
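For the normal model with parameters mu and sigma^2, these objects take the following standard form (reproduced for illustration):

```latex
\nabla \ell(\mu,\sigma^2) =
\begin{pmatrix}
\frac{1}{\sigma^2}\sum_i (x_i-\mu) \\[4pt]
-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_i (x_i-\mu)^2
\end{pmatrix},
\qquad
H =
\begin{pmatrix}
\frac{\partial^2 \ell}{\partial \mu^2} & \frac{\partial^2 \ell}{\partial \mu\,\partial \sigma^2} \\[4pt]
\frac{\partial^2 \ell}{\partial \sigma^2\,\partial \mu} & \frac{\partial^2 \ell}{\partial (\sigma^2)^2}
\end{pmatrix}
```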
What is the covariance matrix?
The inverse of the negative Hessian matrix (the observed information matrix).
-> how interconnected are mu and sigma^2? If the estimates are highly correlated, the data provide little separate information about each parameter.
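A minimal numerical sketch, assuming simulated normal data: invert the negative Hessian at the MLE to obtain the covariance matrix, standard errors, and the correlation between the estimates (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=300)   # simulated data, illustrative
n = x.size
mu_hat, s2_hat = x.mean(), x.var()             # normal MLEs

# Hessian of the normal log-likelihood evaluated at the MLE (standard analytic form;
# the cross-derivative is 0 at the MLE).
H = np.array([
    [-n / s2_hat, 0.0],
    [0.0, -n / (2 * s2_hat**2)],
])

# Covariance matrix of the estimates = inverse of the negative Hessian.
cov = np.linalg.inv(-H)
std_errors = np.sqrt(np.diag(cov))
correlation = cov[0, 1] / (std_errors[0] * std_errors[1])
print(cov, std_errors, correlation)
```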
What does the invariance principle say?
A function of an MLE is itself the MLE of that function of the parameter.
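For example (standard illustration): the MLE of sigma follows directly from the MLE of sigma^2,

```latex
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\quad\Longrightarrow\quad
\hat{\sigma}_{ML} = \sqrt{\hat{\sigma}^2_{ML}}
```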
What are the regularity conditions?
- The observations are i.i.d.
- The likelihood function is continuous in the parameter(s)
- Identification: different parameter values yield different PDFs/PMFs
- The true parameter value lies within the parameter space
- The support of the PDF/PMF does not depend on a parameter
- Incidental parameters: the number of nuisance parameters does not increase with n.
What are the asymptotic properties of MLE?
As the sample size goes to infinity, the MLE displays:
- Consistency (if the model is correct, the estimates converge to the true parameter values)
- Normality (the sampling distribution becomes approximately normal, on the basis of which we can simulate uncertainty)
- Efficiency (no other estimator uses the data as efficiently as the MLE)
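The normality property is usually written as follows, where I(theta_0) is the per-observation Fisher information (standard result, added for reference):

```latex
\sqrt{n}\,\bigl(\hat{\theta}_{ML} - \theta_0\bigr) \;\xrightarrow{d}\; N\!\bigl(0,\; I(\theta_0)^{-1}\bigr)
```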
What is a limitation of MLE?
In small samples, MLEs may prove to be biased and have complex sampling distributions.
With n under 100 you should make a bias correction.
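A classic example of such small-sample bias is the ML estimator of the normal variance, which is too small on average and is corrected by the factor n/(n-1):

```latex
E\!\left[\hat{\sigma}^2_{ML}\right] = \frac{n-1}{n}\,\sigma^2
\quad\Longrightarrow\quad
\hat{\sigma}^2_{corrected} = \frac{n}{n-1}\,\hat{\sigma}^2_{ML}
```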
What is a numeric optimization?
The use of numerical methods to find the minima and/or maxima of functions. It is used when analytic solutions to optimization problems are impractical.
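A minimal sketch of numeric optimization, assuming simulated normal data and using scipy.optimize.minimize on the negative log-likelihood (parameterizing via log(sigma^2) is just one way to keep the variance positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=250)   # simulated data, illustrative

def neg_loglik(params):
    """Negative normal log-likelihood; minimizing it maximizes the likelihood."""
    mu, log_s2 = params                        # optimize log(sigma^2) so sigma^2 stays > 0
    s2 = np.exp(log_s2)
    return 0.5 * x.size * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / (2 * s2)

result = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, s2_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, s2_hat)                          # close to the analytic MLEs x.mean(), x.var()
```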
How does a hill-climbing algorithm work?
- Pick a starting value
- Evaluate the gradient: If sufficiently close to 0, then stop; otherwise proceed to step 3
- Update the starting value; the direction of the update is driven by the gradient: if positive, increase the parameter value; if negative, decrease it
- Continue until convergence
Avoiding overshooting: the closer we get to the maximum, the smaller the update step.
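A minimal gradient-ascent sketch for a single parameter, assuming toy Poisson count data (names and step size are illustrative, not a production implementation):

```python
import numpy as np

x = np.array([3, 1, 4, 2, 5, 3, 2, 4])        # toy count data
n, sum_x = x.size, x.sum()

def gradient(lam):
    """Score of the Poisson log-likelihood: d/dlam [sum_x*log(lam) - n*lam]."""
    return sum_x / lam - n

lam = 1.0                                     # step 1: pick a starting value
step, tol = 0.01, 1e-8
for _ in range(10_000):
    g = gradient(lam)                         # step 2: evaluate the gradient
    if abs(g) < tol:                          # sufficiently close to 0 -> converged
        break
    lam += step * g                           # step 3: move in the direction of the gradient;
                                              # the update shrinks as g -> 0, which helps
                                              # avoid overshooting near the maximum
print(lam, sum_x / n)                         # converges to the sample mean
```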
What does convergence mean?
The algorithm has converged when there is no further change in the estimate, or when that change is extremely small (below the tolerance).
How can we avoid convergence problems?
- avoid small samples
- avoid large ratios of variances between variables (vastly different scales)
- choose the right model
- change algorithms
- ensure that the data have been cleaned properly
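One concrete way to avoid vastly different scales is to standardize the variables before estimation; a small sketch with a hypothetical design matrix X (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical predictors on very different scales (e.g., age in years vs. income in dollars).
X = np.column_stack([
    rng.normal(40, 10, size=100),
    rng.normal(50_000, 15_000, size=100),
])

# Standardize each column to mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X.std(axis=0), X_std.std(axis=0))   # the ratio of variances shrinks from enormous to 1
```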