Maximum Likelihood Estimation Flashcards

1
Q

Which steps does the optimization process follow?

A
  1. Take the first derivative of the log-likelihood with respect to the parameter(s) theta
  2. First-order condition: Set the derivative equal to 0 and solve for theta (at the maximum, the slope of the tangent line is 0)
  3. Second-order condition: Ensure that the second derivative evaluated at the estimator is negative (see the worked example below).
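
For illustration, a minimal worked example using the Bernoulli log-likelihood, assuming an i.i.d. sample x_1, ..., x_n with success probability p:

```latex
% Log-likelihood
\ell(p) = \left(\sum_i x_i\right)\log p + \left(n - \sum_i x_i\right)\log(1-p)

% Steps 1-2: first derivative, set to 0, solve
\frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0
\quad\Rightarrow\quad \hat{p} = \bar{x}

% Step 3: the second derivative is negative, so \hat{p} is a maximum
\frac{d^2\ell}{dp^2} = -\frac{\sum_i x_i}{p^2} - \frac{n - \sum_i x_i}{(1-p)^2} < 0
```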
2
Q

What is the kernel of the Log-Likelihood?

A

The kernel is what remains of the log-likelihood after stripping out the terms that do not involve the parameters of interest (see the example below).
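
As an illustration, assuming the standard normal-model log-likelihood for an i.i.d. sample x_1, ..., x_n:

```latex
\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
  - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2

% Dropping the constant -\frac{n}{2}\log(2\pi), which involves no parameters:
\text{kernel} = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2
```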

3
Q

What is the gradient?

A

It consists of the first (partial) derivative(s) of the log-likelihood with respect to the parameters.

At the maximum, it must equal 0.

(= the slope of the log-likelihood)

4
Q

What is the hessian?

A

It consists of the second (partial) derivative(s) of the log-likelihood with respect to the parameters.
It captures how precise our estimates are.

At the maximum, it must be < 0 (negative definite in the multi-parameter case).

(= curvature; the Hessian is larger in magnitude where the curvature is steep, because the first derivatives change rapidly there)

5
Q

What happens to the maximum with a small n?

A

The smaller the n, the flatter (wider) the curvature and the more difficult it is to pinpoint the maximum. A larger n -> greater precision.

The Hessian is much smaller in magnitude with a smaller n (see the sketch below).
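
A minimal sketch, assuming NumPy is available, showing how the curvature of the normal log-likelihood in the mean grows with n (the simulated data and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (10, 1000):
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    sigma2_hat = x.var()                # ML estimate of the variance
    curvature = n / sigma2_hat          # -d^2 logL / d mu^2, evaluated at the MLE
    se_mu = np.sqrt(1 / curvature)      # implied standard error of the mean
    print(f"n={n}: curvature={curvature:.1f}, se(mu_hat)={se_mu:.3f}")
```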

6
Q

What happens to the gradient and the hessian in a model with multiple parameters?

A

Instead of a single derivative, the gradient now consists of partial derivatives (as many as we have parameters), collected in a vector.

The Hessian now becomes a matrix of second partial derivatives (e.g., with respect to mu and sigma^2), as shown below.
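
For the normal model with parameters mu and sigma^2 (a standard illustration):

```latex
\nabla \ell(\mu, \sigma^2) =
\begin{pmatrix}
\dfrac{\partial \ell}{\partial \mu} \\[1ex]
\dfrac{\partial \ell}{\partial \sigma^2}
\end{pmatrix},
\qquad
H =
\begin{pmatrix}
\dfrac{\partial^2 \ell}{\partial \mu^2} & \dfrac{\partial^2 \ell}{\partial \mu\,\partial \sigma^2} \\[1ex]
\dfrac{\partial^2 \ell}{\partial \sigma^2\,\partial \mu} & \dfrac{\partial^2 \ell}{\partial (\sigma^2)^2}
\end{pmatrix}
```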

7
Q

What is the covariance matrix?

A

The inverse of the negative Hessian matrix.

-> how interconnected are mu_hat and sigma^2_hat? If there is a high correlation between the estimates, the data do not carry enough independent information to shed light on each parameter separately (see the sketch below).
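
A minimal sketch, assuming NumPy is available, for the normal model: the covariance matrix of (mu_hat, sigma^2_hat) is obtained by inverting the negative Hessian evaluated at the MLEs (the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data
n = x.size

# MLEs for the normal model
mu_hat = x.mean()
sigma2_hat = x.var()          # ML variance (divides by n)

# Analytic Hessian of the log-likelihood, evaluated at the MLEs
H = np.array([
    [-n / sigma2_hat,                    0.0],
    [0.0,             -n / (2 * sigma2_hat**2)],
])

cov = np.linalg.inv(-H)       # covariance matrix = inverse of the negative Hessian
se = np.sqrt(np.diag(cov))    # standard errors of mu_hat and sigma2_hat
print(mu_hat, sigma2_hat, se)
```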

8
Q

What does the invariance principle say?

A

Every function of an MLE is itself an MLE (of the corresponding function of the parameter).
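
For example (a standard illustration): if sigma^2_hat is the MLE of the variance, then by invariance

```latex
\hat{\sigma} = \sqrt{\hat{\sigma}^2} \ \text{is the MLE of } \sigma,
\qquad\text{and in general}\qquad
\widehat{g(\theta)} = g(\hat{\theta}).
```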

9
Q

What are the regularity conditions?

A
  1. The observations are i.i.d.
  2. The likelihood function is continuous in the parameter(s)
  3. Identification: The PDF/PMF yields different values for different parameter values
  4. The true parameter value lies within the parameter space
  5. The support of the PDF/PMF does not depend on a parameter
  6. Incidental parameters: The number of nuisance parameters does not increase with n.
10
Q

What are the asymptotic properties of MLE?

A

As the sample size goes to infinity, MLE displays:

  1. Consistency (if the model is correct, the estimates converge to the true parameter values)
  2. Normality (the sampling distribution becomes approximately normal, which is the basis for quantifying uncertainty; see below)
  3. Efficiency (no other estimator uses the data as efficiently as the MLE)
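
The normality property in its standard compact form, where I(theta_0) denotes the Fisher information at the true parameter value:

```latex
\sqrt{n}\,\bigl(\hat{\theta} - \theta_0\bigr)
\;\xrightarrow{\;d\;}\;
\mathcal{N}\!\left(0,\; I(\theta_0)^{-1}\right)
```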
11
Q

What is a limitation of MLE?

A

In small samples, MLEs may prove to be biased and have complex sampling distributions.

For samples under n = 100, you should make a bias correction.
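
A classic example of such small-sample bias (a standard result, given here for illustration): the ML estimator of the normal variance is biased downward,

```latex
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_i (x_i - \bar{x})^2,
\qquad
E\!\left[\hat{\sigma}^2_{ML}\right] = \frac{n-1}{n}\,\sigma^2,
```

so multiplying by n/(n-1) gives the usual bias-corrected estimator.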

12
Q

What is a numeric optimization?

A

The use of numerical methods to find the minima and/or maxima of functions. It is used when analytic solutions to optimization problems are impractical (see the sketch below).
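
A minimal sketch, assuming NumPy and SciPy are available: the log-likelihood of a normal model is maximized numerically by minimizing its negative with scipy.optimize.minimize (the data, starting values, and log-sigma parameterization are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # simulated data

def neg_loglik(params):
    mu, log_sigma = params                     # log(sigma) keeps sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat, result.success)
```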

13
Q

How does a hill-climbing algorithm work?

A
  1. Pick a starting value
  2. Evaluate the gradient: If sufficiently close to 0, then stop; otherwise proceed to step 3
  3. Update the parameter value; the direction of the update is driven by the gradient: if positive, then increase the parameter value; if negative, then decrease it
  4. Continue until convergence

Avoiding overshooting: The closer we get to the maximum, the smaller the update (see the sketch below).
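
A minimal sketch of these steps, assuming NumPy is available: gradient-driven hill climbing for the mean of a normal log-likelihood with known variance. Because the update is proportional to the gradient, the steps shrink automatically as the maximum is approached (the data, step size, and tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=100)

mu = 0.0            # step 1: starting value
step = 0.005        # step size (controls overshooting)
tol = 1e-8          # convergence tolerance

for _ in range(10_000):
    gradient = np.sum(x - mu)      # d/dmu of the log-likelihood (sigma^2 = 1)
    if abs(gradient) < tol:        # step 2: stop if the gradient is near 0
        break
    mu += step * gradient          # step 3: move in the direction of the gradient

print(mu, x.mean())                # numeric and analytic answers should agree
```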

14
Q

What does convergence mean?

A

The algorithm has converged when there is no further change in the estimate, or when that change is extremely small (below the tolerance).

15
Q

How can we avoid convergence problems?

A
  • avoid small samples
  • avoid large ratios of variances between variables (vastly different scales)
  • choose the right model
  • change algorithms
  • ensure that the data have been cleaned properly
16
Q

What is the density function for a univariate normal distribution?

A

The relative probability of obtaining a score value from a normally distributed population with a particular mean and variance.
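
In standard notation, for a score x from a population with mean mu and variance sigma^2:

```latex
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```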

17
Q

What is the Mahalanobis distance?

A

The standardized distance between a score and the mean.
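
In standard notation, for a single score from a univariate normal population, and more generally for a score vector with mean vector mu and covariance matrix Sigma:

```latex
D^2 = \frac{(x - \mu)^2}{\sigma^2},
\qquad
D^2 = (\mathbf{x} - \boldsymbol{\mu})^{\top}\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{x} - \boldsymbol{\mu})
```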

18
Q

What is the goal of maximum likelihood estimation?

A

To identify the population parameter values that have the highest probability of producing a particular sample of data.
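
Compactly, in standard notation:

```latex
\hat{\theta}_{ML} = \arg\max_{\theta}\; \ell(\theta \mid x_1, \ldots, x_n)
```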

19
Q

What does the curvature of the log-likelihood function say?

A

It provides information about the uncertainty of an estimate. A flat function makes it difficult to find the exact maximum, and the estimate is less certain.