Maximum Likelihood Estimation Flashcards
Which steps does the optimization process follow?
- Take the first derivative of the log-likelihood with respect to the parameter(s) theta
- First-order condition: set the derivative equal to 0 and solve for theta (at the maximum, the slope of the tangent line is 0)
- Second-order condition: ensure that the second derivative at the estimator is negative.
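A minimal symbolic sketch of the three steps, assuming an i.i.d. Poisson sample (the names loglik, lam, and sum_x are illustrative, not from the cards):

```python
import sympy as sp

# Symbols: lam = Poisson rate, n = sample size, sum_x = sum of the observations.
lam, n, sum_x = sp.symbols("lam n sum_x", positive=True)

# Poisson log-likelihood, up to an additive constant that does not involve lam.
loglik = sum_x * sp.log(lam) - n * lam

# Step 1: first derivative with respect to the parameter.
score = sp.diff(loglik, lam)

# Step 2: first-order condition -- set the derivative to 0 and solve for lam.
lam_hat = sp.solve(sp.Eq(score, 0), lam)[0]        # sum_x / n, the sample mean

# Step 3: second-order condition -- second derivative at the estimator must be negative.
soc = sp.diff(loglik, lam, 2).subs(lam, lam_hat)   # -n**2 / sum_x < 0
print(lam_hat, sp.simplify(soc))
```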
What is the kernel of the Log-Likelihood?
It is the log-likelihood stripped of the parts that do not involve the parameters of interest; dropping those parts does not change where the maximum lies.
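For example, for an i.i.d. normal sample the log-likelihood splits into a constant and the kernel (standard result, shown for illustration):

```latex
\ell(\mu,\sigma^2)
= \underbrace{-\tfrac{n}{2}\ln(2\pi)}_{\text{constant: dropped}}
  \underbrace{-\tfrac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}_{\text{kernel}}
```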
What is the gradient?
It consists of the first (partial) derivative(s) of the log-likelihood with respect to the parameters.
At the maximum it must equal 0.
(=slope)
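For instance, the gradient (score) of the normal log-likelihood with respect to mu is a standard result; setting it to 0 yields the sample mean:

```latex
\frac{\partial \ell}{\partial \mu}
= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \bar{x}
```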
What is the Hessian?
It consists of the second (partial) derivative(s) of the log-likelihood with respect to the parameters.
It captures how precise our estimates are.
At a maximum it must be negative (negative definite in the multi-parameter case).
(= curvature; the Hessian is larger in magnitude where the curvature is steep, because the first derivatives change rapidly there)
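A small numerical sketch, assuming simulated normal data: the curvature of the log-likelihood in mu is approximated by finite differences and comes out negative at the maximum (function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data, purely illustrative

def loglik_mu(mu, s2=x.var()):
    """Normal log-likelihood as a function of mu, with sigma^2 held fixed."""
    return -0.5 * x.size * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

mu_hat, h = x.mean(), 1e-4
# Central finite-difference approximation of the second derivative (the curvature).
curvature = (loglik_mu(mu_hat + h) - 2 * loglik_mu(mu_hat) + loglik_mu(mu_hat - h)) / h**2
print(curvature)   # approximately -n / sigma^2_hat, i.e. negative at the maximum
```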
What happens to the maximum with a small n?
The smaller the n, the flatter (wider) the curvature and the more difficult it is to pinpoint the maximum. Larger n -> greater precision.
The Hessian is much smaller in magnitude with a smaller n.
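This is easy to see for the normal mean, where the curvature is exactly proportional to n (standard result):

```latex
\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}
```

Doubling n doubles the magnitude of the Hessian, so the peak of the log-likelihood becomes correspondingly sharper.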
What happens to the gradient and the hessian in a model with multiple parameters?
Instead of a single derivative, the gradient now consists of partial derivatives, one for each parameter, collected into a gradient vector.
The Hessian becomes a matrix of second partial derivatives (e.g., with respect to mu and sigma^2, the variance).
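For the normal model with parameters mu and sigma^2, these objects take the following standard form (reproduced for illustration):

```latex
\nabla \ell(\mu,\sigma^2) =
\begin{pmatrix}
\frac{1}{\sigma^2}\sum_i (x_i-\mu) \\[4pt]
-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_i (x_i-\mu)^2
\end{pmatrix},
\qquad
H =
\begin{pmatrix}
\frac{\partial^2 \ell}{\partial \mu^2} & \frac{\partial^2 \ell}{\partial \mu\,\partial \sigma^2} \\[4pt]
\frac{\partial^2 \ell}{\partial \sigma^2\,\partial \mu} & \frac{\partial^2 \ell}{\partial (\sigma^2)^2}
\end{pmatrix}
```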
What is the covariance matrix?
The inverse of the negative Hessian matrix (the observed information matrix).
-> how interconnected are mu and sigma^2? If the estimates are highly correlated, the data provide little separate information about each parameter.
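A minimal numerical sketch, assuming simulated normal data: invert the negative Hessian at the MLE to obtain the covariance matrix, standard errors, and the correlation between the estimates (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=300)   # simulated data, illustrative
n = x.size
mu_hat, s2_hat = x.mean(), x.var()             # normal MLEs

# Hessian of the normal log-likelihood evaluated at the MLE (standard analytic form;
# the cross-derivative is 0 at the MLE).
H = np.array([
    [-n / s2_hat, 0.0],
    [0.0, -n / (2 * s2_hat**2)],
])

# Covariance matrix of the estimates = inverse of the negative Hessian.
cov = np.linalg.inv(-H)
std_errors = np.sqrt(np.diag(cov))
correlation = cov[0, 1] / (std_errors[0] * std_errors[1])
print(cov, std_errors, correlation)
```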
What does the invariance principle say?
A function of an MLE is itself the MLE of that function of the parameter.
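For example (standard illustration): the MLE of sigma follows directly from the MLE of sigma^2,

```latex
\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\quad\Longrightarrow\quad
\hat{\sigma}_{ML} = \sqrt{\hat{\sigma}^2_{ML}}
```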
What are the regularity conditions?
- The observations are i.i.d.
- The likelihood function is continuous in the parameter(s)
- Identification: different parameter values yield different PDFs/PMFs
- The true parameter value lies within the parameter space
- The support of the PDF/PMF does not depend on a parameter
- Incidental parameters: the number of nuisance parameters does not increase with n.
What are the asymptotic properties of MLE?
As the sample size goes to infinity, the MLE displays:
- Consistency (if the model is correct, the estimates converge to the true parameter values)
- Normality (the sampling distribution becomes approximately normal, on the basis of which we can simulate uncertainty)
- Efficiency (no other estimator uses the data as efficiently as the MLE)
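The normality property is usually written as follows, where I(theta_0) is the per-observation Fisher information (standard result, added for reference):

```latex
\sqrt{n}\,\bigl(\hat{\theta}_{ML} - \theta_0\bigr) \;\xrightarrow{d}\; N\!\bigl(0,\; I(\theta_0)^{-1}\bigr)
```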
What is a limitation of MLE?
In small samples, MLEs may prove to be biased and have complex sampling distributions.
With n under 100 you should make a bias correction.
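A classic example of such small-sample bias is the ML estimator of the normal variance, which is too small on average and is corrected by the factor n/(n-1):

```latex
E\!\left[\hat{\sigma}^2_{ML}\right] = \frac{n-1}{n}\,\sigma^2
\quad\Longrightarrow\quad
\hat{\sigma}^2_{corrected} = \frac{n}{n-1}\,\hat{\sigma}^2_{ML}
```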
What is a numeric optimization?
The use of numerical methods to find the minima and/or maxima of functions. It is used when analytic solutions to optimization problems are impractical.
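A minimal sketch of numeric optimization, assuming simulated normal data and using scipy.optimize.minimize on the negative log-likelihood (parameterizing via log(sigma^2) is just one way to keep the variance positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=250)   # simulated data, illustrative

def neg_loglik(params):
    """Negative normal log-likelihood; minimizing it maximizes the likelihood."""
    mu, log_s2 = params                        # optimize log(sigma^2) so sigma^2 stays > 0
    s2 = np.exp(log_s2)
    return 0.5 * x.size * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / (2 * s2)

result = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, s2_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, s2_hat)                          # close to the analytic MLEs x.mean(), x.var()
```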
How does a hill-climbing algorithm work?
- Pick a starting value
- Evaluate the gradient: If sufficiently close to 0, then stop; otherwise proceed to step 3
- Update the starting value; the direction of the update is driven by the gradient: if positive, increase the parameter value; if negative, decrease it
- Continue until convergence
Avoiding overshooting: the closer we get to the maximum, the smaller the update step.
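A minimal gradient-ascent sketch for a single parameter, assuming toy Poisson count data (names and step size are illustrative, not a production implementation):

```python
import numpy as np

x = np.array([3, 1, 4, 2, 5, 3, 2, 4])        # toy count data
n, sum_x = x.size, x.sum()

def gradient(lam):
    """Score of the Poisson log-likelihood: d/dlam [sum_x*log(lam) - n*lam]."""
    return sum_x / lam - n

lam = 1.0                                     # step 1: pick a starting value
step, tol = 0.01, 1e-8
for _ in range(10_000):
    g = gradient(lam)                         # step 2: evaluate the gradient
    if abs(g) < tol:                          # sufficiently close to 0 -> converged
        break
    lam += step * g                           # step 3: move in the direction of the gradient;
                                              # the update shrinks as g -> 0, which helps
                                              # avoid overshooting near the maximum
print(lam, sum_x / n)                         # converges to the sample mean
```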
What does convergence mean?
The algorithm has converged when there is no further change in the estimate, or when that change is extremely small (below the tolerance).
How can we avoid convergence problems?
- avoid small samples
- avoid large ratios of variances between variables (vastly different scales)
- choose the right model
- change algorithms
- ensure that the data have been cleaned properly
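One concrete way to avoid vastly different scales is to standardize the variables before estimation; a small sketch with a hypothetical design matrix X (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical predictors on very different scales (e.g., age in years vs. income in dollars).
X = np.column_stack([
    rng.normal(40, 10, size=100),
    rng.normal(50_000, 15_000, size=100),
])

# Standardize each column to mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X.std(axis=0), X_std.std(axis=0))   # the ratio of variances shrinks from enormous to 1
```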