Week 6 Flashcards

1
Q

What are the steps to find the ML-estimator?

A

Write down the likelihood function (the joint PDF of the sample), take its natural log (optional, but it usually simplifies the algebra), differentiate with respect to the parameter, and equate the derivative to 0.
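A minimal sympy sketch of these steps, using a hypothetical i.i.d. Exponential(lambda) sample (the model choice and the symbol `ybar` for the sample mean are illustrative assumptions, not from the card):

```python
import sympy as sp

lam, n = sp.symbols("lambda n", positive=True)
ybar = sp.Symbol("ybar", positive=True)  # hypothetical sample mean, treated as given

# Log-likelihood of n iid Exponential(lambda) observations with sample mean ybar
loglik = n * sp.log(lam) - lam * n * ybar

score = sp.diff(loglik, lam)          # step 1: differentiate the log-likelihood
mle = sp.solve(sp.Eq(score, 0), lam)  # step 2: equate to zero and solve
print(mle)  # [1/ybar], i.e. lambda hat = 1 / sample mean
```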

2
Q

What do we optimize in Least Squares, Best Linear Unbiased (BLUE), and Maximum Likelihood estimation?

A

Least Squares: (minimize) the sum of squared deviations

BLUE: (minimize) the variance, subject to the linearity and unbiasedness constraints

MLE: (maximize) the likelihood function

3
Q

What is the difference between the likelihood function and the CDF?

A

The interpretation differs: in the likelihood function the data are held fixed and you vary the parameter to find the value under which the observed data are most likely, whereas with the CDF the parameters are held fixed and you compute the probability of an outcome.
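A small scipy-based sketch of this distinction (the observed point `y = 1.3` and the candidate means are made-up values for illustration):

```python
import numpy as np
from scipy import stats

y = 1.3  # one observed data point, held fixed

# Likelihood: a function of the parameter mu, evaluated at the fixed data
for mu in (0.0, 1.0, 1.3):
    print(f"L(mu={mu}) = {stats.norm.pdf(y, loc=mu, scale=1.0):.4f}")  # largest at mu = y

# CDF: the probability P(Y <= y) for fixed parameters (mu = 0, sigma = 1)
print("P(Y <= 1.3) =", stats.norm.cdf(y, loc=0.0, scale=1.0))
```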

4
Q

What are the ML-estimators for a linear regression with Normally distributed errors?

A

$$\hat\beta = (X'X)^{-1}X'y,$$

$$\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/n.$$
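A minimal numpy sketch of these closed-form estimators on simulated data (the design matrix, true coefficients, and seed are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n                # ML divides by n, not n - k
print(beta_hat, sigma2_hat)
```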

5
Q

What is invariance, and what does it have to do with the ML-estimator?

A

If $\hat\theta$ is the ML-estimator of $\theta$, then $g(\hat\theta)$ is the ML-estimator of $g(\theta)$. For example, since $\hat\sigma^2$ is the ML-estimator of $\sigma^2$, $\sqrt{\hat\sigma^2}$ is the ML-estimator of $\sigma$.
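A numeric check of invariance under an assumed Normal(0, sigma^2) sample, using scipy's bounded scalar minimizer: maximizing the likelihood over sigma^2 and over sigma directly should give maximizers related by a square root.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=2.0, size=500)  # hypothetical data, mean known to be 0

def negloglik(var):
    """Negative Normal(0, var) log-likelihood of the sample."""
    return 0.5 * len(y) * np.log(2 * np.pi * var) + (y @ y) / (2 * var)

var_hat = minimize_scalar(negloglik, bounds=(1e-6, 50.0), method="bounded").x
sd_hat = minimize_scalar(lambda s: negloglik(s ** 2),
                         bounds=(1e-3, 10.0), method="bounded").x
print(np.sqrt(var_hat), sd_hat)  # agree up to optimizer tolerance
```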

6
Q

How do you calculate the information matrix?

A

$F = -E(H)$, where $H$ is the matrix of second derivatives (the Hessian) of the log-likelihood function.
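A sympy sketch for the simplest case, the mean of a Normal(mu, sigma^2) sample with sigma known (a hypothetical example; here the Hessian is constant, so taking the expectation is trivial):

```python
import sympy as sp

mu, y = sp.symbols("mu y", real=True)
sigma, n = sp.symbols("sigma n", positive=True)

# Log-density of a single Normal(mu, sigma^2) observation
logf = -sp.log(sigma) - sp.log(2 * sp.pi) / 2 - (y - mu) ** 2 / (2 * sigma ** 2)

H1 = sp.diff(logf, mu, 2)  # per-observation second derivative: -1/sigma**2, free of y
F = -n * H1                # H is constant, so F = -E(H) = n / sigma**2
print(F)
```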

7
Q

What is the first-order regularity condition?

A

$E(q) = 0$, where $q = \partial \ln L/\partial\theta$ is the score (the gradient of the log-likelihood).

8
Q

What is the second-order regularity condition?

A

$\mathrm{var}(q) = -E(H) = F$
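A Monte Carlo sketch of both regularity conditions for the mean of a Normal(mu, 1) sample (sample size, seed, and true mean are arbitrary choices): the score should average to about 0, and its variance should be close to $F = n$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, n, reps = 0.5, 50, 20000

scores = np.empty(reps)
for rep in range(reps):
    y = rng.normal(loc=mu_true, size=n)
    scores[rep] = np.sum(y - mu_true)  # score q = d ln L / d mu, at the true mu

print("E(q)   ~", scores.mean())  # close to 0      (first-order condition)
print("var(q) ~", scores.var())   # close to n = 50 = F  (second-order condition)
```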

9
Q

How are the first- and second-order regularity conditions proven?

A

Let $f$ be the PDF, so that

$$\int f \, dy = 1.$$

Differentiating under the integral sign gives

$$\int \frac{\partial f}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \int f \, dy = \frac{\partial 1}{\partial \theta} = 0,$$

and by the same trick

$$\int \frac{\partial^2 f}{\partial \theta^2}\, dy = 0.$$

Now consider

$$\frac{\partial \ln f}{\partial \theta} = \frac{1}{f}\frac{\partial f}{\partial \theta}$$

and

$$\frac{\partial^2 \ln f}{\partial \theta^2} = \frac{\partial}{\partial \theta}\left(\frac{1}{f}\frac{\partial f}{\partial \theta}\right) = \frac{\partial f^{-1}}{\partial \theta}\frac{\partial f}{\partial \theta} + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2} = -\frac{1}{f^2}\left(\frac{\partial f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2} = -\left(\frac{\partial \ln f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2}.$$

Taking expectations (integrating against $f$) gives

$$E\left(\frac{\partial \ln f}{\partial \theta}\right) = \int \frac{\partial \ln f}{\partial \theta}\, f \, dy = \int \frac{\partial f}{\partial \theta}\, dy = 0,$$

$$E\left(\frac{\partial^2 \ln f}{\partial \theta^2}\right) = -E\left(\frac{\partial \ln f}{\partial \theta}\right)^2 + \int \frac{\partial^2 f}{\partial \theta^2}\, dy = -E\left(\frac{\partial \ln f}{\partial \theta}\right)^2,$$

i.e. $E(q) = 0$ and $\mathrm{var}(q) = E(q^2) = -E(H) = F$.

10
Q

What does the Cramer-Rao inequality say?

A

It says that the variance of an unbiased estimator has a lower bound given by the inverse of $\mathrm{var}(q)$, or equivalently by $F^{-1}$. If an unbiased estimator attains the Cramer-Rao bound, then it has the lowest possible variance and is called efficient.
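A quick simulation illustrating the bound (the setup values are arbitrary): for Normal(mu, sigma^2) data the bound for estimating mu is $\sigma^2/n = 1/F$, and the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 40, 20000

# Distribution of the sample mean over many replications
means = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1)
print(means.var(), sigma ** 2 / n)  # both around 0.1: the bound is attained
```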

11
Q

Is the Least Squares beta estimator efficient?

A

Yes: by the Cramer-Rao inequality it is not only the best linear unbiased estimator, but also the best unbiased estimator overall.

12
Q

Is the Least Squares sigma estimator efficient?

A

Although it doesn't attain the Cramer-Rao lower bound, it is efficient in the sense that no other unbiased estimator of $\sigma^2$ has a smaller variance.

13
Q

What is the proof of the Cramer-Rao lower bound (for a single estimator)?

A

We know that $E(q) = 0$ and $\mathrm{var}(q) = F$; the ML-estimators satisfy $\mathrm{var}(\hat\beta, \hat\sigma^2) \approx [\mathrm{var}(q)]^{-1} = F^{-1}$.

Suppose we have an unbiased estimator $t$ of a parameter $\theta$, so that

$$E(t) = \int t f \, dy = \theta,$$

which implies $\partial E(t)/\partial \theta = 1$. Letting $q = \partial \ln f/\partial \theta$, this gives

$$1 = \frac{\partial}{\partial \theta}\int t f \, dy = \int \frac{\partial (tf)}{\partial \theta}\, dy = \int \frac{\partial t}{\partial \theta}\, f \, dy + \int t \frac{\partial f}{\partial \theta}\, dy = \int t \frac{\partial f}{\partial \theta}\, dy = \int t q f \, dy = E(tq) = \mathrm{cov}(t, q),$$

where we used the fact that $\partial t/\partial \theta = 0$ ($t$ is a function of the data only) and $E(q) = 0$.

Now consider $z = t - \theta - \alpha q$, where $\alpha$ is a constant to be chosen later. Since $t$ is unbiased and $E(q) = 0$, we have $E(z) = 0$. Choosing $\alpha = 1/\mathrm{var}(q)$:

$$0 \le \mathrm{var}(z) = \mathrm{var}(t) - 2\alpha\,\mathrm{cov}(t, q) + \alpha^2\,\mathrm{var}(q) = \mathrm{var}(t) - \frac{2}{\mathrm{var}(q)} + \frac{1}{\mathrm{var}(q)} = \mathrm{var}(t) - \frac{1}{\mathrm{var}(q)},$$

thus $\mathrm{var}(t) \ge 1/\mathrm{var}(q)$.

14
Q

How are the ML-estimators found when linear restrictions $R\beta = r$ are imposed?

A

First we define the Lagrangian: $\psi(\beta, \sigma^2) = \ln L(\beta, \sigma^2) - \lambda'(R\beta - r)$, where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$ is a vector of $m$ Lagrange multipliers. Differentiating and equating to 0 gives the following system:

$$(X'y - X'X\tilde\beta)'/\tilde\sigma^2 = \tilde\lambda' R,$$

$$-\frac{n}{2\tilde\sigma^2} + \frac{(y - X\tilde\beta)'(y - X\tilde\beta)}{2\tilde\sigma^4} = 0,$$

$$R\tilde\beta = r.$$

From the first equation we obtain

$$\tilde\beta = \hat\beta - \tilde\sigma^2 (X'X)^{-1} R' \tilde\lambda, \quad\text{thus}\quad r = R\tilde\beta = R\hat\beta - \tilde\sigma^2 R(X'X)^{-1}R'\tilde\lambda.$$

Solving the latter for $\tilde\sigma^2 \tilde\lambda$ and substituting back yields

$$\tilde\beta = \hat\beta - (X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r), \qquad \tilde\sigma^2 = (y - X\tilde\beta)'(y - X\tilde\beta)/n.$$
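A numpy sketch of these formulas under an assumed restriction that the second coefficient equals zero (the data, design, and restriction are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

R = np.array([[0.0, 1.0, 0.0]])  # restricts the second coefficient...
r = np.array([0.0])              # ...to equal zero

A = R @ XtX_inv @ R.T
beta_tilde = beta_hat - XtX_inv @ R.T @ np.linalg.solve(A, R @ beta_hat - r)
u_tilde = y - X @ beta_tilde
print(beta_tilde, u_tilde @ u_tilde / n)  # restricted beta tilde and sigma^2 tilde
```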

15
Q

What is a much easier way of finding the restricted ML-estimators using vectors and matrices?

A

Find the inverse of

$$G = \begin{pmatrix} X'X & R' \\ R & 0 \end{pmatrix},$$

which is

$$G^{-1} = \begin{pmatrix} (X'X)^{-1} - (X'X)^{-1}R'A^{-1}R(X'X)^{-1} & (X'X)^{-1}R'A^{-1} \\ A^{-1}R(X'X)^{-1} & -A^{-1} \end{pmatrix},$$

where $A = R(X'X)^{-1}R'$.
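A numeric sanity check of this block inverse (same hypothetical design and restriction as the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
R = np.array([[0.0, 1.0, 0.0]])  # one restriction, m = 1

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
A_inv = np.linalg.inv(R @ XtX_inv @ R.T)

G = np.block([[XtX, R.T],
              [R, np.zeros((1, 1))]])
G_inv = np.block([[XtX_inv - XtX_inv @ R.T @ A_inv @ R @ XtX_inv, XtX_inv @ R.T @ A_inv],
                  [A_inv @ R @ XtX_inv, -A_inv]])
print(np.allclose(G @ G_inv, np.eye(4)))  # True: the stated inverse is correct
```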

16
Q

What is the Wald test (for known $\sigma^2$)?

A

We have $R\hat\beta \sim N\left(R\beta,\ \sigma^2 R(X'X)^{-1}R'\right)$, so under the null hypothesis $R\hat\beta - r \sim N\left(0,\ \sigma^2 R(X'X)^{-1}R'\right)$. This gives the following test statistic:

$$W = (R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/\sigma^2 \sim \chi^2(m).$$
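A self-contained numpy sketch of the statistic, assuming $\sigma^2$ known and the same hypothetical restriction (second coefficient equals zero) as before:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)  # null is true here

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

d = R @ beta_hat - r
W = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / sigma2
print(W, stats.chi2.sf(W, df=1))  # statistic and p-value, m = 1 restriction
```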

17
Q

What is the Lagrange multiplier test, and how is it derived?

A

The LM test is based on the fact that the gradient vanishes at the maximum, so if the null hypothesis is true, $q(\tilde\beta)$ should be close to zero. We have $q(\tilde\beta) = X'\tilde u/\sigma^2$.

We can rewrite

$$X\tilde\beta = X\hat\beta - X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r),$$

which implies

$$\tilde u = \hat u + X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r).$$

Since $X'\hat u = 0$, it follows that

$$q(\tilde\beta) = X'\tilde u/\sigma^2 = R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/\sigma^2 = R'\tilde\lambda.$$

Because $\tilde\lambda$ is normally distributed, we have

$$LM = \sigma^2\,\tilde\lambda' R(X'X)^{-1}R'\tilde\lambda \sim \chi^2(m).$$
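The same hypothetical setup as the Wald sketch, now computing LM through the multipliers $\tilde\lambda$; with $\sigma^2$ known it coincides numerically with W:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
A = R @ XtX_inv @ R.T
l_tilde = np.linalg.solve(A, R @ beta_hat - r) / sigma2  # the multipliers

LM = sigma2 * l_tilde @ A @ l_tilde
print(LM)  # equals the Wald statistic from the previous sketch
```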

18
Q

What is the Likelihood-ratio test, and how is it derived?

A

The Likelihood-ratio test is based on the ratio of the maximized likelihood with the restriction to the maximized likelihood without the restriction, which should be close to one if the null hypothesis is true. In practice it checks the difference between the log-likelihoods:

$$LR = -2\left(\ln L(\tilde\beta) - \ln L(\hat\beta)\right) = (\tilde u'\tilde u - \hat u'\hat u)/\sigma^2 \sim \chi^2(m).$$
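A sketch of LR via the restricted and unrestricted residual sums of squares (same hypothetical setup, $\sigma^2$ known):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
beta_tilde = beta_hat - XtX_inv @ R.T @ np.linalg.solve(
    R @ XtX_inv @ R.T, R @ beta_hat - r)

u_hat, u_tilde = y - X @ beta_hat, y - X @ beta_tilde
LR = (u_tilde @ u_tilde - u_hat @ u_hat) / sigma2
print(LR)  # chi-squared(1) under the null; matches W and LM when sigma^2 is known
```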

19
Q

What can be said about the Lagrange-multiplier, Likelihood-ratio, and Wald tests?

A

In a simplified case they are all identical (this holds only when $\sigma^2$ is known).

20
Q

What is the order of the Wald, LM, and LR tests when $\sigma^2$ is unknown, and why?

A

Since $\sigma^2$ is not known we have

$$W = n(\tilde\sigma^2 - \hat\sigma^2)/\hat\sigma^2 = n(\lambda - 1),$$

$$LM = n(\tilde\sigma^2 - \hat\sigma^2)/\tilde\sigma^2 = n(1 - \lambda^{-1}),$$

where $\lambda = \tilde\sigma^2/\hat\sigma^2 \ge 1$, and we also have

$$LR = n \ln \lambda.$$

Since $1 - \lambda^{-1} \le \ln \lambda \le \lambda - 1$ for all $\lambda \ge 1$, it follows that $LM \le LR \le W$.
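A quick numeric check of the ordering as a function of $\lambda$ alone ($n$ and the grid of $\lambda$ values are arbitrary):

```python
import numpy as np

n = 100
for lam in (1.0, 1.1, 1.5, 3.0):
    W, LM, LR = n * (lam - 1), n * (1 - 1 / lam), n * np.log(lam)
    print(f"lambda={lam}: LM={LM:.2f} <= LR={LR:.2f} <= W={W:.2f}")
```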