Week 6 Flashcards

1
Q

What are the steps to find the ML-estimator?

A

Write down the likelihood function (the joint PDF of the sample), take its natural log (optional, but it usually simplifies the algebra), differentiate with respect to the parameter, and equate the derivative to 0.
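A minimal sympy sketch of these steps, using a hypothetical i.i.d. Exponential(lambda) sample (the model choice and the symbol `ybar` for the sample mean are illustrative assumptions, not from the card):

```python
import sympy as sp

lam, n = sp.symbols("lambda n", positive=True)
ybar = sp.Symbol("ybar", positive=True)  # hypothetical sample mean, treated as given

# Log-likelihood of n iid Exponential(lambda) observations with sample mean ybar
loglik = n * sp.log(lam) - lam * n * ybar

score = sp.diff(loglik, lam)          # step 1: differentiate the log-likelihood
mle = sp.solve(sp.Eq(score, 0), lam)  # step 2: equate to zero and solve
print(mle)  # [1/ybar], i.e. lambda hat = 1 / sample mean
```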

2
Q

What do we optimize in Least Squares, Best Linear Unbiased (BLUE), and Maximum Likelihood estimation?

A

Least Squares: (minimize) the sum of squared deviations

BLUE: (minimize) the variance, subject to the linearity and unbiasedness constraints

MLE: (maximize) the likelihood function

3
Q

What is the difference between the likelihood function and the CDF?

A

The interpretation differs: in the likelihood function the data are held fixed and you vary the parameter to find the value under which the observed data are most likely, whereas with the CDF the parameters are held fixed and you compute the probability of an outcome.
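A small scipy-based sketch of this distinction (the observed point `y = 1.3` and the candidate means are made-up values for illustration):

```python
import numpy as np
from scipy import stats

y = 1.3  # one observed data point, held fixed

# Likelihood: a function of the parameter mu, evaluated at the fixed data
for mu in (0.0, 1.0, 1.3):
    print(f"L(mu={mu}) = {stats.norm.pdf(y, loc=mu, scale=1.0):.4f}")  # largest at mu = y

# CDF: the probability P(Y <= y) for fixed parameters (mu = 0, sigma = 1)
print("P(Y <= 1.3) =", stats.norm.cdf(y, loc=0.0, scale=1.0))
```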

4
Q

What are the ML-estimators for a linear regression with Normally distributed errors?

A

$$\hat\beta = (X'X)^{-1}X'y,$$

$$\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/n.$$
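A minimal numpy sketch of these closed-form estimators on simulated data (the design matrix, true coefficients, and seed are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n                # ML divides by n, not n - k
print(beta_hat, sigma2_hat)
```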

5
Q

What is invariance, and what does it have to do with the ML-estimator?

A

If $\hat\theta$ is the ML-estimator of $\theta$, then $g(\hat\theta)$ is the ML-estimator of $g(\theta)$. For example, since $\hat\sigma^2$ is the ML-estimator of $\sigma^2$, $\sqrt{\hat\sigma^2}$ is the ML-estimator of $\sigma$.
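A numeric check of invariance under an assumed Normal(0, sigma^2) sample, using scipy's bounded scalar minimizer: maximizing the likelihood over sigma^2 and over sigma directly should give maximizers related by a square root.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=2.0, size=500)  # hypothetical data, mean known to be 0

def negloglik(var):
    """Negative Normal(0, var) log-likelihood of the sample."""
    return 0.5 * len(y) * np.log(2 * np.pi * var) + (y @ y) / (2 * var)

var_hat = minimize_scalar(negloglik, bounds=(1e-6, 50.0), method="bounded").x
sd_hat = minimize_scalar(lambda s: negloglik(s ** 2),
                         bounds=(1e-3, 10.0), method="bounded").x
print(np.sqrt(var_hat), sd_hat)  # agree up to optimizer tolerance
```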

6
Q

How do you calculate the information matrix?

A

$F = -E(H)$, where $H$ is the matrix of second derivatives (the Hessian) of the log-likelihood function.
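A sympy sketch for the simplest case, the mean of a Normal(mu, sigma^2) sample with sigma known (a hypothetical example; here the Hessian is constant, so taking the expectation is trivial):

```python
import sympy as sp

mu, y = sp.symbols("mu y", real=True)
sigma, n = sp.symbols("sigma n", positive=True)

# Log-density of a single Normal(mu, sigma^2) observation
logf = -sp.log(sigma) - sp.log(2 * sp.pi) / 2 - (y - mu) ** 2 / (2 * sigma ** 2)

H1 = sp.diff(logf, mu, 2)  # per-observation second derivative: -1/sigma**2, free of y
F = -n * H1                # H is constant, so F = -E(H) = n / sigma**2
print(F)
```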

7
Q

What is the first-order regularity condition?

A

$E(q) = 0$, where $q = \partial \ln L/\partial\theta$ is the score (the gradient of the log-likelihood).

8
Q

What is the second-order regularity condition?

A

$\mathrm{var}(q) = -E(H) = F$
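A Monte Carlo sketch of both regularity conditions for the mean of a Normal(mu, 1) sample (sample size, seed, and true mean are arbitrary choices): the score should average to about 0, and its variance should be close to $F = n$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, n, reps = 0.5, 50, 20000

scores = np.empty(reps)
for rep in range(reps):
    y = rng.normal(loc=mu_true, size=n)
    scores[rep] = np.sum(y - mu_true)  # score q = d ln L / d mu, at the true mu

print("E(q)   ~", scores.mean())  # close to 0      (first-order condition)
print("var(q) ~", scores.var())   # close to n = 50 = F  (second-order condition)
```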

9
Q

How are the first- and second-order regularity conditions proven?

A

Let $f$ be the PDF, so that

$$\int f \, dy = 1.$$

Differentiating under the integral sign gives

$$\int \frac{\partial f}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \int f \, dy = \frac{\partial 1}{\partial \theta} = 0,$$

and by the same trick

$$\int \frac{\partial^2 f}{\partial \theta^2}\, dy = 0.$$

Now consider

$$\frac{\partial \ln f}{\partial \theta} = \frac{1}{f}\frac{\partial f}{\partial \theta}$$

and

$$\frac{\partial^2 \ln f}{\partial \theta^2} = \frac{\partial}{\partial \theta}\left(\frac{1}{f}\frac{\partial f}{\partial \theta}\right) = \frac{\partial f^{-1}}{\partial \theta}\frac{\partial f}{\partial \theta} + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2} = -\frac{1}{f^2}\left(\frac{\partial f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2} = -\left(\frac{\partial \ln f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2}.$$

Taking expectations (integrating against $f$) gives

$$E\left(\frac{\partial \ln f}{\partial \theta}\right) = \int \frac{\partial \ln f}{\partial \theta}\, f \, dy = \int \frac{\partial f}{\partial \theta}\, dy = 0,$$

$$E\left(\frac{\partial^2 \ln f}{\partial \theta^2}\right) = -E\left(\frac{\partial \ln f}{\partial \theta}\right)^2 + \int \frac{\partial^2 f}{\partial \theta^2}\, dy = -E\left(\frac{\partial \ln f}{\partial \theta}\right)^2,$$

i.e. $E(q) = 0$ and $\mathrm{var}(q) = E(q^2) = -E(H) = F$.

10
Q

What does the Cramer-Rao inequality say?

A

It says that the variance of an unbiased estimator has a lower bound given by the inverse of $\mathrm{var}(q)$, or equivalently by $F^{-1}$. If an unbiased estimator attains the Cramer-Rao bound, then it has the lowest possible variance and is called efficient.
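A quick simulation illustrating the bound (the setup values are arbitrary): for Normal(mu, sigma^2) data the bound for estimating mu is $\sigma^2/n = 1/F$, and the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 40, 20000

# Distribution of the sample mean over many replications
means = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1)
print(means.var(), sigma ** 2 / n)  # both around 0.1: the bound is attained
```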

11
Q

Is the Least Squares beta estimator efficient?

A

Yes: by the Cramer-Rao inequality it is not only the best linear unbiased estimator, but also the best unbiased estimator overall.

12
Q

Is the Least Squares sigma estimator efficient?

A

Although it doesn't attain the Cramer-Rao lower bound, it is efficient in the sense that no other unbiased estimator of $\sigma^2$ has a smaller variance.

13
Q

What is the proof of the Cramer-Rao lower bound (for a single estimator)?

A

We know that $E(q) = 0$ and $\mathrm{var}(q) = F$; the ML-estimators satisfy $\mathrm{var}(\hat\beta, \hat\sigma^2) \approx [\mathrm{var}(q)]^{-1} = F^{-1}$.

Suppose we have an unbiased estimator $t$ of a parameter $\theta$, so that

$$E(t) = \int t f \, dy = \theta,$$

which implies $\partial E(t)/\partial \theta = 1$. Letting $q = \partial \ln f/\partial \theta$, this gives

$$1 = \frac{\partial}{\partial \theta}\int t f \, dy = \int \frac{\partial (tf)}{\partial \theta}\, dy = \int \frac{\partial t}{\partial \theta}\, f \, dy + \int t \frac{\partial f}{\partial \theta}\, dy = \int t \frac{\partial f}{\partial \theta}\, dy = \int t q f \, dy = E(tq) = \mathrm{cov}(t, q),$$

where we used the fact that $\partial t/\partial \theta = 0$ ($t$ is a function of the data only) and $E(q) = 0$.

Now consider $z = t - \theta - \alpha q$, where $\alpha$ is a constant to be chosen later. Since $t$ is unbiased and $E(q) = 0$, we have $E(z) = 0$. Choosing $\alpha = 1/\mathrm{var}(q)$:

$$0 \le \mathrm{var}(z) = \mathrm{var}(t) - 2\alpha\,\mathrm{cov}(t, q) + \alpha^2\,\mathrm{var}(q) = \mathrm{var}(t) - \frac{2}{\mathrm{var}(q)} + \frac{1}{\mathrm{var}(q)} = \mathrm{var}(t) - \frac{1}{\mathrm{var}(q)},$$

thus $\mathrm{var}(t) \ge 1/\mathrm{var}(q)$.

14
Q

How are the ML-estimators found when linear restrictions $R\beta = r$ are imposed?

A

First we define the Lagrangian: $\psi(\beta, \sigma^2) = \ln L(\beta, \sigma^2) - \lambda'(R\beta - r)$, where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$ is a vector of $m$ Lagrange multipliers. Differentiating and equating to 0 gives the following system:

$$(X'y - X'X\tilde\beta)'/\tilde\sigma^2 = \tilde\lambda' R,$$

$$-\frac{n}{2\tilde\sigma^2} + \frac{(y - X\tilde\beta)'(y - X\tilde\beta)}{2\tilde\sigma^4} = 0,$$

$$R\tilde\beta = r.$$

From the first equation we obtain

$$\tilde\beta = \hat\beta - \tilde\sigma^2 (X'X)^{-1} R' \tilde\lambda, \quad\text{thus}\quad r = R\tilde\beta = R\hat\beta - \tilde\sigma^2 R(X'X)^{-1}R'\tilde\lambda.$$

Solving the latter for $\tilde\sigma^2 \tilde\lambda$ and substituting back yields

$$\tilde\beta = \hat\beta - (X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r), \qquad \tilde\sigma^2 = (y - X\tilde\beta)'(y - X\tilde\beta)/n.$$
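A numpy sketch of these formulas under an assumed restriction that the second coefficient equals zero (the data, design, and restriction are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

R = np.array([[0.0, 1.0, 0.0]])  # restricts the second coefficient...
r = np.array([0.0])              # ...to equal zero

A = R @ XtX_inv @ R.T
beta_tilde = beta_hat - XtX_inv @ R.T @ np.linalg.solve(A, R @ beta_hat - r)
u_tilde = y - X @ beta_tilde
print(beta_tilde, u_tilde @ u_tilde / n)  # restricted beta tilde and sigma^2 tilde
```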

15
Q

What is a much easier way of finding the restricted ML-estimators using vectors and matrices?

A

Find the inverse of

$$G = \begin{pmatrix} X'X & R' \\ R & 0 \end{pmatrix},$$

which is

$$G^{-1} = \begin{pmatrix} (X'X)^{-1} - (X'X)^{-1}R'A^{-1}R(X'X)^{-1} & (X'X)^{-1}R'A^{-1} \\ A^{-1}R(X'X)^{-1} & -A^{-1} \end{pmatrix},$$

where $A = R(X'X)^{-1}R'$.
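A numeric sanity check of this block inverse (same hypothetical design and restriction as the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
R = np.array([[0.0, 1.0, 0.0]])  # one restriction, m = 1

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
A_inv = np.linalg.inv(R @ XtX_inv @ R.T)

G = np.block([[XtX, R.T],
              [R, np.zeros((1, 1))]])
G_inv = np.block([[XtX_inv - XtX_inv @ R.T @ A_inv @ R @ XtX_inv, XtX_inv @ R.T @ A_inv],
                  [A_inv @ R @ XtX_inv, -A_inv]])
print(np.allclose(G @ G_inv, np.eye(4)))  # True: the stated inverse is correct
```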

16
Q

What is the Wald test (for known $\sigma^2$)?

A

We have $R\hat\beta \sim N\left(R\beta,\ \sigma^2 R(X'X)^{-1}R'\right)$, so under the null hypothesis $R\hat\beta - r \sim N\left(0,\ \sigma^2 R(X'X)^{-1}R'\right)$. This gives the following test statistic:

$$W = (R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/\sigma^2 \sim \chi^2(m).$$
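A self-contained numpy sketch of the statistic, assuming $\sigma^2$ known and the same hypothetical restriction (second coefficient equals zero) as before:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)  # null is true here

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

d = R @ beta_hat - r
W = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / sigma2
print(W, stats.chi2.sf(W, df=1))  # statistic and p-value, m = 1 restriction
```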

17
Q

What is the Lagrange multiplier test, and how is it derived?

A

The LM test is based on the fact that the gradient vanishes at the maximum, so if the null hypothesis is true, $q(\tilde\beta)$ should be close to zero. We have $q(\tilde\beta) = X'\tilde u/\sigma^2$.

We can rewrite

$$X\tilde\beta = X\hat\beta - X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r),$$

which implies

$$\tilde u = \hat u + X(X'X)^{-1}R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r).$$

Since $X'\hat u = 0$, it follows that

$$q(\tilde\beta) = X'\tilde u/\sigma^2 = R'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/\sigma^2 = R'\tilde\lambda.$$

Because $\tilde\lambda$ is normally distributed, we have

$$LM = \sigma^2\,\tilde\lambda' R(X'X)^{-1}R'\tilde\lambda \sim \chi^2(m).$$
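The same hypothetical setup as the Wald sketch, now computing LM through the multipliers $\tilde\lambda$; with $\sigma^2$ known it coincides numerically with W:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
A = R @ XtX_inv @ R.T
l_tilde = np.linalg.solve(A, R @ beta_hat - r) / sigma2  # the multipliers

LM = sigma2 * l_tilde @ A @ l_tilde
print(LM)  # equals the Wald statistic from the previous sketch
```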

18
Q

What is the Likelihood-ratio test, and how is it derived?

A

The Likelihood-ratio test is based on the ratio of the maximized likelihood with the restriction to the maximized likelihood without the restriction, which should be close to one if the null hypothesis is true. In practice it checks the difference between the log-likelihoods:

$$LR = -2\left(\ln L(\tilde\beta) - \ln L(\hat\beta)\right) = (\tilde u'\tilde u - \hat u'\hat u)/\sigma^2 \sim \chi^2(m).$$
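A sketch of LR via the restricted and unrestricted residual sums of squares (same hypothetical setup, $\sigma^2$ known):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)
R, r = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
beta_tilde = beta_hat - XtX_inv @ R.T @ np.linalg.solve(
    R @ XtX_inv @ R.T, R @ beta_hat - r)

u_hat, u_tilde = y - X @ beta_hat, y - X @ beta_tilde
LR = (u_tilde @ u_tilde - u_hat @ u_hat) / sigma2
print(LR)  # chi-squared(1) under the null; matches W and LM when sigma^2 is known
```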

19
Q

What can be said about the Lagrange-multiplier, Likelihood-ratio, and Wald tests?

A

In a simplified case they are all identical (this holds only when $\sigma^2$ is known).

20
Q

What is the order of the Wald, LM, and LR tests when $\sigma^2$ is unknown, and why?

A

Since $\sigma^2$ is not known we have

$$W = n(\tilde\sigma^2 - \hat\sigma^2)/\hat\sigma^2 = n(\lambda - 1),$$

$$LM = n(\tilde\sigma^2 - \hat\sigma^2)/\tilde\sigma^2 = n(1 - \lambda^{-1}),$$

where $\lambda = \tilde\sigma^2/\hat\sigma^2 \ge 1$, and we also have

$$LR = n \ln \lambda.$$

Since $1 - \lambda^{-1} \le \ln \lambda \le \lambda - 1$ for all $\lambda \ge 1$, it follows that $LM \le LR \le W$.
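A quick numeric check of the ordering as a function of $\lambda$ alone ($n$ and the grid of $\lambda$ values are arbitrary):

```python
import numpy as np

n = 100
for lam in (1.0, 1.1, 1.5, 3.0):
    W, LM, LR = n * (lam - 1), n * (1 - 1 / lam), n * np.log(lam)
    print(f"lambda={lam}: LM={LM:.2f} <= LR={LR:.2f} <= W={W:.2f}")
```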