Chapter 1 Linear regression Flashcards

1
Q

A1. Linearity, implication

A

The marginal effect of each regressor does not depend on the level of the regressors (the coefficients are constants)

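For reference, the model behind A1 (standard notation, not shown on the card): y_i = beta_1 x_i1 + beta_2 x_i2 + ... + beta_K x_iK + epsilon_i, so the marginal effect dE(y_i | x_i)/dx_ik = beta_k is a constant that does not change with the level of any regressor.
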
2
Q

A2. Strict exogeneity

A

E(epsilon_i | X) = 0 for all i = 1, ..., n: conditional on the whole data matrix X, every error term has mean zero.

3
Q

A3. No multicollinearity

A

The rank of the n x K data matrix X is K with probability 1 (the columns of X are linearly independent)

4
Q

A4. Spherical error variance

A

E(epsilon_i^2 | X) = sigma^2 > 0 for all i (conditional homoskedasticity) and E(epsilon_i epsilon_j | X) = 0 for i != j (no correlation between observations), i.e. Var(epsilon | X) = sigma^2 I_n.

5
Q

Implications of A2

A

Justified by economic theory, not by econometrics.
Usually not satisfied by time-series data (e.g., when lagged dependent variables appear as regressors).

6
Q

Implications of A4

A

Conditional homoskedasticity (the conditional error variance is the same constant sigma^2 for every observation) and no correlation between the errors of different observations.

7
Q

How does iid affect our assumptions’ restrictiveness?

A

With random (iid) sampling, (x_i, epsilon_i) is independent across observations, so strict exogeneity E(epsilon_i|X)=0 reduces to the weaker condition E(epsilon_i|x_i)=0 -> A2 becomes less restrictive.
E(epsilon_i^2) remains constant across i -> unconditional homoskedasticity follows from identical distributions, but the value E(epsilon_i^2|x_i) may still differ with x_i, therefore A4 (conditional homoskedasticity) remains restrictive.

8
Q

How do we look for the parameters in OLS? Does it make sense?

A

We minimize the SSR (the loss function), i.e. we make the sum of squared residuals as small as possible.
It makes sense if we want to predict, but not necessarily if we want to give the coefficients a causal interpretation.

9
Q

SSR formula

A

SSR(beta~) = sum_i (y_i - x_i' beta~)^2 = (y - X beta~)'(y - X beta~)

10
Q

To isolate b from the minimized SSR function we need the inverse of X'X to exist. Is this fulfilled?

A

Yes:
1. By A3, X has full column rank, so the determinant of X'X is different from 0
2. X'X is a square (K x K) matrix by construction
3. n > K

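A sketch of the step this card refers to (standard OLS algebra, in the deck's notation): setting the derivative of SSR(beta~) = (y - X beta~)'(y - X beta~) to zero gives the normal equations X'X b = X'y; since (X'X)^{-1} exists by the argument above, b = (X'X)^{-1}X'y.
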
11
Q

Projection matrix

A

P=X(X’X)^{-1}X’

12
Q

PY

A

=Xb
(fitted values)

13
Q

Annihilator matrix

A

M=I-P

14
Q

MY

A

=Y-Xb=e
(residuals)

15
Q

Property of M and P

A

They are both symmetric and idempotent (AA=A)

16
Q

PX

A

=X

17
Q

MX

A

=0

18
Q

PM

A

=0

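A quick numerical check of cards 11-18 (an illustrative Python sketch using numpy; the simulated data and variable names are made up, not from the deck):

import numpy as np

rng = np.random.default_rng(0)                       # toy data for illustration only
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T                 # projection matrix P
M = np.eye(n) - P                                    # annihilator matrix M = I - P
b = np.linalg.solve(X.T @ X, X.T @ y)                # OLS coefficients b

print(np.allclose(P @ y, X @ b))                     # PY = Xb (fitted values)
print(np.allclose(M @ y, y - X @ b))                 # MY = Y - Xb = e (residuals)
print(np.allclose(P @ P, P), np.allclose(M @ M, M))  # both idempotent
print(np.allclose(P, P.T), np.allclose(M, M.T))      # both symmetric
print(np.allclose(P @ X, X), np.allclose(M @ X, 0))  # PX = X, MX = 0
print(np.allclose(P @ M, np.zeros((n, n))))          # PM = 0
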
19
Q

Finding the variance in OLS (sigma^2)

A

Since the errors epsilon are unobserved, we need an estimator, based on the residuals e, that approximates the variance sigma^2

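The standard estimator (not written on the card, but used in later cards): s^2 = e'e/(n - K), which is unbiased, E(s^2 | X) = sigma^2; dividing by n - K rather than n is the degrees-of-freedom correction mentioned in the MLE card.
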
20
Q

Finding the R^2

A

Uncentered R^2 = (Xb)'(Xb) / y'y = 1 - e'e / y'y: the share of the (uncentered) variability of y explained by the fitted values.

21
Q

Centered R^2

A

By removing the sample mean of y, the centered R^2 measures the explanatory power of the regressors beyond the constant term, rather than crediting the mean (mu) itself

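In formulas (standard definitions, consistent with the card): centered R^2 = 1 - e'e / sum_i (y_i - ybar)^2, whereas the uncentered version of the previous card divides by y'y instead of the demeaned sum of squares.
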
22
Q

Influential power of an observation

A

We compare the estimator with and without the i-th observation, where the subindex i indicates the estimator computed without the i-th observation. The key quantity is the leverage
P_i = x_i(X'X)^{-1}x_i' (with x_i the i-th row of X; this is the i-th diagonal element of P), and trace(P) = K.
If all observations contribute similarly, P_i is approximately K/n. If observation i is an outlier, P_i is much larger

23
Q

Statistical properties of b: 1. unbiasedness (E(b)=\beta).
Which assumptions do we need?

A
  1. Linearity, so we can substitute y = X beta + epsilon and write b = beta + (X'X)^{-1}X'epsilon
  2. Strict exogeneity, so that the second term has conditional expectation zero and cancels out
  3. No multicollinearity, so that the inverse (X'X)^{-1} exists.
    Note: if 2 doesn't hold (as is common in time series), b is biased.

24
Q

Definition of conditional variance for a vector

A

Var(z | x) = E[(z - E(z|x))(z - E(z|x))' | x]: a square matrix with the conditional variances on the diagonal and the conditional covariances off the diagonal.

25
Q

Statistical properties of b: 2. BLUE.
Develop the variance of b OLS under conditional homoskedasticity

A

Var(b|X) = sigma^2 (X'X)^{-1}. This is the smallest variance attainable by a linear unbiased estimator, i.e. b is BLUE (proved by the Gauss-Markov theorem)

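A sketch of the derivation (standard algebra under A2 and A4): b - beta = (X'X)^{-1}X'epsilon, so Var(b|X) = (X'X)^{-1}X' E(epsilon epsilon'|X) X(X'X)^{-1} = sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = sigma^2 (X'X)^{-1}.
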
26
Q

Develop the Gauss-Markov theorem

A

Notice that DD' is a (matrix) quadratic form, so it is positive semi-definite; thus Var(beta^|X) - Var(b|X) = sigma^2 DD' >= 0, i.e. the conditional variance of any other linear unbiased estimator beta^ is greater than or equal to that of b (in the matrix sense)
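
A sketch of the argument (standard Gauss-Markov steps; D below is the matrix the card's answer refers to): take any other linear unbiased estimator beta^ = Cy; unbiasedness for every beta forces CX = I. Define D = C - (X'X)^{-1}X', so DX = 0. Then Var(beta^|X) = sigma^2 CC' = sigma^2 [(X'X)^{-1} + DD'], because the cross terms contain DX = 0, and DD' is positive semi-definite.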

27
Q

Covariance of b, Depsilon (part of G-M theorem)

A
28
Q

Statistical properties of b: 3. Cov(b, e | X) = 0

A

Cov(b, e | X) = E[(b - beta)e' | X] = (X'X)^{-1}X' E(epsilon epsilon' | X) M = sigma^2 (X'X)^{-1}X'M = 0, since MX = 0 implies X'M = 0.

29
Q

Prove the unbiasedness of the variance estimator for OLS

A

e = M epsilon, so E(e'e | X) = E(epsilon'M epsilon | X) = sigma^2 trace(M) = sigma^2 (n - K). Hence E(s^2 | X) = E(e'e | X)/(n - K) = sigma^2.

30
Q

Assumption 5

A

We assume normality of epsilon given X (epsilon | X ~ N(0, sigma^2 I_n)) in order to perform exact finite-sample tests

31
Q

How is b - beta distributed?

A

(b - beta) | X ~ N(0, sigma^2 (X'X)^{-1})

32
Q

How do we standardize the b - beta distribution? (with sigma squared)

A

z_k = (b_k - beta_k) / sqrt(sigma^2 [(X'X)^{-1}]_kk) ~ N(0, 1)

33
Q

How do we standardize the b - beta distribution? (with s squared)

A

t_k = (b_k - beta_k) / sqrt(s^2 [(X'X)^{-1}]_kk) = (b_k - beta_k) / SE(b_k) ~ t(n-K), where s^2 = e'e/(n-K)

34
Q

How do we prove that the t statistic follows a t(n-k) distribution? (just the steps)

A
  1. The numerator follows a N(0,1)
  2. The denominator is the square root of a chi^2(n-K) variable divided by its degrees of freedom
  3. The numerator and denominator are independent (the statistic is assembled below)
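
Assembling the statistic (standard construction): with z ~ N(0,1) from step 1 and q = (n-K)s^2/sigma^2 ~ chi^2(n-K) from step 2, independent by step 3, t = z / sqrt(q/(n-K)) = (b_k - beta_k)/SE(b_k) ~ t(n-K).
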
35
Q

How do we prove that the t statistic follows a t(n-k) distribution? (step 1)

A

We already showed that b - beta is the sampling error (X'X)^{-1}X'epsilon. Imposing the normality assumption, it follows a N(0, sigma^2(X'X)^{-1}) distribution conditional on X. If we standardise it, the numerator follows a N(0,1).

36
Q

How do we prove that the t statistic follows a t(n-k) distribution? (step 2)

A

(n-K) s^2 / sigma^2 = e'e / sigma^2 = epsilon'M epsilon / sigma^2 ~ chi^2(n-K), since M is symmetric and idempotent with rank (= trace) n-K.

37
Q

How do we prove that the t statistic follows a t(n-k) distribution? (step 3)

A

Cov(b, e | X) = 0 (OLS makes e orthogonal to X, hence uncorrelated with b), and under joint normality zero covariance implies independence; the numerator is a function of b and the denominator a function of e, so they are independent.

38
Q

Does normality of x affect the distribution of t?

A

NO! We condition on X to derive the distribution, but the resulting t(n-k) distribution does not depend on x, so the result holds regardless of the distribution of x and also unconditionally (we can drop the conditioning in the last step)

39
Q

Scalar parameter hypothesis testing

A

Apply a t-test. Under the null H0: beta_k = \bar{beta_k}, the statistic t = (b_k - \bar{beta_k})/SE(b_k) follows t(n-K)

40
Q

Confidence intervals for the t test

A

P(beta_k in [b_k - t_{alpha/2}(n-K)*SE(b_k), b_k + t_{alpha/2}(n-K)*SE(b_k)]) = 1 - alpha, i.e. the interval b_k +- critical value * SE(b_k) covers beta_k with probability 1 - alpha

41
Q

What would happen with the t-test if strict exogeneity fails?

A

The test would over-reject: b is biased, so the t statistic is no longer centered at zero under the null and we reject more often than the nominal level alpha

42
Q

Linear combination hypothesis testing (dimensions of the matrices in H0). What is the assumption on the hypothesis?

A

H0: R beta = r
R is #r x K
beta is K x 1
r is #r x 1

Assumption: R has full row rank (no redundant or contradictory restrictions, #r <= K)

43
Q

Linear combination hypothesis testing: test statistic formula

A

F = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r) / (#r * s^2) ~ F(#r, n-K) under H0.
Notice F is always non-negative (it is a quadratic form in Rb - r divided by positive quantities)

43
Q

What is the distribution of F under Ho? (steps)

A
  1. Show the numerator is a chi^2(m1) variable divided by its degrees of freedom m1
  2. Show the denominator is a chi^2(m2) variable divided by its degrees of freedom m2
  3. Show 1 and 2 are independent
    Then F follows F(m1, m2)
    In this case, under H0, F follows F(#r, n-K)
44
Q

What is the distribution of F under Ho? (step 1)

A

Under H0, Rb - r = R(X'X)^{-1}X'epsilon ~ N(0, sigma^2 R(X'X)^{-1}R'), so (Rb - r)'[sigma^2 R(X'X)^{-1}R']^{-1}(Rb - r) ~ chi^2(#r).

45
Q

What is the distribution of F under Ho? (step 2)

A

As in the t-test: e'e/sigma^2 = epsilon'M epsilon/sigma^2 ~ chi^2(n-K), since M is symmetric and idempotent with rank n-K.

46
Q

What is the distribution of F under Ho? (step 3)

A

The numerator and denominator are independent because the numerator is a function of b and the denominator a function of e, and Cov(b, e | X) = 0 (which, under normality, implies independence).

47
Q

Formula for special case of F test where it’s the test for joint significance

A

For H0: all slope coefficients are zero (in a regression that includes an intercept), the F statistic simplifies to F = [R^2/(K-1)] / [(1-R^2)/(n-K)], using the centered R^2.

48
Q

With MLE we obtain

A

The same estimator of beta as in OLS (maximising the likelihood with respect to beta under normality gives b_MLE = b_OLS)

49
Q

log density of a multivariate normal

A

log f(z) = -(k/2) log(2 pi) - (1/2) log det(Sigma) - (1/2)(z - mu)' Sigma^{-1} (z - mu), for a k-dimensional z ~ N(mu, Sigma).

50
Q

MLE for sigma^2

A

sigma^2_MLE = e'e / n, which is biased because it divides by n instead of n-K, i.e. it lacks the degrees-of-freedom correction: E(sigma^2_MLE | X) = sigma^2 (n-K)/n

51
Q

What is the Cramer Rao lower bound

A

The lower bound on the variance of any unbiased estimator. We'd find it by taking the log density function, differentiating it twice with respect to the parameter, taking -E[...] (the Fisher information), and inverting it.

52
Q

What is the Fisher information matrix

A

I(theta) = -E[d^2 log L / d theta d theta']. Its inverse gives the Cramer-Rao bounds: the first diagonal block (a11) is the lower bound for the variance of unbiased estimators of beta, and the last entry (a22) is the lower bound for unbiased estimators of sigma^2, although no unbiased estimator achieves the latter.

53
Q

How can we prove BLUE?

A
  1. Via the Cramer-Rao lower bound, which relies heavily on the normality assumption but is not restricted to linear estimators.
  2. Via the Gauss-Markov theorem, which assumes linearity of the estimator but does not rely on normality
54
Q

Consequences of relaxing the spherical error assumption for the parameter estimation

A

b_OLS is still unbiased
Var(b|X) is no longer the minimum -> NOT BLUE
The tests don't follow a t(n-K) or F(#r, n-K) anymore

55
Q

Adapting the data so that the consequences of relaxing the spherical error assumption don't negatively affect the parameter estimation properties. Steps?

A

Write Var(epsilon | X) = sigma^2 V(X), with V(X) known, symmetric and positive definite. Premultiply the model by V^{-1/2}: y~ = V^{-1/2}y, X~ = V^{-1/2}X, epsilon~ = V^{-1/2}epsilon, so that Var(epsilon~ | X) = sigma^2 I_n (spherical again). Thus, we can use OLS on the transformed data, which is the same but with tildes everywhere -> GENERALIZED LEAST SQUARES ESTIMATOR
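
In closed form (standard GLS algebra, assuming V is known and positive definite): beta_GLS = (X~'X~)^{-1}X~'y~ = (X'V^{-1}X)^{-1}X'V^{-1}y, with Var(beta_GLS | X) = sigma^2 (X'V^{-1}X)^{-1}.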

56
Q

Differences between GLS and OLS

A
  1. OLS puts equal weight on all observations, while GLS accounts for the error variance (more weight on observations with smaller variance)
  2. The variance of beta_GLS is (weakly) smaller than that of b_OLS: beta_GLS is the BLUE in this model
57
Q

Testing with GLS

A

t is the same
F is the same but with tildes

58
Q

Special GLS case when V(x) is diagonal

A
  • No serial correlation, but we do have heteroskedasticity
  • In this case GLS becomes weighted least squares: each observation is weighted by the inverse of its error variance (see the sketch below)
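
A minimal sketch of WLS as GLS with a diagonal V (illustrative Python with numpy; the simulated data and the variance function are invented for the example):

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * x                                   # error s.d. grows with x -> heteroskedasticity
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=sd)

w = 1.0 / sd**2                                # weights = inverse error variances (diagonal of V^{-1})
# WLS = GLS with diagonal V: beta = (X'V^{-1}X)^{-1} X'V^{-1} y
b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
b_ols = np.linalg.solve(X.T @ X, X.T @ y)      # unweighted OLS for comparison
print("OLS:", b_ols, "WLS:", b_wls)
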
59
Q

Causality in linear regressions: ATT, ATE, CATE

A

ATE = E[Y(1) - Y(0)]: average treatment effect over the whole population
ATT = E[Y(1) - Y(0) | D = 1]: average treatment effect on the treated
CATE = E[Y(1) - Y(0) | X = x]: average treatment effect conditional on covariates x

60
Q

Can we interpret beta ols as ATE?

A

NO! beta_OLS = Cov(Y, D)/Var(D); when we develop this expression we find that it differs from the ATE UNLESS the treatment is independent of the potential outcomes.
Interpretation: we require the treatment to be randomly assigned.
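
One standard way to see it (a two-term split; the deck's own decomposition on the next card may group the pieces differently): beta_OLS = E[Y | D=1] - E[Y | D=0] = ATT + { E[Y(0) | D=1] - E[Y(0) | D=0] }, and the selection term in braces is zero when D is randomly assigned.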

61
Q

How can we decompose beta_OLS in terms of ATT + something else?

A

Where the latter two terms are the selection effects; the first term is the ATT

62
Q

What happens when we have omitted variables?

A

Strict exogeneity is breached: the omitted variable ends up in the error term and is correlated with the included regressors, so b is biased. Possible remedies:
1. Randomize treatment
2. Use a quasi-experiment and include the available control variables