Week 3 and Week 4 Flashcards

1
Q

What is a short model?

A

In the long model Y = β0 + β1X1 + β2X2 + U we absorb the part involving X2 into a new error term, V = β2X2 + U, and rename the coefficients from β to α. The resulting short model is Y = α0 + α1X1 + V.

2
Q

If the short model satisfies the exogeneity assumption E[V | X1] = 0, then …

A

α̂1 ≈ α1 = β1

3
Q

Which two conditions can ensure that cov(X1, V) = 0?

A

One of these has to hold:
• the variable X2 does not affect the outcome, β2 = 0,
• the regressors X1 and X2 are uncorrelated, cov(X1, X2) = 0.

4
Q

What is omitted variable (OV) bias and what does its formula look like?

A

The term (cov(X1, X2)/var(X1))β2 is called the omitted variable bias of α̂1.

This bias indicates by how much the estimator αˆ1 deviates systematically from its estimand β1.
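The OV bias formula can be checked in a minimal numpy simulation; the data-generating numbers and variable names below are my own illustrative assumptions, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1, beta2 = 2.0, 1.5

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # correlated regressors: cov(X1, X2) != 0
u = rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + u      # long model

# Short-model OLS slope of Y on X1 alone
alpha1_hat = np.cov(y, x1)[0, 1] / np.var(x1, ddof=1)

# Omitted variable bias: (cov(X1, X2) / var(X1)) * beta2
ov_bias = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1) * beta2

print(alpha1_hat, beta1 + ov_bias)   # nearly identical in a large sample
```

In this setup the bias is roughly 0.8 · 1.5 = 1.2, so the short-model estimate lands near 3.2 instead of the true β1 = 2.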

5
Q

When is OV bias not zero?

A

The bias is different from zero if
• the variable X2 affects the outcome, β2 ≠ 0, and
• the regressors X1 and X2 are correlated, cov(X1, X2) ≠ 0.

6
Q

Can we reduce OV bias by adding more regressors?

A

No.

7
Q

When dealing with measurement error, what is the omitted variable bias called?

A

The term (cov(X1∗, W)/var(X1∗))β2 is called the attenuation bias.

8
Q

What does it mean that α̂1 is biased toward zero?

A

In large samples, α̂1 is a scaled-down version of the true effect β1: it has the same sign but is smaller in absolute value. In other words, α̂1 estimates a value that is closer to zero than the true effect. We say that α̂1 is biased toward zero.

9
Q

If the variance of the measurement error is small relative to the variance of X1, the attenuation factor will be …

A

close to 1 and the attenuation bias will be small.

• Conversely, if the variance of the measurement error is large relative to the variance of X1, the attenuation factor will be close to 0 and the attenuation bias will be large.

• In particular, we may estimate the effect of X1 to be close to zero even if its true effect is substantially different from zero.
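The attenuation factor var(X1∗)/(var(X1∗) + var(W)) can be illustrated with a small numpy sketch; the noise levels below are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1 = 2.0

x_true = rng.normal(size=n)                  # correctly measured X1*
y = beta1 * x_true + rng.normal(size=n)

results = {}
for sigma_w in (0.2, 2.0):                   # small vs large measurement error
    w = rng.normal(scale=sigma_w, size=n)    # classical measurement error W
    x_obs = x_true + w                       # observed, mismeasured regressor
    slope = np.cov(y, x_obs)[0, 1] / np.var(x_obs, ddof=1)
    factor = np.var(x_true, ddof=1) / (np.var(x_true, ddof=1) + sigma_w**2)
    results[sigma_w] = (slope, factor)

print(results)   # slope is roughly beta1 * attenuation factor in each case
```

With the large error the attenuation factor is about 0.2, so the estimated effect sits near 0.4 even though the true effect is 2: exactly the "biased toward zero" phenomenon of the previous cards.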

10
Q

What does exogeneity mean?

A

That we cannot predict U from the regressors: E[U | X1, …, Xk] = 0.

11
Q

Transforming a long model into a short model might create an issue. Which one?

A

Endogeneity: whenever cov(X1, X2) ≠ 0, the new error term V = β2X2 + U can be predicted from X1.

12
Q

Is the correlation (covariance) between X1 and V zero in the short model?

A

In general it is not zero; it vanishes only in the special cases β2 = 0 or cov(X1, X2) = 0.

13
Q

If β1 is positive, can a very negative OV bias flip the sign of the estimated coefficient?

A

Yes. If the OV bias is negative and larger in magnitude than β1, the estimate becomes negative even though the true effect is positive.

14
Q

classical measurement error assumptions

A
  1. E[W] = 0: the measurement is correct on average.
  2. W is independent of X1∗ and U: no systematic mismeasurement.
  3. var(W) > 0: measurement error actually exists.
15
Q

In the attenuation bias formula, can we replace β2 with −β1?

A

Yes. In the measurement-error setting, the role of the omitted variable is played by the measurement error W, whose coefficient is β2 = −β1.

16
Q

What does RCT stand for?

A

Randomized controlled trial

17
Q

Three examples of when exogeneity is not fulfilled

A
  1. omitted variables
  2. measurement error exists
  3. equilibrium conditions
18
Q

OLS formula for β̂1 when X1 is binary

A

β̂1 = (Ê[Y | X1 = 1] − Ê[Y | X1 = 0]) / (Ê[X1 | X1 = 1] − Ê[X1 | X1 = 0])

19
Q

IV regression for B^_1

A

β̂1 = (Ê[Y | group 1] − Ê[Y | group 2]) / (Ê[X | group 1] − Ê[X | group 2])

20
Q

What does endogenous sorting do

A

reveals ceteris paribus effect horizontally.

21
Q

Instrumental variable in IV

A

The instrument Z; in the simplest case it is binary (a dummy variable).

22
Q

Instrumental exogeneity

A

E[U|Z]=0

23
Q

β̂IV formula for a binary instrument

A

β̂IV = (Ê[Y | Z = 1] − Ê[Y | Z = 0]) / (Ê[X | Z = 1] − Ê[X | Z = 0])
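This Wald-type estimator can be tried out on simulated data; the data-generating numbers below are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1 = 1.0

z = (rng.random(n) < 0.5).astype(float)      # binary instrument
u = rng.normal(size=n)
x = 0.5 * z + 0.8 * u + rng.normal(size=n)   # X is endogenous: correlated with U
y = beta1 * x + u

# OLS on the endogenous X is biased away from beta1 ...
beta_ols = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# ... while the IV estimator compares group means across Z
beta_iv = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

print(beta_ols, beta_iv)
```

Here Z shifts X but is unrelated to U, so the ratio of mean differences recovers β1 while plain OLS does not.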

24
Q

What do instrument exogeneity and instrument relevance mean and imply?

A

Instrument exogeneity: E[U | Z] = 0.
Instrument relevance: Ê[X | Z = 1] ≠ Ê[X | Z = 0].
Together they ensure that we are only moving horizontally in the graph.

25
Q

OLS characteristics

A
  • For general X1: β̂1 = ĉov(Y, X1) / v̂ar(X1)

For binary X1:
β̂1 = (Ê[Y | X1 = 1] − Ê[Y | X1 = 0]) / (Ê[X1 | X1 = 1] − Ê[X1 | X1 = 0])

  • Slope coefficient: β̂1 is the estimated change in Y when X1 increases by one unit.
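The covariance formula for the OLS slope can be verified against numpy's built-in least-squares fit (simulated data; intercept and slope values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000)
y = 3.0 + 2.0 * x + rng.normal(size=1_000)

# Slope via the covariance formula ...
b1_cov = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
# ... matches the ordinary least-squares slope from np.polyfit
b1_ls = np.polyfit(x, y, deg=1)[0]

print(b1_cov, b1_ls)
```

The two numbers agree to machine precision because the degrees-of-freedom corrections cancel in the ratio.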
26
Q

IV characteristics

A
  • For a general instrument Z: β̂1 = ĉov(Y, Z) / ĉov(X, Z)

For a binary instrument:
β̂1 = (Ê[Y | Z = 1] − Ê[Y | Z = 0]) / (Ê[X | Z = 1] − Ê[X | Z = 0])

  • Slope coefficient: β̂1 = δ̂/φ̂, the reduced-form effect of Z on Y divided by the first-stage effect of Z on X; it estimates the change in Y when X increases by one unit.
27
Q

What differs between the first stage and second stage regression in 2SLS

A

The regression at the second stage deviates from the OLS regressions that we have considered so far in that one of the regressors is an estimated quantity.
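A minimal numpy sketch of the two stages (simulated data; the numbers are my own illustrative assumptions), showing the second stage regressing Y on the estimated regressor X-hat:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta1 = 1.0

z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)
x = 0.7 * z + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = beta1 * x + u

# First stage: regress X on Z and form fitted values X-hat
phi1 = np.cov(x, z)[0, 1] / np.var(z, ddof=1)
x_hat = x.mean() + phi1 * (z - z.mean())

# Second stage: regress Y on the *estimated* regressor X-hat
beta_2sls = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)

print(beta_2sls)   # close to beta1, unlike plain OLS of Y on X
```

Algebraically the second-stage slope reduces to ĉov(Y, Z)/ĉov(X, Z), i.e. the IV estimator from the earlier cards.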

28
Q

instrument relevance assumption

A

cov(X1, Z) ≠ 0 (for a binary instrument: Ê[X | Z = 1] ≠ Ê[X | Z = 0])

29
Q

Is this true: A linear model that may be suitable for causal inference may not be a good choice for prediction and vice versa.

A

True.

30
Q

instrument exogeneity assumption

A

Z cannot predict U: E[U | Z] = 0.

31
Q

In the context of prediction, what do we call a long linear model?

A

Complex model

32
Q

passing from a linear model with only a few variables to a complex model with many control variables tends to …

A

inflate the variance of estimated (causal) marginal effects.

33
Q

By constructing an IV estimator of the short linear model, do we increase or decrease the variance of the estimated slope coefficients compared to direct OLS estimation of the short linear model?

A

Increase

34
Q

A high variance means that we will have large confidence intervals and therefore the power will

A

be low on tests of hypotheses about the true causal effect.

35
Q

Does using a less complex model guarantee a well-behaved error term?

A

No, using a less complex model does not guarantee a well-behaved error term.
- It will decrease the variance but allows the possibility of systematic bias of unknown (and in general unbounded) size.

36
Q

What is the bias-variance trade-off of prediction?

A

We want a model complex enough to keep the bias low, but we also want to avoid specifying a very complex model that is difficult to estimate, i.e., one for which we estimate parameter values (think slope coefficients) with very large variances.

37
Q

Difficult or easy to find a model that is valid for causal inference?

A

Difficult

38
Q

In the context of prediction, what is this called:
- a random sample (Yi, X1,i, …, Xk,i), i = 1, …, N, from our population, from which we can
estimate the coefficients by the OLS estimators β̂0, …, β̂k.

A

The training sample. Estimating the model parameters from it is called model training.

39
Q

Are we interested in predictions for the training sample?

A

No

40
Q

training sample version of the mean-squared error is called a

A

training error

41
Q

What does a low/high training error mean for the R2?

A
  • low training error corresponds to a high R2 value (close to one),
  • large training error corresponds to a low R2 value (close to zero).

Since the training error is not a good measure of predictive power, neither is the R2.

42
Q

How close our prediction is to the realized outcome Y^oos(ω1) depends on three factors:

A
  1. How close the estimates β̂0(ω0), …, β̂k(ω0) are to the population coefficients b0∗, …, bk∗. This factor exists because of the randomness of the training sample (uncertainty about ω0).
  2. The realized values of the predictors x1, …, xk. The linear prediction rule will typically work better for some realizations than for others. Which values we see depends on the randomness of the out-of-sample draw (uncertainty about ω1).
  3. How the part of Y^oos that is not predictable from the predictors realizes. This is determined by how the out-of-sample draw realizes (uncertainty about ω1).
43
Q

What does the EPE take into account?

A

Both the uncertainty about the realization of the training sample and the uncertainty about the realization of the out-of-sample draw.

It is an ex ante measure of the cumulative errors.

44
Q

EPE has three errors, which are these?

A
  1. irreducible error (U)
  2. bias (approximation error)
  3. variance (estimation error)
45
Q

To choose a prediction model with a low EPE we have to optimally trade off two effects. Which ones?

A

Bias and variance, bias-variance trade-off

46
Q

Why is the training error not a good estimate of the EPE?

A

The training error estimates the irreducible error and the bias but ignores the variance.

Making a model more complex by adding additional predictors will never increase (and in practice almost always strictly decreases) the training error. However, as we add more and more predictors the variance component of the EPE is expected to dominate eventually and, unlike the training error, the EPE will increase.
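This divergence between training error and out-of-sample error can be simulated with numpy; the sample sizes and the number of junk predictors are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_test, n_junk = 100, 10_000, 50

def make_data(n):
    # One relevant predictor plus n_junk irrelevant ones
    X = rng.normal(size=(n, 1 + n_junk))
    y = 2.0 * X[:, 0] + rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def fit_and_score(p):
    # OLS with an intercept and the first p predictors
    A_tr = np.column_stack([np.ones(n_train), X_tr[:, :p]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_te = np.column_stack([np.ones(n_test), X_te[:, :p]])
    train_err = np.mean((y_tr - A_tr @ coef) ** 2)
    test_err = np.mean((y_te - A_te @ coef) ** 2)
    return train_err, test_err

small = fit_and_score(1)
complex_ = fit_and_score(1 + n_junk)
print(small, complex_)
```

The complex model always wins on training error, but its test error (a stand-in for the EPE) is worse because the variance component dominates.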

47
Q

what is sample splitting?

A

To make sure that there is both a training and a test sample the common approach is to randomly split the available data (of size m + n) into a training and test sample (of size n and m, respectively)

48
Q

What is overfitting

A

OLS usually overfits: it fits the model too closely to the particular training sample rather than to the true functional form.

It is usually a problem when the researcher keeps adding new predictors in order to decrease the training error (equivalently, increase the R2) even further.

49
Q

Ridge regression

A

Ridge regression improves on our previous approach by shrinking different slope coefficients by different factors.
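Ridge shrinkage can be seen directly from the closed-form solution (X'X + λI)⁻¹X'y; the design below is a simulated sketch with illustrative numbers of my own:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 5
X = rng.normal(size=(n, k))
beta = np.array([3.0, -2.0, 1.0, 0.5, 0.0])
y = X @ beta + rng.normal(size=n)

lam = 50.0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Ridge closed form: (X'X + lam * I)^(-1) X'y, which shrinks coefficients toward zero
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

print(np.round(beta_ols, 2))
print(np.round(beta_ridge, 2))
```

The ridge coefficient vector always has a smaller norm than the OLS vector, and with correlated predictors the shrinkage factors differ across coefficients.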

50
Q

What does OLS only care about?

A

It would only care about reducing the bias component of the EPE and would tend to overfit.

51
Q

Lasso regression

A

Lasso tends to produce models that are of low complexity

52
Q

If predictors are correlated then Ridge regression will not …

A

apply the same amount of shrinkage to all coefficients. This distinguishes Ridge regression from the naïve shrinkage method discussed above.

53
Q

discrete time series

A

Often, a time series can only be observed at pre-defined discrete points in time.

54
Q

Forecast

A

Predictions about the future

55
Q

nowcasting

A

Predicting the current period yt (or recent past periods such as yt−1) from the data that is available to the econometrician in period t

56
Q

Can we do statistical inference on a sample of size one?

A

No. Observing the time series in only a single state means that we have a sample of size one, which is not enough for statistical inference.

57
Q

Stationarity

A

Stationarity requires that any two segments of the time series of equal length have an identical UNCONDITIONAL distribution.

Under stationarity, Y1 and Ys have the same distribution, and in particular
E[Y1] = E[Ys]
var(Y1) = var(Ys).

58
Q

Weak dependence

A

Weak dependence restricts the information about the time series that becomes available dynamically as time passes and more and more periods of the time series are observed.

59
Q

Serial- / autocorrelation

A

Serial correlation means that observations of the time series at different points in time are correlated. One important example of serial correlation is auto-correlation. This refers to correlation between two subsequent time periods.
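Auto-correlation can be computed directly from a simulated AR(1) series; the persistence parameter below is an illustrative assumption of mine:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 50_000
rho = 0.6

# AR(1): today's value depends on yesterday's, creating serial correlation
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + rng.normal()

# Lag-1 autocorrelation: correlation between Y_t and Y_{t-1}
autocorr_1 = np.corrcoef(y[1:], y[:-1])[0, 1]
# A distant lag shows much weaker correlation (weak dependence)
autocorr_20 = np.corrcoef(y[20:], y[:-20])[0, 1]

print(autocorr_1, autocorr_20)
```

The lag-1 correlation sits near ρ = 0.6, while the lag-20 correlation is near zero, illustrating both serial correlation and weak dependence.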

60
Q

Weak time dependence

A

The influence of the past on Yt fades out: Yt and Yt−h become (approximately) unrelated as the lag h grows large.

61
Q

sample independently and identically

A
  • Weak dependence ensures that every period reveals new information (the "independent" part).
  • In a time series we observe many k-period segments, and under stationarity they all have the same distribution (the "identical" part).
62
Q

forecasting model left and right side

A

(outcome on left-hand side)
(predictors on right-hand side).

63
Q

cross-sectional data

A

Cross-sectional data can be represented in a spreadsheet format where each row represents an observed unit and each column describes a unit characteristic.

64
Q

wide vs long format

A

The table in Figure 2 contains more columns and is therefore wider than the table in Figure 1. Therefore, the representation of the data in Figure 2 is called the "wide" format and that in Figure 1 the "long" format.

65
Q

fixed effect

A

Suppose that At does not change over time. In that case, we can write At = A. Such an A describes the total effect of unobserved unit characteristics that do not change over time.

66
Q

fixed effect transformation

A

Average the model over the two periods:
f̄r = (fr1982 + fr1988)/2 (average fr)
t̄ax = (tax1982 + tax1988)/2 (average tax)
Ū = (U1982 + U1988)/2 (average U)

so that f̄r = β1 t̄ax + A + Ū.

Subtracting these averages from the original model fr_t = β1 tax_t + A + U_t gives

fr_t − f̄r = β1(tax_t − t̄ax) + (U_t − Ū), t = 1982, 1988,

where the fixed effect A is removed and β1 is preserved.
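The fixed-effect (demeaning) transformation can be demonstrated on a simulated two-period panel; the variable names and numbers below are illustrative assumptions of mine, not the course's data:

```python
import numpy as np

rng = np.random.default_rng(8)
n_units = 5_000
beta1 = -0.5

A = rng.normal(size=(n_units, 1))            # unit fixed effect, correlated with tax
tax = A + rng.normal(size=(n_units, 2))      # two periods, e.g. 1982 and 1988
fr = beta1 * tax + A + rng.normal(size=(n_units, 2))

# Pooled OLS ignores A and is biased because cov(tax, A) != 0
pooled = np.cov(fr.ravel(), tax.ravel())[0, 1] / np.var(tax.ravel(), ddof=1)

# Fixed-effect transformation: subtract each unit's time average
fr_d = fr - fr.mean(axis=1, keepdims=True)
tax_d = tax - tax.mean(axis=1, keepdims=True)
fe = np.cov(fr_d.ravel(), tax_d.ravel())[0, 1] / np.var(tax_d.ravel(), ddof=1)

print(pooled, fe)   # pooled is badly biased; the demeaned slope recovers beta1
```

Demeaning wipes out A (it is constant within each unit), so the within estimator recovers β1 while pooled OLS does not.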

67
Q

clusters

A

Computing standard errors under the assumption that certain blocks of observations exhibit correlation is called “computing standard errors with clustering”. The blocks of correlated observations are called clusters. For panel data, it is often sensible to assume that all the observations of one unit form a cluster.

68
Q

Errors depend on:

A

Bias: estimation improves (bias shrinks) with more regressors X.
Variance: estimation improves (variance shrinks) with fewer regressors X.

69
Q

Training error

A

- Does not measure the EPE.
- Estimates the irreducible (idiosyncratic) error plus the bias, but not the variance.

70
Q

Test error

A

- Measures the EPE.

71
Q

cross section sampling

A

Random sampling
identical & independent
imposed by sampling design

72
Q

time series sampling

A

Stationarity and weak time dependence:
- a property of the economic environment
- difficult to verify empirically
- often fails

73
Q

first difference transformation

A

Take first differences (Δ) of all variables, i.e., replace each variable by its change from the previous period. Like the fixed-effect transformation, this removes the time-invariant fixed effect A.
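First differencing can be sketched on a simulated panel (my own illustrative numbers, assuming a time-invariant fixed effect A as in the earlier cards):

```python
import numpy as np

rng = np.random.default_rng(9)
n_units, T = 2_000, 6
beta1 = 1.5

A = rng.normal(size=(n_units, 1))            # time-invariant fixed effect
x = A + rng.normal(size=(n_units, T))        # regressor correlated with A
y = beta1 * x + A + rng.normal(size=(n_units, T))

# First-difference transformation: "delta" every variable
dy = np.diff(y, axis=1)
dx = np.diff(x, axis=1)

# A differences out, so OLS on the differenced data recovers beta1
b_fd = np.cov(dy.ravel(), dx.ravel())[0, 1] / np.var(dx.ravel(), ddof=1)
print(b_fd)
```

With two periods, first differencing and the fixed-effect (demeaning) transformation give numerically identical slope estimates; with more periods they generally differ but both remove A.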