Week 3 and Week 4 Flashcards
What is a short model?
A model in which some variables have been absorbed into a new error term: e.g., starting from Y = β0 + β1X1 + β2X2 + U, the term β2X2 + U becomes the new error V, and the coefficients are relabeled from β to α.
If the short model satisfies the exogeneity assumption E[V | X1] = 0, then (in large samples)
α̂1 ≈ α1 = β1.
Which two conditions can confirm that cov(X1, V) = 0?
One of these has to hold:
• the variable X2 does not affect the outcome, β2 = 0,
• the regressors X1 and X2 are uncorrelated, cov(X1, X2) = 0.
What is omitted variable bias (OV bias) and what does the formula look like?
The term (cov(X1, X2)/var(X1))·β2 is called the omitted variable bias of α̂1.
This bias indicates by how much the estimator α̂1 deviates systematically from its estimand β1.
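A minimal simulation sketch of this formula (hypothetical numbers, not from the course): regressing Y on X1 alone recovers β1 plus the OV-bias term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1, beta2 = 2.0, 1.5

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # cov(X1, X2) = 0.8, so X2 is "omitted badly"
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short-model OLS slope: regress Y on X1 only.
c = np.cov(y, x1)
alpha1_hat = c[0, 1] / c[1, 1]

# Theory: alpha1 = beta1 + (cov(X1, X2)/var(X1)) * beta2 = 2 + 0.8 * 1.5 = 3.2
cxx = np.cov(x1, x2)
print(alpha1_hat, beta1 + cxx[0, 1] / cxx[0, 0] * beta2)
```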
When is OV bias not zero?
The bias is different from zero if
• the variable X2 does affect the outcome, β2 ≠ 0, and
• the regressors X1 and X2 are correlated, cov(X1, X2) ≠ 0.
Can we reduce OV bias by adding more regressors?
No.
When dealing with measurement error, what is the omitted variable bias called?
The OV-bias term with the measurement error W as the omitted variable, (cov(X1*, W)/var(X1*))·β2 (with β2 = −β1), is called the attenuation bias.
What does it mean that α̂1 is biased toward zero?
In large samples, α̂1 is a scaled-down version of the true effect β1: it has the same sign but is smaller in absolute value. In other words, α̂1 estimates a value that is closer to zero than the true effect. We say that α̂1 is biased toward zero.
If the variance of the measurement error is small relative to the variance of X1, the attenuation factor will be …
close to 1, and the attenuation bias will be small (the attenuation factor is var(X1)/(var(X1) + var(W))).
- Conversely, if the variance of the measurement error is large relative to the variance of X1, the attenuation factor will be close to 0 and the attenuation bias will be large.
- In particular, we may estimate the effect of X1 to be close to zero even if its true effect is substantially different from zero.
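A minimal simulation sketch of attenuation (hypothetical numbers): with var(X1) = 1 and var(W) = 0.25, the attenuation factor is 1/1.25 = 0.8.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1 = 2.0

x1 = rng.normal(size=n)                    # true regressor, var(X1) = 1
w = rng.normal(scale=0.5, size=n)          # classical measurement error, var(W) = 0.25
x1_star = x1 + w                           # observed, mismeasured regressor
y = beta1 * x1 + rng.normal(size=n)

# OLS slope of Y on the mismeasured regressor X1*.
c = np.cov(y, x1_star)
alpha1_hat = c[0, 1] / c[1, 1]

# Attenuation factor: var(X1) / (var(X1) + var(W)) = 0.8
print(alpha1_hat)                          # approximately 0.8 * beta1 = 1.6
```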
What does exogeneity mean?
That we can’t predict U from the regressors.
Transforming a long model into a short model might create an issue. Which one?
Endogeneity: the new error V absorbs β2X2, so V can be predicted from X1 whenever X1 and X2 are correlated.
Is the covariance between the observed regressor and the measurement error ever zero in the short model?
No, it is never zero: cov(X1*, W) = var(W) > 0.
If β1 is positive, can a very negative OV bias flip the sign of α̂1?
Yes: if the bias is negative and larger than β1 in absolute value, then α1 = β1 + bias < 0.
Classical measurement error assumptions
- E[W] = 0: X1 is measured correctly on average.
- W is independent of X1 and U: no systematic mismeasurement.
- var(W) > 0: measurement error exists.
In the attenuation bias formula, can we replace β2 by −β1?
Yes: in the measurement-error model the omitted variable is W and its coefficient is −β1.
What does RCT stand for?
Randomized controlled trial
Three examples of when exogeneity isn't fulfilled
- omitted variables
- measurement error exists
- equilibrium conditions
OLS formula for β̂1 (binary X1)
β̂1 = (E^[Y|X1=1] − E^[Y|X1=0]) / (E^[X1|X1=1] − E^[X1|X1=0])
IV regression formula for β̂1 (two groups)
β̂1 = (E^[Y|group 1] − E^[Y|group 2]) / (E^[X|group 1] − E^[X|group 2])
What does endogenous sorting do?
It moves us diagonally rather than horizontally in the graph, so a simple comparison of group means does not reveal the ceteris paribus effect.
What is the instrumental variable in IV?
Z; in the simplest case a binary (dummy) variable.
Instrumental exogeneity
E[U|Z]=0
β̂IV for a binary instrumental variable
β̂1 = (E^[Y|Z=1] − E^[Y|Z=0]) / (E^[X|Z=1] − E^[X|Z=0])
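A minimal sketch of this (Wald) estimator on simulated, hypothetical data; it also checks the equivalent covariance form used in the IV characteristics card below.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1 = 2.0

z = rng.integers(0, 2, size=n)              # binary instrument, E[U|Z] = 0
u = rng.normal(size=n)
x = 0.5 * z + 0.7 * u + rng.normal(size=n)  # X is endogenous (correlated with U)
y = beta1 * x + u

# Wald / IV estimator for a binary instrument.
beta_iv = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# Equivalent covariance form: cov^(Y, Z) / cov^(X, Z).
beta_cov = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
print(beta_iv, beta_cov)                    # both close to 2; naive OLS is biased
```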
What do instrument exogeneity and instrument relevance mean and imply?
Instrument exogeneity: E[U|Z] = 0
Instrument relevance: E[X|Z=1] ≠ E[X|Z=0]
Together they ensure that we move only horizontally in the graph.
OLS characteristics
- for general X1: β̂1 = cov^(Y, X1) / var^(X1)
- for binary X1:
β̂1 = (E^[Y|X1=1] − E^[Y|X1=0]) / (E^[X1|X1=1] − E^[X1|X1=0])
- slope coefficient: β̂1 is the estimated change in Y when X1 increases by one unit.
IV characteristics
- for general X1 and Z: β̂1 = cov^(Y, Z) / cov^(X1, Z)
- for a binary instrument:
β̂1 = (E^[Y|Z=1] − E^[Y|Z=0]) / (E^[X|Z=1] − E^[X|Z=0])
- slope coefficient: β̂1 = δ̂/φ̂, where δ̂ is the reduced-form effect of Z on Y and φ̂ the first-stage effect of Z on X1; it is the estimated change in Y when X1 increases by one unit (as induced by Z).
What differs between the first stage and second stage regression in 2SLS
The regression at the second stage deviates from the OLS regressions that we have considered so far in that one of the regressors is an estimated quantity.
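A minimal 2SLS sketch on simulated, hypothetical data, showing the two stages explicitly; with a single instrument it reproduces the IV ratio.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta1 = 2.0

z = rng.normal(size=n)                      # instrument (here continuous)
u = rng.normal(size=n)
x = 0.5 * z + 0.7 * u + rng.normal(size=n)  # endogenous regressor
y = beta1 * x + u

# First stage: regress X on Z, keep the fitted values X_hat.
Z = np.column_stack([np.ones(n), z])
gamma_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = Z @ gamma_hat

# Second stage: regress Y on the *estimated* regressor X_hat.
# (Plain second-stage standard errors are invalid, since X_hat is estimated.)
Xh = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(Xh, y, rcond=None)[0][1]

# With a single instrument, 2SLS equals the IV ratio cov^(Y, Z)/cov^(X, Z).
print(beta_2sls, np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1])
```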
Instrument relevance assumption
cov(X1, Z) ≠ 0
Is this true: A linear model that may be suitable for causal inference may not be a good choice for prediction and vice versa.
True.
Instrument exogeneity assumption
Z cannot predict U: E[U|Z] = 0.
In the context of prediction, what do we refer to a long linear model as?
A complex model.
passing from a linear model with only a few variables to a complex model with many control variables tends to …
inflate the variance of estimated (causal) marginal effects.
By constructing an IV estimator of the short linear model, do we INCREASE or DECREASE the variance of the estimated slope coefficients compared to direct OLS estimation of the short linear model?
We increase it.
A high variance means that we will have large confidence intervals, and therefore tests of hypotheses about the true causal effect will lack power.
Does using a less complex model guarantee a well-behaved error term?
No, using a less complex model does not guarantee a well-behaved error term:
- it decreases variance but allows the possibility of systematic bias of unknown (and in general unbounded) size.
What is the bias-variance trade-off of prediction?
We want to avoid specifying a very complex model that is difficult to estimate, i.e., for which we estimate parameter values (think slope coefficients) with very large variances; at the same time, a model that is too simple produces systematically biased predictions.
Difficult or easy to find a model that is valid for causal inference?
Difficult
In the context of prediction, what is the following called: a random sample (Yi, X1,i, …, Xk,i), i = 1, …, N, from our population, from which we estimate the coefficients by the OLS estimators β̂0, …, β̂k?
The training sample; estimating model parameters from it is called model training.
Are we interested in the predictions for the training sample itself?
No; we care about how well the model predicts new, out-of-sample draws.
The training-sample version of the mean-squared error is called the
training error.
What do low/high training errors correspond to in terms of R²?
- low training error corresponds to a high R2 value (close to one),
- large training error corresponds to a low R2 value (close to zero).
Since the training error is not a good measure of predictive power, neither is the R2.
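For reference, a sketch of the link between R² and the training error, assuming the usual sum-of-squares definition of R² and the training error normalized by n:

```latex
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}
\;=\; 1 - \frac{\text{training error}}{\widehat{\mathrm{var}}(Y)},
\qquad
\text{training error} \;=\; \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 .
```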
How close our prediction is to the realized outcome Y^oos(ω1) depends on three factors:
- How close the estimates β̂0(ω0), …, β̂k(ω0) are to the population coefficients b*0, …, b*k. This factor exists because of the randomness of the training sample (uncertainty about ω0).
- The realized values of the predictors x1, …, xk. The linear prediction rule will typically work better for some realizations than for others. What values we see depends on the randomness of the out-of-sample draw (uncertainty about ω1).
- How the part of Y^oos that is not predictable from the predictors realizes. This is determined by how the out-of-sample draw realizes (uncertainty about ω1).
What does the EPE (expected prediction error) take into account?
Both uncertainty about the realization of the training sample and uncertainty about the realization of the out-of-sample draw.
It is an ex ante measure of the cumulative errors.
The EPE consists of three error components. Which are these?
- irreducible error (U)
- bias (approximation error)
- variance (estimation error)
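A sketch of this three-part decomposition under squared-error loss (the exact notation may differ from the lecture notes):

```latex
\mathrm{EPE}
\;=\; \underbrace{\mathrm{var}(U)}_{\text{irreducible error}}
\;+\; \underbrace{(\text{bias})^{2}}_{\text{approximation error}}
\;+\; \underbrace{\mathrm{var}\bigl(\hat{Y}\bigr)}_{\text{estimation error}}
```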
To choose a prediction model with a low EPE we have to optimally trade off two effects. Which ones?
Bias and variance, bias-variance trade-off
Why is the training error not a good estimate of the EPE?
The training error estimates only the bias component but ignores the variance.
- Making a model more complex by adding additional predictors will never increase (and in practice almost always strictly decreases) the training error. However, as we add more and more predictors, the variance component of the EPE is expected to dominate eventually and, unlike the training error, the EPE will increase.
What is sample splitting?
To make sure that there is both a training and a test sample, the common approach is to randomly split the available data (of size m + n) into a training and a test sample (of size n and m, respectively).
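A minimal sketch of a random split, with made-up sizes (m + n = 1000, n = 700):

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(1000, 3))   # placeholder data set of size m + n

# Randomly split into a training sample (size n) and a test sample (size m).
n_train = 700
perm = rng.permutation(len(data))
train, test = data[perm[:n_train]], data[perm[n_train:]]
print(train.shape, test.shape)      # (700, 3) (300, 3)
```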
What is overfitting?
OLS usually overfits: the fitted model follows the training data (including its noise) too closely rather than the true functional form.
It is usually a problem if the researcher keeps adding new predictors in order to decrease the training error (equivalently, increase the R²) even further.
Ridge regression
Ridge regression improves on our previous approach by shrinking different slope coefficients by different factors.
What does OLS only care about?
It would only care about reducing the bias component of the EPE and would tend to overfit.
Lasso regression
Lasso tends to produce models that are of low complexity
If predictors are correlated, then Ridge regression will not …
apply the same amount of shrinkage to all coefficients. This distinguishes Ridge regression from the naïve shrinkage method discussed above.
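A minimal scikit-learn sketch contrasting OLS, Ridge, and Lasso on made-up data; the penalty strengths (alpha) are arbitrary illustration values, not tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(5)
n, k = 200, 10
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)  # only 2 predictors matter

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)    # sets many coefficients exactly to zero

print(np.round(ols.coef_, 2))
print(np.round(ridge.coef_, 2))       # shrunken, but generally all nonzero
print(np.round(lasso.coef_, 2))       # sparse: a low-complexity model
```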
discrete time series
Often, a time series can only be observed at pre-defined discrete points in time.
Forecast
Predictions about the future
nowcasting
Predicting the current period yt (or recent past periods such as yt−1) from the data that is available to the econometrician in period t
Can we do statistical inference on a sample of size one?
No. Observing the time series in only a single state means that we have a sample of size one.
Stationarity
Requires that any two segments of the time series (of equal length) have an identical UNCONDITIONAL distribution.
- Under stationarity, Ys1 and Ys2 have the same distribution and in particular
E[Ys1] = E[Ys2]
var(Ys1) = var(Ys2).
Weak dependence
Weak dependence restricts the information about the time series that becomes available dynamically as time passes and more and more periods of the time series are observed.
Serial- / autocorrelation
Serial correlation means that observations of the time series at different points in time are correlated. One important example of serial correlation is auto-correlation. This refers to correlation between two subsequent time periods.
Weak time dependence
Time Yt isn’t affected by Yt-1
How do stationarity and weak dependence mimic an independently and identically distributed sample?
- Weak dependence ensures that every period reveals new information ("independent").
- In a time series we observe many segments, and under stationarity they all have the same distribution ("identical").
forecasting model left and right side
(outcome on left-hand side)
(predictors on right-hand side).
cross-sectional data
Cross-sectional data can be represented in a spreadsheet format where each row represents an observed unit and each column describes a unit characteristic.
Wide vs. long format
The wide table (Figure 2) contains more columns than the long table (Figure 1) and is therefore wider. Hence the representation in Figure 2 is called the "wide" format and that in Figure 1 the "long" format.
fixed effect
Suppose that At does not change over time. In that case, we can write At = A. Such an A describes the total effect of unobserved unit characteristics that do not change over time.
Fixed-effect transformation
Average the model over time:
fr̄ = (fr1982 + fr1988)/2 (time average of fr)
tax̄ = (tax1982 + tax1988)/2 (time average of tax)
Ū = (U1982 + U1988)/2 (time average of U)
so that fr̄ = β1·tax̄ + A + Ū.
Subtracting these averages from the original model gives
frt − fr̄ = β1(taxt − tax̄) + (Ut − Ū), t = 1982, 1988,
where the fixed effect A is removed and β1 is preserved.
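A minimal sketch of the within (fixed-effect) transformation on a hypothetical two-period panel; the numbers are invented for illustration only.

```python
import numpy as np

# Hypothetical two-period panel: rows = units, columns = years 1982 and 1988.
fr = np.array([[1.8, 1.6], [2.1, 1.9], [1.5, 1.4]])
tax = np.array([[0.3, 0.5], [0.2, 0.4], [0.6, 0.7]])

# Fixed-effect (within) transformation: subtract each unit's time average.
fr_dm = fr - fr.mean(axis=1, keepdims=True)
tax_dm = tax - tax.mean(axis=1, keepdims=True)

# OLS on the demeaned data: the fixed effect A has dropped out, beta1 is preserved.
beta1_hat = (fr_dm * tax_dm).sum() / (tax_dm ** 2).sum()
print(beta1_hat)
```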
clusters
Computing standard errors under the assumption that certain blocks of observations exhibit correlation is called “computing standard errors with clustering”. The blocks of correlated observations are called clusters. For panel data, it is often sensible to assume that all the observations of one unit form a cluster.
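A minimal statsmodels sketch of clustered standard errors on simulated, hypothetical panel data, clustering on units:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_units, n_periods = 50, 2
groups = np.repeat(np.arange(n_units), n_periods)            # each unit = one cluster

x = rng.normal(size=n_units * n_periods)
unit_shock = np.repeat(rng.normal(size=n_units), n_periods)  # within-unit correlation
y = 2.0 * x + unit_shock + rng.normal(size=n_units * n_periods)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(fit.bse)   # standard errors allowing correlation within each cluster
```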
Errors depend on:
Bias: improves (shrinks) as more predictors X are included.
Variance: improves (shrinks) as fewer predictors X are included.
Training error
- does not measure the EPE
- estimates only the irreducible (idiosyncratic) error plus the bias
Test error
- measures (estimates) the EPE
Cross-sectional sampling
- random sampling
- observations are identical & independent
- imposed by the sampling design
Time-series sampling
- stationarity and weak time dependence
- a property of the economic environment
- difficult to verify empirically
- these assumptions often fail
First-difference transformation
Take first differences of all variables: Δfrt = β1·Δtaxt + ΔUt. Like the fixed-effect transformation, differencing removes the fixed effect A while preserving β1.
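A short sketch of first differencing on the same hypothetical panel as in the fixed-effect example above; with T = 2 the first-difference and within estimates coincide.

```python
import numpy as np

# Same hypothetical two-period panel as in the fixed-effect sketch above.
fr = np.array([[1.8, 1.6], [2.1, 1.9], [1.5, 1.4]])
tax = np.array([[0.3, 0.5], [0.2, 0.4], [0.6, 0.7]])

# First differences across periods: the fixed effect A cancels out.
d_fr = np.diff(fr, axis=1).ravel()
d_tax = np.diff(tax, axis=1).ravel()

beta1_hat = (d_fr * d_tax).sum() / (d_tax ** 2).sum()
print(beta1_hat)   # with T = 2 this equals the within (fixed-effect) estimate
```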