QE Flashcards
Mean independent vs independent
- Mean independent: E(u|X) = E(u), and E(u) = 0
- Independent: E(g(u)|X) = E(g(u)) for any function g
- E.g. the variance of u spreading out as X increases is compatible with mean independence but not with full independence
Assumptions for consistency vs unbiasedness
Consistency: orthogonality, Cov(e, X) = 0
Unbiasedness: mean independence
- E(e|X) = E(e) = 0
Regressions in both directions implications
Run regression both directions
- Both coefficients have descriptive interpretations
- Only one coefficient can have a causal interpretation
- in general β (from Y on X) does not equal 1/γ (from X on Y)
- Inverting a LRM (or CEF) does not yield a LRM (or CEF)
- to persuade a causal interpretation, need to argue that orthogonality (exogeneity of the error) is plausible (see the simulation below)
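Not from the cards: a quick simulated check of the "β ≠ 1/γ" point above. The two slopes multiply to the squared correlation, not to 1; all numbers here are made up.

```python
# Toy illustration: slope of Y on X is not the reciprocal of the slope of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=3.0, size=n)      # true slope of Y on X is 2

cov_xy = np.cov(x, y)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)                # regression of Y on X
b_xy = cov_xy / np.var(y, ddof=1)                # regression of X on Y

print(b_yx, 1 / b_xy)                            # not equal in general
print(b_yx * b_xy, np.corrcoef(x, y)[0, 1] ** 2) # product equals rho^2
```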
Define descriptive interpretation
“on average, a unit increase in X1 is associated with a β̂1-unit change in Y, holding X2, …, Xk constant”
Standard error of regression
s = √(SSR / (n − k − 1))
Least squares assumptions
1) error term is conditional mean 0
2) (X, Y) are iid draws from their joint distribution
3) finite, nonzero fourth moments - large outliers are unlikely
4) no perfect multicollinearity
Talk about consistency
Means β̂ is very close to the true β with high probability when n is large
- consequence of the LLN
- the distribution of β̂ collapses to a spike at β
Talk about asymptotic normality
- Consequence of CLT
- √n(β̂ − β) → N(0, ω²) in distribution, so β̂ is approximately normal in large samples
Talk about asymptotic variance of beta / se(B hat)
ω² = σu² / Var(X); in sum form, Var(β̂) = σu² / Σ(Xi − X̄)²
- OLS is more precise the larger Var(X) and the smaller σu² (better fit - can be improved by adding more regressors)
- same logic for IV: a good instrument explains a large share of Var(X), so the fitted X* has high variance and the estimate is more precise
Talk about imperfect multicollinearity
- high correlation of X with the other regressors, so the variance of X̃ (the part of X left after partialling out the other regressors) is very small
- β̂ is measured imprecisely (large standard error)
Hypothesis testing steps
1) state null and alternative
2) get t stat
3) Under the null t -> N(0,1)
4) Decision rule
5) Outcome
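A minimal sketch of these five steps on simulated data (not from the cards; the variable names and coefficients are invented), using statsmodels for the regression:

```python
# Steps: (1) H0: beta1 = 0 vs two-sided H1, (2) t stat, (3) N(0,1) null
# distribution, (4) 5% decision rule, (5) outcome.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")  # robust SEs

t = (res.params[1] - 0.0) / res.bse[1]       # step 2
p = 2 * (1 - stats.norm.cdf(abs(t)))         # step 3: t ~ N(0,1) under H0
reject = abs(t) > 1.96                       # step 4: two-sided 5% rule
print(t, p, reject)                          # step 5
```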
Talk about one-sided tests
- only makes sense if there is an a priori reason (e.g. from economic theory) to exclude the other direction from consideration
- has more power to detect departures from the null in the positive direction, but no power to detect departures in the negative direction
P-value definition and usefulness
- the probability, under the null, of obtaining a value of t at least as adverse to the null as the one actually computed
- summarising the weight of evidence against the null
Confidence interval interpretation
The collection of null-hypothesised values for β that would be accepted (by a two-sided t test) at significance level α
- i.e. the set of nulls I could not reject at level α (a 99% CI collects the nulls not rejected at the 1% level)
Polynomial in regression vs linear
- polynomial: can look at the marginal effect of X on Y by differentiating; the effect varies with X
- linear: averages out these different marginal effects
- the coefficient on X1·X2 is the effect of a one-unit increase in X1 and X2 together, above and beyond a unit increase in each of them alone
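A short illustrative fit (simulated data, invented coefficients) of a quadratic-plus-interaction specification; the marginal effect of X1 comes from differentiating the fitted equation, so it varies with X1 and X2:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x1**2 + 0.8 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x1**2, x2, x1 * x2]))
b = sm.OLS(y, X).fit().params            # [const, x1, x1^2, x2, x1*x2]

# dY/dX1 = b1 + 2*b2*X1 + b4*X2, evaluated at chosen values of X1, X2
def marginal_effect_x1(x1_val, x2_val):
    return b[1] + 2 * b[2] * x1_val + b[4] * x2_val

print(marginal_effect_x1(0.0, 0.0), marginal_effect_x1(1.0, 1.0))
```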
Causes of endogeneity
1) omitted variable bias
2) measurement error
3) simultaneity
OVB formula and usefulness
β' = β + γ·Cov(X1, X2) / Var(X1)
- assess likely direction of the bias
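A toy check of the OVB formula on the card above (not in the original cards; all numbers are made up): the "short" regression slope matches β + γ·Cov(X1, X2)/Var(X1).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                    # X2 correlated with X1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)    # beta = 2, gamma = 1.5

short_slope = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
formula = 2.0 + 1.5 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(short_slope, formula)                           # both close to 2 + 1.5*0.6 = 2.9
```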
Impact of measurement error in Y
- inferences on β still valid, just estimate of β is less precise
Example for IV for demand elasticity of cigarettes (why a good one)
General sales tax:
- Cov(t, p) ≠ 0 (relevance)
- Cov(t, u) = 0 (exogeneity; assume the tax is not state specific)
Solutions to bad controls
1) Find an instrument for education and estimate model via 2SLS
2) omit from regression
- Interpretation: the ‘total effect’ of labour market discrimination, inclusive of its effect on educational attainment
Why 2SLS less efficient than OLS
- Var(X) = Var(X* + v) = Var(X*) + Var(v) > Var(X*)
- 2SLS only uses X*, the part of X explained by the instrument, so it is less precise
Tension in choosing instruments
- want instruments highly correlated with X (relevance): increases Var(X*) and hence precision
- but also require them to be exogenous, Cov(Z, u) = 0, and variables strongly related to X are often related to u as well
Test for relevance
First-stage F statistic > c ≈ 10 (rule of thumb)
Test for exogeneity
- with exact identification (one Z per endogenous X) exogeneity cannot be tested; it must be argued descriptively / on a priori grounds
- with more than one Z (overidentification), test the overidentifying restrictions:
H0: Cov(Z1, u) = … = Cov(Zm, u) = 0
F test: F → F(m−1, ∞) under H0 (one endogenous regressor, m instruments)
- violation: Z correlated with other unobserved determinants of Y
Stationary definition and meaning
- Strict: the joint probability distribution of (Ys+1, …, Ys+T) does not depend on s, i.e. the distribution does not change over time
- Weak (covariance) stationarity: 1st and 2nd moments exist and are time invariant:
1) E(Yt) = μ for all t
2) Var(Yt) = σ² < ∞ for all t
3) Cov(Yt, Yt−j) depends on j but not on t
- Says: the past is like the present and the future, at least in a probabilistic sense
- Meaning: models can be used outside the range of data with which they were estimated
- For an AR(1): stationary if |β1| < 1, Var(ut) = σ², and Y0 is a random variable with E(Y0) = β0/(1 − β1) and Var(Y0) = σ²/(1 − β1²); see the simulation below
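A minimal sketch (my own simulation, not from the cards) of a stationary AR(1), drawing Y0 from the stationary distribution given on the card above:

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, sigma, T = 1.0, 0.8, 1.0, 5_000

y = np.empty(T)
y[0] = rng.normal(b0 / (1 - b1), sigma / np.sqrt(1 - b1**2))  # stationary start
for t in range(1, T):
    y[t] = b0 + b1 * y[t - 1] + rng.normal(scale=sigma)

print(y.mean(), b0 / (1 - b1))                 # sample mean near b0/(1 - b1)
print(y.var(ddof=1), sigma**2 / (1 - b1**2))   # sample variance near sigma^2/(1 - b1^2)
```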
What I(0) means
- process is stationary OR trend-stationary
Issue with ADF test
Has notoriously little power in distinguishing between a unit root and a very persistent but stationary alternative, e.g. an AR(1) with β1 = 0.9
Chow test
Testing for a break at a known date τ:
- create a dummy Dt = 1 for t > τ and 0 otherwise, interact it with the regressors, and F-test that the coefficients on Dt and the interactions are jointly zero
Test break without known T
Quandt likelihood ratio (QLR) statistic:
- treat the break date τ as an unknown parameter and estimate it alongside the regression coefficients
- F(τ): the Chow F statistic for a break at τ
- candidate break dates run from τ0 = the 15th percentile of the sample to τ1 = the 85th percentile (can't look too close to the start or end)
- the QLR statistic is the largest Chow F statistic across all candidate break dates
- its critical values are much larger than the standard F critical values
- QLR doesn't tell us exactly how the equation changes (see the sketch below)
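A minimal sketch of the QLR procedure on simulated data (the break date, coefficients and trimming are illustrative): compute a Chow F statistic at every candidate date in the central 70% of the sample and take the maximum. Remember that QLR critical values, not standard F ones, apply.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
beta = np.where(np.arange(T) < 120, 1.0, 2.0)          # true break at t = 120
y = beta * x + rng.normal(size=T)

def chow_f(tau):
    d = (np.arange(T) > tau).astype(float)              # post-break dummy
    X = sm.add_constant(np.column_stack([x, d, d * x]))
    res = sm.OLS(y, X).fit()
    R = np.eye(4)[2:]                                   # restrict dummy + interaction to 0
    return float(np.squeeze(res.f_test(R).fvalue))

candidates = list(range(int(0.15 * T), int(0.85 * T)))  # 15%-85% trimming
f_stats = [chow_f(tau) for tau in candidates]
print(max(f_stats), candidates[int(np.argmax(f_stats))])  # QLR stat, estimated break date
```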
Requirements for efficiency
Smaller variance than the competing estimator AND unbiased (show both)
Spurious regression definition
Yt and Xt are both I(1) (e.g. independent random walks) and not cointegrated
- the stochastic trends are Yt = Yt−1 + ut and Xt = Xt−1 + et, with ut and et independent
- Pr(|t| > 1.96) is high, so a "significant" result is likely and the series appear related
- misleading inference even in large samples
Examples of spurious regression
FX rates
Stock market and consumption
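An illustrative Monte Carlo (my own, not from the cards) of the spurious regression problem: two independent random walks regressed on each other reject β = 0 far more often than the nominal 5%.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, reps, rejections = 200, 500, 0
for _ in range(reps):
    y = np.cumsum(rng.normal(size=T))      # Y_t = Y_{t-1} + u_t
    x = np.cumsum(rng.normal(size=T))      # independent random walk
    res = sm.OLS(y, sm.add_constant(x)).fit()
    rejections += abs(res.tvalues[1]) > 1.96

print(rejections / reps)                   # far above 0.05: spurious "significance"
```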
Cointegration steps
Engle-Granger ADF test:
1) Check via ADF that both I(1)
2) Estimate θ by an OLS regression of Y on X
- this yields a consistent estimator for θ
3) Store the residuals and test them via ADF, where H0: random walk, H1: stationary
- if H0 is rejected then, by definition, the series are cointegrated
- use different (Engle-Granger) critical values to account for the sampling uncertainty in estimating θ; a sketch of the procedure follows
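A sketch of steps 2-3 using statsmodels' adfuller on simulated cointegrated data. Caveat: adfuller reports ordinary DF critical values; a proper Engle-Granger test uses its own critical values because θ is estimated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
T = 300
x = np.cumsum(rng.normal(size=T))            # X_t is I(1)
y = 0.5 + 2.0 * x + rng.normal(size=T)       # cointegrated with theta = 2

# Step 2: estimate theta by OLS of Y on X
res = sm.OLS(y, sm.add_constant(x)).fit()

# Step 3: ADF test on the residuals; H0: random walk, H1: stationary
adf_stat, pval, *_ = adfuller(res.resid)
print(res.params[1], adf_stat, pval)         # theta_hat near 2, small p-value
```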
Define cointegration
Xt, Yt ~ I(1)
Yt - a - b Xt ~ I(0)
Problems when β1 = 1 (unit root)
1) Even when the sample is large, the OLS estimator is biased towards zero: E(β̂1) ≈ 1 − 5.3/T
2) The distributions of the t-statistic and of β̂1 are not normal, even in large samples (use ADF critical values)
3) Spurious regression
Difference between residual and forecast error
Residual: in sample
Forecast error: out of sample
Difference between forecast and confidence interval
A forecast interval must account for YT+1 being random, not just for uncertainty about a non-random coefficient
Definition of causal effect
Coefficients measure the causal effect of a ceteris paribus, exogenous change in the explanatory variable on the dependent variable Y
Definition of conditional independence and what it means
D ⊥ {Y(1), Y(0)} | X ⇒ E(Y(0)|D, X) = E(Y(0)|X)
Means: E(Y(0)|D = 1, X = x) = E(Y(0)|D = 0, X = x)
Says that D and the potential outcomes are related, but only through X
- conditioning on X restores independence
- the aim is to control for all the non-random variation in assignment so that whatever variation is left over is plausibly independent of potential outcomes - cleans out selection bias
Meaning of LLN and implications
X̄ is near μ with high probability when n is large
lim n→∞ Var(X̄) = 0
Implications: establishes the conditions where estimator is consistent
Type 1 error definition and example when worried about it
P(reject H0| H0 true)
E.g. worried about it when the treatment is costly to administer (falsely rejecting H0 means rolling out a costly treatment that doesn't work)
Type 2 error definition, definition of power and example when worried about it
P(accept H0| H1 true) = β
- Power = 1 - β = P(reject H0| H1 true)
- Power: ability to detect a violation of the null
E.g. worried about it when missing a true effect is costly - the treatment could save someone's life
CLT requirements, implications
Need iid draws and E(X²) < ∞
- √n(X̄ − μ) → N(0, σ²) as n → ∞
- distributions of estimators such as β̂ are approximately normal when n is large
- "from the CLT, X̄ is approximately N(μ, σ²/n)"
Binomial E, Var, se
E= np
Var = np(1-p)
se(p̂) = √(p(1 − p)/n)
Jensen’s inequality if g(x) concave
E(g(X)) ≤ g(E(X))
Cov / Corr formula
Corr(X, Y) = Cov(X, Y) / √(Var(X)·Var(Y)) = ρ
F test
Tests the joint significance of several estimators
- (whether together they significantly reduce the unexplained variation in the data compared with leaving them out)
- an alternative for model selection: information criteria (AIC)
What it means if D independent of potential outcomes
1) probability of assignment to treatment doesn’t vary with potential outcomes
2) distribution of potential outcomes doesn’t vary with treatment status
- leads to mean independence of the potential outcomes wrt treatment
Internal validity definition
- key
- examples
Inferences about causal effects are valid for the population being studied
- Key: it is plausible that the error is orthogonal to the regressors (exogeneity)
- Contamination (the control group gets access to the treatment)
- Non-compliance (assigned to treatment but end up untreated)
- Placebo effects (outcomes change because subjects perceive they are being treated)
- Hawthorne effects (subjects change behaviour because they know they are being observed)
- Individualistic treatment response: no interaction effects between subjects; an outcome doesn't depend on whether others get the treatment
External validity definition
- key
- examples
whether a study’s findings can be generalised to other populations and settings
- Key: the population differs in a way that alters the causal effect of interest and that is not accounted for by the model (not captured in the X's) - do lots of RCTs and see which factors affect the outcome
- individualistic treatment response (no spillovers)
- long- vs short-run outcomes (surrogate outcomes, e.g. class size on education vs long-term employment)
- supply-side administering
- 'income' elasticity: whether recipients perceive an income transfer and other sources of income the same way
ITT definition
The average causal effect of a program or policy that is introduced to a group of individuals, regardless of whether these individuals actually participate
LATE definition
The average causal effect of treatment delivery on the outcome of interest, among compliers
LATE assumptions and their use
1) Independence (assignment is as good as random)
- lets us measure the causal effect of Z (assignment) on Y (outcome) and on D (delivery)
2) Relevance
- the Wald ratio can be computed because the denominator E(D|Z=1) − E(D|Z=0) is not 0
3) Exclusion (Z affects Y only via D)
- Yi(d, 0) = Yi(d, 1) for d = 0, 1
- AT and NT don't change treatment status when the instrument is switched on/off, so their outcomes drop out
4) Monotonicity (impact one direction)
- excludes defiers
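A toy simulation of the Wald/LATE logic under these assumptions (my own numbers; one-sided non-compliance with never-takers only): the ITT is the treatment effect diluted by the first stage, and dividing recovers the effect for compliers.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
z = rng.integers(0, 2, size=n)                   # random assignment
complier = rng.random(size=n) < 0.6              # 60% compliers, 40% never-takers
d = z * complier                                 # treated only if assigned AND complier
y = rng.normal(size=n) + 2.0 * d                 # constant treatment effect = 2

itt = y[z == 1].mean() - y[z == 0].mean()        # reduced form
first_stage = d[z == 1].mean() - d[z == 0].mean()
late = itt / first_stage                         # Wald estimator
print(itt, first_stage, late)                    # roughly 1.2, 0.6, 2.0
```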
Relationship between ATT and LATE
ATT = γ·ATE(AT) + (1 − γ)·LATE, where γ is the share of the treated who are always-takers and ATE(AT) is the average effect for them
LATE and ITT relationship
LATE ≥ ITT (in magnitude), since with ITT the treatment effect gets diluted among individuals who were assigned but not treated
Allowing for heterogeneous treatment effects
Y = α + βX + δD + γ·D·X + u
H0: γ = 0
- OLS is a consistent estimator of the average causal effect if treatment is randomly assigned
- IV estimates a weighted average of the individual treatment effects; those whose treatment status is most influenced by the instrument get the greatest weight
Problem with return to schooling
Assignment to schooling is not random. Need to use CIA to identify causal effect - add regressors which account for the non-random assignment of schooling
Issue with twins, internal and external
Internal:
- measurement error is exacerbated when taking differences
- fewer observations and reduced variation in X, so larger coefficient standard errors
- differences in ability develop over time: epigenetics
- parental behaviour / investment in the twins differs across families
External:
- unsure whether results extrapolate to the whole population: many twins are conceived via IVF - selection bias - not random
- twins may have worse health as they have to compete for resources
AR(p) model
Autoregressive model
- Use an F test of the hypothesis that Yt−2, …, Yt−p do not further help forecast Yt beyond Yt−1
- p: the highest number of lags that’s relevant
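A sketch of that F test on simulated data (the true process here is an AR(1), so lags 2-4 should be jointly insignificant); the lag matrix and restriction matrix are built by hand:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 500
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):                        # true process is AR(1)
    y[t] = 0.7 * y[t - 1] + rng.normal()

lags = np.column_stack([y[4 - j:T - j] for j in range(1, 5)])  # Y_{t-1}..Y_{t-4}
res = sm.OLS(y[4:], sm.add_constant(lags)).fit()

R = np.zeros((3, 5))                         # H0: coefficients on lags 2, 3, 4 are zero
R[0, 2] = R[1, 3] = R[2, 4] = 1.0
print(res.f_test(R))
```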
What is the Autoregressive distributed lag model and test for it
- Includes lags of X as well as Y
- Granger-Causality test (F-stat) - test the joint hypothesis that none of the X’s is a useful predictor above and beyond lagged values of Y. Causality here refers to the marginal predictive content
- Test: H0 is E(Yt | Yt−1, …, Xt−1, …) = E(Yt | Yt−1, …)
Defintion of a trend
A persistent long-term movement of a variable over time
Two trends
Stochastic: Yt = Yt−1 + ut, i.e. ΔYt = ut; called I(1)
Deterministic: a nonrandom function of time, e.g. α·t
Random walk with a drift equation
- Yt = a1 + Yt−1 + ut, where a1 is the drift
Both deterministic and stochastic trends
- Yt = a1 + Yt−1 + ut
- Assuming Y0 = 0, Yt = a1·t + Σ us = deterministic trend + stochastic trend
- E(Yt) = a1·t, Var(Yt) = σ²·t, Cov(Yt, Yt−s) = σ²·(t − s)
Removing trends
Deterministic:
- regress Yt on a function of time and take the residuals, e.g. linear detrending: Ỹt = Yt − â0 − â1·t
Stochastic:
- differencing
Relate I(2), I(1), I(0) using inflation
Relate AR(1), AR(2) using inflation as an example
Why use Δinflation
log CPI is I(2), inflation is I(1), Δinflation is I(0)
An AR(2) for inflation corresponds to an AR(1) for Δinflation
- the first difference is much less serially correlated, so use it to stay in a stationary framework we understand; if the series is strongly serially correlated, the AR coefficient is biased towards zero
Test for a unit root equation and H0 H1
Dickey-Fuller test
ΔYt = β0 + δYt−1 + ut
H0: δ = 0, stochastic trend (unit root)
H1: δ < 0, stationary
Don't use normal critical values; use the DF critical values (see the sketch below)
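A sketch using statsmodels' adfuller (which applies the DF critical values for you): a simulated random walk should fail to reject H0, while a stationary AR(1) should reject it.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(10)
T = 500
random_walk = np.cumsum(rng.normal(size=T))
stationary = np.empty(T)
stationary[0] = 0.0
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + rng.normal()

for name, series in [("random walk", random_walk), ("stationary AR(1)", stationary)]:
    stat, pval, *_ = adfuller(series, regression="c", autolag="AIC")
    print(name, round(stat, 2), round(pval, 3))
```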
Distributed lag model aim and assumptions
To measure dynamic causal effect
- find cumulative dynamic multipliers
1) X is exogenous: E(ut | Xt, Xt−1, …, Xt−s) = 0
2) Y and X have stationary distributions, and (Yt, Xt) and (Yt−j, Xt−j) become independent as j gets large
3) X and Y have nonzero, finite eighth moments
4) no perfect multicollinearity
Things to remember with distributed lag model for causal effects
- OLS yields consistent estimators for β
- sampling distribution of β is normal
- the formula for the variance is not the usual one, as ut may be serially correlated
- need to use HAC standard errors
The problem to which HAC is the solution
ut being serially correlated
What mean by “significant at the 1% level”, “t-value”
Significant: implicitly assumes a two-sided test; the null β = 0 is rejected at the 1% significance level
- lower α → larger critical value → less powerful test (but a smaller type 1 error)
t-value: the statistic for testing the hypothesis that β = 0
Why use IV
Instrument is used to isolate the movements in X that are uncorrelated with the error term (first stage), thereby allowing consistent estimation (2nd stage)
Issues with IV when not everyone is affected by the instruments
Discuss compliers, AT, NT, defiers
- LATE: instrument is binary (compliers, noncompliers)
What is selection bias
- dealing with it
The bias in an estimator of a regression coefficient that arises when a selection process influences the availability of data and that process is related to the dependent variable.
- results in cov(u, x) not 0
- violates independence assumption
- not like-for-like
- can use conditional independence
What is RMSFE? Assumption for it?
A measure of the spread of the forecast error distribution - the magnitude of a typical forecast mistake
- for a forecast interval, impose a normal distribution rather than take it for granted: there is no CLT argument for the forecast interval (the series could be non-stationary)
- YT+1 is a random variable, not a parameter
Talk about AIC
- used for model selection as provides ranking
- trades off goodness of fit and simplicity
- compare different models for the same data set
- the one with lowest value for AIC is best quality
- relative measure of model fit
- penalises overfitting
Approximating CEF with LRM
- limiting / inaccurate if the CEF is curved
- doesn't have to be limited to a single regressor; can include polynomials for a better fit
- no certainty the CEF is continuous: an LRM could never give the correct value for a discrete CEF
- 'best' in the sense that it minimises the squared error
- in general the variance of the prediction error of the CEF is lower than that of the LRM
“As good as randomly assigned”
- Cov(X, u) = 0
Explain 2SLS
First stage: regress X on the instrument(s) Z to create the 'generated instrument' or adjusted treatment variable X̂; second stage: regress Y on X̂ (a sketch follows)
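A minimal by-hand sketch of the two stages on simulated data with one endogenous X and one instrument Z (doing the stages manually recovers the 2SLS point estimate; proper 2SLS standard errors need the dedicated formula or a package):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)     # X endogenous: correlated with u
y = 1.0 + 2.0 * x + u                          # true causal effect = 2

ols = sm.OLS(y, sm.add_constant(x)).fit()      # biased by endogeneity

x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues   # first stage
tsls = sm.OLS(y, sm.add_constant(x_hat)).fit()             # second stage

print(ols.params[1], tsls.params[1])           # OLS above 2, 2SLS near 2
```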
Adding another regressor
For:
- exploit the conditional independence assumption to remove omitted variable bias, e.g. if the omitted factor is correlated with the gender dummy so that it differs between men and women
- increase statistical precision: reduces the error variance without biasing the coefficient
Against:
- if the added X is itself endogenous then we have the same problem - compositional bias
Simultaneity bias
- when two variables are jointly determined and then used in a regression
- e.g. p and q of any good
Example of compositional bias
Adding occupation when studying wage differences between men and women: if women don't have the same occupational opportunities, then there is still discrimination (occupation is itself an outcome of it)
- if the added control Z depends on X, it is endogenous
List potential outcomes
When are there no AT / NT?
- Compliers: Di(1) = 1, Di(0) = 0
- AT: Di(1) = 1, Di(0) = 1
- NT: Di(1) = 0, Di(0) = 0
- Defiers: Di(1) = 0, Di(0) = 1
No always-takers when eligibility for treatment can be controlled, i.e. only those assigned can receive it
Definition of a break and its problems
A change in probability distribution of the data
- the coefficients in the model are not stable over the full sample / time period
Problems:
- destroys external validity
- a leading cause of forecast failure
- causes in-sample estimates of the coefficients to be biased
- OLS estimates an "average value" which does not correspond to the true causal effect in any period
- can be difficult to distinguish multiple breaks from a stochastic trend -> a break can be mistaken for a random walk (graph)
If perfect compliance
- no always-takers or never-takers: Z = D
- LATE denominator: E(D|Z=1) − E(D|Z=0) = E(D|D=1) − E(D|D=0) = 1 − 0 = 1
- LATE = ATE
- LATE = ATT requires the LATE assumptions plus P(D=1|Z=0) = 0, i.e. no always-takers
Why adjusted r squared
Occam's Razor: the best model is the one that fits best with the fewest regressors
- with the ordinary R², adding a regressor increases R² even if it has negligible explanatory power
- the factor (n−1)/(n−k−1) > 1 penalises adding another regressor
Talk about augmented DF test
- use enough lags so that the residuals are serially uncorrelated
- more lags = smaller sample = lose degrees of freedom = inc standard error
- don’t use too many lags
Adding a trend to DF test
- if there is a trend and it is not included: biased in favour of finding a unit root, i.e. a high type 2 error
- larger critical values, as the trend term makes the distribution of the t-statistic more dispersed / skewed
Determining granger causality when have non-stationary
Use an ECM if cointegrated
- causality is subdivided into short-run and long-run causality
- difference: differencing alone, when the series are cointegrated, does not capture the long-run relationship
Marginal distribution
Sum the joint distributions
- P(rain) = P(r and l) + P(r and s)
Estimate vs estimator
Estimator: function of a sample of data to be drawn randomly from a population - it is a random variable
Estimate is the numerical value of the estimator when a specific sample is drawn; it is a nonrandom number
What is correlation
Measure of strength of the linear association between X and Y. Lies between -1 and 1
t stat
How far β̂ is from the null value, relative to se(β̂)
QoB
Angrist and Krueger (1991)
- Exclusion concern: does school starting age matter for earnings by itself?
- Find that men born later in the year tend to get more schooling and have higher earnings
- if anything, the estimate is biased downwards
- find QoB doesn't directly impact earnings (supports exclusion)
- compliers: the group for whom the instrument changes their schooling decision
FX
PPP / UIP / LOOP suggest the exchange rate cannot have a unit root
- it cannot deviate permanently from the ratio of prices in the two countries
With spurious, is differencing valid?
Yes, if we assume ut in the differenced regression is white noise, i.e. not correlated over time
FWL theorem
β̂k from the multiple regression equals the coefficient from a regression of Y on the variation in Xk that cannot be explained by a linear combination of the other regressors
- isolates the effect of Xk
- the coefficient reflects the individual predictive contribution of each regressor alone (numerical check below)
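A numerical check of FWL on simulated data (my own example): the coefficient on X1 from the full regression equals the coefficient from regressing Y on the residual of X1 after partialling out X2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

x1_tilde = sm.OLS(x1, sm.add_constant(x2)).fit().resid    # partial out x2
partial = sm.OLS(y, sm.add_constant(x1_tilde)).fit()

print(full.params[1], partial.params[1])                  # numerically equal
```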