QE Flashcards

1
Q

Mean independent:

Independent:

A

Mean independent: E(u|X) = E(u), and E(u) = 0

Independent: E(g(u)|X) = E(g(u)) for any function g

E.g. error variance spreading as X increases (heteroskedasticity) is consistent with mean independence but rules out full independence
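A minimal simulation sketch of the distinction (my own illustrative setup, not from the cards): u = X·z has E(u|X) = 0 but Var(u|X) = X², so it is mean independent of X yet not fully independent.

```python
import numpy as np

# Illustrative DGP: u is mean independent of X (E(u|X) = 0) but NOT fully
# independent, because Var(u|X) = X^2 spreads as X increases.
rng = np.random.default_rng(0)
n = 100_000
X = rng.uniform(1.0, 2.0, n)
u = X * rng.standard_normal(n)   # E(u|X) = 0, Var(u|X) = X^2

print(np.mean(u))                      # near 0: mean independence holds
print(np.corrcoef(X**2, u**2)[0, 1])   # clearly positive: full independence fails
```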

2
Q

Assumptions for consistency vs unbiasedness

A

Consistency: orthogonality restriction (OR), Cov(e, X) = 0
Unbiasedness: mean independence
- E(e|X) = E(e) = 0

3
Q

Regressions in both directions implications

A

Running the regression in both directions:

  • Both coefficients have descriptive interpretations
  • Only one coefficient can have a causal interpretation
  • in general β̂ (from Y on X) does not equal 1/γ̂ (from X on Y)
  • Inverting a LRM (or CEF) does not yield a LRM (or CEF)
  • to argue causality, need to argue the orthogonality restriction (OR) is plausible
4
Q

Define descriptive interpretation

A

“on average a unit increase in X1 is associated with a b* increase in Y, holding X2..Xk constant”

5
Q

Standard error of regression

A

s = √(SSR / (n − k − 1))
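A quick sketch of the formula on simulated data (all numbers illustrative): with k = 1 regressor and true error sd of 1, the SER should land near 1.

```python
import numpy as np

# Compute the standard error of the regression, s = sqrt(SSR/(n-k-1)),
# for a simulated bivariate regression (k = 1).
rng = np.random.default_rng(1)
n, k = 200, 1
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # true error sd = 1

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta_hat) ** 2)
s = np.sqrt(ssr / (n - k - 1))
print(s)   # should be close to the true error sd of 1
```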

6
Q

Least squares assumptions

A

1) error term has conditional mean 0: E(u|X) = 0
2) (Xi, Yi) are iid draws from their joint distribution
3) finite fourth moments (large outliers are unlikely)
4) no perfect multicollinearity

7
Q

Talk about consistency

A

Means the sample β̂ is very close to the true β with high probability when n is large

  • consequence of the LLN
  • distribution of β̂ collapses onto β
8
Q

Talk about asymptotic normality

A
  • Consequence of CLT

- √n(β̂ − β) ⇒ N(0, σ²)

9
Q

Talk about asymptotic variance of beta / se(B hat)

A

ω² = σu² / Var(X) (can write Var(X) as a sum)

  • OLS is more precise the larger Var(X) and the smaller σu² (better fit; can improve it by adding more regressors)
  • carries over to IV: a good instrument explains a high share of Var(X)
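A simulation sketch of the variance formula (my own illustrative numbers): across repeated samples, the sd of the OLS slope should match σu / (√n · sd(X)).

```python
import numpy as np

# Check that the sampling sd of the OLS slope is about
# sigma_u / (sqrt(n) * sd(X)), per the asymptotic variance formula.
rng = np.random.default_rng(2)
n, reps, sigma_u, sd_x = 500, 2000, 1.0, 2.0
slopes = np.empty(reps)
for r in range(reps):
    x = sd_x * rng.standard_normal(n)
    y = 2.0 * x + sigma_u * rng.standard_normal(n)
    slopes[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

theory = sigma_u / (np.sqrt(n) * sd_x)   # = 1/(sqrt(500)*2), about 0.022
print(np.std(slopes), theory)
```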
10
Q

Talk about imperfect multicollinearity

A
  • high correlation of one X with the other regressors, so the variance of X left after partialling out the others is very small
  • β̂ is measured imprecisely (large standard errors)
11
Q

Hypothesis testing steps

A

1) state null and alternative
2) get t stat
3) Under the null t -> N(0,1)
4) Decision rule
5) Outcome
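The five steps can be sketched in a few lines; the numbers (β̂ = 2.5, se = 1.0) are made up for illustration, and the normal CDF is built from the stdlib error function.

```python
import math

# Two-sided test of H0: beta = 0 with illustrative beta_hat = 2.5, se = 1.0.
beta_hat, se, beta_0 = 2.5, 1.0, 0.0

t = (beta_hat - beta_0) / se                        # step 2: t statistic
phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))   # standard normal CDF at |t|
p_value = 2 * (1 - phi)                             # step 3: t ~ N(0,1) under H0
reject = abs(t) > 1.96                              # step 4: decision rule at 5%
print(t, round(p_value, 4), reject)                 # step 5: outcome
```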

12
Q

Talk about one-sided tests

A
  • only makes sense if there’s an a priori reason (e.g. from economic theory) to exclude the other direction from consideration
  • has more power to detect departures from the null in the positive direction, but no power to detect departures in the negative direction
13
Q

P-value definition and usefulness

A
  • the probability, under the null, of obtaining a value of t at least as adverse to the null as the one computed
  • useful for summarising the weight of evidence against the null
14
Q

Confidence interval interpretation

A

The collection of null-hypothesised values for β that would be accepted (by a 2-sided t test) at significance level α
- e.g. a 99% confidence interval is the set of nulls I couldn’t reject with a 1% test

15
Q

Polynomial in regression vs linear

A
  • polynomial: can look at the marginal effect of X on Y by differentiating
  • linear: averages the different marginal effects into a single slope
  • coefficient on X1X2 is the effect of a one-unit increase in X1 and X2 together, above and beyond a unit increase in each of them alone
16
Q

Causes of endogeneity

A

1) omitted variable bias
2) measurement error
3) simultaneity

17
Q

OVB formula and usefulness

A

β’ = β + γ Cov(X1, X2) / Var(X1)

  • useful to assess the likely direction of the bias
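A simulation sketch of the formula (illustrative DGP of my own): omitting X2 shifts the short-regression slope by γ·Cov(X1, X2)/Var(X1).

```python
import numpy as np

# OVB check: short-regression slope converges to
# beta + gamma * Cov(X1, X2) / Var(X1).
rng = np.random.default_rng(3)
n = 200_000
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)      # Cov(x1, x2)/Var(x1) = 0.5
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)   # beta = 2, gamma = 3

short_slope = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
print(short_slope)   # about 2 + 3 * 0.5 = 3.5
```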
18
Q

Impact of measurement in error in Y

A
  • inferences on β still valid, just estimate of β is less precise
19
Q

Example for IV for demand elasticity of cigarettes (why a good one)

A

General sales tax:

  • cov(t,p) not equal zero
  • cov(t,u) = 0 (assume not state specific)
20
Q

Solutions to bad controls

A

1) Find an instrument for education and estimate model via 2SLS
2) omit from regression
- Interpretation: ‘total effect’ of labour market discrimination, inclusive of its effects on educational attainment

21
Q

Why 2SLS less efficient than OLS

A
  • decompose X = X* + v, where X* is the part of X explained by the instrument: Var(X) = Var(X*) + Var(v) > Var(X*)

- 2SLS only uses the variation in X*, so the estimate is less precise
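A simulation sketch of this point (illustrative DGP where OLS is also valid, so the two estimators can be compared fairly): the IV estimates are noticeably more spread out than the OLS ones.

```python
import numpy as np

# Even when OLS is valid, IV is noisier, because it only uses the
# variation in X explained by the instrument.
rng = np.random.default_rng(4)
n, reps = 500, 300
ols, iv = np.empty(reps), np.empty(reps)
for r in range(reps):
    z = rng.standard_normal(n)
    x = 0.7 * z + rng.standard_normal(n)     # instrument explains part of X
    y = 2.0 * x + rng.standard_normal(n)     # everything exogenous here
    ols[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    iv[r] = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(np.std(ols), np.std(iv))   # IV spread noticeably larger
```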

22
Q

Tension in choosing instruments

A
  • want Z highly correlated with X: increases Var(X*)

- but also require instruments to be exogenous, Cov(Z, u) = 0; variables strongly related to X are often also correlated with u

23
Q

Test for relevance

A

First-stage F statistic > c ≈ 10 (rule of thumb)

24
Q

Test for exogeneity

- Descriptive exogeneity

A

Can’t test with only one Z (just identified)
- Test with more than one Z (overidentification):
H0: Cov(Z1, u) = … = Cov(Zm, u) = 0
F test: F → F(m−1, ∞) under H0

Exogeneity fails if Z is correlated with other unobserved determinants of Y

25
Q

Stationary definition and meaning

- strict and weak

A

Def: stationary if its probability distribution does not change over time, i.e. the joint distribution of (Ys+1, …, Ys+T) does not depend on s
- Says: the past is like the present and the future, at least in a probabilistic sense

Weak: 1st and 2nd moments exist and are time invariant:
1) E(Yt) = μ for all t
2) Var(Yt) = σ² < ∞ for all t
3) Cov(Yt, Yt−j) depends on j but not on t

Meaning: models can be used outside the range of data with which they were estimated

AR(1) case: stationary if |β1| < 1, Var(ut) = σ², and Y0 is a random variable with E(Y0) = β0/(1−β1) and Var(Y0) = σ²/(1−β1²)
26
Q

What I(0) means

A
  • process is stationary OR trend-stationary
27
Q

Issue with ADF test

A

Has notoriously little power in distinguishing between unit roots and very persistent but stationary alternatives, e.g. β1 = 0.9

28
Q

Chow test

A

Testing for a break at a known period T:

- create a dummy Dt = 1 for t > T (0 otherwise), interact it with the regressors, and F-test the joint significance of the break terms

29
Q

Test break without known T

A

QLR ratio:

  • treat the break date τ as an unknown parameter and estimate it alongside the regression coefficients
  • F(τ): Chow F stat for a break at τ
  • candidate dates run from τ0 (15th percentile of the sample) to τ1 (85th percentile)
  • QLR stat is the largest Chow F statistic across all candidate break dates
  • can’t look too close to the start and end of the sample
  • critical value much larger than the standard F one
  • QLR doesn’t tell us exactly how the equation changes
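A minimal sketch for the simplest case, a break in the mean of Yt (my own illustrative data, with a true break at t = 100 and 15% trimming at each end):

```python
import numpy as np

# QLR: Chow F at every candidate date in the central 70%, then take the max.
rng = np.random.default_rng(5)
n = 200
y = rng.standard_normal(n)
y[100:] += 2.0                         # mean shifts by 2 after t = 100

def chow_f(y, tau):
    # restricted: common mean; unrestricted: separate means before/after tau
    ssr_r = np.sum((y - y.mean()) ** 2)
    ssr_u = np.sum((y[:tau] - y[:tau].mean()) ** 2) + \
            np.sum((y[tau:] - y[tau:].mean()) ** 2)
    q, k = 1, 2                        # 1 restriction, 2 params unrestricted
    return ((ssr_r - ssr_u) / q) / (ssr_u / (len(y) - k))

taus = list(range(int(0.15 * n), int(0.85 * n)))   # trim 15% at each end
f_stats = [chow_f(y, t) for t in taus]
qlr = max(f_stats)
tau_hat = taus[int(np.argmax(f_stats))]
print(qlr, tau_hat)
```

Note that, as the card says, the resulting QLR statistic must be compared with QLR critical values, not standard F ones.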
30
Q

Requirements for efficiency

A

Smaller variance AND unbiased (need to show both)

31
Q

Spurious regression definition

A

Both series are I(1) (stochastic trends, independent of each other) and not cointegrated
- Stochastic trends: Yt = Yt−1 + ut, Xt = Xt−1 + et
- Pr(|t| > 1.96) is high, so a ‘significant’ result is likely and the series appear related
- misleading inference even in large samples

32
Q

Examples of spurious regression

A

FX rates

Stock market and consumption

33
Q

Cointegration steps

A

Engle-Granger ADF test:

1) Check via ADF that both series are I(1)
2) Estimate θ via a regression of Y on X
- OLS regression of Y on X yields a consistent estimator for θ
3) Store the residuals and test them via ADF, where H0: random walk, H1: stationary
- if we reject H0 then by definition the series are cointegrated
- use different critical values to account for sampling uncertainty in estimating θ
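A stripped-down sketch of the steps on simulated cointegrated data (illustrative DGP: X a random walk, Y = 2X + stationary noise); the final DF regression coefficient on the lagged residual comes out strongly negative, consistent with stationarity. In practice one would use the Engle-Granger critical values, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = np.cumsum(rng.standard_normal(n))        # X ~ I(1) (random walk)
y = 2.0 * x + rng.standard_normal(n)         # cointegrated, theta = 2

# step 2: OLS of Y on X (consistent for theta)
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
e = y - theta * x                            # step 3: residuals

# DF regression on residuals: delta_e_t = delta * e_{t-1} + v_t
de, lag = np.diff(e), e[:-1]
delta = np.sum(de * lag) / np.sum(lag ** 2)
print(theta, delta)   # theta near 2, delta well below 0
```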

34
Q

Define cointegration

A

Xt, Yt ~ I(1)

Yt - a - b Xt ~ I(0)

35
Q

Problems when β = 1

A

1) Even in large samples the OLS estimator is biased towards zero: E(β̂1) ≈ 1 − 5.3/T
2) Distribution of the t-stat and β̂ is not normal even in large samples (use ADF critical values)
3) Spurious regression

36
Q

Difference between residual and forecast error

A

Residual: in sample

Forecast error: out of sample

37
Q

Difference between forecast and confidence interval

A

A forecast interval is for Yt+1, which is random; a confidence interval is for a non-random coefficient

38
Q

Definition of causal effect

A

Coefficients measure the causal effect of a ceteris paribus, exogenous change in the explanatory variable on the dependent variable Y

39
Q

Definition of conditional independence and what it means

A

D ⊥ {Y(1), Y(0)} | X → E(Y(0)|D, X) = E(Y(0)|X)

Means: E(Y(0)|D = 1, X = x) = E(Y(0)|D = 0, X = x)

Says that treatment and potential outcomes are related, but only through X
- conditioning on X restores independence

Use: control for all the non-random variation in assignment such that the variation left over is plausibly independent of potential outcomes - cleans out selection bias

40
Q

Meaning of LLN and implications

A

X̄ is near μ with high probability when n is large

lim n→∞ Var(X̄) = 0

Implications: establishes the conditions under which an estimator is consistent

41
Q

Type 1 error definition and example when worried about it

A

P(reject H0| H0 true)

e.g. when the treatment is costly to administer (a false rejection means adopting it needlessly)

42
Q

Type 2 error definition, definition of power and example when worried about it

A

P(accept H0| H1 true) = β

  • Power = 1 - β = P(reject H0| H1 true)
  • Power: ability to detect a violation of the null

e.g. when the treatment could save someone’s life (failing to detect a true effect is costly)

43
Q

CLT requirements, implications

A

Need iid draws and E(X²) < ∞

  • √n(X̄ − μ) → N(0, σ²) as n → ∞
  • distributions of estimators like β̂ are approx normal when n is large
  • “from the CLT, X̄ is approx N(μ, σ²/n)”
44
Q

Binomial E, Var, se

A

E = np
Var = np(1 − p)
se(p̂) = √(p(1 − p)/n)
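A quick numeric check of the three formulas, with illustrative n = 100 and p = 0.3:

```python
import math

n, p = 100, 0.3
mean = n * p                           # E = np
var = n * p * (1 - p)                  # Var = np(1-p)
se_phat = math.sqrt(p * (1 - p) / n)   # se of the sample proportion
print(mean, var, round(se_phat, 4))    # 30.0 21.0 0.0458
```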

45
Q

Jensen’s inequality if g(x) concave

A

E(g(x)) < g(E(x))
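A one-line numeric check with a concave g (here g = log, on an illustrative positive random variable): E(g(X)) falls strictly below g(E(X)).

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.5, 4.0, 100_000)   # any positive, non-degenerate X works

lhs = np.mean(np.log(x))   # E(g(X))
rhs = np.log(np.mean(x))   # g(E(X))
print(lhs, rhs)            # lhs strictly below rhs
```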

46
Q

Cov / Corr formula

A

Corr(x, y) = Cov(x, y) / √(Var(x)Var(y)) = ρ

47
Q

F test

A

Takes into account the joint significance of the estimators

  • (whether they significantly reduce the unexplained variation in the data compared to not using them)
  • Use information criteria (AIC)
48
Q

What it means if D independent of potential outcomes

A

1) probability of assignment to treatment doesn’t vary with potential outcomes
2) distribution of potential outcomes doesn’t vary with treatment status

  • leads to mean independence of the potential outcomes wrt treatment
49
Q

Internal validity definition

  • key
  • examples
A

Inferences about causal effects are valid for the population being studied

  • Key: plausible that the error satisfies the orthogonality restriction (OR)
  • Contamination (control group gets access to the treatment)
  • Non-compliance (treated → untreated)
  • Placebo effects (outcomes change because treatment is perceived)
  • Hawthorne effects (behaviour changes because subjects know they are being studied)
  • individualistic treatment response: no interaction effects between subjects; outcome doesn’t depend on whether others get treatment
50
Q

External validity definition

  • key
  • examples
A

whether a study’s findings can be generalised to other populations and settings

  • Key: the population differs in a way that alters the causal effect of interest and is not accounted for by the model (not captured in the X’s) - run lots of RCTs and see which factors affect the outcome
  • individualistic treatment response (no spillovers)
  • long vs short-run outcomes (surrogate outcomes, e.g. class size on education vs long-term employment)
  • supply-side administering
  • ‘income’ elasticity: assumes people perceive an income transfer and other sources of income the same
51
Q

ITT definition

A

The average causal effect of a program or policy that is introduced to a group of individuals, without knowing whether these individuals actually participate

52
Q

LATE definition

A

The average causal effect of treatment delivery on the outcome of interest, among compliers

53
Q

LATE assumptions and their use

A

1) Independence (as good as randomly assigned)
- can measure the causal effect of Z (assignment) on Y (outcome) and on D (delivery)
2) Relevance
- can compute the ratio, as the denominator is not 0
3) Exclusion (Z only impacts Y via D)
- Yi(d, 0) = Yi(d, 1) for d = 0, 1
- ATs and NTs don’t change behaviour when the instrument gets switched on/off, so they drop out
4) Monotonicity (instrument moves take-up in one direction only)
- excludes defiers

54
Q

Relationship between ATT and LATE

A

ATT = γ ATE_AT + (1 − γ) LATE, where γ is the share of always-takers among the treated

55
Q

LATE and ITT relationship

A

LATE > ITT (in magnitude), as with ITT the treatment effect gets diluted among non-treated individuals

56
Q

Allowing for heterogenous treatment effects

A

Y = α + βX + δD + γ(D×X) + u
H0: γ = 0

  • OLS is a consistent estimator of the average causal effect if treatment is randomly assigned
  • IV: a weighted average of the individual treatment effects; those most influenced by the instrument get the greatest weight
57
Q

Problem with return to schooling

A

Assignment to schooling is not random. Need to use CIA to identify causal effect - add regressors which account for the non-random assignment of schooling

58
Q

Issue with twins, internal and external

A

Internal:

  • measurement error is exacerbated when taking differences
  • fewer observations: reduces variation in X
    - impact on coefficient standard error
  • differences in ability develop: epigenetics
  • parent behaviour / investment to twins differ across families

External:

  • unsure if can extrapolate to whole population: majority due to IVF - selection bias - not random
  • worse health as have to compete for resources
59
Q

AR(p) model

A

Autoregressive model

  • Use an F test to test the hypothesis that Yt−2, …, Yt−p do not further help forecast Yt beyond Yt−1
  • p: the highest number of lags that is relevant
60
Q

What is the Autoregressive distributed lag model and test for it

A
  • Includes lags of X as well as Y
  • Granger-Causality test (F-stat) - test the joint hypothesis that none of the X’s is a useful predictor above and beyond lagged values of Y. Causality here refers to the marginal predictive content
  • Test: E(Yt|Yt-i, Xt-i) = E(Yt|Yt-i)
61
Q

Definition of a trend

A

A persistent long-term movement of a variable over time

62
Q

Two trends

A
Stochastic:
- Yt = Yt−1 + ut
- ΔYt = ut
- called I(1)
Deterministic: a nonrandom function of time, e.g. a × t
63
Q

Random walk with a drift equation


A

Has both deterministic and stochastic trends:

  • Yt = a1 + Yt−1 + ut
  • Assuming Y0 = 0, then Yt = a1 × t + Σ us = DT + ST
  • E(Yt) = a1·t, Var(Yt) = σ²t, Cov(Yt, Yt−s) = σ²(t − s)
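A simulation sketch of the moment formulas (illustrative a1 = 0.5, σ = 1): across many simulated paths, the mean at t = 100 is about a1·t = 50 and the variance about σ²t = 100.

```python
import numpy as np

# Random walk with drift: Yt = a1*t + sum of shocks (Y0 = 0).
rng = np.random.default_rng(8)
reps, t, a1 = 4000, 100, 0.5
shocks = rng.standard_normal((reps, t))
y_t = a1 * t + np.cumsum(shocks, axis=1)[:, -1]   # value of each path at t

print(np.mean(y_t), np.var(y_t))   # about 50 and about 100
```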
64
Q

Removing trends

A

Deterministic:
- regress Yt on a function of time and take the residuals, e.g. linear detrending: Ỹt = Yt − â0 − â1·t
Stochastic:
- differencing

65
Q

Related I(2) I(1) I(0) using inflation

Relate AR(1) AR(2) using inflation as example.

Why use Δinflation

A

log CPI is I(2), inflation is I(1), Δinflation is I(0)

AR(2) for inflation corresponds to AR(1) for Δinflation
- the first difference is much less serially correlated, so use it to stay in a stationary framework we understand. If the series is strongly serially correlated, the AR coefficient is biased towards zero

66
Q

Test for a unit root equation and H0 H1

A

Dickey-Fuller test

ΔYt = β0 + δYt−1 + ut

H0: δ = 0, stochastic trend (unit root)
H1: δ < 0, stationary

Don’t use normal critical values

67
Q

Distributed lag model aim and assumptions

A

To measure dynamic causal effect
- find cumulative dynamic multipliers

1) X is exogenous: E(ut | Xt, …, Xt−s) = 0
2) Y and X have stationary distributions and become independent as the gap j gets large
3) X and Y have nonzero, finite 8th moments
4) no perfect multicollinearity

68
Q

Things to remember with distributed lag model for causal effects

A
  • OLS yields consistent estimators for β
  • sampling distribution of β is normal
  • formula for variance is not the usual as ut may be correlated
  • need to use HAC standard errors
69
Q

The problem to which HAC is the solution

A

ut being serially correlated

70
Q

What mean by “significant at the 1% level”, “t-value”

A
Significant
- implicitly assumes a two-sided test
- reject with 99% confidence
- lower α = larger critical value c = less powerful test (but smaller type 1 error)
t-value
- testing the hypothesis that β = 0
71
Q

Why use IV

A

Instrument is used to isolate the movements in X that are uncorrelated with the error term (first stage), thereby allowing consistent estimation (2nd stage)

72
Q

Issues with IV when not everyone is affected by the instruments

A

Discuss compliers, AT, NT, defiers

- LATE: instrument is binary (compliers, noncompliers)

73
Q

What is selection bias

- dealing with it

A

The bias in an estimator of a regression coefficient that arises when a selection process influences the availability of data and that process is related to the dependent variable.

  • results in cov(u, x) not 0
  • violates independence assumption
  • not like-for-like
  • can use conditional independence
74
Q

What is RMSFE? Assumption for it?

A

A measure of the spread of the forecast error distribution - magnitude of a typical forecast mistake.

  • impose a normal distribution rather than take it for granted - there is no CLT for the forecast interval as the series could be non-stationary
  • Yt+1 is a random variable, not a parameter
75
Q

Talk about AIC

A
  • used for model selection as provides ranking
  • trades off goodness of fit and simplicity
  • compare different models for the same data set
  • the one with lowest value for AIC is best quality
  • relative measure of model fit
  • penalises overfitting
76
Q

Approximating CEF with LRM

A
  • limiting/inaccurate if the CEF is curved
    • doesn’t have to be limited to a single regressor; can include polynomials for a better fit
  • no certainty the CEF is continuous: an LRM can never match a discontinuous CEF exactly
  • ‘best’ in that it minimises the squared error
  • in general the variance of the prediction error of the CEF is lower than that of the LRM
77
Q

“As good as randomly assigned”

A
  • Cov(X, u) = 0
78
Q

Explain 2SLS

A

First stage: regress X on the instrument Z to create the ‘generated instrument’ or ‘adjusted treatment variable’ (the fitted values X̂). Second stage: regress Y on X̂.

79
Q

Adding another regressor

A

For:

  • exploit the conditional independence assumption to remove omitted variable bias, e.g. if it is correlated with a gender dummy so its mean differs between men and women
  • increases statistical precision, since it reduces the error variance without biasing the coefficient

Against:
- if the added X is endogenous then we have the same problem - compositional bias

80
Q

Simultaneity bias

A
  • when two variables are jointly determined and then used in a regression
  • e.g. p and q of any good
81
Q

Example of compositional bias

A

Adding occupation to a regression of wage differences between men and women: if women don’t have the same opportunities, occupation itself reflects the discrimination
- if the added control Z depends on X then it is endogenous

82
Q

List potential outcomes

When no AT NT?

A

Compliers Di(1) = 1 Di(0) = 0
AT: Di(1) = 1 Di(0) = 1
NT: Di(1) = 0 Di(0) = 0
Defiers: Di(1) = 0 Di(0) = 1

  • eligibility for treatment can be controlled
  • only those assigned can receive it
83
Q

Definition of a break and its problems

A

A change in the probability distribution of the data
- the coefficients in the model are not stable over the full sample/time

Problems:

  • destroys external validity
  • a major cause of forecast failure
  • causes in-sample estimates of coefficients to be biased
  • OLS estimates an “average value” which does not correspond to the true causal effect in any period
  • can be difficult to distinguish multiple breaks from stochastic trends → a break can be mistaken for a RW (check the graph)
84
Q

If perfect compliance

- No always takers

A

Z = D, so the LATE denominator:
- E(D|Z=1) − E(D|Z=0) = E(D|D=1) − E(D|D=0) = 1 − 0 = 1
- so LATE = ATE

  • LATE = ATT requires the LATE assumptions plus P(D=1|Z=0) = 0, i.e. no ATs
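A simulation sketch of the Wald ratio under perfect compliance (illustrative DGP with a true effect of 2): the denominator is exactly 1 and the ratio recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
z = rng.integers(0, 2, n)          # random assignment
d = z                              # perfect compliance: D = Z
y = 2.0 * d + rng.standard_normal(n)

num = y[z == 1].mean() - y[z == 0].mean()   # effect of Z on Y
den = d[z == 1].mean() - d[z == 0].mean()   # effect of Z on D
late = num / den
print(den, late)   # den = 1.0, late near 2
```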
85
Q

Why adjusted r squared

A

Occam’s Razor: the best model is the one which fits best with the fewest regressors

  • with ordinary R², adding a regressor increases R² even if it has negligible explanatory power
  • (n−1)/(n−k−1) > 1 → penalises adding another regressor
86
Q

Talk about augmented DF test

A
  • use enough lags so that the residuals are serially uncorrelated
  • more lags = smaller sample = lose degrees of freedom = inc standard error
  • don’t use too many lags
87
Q

Adding a trend to DF test

A
  • if there is a trend and you don’t include it: biased in favour of finding a unit root; high type 2 error
  • critical values increase, as including the trend makes the distribution of the t-stat more dispersed/skewed
88
Q

Determining granger causality when have non-stationary

A

Use an ECM if the series are cointegrated
- subdivides into SR and LR causality
Differencing alone: if the series are cointegrated, it doesn’t capture the LR causality

89
Q

Marginal distribution

A

Sum the joint distributions

- P(rain) = P(r and l) + P(r and s)

90
Q

Estimate vs estimator

A

Estimator: a function of a sample of data to be drawn randomly from a population - it is a random variable

Estimate: the numerical value of the estimator when a specific sample is drawn - it is a nonrandom number

91
Q

What is correlation

A

Measure of strength of the linear association between X and Y. Lies between -1 and 1

92
Q

t stat

A

How far beta hat is from null, relative to se(b)

93
Q

QoB

A

Angrist and Krueger (1991)

  • Exclusion: does school starting age matter for earnings by itself?
  • find that those induced into more schooling also have higher earnings
  • if anything, the estimate is biased downwards
  • find quarter of birth doesn’t directly impact earnings (exclusion)
  • compliers: the group for which the instrument changes their schooling decision
94
Q

FX

A

PPP / UIP / LOOP suggest FX rates cannot have a unit root

- the exchange rate cannot deviate permanently from the ratio of prices in the two countries

95
Q

With spurious, is differencing valid?

A

Yes, if we assume ut in the difference regression is white noise → not correlated over time

96
Q

FWL theorem

A

The coefficient on Xk can be obtained by regressing Y on the variation in Xk that cannot be explained by a linear combination of the other regressors

  • isolates the effect of Xk
  • the coefficient reflects the individual predictive contribution of each regressor alone