QE Flashcards
Mean independent vs independent
- Mean independent: E(u|X) = E(u), and E(u) = 0
- Independent: E(g(u)|X) = E(g(u)) for any function g
- E.g. the variance of u spreading out as X increases is compatible with mean independence but not with full independence
Assumptions for consistency vs unbiasedness
Consistency: orthogonality, Cov(e, X) = 0
Unbiasedness: mean independence
- E(e|X) = E(e) = 0
Regressions in both directions implications
Run regression both directions
- Both coefficients have descriptive interpretations
- Only one coefficient can have a causal interpretation
- in general β (from Y on X) does not equal 1/γ (from X on Y)
- Inverting a LRM (or CEF) does not yield a LRM (or CEF)
- to persuade a causal interpretation, need to argue that orthogonality (exogeneity of the error) is plausible (see the simulation below)
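Not from the cards: a quick simulated check of the "β ≠ 1/γ" point above. The two slopes multiply to the squared correlation, not to 1; all numbers here are made up.

```python
# Toy illustration: slope of Y on X is not the reciprocal of the slope of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=3.0, size=n)      # true slope of Y on X is 2

cov_xy = np.cov(x, y)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)                # regression of Y on X
b_xy = cov_xy / np.var(y, ddof=1)                # regression of X on Y

print(b_yx, 1 / b_xy)                            # not equal in general
print(b_yx * b_xy, np.corrcoef(x, y)[0, 1] ** 2) # product equals rho^2
```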
Define descriptive interpretation
“on average, a unit increase in X1 is associated with a β̂1-unit change in Y, holding X2, …, Xk constant”
Standard error of regression
s = √(SSR / (n − k − 1))
Least squares assumptions
1) error term is conditional mean 0
2) (X, Y) are iid draws from their joint distribution
3) finite, nonzero fourth moments - large outliers are unlikely
4) no perfect multicollinearity
Talk about consistency
Means β̂ is very close to the true β with high probability when n is large
- consequence of the LLN
- the distribution of β̂ collapses to a spike at β
Talk about asymptotic normality
- Consequence of CLT
- √n(β̂ − β) → N(0, ω²) in distribution, so β̂ is approximately normal in large samples
Talk about asymptotic variance of beta / se(B hat)
ω² = σu² / Var(X); in sum form, Var(β̂) = σu² / Σ(Xi − X̄)²
- OLS is more precise the larger Var(X) and the smaller σu² (better fit - can be improved by adding more regressors)
- same logic for IV: a good instrument explains a large share of Var(X), so the fitted X* has high variance and the estimate is more precise
Talk about imperfect multicollinearity
- high correlation of X with the other regressors, so the variance of X̃ (the part of X left after partialling out the other regressors) is very small
- β̂ is measured imprecisely (large standard error)
Hypothesis testing steps
1) state null and alternative
2) get t stat
3) Under the null t -> N(0,1)
4) Decision rule
5) Outcome
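A minimal sketch of these five steps on simulated data (not from the cards; the variable names and coefficients are invented), using statsmodels for the regression:

```python
# Steps: (1) H0: beta1 = 0 vs two-sided H1, (2) t stat, (3) N(0,1) null
# distribution, (4) 5% decision rule, (5) outcome.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")  # robust SEs

t = (res.params[1] - 0.0) / res.bse[1]       # step 2
p = 2 * (1 - stats.norm.cdf(abs(t)))         # step 3: t ~ N(0,1) under H0
reject = abs(t) > 1.96                       # step 4: two-sided 5% rule
print(t, p, reject)                          # step 5
```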
Talk about one-sided tests
- only makes sense if there is an a priori reason (e.g. from economic theory) to exclude the other direction from consideration
- has more power to detect departures from the null in the positive direction, but no power to detect departures in the negative direction
P-value definition and usefulness
- the probability, under the null, of obtaining a value of t at least as adverse to the null as the one actually computed
- summarising the weight of evidence against the null
Confidence interval interpretation
The collection of null-hypothesised values for β that would be accepted (by a two-sided t test) at significance level α
- i.e. the set of nulls I could not reject at level α (a 99% CI collects the nulls not rejected at the 1% level)
Polynomial in regression vs linear
- polynomial: can look at the marginal effect of X on Y by differentiating; the effect varies with X
- linear: averages out these different marginal effects
- the coefficient on X1·X2 is the effect of a one-unit increase in X1 and X2 together, above and beyond a unit increase in each of them alone
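A short illustrative fit (simulated data, invented coefficients) of a quadratic-plus-interaction specification; the marginal effect of X1 comes from differentiating the fitted equation, so it varies with X1 and X2:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x1**2 + 0.8 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x1**2, x2, x1 * x2]))
b = sm.OLS(y, X).fit().params            # [const, x1, x1^2, x2, x1*x2]

# dY/dX1 = b1 + 2*b2*X1 + b4*X2, evaluated at chosen values of X1, X2
def marginal_effect_x1(x1_val, x2_val):
    return b[1] + 2 * b[2] * x1_val + b[4] * x2_val

print(marginal_effect_x1(0.0, 0.0), marginal_effect_x1(1.0, 1.0))
```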
Causes of endogeneity
1) omitted variable bias
2) measurement error
3) simultaneity
OVB formula and usefulness
β' = β + γ·Cov(X1, X2) / Var(X1)
- assess likely direction of the bias
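A toy check of the OVB formula on the card above (not in the original cards; all numbers are made up): the "short" regression slope matches β + γ·Cov(X1, X2)/Var(X1).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                    # X2 correlated with X1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)    # beta = 2, gamma = 1.5

short_slope = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
formula = 2.0 + 1.5 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(short_slope, formula)                           # both close to 2 + 1.5*0.6 = 2.9
```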
Impact of measurement error in Y
- inferences on β still valid, just estimate of β is less precise
Example for IV for demand elasticity of cigarettes (why a good one)
General sales tax:
- Cov(t, p) ≠ 0 (relevance)
- Cov(t, u) = 0 (exogeneity; assume the tax is not state specific)
Solutions to bad controls
1) Find an instrument for education and estimate model via 2SLS
2) omit from regression
- Interpretation: the ‘total effect’ of labour market discrimination, inclusive of its effect on educational attainment
Why 2SLS less efficient than OLS
- Var(X) = Var(X* + v) = Var(X*) + Var(v) > Var(X*)
- 2SLS only uses X*, the part of X explained by the instrument, so it is less precise
Tension in choosing instruments
- want instruments highly correlated with X (relevance): increases Var(X*) and hence precision
- but also require them to be exogenous, Cov(Z, u) = 0, and variables strongly related to X are often related to u as well
Test for relevance
First-stage F statistic > c ≈ 10 (rule of thumb)
Test for exogeneity
- with exact identification (one Z per endogenous X) exogeneity cannot be tested; it must be argued descriptively / on a priori grounds
- with more than one Z (overidentification), test the overidentifying restrictions:
H0: Cov(Z1, u) = … = Cov(Zm, u) = 0
F test: F → F(m−1, ∞) under H0 (one endogenous regressor, m instruments)
- violation: Z correlated with other unobserved determinants of Y
Stationary definition and meaning
- Strict: the joint probability distribution of (Ys+1, …, Ys+T) does not depend on s, i.e. the distribution does not change over time
- Weak (covariance) stationarity: 1st and 2nd moments exist and are time invariant:
1) E(Yt) = μ for all t
2) Var(Yt) = σ² < ∞ for all t
3) Cov(Yt, Yt−j) depends on j but not on t
- Says: the past is like the present and the future, at least in a probabilistic sense
- Meaning: models can be used outside the range of data with which they were estimated
- For an AR(1): stationary if |β1| < 1, Var(ut) = σ², and Y0 is a random variable with E(Y0) = β0/(1 − β1) and Var(Y0) = σ²/(1 − β1²); see the simulation below
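A minimal sketch (my own simulation, not from the cards) of a stationary AR(1), drawing Y0 from the stationary distribution given on the card above:

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, sigma, T = 1.0, 0.8, 1.0, 5_000

y = np.empty(T)
y[0] = rng.normal(b0 / (1 - b1), sigma / np.sqrt(1 - b1**2))  # stationary start
for t in range(1, T):
    y[t] = b0 + b1 * y[t - 1] + rng.normal(scale=sigma)

print(y.mean(), b0 / (1 - b1))                 # sample mean near b0/(1 - b1)
print(y.var(ddof=1), sigma**2 / (1 - b1**2))   # sample variance near sigma^2/(1 - b1^2)
```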
What I(0) means
- process is stationary OR trend-stationary
Issue with ADF test
Has notoriously little power in distinguishing between a unit root and a very persistent but stationary alternative, e.g. an AR(1) with β1 = 0.9
Chow test
Testing for a break at a known date τ:
- create a dummy Dt = 1 for t > τ and 0 otherwise, interact it with the regressors, and F-test that the coefficients on Dt and the interactions are jointly zero
Test break without known T
Quandt likelihood ratio (QLR) statistic:
- treat the break date τ as an unknown parameter and estimate it alongside the regression coefficients
- F(τ): the Chow F statistic for a break at τ
- candidate break dates run from τ0 = the 15th percentile of the sample to τ1 = the 85th percentile (can't look too close to the start or end)
- the QLR statistic is the largest Chow F statistic across all candidate break dates
- its critical values are much larger than the standard F critical values
- QLR doesn't tell us exactly how the equation changes (see the sketch below)
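A minimal sketch of the QLR procedure on simulated data (the break date, coefficients and trimming are illustrative): compute a Chow F statistic at every candidate date in the central 70% of the sample and take the maximum. Remember that QLR critical values, not standard F ones, apply.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
beta = np.where(np.arange(T) < 120, 1.0, 2.0)          # true break at t = 120
y = beta * x + rng.normal(size=T)

def chow_f(tau):
    d = (np.arange(T) > tau).astype(float)              # post-break dummy
    X = sm.add_constant(np.column_stack([x, d, d * x]))
    res = sm.OLS(y, X).fit()
    R = np.eye(4)[2:]                                   # restrict dummy + interaction to 0
    return float(np.squeeze(res.f_test(R).fvalue))

candidates = list(range(int(0.15 * T), int(0.85 * T)))  # 15%-85% trimming
f_stats = [chow_f(tau) for tau in candidates]
print(max(f_stats), candidates[int(np.argmax(f_stats))])  # QLR stat, estimated break date
```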
Requirements for efficiency
Smaller variance than the competing estimator AND unbiased (show both)
Spurious regression definition
Yt and Xt are both I(1) (e.g. independent random walks) and not cointegrated
- the stochastic trends are Yt = Yt−1 + ut and Xt = Xt−1 + et, with ut and et independent
- Pr(|t| > 1.96) is high, so a "significant" result is likely and the series appear related
- misleading inference even in large samples
Examples of spurious regression
FX rates
Stock market and consumption
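An illustrative Monte Carlo (my own, not from the cards) of the spurious regression problem: two independent random walks regressed on each other reject β = 0 far more often than the nominal 5%.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, reps, rejections = 200, 500, 0
for _ in range(reps):
    y = np.cumsum(rng.normal(size=T))      # Y_t = Y_{t-1} + u_t
    x = np.cumsum(rng.normal(size=T))      # independent random walk
    res = sm.OLS(y, sm.add_constant(x)).fit()
    rejections += abs(res.tvalues[1]) > 1.96

print(rejections / reps)                   # far above 0.05: spurious "significance"
```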
Cointegration steps
Engle-Granger ADF test:
1) Check via ADF that both I(1)
2) Estimate θ by an OLS regression of Y on X
- this yields a consistent estimator for θ
3) Store the residuals and test them via ADF, where H0: random walk, H1: stationary
- if H0 is rejected then, by definition, the series are cointegrated
- use different (Engle-Granger) critical values to account for the sampling uncertainty in estimating θ; a sketch of the procedure follows
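A sketch of steps 2-3 using statsmodels' adfuller on simulated cointegrated data. Caveat: adfuller reports ordinary DF critical values; a proper Engle-Granger test uses its own critical values because θ is estimated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
T = 300
x = np.cumsum(rng.normal(size=T))            # X_t is I(1)
y = 0.5 + 2.0 * x + rng.normal(size=T)       # cointegrated with theta = 2

# Step 2: estimate theta by OLS of Y on X
res = sm.OLS(y, sm.add_constant(x)).fit()

# Step 3: ADF test on the residuals; H0: random walk, H1: stationary
adf_stat, pval, *_ = adfuller(res.resid)
print(res.params[1], adf_stat, pval)         # theta_hat near 2, small p-value
```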
Define cointegration
Xt, Yt ~ I(1)
Yt - a - b Xt ~ I(0)
Problems when β1 = 1 (unit root)
1) Even when the sample is large, the OLS estimator is biased towards zero: E(β̂1) ≈ 1 − 5.3/T
2) The distributions of the t-statistic and of β̂1 are not normal, even in large samples (use ADF critical values)
3) Spurious regression
Difference between residual and forecast error
Residual: in sample
Forecast error: out of sample
Difference between forecast and confidence interval
A forecast interval must account for YT+1 being random, not just for uncertainty about a non-random coefficient
Definition of causal effect
Coefficients measure the causal effect of a ceteris paribus, exogenous change in the explanatory variable on the dependent variable Y
Definition of conditional independence and what it means
D ⊥ {Y(1), Y(0)} | X ⇒ E(Y(0)|D, X) = E(Y(0)|X)
Means: E(Y(0)|D = 1, X = x) = E(Y(0)|D = 0, X = x)
Says that D and the potential outcomes are related, but only through X
- conditioning on X restores independence
- the aim is to control for all the non-random variation in assignment so that whatever variation is left over is plausibly independent of potential outcomes - cleans out selection bias
Meaning of LLN and implications
X̄ is near μ with high probability when n is large
lim n→∞ Var(X̄) = 0
Implications: establishes the conditions where estimator is consistent
Type 1 error definition and example when worried about it
P(reject H0| H0 true)
E.g. worried about it when the treatment is costly to administer (falsely rejecting H0 means rolling out a costly treatment that doesn't work)
Type 2 error definition, definition of power and example when worried about it
P(accept H0| H1 true) = β
- Power = 1 - β = P(reject H0| H1 true)
- Power: ability to detect a violation of the null
E.g. worried about it when missing a true effect is costly - the treatment could save someone's life
CLT requirements, implications
Need iid draws and E(X²) < ∞
- √n(X̄ − μ) → N(0, σ²) as n → ∞
- distributions of estimators such as β̂ are approximately normal when n is large
- "from the CLT, X̄ is approximately N(μ, σ²/n)"
Binomial E, Var, se
E= np
Var = np(1-p)
se(p̂) = √(p(1 − p)/n)
Jensen’s inequality if g(x) concave
E(g(X)) ≤ g(E(X))
Cov / Corr formula
Corr(X, Y) = Cov(X, Y) / √(Var(X)·Var(Y)) = ρ
F test
Tests the joint significance of several estimators
- (whether together they significantly reduce the unexplained variation in the data compared with leaving them out)
- an alternative for model selection: information criteria (AIC)
What it means if D independent of potential outcomes
1) probability of assignment to treatment doesn’t vary with potential outcomes
2) distribution of potential outcomes doesn’t vary with treatment status
- leads to mean independence of the potential outcomes wrt treatment
Internal validity definition
- key
- examples
Inferences about causal effects are valid for the population being studied
- Key: it is plausible that the error is orthogonal to the regressors (exogeneity)
- Contamination (the control group gets access to the treatment)
- Non-compliance (assigned to treatment but end up untreated)
- Placebo effects (outcomes change because subjects perceive they are being treated)
- Hawthorne effects (subjects change behaviour because they know they are being observed)
- Individualistic treatment response: no interaction effects between subjects; an outcome doesn't depend on whether others get the treatment
External validity definition
- key
- examples
whether a study’s findings can be generalised to other populations and settings
- Key: the population differs in a way that alters the causal effect of interest and that is not accounted for by the model (not captured in the X's) - do lots of RCTs and see which factors affect the outcome
- individualistic treatment response (no spillovers)
- long- vs short-run outcomes (surrogate outcomes, e.g. class size on education vs long-term employment)
- supply-side administering
- 'income' elasticity: whether recipients perceive an income transfer and other sources of income the same way
ITT definition
The average causal effect of a program or policy that is introduced to a group of individuals, regardless of whether these individuals actually participate
LATE definition
The average causal effect of treatment delivery on the outcome of interest, among compliers
LATE assumptions and their use
1) Independence (assignment is as good as random)
- lets us measure the causal effect of Z (assignment) on Y (outcome) and on D (delivery)
2) Relevance
- the Wald ratio can be computed because the denominator E(D|Z=1) − E(D|Z=0) is not 0
3) Exclusion (Z affects Y only via D)
- Yi(d, 0) = Yi(d, 1) for d = 0, 1
- AT and NT don't change treatment status when the instrument is switched on/off, so their outcomes drop out
4) Monotonicity (impact one direction)
- excludes defiers
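A toy simulation of the Wald/LATE logic under these assumptions (my own numbers; one-sided non-compliance with never-takers only): the ITT is the treatment effect diluted by the first stage, and dividing recovers the effect for compliers.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
z = rng.integers(0, 2, size=n)                   # random assignment
complier = rng.random(size=n) < 0.6              # 60% compliers, 40% never-takers
d = z * complier                                 # treated only if assigned AND complier
y = rng.normal(size=n) + 2.0 * d                 # constant treatment effect = 2

itt = y[z == 1].mean() - y[z == 0].mean()        # reduced form
first_stage = d[z == 1].mean() - d[z == 0].mean()
late = itt / first_stage                         # Wald estimator
print(itt, first_stage, late)                    # roughly 1.2, 0.6, 2.0
```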
Relationship between ATT and LATE
ATT = γ·ATE(AT) + (1 − γ)·LATE, where γ is the share of the treated who are always-takers and ATE(AT) is the average effect for them
LATE and ITT relationship
LATE ≥ ITT (in magnitude), since with ITT the treatment effect gets diluted among individuals who were assigned but not treated
Allowing for heterogeneous treatment effects
Y = α + βX + δD + γ·D·X + u
H0: γ = 0
- OLS is a consistent estimator of the average causal effect if treatment is randomly assigned
- IV estimates a weighted average of the individual treatment effects; those whose treatment status is most influenced by the instrument get the greatest weight
Problem with return to schooling
Assignment to schooling is not random. Need to use CIA to identify causal effect - add regressors which account for the non-random assignment of schooling
Issue with twins, internal and external
Internal:
- measurement error is exacerbated when taking differences
- fewer observations and reduced variation in X, so larger coefficient standard errors
- differences in ability develop over time: epigenetics
- parental behaviour / investment in the twins differs across families
External:
- unsure whether results extrapolate to the whole population: many twins are conceived via IVF - selection bias - not random
- twins may have worse health as they have to compete for resources
AR(p) model
Autoregressive model
- Use an F test of the hypothesis that Yt−2, …, Yt−p do not further help forecast Yt beyond Yt−1
- p: the highest number of lags that’s relevant
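A sketch of that F test on simulated data (the true process here is an AR(1), so lags 2-4 should be jointly insignificant); the lag matrix and restriction matrix are built by hand:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 500
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):                        # true process is AR(1)
    y[t] = 0.7 * y[t - 1] + rng.normal()

lags = np.column_stack([y[4 - j:T - j] for j in range(1, 5)])  # Y_{t-1}..Y_{t-4}
res = sm.OLS(y[4:], sm.add_constant(lags)).fit()

R = np.zeros((3, 5))                         # H0: coefficients on lags 2, 3, 4 are zero
R[0, 2] = R[1, 3] = R[2, 4] = 1.0
print(res.f_test(R))
```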
What is the Autoregressive distributed lag model and test for it
- Includes lags of X as well as Y
- Granger-Causality test (F-stat) - test the joint hypothesis that none of the X’s is a useful predictor above and beyond lagged values of Y. Causality here refers to the marginal predictive content
- Test: H0 is E(Yt | Yt−1, …, Xt−1, …) = E(Yt | Yt−1, …)
Defintion of a trend
A persistent long-term movement of a variable over time
Two trends
Stochastic: Yt = Yt−1 + ut, i.e. ΔYt = ut; called I(1)
Deterministic: a nonrandom function of time, e.g. α·t
Random walk with a drift equation
- Yt = a1 + Yt−1 + ut, where a1 is the drift
Both deterministic and stochastic trends
- Yt = a1 + Yt−1 + ut
- Assuming Y0 = 0, Yt = a1·t + Σ us = deterministic trend + stochastic trend
- E(Yt) = a1·t, Var(Yt) = σ²·t, Cov(Yt, Yt−s) = σ²·(t − s)
Removing trends
Deterministic:
- regress Yt on a function of time and take the residuals, e.g. linear detrending: Ỹt = Yt − â0 − â1·t
Stochastic:
- differencing
Relate I(2), I(1), I(0) using inflation
Relate AR(1), AR(2) using inflation as an example
Why use Δinflation
log CPI is I(2), inflation is I(1), Δinflation is I(0)
An AR(2) for inflation corresponds to an AR(1) for Δinflation
- the first difference is much less serially correlated, so use it to stay in a stationary framework we understand; if the series is strongly serially correlated, the AR coefficient is biased towards zero
Test for a unit root equation and H0 H1
Dickey-Fuller test
ΔYt = β0 + δYt−1 + ut
H0: δ = 0, stochastic trend (unit root)
H1: δ < 0, stationary
Don't use normal critical values; use the DF critical values (see the sketch below)
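A sketch using statsmodels' adfuller (which applies the DF critical values for you): a simulated random walk should fail to reject H0, while a stationary AR(1) should reject it.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(10)
T = 500
random_walk = np.cumsum(rng.normal(size=T))
stationary = np.empty(T)
stationary[0] = 0.0
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + rng.normal()

for name, series in [("random walk", random_walk), ("stationary AR(1)", stationary)]:
    stat, pval, *_ = adfuller(series, regression="c", autolag="AIC")
    print(name, round(stat, 2), round(pval, 3))
```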
Distributed lag model aim and assumptions
To measure dynamic causal effect
- find cumulative dynamic multipliers
1) X is exogenous: E(ut | Xt, Xt−1, …, Xt−s) = 0
2) Y and X have stationary distributions, and (Yt, Xt) and (Yt−j, Xt−j) become independent as j gets large
3) X and Y have nonzero, finite eighth moments
4) no perfect multicollinearity
Things to remember with distributed lag model for causal effects
- OLS yields consistent estimators for β
- sampling distribution of β is normal
- the formula for the variance is not the usual one, as ut may be serially correlated
- need to use HAC standard errors
The problem to which HAC is the solution
ut being serially correlated
What mean by “significant at the 1% level”, “t-value”
Significant: implicitly assumes a two-sided test; the null β = 0 is rejected at the 1% significance level
- lower α → larger critical value → less powerful test (but a smaller type 1 error)
t-value: the statistic for testing the hypothesis that β = 0
Why use IV
Instrument is used to isolate the movements in X that are uncorrelated with the error term (first stage), thereby allowing consistent estimation (2nd stage)
Issues with IV when not everyone is affected by the instruments
Discuss compliers, AT, NT, defiers
- LATE: instrument is binary (compliers, noncompliers)
What is selection bias
- dealing with it
The bias in an estimator of a regression coefficient that arises when a selection process influences the availability of data and that process is related to the dependent variable.
- results in cov(u, x) not 0
- violates independence assumption
- not like-for-like
- can use conditional independence
What is RMSFE? Assumption for it?
A measure of the spread of the forecast error distribution - the magnitude of a typical forecast mistake
- for a forecast interval, impose a normal distribution rather than take it for granted: there is no CLT argument for the forecast interval (the series could be non-stationary)
- YT+1 is a random variable, not a parameter
Talk about AIC
- used for model selection as provides ranking
- trades off goodness of fit and simplicity
- compare different models for the same data set
- the one with lowest value for AIC is best quality
- relative measure of model fit
- penalises overfitting
Approximating CEF with LRM
- limiting / inaccurate if the CEF is curved
- doesn't have to be limited to a single regressor; can include polynomials for a better fit
- no certainty the CEF is continuous: an LRM could never give the correct value for a discrete CEF
- 'best' in the sense that it minimises the squared error
- in general the variance of the prediction error of the CEF is lower than that of the LRM
“As good as randomly assigned”
- Cov(X, u) = 0
Explain 2SLS
First stage: regress X on the instrument(s) Z to create the 'generated instrument' or adjusted treatment variable X̂; second stage: regress Y on X̂ (a sketch follows)
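A minimal by-hand sketch of the two stages on simulated data with one endogenous X and one instrument Z (doing the stages manually recovers the 2SLS point estimate; proper 2SLS standard errors need the dedicated formula or a package):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)     # X endogenous: correlated with u
y = 1.0 + 2.0 * x + u                          # true causal effect = 2

ols = sm.OLS(y, sm.add_constant(x)).fit()      # biased by endogeneity

x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues   # first stage
tsls = sm.OLS(y, sm.add_constant(x_hat)).fit()             # second stage

print(ols.params[1], tsls.params[1])           # OLS above 2, 2SLS near 2
```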
Adding another regressor
For:
- exploit the conditional independence assumption to remove omitted variable bias, e.g. if the omitted factor is correlated with the gender dummy so that it differs between men and women
- increase statistical precision: reduces the error variance without biasing the coefficient
Against:
- if the added X is itself endogenous then we have the same problem - compositional bias
Simultaneity bias
- when two variables are jointly determined and then used in a regression
- e.g. p and q of any good
Example of compositional bias
Adding occupation when studying wage differences between men and women: if women don't have the same occupational opportunities, then there is still discrimination (occupation is itself an outcome of it)
- if the added control Z depends on X, it is endogenous
List potential outcomes
When are there no AT / NT?
- Compliers: Di(1) = 1, Di(0) = 0
- AT: Di(1) = 1, Di(0) = 1
- NT: Di(1) = 0, Di(0) = 0
- Defiers: Di(1) = 0, Di(0) = 1
No always-takers when eligibility for treatment can be controlled, i.e. only those assigned can receive it
Definition of a break and its problems
A change in probability distribution of the data
- the coefficients in the model are not stable over the full sample / time period
Problems:
- destroys external validity
- a leading cause of forecast failure
- causes in-sample estimates of the coefficients to be biased
- OLS estimates an "average value" which does not correspond to the true causal effect in any period
- can be difficult to distinguish multiple breaks from a stochastic trend -> a break can be mistaken for a random walk (graph)
If perfect compliance
- no always-takers or never-takers: Z = D
- LATE denominator: E(D|Z=1) − E(D|Z=0) = E(D|D=1) − E(D|D=0) = 1 − 0 = 1
- LATE = ATE
- LATE = ATT requires the LATE assumptions plus P(D=1|Z=0) = 0, i.e. no always-takers
Why adjusted r squared
Occam's Razor: the best model is the one that fits best with the fewest regressors
- with the ordinary R², adding a regressor increases R² even if it has negligible explanatory power
- the factor (n−1)/(n−k−1) > 1 penalises adding another regressor
Talk about augmented DF test
- use enough lags so that the residuals are serially uncorrelated
- more lags = smaller sample = lose degrees of freedom = inc standard error
- don’t use too many lags
Adding a trend to DF test
- if there is a trend and it is not included: biased in favour of finding a unit root, i.e. a high type 2 error
- larger critical values, as the trend term makes the distribution of the t-statistic more dispersed / skewed
Determining granger causality when have non-stationary
Use an ECM if cointegrated
- causality is subdivided into short-run and long-run causality
- difference: differencing alone, when the series are cointegrated, does not capture the long-run relationship
Marginal distribution
Sum the joint distributions
- P(rain) = P(r and l) + P(r and s)
Estimate vs estimator
Estimator: function of a sample of data to be drawn randomly from a population - it is a random variable
Estimate is the numerical value of the estimator when a specific sample is drawn; it is a nonrandom number
What is correlation
Measure of strength of the linear association between X and Y. Lies between -1 and 1
t stat
How far β̂ is from the null value, relative to se(β̂)
QoB
Angrist and Krueger (1991)
- Exclusion concern: does school starting age matter for earnings by itself?
- Find that men born later in the year tend to get more schooling and have higher earnings
- if anything, the estimate is biased downwards
- find QoB doesn't directly impact earnings (supports exclusion)
- compliers: the group for whom the instrument changes their schooling decision
FX
PPP / UIP / LOOP suggest the exchange rate cannot have a unit root
- it cannot deviate permanently from the ratio of prices in the two countries
With spurious, is differencing valid?
Yes, if we assume ut in the differenced regression is white noise, i.e. not correlated over time
FWL theorem
β̂k from the multiple regression equals the coefficient from a regression of Y on the variation in Xk that cannot be explained by a linear combination of the other regressors
- isolates the effect of Xk
- the coefficient reflects the individual predictive contribution of each regressor alone (numerical check below)
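A numerical check of FWL on simulated data (my own example): the coefficient on X1 from the full regression equals the coefficient from regressing Y on the residual of X1 after partialling out X2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

x1_tilde = sm.OLS(x1, sm.add_constant(x2)).fit().resid    # partial out x2
partial = sm.OLS(y, sm.add_constant(x1_tilde)).fit()

print(full.params[1], partial.params[1])                  # numerically equal
```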