Assumptions Flashcards
What is the SUTVA assumption?
The potential outcomes of an individual i do not depend on the treatments received by other individuals.
ATT Definition
Mean difference between observed outcomes and counterfactual outcomes for the treatment group: E[Y1 - Y0 | D = 1]
ATC Definition
Mean difference between observed outcomes and counterfactual outcomes for the control group: E[Y1 - Y0 | D = 0]
ATE Definition
Mean treatment effect across the entire population, whether or not they actually participate: E[Y1 - Y0]
Mean Independence Assumption (MIA) and what does it mean if it holds?
E[Y0|D = 0] = E[Y0 |D = 1] and E[Y1|D = 0] = E[Y1|D = 1].
If it holds, the data behave as if generated by an experiment: potential outcomes are mean-independent of treatment, so there is perfect balance between treated and untreated.
What happens to the treatment effects under randomisation? And why?
DIM = ATT (no baseline bias) = ATE (no baseline or DTE bias) = ATC
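A minimal numpy sketch (illustrative simulated data, not from the source) of why this holds: when treatment is randomly assigned, the difference-in-means (DIM) recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y0 = rng.normal(0, 1, n)          # potential outcome without treatment
y1 = y0 + 2.0                     # constant treatment effect of 2
d = rng.integers(0, 2, n)         # randomly assigned treatment
y = np.where(d == 1, y1, y0)      # observed outcome

dim = y[d == 1].mean() - y[d == 0].mean()
ate = (y1 - y0).mean()
print(round(ate, 2))              # 2.0 by construction
print(round(dim, 2))              # close to 2.0: DIM recovers the ATE
```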
What are some issues with experiments, and threats to internal validity?
They can be costly, impractical and sometimes impossible.
Threats to internal validity:
Hawthorne effect - People react to being observed
John Henry effect - People react to being in the control group
What is the CIA?
The CIA (Conditional Independence Assumption) holds when the potential outcomes are independent of treatment once X is controlled for.
What is the CMIA, and what does it mean if it holds?
If it holds, selection bias disappears after conditioning on the observed characteristics X, as treatment is then as good as random: conditional on X, treated and untreated individuals will, on average, have the same potential outcomes.
E[Y0|D = 1, X] = E[Y0|D = 0,X]
What are the 2 assumptions to calculate the treatment effect from observational data?
E[Y1|D = 1, X] = E[Y1|D = 0, X]
E[Y0|D = 1, X] = E[Y0|D = 0, X]
Different types of individuals based on the potential treatments
Always Takers: D(1) = 1 and D(0) = 1
Never Takers: D(1) = 0 and D(0) = 0
Compliers: D(1) = 1 and D(0) = 0
Defiers: D(1) = 0 and D(0) = 1
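As a quick illustration (a hypothetical helper, not from the source), the four types above are just a lookup on the pair of potential treatments (D(0), D(1)):

```python
# Classify an individual from their potential treatments:
# d0 = treatment taken if Z = 0, d1 = treatment taken if Z = 1.
def compliance_type(d0: int, d1: int) -> str:
    return {
        (1, 1): "always-taker",
        (0, 0): "never-taker",
        (0, 1): "complier",
        (1, 0): "defier",
    }[(d0, d1)]

print(compliance_type(0, 1))   # complier
print(compliance_type(1, 1))   # always-taker
```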
What are 2 assumptions for making calculations on ‘Always takers’ etc?
No defiers (monotonicity)
The instrument is independent of potential outcomes and potential treatments (as good as randomly assigned)
What makes a valid instrument?
- Has a causal effect on treatment (First Stage)
- It’s as good as randomly assigned
- It affects outcomes only through treatment (Exclusion Restriction)
Benefits of the 2SLS
1) Allows use of multiple instruments
2) Controls for exogenous observable characteristics X
What do the elements in this regression mean?
M_bar[a] = alpha + rho*D[a] + gamma*a + e[a]
M_bar[a] - average mortality at age a
alpha - intercept
rho - estimate of the jump exactly at the threshold
gamma - slope coefficient on the running variable
D[a] - treatment dummy (1 once a passes the threshold)
a - running variable
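A hedged numpy sketch of this RD regression (simulated data; the jump size, slope, and age range are illustrative assumptions): we build average mortality with a known jump at the cutoff and recover rho by least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
ages = np.linspace(19, 23, 49)            # running variable a
cutoff = 21.0
d = (ages >= cutoff).astype(float)        # treatment dummy D[a]
true_rho, true_gamma = 10.0, 1.5          # assumed jump and slope
m = 80 + true_gamma * (ages - cutoff) + true_rho * d \
    + rng.normal(0, 0.5, ages.size)       # average mortality with noise

# Regress M_bar[a] on a constant, D[a], and the (centred) running variable
X = np.column_stack([np.ones_like(ages), d, ages - cutoff])
alpha, rho, gamma = np.linalg.lstsq(X, m, rcond=None)[0]
print(round(rho, 1))                      # close to the true jump of 10.0
```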
Why might a simple linear model produce misleading estimates? Regression discontinuity
If the relationship between the two variables is not linear
If relationship between variables is not the same on each side of the discontinuity.
What identification assumptions need to be made for an estimate to be causal? RD
- Independence of potential outcomes either side of the discontinuity
- No OVB in the estimating equation, implies rho is causal
- No other ‘jumps’ at D (threshold)
How do you test the causal estimate assumptions? RD
2 possible tests
- See if other observable characteristics are balanced either side of the discontinuity
- Ensure that there is no manipulation of the running variable. (If there is no manipulation, the density of a should be smooth around the discontinuity)
What is manipulation?
Manipulation is when individuals can influence the running variable and sort themselves around the threshold -> ideally the running variable is something hard to manipulate, like age
Pros and cons of narrower bandwidths
Pros - Less likely to be misspecified, so closer to the true value of rho
Cons - Less data, so a less precise estimate
What is the difference between standard error and standard deviation?
Standard error is a measure of how much the sample mean would vary if it were estimated from lots of different samples.
Standard deviation is a measure of how much observations vary from one another.
What is the relationship between Regression and CEF, and what would be saturated model mean?
Regression is an approximation to the CEF. If the regression model is saturated, it has the same number of parameters as the CEF has distinct values. (Another way of estimating a naive comparison of means)
CEF: E[Y|D]
The CEF is just an average -> it does not mean the relationship is causal
Baseline Bias
Difference in average outcome, in absence of treatment, between the treated and untreated.
E[Y0|D=1] - E[Y0|D=0]
DTE Bias
The benefit of the treatment (causal effect), for those who are treated and untreated is not the same.
If positive the treated gain more.
(1-pi){ E[Y1-Y0|D=1] - E[Y1-Y0|D=0]}
Where pi is the proportion of the sample that is treated
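The two bias cards above combine into the decomposition DIM = ATE + baseline bias + DTE bias. A numpy sketch (illustrative simulated data with self-selection, not from the source) verifies that this identity holds exactly in-sample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
y0 = rng.normal(0, 1, n)
gain = rng.normal(2, 1, n)            # heterogeneous treatment effect Y1 - Y0
y1 = y0 + gain
d = ((y0 + gain) > 2).astype(int)     # self-selection into treatment
y = np.where(d == 1, y1, y0)
pi = d.mean()                         # share treated

dim = y[d == 1].mean() - y[d == 0].mean()
ate = gain.mean()
bb = y0[d == 1].mean() - y0[d == 0].mean()                    # baseline bias
dte = (1 - pi) * (gain[d == 1].mean() - gain[d == 0].mean())  # DTE bias
print(abs(dim - (ate + bb + dte)) < 1e-8)  # True: identity holds exactly
```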
Why does matching and regression produce different results?
Matching 'matches' individuals who have the same observable characteristics and computes the ATE/ATT within each group.
OLS is simply a weighted average of the ATE of these groups.
OLS uses all observations, even those off common support. OLS is much quicker and easier.
What happens if there is OVB?
If there is OVB, the CIA doesn’t hold and therefore regression estimates will be biased.
What is the difference between Long and Short Regression?
Long regression controls for selection, so includes a dummy (control) variable, whereas the short regression does not. This means the short regression gives a biased estimate, so the difference between the two is the OVB - baseline and DTE bias occur.
OVB formula and explain each element?
Effect of D in short (biased) =
Effect of D in long (unbiased)
+ relationship between omitted and included (pi1 in the auxiliary regression X = pi0 + pi1 D)
x effect of omitted in long (gamma in the long regression)
So OVB: Beta(short) - Beta(long) = pi1 x gamma
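The OVB formula is an exact in-sample identity for OLS, which a numpy sketch can confirm (simulated data; all coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
d = rng.normal(0, 1, n)
x = 0.5 * d + rng.normal(0, 1, n)     # omitted variable, correlated with D
y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_short = ols(np.column_stack([ones, d]), y)[1]        # short regression
coefs = ols(np.column_stack([ones, d, x]), y)          # long regression
b_long, gamma = coefs[1], coefs[2]
pi1 = ols(np.column_stack([ones, d]), x)[1]            # auxiliary regression

print(abs(b_short - b_long - pi1 * gamma) < 1e-8)      # True: OVB = pi1 x gamma
```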
Advantages of Regression and what does OLS give?
We can add observable characteristics (X) and use them as control variables and if CIA holds, OLS gives estimator of the Average Treatment Effect (ATE).
What are potential omitted variables? In the aux regression?
Potential omitted variables are anything that is correlated with the treatment.
Then in X = Pi0 + Pi1 D, if pi1 is significant, X is correlated with the treatment.
What is a bad control?
A bad control is a variable which is itself an outcome variable: something which might be affected by the treatment.
Be careful with these - more controls are usually better, but never control for variables that are themselves outcomes of the treatment.
Residuals properties
Variance - How well the regression fits the data (R squared)
Residuals have mean zero and are uncorrelated with the regressors by construction.
e(i) = Y(i) - Y_hat(i)
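Both properties can be checked directly with numpy (illustrative simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                       # e(i) = Y(i) - Y_hat(i)
print(abs(e.mean()) < 1e-10)           # True: residuals have mean zero
print(abs((e * x).mean()) < 1e-10)     # True: uncorrelated with the regressor
```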
Regression standard errors
For a sample, we estimate Beta with Beta_hat:
SE(Beta_hat) = sigma(e)/sqrt(n) x 1/sigma(X)
sigma(e) is the standard deviation of the residuals; sigma(X) is the standard deviation of the regressor X.
How to calculate an IV estimate?
You have to calculate the Wald ratio (lambda = rho/phi), which equals the reduced form divided by the first stage.
Z->Y / Z->D
How do you calculate the first stage in IV?
Z -> D
P[Di = 1|Zi = 1] - P[Di = 1|Zi = 0] = phi
How do you calculate the reduced form in IV?
Z -> Y
E[Yi|Zi = 1] - E[Yi|Zi = 0] = rho
What is one thing to remember about lambda (Wald Ratio)?
It is a Local Average Treatment Effect (LATE), meaning it is only an average for a certain group, this group being the compliers - those who obey their lottery outcome.
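The three IV cards above can be sketched in numpy with a simulated lottery instrument (always-taker and complier shares, and the effect size, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
z = rng.integers(0, 2, n)                  # instrument (lottery)
always = rng.random(n) < 0.1               # always-takers
comply = rng.random(n) < 0.5               # compliers (if not always-takers)
d = np.where(always, 1, np.where(comply, z, 0))  # no defiers by construction
y = 3.0 * d + rng.normal(0, 1, n)          # true treatment effect = 3

phi = d[z == 1].mean() - d[z == 0].mean()  # first stage: Z -> D
rho = y[z == 1].mean() - y[z == 0].mean()  # reduced form: Z -> Y
lam = rho / phi                            # Wald ratio = LATE estimate
print(round(lam, 1))                       # close to the true effect of 3.0
```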
3 assumptions to identify a causal effect with DiD
1) Treatment and control group
2) These treatment and control groups are comparable
3) There is info on treatment and control group, before and after the treatment occurs.
Common (Parallel) Trends Assumption
In the absence of treatment, the difference between the treatment and control group remains constant over time.
How would you violate the common (parallel) trends assumption?
Add an unobserved variable (X) into the regression that is correlated with the treatment and changes at the same time as the treatment -> causes OVB.
4 Pros for using DiD Regression
1) Can easily calculate standard errors for DiD
2) Treatment can be continuous, not just binary
3) Easily add control variables
4) Easily add additional time periods
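A minimal 2x2 DiD regression sketch in numpy (illustrative simulated data): the coefficient on the Treat x Post interaction is the DiD estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40_000
treat = rng.integers(0, 2, n)          # treatment group dummy
post = rng.integers(0, 2, n)           # post-treatment period dummy
effect = 1.5                           # assumed true DiD effect
y = 2 + 0.5 * treat + 1.0 * post + effect * treat * post \
    + rng.normal(0, 1, n)

# Y = b0 + b1*Treat + b2*Post + b3*(Treat x Post); b3 is the DiD estimate
X = np.column_stack([np.ones(n), treat, post, treat * post])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b[3], 1))                  # close to the true effect of 1.5
```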
MLDA example vs real effect vs spurious effect
MLDA - parallel trends assumption holds -> simple DiD model works
Real - trends aren't parallel; there is a DiD effect as well as a differential time effect
Spurious - trends aren't parallel; there is no DiD effect, only a differential time effect
2 Problems with normal standard errors in DiD, how to fix
For panel data (repeated observations on the same units over time), they can be a poor estimate of the uncertainty of our estimate.
1) Heteroskedasticity -> Use robust SE
2) Serial correlation -> Use clustered standard errors - they relax the assumption that observations are independent (need a reasonable number of clusters)
Why do we use 2SLS and steps involved in the process?
Used when we combine multiple instruments (and/or add controls) to produce a single IV estimate.
1) First stage is a regression
2) Calculate fitted values (no residual)
3) Estimate second stage
4) Finally, can add control variables simply by adding them to first and second stage.
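The steps above can be sketched in numpy with two simulated instruments (all coefficient values and the confounder are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
z1 = rng.integers(0, 2, n)                 # instrument 1
z2 = rng.integers(0, 2, n)                 # instrument 2
u = rng.normal(0, 1, n)                    # unobserved confounder
d = 0.5 * z1 + 0.3 * z2 + 0.5 * u + rng.normal(0, 1, n)
y = 2.0 * d + u + rng.normal(0, 1, n)      # true effect of D = 2

# 1) First stage: regress D on the instruments
Z = np.column_stack([np.ones(n), z1, z2])
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]   # 2) fitted values

# 3) Second stage: regress Y on the fitted values
X2 = np.column_stack([np.ones(n), d_hat])
beta = np.linalg.lstsq(X2, y, rcond=None)[0]
print(round(beta[1], 1))                   # close to the true effect of 2.0
```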
How can you test whether randomisation was successful?
Can use a t-test on pre-treatment observable characteristics, to test whether there is a statistically significant difference in averages between the treatment and control groups. A p-value < 0.05 (t-statistic > 2) rejects the null of balance.
Confidence Intervals
[Y_bar - 2 x SE(Y_bar), Y_bar + 2 x SE(Y_bar)]
T - test
Want to test whether E[Y] = mu
t(mu) = (Y_bar - mu) / SE(Y_bar)
Or t(0) = Y_bar / SE(Y_bar)
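The two cards above in numpy (illustrative sample with true mean 5; SE(Y_bar) is the sample standard deviation over sqrt(n)):

```python
import numpy as np

rng = np.random.default_rng(9)
y = rng.normal(5, 2, 10_000)               # sample with true mean 5

y_bar = y.mean()
se = y.std(ddof=1) / np.sqrt(y.size)       # standard error of the mean
ci = (y_bar - 2 * se, y_bar + 2 * se)      # rule-of-thumb 95% CI
t0 = y_bar / se                            # t-statistic for H0: E[Y] = 0
print(t0 > 2)                              # True: strongly reject E[Y] = 0
```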
What is non-compliance, and what does it do to randomisation ?
Non-compliance occurs when participants do not adhere to their assigned treatment or control group, leading to a deviation from the intended random assignment.
Causes selection bias which affects results.
What can we do to combat non-compliance?
We could use the Intention to Treat Analysis, which is the impact of offering the treatment, as opposed to the impact of the treatment itself.
ITT maintains the benefits of random assignment. So provides an unbiased estimate for the causal effect of the treatment.