QE 3/4 - regression Flashcards
Causes of residuals (gaps between the observed values and the model's predictions):
1. Measurement error
2. Specification error
Properties of the CEF residual
- E[e] = 0
- E[e*g(x)] = 0
- E[e given X] = 0
Properties of the LRM residual
- E[u] = 0
- E[uX] = 0
- E[u given X] = not necessarily 0
Why might the LRM residual (u) not be mean-independent of X (unlike CEF residual)?
1. LRM is a linear model, but the CEF may be non-linear/wiggly
2. Therefore, the expected value of the LRM error is not necessarily 0 at every value of X - it is positive where the line under-shoots the CEF and negative where it over-shoots
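A small simulation (illustrative, not from the cards - the quadratic CEF and all numbers are made up) makes this concrete: fit a linear model to data with CEF E[Y given X] = X², and the residual u has mean zero and zero covariance with X overall, yet its conditional mean clearly varies with X:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: true CEF is quadratic, E[Y|X] = X^2,
# but we fit the linear model Y = b0 + b1*X + u
n = 100_000
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 1, n)

b1, b0 = np.polyfit(x, y, 1)  # OLS fit of the linear model
u = y - (b0 + b1 * x)

# Unconditionally, the LRM residual has mean ~0 and ~0 covariance with X...
print(round(u.mean(), 3))
print(round(np.cov(u, x)[0, 1], 3))

# ...but its conditional mean varies with X:
# negative near x = 0 (line over-shoots), positive in the tails (line under-shoots)
print(round(u[np.abs(x) < 0.5].mean(), 2))
print(round(u[np.abs(x) > 1.5].mean(), 2))
```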
What is the interpretation of the p-value?
Probability of obtaining an estimate at least as extreme as the one observed, given that the null hypothesis is true
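This definition can be sketched by simulation (the observed t-statistic of 2.1 and sample size of 50 are made up for the example): draw many samples under the null and count how often the test statistic is at least as extreme as the observed one.

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps = 50, 100_000
observed_t = 2.1  # hypothetical t-statistic from some sample

# Simulate the null (true mean 0) and compute the t-statistic each time
samples = rng.normal(0, 1, (reps, n))
t_stats = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

# Two-sided p-value: share of null draws as extreme as the observed statistic
p_value = np.mean(np.abs(t_stats) >= observed_t)
print(round(p_value, 3))  # roughly 0.04 for t = 2.1 with n = 50
```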
What does high R-squared tell you? What does it not tell you?
1a. R-squared close to 1 means the explanatory variables are good at fitting the data (Y)
1b. Provides estimate of strength of relationship between model and data
2a. Does not mean the model will be good at extrapolating out of sample
2b. Does not say anything about causality
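One illustration of point 2a (the exponential relationship and noise level are invented for the example): over a narrow range of X, curved data can look almost linear, so a linear fit earns an R-squared near 1 in-sample while extrapolating badly out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: Y = exp(X) + noise, observed only on x in [0, 1]
x = rng.uniform(0, 1, 1000)
y = np.exp(x) + rng.normal(0, 0.05, 1000)

b1, b0 = np.polyfit(x, y, 1)  # linear fit
resid = y - (b0 + b1 * x)
r2 = 1 - resid.var() / y.var()
print(round(r2, 3))  # close to 1: excellent in-sample fit

# Out of sample at x = 3, the linear fit badly under-predicts exp(3) ≈ 20.1
print(round(b0 + b1 * 3, 1))
```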
Threats to internal validity
- Contamination – people in control group access treatment anyway
- Non-compliance – individuals offered treatment refuse to take it
- Hawthorne effect – participants alter behaviour due to participating in experiment/study
- Placebo effect – outcomes change because participants believe they are being treated, even without any real treatment
What is the stable unit treatment value assumption (SUTVA)?
- Experimental ideal works only if there are no interaction effects between subjects
- i.e. each subject’s outcome depends only on their own treatment, not on the treatments of others
What is the conditional independence assumption?
Treatment assignment is independent of potential outcomes, conditional on covariates
Explain how the conditional independence assumption plausibly allows identification of causal effects
- Run regressions that include causal variable of interest + co-variates
- Co-variates assumed to ‘control for’ non-random variation in treatment assignment
- Variation left over (by the Frisch-Waugh-Lovell theorem) plausibly independent of potential outcomes
- If credible, treatment assignment conditionally independent of potential outcomes and can therefore measure causal effects
Explain how the Frisch-Waugh-Lovell theorem works (verbally)
- Find independent variation in X, not explained by other regressors
- Find independent variation in Y, not explained by other regressors
- Regress that independent variation in Y on the independent variation in X; the resulting slope equals the coefficient on X from the full regression
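The steps above can be sketched with simulated data (the variables x1, x2 and their coefficients are made up for the demonstration): residualise both X and Y on the other regressors, then regress residual on residual, and the slope matches the multiple-regression coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: y depends on x1 and x2, which are correlated with each other
n = 5_000
x2 = rng.normal(0, 1, n)
x1 = 0.5 * x2 + rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, n)

def ols(X, y):
    # OLS coefficients via least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
other = np.column_stack([ones, x2])  # the "other regressors"

# Full multiple regression of y on a constant, x1 and x2
b_full = ols(np.column_stack([ones, x1, x2]), y)

# FWL: variation in x1 and in y not explained by the other regressors...
x1_resid = x1 - other @ ols(other, x1)
y_resid = y - other @ ols(other, y)

# ...then regress residual on residual: slope equals the full-regression coefficient
b_fwl = ols(x1_resid.reshape(-1, 1), y_resid)[0]
print(round(b_full[1], 6), round(b_fwl, 6))  # identical up to floating point
```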
- Explain the least squares assumption that E[u given X] = 0
- What is this assumption equivalent to?
- What does this assumption imply?
1a. ‘Other factors’ within residual (u) not systematically related to X (i.e. given value of X, mean of distribution of u = 0)
1b. Sometimes these other factors within residual lead Y to be higher/lower than predicted, but on average 0
2. Equivalent to assuming that the population regression line = conditional mean of Y given X
3. Implies that X and u uncorrelated
- Why does higher variance in X lead to lower variance in the slope coefficients of regression model?
- Intuition?
- Greater variation in X lets us pin down the slope more precisely, so the slope estimator has lower variance
- Intuition - if all the data is bunched around the mean of X, many different lines fit almost equally well; with more spread in X, the slope is much better determined
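A simulation sketch of this (sample sizes and parameters invented for the example), using the homoskedastic approximation Var(b1_hat) ≈ σ²/(n·Var(X)): quadrupling the spread of X roughly quarters the sampling spread of the slope estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same noise, same n; only the spread of X differs between the two designs
n, reps = 100, 2_000
slopes_narrow, slopes_wide = [], []
for _ in range(reps):
    for sd_x, store in [(0.5, slopes_narrow), (2.0, slopes_wide)]:
        x = rng.normal(0, sd_x, n)
        y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
        b1, _ = np.polyfit(x, y, 1)
        store.append(b1)

# Sampling standard deviation of the slope estimate in each design:
# the wide-X design is roughly 4x more precise (sd of X is 4x larger)
print(round(np.std(slopes_narrow), 3))
print(round(np.std(slopes_wide), 3))
```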
What is perfect multi-collinearity?
When one regressor is a perfect linear combination of the other regressors
Mathematically, why does perfect multi-collinearity make it impossible to calculate the OLS estimator?
The OLS formulas divide by the variation in each regressor left over after partialling out the others (in matrix form, they invert X'X); with perfect multi-collinearity that leftover variation is 0, so computing the estimator requires division by 0
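A minimal sketch, assuming a made-up design where x3 = x1 + x2 exactly: X'X then loses full rank, so (X'X)⁻¹X'y - the matrix form of the OLS formula - cannot be computed; the singular matrix is the matrix analogue of dividing by zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up design matrix with perfect multi-collinearity
n = 100
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
x3 = x1 + x2  # perfect linear combination of the other regressors

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X

# X'X has rank 3, not 4: it is singular, so (X'X)^{-1} does not exist
print(np.linalg.matrix_rank(XtX))

# Equivalently, its smallest eigenvalue is (numerically) zero
print(abs(np.linalg.eigvalsh(XtX)[0]) < 1e-6)
```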