QE 3/4 - regression Flashcards
Causes of residuals (gaps between observed values and what the model predicts):
1. Measurement error
2. Specification error
Properties of the CEF residual
- E[e] = 0
- E[e*g(X)] = 0
- E[e given X] = 0
Properties of the LRM residual
- E[u] = 0
- E[uX] = 0
- E[u given X] not necessarily 0
Why might the LRM residual (u) not be mean-independent of X (unlike CEF residual)?
1. LRM is a linear model, but the CEF may be non-linear/wiggly
2. Therefore, the expected value of the error at different values of X is not necessarily the same for the LRM
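A minimal numpy sketch of this point (the quadratic CEF and all numbers are invented for illustration): the CEF residual averages to roughly zero within every slice of X, while the linear-model residual does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 1, n)      # true CEF: E[Y|X] = X^2 (non-linear)

e = y - x**2                        # CEF residual: e = Y - E[Y|X]

# LRM residual: u = Y - (b0 + b1*X) from an OLS fit of the linear model
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta

# E[e|X] is ~0 everywhere; E[u|X] varies with X because the CEF is curved
for lo, hi in [(-2.0, -1.0), (-1.0, 1.0), (1.0, 2.0)]:
    m = (x >= lo) & (x < hi)
    print(f"X in [{lo}, {hi}): mean e = {e[m].mean():+.3f}, mean u = {u[m].mean():+.3f}")
```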
What is the interpretation of the p-value?
Probability of obtaining an estimate at least as extreme as the one computed from the data, assuming the null hypothesis is true
What does high R-squared tell you? What does it not tell you?
1a. R-squared close to 1 means the explanatory variables are good at fitting the data (Y)
1b. Provides estimate of strength of relationship between model and data
2a. Does not mean the model will be good at extrapolating out of sample
2b. Does not say anything about causality
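A quick sketch of point 2a, using a made-up exponential relationship: the linear fit earns a high R-squared in sample but extrapolates badly out of sample.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 500)
y = np.exp(2 * x) + rng.normal(0, 0.1, 500)   # true relationship is non-linear

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"in-sample R^2: {r2:.3f}")             # high: the fit looks excellent on [0, 1]

# Out of sample the linear fit collapses
x_new = 3.0
print("linear prediction at x=3:", beta[0] + beta[1] * x_new)
print("true E[Y|X=3]:           ", np.exp(2 * x_new))   # ~403, far above the line
```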
Threats to internal validity
- Contamination – people in control group access treatment anyway
- Non-compliance – individuals offered treatment refuse to take it
- Hawthorne effect – participants alter behaviour due to participating in experiment/study
- Placebo effect – the belief that one is being treated affects outcomes, independent of the treatment itself
What is the stable unit treatment value assumption (SUTVA)?
- Experimental ideal works only if there are no interaction effects between subjects
- i.e. each subject’s outcome depends only on their own treatment, not those of others
What is the conditional independence assumption?
Treatment assignment is independent of potential outcomes, conditional on covariates
Explain how the conditional independence assumption plausibly allows identification of causal effects
- Run regressions that include causal variable of interest + co-variates
- Co-variates assumed to ‘control for’ non-random variation in treatment assignment
- Variation left over (Frisch-Waugh-Lovell theorem) plausibly independent of potential outcomes
- If credible, treatment assignment conditionally independent of potential outcomes and can therefore measure causal effects
Explain how the Frisch-Waugh-Lovell theorem works (verbally)
- Find independent variation in X, not explained by other regressors
- Find independent variation in Y, not explained by other regressors
- Find independent variation in Y (not explained by other regressors) that is explained by independent variation in X
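A minimal numpy sketch of FWL (all data simulated for illustration): the coefficient on x from the full multiple regression equals the coefficient from regressing the residualised y on the residualised x.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
w = rng.normal(size=(n, 2))                          # the "other regressors"
x = w @ np.array([1.0, -0.5]) + rng.normal(size=n)   # regressor of interest
y = 2.0 * x + w @ np.array([0.3, 0.7]) + rng.normal(size=n)

def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Coefficient on x from the full multiple regression
beta_full = ols(np.column_stack([np.ones(n), x, w]), y)[1]

# FWL: residualise x and y on the other regressors (plus constant),
# then regress the y-residuals on the x-residuals
Z = np.column_stack([np.ones(n), w])
x_tilde = x - Z @ ols(Z, x)
y_tilde = y - Z @ ols(Z, y)
beta_fwl = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

print(beta_full, beta_fwl)   # identical up to floating-point error
```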
1. Explain the least squares assumption that E[u given X] = 0
2. What is this assumption equivalent to?
3. What does this assumption imply?
1a. ‘Other factors’ within residual (u) not systematically related to X (i.e. given value of X, mean of distribution of u = 0)
1b. Sometimes these other factors within residual lead Y to be higher/lower than predicted, but on average 0
2. Equivalent to assuming that the population regression line = conditional mean of Y given X
3. Implies that X and u are uncorrelated
- Why does higher variance in X lead to lower variance in the slope coefficients of the regression model?
- Intuition?
- Greater variation in X means we can obtain a more precise estimate of the slope coefficient
- Intuition - if all the data are bunched around the mean, it is hard to fit a line; easier if there’s more variation in X
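In symbols, the standard homoskedasticity-only large-sample approximation makes the same point: a larger spread in X (bigger sigma_X^2) shrinks the sampling variance of the slope.

```latex
% Homoskedasticity-only large-sample variance of the OLS slope:
% more spread in X (larger \sigma_X^2) means a smaller sampling variance.
\operatorname{var}\!\left(\hat{\beta}_1\right) \;\approx\; \frac{\sigma_u^2}{n\,\sigma_X^2}
```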
What is perfect multi-collinearity?
When 1 regressor = perfect linear combination of the other regressors
Mathematically, why does perfect multi-collinearity make it impossible to calculate the OLS estimator?
Division by 0 in OLS formulas
Intuition behind why perfect multi-collinearity is a problem
- Asks illogical question
- In multiple regression, coefficients = effect of change in that regressor, holding other regressors constant
- But if 1 regressor is a perfect linear combination of others, then asking the effect of change in that regressor, holding itself constant…
Example of perfect multi-collinearity
- Fraction of English learners and % of English learners (the % variable is exactly 100 × the fraction, so one is a perfect linear function of the other)
- If one regressor is same for all observations, then it is a perfect linear combination of the intercept term (if there is an intercept)
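A small numpy illustration of the fraction-vs-% example: when one regressor is an exact linear function of another, X'X loses rank, so the inverse in the OLS formula does not exist.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)      # e.g. fraction of English learners
x2 = 100 * x1                # % of English learners: perfect linear function of x1
X = np.column_stack([np.ones(n), x1, x2])

# X'X is not full rank, so (X'X)^{-1} in the OLS formula does not exist
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X))   # 2, not 3
print("condition number:", np.linalg.cond(X.T @ X))     # effectively infinite
```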
1. How to avoid the dummy variable trap?
2. What is the interpretation of the binary variables included?
1. Exclude one of the binary variables from the regression
2. They represent the incremental effect, relative to the base case of the omitted category
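A numpy sketch of the trap and the fix (group labels and means are made up): with an intercept plus all three dummies the design matrix is rank-deficient; dropping one dummy restores full rank, and the remaining coefficients are increments over the omitted base case.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
group = rng.integers(0, 3, n)               # three categories
D = np.eye(3)[group]                        # full set of dummies (one-hot)
y = np.array([1.0, 2.5, 4.0])[group] + rng.normal(0, 0.5, n)

# Trap: intercept + all three dummies (the dummies sum to the intercept column)
X_trap = np.column_stack([np.ones(n), D])
print("rank:", np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1])   # 3 of 4

# Fix: drop the first dummy; its category becomes the base case
X_ok = np.column_stack([np.ones(n), D[:, 1:]])
beta, *_ = np.linalg.lstsq(X_ok, y, rcond=None)
print(beta)   # ~[1.0, 1.5, 3.0]: base-case mean, then increments relative to it
```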
What is imperfect multi-collinearity?
Two or more regressors are highly correlated (in the sense that some linear combination of the regressors is highly correlated with another regressor)
What is the implication of imperfect multi-collinearity?
- Coefficients on at least one regressor imprecisely estimated
- Difficult to estimate precisely one or more of the partial effects
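A Monte Carlo sketch of this precision loss (the correlation of 0.98 and all other numbers are illustrative): the sampling spread of the slope estimate grows roughly five-fold when the two regressors are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2_000

def sd_of_slope(rho):
    """Monte Carlo sd of the OLS coefficient on x1 when corr(x1, x2) = rho."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(estimates)

print("sd of slope, rho = 0.00:", sd_of_slope(0.0))
print("sd of slope, rho = 0.98:", sd_of_slope(0.98))   # roughly 5x larger
```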
What is the F-statistic used for?
To test joint hypotheses about regression coefficients
What question is addressed by the use of the F-statistic?
- Does relaxing the q restrictions that constitute the null hypothesis improve the fit of the regression by enough that the improvement is unlikely to be the result of mere random sampling variation (if H0 is true)?
What should large F-statistic be associated with?
Significant increase in R-squared
When is the null rejected in an F-test?
If the SSR of the unrestricted regression is sufficiently smaller than that of the restricted regression, the test rejects the null hypothesis
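A numpy sketch of the homoskedasticity-only version of this comparison, F = [(SSR_r − SSR_u)/q] / [SSR_u/(n − k − 1)], on simulated data where the null is false:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1 + 0.5 * x1 + rng.normal(size=n)   # H0 below restricts both slopes to 0

def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_u = np.column_stack([np.ones(n), x1, x2])   # unrestricted
X_r = np.ones((n, 1))                         # restricted under H0: b1 = b2 = 0
q, k = 2, 2                                   # restrictions; regressors in unrestricted model
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / (n - k - 1))
print(f"F = {F:.1f}")   # large: relaxing the restrictions cuts the SSR a lot
```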
What are control variables used for?
- Control for causal effect of a variable (if any)
- Control for omitted factors that affect Y and are correlated with X
- Increase precision of estimates (if control variable not correlated with regressor of interest but correlated with outcome, then standard errors of estimators reduced)
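A Monte Carlo sketch of the third point (all numbers illustrative): adding a control w that is uncorrelated with x but moves y shrinks the sampling spread of the coefficient on x.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 2_000

def sd_of_slope(include_control):
    estimates = []
    for _ in range(reps):
        x = rng.normal(size=n)          # regressor of interest
        w = rng.normal(size=n)          # uncorrelated with x, but affects y
        y = 1 + 0.5 * x + 2.0 * w + rng.normal(size=n)
        cols = [np.ones(n), x, w] if include_control else [np.ones(n), x]
        estimates.append(np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0][1])
    return np.std(estimates)

print("sd without control:", sd_of_slope(False))   # larger: w's effect sits in the residual
print("sd with control:   ", sd_of_slope(True))
```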
Steps to decide whether/not to include a variable in regression
(1) identify coefficient of interest
(2) assess a priori (before reviewing data) most important likely sources of omitted variable bias
(3) test whether additional ‘control’ variables (identified in step 2) statistically significant/if estimated coefficient of interest changes measurably when controls added (if so, keep them; if not, remove them)
(4) fully disclose all regressions to allow others to judge for themselves
Problem with including a variable where it doesn’t belong (i.e. population regression coefficient = 0)?
Reduces the precision of the estimators of the other regression coefficients
Consequences of simultaneous causality?
- OLS estimator biased and inconsistent (because it picks up both effects)
- Correlation between regressor and error term
1. When does compositional bias arise?
2. Explain what it means using an example
1. If you add a control variable that is an outcome of the regressor of interest, you get compositional bias
2a. Example - control for occupation when regressing earnings on schooling
2b. Effect of schooling on earnings, controlling for type of job you do, will seem small
2c. Problem is that schooling affects your occupation and influences earnings partly via your occupation
2d. Can’t assess impact of schooling on earnings by looking within occupations
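A simulation sketch of the schooling example (the mechanism and all coefficients are invented for illustration): controlling for occupation shrinks the schooling coefficient toward the direct effect only, understating the total effect of schooling.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50_000
school = rng.normal(size=n)
# Hypothetical mechanism: schooling raises the chance of a high-paying occupation
occupation = (school + rng.normal(size=n) > 0).astype(float)
earnings = 1.0 * school + 2.0 * occupation + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X_total = np.column_stack([np.ones(n), school])
X_bad = np.column_stack([np.ones(n), school, occupation])
print("schooling coef, no occupation control:  ", ols(X_total, earnings)[1])  # total effect
print("schooling coef, controlling occupation: ", ols(X_bad, earnings)[1])    # direct effect only
```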