term 1 - multiple regressors - revise this for week 2 Flashcards
what is omitted variable bias?
when an omitted variable Z is a determinant of Y and is correlated with the regressor X, the OLS estimator will be biased
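this can be illustrated with a short simulation (a hypothetical sketch using numpy; the coefficients 0.8, 2 and 3 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z determines Y and is correlated with X (illustrative coefficients).
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)             # X is correlated with Z
Y = 2.0 * X + 3.0 * Z + rng.normal(size=n)   # true causal effect of X is 2

# "Short" regression of Y on X alone: the omitted Z biases the slope.
A_short = np.column_stack([np.ones(n), X])
b_short, *_ = np.linalg.lstsq(A_short, Y, rcond=None)

# "Long" regression that includes Z removes the bias.
A_long = np.column_stack([np.ones(n), X, Z])
b_long, *_ = np.linalg.lstsq(A_long, Y, rcond=None)

print(b_short[1])  # well above 2: biased upward
print(b_long[1])   # close to the true value 2
```

here the short-regression slope absorbs part of Z's effect on Y, exactly as the definition above predicts.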
are all omitted variables equal?
no. we cannot include all omitted variables and we do not need to. we only need to include those which are both a determinant of Y and correlated with X, since those are the ones that cause omitted variable bias
what are the two uses of regression?
for prediction and to estimate causal effects
what does randomisation imply?
that any differences between the treatment and control groups are random and not systematically related to the treatment
what are the three ways to overcome OVB?
1) run a randomised controlled experiment in which treatment is randomly assigned
2) adopt the cross-tabulation approach with finer gradations of X and the omitted variable
3) use a regression in which the omitted variable is no longer omitted
what does the OLS estimator solve?
the OLS estimator minimises the average squared difference between the actual values of Y_i and the predictions based on the estimated line. this yields the OLS estimators of B_0 and B_1
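a minimal numpy sketch of this minimisation (illustrative data; the closed-form expressions below are the standard single-regressor OLS solution):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.5 + 2.0 * x + rng.normal(size=500)  # true B_0 = 1.5, B_1 = 2

# OLS picks (b0, b1) to minimise sum_i (y_i - b0 - b1*x_i)^2;
# the closed-form solution is:
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Sanity check: nudging either coefficient away from the OLS solution
# can only increase the sum of squared residuals.
def ssr(a0, a1):
    return np.sum((y - a0 - a1 * x) ** 2)

assert ssr(b0, b1) <= ssr(b0 + 0.01, b1)
assert ssr(b0, b1) <= ssr(b0, b1 + 0.01)
print(b0, b1)  # close to 1.5 and 2
```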
what is R^2?
the fraction of the sample variance of Y which is explained by the regression
what is the adjusted R^2?
the adjusted R^2 corrects for the fact that R^2 always increases when you add another regressor, by penalising you for including an additional regressor
what is the equation of the adjusted R^2?
adjusted R^2 = 1 - [(n-1)/(n-k-1)] * SSR/TSS, where k is the number of regressors
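the formula can be checked numerically (a sketch on simulated data; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

SSR = resid @ resid                # sum of squared residuals
TSS = np.sum((y - y.mean()) ** 2)  # total sum of squares

r2 = 1 - SSR / TSS
adj_r2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS

print(r2, adj_r2)  # the adjusted R^2 is the smaller of the two
```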
what happens to the difference between the adjusted R^2 and R^2 when n is large?
when n is large the difference between the adjusted R^2 and R^2 shrinks, because the correction factor (n-1)/(n-k-1) approaches 1 as k becomes relatively small compared to n. the adjusted R^2 is still the smaller of the two, however
what are the least squares assumptions for causal inference in multiple regression?
1) the conditional distribution of u given the X's has mean zero, that is, E(u | X_1i = x_1, …, X_ki = x_k) = 0
2) (X_1i, …, X_ki, Y_i), i = 1, …, n are i.i.d.
3) large outliers are unlikely: X_1, …, X_k and Y have finite fourth moments
4) there is no perfect multicollinearity
which least squares assumption failure leads to OVB?
the conditional distribution of u given the X's has mean zero; if this assumption fails, OVB occurs
what type of sampling makes the regressors and the dependent variable i.i.d.?
simple random sampling yields independent and identically distributed draws of the variables
why do outliers need to be rare?
OLS can be sensitive to large outliers, so you need to check your data to make sure there are no extreme values
what is perfect multicollinearity?
when one of the regressors is an exact linear function of the other regressors
what is the dummy variable trap?
when there is a set of multiple binary variables which are mutually exclusive and exhaustive. if you include all of these variables and a constant, there is perfect multicollinearity
what are the solutions to the dummy variable trap?
omit one of the groups or omit the intercept
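a quick rank check makes the trap concrete (a hypothetical sketch; three groups with deterministic membership for simplicity):

```python
import numpy as np

n = 12
group = np.array([0, 1, 2] * (n // 3))  # mutually exclusive, exhaustive groups
dummies = np.eye(3)[group]              # one binary column per group

# Trap: all three dummies plus a constant. The dummy columns sum to the
# constant column, so the design matrix is rank deficient.
X_trap = np.column_stack([np.ones(n), dummies])
print(np.linalg.matrix_rank(X_trap))    # 3, not 4: perfect multicollinearity

# Fix: omit one group (it becomes the baseline absorbed by the intercept).
X_fixed = np.column_stack([np.ones(n), dummies[:, :2]])
print(np.linalg.matrix_rank(X_fixed))   # 3 = full column rank
```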
what is the solution to perfect multicollinearity?
The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.
what is imperfect multicollinearity?
Imperfect multicollinearity occurs when two or more regressors are very highly correlated.
Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated (the estimators will have higher standard errors).
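the imprecision can be seen by comparing standard errors at low and high regressor correlation (an illustrative simulation; the covariance formula s^2 (X'X)^-1 used below assumes homoskedastic errors):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

def se_of_x1(corr):
    # Two regressors with the given correlation.
    X1 = rng.normal(size=n)
    X2 = corr * X1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
    y = X1 + X2 + rng.normal(size=n)
    A = np.column_stack([np.ones(n), X1, X2])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - 3)        # residual variance estimate
    cov = s2 * np.linalg.inv(A.T @ A)   # homoskedastic covariance matrix
    return np.sqrt(cov[1, 1])           # standard error of X1's coefficient

print(se_of_x1(0.0))   # small
print(se_of_x1(0.99))  # several times larger: imprecisely estimated
```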
what is a control variable?
A control variable W is a regressor included to hold constant factors that, if neglected, could lead the estimated causal effect of interest to suffer from omitted variable bias.
what are three interchangeable statements about what makes an effective control variable?
i. An effective control variable is one which, when included in the regression, makes the error term uncorrelated with the variable of interest.
ii. Holding constant the control variable(s), the variable of interest is "as if" randomly assigned.
iii. Among individuals (entities) with the same value of the control variable(s), the variable of interest is uncorrelated with the omitted determinants of Y.
do control variables need to be causal?
no. control variables need not be causal, and their coefficients generally do not have a causal interpretation
how do you test a single coefficient in multiple regression?
hypothesis tests and confidence intervals for a single coefficient in multiple regression follow the same logic and recipe as the slope in a single regressor model
what is a joint hypothesis?
a joint hypothesis specifies a value for two or more coefficients, that is, it imposes a restriction on two or more coefficients
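a joint hypothesis is tested with an F-statistic rather than separate t-tests. a sketch of the homoskedasticity-only F-statistic, computed from restricted and unrestricted sums of squared residuals (simulated data; names are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
X1, X2 = rng.normal(size=(2, n))
y = 2.0 + 1.0 * X1 + 0.0 * X2 + rng.normal(size=n)

def ssr(A):
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

# Joint null H0: coefficient on X1 = 0 AND coefficient on X2 = 0
# (q = 2 restrictions imposed at once).
ssr_u = ssr(np.column_stack([np.ones(n), X1, X2]))  # unrestricted model
ssr_r = ssr(np.ones((n, 1)))                        # restricted: intercept only

q, k = 2, 2
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
print(F)  # far above typical critical values (around 3), so H0 is rejected here
```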