QE 2-4: Regression analysis Flashcards
What is the difference between ‘causal analysis’ and ‘descriptive analysis’?
Descriptive analysis asks how variables are correlated, and how well we can predict Y from observing X.
Causal analysis makes further counterfactual claims, asking how Y would change if we were to change X.
What is the sample OLS problem?
The sample OLS problem finds the ‘line of best fit’ through a set of points: it chooses the intercept and slope that minimise the sum of squared vertical distances (residuals) from the points to the line.
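In symbols, for the bivariate case, the problem and its standard closed-form solution are:

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2,
\qquad
\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\quad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```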
What is the population regression problem? How does it relate to the sample OLS estimates?
The population regression problem minimises E[(Y − β0 − β1X)²] over (β0, β1). The sample OLS estimates converge in probability to these population values.
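The population coefficients have a closed form in population moments, whose sample counterparts are exactly the OLS formulas above:

```latex
\beta_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},
\qquad
\beta_0 = \mathbb{E}[Y] - \beta_1 \mathbb{E}[X]
```

By the law of large numbers, sample moments converge in probability to population moments, which is why the sample estimates converge to the population values.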
What is orthogonality in a causal model? How does it differ from mean independence and independence?
Orthogonality: cov(X,u) = 0.
Mean independence: E[u|X] = 0.
Independence: u ⫫ X.
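These conditions are strictly nested: independence implies mean independence, which implies orthogonality, but neither converse holds. A minimal simulation sketch (variable names are illustrative) of a case where orthogonality holds but mean independence fails:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1): symmetric, so E[X^3] = 0
u = x**2 - 1                         # E[u] = 0, but E[u | X] = X^2 - 1 depends on X

# Orthogonality holds: cov(X, u) = E[X^3] - E[X] * E[X^2 - 1] = 0
print(np.cov(x, u)[0, 1])            # approximately 0

# Mean independence fails: conditional means of u differ across regions of X
far = np.abs(x) > 2
print(u[far].mean(), u[~far].mean())  # clearly positive vs. clearly negative
```

Here u is a deterministic function of X, so it is certainly not independent of X, yet it is uncorrelated with X.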
What is the relationship between the population linear regression and the conditional expectation?
If mean independence holds, the population linear regression and the conditional expectation coincide.
If instead we only have orthogonality, this need not hold. The population regression then gives the best linear approximation (in mean-squared-error terms) to the conditional expectation of Y given X.
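This best-linear-approximation property can be stated precisely: the same coefficients solve both minimisation problems below (a standard result, written here for the bivariate case):

```latex
(\beta_0, \beta_1)
= \arg\min_{b_0,\, b_1} \mathbb{E}\big[(Y - b_0 - b_1 X)^2\big]
= \arg\min_{b_0,\, b_1} \mathbb{E}\big[(\mathbb{E}[Y \mid X] - b_0 - b_1 X)^2\big]
```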
What is the descriptive interpretation of the population linear regression coefficients?
“On average, a one-unit increase in X1 is associated with a β1-unit increase in Y, holding each of X2, …, Xk constant.”
What is perfect multicollinearity? Why does it lead to the failure of multiple regression?
If Xk is a perfect linear function of the other regressors, X’X will be rank-deficient and not invertible, so the normal equations have no unique solution and the OLS coefficients cannot be determined.
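A minimal numpy sketch of the classic dummy-variable trap (the data here are made up for illustration): an intercept plus a dummy for every category makes the columns of X linearly dependent, so X’X is singular:

```python
import numpy as np

n = 6
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female                    # male + female = 1 = the intercept column

# Design matrix: intercept, female dummy, male dummy
X = np.column_stack([np.ones(n), female, male])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))    # 2, not 3: rank-deficient
# np.linalg.inv(XtX) would raise LinAlgError: Singular matrix
```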
What are the problems associated with imperfect multicollinearity?
Imperfect multicollinearity arises when some regressors are strongly, but not perfectly, linearly related. The stronger this relationship, the larger the sampling variance of the affected OLS coefficients, so the estimates are less precise (larger standard errors).
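A standard way to quantify this is the variance inflation factor: writing R²j for the R² from regressing Xj on the other regressors,

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

The sampling variance of the estimated coefficient on Xj is inflated by this factor, so it blows up as R²j approaches 1.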
What is the standard error of the regression? What does it attempt to measure?
This is an estimate of σu = √(σ²u), the standard deviation of the error term u. Informally, this estimates the average distance that the observed values fall from the regression line.
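With an intercept and k regressors, the usual degrees-of-freedom-corrected estimator is:

```latex
\mathrm{SER} = \sqrt{\frac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2}
```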
What is R^2?
R² measures the fraction of the variation in Y that is explained by the model. It is a unit-free measure that, in a regression with an intercept, always falls between 0 and 1.
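In terms of the explained, total, and residual sums of squares:

```latex
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}}
    = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,
\quad
\mathrm{SSR} = \sum_{i=1}^{n} \hat{u}_i^2
```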
What is the ‘adjusted’ R^2? Why might we prefer this to R^2?
The R² never decreases, and typically increases, when extra regressors are included, even if the variables added have no predictive power. The adjusted R² therefore imposes a penalty for adding additional explanatory variables to a model.
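Concretely, with n observations and k regressors, the penalty enters through a degrees-of-freedom adjustment:

```latex
\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1} \cdot \frac{\mathrm{SSR}}{\mathrm{TSS}}
```

Unlike R², the adjusted R² can fall when a useless regressor is added, and it can even be negative.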
What is homoscedasticity?
Errors are homoscedastic if their variance, conditional on the regressors, is the same across all observations: Var(ui | Xi) = σ²u for all i.
What factors make OLS more precise?
- Better model fit (a smaller error variance σ²u).
- Low correlation between X1 and the other regressors.
- A larger sample size.
- Greater sample variance in X1.
Each of these factors appears directly in the variance formula sketched below.
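Under homoscedasticity, a standard large-sample expression for the sampling variance of the estimated coefficient on X1 (with R²1 the R² from regressing X1 on the other regressors) is:

```latex
\operatorname{Var}(\hat{\beta}_1)
\approx \frac{1}{n} \cdot \frac{\sigma_u^2}{\sigma_{X_1}^2 \,(1 - R_1^2)}
```

A smaller σ²u (better fit), a larger n, more variance in X1, and a smaller R²1 (low correlation with the other regressors) all shrink this variance.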
How can we test multiple linear hypotheses, such as b7 = b8 = 0?
Use an F test, which tests all the restrictions jointly rather than via a sequence of individual t-tests.
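A minimal sketch using statsmodels (the data and the variable names x1, …, x8 are simulated purely for illustration); the fitted results object’s f_test method accepts the restrictions as a string:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data purely for illustration; x7 and x8 have no true effect
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({f"x{j}": rng.standard_normal(n) for j in range(1, 9)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.5 * df["x2"] + rng.standard_normal(n)

results = smf.ols("y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8", data=df).fit()

# Joint test of H0: b7 = b8 = 0 against the two-sided alternative
f_res = results.f_test("x7 = 0, x8 = 0")
print(f_res.fvalue, f_res.pvalue)
```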
What is the correct interpretation of a confidence interval?
[a, b] is an interval such that 95% of intervals constructed in this way will contain the true value β1.
Or
[a, b] is the range of hypothesised values that a two-sided t-test will not reject at the 5% significance level based on the current sample.