QE 2-4: Regression analysis Flashcards
What is the difference between ‘causal analysis’ and ‘descriptive analysis’?
Descriptive analysis asks how variables are correlated, and how well we can predict Y from observing X.
Causal analysis makes further counterfactual claims, asking how Y would change if we were to change X.
What is the sample OLS problem?
The sample OLS problem finds the ‘line of best fit’ through a set of points: it chooses the intercept and slope that minimise the sum of squared vertical distances (residuals) from the points to the line.
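In symbols, for the bivariate case, the problem and its standard closed-form solution are:

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2,
\qquad
\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\quad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```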
What is the population regression problem? How does it relate to the sample OLS estimates?
The population regression problem minimises E[(Y − β0 − β1X)²] over (β0, β1). The sample OLS estimates converge in probability to these population values.
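The population coefficients have a closed form in population moments, whose sample counterparts are exactly the OLS formulas above:

```latex
\beta_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},
\qquad
\beta_0 = \mathbb{E}[Y] - \beta_1 \mathbb{E}[X]
```

By the law of large numbers, sample moments converge in probability to population moments, which is why the sample estimates converge to the population values.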
What is orthogonality in a causal model? How does it differ from mean independence and independence?
Orthogonality: cov(X,u) = 0.
Mean independence: E[u|X] = 0.
Independence: u ⫫ X.
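These conditions are strictly nested: independence implies mean independence, which implies orthogonality, but neither converse holds. A minimal simulation sketch (variable names are illustrative) of a case where orthogonality holds but mean independence fails:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1): symmetric, so E[X^3] = 0
u = x**2 - 1                         # E[u] = 0, but E[u | X] = X^2 - 1 depends on X

# Orthogonality holds: cov(X, u) = E[X^3] - E[X] * E[X^2 - 1] = 0
print(np.cov(x, u)[0, 1])            # approximately 0

# Mean independence fails: conditional means of u differ across regions of X
far = np.abs(x) > 2
print(u[far].mean(), u[~far].mean())  # clearly positive vs. clearly negative
```

Here u is a deterministic function of X, so it is certainly not independent of X, yet it is uncorrelated with X.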
What is the relationship between the population linear regression and the conditional expectation?
If mean independence holds, the population linear regression and the conditional expectation coincide.
If instead we only have orthogonality, this need not hold. The population regression then gives the best linear approximation (in mean-squared-error terms) to the conditional expectation of Y given X.
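This best-linear-approximation property can be stated precisely: the same coefficients solve both minimisation problems below (a standard result, written here for the bivariate case):

```latex
(\beta_0, \beta_1)
= \arg\min_{b_0,\, b_1} \mathbb{E}\big[(Y - b_0 - b_1 X)^2\big]
= \arg\min_{b_0,\, b_1} \mathbb{E}\big[(\mathbb{E}[Y \mid X] - b_0 - b_1 X)^2\big]
```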
What is the descriptive interpretation of the population linear regression coefficients?
“On average, a one-unit increase in X1 is associated with a β1-unit increase in Y, holding each of X2, …, Xk constant.”
What is perfect multicollinearity? Why does it lead to the failure of multiple regression?
If Xk is a perfect linear function of the other regressors, X’X will be rank-deficient and not invertible, so the normal equations have no unique solution and the OLS coefficients cannot be determined.
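A minimal numpy sketch of the classic dummy-variable trap (the data here are made up for illustration): an intercept plus a dummy for every category makes the columns of X linearly dependent, so X’X is singular:

```python
import numpy as np

n = 6
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female                    # male + female = 1 = the intercept column

# Design matrix: intercept, female dummy, male dummy
X = np.column_stack([np.ones(n), female, male])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))    # 2, not 3: rank-deficient
# np.linalg.inv(XtX) would raise LinAlgError: Singular matrix
```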
What are the problems associated with imperfect multicollinearity?
Imperfect multicollinearity arises when some regressors are strongly, but not perfectly, linearly related. The stronger this relationship, the larger the sampling variance of the affected OLS coefficients, so the estimates are less precise (larger standard errors).
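A standard way to quantify this is the variance inflation factor: writing R²j for the R² from regressing Xj on the other regressors,

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

The sampling variance of the estimated coefficient on Xj is inflated by this factor, so it blows up as R²j approaches 1.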
What is the standard error of the regression? What does it attempt to measure?
This is an estimate of σu = √(σ²u), the standard deviation of the error term u. Informally, this estimates the average distance that the observed values fall from the regression line.
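With an intercept and k regressors, the usual degrees-of-freedom-corrected estimator is:

```latex
\mathrm{SER} = \sqrt{\frac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2}
```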
What is R^2?
R² measures the fraction of the variation in Y that is explained by the model. It is a unit-free measure that, in a regression with an intercept, always falls between 0 and 1.
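In terms of the explained, total, and residual sums of squares:

```latex
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}}
    = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,
\quad
\mathrm{SSR} = \sum_{i=1}^{n} \hat{u}_i^2
```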
What is the ‘adjusted’ R^2? Why might we prefer this to R^2?
The R² never decreases, and typically increases, when extra regressors are included, even if the variables added have no predictive power. The adjusted R² therefore imposes a penalty for adding additional explanatory variables to a model.
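Concretely, with n observations and k regressors, the penalty enters through a degrees-of-freedom adjustment:

```latex
\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1} \cdot \frac{\mathrm{SSR}}{\mathrm{TSS}}
```

Unlike R², the adjusted R² can fall when a useless regressor is added, and it can even be negative.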
What is homoscedasticity?
Errors are homoscedastic if their variance, conditional on the regressors, is the same across all observations: Var(ui | Xi) = σ²u for all i.
What factors make OLS more precise?
- Better model fit (a smaller error variance σ²u).
- Low correlation between X1 and the other regressors.
- A larger sample size.
- Greater sample variance in X1.
Each of these factors appears directly in the variance formula sketched below.
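Under homoscedasticity, a standard large-sample expression for the sampling variance of the estimated coefficient on X1 (with R²1 the R² from regressing X1 on the other regressors) is:

```latex
\operatorname{Var}(\hat{\beta}_1)
\approx \frac{1}{n} \cdot \frac{\sigma_u^2}{\sigma_{X_1}^2 \,(1 - R_1^2)}
```

A smaller σ²u (better fit), a larger n, more variance in X1, and a smaller R²1 (low correlation with the other regressors) all shrink this variance.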
How can we test multiple linear hypotheses, such as b7 = b8 = 0?
Use an F test, which tests all the restrictions jointly rather than via a sequence of individual t-tests.
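A minimal sketch using statsmodels (the data and the variable names x1, …, x8 are simulated purely for illustration); the fitted results object’s f_test method accepts the restrictions as a string:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data purely for illustration; x7 and x8 have no true effect
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({f"x{j}": rng.standard_normal(n) for j in range(1, 9)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.5 * df["x2"] + rng.standard_normal(n)

results = smf.ols("y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8", data=df).fit()

# Joint test of H0: b7 = b8 = 0 against the two-sided alternative
f_res = results.f_test("x7 = 0, x8 = 0")
print(f_res.fvalue, f_res.pvalue)
```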
What is the correct interpretation of a confidence interval?
[a, b] is an interval such that 95% of intervals constructed in this way will contain the true value β1.
Or
[a, b] is the range of hypothesised values that a two-sided t-test will not reject at the 5% significance level based on the current sample.