QE 2-4: Regression analysis Flashcards

1
Q

What is the difference between ‘causal analysis’ and ‘descriptive analysis’?

A

Descriptive analysis asks how variables are correlated, and how well we can predict Y from observing X.

Causal analysis makes further counterfactual claims, asking how Y would change if we were to change X.

2
Q

What is the sample OLS problem?

A

The sample OLS problem finds the ‘line of best fit’ through a set of points that minimises the squared distance from the line to the points.
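As an illustration (data made up), the 'line of best fit' has a closed-form solution in the simple case: b1 is the sample covariance of X and Y over the sample variance of X, and b0 makes the line pass through the sample means.

```python
import numpy as np

# Sketch of the sample OLS problem: choose (b0, b1) to minimise the sum of
# squared vertical distances from the line to the points. Data are made up.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares solution:
# b1 = sample cov(x, y) / sample var(x); b0 = ybar - b1 * xbar
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
```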

3
Q

What is the population regression problem? How does it relate to the sample OLS estimates?

A

The population regression problem chooses (β0, β1) to minimise E[(Y − β0 − β1X)²]. The sample OLS estimates converge in probability to these population values.
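A quick simulation (hypothetical population, β0 = 2, β1 = 3) illustrating the convergence: the sample OLS estimates get close to the population values as the sample grows.

```python
import numpy as np

# Illustrative sketch: sample OLS estimates converging in probability to
# the (assumed) population values beta0 = 2, beta1 = 3.
rng = np.random.default_rng(0)

def ols_fit(n):
    x = rng.normal(size=n)
    y = 2.0 + 3.0 * x + rng.normal(size=n)   # population model with noise
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

small = ols_fit(50)        # noisy estimates
large = ols_fit(200_000)   # very close to (2, 3)
```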

4
Q

What is orthogonality in a causal model? How does it differ from mean independence and independence?

A

Orthogonality: cov(X,u) = 0.
Mean independence: E[u|X] = 0.
Independence: u ⫫ X.

These are nested in strength: independence (together with E[u] = 0) implies mean independence, which implies orthogonality; the converse implications do not hold.
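A hypothetical example separating the first two conditions: with X ~ N(0,1) and u = X² − 1, cov(X, u) = E[X³] = 0 by symmetry (orthogonality holds), yet E[u|X] = X² − 1 depends on X (mean independence fails). A simulation:

```python
import numpy as np

# Assumed example: an error orthogonal to X but not mean independent of it.
rng = np.random.default_rng(1)
x = rng.normal(size=500_000)
u = x**2 - 1                                  # E[u] = 0, cov(X, u) = 0

cov_xu = np.cov(x, u, bias=True)[0, 1]        # ~ 0: orthogonality holds
mean_u_far = u[np.abs(x) > 2].mean()          # well above 0: E[u|X] varies with X
```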

5
Q

What is the relationship between the population linear regression and the conditional expectation?

A

If mean independence holds, the population linear regression and the conditional expectation coincide.

If instead we only have orthogonality, this may not hold. Instead, the population regression gives us the best linear approximation to the conditional expectation of Y given X.

6
Q

What is the descriptive interpretation of the population linear regression coefficients?

A

“On average, a unit increase in X1 is associated with a b1 unit increase in Y, holding each of X2, …, Xk constant.”

7
Q

What is perfect multicollinearity? Why does it lead to the failure of multiple regression?

A

If Xk is a perfect linear function of the other regressors, X’X will be rank-deficient and not invertible.
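A small numerical check (made-up data): when one regressor is an exact linear function of another, X′X loses full rank.

```python
import numpy as np

# Sketch of perfect multicollinearity: x2 is an exact linear function of x1,
# so X'X is rank-deficient and cannot be inverted.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1 + 1.0                          # perfect linear function of x1
X = np.column_stack([np.ones(4), x1, x2])    # intercept, x1, x2

XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)            # 2 rather than 3
```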

8
Q

What are the problems associated with imperfect multicollinearity?

A

Imperfect multicollinearity refers to a strong (but not perfect) linear relationship among the regressors. When this is high, the variance of the OLS estimator is inflated, so the estimates are less precise.

9
Q

What is the standard error of the regression? What does it attempt to measure?

A

This is an estimate of σ_u = √(σ²_u), the standard deviation of the error term. Informally, this estimates the average distance that the observed values fall from the regression line.
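A sketch of the calculation with made-up data, using the degrees-of-freedom correction n − k − 1 (here k = 1 regressor):

```python
import numpy as np

# Standard error of the regression: sqrt(SSR / (n - k - 1)).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 5.1, 5.9])

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)
n, k = len(x), 1
ser = np.sqrt(np.sum(residuals**2) / (n - k - 1))
```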

10
Q

What is R^2?

A

R^2 measures how much of the variation in Y is explained by the model. It is a unitless measure that always falls between 0 and 1.
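In symbols, R² = 1 − SSR/TSS. A sketch with made-up, nearly linear data:

```python
import numpy as np

# R^2 = 1 - SSR/TSS: the share of the variation in Y explained by the model.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.9, 4.4, 6.1, 7.4])

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
ssr = np.sum((y - (b0 + b1 * x)) ** 2)   # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
r2 = 1 - ssr / tss                       # close to 1 for near-linear data
```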

11
Q

What is the ‘adjusted’ R^2? Why might we prefer this to R^2?

A

The R^2 increases automatically when extra regressors are included, even if the variables added have no predictive power. The adjusted R^2 therefore imposes a penalty for adding additional explanatory variables to a model.
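The penalty is visible directly in the formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A sketch with assumed numbers:

```python
# Adjusted R^2 penalises extra regressors: for fixed R^2 and n, the
# adjustment grows with k. Numbers below are purely illustrative.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

few = adjusted_r2(0.50, n=30, k=2)     # mild penalty with few regressors
many = adjusted_r2(0.50, n=30, k=20)   # heavy penalty with many regressors
```

Note that, unlike R², the adjusted R² can fall when a useless regressor is added, and can even go negative.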

12
Q

What is homoscedasticity?

A

Errors are homoscedastic if the variance of the error term, conditional on the regressors, is the same across all observations: var(u|X) = σ²_u.

13
Q

What factors make OLS more precise?

A
  • Increased model fit (smaller error variance).
  • Larger sample size.
  • Greater sample variance in X1.
  • Low correlation between X1 and the other regressors.
14
Q

How can we test multiple linear hypotheses, such as b7 = b8 = 0?

A

Use an F test, which compares the fit (SSR) of the restricted and unrestricted models.
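A sketch of the F statistic for the joint null b2 = b3 = 0, built from restricted and unrestricted sums of squared residuals (simulated data in which x2 and x3 are truly irrelevant):

```python
import numpy as np

# F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k - 1)),
# where q is the number of restrictions and k the unrestricted regressors.
rng = np.random.default_rng(2)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # x2, x3 have zero true coefficients

def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
ssr_u = ssr(np.column_stack([ones, x1, x2, x3]), y)  # unrestricted, k = 3
ssr_r = ssr(np.column_stack([ones, x1]), y)          # restricted: b2 = b3 = 0
q = 2
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - 3 - 1))
# Compare F to the F(q, n - k - 1) critical value (roughly 3.0 at 5% here).
```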

15
Q

What is the correct interpretation of a confidence interval?

A

[a, b] is an interval such that 95% of intervals constructed in this way will contain the true value b1.

Or

[a, b] is the range of hypothesised values that a two-sided t-test will not reject at the 5% significance level based on the current sample.

16
Q

What is the difference between type 1 and type 2 errors?

A

Type I error = rejecting a true null. Controlled by the significance level of the test.

Type II = failing to reject a false null.

17
Q

What is the ‘power-size tradeoff’?

A

Power is the probability of rejecting the null hypothesis, if it is false.

Size is the probability of incorrectly rejecting the null hypothesis, if it is true.

There is a tradeoff between power and size. Increasing the power of a test (ie making it more likely that we correctly reject false nulls) will almost always also make us more likely to reject true nulls, increasing the size.

18
Q

What types of nonlinearity can we incorporate into multiple regression? What types can we not?

A

Regressions must be ‘linear in the parameters’. We can therefore include nonlinear transformations of the regressors (polynomials, logarithms, interactions, dummies), but models that are nonlinear in the parameters themselves (eg Y = b0 + X^b1 + u) cannot be estimated by OLS.

19
Q

Why include dummy terms in a regression?

A

This allows different sub-groups to have different intercept parameters from one another.

20
Q

What model specification should be used to estimate elasticities?

A

A log-log model, log Y = b0 + b1 log X, in which b1 is the elasticity of Y with respect to X.
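A quick numerical check with assumed coefficients: in a log-log model a 1% increase in X multiplies Y by (1.01)^b1, ie roughly a b1% change.

```python
import math

# Illustrative log-log model: log Y = b0 + b1 * log X, with assumed b1 = 2,
# so the elasticity is 2: a 1% rise in X raises Y by about 2%.
b0, b1 = 0.5, 2.0

def y(x):
    return math.exp(b0 + b1 * math.log(x))

pct_change = (y(101.0) - y(100.0)) / y(100.0) * 100   # about b1 percent
```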

21
Q

How should a linear-log model Y = b0 + b1 log X be interpreted?

A

A 1% increase in X is associated with a 0.01·b1 unit change in Y.
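This follows because ΔY = b1·Δ(log X) and log(1.01) ≈ 0.01. A small check with assumed coefficients:

```python
import math

# Illustrative linear-log model Y = b0 + b1 * log X with assumed b1 = 5:
# a 1% rise in X changes Y by b1 * log(1.01), approximately 0.01 * b1 = 0.05.
b0, b1 = 3.0, 5.0

def y(x):
    return b0 + b1 * math.log(x)

change = y(101.0) - y(100.0)   # close to 0.01 * b1
```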