Chapter 3.2 Multicollinearity/Homoscedasticity Flashcards
Explain multicollinearity.
–> high correlation / linear dependency between the independent variables (violates the assumption that there is no linear dependency between predictors)
–> we can use the rank of the observation matrix to determine that
–> if two predictors are correlated, might as well omit one of them
What’s the observation matrix?
How do we know there’s an exact linear relationship? What does this also say?
It has the shape n x (p+1): n observations, one column per predictor plus the intercept column
with (p+1) < n
if Rank(X) < p+1, X is rank-deficient and X^T X is singular –> i.e. the closed-form OLS solution beta = (X^T X)^-1 X^T y can't be calculated
even without an exact dependency, high correlation between independent variables leads to issues wrt. the significance of predictors
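A minimal sketch (own synthetic data, numpy only, not from the slides) of the rank check and why a rank-deficient X breaks the closed-form solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
# observation (design) matrix with an intercept column: shape n x (p+1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
# introduce an exact linear dependency: last predictor = 2 * first predictor
X[:, 3] = 2 * X[:, 1]

print(np.linalg.matrix_rank(X), X.shape[1])   # rank < p+1 -> X is rank-deficient
# X^T X is then singular, so beta = (X^T X)^{-1} X^T y has no closed-form solution;
# the huge condition number signals the (near-)singularity.
print(np.linalg.cond(X.T @ X))
```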
A basic check of multicollinearity is to …
* … indicate problems.
− Large means greater than the correlations between predictors and response.
* It is possible that the pairwise correlations are small, and yet a …
calculate the correlation coefficient for each pair of predictor variables.
Large correlations (both positive and negative)
linear dependence exists among three or even more variables.
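Sketch of that basic pairwise-correlation check (illustrative synthetic predictors, not the course data set):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)        # pairwise correlation matrix of the predictors
print(np.round(corr, 2))                   # |corr(x1, x2)| close to 1 -> warning sign
```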
Alternatively, we can use the Variance Inflation Factor (VIF).
What would a VIF of 10 mean?
Because …, removing the predictor in question should not cause a substantive decrease in overall 𝑅^2.
The rule of thumb is to remove variables with VIF scores greater than 10.
VIF_k = 1 / (1 - R_k^2)
R_k^2 is the R^2 from regressing predictor k on all other predictors (predictor k acts as the dependent variable in this auxiliary regression)
A VIF of 10 ==> R_k^2 = 0.9 (90%)
i.e. 90% of the variance in the predictor in question can be explained by other independent variables.
so much of the variance is captured elsewhere
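A hand-rolled VIF sketch following the formula above (synthetic data; statsmodels' variance_inflation_factor computes the same quantity):

```python
import numpy as np

def vif(X):
    """VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing
    predictor k on all remaining predictors (with an intercept)."""
    n, p = X.shape
    out = []
    for k in range(p):
        y_k = X[:, k]
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y_k, rcond=None)
        resid = y_k - others @ beta
        r2 = 1 - resid.var() / y_k.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.2 * rng.normal(size=300)   # nearly collinear with x1
x3 = rng.normal(size=300)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))   # VIFs of x1 and x2 well above 10
```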
Consequence - Non-Significance
If a variable has a non-significant 𝑡-value, then either
* ….
* the variable is related to the response, but it is not required in the regression because …
The usual remedy is to drop one or more variables from the model.
the variable is not related to the response, or
(Small 𝑡-value, small VIF, small correlation with response)
it is strongly related to a third variable that is in the regression, so we don’t need both
(Small 𝑡-value, high VIF, high correlation with response)
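Illustration of the second case (own synthetic data, variable names are made up): a predictor that is clearly related to the response can still get a non-significant t-value when a nearly collinear predictor is already in the model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # nearly collinear with x1
y = 2 * x1 + rng.normal(size=n)        # response is driven by x1 (and hence by x2 too)

both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
alone = sm.OLS(y, sm.add_constant(x1)).fit()

print(both.tvalues[1:])    # slope t-values shrink despite the real effect
print(alone.tvalues[1])    # regressing on x1 alone gives a large, significant t-value
print(both.rsquared, alone.rsquared)   # overall R^2 barely changes when x2 is dropped
```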
Homoscedasticity
When the requirement of a …, we have homoscedasticity.
* If the data is not homoscedastic, a different estimator (e.g., weighted least squares) might be better than OLS.
* We also assume residuals to be normally distributed.
constant variance is not violated
The spread of the data points does not change much.
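Sketch of the WLS remedy mentioned above (synthetic heteroscedastic data; the true error variances are assumed known here, which is rarely the case in practice):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, size=n)
sigma = 0.5 * x                               # error spread grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)   # heteroscedastic response

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / sigma**2).fit()   # weight = inverse error variance

print(ols.bse)   # default OLS standard errors are unreliable under heteroscedasticity
print(wls.bse)   # WLS uses the variance structure and gives more trustworthy inference
```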
Explain heteroscedasticity.
-requirement of a constant variance is violated
- coefficient estimates stay unbiased, but the estimated standard errors are biased, so the p-values of significance tests become unreliable
Explain Glejser test.
–> regress the absolute values of the OLS residuals on the predictor(s); significant coefficients indicate heteroscedasticity (slide 28, to review)
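Until the slide is reviewed, a rough sketch of the test idea (own implementation on synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)   # heteroscedastic errors

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

aux = sm.OLS(np.abs(resid), X).fit()   # auxiliary regression on |residuals|
print(aux.pvalues[1])                  # small p-value -> reject homoscedasticity
```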
Explain White test.
–> regress the squared OLS residuals on the predictors, their squares, and their cross-products; compare n·R^2 of this auxiliary regression against a chi-squared distribution; a significant result indicates heteroscedasticity (slide 29, to review)
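Rough sketch using statsmodels' het_white on the same kind of synthetic data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)   # heteroscedastic errors

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(lm_pvalue)   # small p-value -> evidence of heteroscedasticity
```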