Chapter 3.2 Multicollinearity/Homoscedasticity Flashcards

1
Q

Explain multicollinearity.

A

–> multicollinearity: the predictors are (nearly) linearly dependent
–> the regression assumption is that there is no exact linear dependency between predictors; we can use the rank of the observation matrix to check that
–> if two predictors are highly correlated, we might as well omit one of them

2
Q

What’s the observation matrix?

How do we know there’s an exact linear relationship? What does this also say?

A

It has the shape n x (p+1), with (p+1) < n
(n observations; p predictors plus a column of ones for the intercept)

If rank(X) < p+1, there is an exact linear relationship among the columns, so X'X is singular –> i.e. we can’t calculate the closed-form solution (X'X)^-1 X'y.
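A minimal sketch of the rank check, assuming NumPy and a made-up 5 x 3 design matrix whose last two columns are identical:

```python
import numpy as np

# Hypothetical design matrix: n = 5, p = 2 predictors plus an intercept column.
# The third column copies the second, creating perfect collinearity.
X = np.array([
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 5.0, 5.0],
    [1.0, 7.0, 7.0],
    [1.0, 9.0, 9.0],
])

rank = np.linalg.matrix_rank(X)
cols = X.shape[1]  # p + 1

if rank < cols:
    # X'X is singular: the closed-form OLS solution (X'X)^-1 X'y does not exist
    print(f"rank(X) = {rank} < p+1 = {cols}: exact linear dependency")
else:
    print(f"rank(X) = {rank} = p+1: columns are linearly independent")
```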

3
Q

… leads to issues w.r.t. the significance of predictors

A

high correlation between independent variables (multicollinearity)

4
Q

A basic check of multicollinearity is to …
* … indicate problems.
− “Large” means greater than the correlations between the predictors and the response.
* It is possible that the pairwise correlations are small, and yet a …

A

calculate the correlation coefficient for each pair of predictor variables.

Large correlations (both positive and negative)

linear dependence exists among three or even more variables.
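A quick sketch of the pairwise check, assuming pandas and three hypothetical predictors x1–x3 (x2 built to be nearly collinear with x1):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": 0.9 * x1 + 0.1 * rng.normal(size=100),  # near-duplicate of x1
    "x3": rng.normal(size=100),                   # independent predictor
})

# Pairwise correlations between predictors; entries near +/-1 flag problems
print(df.corr().round(2))
```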

5
Q

Alternatively, we can use the Variance Inflation Factor (VIF).

What would a VIF of 10 mean?

Because …, removing the predictor in question should not cause a substantive decrease in overall 𝑅^2.
The rule of thumb is to remove variables with VIF scores greater than 10.

A

VIF_k = (1 - R_k^2)^-1
where R_k^2 is the R^2 of the auxiliary regression of predictor k on all other predictors (i.e., predictor k acts as the dependent variable there)

A VIF of 10 ==> R_k^2 = 0.9

i.e. 90% of the variance in the predictor in question can be explained by the other independent variables,

so much of its variance is already captured elsewhere
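A sketch using statsmodels’ variance_inflation_factor on the same kind of hypothetical data as above (the data and column names are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": 0.9 * x1 + 0.1 * rng.normal(size=100),  # near-duplicate of x1
    "x3": rng.normal(size=100),
})

X = sm.add_constant(df)  # intercept column, as in the n x (p+1) design matrix

# VIF_k = 1 / (1 - R_k^2); rule of thumb: drop predictors with VIF > 10
# (the intercept's VIF is not meaningful, only x1-x3 matter)
for k, name in enumerate(X.columns):
    print(f"{name}: VIF = {variance_inflation_factor(X.values, k):.2f}")
```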

6
Q

Consequence - Non-Significance
If a variable has a non-significant 𝑡-value, then either
* ….
* the variable is related to the response, but it is not required in the regression because …

The usual remedy is to drop one or more variables from the model.

A

the variable is not related to the response, or
(Small 𝑡-value, small VIF, small correlation with response)

it is strongly related to a third variable that is in the regression, so we don’t need both
(Small 𝑡-value, high VIF, high correlation with response)

7
Q

Homoscedasticity
When the requirement of a …, we have homoscedasticity.
* If the data is not homoscedastic, a different estimator (e.g., weighted least squares) might be better than OLS.
* We also assume residuals to be normally distributed.

A

constant variance is not violated

The spread of the data points does not change much.
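A hedged sketch of the weighted-least-squares alternative with statsmodels; the variance model Var(e_i) ~ x_i^2, and hence the weights 1/x^2, are assumptions for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
# Heteroscedastic toy data: the error spread grows with x
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=x.size)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
# WLS downweights the noisy observations; weights ~ 1 / Var(e_i)
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()

print("OLS params:", ols.params.round(3))
print("WLS params:", wls.params.round(3))
```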

8
Q

Explain heteroscedasticity.

A

- the requirement of constant variance is violated
- leads to biased standard-error estimates, and hence unreliable p-values in significance tests (the coefficient estimates themselves stay unbiased)

9
Q

Explain Glejser test.

A

Fit the regression, then regress the absolute residuals |e_i| on the predictor(s); a significant coefficient means the error spread depends on that predictor, i.e. heteroscedasticity.

(slide 28, to review)
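statsmodels has no built-in Glejser test, so here is a minimal manual sketch on the same made-up heteroscedastic data as before:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=x.size)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Glejser: regress |residuals| on the predictor; a significant slope means
# the error spread depends on x, i.e. heteroscedasticity
aux = sm.OLS(np.abs(resid), X).fit()
print("slope p-value:", aux.pvalues[1])  # small p-value -> heteroscedastic
```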

10
Q

Explain White test.

A

Regress the squared residuals e_i^2 on the predictors, their squares, and their cross products; under homoscedasticity the statistic n * R^2 of this auxiliary regression follows a chi-squared distribution. A significant statistic indicates heteroscedasticity.

(slide 29, to review)
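statsmodels does ship a White test (het_white); a sketch on the same made-up data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=x.size)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Auxiliary regression of resid^2 on x, x^2 (plus cross products when there
# are several predictors); LM statistic is n * R^2, chi-squared under H0
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print("LM p-value:", lm_pvalue)  # small p-value -> heteroscedasticity
```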
