QA9 - Regression Diagnostics Flashcards

1
Q

Explain how to test whether a regression is affected by heteroskedasticity

A

Heteroskedasticity is where the variance of the error term varies systematically with one or more of the explanatory variables

  1. Estimate the model and compute the residuals ei
  2. Regress the squared residuals ei^2 on a constant, all explanatory variables, their squares, and all cross products
    (ei^2 = v0 + v1 * Xi1 + v2 * Xi2 + v3 * Xi1^2 + v4 * Xi1 * Xi2 + v5 * Xi2^2 + hi)
  3. Under homoscedasticity the null H0: v1 = v2 = ... = v5 = 0 holds; test it with an F (or LM) test on this auxiliary regression (see the sketch below)
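A minimal sketch of this test using statsmodels' het_white, which builds the auxiliary regression on the squared residuals automatically; the simulated data and variable names (y, X1, X2) are illustrative assumptions, not part of the card.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
n = 500
X1, X2 = rng.normal(size=n), rng.normal(size=n)
# Error variance grows with X1, so the data are heteroskedastic by construction
y = 1.0 + 0.5 * X1 - 0.3 * X2 + rng.normal(scale=np.exp(0.5 * X1), size=n)

X = sm.add_constant(np.column_stack([X1, X2]))
resid = sm.OLS(y, X).fit().resid

# het_white regresses resid^2 on X, its squares and cross products, and
# returns LM and F statistics with their p-values
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"White test F p-value: {f_pvalue:.4f}")  # small p-value => reject homoscedasticity
```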
2
Q

Describe approaches to handling heteroskedastic data

A
  1. ignore it and use heteroskedasticity-robust standard errors for hypothesis testing
  2. transform the data to remove it (taking logs, for example)
  3. use weighted least squares for parameter estimation (see the sketch below)
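A minimal sketch of approaches 1 and 3, assuming simulated data with an error variance proportional to exp(x); the WLS weights are an illustrative assumption, not the card's prescription.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
# Error standard deviation grows with x, so plain OLS standard errors are unreliable
y = 2.0 + 1.5 * x + rng.normal(scale=np.exp(0.5 * x), size=n)

# Approach 1: keep the OLS point estimates, use robust (HC3) standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")
print(robust_fit.bse)

# Approach 3: weighted least squares, weighting by the inverse error variance
# (the variance function exp(x) is assumed known here for illustration)
wls_fit = sm.WLS(y, X, weights=1.0 / np.exp(x)).fit()
print(wls_fit.params)
```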
3
Q

Characterise multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity

A

Multicollinearity is where one or more explanatory variables can be substantially (but not perfectly) explained by the others. It inflates the standard errors of the affected coefficients, so variables that are jointly significant (large F-statistic) may have very small individual t-stats.

Perfect collinearity is where one variable is an exact linear combination of the others; in that case X'X is not invertible, OLS cannot be computed, and one of the offending variables must be dropped.
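A minimal sketch of quantifying "substantially explained by the others" via an auxiliary regression of one regressor on the rest; the simulated, nearly collinear data are an assumption, and the variance inflation factor in the comment is a standard summary of the auxiliary R^2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.4 * x3 + 0.1 * rng.normal(size=n)  # x1 nearly collinear with x2, x3

# Auxiliary regression: how well do the other regressors explain x1?
aux = sm.OLS(x1, sm.add_constant(np.column_stack([x2, x3]))).fit()
r2 = aux.rsquared
print(f"auxiliary R^2 = {r2:.3f}, VIF = {1 / (1 - r2):.1f}")  # large VIF => multicollinearity
```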

4
Q

Describe the consequences of excluding a relevant explanatory variable from a model and contrast those with the consequences of including an irrelevant regressor

A

Omitting a relevant variable biases the coefficient estimates of the included regressors (omitted-variable bias) whenever the omitted variable is correlated with them

Including an irrelevant variable leaves the estimators unbiased but less precise, and reduces adjusted R^2 through the penalty factor
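A minimal simulation sketch of the omitted-variable bias described above; the coefficients and the correlation between the hypothetical regressors x1 and x2 are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()  # x2 omitted

print(full.params[1])   # close to the true 2.0
print(short.params[1])  # biased upward, roughly 2.0 + 1.5 * 0.8
```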

5
Q

Explain two model selection procedures and how these relate to the bias variance tradeoff

A

The bias-variance trade-off balances large models, which have low bias but imprecise (high-variance) parameter estimates, against small models, which have higher bias but less estimation error

  1. General-to-specific:
    - start with a general model containing many candidate variables
    - remove the variable with the smallest insignificant t-stat and re-estimate
    - repeat until all remaining variables are significant
  2. m-fold cross-validation (see the sketch below):
    - split the data into m blocks and fit each candidate model on m - 1 of them
    - compute the prediction residuals on the held-out block, rotating so every block is held out once
    - choose the model with the smallest total out-of-sample sum of squared residuals
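A minimal sketch of m-fold cross-validation with m = 5, comparing two hypothetical candidate specifications with scikit-learn; the simulated data and the candidate variable sets are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)  # X[:, 2] is irrelevant

candidates = {"x1,x2": [0, 1], "x1,x2,x3": [0, 1, 2]}
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for name, cols in candidates.items():
    sse = 0.0
    for train_idx, test_idx in kf.split(X):
        model = LinearRegression().fit(X[train_idx][:, cols], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx][:, cols])
        sse += np.sum(resid ** 2)  # out-of-sample squared residuals on the held-out block
    print(f"{name}: cross-validated SSE = {sse:.1f}")
```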
6
Q

Describe methods for identifying outliers and their impact

A

Outliers can be identified with Cook's distance: for each observation j, compare the fitted values from the full model with those from the model re-estimated with observation j dropped. A common rule of thumb is that Dj > 1 flags an influential outlier. Such points can pull the fitted line towards themselves and distort the parameter estimates (see the sketch below).
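A minimal sketch of computing Cook's distance with statsmodels; the simulated data and the injected high-leverage outlier are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 4.0, -10.0  # inject a high-leverage point far from the true line

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance  # distances and p-values
print(cooks_d.max(), np.where(cooks_d > 1)[0])   # observations with Dj > 1 are flagged
```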
