QA9 - Regression Diagnostics Flashcards
Explain how to test whether a regression is affected by heteroskedasticity
Heteroskedasticity is where the variance of the error term varies systematically with one or more explanatory variables. White's test detects it:
- Estimate the model and compute the residuals ei
- Regress ei^2 on a constant, all explanatory variables, their squares, and their cross-products
(ei^2 = v0 + v1 * Xi1 + v2 * Xi2 + v3 * Xi1^2 + v4 * Xi1 * Xi2 + v5 * Xi2^2 + ni)
- Under the null of homoskedasticity, H0: v1 = v2 = ... = v5 = 0; the LM statistic n * R^2 from the auxiliary regression is chi-squared distributed under H0
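The test above can be sketched in numpy; the simulated data, sample size, and function names here are my own illustration, not from the source:

```python
import numpy as np

# Simulated data whose error variance grows with x1 (illustrative only)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n) * (0.5 + 2.0 * x1**2)

def ols_residuals(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Step 1: estimate the model and compute residuals e_i
X = np.column_stack([np.ones(n), x1, x2])
e = ols_residuals(X, y)

# Step 2: auxiliary regression of e_i^2 on constant, levels, squares, cross-products
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x1 * x2, x2**2])
u = ols_residuals(Z, e**2)
r2 = 1.0 - (u @ u) / np.sum((e**2 - np.mean(e**2))**2)

# Step 3: LM statistic n * R^2, chi-squared with 5 df under H0 (homoskedasticity)
lm = n * r2
print(lm)  # large values reject homoskedasticity
```

With heteroskedasticity this strong, the LM statistic lands far above the 5% chi-squared(5) critical value of about 11.1.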
Describe approaches to using heteroskedastic data
- ignore it and use heteroskedasticity-robust standard errors for hypothesis testing
- transform the data to remove it (e.g. take logs)
- use weighted least squares (WLS) for parameter estimation
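The first approach can be sketched with a White (HC0) "sandwich" covariance in numpy; the function name and simulated data are my own:

```python
import numpy as np

def ols_robust(X, y):
    """OLS coefficients with conventional and White (HC0) robust standard errors.
    Minimal sketch, not a library implementation."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Conventional SEs assume a constant error variance
    se_conv = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    # Sandwich estimator: the "meat" weights each observation by its squared residual
    cov_hc0 = XtX_inv @ (X.T @ (X * (e**2)[:, None])) @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_hc0))
    return beta, se_conv, se_robust

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))  # heteroskedastic errors
beta, se_conv, se_robust = ols_robust(np.column_stack([np.ones(n), x]), y)
```

Because the high-variance observations here sit at high leverage, the robust standard error on the slope comes out larger than the conventional one, which understates the uncertainty.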
Characterise multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity
Multicollinearity is where one or more explanatory variables can be substantially explained by the others. It inflates the standard errors of the affected coefficients, so variables that are jointly significant may have very small individual t-statistics.
Perfect collinearity is where one of the variables is perfectly described by another
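Multicollinearity is commonly measured with variance inflation factors; a minimal numpy sketch (function name and simulated data are my own):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (constant excluded).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean())**2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # independent of the others
vifs = vif(np.column_stack([x1, x2, x3]))
```

The two nearly collinear columns produce very large VIFs, while the independent column stays near 1; under perfect collinearity the auxiliary R^2 would be exactly 1 and the VIF infinite.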
Describe the consequences of excluding a relevant explanatory variable from a model and contrast those with the consequences of including an irrelevant regressor
Omitting a relevant variable biases the remaining coefficient estimates (omitted variable bias) whenever the omitted variable is correlated with the included ones
Including an irrelevant variable leaves the estimates unbiased but less precise, and reduces adjusted R^2 through the penalty for extra parameters
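A short simulation makes the omitted variable bias concrete; the model coefficients (2 and 3) and the correlation (0.8) are illustrative numbers of my own:

```python
import numpy as np

# True model: y = 1 + 2*x1 + 3*x2, with x2 correlated with x1
rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)  # recovers (1, 2, 3)
b_omit = ols(np.column_stack([np.ones(n), x1]), y)      # x2 omitted
# Omitting x2 shifts the x1 coefficient by roughly 3 * 0.8 = 2.4,
# so b_omit[1] lands near 4.4 instead of 2
```

The bias equals the omitted coefficient times the regression coefficient of the omitted variable on the included one, so it vanishes only when the two regressors are uncorrelated.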
Explain two model selection procedures and how these relate to the bias variance tradeoff
The bias-variance tradeoff balances large models, which have low bias but imprecise (high-variance) parameter estimates, against small models, which have less estimation error but more bias
- General-to-specific:
- start with a large model containing all candidate variables
- remove the insignificant variable with the smallest t-statistic
- re-estimate and repeat until all remaining variables are significant
- m-fold cross-validation:
- split the data into m blocks; fit each candidate model on m - 1 blocks
- compute the residuals on the held-out block
- rotate through all m blocks and choose the model with the smallest total out-of-sample sum of squared residuals
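The cross-validation procedure can be sketched in numpy; the function name and simulated candidate models are my own illustration:

```python
import numpy as np

def mfold_cv_sse(X, y, m=5, seed=0):
    """m-fold cross-validation: fit on m-1 blocks, accumulate the squared
    residuals on each held-out block. Minimal sketch."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, m)
    sse = 0.0
    for k in range(m):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(m) if j != k])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        sse += resid @ resid
    return sse

rng = np.random.default_rng(4)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

sse_small = mfold_cv_sse(np.column_stack([np.ones(n), x1]), y)      # omits x2
sse_full = mfold_cv_sse(np.column_stack([np.ones(n), x1, x2]), y)   # true model
# The candidate that includes the relevant regressor wins on out-of-sample SSE
```

Because the error is scored on data the model was not fitted to, cross-validation penalises both underfitting (missing x2 here) and overfitting, which is how it navigates the bias-variance tradeoff.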
Describe methods for identifying outliers and their impact
Outliers can be identified with Cook's distance Dj, which measures how much the fitted values change when observation j is dropped and the model re-estimated; as a rule of thumb, Dj > 1 flags observation j as an influential outlier
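Cook's distance has a closed form in terms of residuals and leverages, which avoids refitting the model n times; a numpy sketch with my own simulated data:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_j via the leverage formula, equivalent to comparing
    fitted values with and without observation j (no explicit refitting)."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # residuals
    s2 = (e @ e) / (n - k)                 # residual variance estimate
    return (e**2 / (k * s2)) * h / (1.0 - h)**2

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 5.0, -20.0                    # plant a high-leverage outlier
d = cooks_distance(np.column_stack([np.ones(n), x]), y)
# d[0] exceeds the rule-of-thumb threshold of 1
```

Note that D_j is large only when an observation has both a big residual and high leverage; a large residual at a typical x value may still leave the fitted line nearly unchanged.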