Chapter 3 - Linear regression Flashcards
Residuals vs Fitted plot
* Purpose
* Look For
* x- and y-axis
- Check linearity and constant variance (homoscedasticity)
- Random scatter (no patterns or funnels):
Linear relationship if the residuals scatter randomly around y = 0.
Non-linearity or missing interactions if there are patterns or trends.
Heteroscedasticity (non-constant variance) if there is a funnel shape.
- x-axis: Fitted values; y-axis: Residuals
Outliers
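Unusual response value (y). An observation whose observed value lies far from the value the model predicts, i.e. it has a large residual.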
Leverage
Unusual predictor values (x). Measures how far an observation’s predictor values are from the mean of the predictors.
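As an illustration of the leverage card, here is a minimal sketch for simple linear regression (one predictor plus an intercept), where leverage has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The function name `leverages` and the data are hypothetical, chosen only for this example.

```python
def leverages(x):
    """Leverage h_i of each observation in a simple linear regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    # Total squared distance of the predictor values from their mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}  leverage = {hi:.3f}")
```

The leverages always sum to the number of fitted coefficients (2 here), so the average leverage is 2/n. A common rule of thumb flags points well above that average, although the exact cutoff is a matter of convention.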
Q-Q Plot
* Purpose
* Look For
* x- and y-axis
- Check whether the residuals are approximately normally distributed
- Points following the 45 degree reference line:
Residuals are approximately normal if the points lie close to the 45 degree line.
Non-normality (e.g., outliers in the tails) if the points deviate from the line.
- x-axis: Quantiles from a standard normal distribution; y-axis: Standardized residuals
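The coordinates of a Q-Q plot can be sketched with the standard library alone. This is a simplified illustration: the residuals are hypothetical, they are standardized with their sample mean and standard deviation (not with leverage-adjusted standard errors), and the plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical quantile, standardized sample value) pairs, sorted."""
    n = len(values)
    m, s = mean(values), stdev(values)
    # Sorted sample values, rescaled to mean 0 and standard deviation 1.
    sample = sorted((v - m) / s for v in values)
    std_normal = NormalDist()
    # Standard normal quantiles at the plotting positions (i - 0.5) / n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical residuals with one large positive value in the right tail.
residuals = [-1.9, -0.7, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 0.8, 4.5]
points = qq_points(residuals)

for theo, samp in points:
    print(f"theoretical = {theo:6.3f}  sample = {samp:6.3f}")
```

Points where the sample value is far from the theoretical quantile, as for the largest residual here, are the ones that fall off the 45 degree line.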
Scale-Location Plot (Spread-Location)
* Purpose
* Look For
* x- and y-axis
- Check homoscedasticity (equal variance): assesses the spread (variance) of the residuals across the predicted values
- Horizontal spread of residuals:
Variance is constant if the spread is horizontal.
Variance changes with the predicted values (heteroscedasticity) if the spread is not horizontal, e.g. it trends upward or downward.
- x-axis: Predicted values; y-axis: Square root of the absolute standardized residuals
Residuals vs Leverage plot
* Purpose
* Look For
* x- and y-axis
- Detect outliers and influential points
- High leverage or Cook’s distance points:
Points with high leverage: unusual predictor values.
Points with high Cook’s distance (contours): strongly influence the model.
- x-axis: Leverage; y-axis: Standardized residuals
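The quantities behind this plot can be sketched for simple linear regression as follows. The data and the function name `influence_measures` are hypothetical. The formulas used are the usual ones: leverage h_i, standardized residual r_i = e_i / (s * sqrt(1 - h_i)), and Cook's distance D_i = (r_i^2 / p) * h_i / (1 - h_i) with p = 2 coefficients.

```python
from math import sqrt


def influence_measures(x, y):
    """Leverage, standardized residuals and Cook's distance for y ~ a + b*x."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, lev)]
    cooks = [r ** 2 / p * h / (1 - h) for r, h in zip(std_resid, lev)]
    return lev, std_resid, cooks


# Hypothetical data: the last point has an unusual x and does not follow the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 8.0]
lev, std_resid, cooks = influence_measures(x, y)

for xi, h, r, d in zip(x, lev, std_resid, cooks):
    print(f"x = {xi:5.1f}  leverage = {h:.3f}  std. resid = {r:6.3f}  Cook's D = {d:.3f}")
```

A point needs both high leverage and a sizeable residual to have a large Cook's distance, which is why the plot shows the two on separate axes.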
Standardized residuals
Residuals divided by an estimate of their standard error, which puts them on a common scale.
How to formulate a null hypothesis?
H_0: Parameter = 0
The corresponding variable has no effect on the outcome. Compare the p-value with the significance level: reject H_0 if the p-value is below it (commonly 0.05).
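A rough sketch of testing H_0: slope = 0 in simple linear regression is shown below. The data are simulated and the function name `slope_t_test` is hypothetical. Note the loudly stated assumption: the p-value uses a standard normal approximation to the t distribution with n - 2 degrees of freedom, which is only reasonable for fairly large n; statistical software uses the exact t distribution.

```python
import random
from math import sqrt
from statistics import NormalDist


def slope_t_test(x, y):
    """Return (slope, t statistic, approximate two-sided p-value) for H_0: slope = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(residual variance / Sxx).
    se_slope = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se_slope
    # ASSUMPTION: normal approximation to the t distribution (large n only).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return slope, t_stat, p_approx


# Simulated data with a true slope of 2 (hypothetical example).
rng = random.Random(0)
x = [i / 5 for i in range(50)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

slope, t_stat, p_approx = slope_t_test(x, y)
print(f"slope = {slope:.3f}, t = {t_stat:.2f}, approximate p-value = {p_approx:.3g}")
```

A large absolute t statistic gives a small p-value, so H_0 is rejected and the predictor is considered to have an effect.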
If the true relation is linear, how will a cubic regression perform compared to a linear regression model in terms of training and test RSS?
Training RSS - Lower (or equal) for the cubic model: it is more flexible and contains the linear model as a special case, so it fits the training data at least as well.
Test RSS - Likely higher (worse) for the cubic model, because the extra flexibility fits noise in the training data (overfitting).
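The comparison can be tried on simulated data. The sketch below is only illustrative: `fit_poly`, `rss`, `true_f` and `sample` are hypothetical helper names, the true relation and noise level are made up, and the least-squares fit solves the normal equations by Gaussian elimination to keep the example dependency-free (a numerical library would normally be used instead).

```python
import random


def fit_poly(x, y, degree):
    """Least-squares polynomial coefficients (lowest power first) via the normal equations."""
    k = degree + 1
    # X^T X and X^T y for a design matrix with columns 1, x, ..., x^degree.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(k)] for r in range(k)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in reversed(range(k)):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, k))) / a[r][r]
    return coef


def rss(coef, x, y):
    """Residual sum of squares of a fitted polynomial on the data (x, y)."""
    return sum((yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


def true_f(x):
    """Hypothetical true relation: exactly linear."""
    return 1.0 + 2.0 * x


rng = random.Random(1)


def sample(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(xi) + rng.gauss(0, 0.5) for xi in xs]
    return xs, ys


x_train, y_train = sample(30)
x_test, y_test = sample(200)

lin = fit_poly(x_train, y_train, 1)
cub = fit_poly(x_train, y_train, 3)

train_lin, train_cub = rss(lin, x_train, y_train), rss(cub, x_train, y_train)
test_lin, test_cub = rss(lin, x_test, y_test), rss(cub, x_test, y_test)

print(f"training RSS: linear = {train_lin:.3f}, cubic = {train_cub:.3f}")
print(f"test RSS:     linear = {test_lin:.3f}, cubic = {test_cub:.3f}")
```

The training RSS of the cubic fit can never exceed that of the linear fit, because the linear model is a special case of the cubic one. The test RSS comparison varies from sample to sample, so it is worth rerunning with different seeds; replacing `true_f` with a non-linear function lets the same sketch explore that case.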
The true relation is unknown but not linear. How will a cubic regression perform compared to a linear regression model in terms of training and test RSS?
Training RSS - Lower (or equal) for the cubic model, because it is the more flexible model.
Test RSS - Cannot be determined without knowing how far from linear the true relation is. The further from linear it is, the more likely the cubic model has the lower test RSS; if it is close to linear, the linear model may still do better. If the relation is highly non-linear, even the cubic model may not be flexible enough.