Chapter 3 - Linear regression Flashcards

1
Q

Residuals vs Fitted plot
* Purpose
* Look For
* x- and y-axis

A
  • Check linearity and heteroscedasticity
  • Random scatter (no patterns or funnels):
    Linear relationship is adequate if the points scatter randomly around y = 0.
    Non-linearity or missing interactions if there are patterns or trends.
    Heteroscedasticity (non-constant variance) if there is a funnel shape.
  • x-axis: fitted values; y-axis: residuals
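A minimal sketch of how the points on this plot are computed, assuming a simple linear regression with one predictor fitted by ordinary least squares. The helper names `fit_simple_ols` and `residuals_vs_fitted` and the data values are hypothetical, and plotting is left out.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals_vs_fitted(x, y):
    """Return (fitted, residuals): the x- and y-coordinates of the plot."""
    b0, b1 = fit_simple_ols(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Hypothetical example data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
fitted, residuals = residuals_vs_fitted(x, y)
for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.3f}")
```

With an intercept in the model, OLS residuals always sum to zero, so the cloud is centred on y = 0 by construction; what matters is whether a pattern or funnel remains.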
2
Q

Outliers

A

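Unusual output: an observation whose response y_i is far from the value predicted by the model, i.e., it has a large residual.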

3
Q

Leverage

A

Unusual input: how far an observation’s predictor values are from the mean of the predictors. High-leverage points can strongly affect the fitted line.
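For simple linear regression, leverage has a closed form, h_i = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)², which makes "distance from the mean" explicit. A minimal sketch with hypothetical data; the function name `leverage` is illustrative.

```python
def leverage(x):
    """Leverage h_i of each observation in a one-predictor linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the others.
x = [1, 2, 3, 4, 5, 20]
h = leverage(x)
print([round(v, 3) for v in h])
```

Leverage always lies between 1/n and 1, and the values sum to p + 1 (here 2, for the intercept and one slope), so the average leverage is (p + 1)/n.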

4
Q

Q-Q Plot
* Purpose
* Look For
* x- and y-axis

A
  • Check whether the residuals are normally distributed
  • Points close to the 45-degree reference line:
    Residuals are approximately normal if the points are close to the 45-degree line.
    Non-normality (e.g., outliers or heavy tails) if the points deviate from the line, especially at the ends.
  • x-axis: theoretical quantiles of a standard normal distribution; y-axis: standardized residuals
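A minimal sketch of how Q-Q plot coordinates can be computed with Python's standard library, assuming the standardized residuals are already available. The plotting positions (i − 0.5)/n are one common convention among several; `qq_points` and the residual values are hypothetical.

```python
from statistics import NormalDist


def qq_points(std_residuals):
    """Pair standard-normal quantiles (x-axis) with sorted residuals (y-axis)."""
    n = len(std_residuals)
    sample = sorted(std_residuals)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Plotting positions (i - 0.5) / n for i = 1..n, all strictly inside (0, 1).
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical standardized residuals.
pts = qq_points([0.3, -1.2, 0.8, -0.1, 1.9, -0.7])
for t, s in pts:
    print(f"theoretical={t:+.3f}  sample={s:+.3f}")
```

If the residuals are roughly normal, these pairs fall near the line y = x.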
5
Q

Scale-Location Plot (Spread-Location)
* Purpose
* Look For
* x- and y-axis

A
  • Check homoscedasticity (equal variance): assesses the spread of the residuals across the fitted values
  • A roughly horizontal line with evenly spread points:
    Variance is constant if the spread is horizontal.
    Variance changes with the fitted values (heteroscedasticity) if the spread widens, narrows, or trends.
  • x-axis: fitted (predicted) values; y-axis: square root of the absolute standardized residuals
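A minimal sketch of the coordinates on a Scale-Location plot, assuming one predictor and an OLS fit. Standardized residuals are computed as e_i / (σ̂ √(1 − h_i)), where σ̂ is the residual standard error and h_i the leverage. `scale_location` and the data are hypothetical.

```python
import math


def scale_location(x, y):
    """Return (fitted value, sqrt(|standardized residual|)) for each observation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error: two estimated coefficients, so n - 2 degrees of freedom.
    rse = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / (rse * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    return [(f, math.sqrt(abs(r))) for f, r in zip(fitted, std_resid)]


# Hypothetical data whose noise grows with x (heteroscedastic).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.8, 8.5, 9.0, 13.2, 12.1, 18.0]
for f, v in scale_location(x, y):
    print(f"fitted={f:6.2f}  sqrt|std resid|={v:.3f}")
```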
6
Q

Residuals vs Leverage plot
* Purpose
* Look For
* x- and y-axis

A
  • Detect outliers and influential points
  • Points with high leverage or high Cook’s distance:
    Points with high leverage: unusual predictor values.
    Points with high Cook’s distance (outside the contour lines): strongly influence the fitted model.
  • x-axis: leverage; y-axis: standardized residuals
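A minimal sketch of the three quantities behind this plot for a one-predictor OLS fit: leverage h_i, standardized residual r_i, and Cook’s distance from the standard formula D_i = r_i² / (p + 1) · h_i / (1 − h_i), with p = 1 here. The function name and data are hypothetical.

```python
import math


def residuals_vs_leverage(x, y):
    """Return (leverage, standardized residual, Cook's distance) per observation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rse = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    n_params = 2  # p + 1: intercept and one slope
    rows = []
    for e, h in zip(resid, lev):
        r = e / (rse * math.sqrt(1 - h))
        cooks = (r ** 2 / n_params) * h / (1 - h)
        rows.append((h, r, cooks))
    return rows


# Hypothetical data: the last point has an extreme x and does not follow the trend.
x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 3.0]
for i, (h, r, d) in enumerate(residuals_vs_leverage(x, y)):
    print(f"obs {i}: leverage={h:.3f}  std resid={r:+.3f}  Cook's D={d:.3f}")
```

A point needs both some leverage and a sizeable residual to get a large Cook’s distance, which is why the contours curve toward the right of the plot.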
7
Q

Standardized residuals

A

Residuals divided by their estimated standard error, which puts all residuals on a common scale.
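The standard error of a residual depends on its leverage, so the scaling differs between observations. One common definition (sometimes called the internally studentized residual), with σ̂ the residual standard error, h_i the leverage, n observations and p predictors:

```latex
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_i}},
\qquad
\hat{\sigma} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}},
\qquad
e_i = y_i - \hat{y}_i
```

Residuals on this scale with |r_i| greater than about 3 are commonly flagged as possible outliers.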

8
Q

How do you formulate the null hypothesis for a regression coefficient?

A

H_0: β_j = 0
The corresponding variable X_j has no effect on (is not associated with) the outcome. Compare the p-value with a significance level, typically 0.05: if p < 0.05, reject H_0 and conclude that X_j is associated with the outcome.
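In simple linear regression the test of H_0: β_1 = 0 uses t = β̂_1 / SE(β̂_1), which follows a t-distribution with n − 2 degrees of freedom under H_0. A minimal stdlib-only sketch of the statistic; turning it into a p-value needs a t-distribution CDF, which the standard library does not provide (with SciPy, for example, a two-sided p-value would be `2 * scipy.stats.t.sf(abs(t), df)`). The function name and data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """t-statistic and degrees of freedom for H0: beta_1 = 0 in y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rse = math.sqrt(rss / (n - 2))
    se_b1 = rse / math.sqrt(sxx)  # standard error of the slope estimate
    return b1 / se_b1, n - 2


# Hypothetical data with a clear linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
t, df = slope_t_statistic(x, y)
print(f"t = {t:.2f} on {df} degrees of freedom")
```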

9
Q

If the true relationship is linear, how will a cubic regression model compare with a linear regression model in terms of training and test RSS?

A

Training RSS: lower (better) for the cubic model. It is at least as flexible as the linear model, so its extra degrees of freedom let it fit the noise in the training data.
Test RSS: expected to be higher (worse) for the cubic model, because the extra flexibility is likely to overfit.
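A small simulation can illustrate this card. The sketch below assumes a hypothetical true model y = 1 + 2x + ε with ε ~ N(0, 1), fits linear and cubic polynomials by least squares (solving the normal equations, which is fine for low degrees but not numerically robust in general), and compares training and test RSS. All names, the sample size and the seed are illustrative.

```python
import random


def polyfit(x, y, degree):
    """Least-squares coefficients [c0, ..., c_degree] via the normal equations."""
    k = degree + 1
    # Normal equations (X^T X) c = X^T y with design matrix X[i][j] = x_i ** j.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(k)] for r in range(k)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, k))) / a[r][r]
    return coef


def rss(coef, x, y):
    """Residual sum of squares of a polynomial with coefficients coef."""
    return sum((yi - sum(c * xi ** j for j, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


def simulate(n=30, seed=0):
    """Return {model: (training RSS, test RSS)} for one simulated data set."""
    rng = random.Random(seed)

    def draw():
        # Hypothetical true model: linear, y = 1 + 2x + N(0, 1) noise.
        xs = [rng.uniform(-2, 2) for _ in range(n)]
        return xs, [1 + 2 * xi + rng.gauss(0, 1) for xi in xs]

    x_train, y_train = draw()
    x_test, y_test = draw()
    results = {}
    for name, degree in (("linear", 1), ("cubic", 3)):
        coef = polyfit(x_train, y_train, degree)
        results[name] = (rss(coef, x_train, y_train), rss(coef, x_test, y_test))
    return results


for model, (train, test) in simulate().items():
    print(f"{model:6s}  training RSS={train:7.2f}  test RSS={test:7.2f}")
```

Because the cubic model contains the linear model as a special case, its training RSS can never be higher. The test RSS comparison is random for any single draw, so averaging over many seeds gives a clearer picture.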

10
Q

The true relationship is unknown but is not linear. How will a cubic regression model compare with a linear regression model in terms of training and test RSS?

A

Training RSS: lower (or at least no higher) for the cubic model, because it is more flexible.
Test RSS: there is not enough information to tell; it depends on how far from linear the true relationship is. The further from linear it is, the more likely the cubic model has the lower test RSS. If it is close to linear, the linear model may still do better, and if it is strongly non-linear, even a cubic model may not be flexible enough.
