Chapter 3 - Linear regression Flashcards

Question 1

Q

Residuals vs Fitted plot
* Purpose
* Look For
* x- and y-axis

Answer

A

Purpose: Check linearity and homoscedasticity (constant variance).
Look for: Random scatter around the horizontal line at residual = 0, with no patterns or funnels.
* Random scatter around 0: a linear relationship is adequate.
* Patterns or trends: non-linearity or missing interactions.
* Funnel shape: heteroscedasticity (non-constant variance).
x- and y-axis: Fitted values (x) and residuals (y)
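
As an illustration, the two axes of this plot can be computed by hand. This is a minimal sketch on simulated data; the data-generating values and variable names are assumptions made for the example, not part of the deck.

```python
import numpy as np

# Hypothetical data: a truly linear relation with constant-variance noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=200)

# Ordinary least squares fit with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta        # x-axis of the plot
residuals = y - fitted   # y-axis of the plot

# With an intercept, OLS residuals average to zero and are
# uncorrelated with the fitted values.
print(residuals.mean(), np.corrcoef(fitted, residuals)[0, 1])
```

Scattering `fitted` against `residuals` (for instance with matplotlib) gives the plot; for data like this the points should form a patternless band around 0.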

Question 2

Q

Outliers

Answer

A

Unusual output. An observation whose response value is far from the value the model predicts (a large residual).

Question 3

Q

Leverage

Answer

A

Unusual input. How far an observation’s predictor values are from the mean of the predictors.
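
A common way to quantify leverage is the diagonal of the hat matrix. The sketch below assumes a simple regression with an intercept and uses made-up predictor values in which the last observation sits far from the rest.

```python
import numpy as np

# Hypothetical predictor values; the last one is far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^{-1} X^T; its diagonal holds the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# For simple regression the same values follow from the distance to the mean.
by_hand = 1 / x.size + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

print(leverage.round(3))
```

The leverages sum to the number of fitted coefficients (2 here), and the point at x = 20 takes the largest share.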

Question 4

Q

Q-Q Plot
* Purpose
* Look For
* x- and y-axis

Answer

A

Purpose: Check whether the residuals are approximately normally distributed.
Look for: How closely the points follow the 45-degree reference line.
* Points close to the line: residuals are approximately normal.
* Points deviating from the line: non-normality (e.g., outliers at the tails).
x- and y-axis: Theoretical quantiles from a standard normal distribution (x) and standardized residuals (y)
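
The coordinates of a Q-Q plot can be sketched as follows. The residuals here are simulated stand-ins, and the plotting positions (i - 0.5) / n are one common convention, assumed for the example.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical standardized residuals, drawn here from a standard normal.
rng = np.random.default_rng(1)
std_resid = rng.normal(size=100)

n = std_resid.size
# Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])  # x-axis
sample = np.sort(std_resid)                                       # y-axis

# Points near the line y = x suggest approximately normal residuals.
print(np.corrcoef(theoretical, sample)[0, 1])
```

A correlation close to 1 between the two sets of quantiles corresponds to points hugging the reference line.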

Question 5

Q

Scale-Location Plot (Spread-Location)
* Purpose
* Look For
* x- and y-axis

Answer

A

Purpose: Check homoscedasticity (equal variance) by assessing the spread (variance) of the residuals across the predictions.
Look for: A horizontal band of points with roughly constant spread.
* Horizontal, even spread: the variance is constant.
* Spread that rises or falls with the predicted values: heteroscedasticity.
x- and y-axis: Predicted (fitted) values (x) and square root of the absolute standardized residuals (y)
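
A minimal sketch of the quantities behind this plot, using simulated data whose noise grows with x. The data and names are assumptions, and the standardization shown is the usual internally studentized form.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=500)
# Hypothetical heteroscedastic data: noise standard deviation grows with x.
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

n, p = X.shape
h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)  # leverages
sigma_hat = np.sqrt(resid @ resid / (n - p))           # residual standard error
std_resid = resid / (sigma_hat * np.sqrt(1 - h))

scale_loc = np.sqrt(np.abs(std_resid))  # y-axis; the x-axis is `fitted`

# An upward trend against the fitted values hints at non-constant variance.
print(np.corrcoef(fitted, scale_loc)[0, 1])
```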

Question 6

Q

Residuals vs Leverage plot
* Purpose
* Look For
* x- and y-axis

Answer

A

Purpose: Detect outliers and influential points.
Look for: High leverage or Cook’s distance points.
* Points with high leverage: unusual predictor values.
* Points with high Cook’s distance (contours): strongly influence the model.
x- and y-axis: Leverage (x) and standardized residuals (y)
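
To make the link between leverage, residuals, and Cook’s distance concrete, here is a small sketch on made-up data with one high-leverage point off the trend. It uses the standard formula D_i = r_i^2 / p * h_i / (1 - h_i), where r_i is the standardized residual; the data are hypothetical.

```python
import numpy as np

# Hypothetical data: points near a line plus one high-leverage point off the trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9, 5.0])

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)  # leverage (x-axis)
sigma_hat = np.sqrt(resid @ resid / (n - p))
std_resid = resid / (sigma_hat * np.sqrt(1 - h))       # standardized residuals (y-axis)

# Cook's distance combines residual size and leverage.
cooks_d = std_resid**2 / p * h / (1 - h)

print(np.argmax(h), np.argmax(cooks_d))
```

In this example the last observation has both the highest leverage and the largest Cook’s distance, which is the kind of point the plot’s contours are meant to flag.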

Question 7

Q

Standardized residuals

Answer

A

Residuals divided by an estimate of their standard error, which puts them on a common scale.
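
One common definition (the internally studentized residual) is written out below. The symbols are assumptions introduced here: e_i is the i-th residual, h_ii its leverage, n the number of observations, and p the number of estimated coefficients including the intercept.

```latex
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}},
\qquad
\hat{\sigma}^2 = \frac{\mathrm{RSS}}{n - p}
```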

Question 8

Q

How to formulate a null hypothesis?

Answer

A

H_0: Parameter = 0
Under H_0, the corresponding variable has no effect on the outcome. Compare the p-value with the significance level: a p-value below 0.05 is the usual threshold for rejecting H_0.
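
The test for a single coefficient can be sketched as below. The data are simulated, and the p-values use a standard normal in place of the t distribution with n - p degrees of freedom, which is a large-sample approximation assumed here to keep the example dependency-free.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical data: y depends on x1 but not on x2.
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Standard errors from sigma^2 (X^T X)^{-1}, then t = estimate / standard error.
sigma2_hat = resid @ resid / (n - p)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se

# Two-sided p-values; the normal distribution stands in for the t
# distribution (large-sample approximation).
p_values = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in t_stats])

print(t_stats.round(2), p_values.round(4))
```

Here H_0 for the x1 coefficient should be rejected clearly, while the x2 coefficient will usually, though not always, fail to reach significance.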

Question 9

Q

If the true relation is linear, how will a cubic regression perform compared to a linear regression model in terms of training and test RSS?

Answer

A

Training RSS - Lower (or equal) for the cubic model. It contains the linear model as a special case, so its extra degrees of freedom let it fit the training data at least as well.
Test RSS - Likely worse (higher) for the cubic model, because the extra terms fit noise in the training data (overfitting).
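
A small simulation can illustrate both halves of this answer. It is a sketch under assumed settings (a linear truth, Gaussian noise, 30 training points); the training comparison always holds, while the test comparison can vary from draw to draw.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n):
    # Hypothetical truly linear relation with Gaussian noise.
    x = rng.uniform(-3, 3, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    return x, y

def rss(coefs, x, y):
    # Residual sum of squares of a fitted polynomial on (x, y).
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

x_train, y_train = simulate(30)
x_test, y_test = simulate(1000)

linear = np.polyfit(x_train, y_train, deg=1)
cubic = np.polyfit(x_train, y_train, deg=3)

print("train RSS:", rss(linear, x_train, y_train), rss(cubic, x_train, y_train))
print("test RSS: ", rss(linear, x_test, y_test), rss(cubic, x_test, y_test))
```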

Question 10

Q

The true relation is unknown but not linear. How will a cubic regression perform compared to a linear regression model in terms of training and test RSS?

Answer

A

It depends on how far the true relation is from linear. The training RSS will be lower for the cubic model because it is more flexible. The test RSS cannot be determined without more information: the further the relation is from linear, the more likely the cubic model is to have the lower test RSS, but if the relation is only mildly non-linear the linear model can still do better. If the relation is highly non-linear, the cubic model may not be flexible enough either.
