Advanced Topics Flashcards

Question 1

Q

Name 4 assumptions of regression

Answer

A

Linearity
Normality of residuals
No high-influence points
No collinearity

Question 2

Q

How do you test the linearity assumption of regression?

Answer

A

Plot your fitted Y against observed Y. Residuals should appear symmetrical along the fitted values.
A significant p-value (from a test of non-linearity) = probably not linear
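
A minimal sketch of the visual check, using made-up quadratic data and a hypothetical helper `fit_line` (neither comes from the card). It fits a straight line and lists residuals against fitted values; a systematic pattern, rather than a symmetrical scatter, hints that the relationship is not linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data with a curved (quadratic) relationship.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# In practice these pairs would be plotted; printing shows the same pattern.
for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.1f}  residual={r:5.1f}")
```

Here the residuals are positive at both ends and negative in the middle (a U shape), which is the kind of asymmetry the card warns about.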

Question 3

Q

How do you test the regression assumption: residuals are normal?

Answer

A

This refers to standardised residuals. To standardise, first convert the residuals to z-scores. Then run a Shapiro-Wilk test or draw a Q-Q plot on the standardised residuals.
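
A rough sketch of the two steps, assuming some hypothetical residuals and using only the Python standard library. The helper names `z_scores` and `qq_points` are illustrative. It converts residuals to z-scores, then pairs each sorted z-score with the normal quantile expected at its rank, which are the coordinates a Q-Q plot would show. For the Shapiro-Wilk test itself, SciPy provides `scipy.stats.shapiro`, which is not used here to keep the sketch dependency-free.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def qq_points(values):
    """Pair each sorted z-score with the normal quantile expected at its rank."""
    z = sorted(z_scores(values))
    n = len(z)
    # (i + 0.5) / n is one common choice of plotting position (an assumption).
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, z))

# Hypothetical residuals from some fitted model.
residuals = [-1.8, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.7, 1.8]
points = qq_points(residuals)

for expected, observed in points:
    print(f"expected={expected:6.2f}  observed={observed:6.2f}")
```

If the residuals are roughly normal, the observed values track the expected ones closely, so the points fall near a straight line.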

Question 4

Q

What is an outlier in regression?

Answer

A

A data point that has a large residual.
i.e. a large distance between the data point and the regression line.

Question 5

Q

What is a high leverage point?

Answer

A

An observation that has an extreme or unusual predictor value.
Far along the x-axis.

Question 6

Q

What is more dangerous to one's regression, an outlier or a high leverage point?

Answer

A

Neither is particularly dangerous in and of itself. However, an observation that is BOTH an outlier and a high leverage point is dangerous.
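
To illustrate, here is a small sketch with made-up points lying on y = x and a hypothetical helper `slope`. It refits the line after adding one extra point of each kind and compares the slopes.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sxx

base = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]  # made-up points on y = x

slope_outlier = slope(base + [(3, 8)])     # large residual, but x at the mean
slope_leverage = slope(base + [(20, 20)])  # extreme x, but on the line
slope_both = slope(base + [(20, 0)])       # extreme x and far off the line

print(slope_outlier, slope_leverage, slope_both)
```

In this toy data the slope stays at about 1 for the first two cases, while the point that is both an outlier and high leverage drags the slope below zero.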

Question 7

Q

What is leverage, and how is it calculated? How are outliers identified?

Answer

A

Leverage is calculated using the hat value, which measures for each data point how much it ‘controls’ the regression line. Outliers can be seen by plotting standardised residuals.
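
As a sketch of the hat value for the one-predictor case, the formula is h_i = 1/n + (x_i - mean(x))^2 / Sxx. The data and the helper name `hat_values` below are made up for illustration. A common rule of thumb flags hat values above 2k / n, where k is assumed here to count the estimated coefficients including the intercept.

```python
def hat_values(xs):
    """Leverage of each observation in a simple (one-predictor) regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

xs = [1, 2, 3, 4, 5, 20]  # made-up predictor values; 20 is far along the x-axis
h = hat_values(xs)

k = 2                      # intercept + slope (assumed convention for k)
cutoff = 2 * k / len(xs)
high_leverage = [x for x, hi in zip(xs, h) if hi > cutoff]

print([round(hi, 3) for hi in h], high_leverage)
```

Hat values depend only on the x values, not on y, which is why leverage and outlyingness are checked separately.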

Question 8

Q

When is Cook's distance a problem?

Answer

A

A common rule of thumb: if a data point's Cook's distance is more than 4 / n. (The 2k / n cutoff applies to leverage, i.e. hat values.)
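
A hedged sketch of Cook's distance for a one-predictor regression, using made-up data and a hypothetical helper `cooks_distances`. It uses the standard form D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, with p = 2 coefficients, and compares each value with 4 / n, one commonly used rule of thumb (cutoffs vary between textbooks).

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point in a simple linear regression."""
    n, p = len(xs), 2  # p = number of coefficients (intercept + slope)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    hat = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, hat)]

# Made-up data: the last point has an extreme x and lies far off the trend.
xs = [1, 2, 3, 4, 5, 20]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 0.0]

d = cooks_distances(xs, ys)
threshold = 4 / len(xs)
flagged = [i for i, di in enumerate(d) if di > threshold]

print([round(di, 3) for di in d], flagged)
```

Only the last observation is flagged here, because Cook's distance combines a point's residual with its leverage.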

Question 9

Q

What is the biggest problem with having collinear data?

Answer

A

It can massively inflate the variance of the coefficient estimates.
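
One way to quantify this is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below assumes just two predictors, in which case R_j^2 is their squared Pearson correlation. The data and helper names are made up.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6]
x2_unrelated = [3, 1, 2, 2, 1, 3]               # uncorrelated with x1
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]   # nearly a copy of x1

vif_unrelated = vif_two_predictors(x1, x2_unrelated)
vif_collinear = vif_two_predictors(x1, x2_collinear)

print(round(vif_unrelated, 2), round(vif_collinear, 1))
```

A VIF of 1 means no inflation; the nearly duplicated predictor gives a VIF in the hundreds, so its coefficient's variance is inflated by that factor.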

Question 10

Q

Why isn’t choosing the highest R^2 the best way to choose our model?

Answer

A

Because models with more predictors will always explain more variance (R^2 never decreases when a predictor is added). Some models that are too complex will overfit.

Question 11

Q

What are two ways to penalise models for additional parameters?

Answer

A

Adjusted R-squared.
AIC and BIC
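
A minimal sketch of these penalties, using two hypothetical fits on the same n = 30 observations (the numbers are invented). Adjusted R^2 uses 1 - (1 - R^2)(n - 1) / (n - k - 1) with k predictors. AIC and BIC are written in their least-squares form up to an additive constant, n ln(RSS / n) + 2p and n ln(RSS / n) + p ln(n), with p = k + 1 coefficients. Other software may report them on a different scale.

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R^2; k = number of predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def aic(rss, n, p):
    """Least-squares AIC up to a constant; p = number of coefficients."""
    return n * math.log(rss / n) + 2 * p

def bic(rss, n, p):
    """Least-squares BIC up to a constant; p = number of coefficients."""
    return n * math.log(rss / n) + p * math.log(n)

n = 30
# Hypothetical models: the larger one gains very little fit for 3 extra predictors.
small = {"r2": 0.700, "rss": 30.0, "k": 2}
large = {"r2": 0.705, "rss": 29.5, "k": 5}

for name, m in (("small", small), ("large", large)):
    p = m["k"] + 1  # add the intercept
    print(name,
          round(adjusted_r2(m["r2"], n, m["k"]), 3),
          round(aic(m["rss"], n, p), 2),
          round(bic(m["rss"], n, p), 2))
```

The larger model has the higher raw R^2, but its adjusted R^2 is lower and its AIC and BIC are higher (lower is better for both), so all three criteria prefer the smaller model in this example.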

(11 cards)