Advanced Topics Flashcards

1
Q

Name 4 assumptions of regression

A
  1. Linearity
  2. Normality of residuals
  3. No high-influence points
  4. No collinearity
2
Q

How do you test the linearity assumption in regression?

A

Plot your fitted-Y against observed-Y. Residuals should appear symmetrical around the fitted values.
A significant p-value suggests the relationship is probably not linear.
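
A minimal sketch of this check (toy data, numpy only; a plot is replaced here by a numeric summary, and all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)  # genuinely linear toy data

# Fit a simple linear regression and compute residuals.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

# For linear data, residuals scatter symmetrically around zero across
# the range of fitted values (no curve or funnel shape in the plot).
print(round(residuals.mean(), 6))
```

With an intercept in the model, the residuals average exactly zero; it is their *pattern* against the fitted values that reveals non-linearity.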

3
Q

How do you test the regression assumption: residuals are normal?

A

This refers to the standardised residuals. To standardise, first convert the residuals to z-scores. Then run a Shapiro-Wilk test or inspect a QQ plot of the standardised residuals.
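
A short sketch of that procedure (toy data; z-scoring is used as a simplification of the usual internally studentised residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 80)
y = 3.0 * x + rng.normal(0, 2, 80)  # toy data with normal noise

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Standardise: centre and scale the residuals to z-scores.
z = (residuals - residuals.mean()) / residuals.std(ddof=1)

# Shapiro-Wilk: a significant p-value suggests non-normal residuals.
stat, p = stats.shapiro(z)
print(stat, p)
```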

4
Q

What is an outlier in regression?

A

A data point that has a large residual,
i.e. a large distance between the data point and the regression line.

5
Q

What is a high leverage point?

A

An observation that has an extreme or unusual predictor (x) value,
i.e. far along the x-axis.

6
Q

Which is more dangerous to one's regression: an outlier or a high leverage point?

A

Neither is particularly dangerous in and of itself. However, an observation that is BOTH an outlier and a high leverage point is dangerous.

7
Q

What is leverage, and how is it calculated? How are outliers detected?

A

Leverage is calculated from the hat values and measures how much each data point ‘controls’ the regression line. Outliers can be seen by plotting the standardised residuals.
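
A minimal sketch of computing hat values directly from the design matrix (toy data; in practice a library routine would be used):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
# Design matrix: intercept column plus one predictor.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])

# Hat matrix H = X (X'X)^-1 X'; its diagonal gives each point's leverage.
H = X @ np.linalg.inv(X.T @ X) @ X.T
hat_values = np.diag(H)

# Hat values lie in (0, 1] and sum to the number of fitted parameters.
print(round(hat_values.sum(), 6))  # -> 2.0 (intercept + slope)
```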

8
Q

When is Cook’s distance a problem?

A

A common rule of thumb flags a data point whose Cook’s distance exceeds 4 / n (for leverage, hat values above 2k / n are often flagged).
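
A sketch of computing Cook's distance by hand and applying the 4/n cut-off (toy data; the formula combines each point's residual and leverage):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(0, 10, n)
y = 1.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverage (hat values)

p = X.shape[1]                 # number of fitted parameters
mse = resid @ resid / (n - p)  # residual mean square

# Cook's distance: large residual AND high leverage make D_i large.
cooks_d = (resid**2 / (p * mse)) * (h / (1 - h) ** 2)
flagged = np.where(cooks_d > 4 / n)[0]  # rule-of-thumb cut-off
print(len(flagged))
```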

9
Q

What is the biggest problem with having collinear data?

A

It can massively inflate the variance of the coefficient estimates (measured by the variance inflation factor, VIF).
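
A sketch of the variance inflation factor, computed from first principles: regress each predictor on the others and take VIF = 1 / (1 − R²). Data and the `vif` helper are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                   # independent predictor

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing one predictor on the rest."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    tss = (target - target.mean()) @ (target - target.mean())
    r2 = 1 - (resid @ resid) / tss
    return 1 / (1 - r2)

# The collinear predictor's variance is badly inflated; the
# independent one sits near the minimum VIF of 1.
print(vif(x1, [x2, x3]), vif(x3, [x1, x2]))
```

A common heuristic treats VIF above 5 or 10 as a sign of problematic collinearity.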

10
Q

Why isn’t choosing R^2 the best way to choose our model?

A

Because R² never decreases as you add predictors, so the most complex model always explains the most variance. Models that are too complex will overfit.

11
Q

What are two ways to penalise models for additional parameters?

A

Adjusted R-squared.
AIC and BIC
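
A sketch comparing the two penalties on toy data, using the standard formulas: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k) and the Gaussian AIC = n·ln(RSS/n) + 2k (k counts all fitted parameters, including the intercept):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 * x + rng.normal(0, 1, n)
junk = rng.normal(size=(n, 5))  # irrelevant extra predictors

def fit_metrics(X, y):
    """Return R^2, adjusted R^2, and AIC for an OLS fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    aic = n * np.log(rss / n) + 2 * k
    return r2, adj_r2, aic

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, junk])

r2_s, adj_s, aic_s = fit_metrics(X_small, y)
r2_b, adj_b, aic_b = fit_metrics(X_big, y)
print(r2_b >= r2_s)  # raw R^2 never drops when predictors are added
```

The penalised scores (adjusted R², AIC) can prefer the smaller model even though its raw R² is lower.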
