Linear Model Evaluation/Diagnostics Flashcards

1
Q

What is the R^2 statistic?

A

The proportion of the variance in the response that is explained by the model
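As a rough sketch of the usual definition, R^2 = 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares about the mean. The function name `r_squared` and the data are made up for illustration.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation about the mean
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]           # made-up observations
print(r_squared(y, y))             # fitted values equal to the data: prints 1.0
print(r_squared(y, [2.5] * 4))     # predicting the mean everywhere: prints 0.0
```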

2
Q

Why is R^2 not always the best indicator of predictive power?

A

An overfitted model can have a very high R^2 but poor predictive power on new data
Conversely, when the noise variance is high, R^2 will be low even if the model is correct
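The second point can be illustrated with a small simulation. This is only a sketch under assumed settings (a true line y = 2x, Gaussian noise, made-up sample size and seed): the true model is scored against data with little noise and with a lot of noise.

```python
import random


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_of_true_model(noise_sd, n=200, seed=0):
    """Simulate y = 2x + noise and score the *true* line 2x against it."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]
    signal = [2.0 * xi for xi in x]
    y = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return r_squared(y, signal)


r2_quiet = r2_of_true_model(noise_sd=0.1)  # little noise: R^2 close to 1
r2_noisy = r2_of_true_model(noise_sd=5.0)  # lots of noise: R^2 close to 0
print(r2_quiet, r2_noisy)
```

In both runs the model is exactly right; only the noise level differs, so a low R^2 on its own does not show that the model is wrong.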

3
Q

What are the assumptions we make about the errors in a model?

A
  • well described by a normal distribution
  • have constant variance
  • are independent of each other
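Written out for a linear model with p predictors, the three assumptions are often summarised in one line. The symbols below are notation chosen for this note, not taken from the cards.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

Here "N" is the normality assumption, the single shared \sigma^2 is constant variance, and "iid" carries independence.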
4
Q

What do we expect to see when we remove the signal from the model?

A

Residuals that are normally distributed

5
Q

What are the two qualitative ways to assess normality?

A
  • a histogram of the residuals
  • a QQ Norm plot of the residuals

6
Q

What are the two quantitative ways to assess normality?

A
  • Shapiro-Wilk test for normality
  • Kolmogorov-Smirnov test for normality
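To make the Kolmogorov-Smirnov idea concrete, here is a minimal sketch of its test statistic: the largest vertical gap between the empirical CDF of the residuals and a reference normal CDF. The function name and the residual values are hypothetical, and only the statistic is computed, not a p-value.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and the N(mu, sigma^2) CDF."""
    xs = sorted(sample)
    n = len(xs)
    ref = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = ref.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


residuals = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.5, 1.1]  # made-up values
print(ks_statistic(residuals, mean(residuals), stdev(residuals)))
```

One caveat: the standard Kolmogorov-Smirnov critical values assume the reference distribution is fully specified in advance. When the mean and standard deviation are estimated from the same residuals, as here, an adjusted version (the Lilliefors test) is normally used instead.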

7
Q

Describe QQ Norm plots

A
  • Plot the quantiles of two sets of data against each other
  • If their distributions have similar shapes (for a QQ Norm plot, if the data are roughly normally distributed), the points tend to fall on a straight line
  • A QQ Norm plot shows the residuals, sorted in order, against the standardised quantiles of the distribution of interest
8
Q

What is the theoretical quantile for the ith point of a QQ Norm plot typically based on?

A

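The probability i/(n+1), where n is the number of residuals: the ith smallest residual is plotted against the normal quantile at that probability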
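A minimal sketch of how the points of a QQ Norm plot can be built with the i/(n+1) convention, using only the Python standard library. The function name and the residuals are invented for illustration; plotting libraries may use slightly different plotting positions.

```python
from statistics import NormalDist


def qq_norm_points(residuals):
    """Pair each sorted residual with the N(0, 1) quantile at i / (n + 1)."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Each printed row is one point: (theoretical quantile, observed residual).
for q, r in qq_norm_points([0.3, -1.2, 0.5, 0.0, -0.4]):  # made-up residuals
    print(f"{q:+.3f}  {r:+.3f}")
```

Dividing by n + 1 rather than n keeps every probability strictly between 0 and 1, so the largest residual still has a finite theoretical quantile.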

9
Q

Describe the Shapiro-Wilk test

A

Produces a statistic which relates to the straightness of the QQ plot
Null hypothesis, H0: data are normally distributed
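The "straightness of the QQ plot" idea can be sketched with a simpler stand-in: the correlation between the sorted residuals and their normal quantiles. To be clear, this is not the Shapiro-Wilk W statistic itself (which uses specially derived weights), only a related quantity that captures the same intuition. The names and data are hypothetical.

```python
from statistics import NormalDist


def qq_correlation(residuals):
    """Pearson correlation between sorted residuals and normal quantiles."""
    ordered = sorted(residuals)
    n = len(ordered)
    quantiles = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mean_r = sum(ordered) / n
    mean_q = sum(quantiles) / n
    cov = sum((r - mean_r) * (q - mean_q) for r, q in zip(ordered, quantiles))
    var_r = sum((r - mean_r) ** 2 for r in ordered)
    var_q = sum((q - mean_q) ** 2 for q in quantiles)
    return cov / (var_r * var_q) ** 0.5


# Values that are exactly normal quantiles give a perfectly straight QQ plot.
straight = [NormalDist().inv_cdf(i / 21) for i in range(1, 21)]
skewed = [float(x ** 2) for x in range(20)]  # strongly right-skewed, made-up values
print(qq_correlation(straight), qq_correlation(skewed))
```

A value near 1 corresponds to a nearly straight QQ plot; the skewed sample scores lower because its QQ plot curves.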

10
Q

What will happen if the assumption that the errors are independent is violated?

A

Standard errors and p-values tend to be systematically too small, so we risk drawing the wrong conclusions about model covariates

11
Q

What can the null hypothesis of uncorrelated errors be formally tested by?

A

The Durbin-Watson test
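For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows, with a hypothetical function name and toy residuals; the residuals must be in their natural (e.g. time) order.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: prints 3.0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical values: prints 0.0
```

The statistic lies between 0 and 4. Values near 2 are consistent with uncorrelated errors, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.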

12
Q

What can independence also be violated by?

A

In more philosophical (study-design) ways, such as pseudoreplication

13
Q

What is the practical consequence of falsely assuming independence?

A

We may wrongly conclude that one or more unrelated variables are genuinely related to the response

14
Q

What can we do if we have correlation in the residuals?

A
  • Ignore the correlation in residuals
  • Try to remove the correlation in model residuals by sub-setting the data
  • Account for the correlation using a generalised least squares (GLS) model
15
Q

What do we use partial residual plots for?

A

To assess whether apparent non-linearity is caused by the predictor itself or by another, possibly unknown, predictor

16
Q

What do partial residual plots show?

A

The residuals, and the relationship between y and an individual x after adjusting for the other x's

17
Q

What are the useful diagnostic properties of partial residuals?

A
  • The slope of the line is the regression coefficient
  • The extent of the scatter tells us about the support for the function
  • We can identify large residuals
  • Curved plots signal non-linear relationships
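For reference, the usual definition of the partial residual for observation i and predictor j is the ordinary residual plus that predictor's fitted contribution. The notation is chosen for this note and is not from the cards.

```latex
r_{ij} = \hat{\varepsilon}_i + \hat{\beta}_j x_{ij}
```

Plotting r_{ij} against x_{ij} gives the partial residual plot. Because ordinary least squares residuals are uncorrelated with x_j, a straight line fitted through these points has slope \hat{\beta}_j, which is the first property listed above.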
18
Q

What do we do if we have error distribution shape problems?

A
  • Try transforming things to address the distributional shape problems
  • Move to other models like generalised linear models
  • Bootstrap your way to glory
19
Q

What do we do if we have independence problems?

A
  • Move to other models/methods like mixed models (LMM, GLMM) or use generalised estimating equations (GEE)
20
Q

What do we do if we have signal problems?

A
  • Use more complex linear models or generalised additive models (GAM)
21
Q

How do we bootstrap?

A
  1. We make a new dataset of the same dimensions by sampling the rows of the data with replacement
  2. We repeat this many times, fitting the model to each resampled dataset
  3. The spread of the results shows roughly how much things might change if we had another sample of data
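The three steps above can be sketched for the slope of a simple linear regression. Everything here is illustrative: the data are simulated, the function names are hypothetical, and the percentile interval is just one common way to summarise the bootstrap results.

```python
import random


def slope(rows):
    """Least-squares slope of y on x for a list of (x, y) rows."""
    n = len(rows)
    x_bar = sum(x for x, _ in rows) / n
    y_bar = sum(y for _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    return sxy / sxx


def bootstrap_slopes(rows, n_boot=1000, seed=0):
    """Refit the slope on n_boot resamples of the rows, drawn with replacement."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_boot:
        resample = rng.choices(rows, k=len(rows))  # step 1: same size as the data
        if len({x for x, _ in resample}) > 1:      # skip resamples with a single x value
            slopes.append(slope(resample))         # step 2: refit on each resample
    return sorted(slopes)


# Made-up data: y = 2x + Gaussian noise
data_rng = random.Random(1)
data = [(float(x), 2.0 * x + data_rng.gauss(0.0, 1.0)) for x in range(30)]

slopes = bootstrap_slopes(data)
# Step 3: the spread of the refitted slopes, summarised as a 95% percentile interval.
lo = slopes[int(0.025 * len(slopes))]
hi = slopes[int(0.975 * len(slopes)) - 1]
print(f"slope {slope(data):.3f}, 95% percentile interval ({lo:.3f}, {hi:.3f})")
```

Resampling whole rows keeps each x paired with its own y, so the resampled datasets preserve the relationship being modelled.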