9. Assumptions and Diagnostics Flashcards

1
Q

What are the different linear model assumptions?

A

Linearity
Independence of Errors
Normality of Errors
Equal Variance

2
Q

What happens if assumptions are violated?

A

The model's estimates and the inferences drawn from it can be inaccurate or misleading

3
Q

Why is it useful to visualise assumptions?

A

Easier to see the nature and magnitude of any assumption violation

4
Q

What are some drawbacks of using statistical methods of assessing assumptions?

A

Tests can suggest assumptions are violated even when the departure is actually small

This is due to statistical power, and the tests give no information on what the actual problem is

5
Q

What is linearity?

A

Assumes the relationship between X and Y is linear

6
Q

What happens if we estimate a linear relation when there isn’t one present?

A

Can result in underestimating the strength of that relation

7
Q

How is linearity investigated?

A

Investigated via scatterplots with loess lines (single predictor)

or

Component-plus-residual plots (when we have multiple predictors) - the closer the loess line is to the straight reference line, the more linear the relation

8
Q

How do we test for non-linearity?

A

Need to check that the relation between each predictor and the outcome is linear, controlling for the other predictors

Use partial residuals for each predictor $x_j$:

$e_i + \hat{\beta}_j x_{ij}$ (the residual plus the partial linear relation between $x_j$ and $y$)
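
As a rough illustration of the partial residual idea, the sketch below fits a model with two predictors and then builds the partial residuals for the first one. The data and the helper `fit_two_predictors` are made up for illustration; only the standard library is used so the arithmetic stays visible.

```python
def mean(values):
    return sum(values) / len(values)

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Illustrative data only (not from the course material).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
residuals = [yi - (b0 + b1 * a + b2 * b) for a, b, yi in zip(x1, x2, y)]

# Partial residual for x1: the model residual plus x1's own linear contribution.
partial_x1 = [e + b1 * a for e, a in zip(residuals, x1)]
```

Plotting `partial_x1` against `x1` with a smoother gives the component-plus-residual view for that predictor; a straight-line fit through these points has slope `b1`.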

9
Q

What are normally distributed errors?

A

Assumes errors are normally distributed around each predicted value

10
Q

How are normally distributed errors investigated?

A

Investigated via QQ plots (quantile comparison plots)

  • Plot standardized residuals from the model against their theoretical expected values
  • If normally distributed = Points should fall neatly on the diagonal line
  • If non-normally distributed = Points depart from the diagonal, which changes the shape of the plot
  • Can also use histograms to see the distribution
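
To make the QQ plot construction concrete, here is a minimal sketch that pairs sorted standardized residuals with theoretical normal quantiles. The residual values are invented, the standardization is a simple z-score, and the plotting positions `(i - 0.5) / n` are one common convention chosen here as an assumption.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Pair each sorted standardized residual with its theoretical normal quantile."""
    n = len(residuals)
    m, s = mean(residuals), pstdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Illustrative residuals only.
residuals = [-1.2, 0.4, 0.1, -0.3, 2.5, -0.8, 0.6, -0.5, 0.2, -1.0]
for t, o in qq_points(residuals):
    print(f"theoretical {t:6.2f}   observed {o:6.2f}")
```

If the residuals were normal, the two columns would track each other closely; large gaps at the ends point to heavy tails or skew.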
11
Q

What is equal variance (homoscedasticity)?

A

Assumes the variance of the errors is constant across values of the predictors x1, …, xk and across the fitted values ŷ

12
Q

What is it called when homoscedasticity is violated?

A

Heteroscedasticity

13
Q

How is equal variance investigated?

A

Using residual plots

When plotting residual values vs predicted values = The spread of points above and below the line should be about the same across the whole range

Categorical predictors = Each group should show a similar spread
Continuous predictors = Points should form an even band around the line, with no fanning out or funnelling

14
Q

What is independence of errors?

A

Assumes errors are not correlated with one another

15
Q

How do we test independence of errors?

A

Difficult to test, unless we know the potential source of correlation between cases

Errors are typically assumed to be uncorrelated in a between-persons design

Can use a variant of the linear model to account for dependence between errors

16
Q

What are linear model diagnostics and what are the three features?

A

Explore individual cases in context of model

Model outliers, High Leverage, High Influence

17
Q

What are model outliers?

A

Cases that have unusual outcome values given their predictor values

(Show a large difference between predicted and observed)

18
Q

What are outliers?

A

Cases with large residuals = May have a strong influence on the model

19
Q

How do you determine an outlier?

A

By the size of the residual
Unstandardized residuals (same units as DV)
$y_i - \hat{y}_i$ (observed minus estimated)

Fine for comparison across cases within one lm but difficult to compare across DVs with different units

20
Q

What is the difference between standardised and studentized residuals?

A

Standardized residuals:

  • Unstandardized residual / estimated SD (converts to a z-score)
  • The SD is estimated from the whole data set, so the calculation can include the outlying case itself

Studentized residuals:

  • Standardized residuals calculated with the case in question left out of the SD estimate
  • Values of > +2 or < -2 indicate outlyingness
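
The contrast between the two kinds of residual can be sketched for a simple regression. This is an illustration under assumptions: the data are made up, the helpers `fit` and `hat_values` are hypothetical, both residuals are scaled by sqrt(1 - h) as in the usual internally and externally studentized definitions, and the "left out" SD is obtained by literally refitting without each case.

```python
from math import sqrt

def fit(x, y):
    """Simple regression y = b0 + b1*x; returns (b0, b1, residual SD)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, sqrt(sse / (n - 2))

def hat_values(x):
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return [1 / n + (a - mx) ** 2 / sxx for a in x]

# Illustrative data: case 5 has an unusually high outcome for its x value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.3, 1.8, 3.4, 3.9, 8.0, 5.7, 7.2, 7.9]

b0, b1, s = fit(x, y)
h = hat_values(x)
raw = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Standardized: SD estimated from all cases, including the one being judged.
standardized = [e / (s * sqrt(1 - hi)) for e, hi in zip(raw, h)]

# Studentized: SD re-estimated with the case in question left out.
studentized = []
for i in range(len(x)):
    _, _, s_i = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    studentized.append(raw[i] / (s_i * sqrt(1 - h[i])))

flagged = [i + 1 for i, t in enumerate(studentized) if abs(t) > 2]
print(flagged)
```

Because the outlying case inflates the SD it is divided by, its standardized residual is pulled towards zero, which is why the studentized version flags it more clearly.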
21
Q

What are high leverage cases?

A

Have an unusual predictor value or combination of predictor values (e.g. x far away from x̄, the mean)

22
Q

How do you find high leverage cases?

A

Hat values are used to assess leverage, as they reflect the distance between a case's value of x and the mean value of x

Hat value > 2 x mean hat value = High leverage
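
For a single predictor the hat values have a simple closed form, which the sketch below uses on made-up data. The function name `hat_values` is just illustrative; with several predictors the values would come from the diagonal of the hat matrix instead.

```python
def hat_values(x):
    """Hat values for a simple regression: h_i = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return [1 / n + (a - mx) ** 2 / sxx for a in x]

# Illustrative data: the last case sits far from the mean of x.
x = [1, 2, 3, 4, 5, 6, 7, 20]
h = hat_values(x)

mean_h = sum(h) / len(h)  # equals (k + 1) / n, here 2 / 8
high_leverage = [i + 1 for i, hi in enumerate(h) if hi > 2 * mean_h]
print(high_leverage)
```

Note that leverage depends only on the predictor values, so a high-leverage case is not necessarily an outlier or influential.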

23
Q

What are high influence cases?

A

Cases having a large impact on estimation of model

Have a strong effect on coefficients - e.g. if the case were deleted, the coefficients would change

24
Q

How do we investigate high influence cases?

A

Degree of change in the coefficients when a case is deleted = one way to judge magnitude of influence

Can also consider influence via Cook's distance and DFbeta

25
What is Cook's distance?
The average distance the ŷ values will move if a given case is removed
Different cut-off suggestions:
  • $D_i > 1$
  • $D_i > 4/(n - k - 1)$
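
One way to see what Cook's distance measures is to compute it by brute force: refit the model without each case and sum how far the fitted values move, scaled by (k + 1) times the residual variance. The sketch below does this for a simple regression on invented data; `fit` is a hypothetical helper, not a library function.

```python
def fit(x, y):
    """Simple regression y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1

# Illustrative data: the last case has an extreme x and sits off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 12.0]
n, k = len(x), 1

b0, b1 = fit(x, y)
fitted = [b0 + b1 * a for a in x]
s2 = sum((b - f) ** 2 for b, f in zip(y, fitted)) / (n - k - 1)

cooks = []
for i in range(n):
    # Refit without case i and see how far all fitted values move.
    c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    moved = sum((f - (c0 + c1 * a)) ** 2 for a, f in zip(x, fitted))
    cooks.append(moved / ((k + 1) * s2))

over_1 = [i + 1 for i, d in enumerate(cooks) if d > 1]
over_4_nk1 = [i + 1 for i, d in enumerate(cooks) if d > 4 / (n - k - 1)]
print(over_1, over_4_nk1)
```

In practice the same quantity is usually obtained from the residual and hat value of each case without refitting; the loop here is only meant to mirror the definition.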
26
What are the three different ways you can look at Cook's distance in more detail (name only)?
DFFit, DFbeta, DFbetas
27
What is DFFit?
Difference between the predicted outcome (ŷ) for a case when the model is fitted with vs without that case included
28
What is DFbeta?
Difference between the value of a coefficient with the case included vs not included
This is how it differs from DFFit - it focuses on the coefficient rather than the predicted value
29
What is DFbetas?
Standardised version of DFbeta
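
The three case-deletion measures can be sketched together by refitting a simple regression without each case. Everything here is illustrative: the data are made up, `fit` is a hypothetical helper, only the slope is tracked, and the scaling used for the standardised version (leave-one-out residual SD times sqrt(1 / Sxx)) is the convention assumed for this sketch.

```python
from math import sqrt

def fit(x, y):
    """Simple regression y = b0 + b1*x; returns (b0, b1, residual SD)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, sqrt(sse / (n - 2))

# Illustrative data: the last case has an extreme x and sits off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 12.0]
n = len(x)
mx = sum(x) / n
sxx = sum((a - mx) ** 2 for a in x)
b0, b1, _ = fit(x, y)

dffit, dfbeta, dfbetas = [], [], []
for i in range(n):
    c0, c1, s_i = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    # DFFit: change in this case's own predicted value.
    dffit.append((b0 + b1 * x[i]) - (c0 + c1 * x[i]))
    # DFbeta: change in the slope coefficient.
    dfbeta.append(b1 - c1)
    # DFbetas: DFbeta scaled by the leave-one-out SE of the slope.
    dfbetas.append((b1 - c1) / (s_i * sqrt(1 / sxx)))

for i in range(n):
    print(i + 1, round(dffit[i], 3), round(dfbeta[i], 3), round(dfbetas[i], 3))
```

Standardising makes the values comparable across coefficients measured on different scales, which raw DFbeta values are not.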
30
How do we examine influence on SE?
SE can impact inferences
Measure via COVRATIO:
  • < 1 = Precision is decreased by a case (SE increases)
  • > 1 = Precision is increased by a case (SE decreases)
  • COVRATIO outside $1 \pm 3(k+1)/n$ = Case has a strong influence on SE
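
COVRATIO compares the determinant of the coefficient covariance matrix with a case deleted to the determinant with all cases. A minimal sketch for a simple regression follows; the data are invented, `fit` and `det_cov` are hypothetical helpers, and the shortcut det(X'X) = n * Sxx holds only for an intercept plus one predictor.

```python
def fit(x, y):
    """Simple regression; returns (residual variance, determinant of X'X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    # For an intercept plus one predictor, det(X'X) = n * Sxx.
    return sse / (n - 2), n * sxx

def det_cov(x, y):
    """Determinant of the 2x2 coefficient covariance matrix s^2 (X'X)^-1."""
    s2, det_xtx = fit(x, y)
    return s2 ** 2 / det_xtx

# Illustrative data: the last case has an extreme x and sits off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 12.0]
n, k = len(x), 1

full = det_cov(x, y)
covratio = [det_cov(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) / full for i in range(n)]

cutoff = 3 * (k + 1) / n
flagged = [i + 1 for i, c in enumerate(covratio) if abs(c - 1) > cutoff]
print(flagged)
```

Here the flagged case has a COVRATIO well below 1, meaning the estimates are less precise with it in the model than without it.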
31
What is multi-collinearity?
Correlation between predictors
Large correlations between predictors = Increased SE, so we don't want predictors to be too correlated
32
What do you do if multi-collinearity occurs?
Combine the two predictors into a single composite, or drop the IV that is statistically redundant
33
How do you test for multicollinearity?
Variance inflation factor (VIF)
  • Measures how much the variance of each coefficient, and therefore SE(beta), is inflated by predictor correlations
  • VIF values are increased by predictor inter-correlations
  • VIFs > 10 = Issue; want it close to 1
  • Always consider influence before deleting a variable
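
With exactly two predictors the VIF reduces to 1 / (1 - r^2), where r is their correlation, which makes a small worked sketch possible without any libraries. The data and function names below are illustrative; with more predictors, r^2 would be replaced by the R^2 from regressing each predictor on all the others.

```python
def r_squared(x, y):
    """Squared correlation between x and y (R^2 of a one-predictor regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model."""
    return 1 / (1 - r_squared(x1, x2))

# Illustrative, strongly correlated predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]

vif = vif_two_predictors(x1, x2)
print(round(vif, 4), round(vif ** 0.5, 3))
```

The square root of the VIF is the factor by which the SE of the coefficient is inflated relative to uncorrelated predictors.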
34
What are sensitivity analyses?
Checking if you get similar results, irrespective of methodological decisions
e.g. Do the coefficients change if a certain case is included or excluded?
35
What if the results from sensitivity analysis are similar?
Increased confidence that the results are not an artefact of the methodology but reflect the strength of the effect
36
If a case has a high COVRATIO value but a low dfbeta, what is the most likely reason?
It has an extreme value on x but is not a regression outlier