9. Assumptions and Diagnostics Flashcards

1
Q

What are the different linear model assumptions?

A

Linearity
Independence of Errors
Normality of Errors
Equal Variance

2
Q

What happens if assumptions are violated?

A

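The model’s estimates, and the inferences drawn from them (SEs, p-values, confidence intervals), can’t be trusted to be accurate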

3
Q

Why is it useful to visualise assumptions?

A

Easier to see the nature and magnitude of any assumption violation

4
Q

What are some drawbacks of using statistical methods of assessing assumptions?

A

They can suggest assumptions are violated when the violations are actually small

This is due to statistical power (in large samples even small departures become significant), and the tests give no info on the actual problem

5
Q

What is linearity?

A

Assumes the relationship between X and Y is linear

6
Q

What happens if we estimate a linear relation when there isn’t one present?

A

Can result in underestimating the strength of that relation
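A minimal sketch of this underestimation, using made-up data in which y depends perfectly on x, but through a curve (y = x squared). The helper name `simple_ols` and the data are assumptions for illustration only.

```python
# Hypothetical data: y is fully determined by x, but the relation is curved.
xs = [i / 10 for i in range(-30, 31)]   # -3.0 ... 3.0, symmetric about 0
ys = [x ** 2 for x in xs]

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

b0, b1 = simple_ols(xs, ys)
fitted = [b0 + b1 * x for x in xs]
mean_y = sum(ys) / len(ys)
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

# The straight line reports essentially no relation, although one clearly exists.
print(f"slope = {b1:.3f}, R^2 = {r_squared:.3f}")
```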

7
Q

How is linearity investigated?

A

Investigated via scatterplots with loess lines (single variable)

or

Component-plus-residual plots (when we have multiple predictors) - Closer to black line = More linear

8
Q

How do we test for non-linearity?

A

Need to know whether the relation between each predictor and the outcome is linear while controlling for the other predictors

Use the partial residuals for each predictor xj

e_i + b_j x_ij (residual plus the partial linear relation between xj and y)
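Written out in LaTeX, under the usual notation (an assumption here, since the card does not define its symbols): e_i is the residual for case i from the full model, and \hat{\beta}_j is the estimated coefficient for predictor x_j.

```latex
\[
  e_i^{(j)} \;=\; e_i + \hat{\beta}_j \, x_{ij},
  \qquad \text{where } e_i = y_i - \hat{y}_i .
\]
```

Plotting e_i^{(j)} against x_{ij} gives the component-plus-residual plot for predictor x_j.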

9
Q

What are normally distributed errors?

A

Assumes the errors are normally distributed around each predicted value

10
Q

How are normally distributed errors investigated?

A

Investigated via QQ plots (quantile comparison plots)

  • Plot standardized residuals from the model against their theoretically expected values
  • If normally distributed = Points should fall neatly on the diagonal line
  • If non-normally distributed = Points depart from the diagonal (the shape of the departure shows how)
  • Can also use histograms to see the distribution
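The points of a QQ plot can be computed by hand, as in this rough sketch. The residuals are made up, the function name `qq_points` is hypothetical, and the plotting positions (i - 0.5) / n are one common convention assumed here; software may use a slightly different one.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot."""
    n = len(residuals)
    m, s = mean(residuals), pstdev(residuals)
    # Simplified standardization: centre and scale, then order the residuals.
    z = sorted((r - m) / s for r in residuals)
    # Quantiles a standard normal would give at the assumed plotting positions.
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, z))

# Hypothetical residuals, for illustration only.
resid = [-1.8, -0.9, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.1, 1.3]
for t, z in qq_points(resid):
    print(f"theoretical {t:6.2f}   sample {z:6.2f}")
```

If the residuals are roughly normal, the two columns track each other closely, which is what "falling on the diagonal" means in the plot.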
11
Q

What is equal variance (homoscedasticity)?

A

Assumes the error variance is constant across values of the predictors x1, …, xk and across the fitted values ŷ

12
Q

What is it called when homoscedasticity is violated?

A

Heteroscedasticity

13
Q

How is equal variance investigated?

A

Using residual plots

When plotting residual values vs predicted values = Spread should be about the same above and below the line, across the whole range of predicted values

Categorical predictors = Groups should show similar spread
Continuous predictors = Points should form an even band around the line, with no fanning out
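As a numeric companion to the plot for a categorical predictor, the spread of the residuals can be compared across groups. This is only a sketch with made-up residuals and hypothetical group names; it does not replace looking at the plot.

```python
from statistics import pstdev

# Hypothetical residuals, grouped by the level of a categorical predictor.
residuals_by_group = {
    "control":   [-1.2, 0.4, 0.9, -0.3, 0.2],
    "treatment": [-4.1, 3.6, -2.8, 4.4, -1.1],
}

# Spread (SD) of the residuals within each group.
spread = {group: pstdev(res) for group, res in residuals_by_group.items()}
ratio = max(spread.values()) / min(spread.values())

for group, sd in spread.items():
    print(f"{group:10s} residual SD = {sd:.2f}")
print(f"largest / smallest SD = {ratio:.2f}")
```

A ratio far from 1, as in this made-up example, points towards heteroscedasticity.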

14
Q

What is independence of errors?

A

Assumes errors are not correlated with one another

15
Q

How do we test independence of errors?

A

Difficult to test, unless we know the potential source of correlation between cases

Errors are unlikely to be correlated in a between-persons design

Can use a variant of the linear model to account for non-independence

16
Q

What are linear model diagnostics, and what are the three features they look at?

A

Explore individual cases in context of model

Model outliers, High Leverage, High Influence

17
Q

What are model outliers?

A

Cases that have unusual outcome values given their predictor values

(Show a large difference between predicted and observed)

18
Q

What are outliers?

A

Cases with large residuals = May have a strong influence on the model

19
Q

How do you determine an outlier?

A

By the size of the residual
Unstandardized residuals (same units as DV)
yi - ŷi (observed minus estimated)

Fine for comparison across cases within one lm, but difficult to compare across DVs with different units

20
Q

What is the difference between standardised and studentized residuals?

A

Standardized residuals:

  • Unstandardized residual / estimated SD of the residuals (converts to a z-score)
  • The SD is calculated from the whole data set, so it can include the outliers themselves

Studentized residuals:

  • Standardized residuals where the SD is estimated without the case in question
  • Values of > +2 or < -2 indicate outlyingness
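The difference between the two can be sketched for a simple regression with made-up data. The function name `residual_diagnostics` is hypothetical. Note one addition to the card's simplified description: the usual textbook definitions also divide by sqrt(1 - h_i), where h_i is the case's hat value, and this sketch assumes that form.

```python
from math import sqrt

def residual_diagnostics(x, y):
    """Simple regression of y on x; returns (raw, standardized, studentized) residuals."""
    n, p = len(x), 2                      # p = number of coefficients (intercept + slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    hat = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    sse = sum(e ** 2 for e in raw)
    s2 = sse / (n - p)                    # residual variance from the whole data set
    standardized, studentized = [], []
    for e, h in zip(raw, hat):
        standardized.append(e / sqrt(s2 * (1 - h)))
        # Residual variance re-estimated with this case left out.
        s2_loo = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        studentized.append(e / sqrt(s2_loo * (1 - h)))
    return raw, standardized, studentized

# Hypothetical data: the last case has an unusually high outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]
raw, standardized, studentized = residual_diagnostics(x, y)
outlying = [i for i, t in enumerate(studentized) if abs(t) > 2]
print("cases with |studentized residual| > 2:", outlying)
```

Because the outlying case inflates the whole-data SD, its standardized residual is smaller in magnitude than its studentized residual.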
21
Q

What are high leverage cases?

A

Have an unusual predictor value or combination of predictor values (e.g. x far away from x-bar, the mean of x)

22
Q

How do you find high leverage cases?

A

Hat values (hi) are used to assess leverage, as they reflect the distance between a case’s value of x and the mean value of x

Hat value > 2 x mean hat value = High leverage (mean hat value = (k+1)/n)
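For a single predictor the hat value has a simple closed form, h_i = 1/n + (x_i - mean of x)^2 / sum of squared deviations of x. A short sketch with made-up x values (the function name `hat_values` is hypothetical):

```python
def hat_values(x):
    """Hat values for a simple regression with an intercept and one predictor."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

# Hypothetical predictor values; the last one lies far from the rest.
x = [2, 3, 3, 4, 5, 5, 6, 7, 8, 20]
h = hat_values(x)
mean_h = sum(h) / len(h)                  # equals (k + 1) / n, here 2 / 10
high_leverage = [i for i, hi in enumerate(h) if hi > 2 * mean_h]
print("mean hat value:", round(mean_h, 3))
print("high-leverage cases (index):", high_leverage)
```

With these numbers only the last case (x = 20) exceeds twice the mean hat value.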

23
Q

What are high influence cases?

A

Cases having a large impact on estimation of model

Have a strong effect on the coefficients - e.g. if the case were deleted, the coefficients would change noticeably

24
Q

How do we investigate high influence cases?

A

Degree of change = one way to judge magnitude of influence

Can also consider influence via Cook’s distance and DFbeta

25
Q

What is Cook’s distance?

A

The average distance the ŷ values will move if a given case is removed

Different cut-off suggestions

Di > 1
Di > 4/(n-k-1)
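A sketch of both cut-offs on made-up data, for a simple regression (k = 1). It assumes the standard closed form D_i = e_i^2 h_i / ((k+1) s^2 (1 - h_i)^2), which is not given on the card; the function name `cooks_distance` is hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for each case in a simple regression of y on x."""
    n, p = len(x), 2                      # p = k + 1 coefficients, with k = 1 predictor
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)
    hat = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e ** 2 * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, hat)]

# Hypothetical data: the last case has an unusually high outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]
n, k = len(x), 1
d = cooks_distance(x, y)
over_one = [i for i, di in enumerate(d) if di > 1]
over_ratio = [i for i, di in enumerate(d) if di > 4 / (n - k - 1)]
print("D > 1:        ", over_one)
print("D > 4/(n-k-1):", over_ratio)
```

With these numbers both cut-offs flag only the last case.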

26
Q

What are the three different ways you can look at Cook’s Distance in more detail (name only) ?

A

DFFit
DFbeta
DFbetas

27
Q

What is DFFit?

A

Difference between the predicted outcome (ŷ) for a case when the model is fitted with that case included vs without it
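A leave-one-out sketch of this definition on made-up data: refit the model without each case in turn and compare the two predictions for that case. The helper name `fit` is hypothetical, and this follows the card's unscaled description; statistical software often reports a scaled version.

```python
def fit(x, y):
    """Least-squares (intercept, slope) for a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

# Hypothetical data: the last case has an unusually high outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]

b0, b1 = fit(x, y)
dffit = []
for i in range(len(x)):
    # Refit with case i left out, then predict case i from both models.
    b0_i, b1_i = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    dffit.append((b0 + b1 * x[i]) - (b0_i + b1_i * x[i]))

most_changed = max(range(len(x)), key=lambda i: abs(dffit[i]))
print("case whose prediction moves most:", most_changed)
```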

28
Q

What is DFbeta?

A

Difference between the value of a coefficient when the case is included vs not included

This is how it differs from DFFit - it focuses on the coefficients rather than the predicted values

29
Q

What is DFbetas?

A

Standardised version of DFbeta
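A sketch of DFbeta and DFbetas for the slope of a simple regression, on made-up data. The scaling used for DFbetas (the slope's standard error, computed with the residual variance estimated without the case) is the conventional definition and is an assumption here, as the card does not spell it out. The helper name `fit` is hypothetical.

```python
from math import sqrt

def fit(x, y):
    """Return (intercept, slope, residual variance, Sxx) for a simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse / (n - 2), sxx

# Hypothetical data: the last case has an unusually high outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]

_, slope_all, _, sxx_all = fit(x, y)
dfbeta, dfbetas = [], []
for i in range(len(x)):
    _, slope_i, s2_i, _ = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    change = slope_all - slope_i          # DFbeta: raw change in the slope
    dfbeta.append(change)
    # DFbetas: the change expressed in standard-error units (assumed scaling).
    dfbetas.append(change / sqrt(s2_i / sxx_all))

for i, (raw_change, scaled) in enumerate(zip(dfbeta, dfbetas)):
    print(f"case {i}: DFbeta = {raw_change:7.3f}   DFbetas = {scaled:7.2f}")
```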

30
Q

How do we examine influence on the SE?

A

A case can affect the SE, which can impact inferences

Measure via COVRATIO

<1 = Precision is decreased by a case (SE increases)
>1 = Precision is increased by a case (SE decreases)

COVRATIO > 1 + 3(k+1)/n or < 1 - 3(k+1)/n = Case has a strong influence on SE
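A leave-one-out sketch of COVRATIO and the cut-offs above, for a simple regression on made-up data. It assumes the standard identity COVRATIO_i = (s_(i)^2 / s^2)^(k+1) / (1 - h_i), where s_(i)^2 is the residual variance with case i removed; that identity is not stated on the card. The helper name `fit` is hypothetical.

```python
def fit(x, y):
    """Return (residual variance, Sxx, mean of x) for a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2), sxx, mx

# Hypothetical data: the last case has an unusually high outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]
n, k = len(x), 1
p = k + 1

s2, sxx, mx = fit(x, y)
covratio = []
for i in range(n):
    s2_loo, _, _ = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    h = 1 / n + (x[i] - mx) ** 2 / sxx    # hat value of case i
    covratio.append((s2_loo / s2) ** p / (1 - h))

lower, upper = 1 - 3 * p / n, 1 + 3 * p / n
flagged = [i for i, c in enumerate(covratio) if c < lower or c > upper]
print("cut-offs:", lower, upper)
print("flagged cases:", flagged)
```

Here the outlying last case has a COVRATIO far below 1, meaning that including it makes the estimates much less precise.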

31
Q

What is multi-collinearity?

A

Correlation between predictors

Large correlation between predictors = Increased SE so don’t want predictors to be too correlated

32
Q

What do you do if multi-collinearity occurs?

A

If it happens = Combine the two predictors into a single composite, or drop the IV that is statistically redundant

33
Q

How do you test for multicollinearity?

A

Variance inflation factor (VIF) - Measures how much the variance of a coefficient, and so SE(beta), is inflated by predictor correlations

  • VIF quantities are increased by predictor inter-correlations
  • VIFs > 10 = Issue - want it close to 1
  • Always consider influence before deleting a variable
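In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only two predictors, R_j^2 is just their squared correlation, which keeps this sketch short. The data and the function names are made up for illustration.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: x2 is almost a copy of x1, x3 is nearly unrelated to it.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
x3 = [3, 1, 4, 1, 5, 2]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)
print(f"x1 with x2: VIF = {vif_high:.1f}")
print(f"x1 with x3: VIF = {vif_low:.2f}")
```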
34
Q

What are sensitivity analyses?

A

Checking if you get similar results, irrespective of methodological decisions

e.g. Do the coefficients change if a certain case is included or excluded?

35
Q

What if the results from sensitivity analysis are similar?

A

Increased confidence that the results are not based on methodological decisions but on the strength of the effect

36
Q

If a case has a high COVRATIO value but a low DFbeta, what is the most likely reason?

A

It has an extreme value on x but is not a regression outlier
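This can be illustrated with a sketch on made-up data: an extra case is placed far out on x but exactly on the line fitted to the other cases, so it has high leverage and no residual. The helper name `fit` is hypothetical, and COVRATIO is computed with the assumed identity (s_(i)^2 / s^2)^(k+1) / (1 - h_i).

```python
def fit(x, y):
    """Return (intercept, slope, residual variance, Sxx, mean of x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse / (n - 2), sxx, mx

# Hypothetical data for the ordinary cases.
x_rest = [1, 2, 3, 4, 5, 6, 7, 8]
y_rest = [1.2, 1.8, 3.1, 4.3, 4.8, 6.1, 6.9, 8.2]
b0, b1, s2_without, _, _ = fit(x_rest, y_rest)

# Extra case: extreme on x (x = 20) but sitting exactly on the fitted line.
x_extreme = 20
x_all = x_rest + [x_extreme]
y_all = y_rest + [b0 + b1 * x_extreme]
_, b1_all, s2_all, sxx, mx = fit(x_all, y_all)

n, p = len(x_all), 2                      # p = k + 1 with k = 1 predictor
h = 1 / n + (x_extreme - mx) ** 2 / sxx   # hat value of the extra case
dfbeta_slope = b1_all - b1                # change in slope due to the case
covratio = (s2_without / s2_all) ** p / (1 - h)

print(f"hat value = {h:.2f}, DFbeta (slope) = {dfbeta_slope:.6f}, COVRATIO = {covratio:.2f}")
```

The case leaves the slope essentially unchanged, yet it pushes COVRATIO well above 1 + 3(k+1)/n because it tightens the standard errors.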