9. Assumptions and Diagnostics Flashcards
What are the different linear model assumptions?
Linearity
Independence of Errors
Normality of Errors
Equal Variance
What happens if assumptions are violated?
Estimates, standard errors, and inferences drawn from the model may be inaccurate
Why is it useful to visualise assumptions?
It is easier to see the nature and magnitude of any assumption violation
What are some drawbacks of using statistical methods of assessing assumptions?
They can suggest assumptions are violated when the violations are actually trivially small (a consequence of high statistical power in large samples)
They also give no information about the nature of the actual problem
What is linearity?
Assumes the relationship between X and Y is linear
What happens if we estimate a linear relation when there isn’t one present?
Can result in underestimating that relation
How is linearity investigated?
Investigated via scatterplots with loess lines (single predictor)
or
Component-plus-residual plots (when we have multiple predictors) - the closer the loess line is to the straight (black) line, the more linear the relation
How do we test for non-linearity?
Need to check that the relation between each predictor and the outcome is linear, controlling for the other predictors
Partial residuals for predictor xj:
ei + bj*xij (the partial linear relation between xj and y)
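A minimal numpy sketch of the partial residuals described above, on simulated data (the dataset and variable names are illustrative, not from the notes). Plotting `partial_resid_x1` against `x1` gives the values shown in a component+residual plot:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Fit y on an intercept plus both predictors via least squares.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Partial residual for x1: raw residual plus x1's linear contribution
# (ei + b1*xi1). Plotting these against x1 isolates the partial linear
# relation between x1 and y, controlling for x2.
partial_resid_x1 = resid + b[1] * x1
```

Because least-squares residuals are orthogonal to each predictor, regressing the partial residuals on x1 recovers exactly the coefficient b1 from the full model.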
What are normally distributed errors?
Assumes errors are normally distributed around each predicted value
How are normally distributed errors investigated?
Investigated via QQ plots (Quantile comparisons plots)
- Plot standardized residuals from the model against their theoretically expected values
- If normally distributed = Points should fall neatly on the diagonal line
- If non-normally distributed = Points deviate from the line, and the shape of the deviation indicates the nature of the problem (e.g. skew, heavy tails)
- Can also use histograms of the residuals to see the distribution
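A numeric sketch of the QQ comparison above, using only numpy and the standard library (the simulated data are illustrative). Sorted standardized residuals are compared against theoretical standard-normal quantiles; if the errors are normal, the two agree closely:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 0.5 + 1.2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
std_resid = (resid - resid.mean()) / resid.std(ddof=1)

# Theoretical standard-normal quantiles at plotting positions
# (i - 0.5) / n; a QQ plot graphs sorted residuals against these.
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
observed = np.sort(std_resid)

# With normal errors the points hug the diagonal, so the correlation
# between observed and theoretical quantiles is close to 1.
qq_corr = np.corrcoef(theoretical, observed)[0, 1]
```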
What is equal variance (Homoscedasticity) ?
Assumes the error variance is constant across values of the predictors x1, …, xk and across values of the fitted values ŷ
What is it called when homoscedasticity is violated?
Heteroscedasticity
How is equal variance investigated?
Using residual plot
When plotting residual values vs predicted values = Spread should be similar above and below the zero line, with no funnel shape
Categorical predictors = Each group should show a similar spread
Continuous predictors = Points should form an even band around the line, with no systematic widening or narrowing
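A rough numeric version of eyeballing a residuals-vs-fitted plot, sketched with numpy on simulated data (the dataset and the median-split check are illustrative assumptions, not a formal test): compare the residual spread in the lower and upper halves of the fitted values; similar spreads are consistent with homoscedasticity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=n)  # constant error variance

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted

# Split residuals at the median fitted value and compare spreads.
# A ratio near 1 is consistent with equal variance; a ratio far from 1
# suggests heteroscedasticity (e.g. a funnel shape in the plot).
lower = resid[fitted <= np.median(fitted)]
upper = resid[fitted > np.median(fitted)]
ratio = lower.std(ddof=1) / upper.std(ddof=1)
```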
What is independence of errors?
Assumes errors are not correlated with one another
How do we test independence of errors?
Difficult to test, unless we know the potential source of correlation between cases
In a between-persons design errors can usually be assumed uncorrelated; clustered or repeated-measures data induce correlation
Can use a variant of the linear model (e.g. a multilevel model) to account for non-independence
What are linear model diagnostics and what are their three key features?
Explore individual cases in context of model
Model outliers, High Leverage, High Influence
What are model outliers?
Cases that have unusual outcome values given their predictor values
(Show a large difference between predicted and observed)
What are outliers?
Cases with large residuals = May have a strong influence on the model
How do you determine an outlier?
By the size of the residual
Unstandardized residuals (same units as the DV):
ei = yi - ŷi (observed minus predicted)
Fine for comparison across cases within one lm, but difficult to compare across DVs with different units
What is the difference between standardised and studentized residuals?
Standardized residuals:
- Unstandardized residual / its estimated SD (converts to a z-score)
- The SD estimate uses the whole dataset, so it can itself be inflated by outliers
Studentized residuals:
- Standardized residuals where the SD estimate excludes the case in question, so an extreme case cannot mask itself; values > +2 or < -2 indicate outlyingness
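A numpy sketch of the standardized vs studentized distinction above, on simulated data (the dataset is illustrative). The internally standardized residual divides by an SD estimate computed from all cases; the externally studentized version adjusts so case i is excluded from its own denominator:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 2  # p = number of estimated coefficients (intercept + slope)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Hat (leverage) values: diagonal of X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Internally standardized residuals: residual / its estimated SD.
# The variance estimate s2 uses ALL cases, including any outlier.
s2 = resid @ resid / (n - p)
standardized = resid / np.sqrt(s2 * (1 - h))

# Externally studentized residuals: equivalent to re-estimating the
# SD without case i, so an extreme case cannot inflate its own denominator.
studentized = standardized * np.sqrt((n - p - 1) / (n - p - standardized**2))

# |studentized| > 2 is the common flag for potential outliers.
outlier_flags = np.abs(studentized) > 2
```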
What are high leverage cases?
Have an unusual predictor value or combination of predictor values (e.g. x far away from x bar (the mean))
How do you find high leverage cases?
Hat values are used to assess leverage; they measure how far a case's predictor values are from the mean of the predictors
Hat value > 2 x mean hat value = High leverage
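A numpy sketch of the hat-value rule above, on simulated data with one deliberately extreme predictor value (the dataset is illustrative). The mean hat value is (k+1)/n, so the cut-off is 2(k+1)/n:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
x[0] = 8.0  # one case far from the predictor mean -> high leverage

X = np.column_stack([np.ones(n), x])

# Hat values are the diagonal of the hat matrix H = X (X'X)^{-1} X';
# they measure distance from the centre of the predictor space.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Mean hat value is (k+1)/n; the rule of thumb flags h > 2 * mean
# as high leverage.
k = 1  # number of predictors
cutoff = 2 * (k + 1) / n
high_leverage = h > cutoff
```

The extreme case gets a hat value far above the cut-off even though nothing about its outcome value was specified: leverage depends on the predictors only.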
What are high influence cases?
Cases having a large impact on estimation of model
Have a strong effect on coefficients - e.g. if deleted a case, coefficient would change
How do we investigate high influence cases?
Degree of change = one way to judge magnitude of influence
Can also consider influence via Cook's distance and DFBETA
What is cook’s distance?
The aggregate distance the ŷ (fitted) values move if a given case is removed
Different cut-off suggestions:
Di > 1
Di > 4 / (n - k - 1)
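A numpy sketch of Cook's distance on simulated data with one planted influential case (the dataset is illustrative). The standard closed form combines the squared standardized residual (outlyingness) with leverage, showing that influence requires both:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 40, 1
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 4.0, -6.0  # unusual predictor AND outcome -> high influence

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

p = k + 1  # number of estimated coefficients
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = resid @ resid / (n - p)

# Cook's D_i = (squared standardized residual / p) * (h_i / (1 - h_i)):
# large only when a case is both outlying and high-leverage.
r2 = resid**2 / (s2 * (1 - h))
cooks_d = (r2 / p) * (h / (1 - h))

# The two cut-offs from the notes:
flag_large = cooks_d > 1
flag_relative = cooks_d > 4 / (n - k - 1)
```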
What are the three different ways you can look at Cook’s Distance in more detail (name only) ?
DFFIT
DFBETA
DFBETAS
What is DFFit?
Difference between the predicted outcome (ŷ) for a case with vs without that case included in the model
What is Dfbeta?
Difference between the value of a coefficient with vs without the case included
Unlike DFFIT, it focuses on the coefficients rather than the predicted value
What is DFBETAS?
Standardised version of DFBETA
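The case-deletion idea behind DFBETA can be sketched directly with numpy: refit the model without each case and record how much a coefficient moves (simulated data; the helper function is illustrative, not a library API):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 3.0, -5.0  # a planted influential case

X = np.column_stack([np.ones(n), x])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

def dfbeta(i):
    # DFBETA for case i: coefficients with the case included minus
    # coefficients after deleting the case and refitting.
    keep = np.arange(n) != i
    b_drop, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return b_full - b_drop

# The influential case moves the slope far more than any typical case.
dfbeta_slope = np.array([dfbeta(i)[1] for i in range(n)])
```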
How do we examine influence of SE?
SE can impact inferences
Measure via COVRATIO
<1 = Precision is decreased by the case (SE increases)
>1 = Precision is increased by the case (SE decreases)
COVRATIO outside 1 -/+ 3(k+1)/n = the case has a strong influence on the SEs
What is multi-collinearity?
High correlation between predictors
Large correlations between predictors inflate the SEs of the coefficients, so we don't want predictors to be too strongly correlated
What do you do if multi-collinearity occurs?
If it happens = Combine the two predictors into a single composite, or drop the IV that is statistically redundant
How do you test for multicollinearity?
Variance inflation factor (VIF) - Measures how much SE(beta) is increased by predictor correlations
- VIF values increase as predictor inter-correlations increase
- VIFs > 10 = Issue - want values close to 1
- Always consider a variable's role in the model before deleting it
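A numpy sketch of VIF on simulated data (the dataset and helper function are illustrative): regress each predictor on the others, then VIF = 1 / (1 - R²), so a predictor that is nearly a linear function of the others gets a large VIF:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=n)  # highly correlated with x1
x3 = rng.normal(size=n)                          # independent predictor

def vif(target, others):
    # Regress one predictor on the rest (plus intercept); VIF = 1/(1 - R^2).
    Z = np.column_stack([np.ones(len(target))] + others)
    b, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ b
    r2 = 1 - resid @ resid / np.sum((target - target.mean())**2)
    return 1 / (1 - r2)

vif_x1 = vif(x1, [x2, x3])  # large: x2 nearly duplicates x1
vif_x3 = vif(x3, [x1, x2])  # near 1: x3 is unrelated to the others
```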
What are sensitivity analyses?
Checking whether you get similar results irrespective of methodological decisions
e.g. do the coefficients change if a certain case is included vs excluded?
What if the results from sensitivity analysis are similar?
Increased confidence that the results are not an artefact of methodological decisions but reflect a genuine effect
If a case has a high COVRATIO value but a low dfbeta, what is the most likely reason?
It has an extreme value on x but is not a regression outlier