The BEAST of Bias Flashcards

1
Q

What are outliers/influential cases?

A

An atypical case that has undue influence on the b values (the model's parameter estimates)

It can pull the model towards itself, biasing the estimates

2
Q

How do you detect outliers/influential cases?

A

Residuals/deviations - an outlier's observed value is far from the value the model predicts, so it has a large residual
Graphs - box plots, histograms
Cook's distance - measures the influence of a single case on the model as a whole; a value greater than 1 may be a problem
Standardised residuals
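
A minimal sketch of Cook's distance for a model with a single predictor, in plain Python (standard library only). The helper names fit_simple and cooks_distance and the data are hypothetical; the formula is the usual one, D = e^2 / (p × MSE) × h / (1 - h)^2, where e is the residual, p the number of parameters (2 here) and h the case's leverage.

```python
from statistics import mean

# Hypothetical helpers for illustration; one predictor only, so p = 2 parameters (b0, b1).


def fit_simple(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def cooks_distance(x, y):
    """Cook's distance for every case: e^2 / (p * MSE) * h / (1 - h)^2."""
    n, p = len(x), 2
    b0, b1 = fit_simple(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)  # residual mean square
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverage (hat values)
    return [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2) for e, h in zip(resid, lev)]


# Made-up data: y is roughly 2x, except the last case (x = 20, y = 10).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 10.0]
d = cooks_distance(x, y)
for xi, di in zip(x, d):
    flag = "  <- greater than 1" if di > 1 else ""
    print(f"x = {xi:>2}: Cook's D = {di:.2f}{flag}")
```

In this made-up data only the last case exceeds 1, even though its raw residual (about -6.2) is not the largest in the data, which is why residuals alone can miss influential cases.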

3
Q

Why can’t you detect an influential case using residuals?

A

The influential case has pulled the model so far towards itself that its residual (observed minus predicted value) ends up small
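
One way to see this is to compare a case's ordinary residual with its deleted residual, i.e. the residual when the case is predicted from a model fitted without it. A minimal sketch on made-up data; fit_simple is a hypothetical helper doing ordinary least squares with one predictor.

```python
from statistics import mean


def fit_simple(x, y):
    """Hypothetical helper: ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


# Made-up data: y is roughly 2x, except the last case (x = 20, y = 10).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 10.0]
i = 9  # index of the suspected influential case

# Ordinary residual: the case is included in the fit, so it drags the line towards itself.
b0, b1 = fit_simple(x, y)
ordinary = y[i] - (b0 + b1 * x[i])

# Deleted residual: refit without the case, then predict it.
d0, d1 = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
deleted = y[i] - (d0 + d1 * x[i])

print(f"ordinary residual {ordinary:.2f}, deleted residual {deleted:.2f}")
```

Here the ordinary residual is roughly -6, while the deleted residual is roughly -30: the case only looks unremarkable because it has bent the model towards itself.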

4
Q

What are the assumptions of the linear model?

A

Linearity and additivity
Spherical residuals - independent errors and homoscedastic errors
Normality of something - the residuals or the sampling distribution

5
Q

What do linearity and additivity refer to?

A

The relationship between each predictor and the outcome is linear; if it isn't, you have used the wrong model
The combined effect of the predictors is best described by adding their effects together
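
Written as an equation, an additive linear model with k predictors takes the form below (a sketch in the common b notation: X_{ki} is case i's score on predictor k, b_0 the intercept and ε_i the error for case i).

```latex
Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}) + \varepsilon_i
```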

6
Q

How do you check for linearity and additivity?

A

Graphs - e.g. scatterplots of the outcome against the predictors, or residuals against predicted values

Equation

7
Q

What is the most important assumption?

A

Linearity and additivity

8
Q

What do spherical errors refer to?

A

Errors should be independent - not related to each other (i.e. no autocorrelation)

Errors should be homoscedastic - the variance of the errors should be constant at each level of the predictor variable(s)

9
Q

What does a violation of the spherical errors assumption mean?

A

The b values are still unbiased, but they are no longer optimal

The standard errors are incorrect, so p-values and CIs, which are calculated from the SEs, will also be incorrect

10
Q

How do you check for spherical errors?

A

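Levene's test - if significant, the variances differ, so homogeneity of variance (HOV) is violated
Variances - should be of similar size
Variance ratio - biggest variance divided by smallest
Graph of standardised residuals (zresid) against standardised predicted values (zpred)
Durbin-Watson - ranges from 0 to 4; a value close to 2 means the errors are independent (values below 1 or above 3 are cause for concern)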
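
A minimal sketch of two of these checks in plain Python: the variance ratio for a set of groups and the Durbin-Watson statistic for a sequence of residuals (assumed to be in the order the data were collected). The function names are hypothetical; the Durbin-Watson formula is the standard one, the sum of squared differences between successive residuals divided by the sum of squared residuals.

```python
from statistics import variance


def variance_ratio(groups):
    """Hypothetical helper: largest group variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def durbin_watson(residuals):
    """Hypothetical helper: sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(variance_ratio([[1, 2, 3], [2, 4, 6]]))  # 4.0 (variances 4 and 1)
print(durbin_watson([1, -1, 1, -1]))  # 3.0: alternating residuals, negative autocorrelation
print(durbin_watson([1, 1, 1, -1, -1, -1]))  # about 0.67: runs of same sign, positive autocorrelation
```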

11
Q

What do graphs of zresid against zpred show?

A

Funnel shape - heteroscedasticity (the spread of the residuals changes across predicted values)

Sausage (curved) shape - non-linearity
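
Reading the shape off the graph is the usual approach. As a rough numeric companion (an assumption of this sketch, similar in spirit to the Goldfeld-Quandt idea, not a formal test), you can compare the residual variance for the upper half of predicted values with that for the lower half; a ratio far from 1 hints at a funnel. The function name and data are hypothetical.

```python
from statistics import variance


def spread_ratio(predicted, residuals):
    """Hypothetical helper: residual variance for the upper half of predicted values
    divided by that for the lower half; values far from 1 hint at a funnel shape."""
    pairs = sorted(zip(predicted, residuals))  # order cases by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)


# Made-up residuals that fan out as the predicted value increases.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1, -1, 2, -2]
print(round(spread_ratio(predicted, residuals), 1))
```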

12
Q

What does "normally distributed" refer to?

A

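Normality of residuals (the differences between observed and predicted values) - if the residuals are not normal, b will still be unbiased, but other estimation methods may be better

Normality of the sampling distribution - the 1.96 used for confidence intervals comes from the normal distribution, so the intervals will be wrong if the sampling distribution is not normal. The p-values associated with the b values also assume a normal sampling distribution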
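
The 1.96 is the standard normal value that cuts off the top 2.5%, and it turns a standard error into a 95% interval. A minimal sketch using Python's statistics.NormalDist; normal_ci is a hypothetical helper, and the b and SE values are made up.

```python
from statistics import NormalDist

# z that cuts off the top 2.5% of a standard normal distribution.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96


def normal_ci(b, se, level=0.95):
    """Hypothetical helper: b ± z * SE, only trustworthy if b's sampling distribution is normal."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return b - z_crit * se, b + z_crit * se


print(normal_ci(0.5, 0.1))  # roughly (0.304, 0.696)
```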

13
Q

What is the central limit theorem?

A

As long as the sample is big enough, the sampling distribution will be approximately normal, whatever the shape of the population distribution
So you only need to worry about normality in small samples
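
A small simulation illustrates this: draw many samples from a strongly skewed population (an exponential distribution, chosen purely for illustration), take each sample's mean, and the skewness of those means shrinks towards 0 as the sample size grows. Helper names, the number of repetitions and the seed are hypothetical choices.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def sample_means(n, reps=4000, seed=1):
    """Hypothetical helper: means of `reps` samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


for n in (2, 10, 50):
    print(f"sample size {n:>2}: skewness of the sample means = {skewness(sample_means(n)):.2f}")
```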

14
Q

How do you explore normality?

A

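Graphs - histograms, box plots, P-P and Q-Q plots (points should lie along the diagonal; an S shape indicates a problem)

Numbers - skew and kurtosis should be close to 0 (values above 2/2.5 indicate a problem)

Tests - Kolmogorov-Smirnov (K-S) test - don't want to rely on this (in large samples even trivial deviations from normality come out significant)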
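
Skew and kurtosis can be computed from the moments of the data. A minimal sketch using the simple moment-based formulas; statistics packages often apply small-sample corrections, so their values will differ slightly. The function name is hypothetical.

```python
from statistics import mean


def skew_kurtosis(values):
    """Hypothetical helper: moment-based skewness (g1) and excess kurtosis (g2).
    Both are 0 for a normal distribution."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


print(skew_kurtosis([1, 2, 3, 4, 5]))   # symmetric: skew 0, flat (negative kurtosis)
print(skew_kurtosis([1, 1, 1, 2, 10]))  # long right tail: positive skew
```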

15
Q

What are the ways of correcting problems?

A

SD-based trimming (trim values more than 2 SD from the mean) - really bad idea

Transforming the data - can create more problems than it solves

Winsorizing - substitute outliers with the highest value that isn't an outlier

Robust estimation - 20% trimmed means, M-estimators, the bootstrap, or adjusted SEs
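
A minimal sketch of a 20% trimmed mean and of winsorizing in plain Python. Here winsorizing uses the same fixed proportion (20% at each end) to decide which scores count as extreme; that cut-off, the function names and the data are assumptions for illustration.

```python
from statistics import mean


def trimmed_mean(values, prop=0.2):
    """Hypothetical helper: drop the lowest and highest `prop` of scores, average the rest."""
    data = sorted(values)
    k = int(len(data) * prop)
    return mean(data[k:len(data) - k])


def winsorize(values, prop=0.2):
    """Hypothetical helper: replace the lowest/highest `prop` of scores with the
    nearest score that is kept, so extreme values are pulled in rather than dropped."""
    data = sorted(values)
    k = int(len(data) * prop)
    lo, hi = data[k], data[len(data) - k - 1]
    return [min(max(v, lo), hi) for v in values]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(mean(scores))          # ordinary mean, dragged up by the 100
print(trimmed_mean(scores))  # mean of the middle 60% of scores
print(winsorize(scores))
```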

16
Q

What is bootstrapping?

A

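Treats the sample as if it were the population and draws many samples from it (with replacement), computing the statistic in each; the estimates are ordered and used to find the limits within which the middle 95% of them fall. This creates a CI based on the data rather than on theory, so you don't need to rely on normality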
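
A minimal percentile-bootstrap sketch for the mean, using only the standard library. The number of resamples (2000), the seed, the function name and the data are illustrative choices, not fixed rules.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, reps=2000, level=0.95, seed=42):
    """Hypothetical helper: percentile bootstrap CI. Resample with replacement,
    compute the statistic each time, sort the estimates and read off the limits
    that hold the middle `level` of them."""
    rng = random.Random(seed)
    estimates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(reps))
    lower = estimates[round((1 - level) / 2 * reps)]
    upper = estimates[round((1 + level) / 2 * reps) - 1]
    return lower, upper


data = [2, 4, 4, 5, 7, 9, 12, 15, 21, 30]  # made-up, right-skewed scores
print(mean(data), bootstrap_ci(data))
```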

17
Q

What should standardised residuals look like?

A

About 95% of standardised residuals should lie between -2 and +2
About 99% should lie between -2.5 and +2.5
An absolute value of 3 or more suggests an outlier

18
Q

How do you work out how many standardised residuals can lie beyond ±2 or ±2.5?

A

Work out 5% (for ±2) or 1% (for ±2.5) of the sample size; that is roughly how many standardised residuals you can expect beyond those limits
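
The same arithmetic in code: a minimal sketch (hypothetical function name, made-up residuals) that counts the standardised residuals beyond ±2 and ±2.5, compares the counts with 5% and 1% of the sample size, and lists any with an absolute value above 3. For example, with 200 cases you would expect about 10 beyond ±2 and about 2 beyond ±2.5.

```python
def residual_check(z_resids):
    """Hypothetical helper: counts beyond ±2 / ±2.5 versus the roughly 5% / 1% expected,
    plus the positions of any standardised residuals with |z| > 3."""
    n = len(z_resids)
    return {
        "beyond_2": sum(abs(z) > 2 for z in z_resids),
        "expected_2": 0.05 * n,
        "beyond_2_5": sum(abs(z) > 2.5 for z in z_resids),
        "expected_2_5": 0.01 * n,
        "over_3": [i for i, z in enumerate(z_resids) if abs(z) > 3],
    }


# Made-up standardised residuals for 100 cases.
z = [0.0] * 96 + [2.2, -2.3, 2.7, 3.4]
print(residual_check(z))
```

Here 2 residuals lie beyond ±2.5 where about 1 would be expected, and case 99 exceeds 3, so those cases deserve a closer look.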

19
Q

What is multicollinearity?

A

When two or more predictors correlate too highly with each other

20
Q

How do you check for multicollinearity?

A

Tolerance should be greater than 0.2 and VIF smaller than 10 (VIF = 1/tolerance)
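
With exactly two predictors, regressing one on the other gives R² = r², so tolerance is 1 - r² and VIF is 1 / tolerance. A minimal sketch for that two-predictor case only (with more predictors, each one is regressed on all the others); the function names and data are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Hypothetical helper for two predictors: tolerance = 1 - r^2, VIF = 1 / tolerance."""
    r = pearson_r(x1, x2)
    tol = 1 - r ** 2
    return tol, 1 / tol


tol, vif = tolerance_and_vif([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"tolerance = {tol:.2f}, VIF = {vif:.2f}")  # r = 0.8 here
```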

21
Q

When should you worry about normality?

A

With small samples