Topic 3: Regression Diagnostics Flashcards

1
Q

linear regression assumptions

A
  • linearity
  • normality
  • homoscedasticity
  • independence
  • outliers
  • multicollinearity
2
Q

linearity

A

the relationship between x and y is linear

3
Q

normality

A

the error term follows a normal distribution

4
Q

homoscedasticity

A

the error term has a mean of 0 & a constant variance

5
Q

independence

A

the error terms are not related to each other

6
Q

outliers

A

there are no outliers

7
Q

multicollinearity

A

there are no high correlations among IVs

8
Q

testing normality

A

skewness & kurtosis, Shapiro-Wilk test, normal quantile plot

9
Q

skewness

A

the asymmetry of the data's distribution (departure from symmetry around the mean)

10
Q

kurtosis

A

how peaked or flat the distribution is relative to a normal distribution

11
Q

interpreting skewness & kurtosis

A

if the t-statistic for skewness or kurtosis exceeds 3.2 in absolute value, the respective assumption is violated
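
A minimal sketch of this check in Python, assuming the t-ratios are the sample skewness and excess kurtosis divided by their large-sample standard errors (√(6/n) and √(24/n)); the data here are a synthetic stand-in for residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.normal(size=200)       # synthetic stand-in for residuals

n = len(resid)
skew = stats.skew(resid)           # sample skewness
kurt = stats.kurtosis(resid)       # excess kurtosis (0 for a normal)

# large-sample standard errors (a common approximation)
t_skew = skew / np.sqrt(6 / n)
t_kurt = kurt / np.sqrt(24 / n)

print(f"t_skew = {t_skew:.2f}, t_kurt = {t_kurt:.2f}")
print("skewness violation:", abs(t_skew) > 3.2)   # the card's 3.2 rule
print("kurtosis violation:", abs(t_kurt) > 3.2)
```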

12
Q

Shapiro-Wilk test

A

tests for normality

13
Q

null hypothesis of the Shapiro-Wilk test

A

the sample comes from a normal distribution

14
Q

interpreting Shapiro-Wilk results

A

significant result = the sample may not come from a normal distribution
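
A short illustration using scipy.stats.shapiro; the data are a synthetic stand-in for residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
resid = rng.normal(size=100)       # synthetic stand-in for residuals

w, p = stats.shapiro(resid)        # H0: sample comes from a normal distribution
print(f"W = {w:.3f}, p = {p:.3f}")
if p < 0.05:
    print("significant: residuals may not be normally distributed")
else:
    print("no evidence against normality")
```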

15
Q

normal quantile plot

A

sorts observations from smallest to largest, calculates z-scores of the sorted observations, and plots the observations against corresponding z-scores

16
Q

interpreting normal quantile plot

A

if the data are close to normal, the points will lie close to a straight line
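
One way to draw this plot is scipy.stats.probplot, which performs the steps from the previous card (sort, compute matching normal quantiles, plot with a reference line); the data here are synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.normal(size=100)       # synthetic stand-in for residuals

# probplot sorts the data, computes the corresponding normal quantiles,
# and plots the observations against them with a reference line
stats.probplot(resid, dist="norm", plot=plt)
plt.title("normal quantile plot of residuals")
plt.show()
```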

17
Q

dealing with non-normality

A

data transformation or resampling methods (e.g., bootstrap, jackknife)

18
Q

bootstrap

A

uses resampling with replacement to emulate the process of obtaining new samples, so that we can estimate the variability of a parameter estimate without generating additional samples
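
A minimal sketch of a case-resampling bootstrap for a simple-regression slope, using only numpy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # toy data, true slope 2

def ols_slope(x, y):
    # slope of a simple linear regression
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)  # resample cases with replacement
    boot[b] = ols_slope(x[idx], y[idx])

print("bootstrap SE of the slope:", boot.std(ddof=1))
print("95% percentile CI:", np.percentile(boot, [2.5, 97.5]))
```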

19
Q

what happens if homoscedasticity is violated?

A
  • the variances of regression coefficient estimates tend to be under-estimated
  • thus, t-ratios tend to be inflated
20
Q

testing homoscedasticity

A

residual plots

21
Q

residuals

A

differences between the observed Yi & the predicted Ŷi

22
Q

interpreting residual plots for homoscedasticity

A

funnel shape = violation of homoscedasticity
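
A sketch of the residual plot using statsmodels and matplotlib; the synthetic data are built with an error variance that grows with x, so the plot should show the funnel shape described above:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)   # error variance grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()

# residuals vs. fitted values: a funnel shape signals heteroscedasticity
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```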

23
Q

dealing with heteroscedasticity

A

data transformation, other estimation methods, other regression methods

24
Q

testing linearity

A

residual plots

25
Q

interpreting residual plots for linearity

A

curved pattern = violation of linearity

26
Q

dealing with non-linearity

A

data transformation, add another IV to the equation (non-linear function of one of the other IVs), use non-linear methods

27
Q

testing independence

A

Durbin-Watson (d) test of autocorrelation

28
Q

Durbin-Watson test

A

tests the correlation between error terms ordered in time or space

29
Q

interpreting Durbin-Watson test results

A

1.5-2.5 = acceptable (little autocorrelation)
below 1 or above 3 = problematic autocorrelation
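
A short example with statsmodels' durbin_watson on synthetic data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(fit.resid)           # d near 2 = little autocorrelation
print(f"Durbin-Watson d = {d:.2f}")    # 1.5-2.5 acceptable; <1 or >3 problematic
```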

30
Q

dealing with dependence

A

data transformation, use other estimation methods, use other regression methods

31
Q

outlier

A

a data point disconnected from the rest of the data

32
Q

checking outliers

A

Cook’s distance

33
Q

interpreting Cook’s distance

A

Cook’s D > 4 suggests potentially serious outliers
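
A sketch using statsmodels' influence diagnostics on synthetic data with one planted outlier; the D > 4 cutoff is the one from this card (other texts use 4/n or 1):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 5.0, -20.0                # plant an obvious outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance

# flag cases per this card's D > 4 cutoff (other texts use 4/n or 1)
print("flagged cases:", np.where(cooks_d > 4)[0])
```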

34
Q

dealing with outliers

A

if an unusual case is not likely to recur, delete the case or use robust regression

35
Q

consequences of multicollinearity

A
  1. unstable regression coefficient estimates (lower t-ratios)
  2. a high R2 (or significant F) but few significant t-ratios
  3. unexpected signs of regression coefficients
  4. the matrix inversion problem
36
Q

checking multicollinearity

A

tolerance, VIF, condition index

37
Q

tolerance

A

1 − R2 for the regression of each IV on the other IVs, ignoring the DV

38
Q

interpreting tolerance

A

values < 0.1 = multicollinearity problem

39
Q

variance inflation factor (VIF)

A

1/tolerance

40
Q

interpreting VIF

A

values > 10 = multicollinearity problem
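
A sketch computing VIF (and tolerance = 1/VIF) with statsmodels on synthetic, nearly collinear predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for i in range(1, X.shape[1]):            # skip the constant column
    vif = variance_inflation_factor(X, i)
    # VIF > 10 (tolerance < 0.1) flags a multicollinearity problem
    print(f"x{i}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```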

41
Q

condition index

A

a measure of the linear dependency of one variable on the others, computed as √(largest eigenvalue / each eigenvalue) of the scaled predictor matrix

42
Q

interpreting condition index

A

values > 30 = multicollinearity
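
A sketch of one common construction, assuming columns are scaled to unit length and the condition indices are √(λmax/λj) for the eigenvalues of X'X (software packages differ in details such as whether the intercept column is included):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear pair
X = np.column_stack([x1, x2])

# scale each column to unit length, then eigendecompose X'X
Xs = X / np.linalg.norm(X, axis=0)
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)
cond_index = np.sqrt(eigvals.max() / eigvals)

print("condition indices:", np.round(cond_index, 1))
print("multicollinearity:", bool((cond_index > 30).any()))  # > 30 per this card
```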

43
Q

dealing with multicollinearity

A

drop a variable, incorporate more info (composite variable), or use other regression methods

44
Q

r2 vs. R2 in linear regression

A
  • Simple linear regression: r2 = R2
  • Multiple linear regression: r2 ≠ R2 (R2 is the squared correlation between Y & Ŷ, not between Y & any single IV); see the sketch below
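
A quick numeric check on synthetic data, showing that in simple linear regression the squared X-Y correlation matches the model R2:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
r = np.corrcoef(x, y)[0, 1]

# in simple linear regression, r2 equals the model R2
print(f"r2 = {r ** 2:.4f}, R2 = {fit.rsquared:.4f}")
```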