Topic 3: Regression Diagnostics Flashcards
linear regression assumptions
- linearity
- normality
- homoscedasticity
- independence
- outliers
- multicollinearity
linearity
the relationship between x and y is linear
normality
the error term follows a normal distribution
homoscedasticity
the error term has a mean of 0 and a constant variance across all levels of the IVs
independence
the error terms are not related to each other
outliers
there are no outliers
multicollinearity
there are no high correlations among IVs
testing normality
skewness & kurtosis, shapiro-wilk test, normal quantile plot
skewness
the asymmetry of the distribution (how far it departs from symmetry around the mean)
kurtosis
how peaked or flat the distribution is relative to a normal distribution
interpreting skewness & kurtosis
if |t| for skewness or kurtosis (the statistic divided by its standard error) > 3.2, the respective assumption is violated
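A minimal sketch (not one of the cards), assuming Python with numpy/scipy and the usual large-sample approximations SE(skewness) ≈ √(6/n) and SE(kurtosis) ≈ √(24/n):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)  # toy sample; in practice, use the residuals

n = len(x)
t_skew = stats.skew(x) / np.sqrt(6 / n)        # skewness / SE(skewness)
t_kurt = stats.kurtosis(x) / np.sqrt(24 / n)   # excess kurtosis / SE(kurtosis)

print(f"t-skewness = {t_skew:.2f}, t-kurtosis = {t_kurt:.2f}")
print("violation" if max(abs(t_skew), abs(t_kurt)) > 3.2 else "no violation")
```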
shapiro-wilk test
tests for normality
null hypothesis of shapiro-wilk test
the sample comes from a normal distribution
interpreting shapiro-wilk results
significant result = the sample may not come from a normal distribution
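A minimal sketch of running the test, assuming Python with scipy (the toy data are an assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # toy sample; in practice, use the residuals

W, p = stats.shapiro(x)
print(f"W = {W:.3f}, p = {p:.3f}")
# significant (e.g., p < .05) = the sample may not come from a normal distribution
```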
normal quantile plot
sorts observations from smallest to largest, calculates z-scores of the sorted observations, and plots the observations against corresponding z-scores
interpreting normal quantile plot
if close to normal, the points will lie close to some straight line
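A minimal sketch of building the plot exactly as the card describes, assuming Python with numpy/scipy/matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=100))        # observations sorted smallest to largest

n = len(x)
probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
z = stats.norm.ppf(probs)                # corresponding z-scores

plt.scatter(z, x, s=10)                  # near-linear pattern suggests normality
plt.xlabel("theoretical z-score")
plt.ylabel("sorted observation")
plt.show()
```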
dealing with non-normality
data transformation or resampling methods (e.g., bootstrap, jackknife)
bootstrap
uses resampling with replacement to emulate the process of obtaining new samples, so that we can estimate the variability of a parameter estimate without generating additional samples
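A minimal sketch of the idea, assuming Python with numpy and a toy simple regression; the 2,000 resamples are an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)             # toy data

slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)         # resample cases with replacement
    b1, b0 = np.polyfit(x[idx], y[idx], 1)   # refit the regression
    slopes.append(b1)

print(f"bootstrap SE of the slope ~ {np.std(slopes, ddof=1):.3f}")
```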
what happens if homoscedasticity is violated?
- the variances of regression coefficient estimates tend to be under-estimated
- thus, t-ratios tend to be inflated
testing homoscedasticity
residual plots
residuals
the differences between observed Yi and predicted Ŷi
interpreting residual plots for homoscedasticity
funnel shape = violation of homoscedasticity
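A minimal sketch, assuming Python with numpy/matplotlib; the toy data are built so the error variance grows with x, producing the funnel shape (a curved pattern in the same plot would instead signal non-linearity):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # error variance grows with x

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)                  # Yi - Ŷi

plt.scatter(b0 + b1 * x, residuals, s=10)
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```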
dealing with heteroscedasticity
data transformation, other estimation methods, other regression methods
testing linearity
residual plots
interpreting residual plots for linearity
curve shape = violation of linearity
dealing with non-linearity
data transformation, add another IV to the equation (non-linear function of one of the other IVs), use non-linear methods
testing independence
Durbin-Watson (d) test of autocorrelation
Durbin-Watson test
tests the correlation between error terms ordered in time or space
interpreting Durbin-Watson test results
1.5-2.5 = acceptable (no serious autocorrelation)
below 1 or above 3 = cause for concern (likely autocorrelation)
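A minimal sketch computing d directly from its definition, assuming Python with numpy and residuals that are ordered in time:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(size=100)  # stand-in for time-ordered residuals

# d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no autocorrelation
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(f"Durbin-Watson d = {d:.2f}")
```

statsmodels.stats.stattools.durbin_watson(e) computes the same quantity.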
dealing with dependence
data transformation, use other estimation methods, use other regression methods
outlier
a data point disconnected from the rest of the data
checking outliers
Cook's distance
interpreting Cook's distance
Cook's D > 4/n (a common rule of thumb) suggests potentially serious outliers
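A minimal sketch, assuming Python with statsmodels and the 4/n rule of thumb; the planted outlier is an assumption of the toy data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
y[0] += 10.0                        # plant one unusual case

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]

print("flagged cases:", np.where(cooks_d > 4 / len(y))[0])
```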
dealing with outliers
if an unusual case is not likely to recur, delete the case or use robust regression
consequences of multicollinearity
- unstable regression coefficient estimates (lower t-ratios)
- a high R2 (or significant F) but few significant t-ratios
- unexpected signs of regression coefficients
- the matrix inversion problem
checking multicollinearity
tolerance, VIF, condition index
tolerance
1 − R2 from the regression of each IV on the other IVs, ignoring the DV
interpreting tolerance
values < 0.1 = multicollinearity problem
variance inflation factor (VIF)
1/tolerance
interpreting VIF
values > 10 = multicollinearity problem
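A minimal sketch, assuming Python with statsmodels; x2 is built to be nearly collinear with x1 so the VIF (and tolerance = 1/VIF) show a problem:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j, name in enumerate(["x1", "x2", "x3"], start=1):  # column 0 is the constant
    vif = variance_inflation_factor(X, j)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```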
condition index
measure of dependency of one variable on the others
interpreting condition index
values > 30 = multicollinearity
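A minimal sketch of one common recipe (condition indices from the singular values of the column-scaled design matrix; the exact computation is an assumption, since the card does not fix it), assuming Python with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear
X = np.column_stack([np.ones(100), x1, x2])

Xs = X / np.linalg.norm(X, axis=0)          # scale columns to unit length
s = np.linalg.svd(Xs, compute_uv=False)     # singular values
print("condition indices:", s.max() / s)    # largest one is checked against 30
```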
dealing with multicollinearity
drop a variable, incorporate more info (composite variable), or use other regression methods
r2 vs. R2 in linear regression
- Simple linear regression: r2 = R2
- Multiple linear regression: r2 ≠ R2
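A minimal sketch checking the simple-regression case numerically, assuming Python with numpy/statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]                    # correlation between x and y
fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"r2 = {r ** 2:.4f}, R2 = {fit.rsquared:.4f}")  # identical with one IV
```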