Topic 3: Regression Diagnostics Flashcards
linear regression assumptions
- linearity
- normality
- homoscedasticity
- independence
- outliers
- multicollinearity
linearity
the relationship between x and y is linear
normality
the error term follows a normal distribution
homoscedasticity
the error term has a constant variance across all levels of the IVs (it is also assumed to have a mean of 0)
independence
the error terms are not related to each other
outliers
there are no extreme observations that unduly influence the regression estimates
multicollinearity
there are no high correlations among IVs
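A quick way to screen for high correlations among IVs is to compute the correlation for every pair of IVs and flag the large ones. The sketch below is a minimal illustration in Python; the function names, the example data, and the 0.8 cutoff are assumptions for illustration, not a rule stated in these cards.

```python
from itertools import combinations
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def high_iv_correlations(ivs, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of IVs with |r| > threshold.

    threshold=0.8 is an illustrative rule of thumb (an assumption), not a fixed standard.
    """
    flagged = []
    for a, b in combinations(ivs, 2):
        r = pearson_r(ivs[a], ivs[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical IV data: x2 is almost a multiple of x1.
ivs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [5, 3, 6, 2, 7, 4],
}
print(high_iv_correlations(ivs))  # [('x1', 'x2', 0.997)]
```

Only the x1/x2 pair is flagged; x3 is nearly uncorrelated with both.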
testing normality
skewness & kurtosis, shapiro-wilk test, normal quantile plot
skewness
the asymmetry of the distribution (whether it has a longer tail on the left or the right)
kurtosis
how heavy the tails of the distribution are (often described as how peaked the data are)
interpreting skewness & kurtosis
if |t| for skewness or kurtosis (the statistic divided by its standard error) > 3.2, the normality assumption is likely violated
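The t values for skewness and kurtosis are each statistic divided by its standard error. A minimal Python sketch, assuming the bias-adjusted sample skewness and excess kurtosis (a normal distribution has excess kurtosis 0) with the standard-error formulas commonly reported by packages such as SPSS; some textbooks use simpler approximations such as sqrt(6/n) and sqrt(24/n), so values can differ slightly. The function name and the returned dictionary are hypothetical.

```python
import math

def skew_kurt_t(x, cutoff=3.2):
    """Skewness, excess kurtosis, their t ratios, and a flag for |t| > cutoff."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    # Central moments (divided by n)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    g1 = m3 / m2 ** 1.5      # moment-based skewness
    g2 = m4 / m2 ** 2 - 3    # moment-based excess kurtosis
    # Bias-adjusted sample estimates
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors of the adjusted estimates
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    t_skew, t_kurt = skew / se_skew, kurt / se_kurt
    return {
        "skew": skew, "t_skew": t_skew,
        "kurtosis": kurt, "t_kurt": t_kurt,
        "violation": abs(t_skew) > cutoff or abs(t_kurt) > cutoff,
    }

print(skew_kurt_t([1, 2, 3, 4, 5, 6, 7, 8, 9])["violation"])  # False
print(skew_kurt_t([1] * 19 + [50])["violation"])             # True
```

The symmetric sample has skewness exactly 0, while a single extreme value among 19 identical ones pushes t for skewness well past 3.2.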
shapiro-wilk test
tests for normality
null hypothesis of shapiro-wilk test
the sample comes from a normal distribution
interpreting shapiro-wilk results
significant result (e.g., p < .05) = reject the null hypothesis; the sample may not come from a normal distribution
normal quantile plot
sorts observations from smallest to largest, computes the expected standard normal quantiles (z-scores) for each rank position, and plots the sorted observations against those quantiles
interpreting normal quantile plot
if close to normal, the points will lie close to some straight line
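The normal quantile plot construction and the straight-line check in the two cards above can be sketched in Python with the standard library's statistics.NormalDist. This is a minimal sketch under stated assumptions: it uses (i - 0.5)/n plotting positions (other conventions, such as Blom's, shift the points slightly), it returns the point pairs instead of drawing a plot, and it summarizes straightness with the correlation between the two coordinates, an informal aid rather than a formal test. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted observation with its expected standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist(0, 1)
    # Plotting position (i - 0.5) / n for the i-th smallest value, i = 1..n
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def straightness(points):
    """Pearson r between quantiles and sorted data; values near 1 suggest a straight line."""
    zs, xs = zip(*points)
    n = len(points)
    mz, mx = sum(zs) / n, sum(xs) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / sqrt(szz * sxx)

# Hypothetical sample
pts = normal_quantile_points([4.1, 5.0, 3.8, 6.2, 5.5, 4.7, 5.9, 4.4])
for z, x in pts:
    print(f"{z:6.2f}  {x}")
print(round(straightness(pts), 3))
```

Plotting the second column against the first gives the normal quantile plot; a clearly curved pattern (and a lower correlation) points to skewness or heavy tails.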
dealing with non-normality
data transformation or resampling methods (e.g., bootstrap, jackknife)
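The bootstrap can be illustrated for a simple regression slope: resample (x, y) pairs with replacement, refit the line each time, and read a confidence interval off the percentiles of the resampled slopes, which does not rely on normally distributed errors. This is a minimal case-resampling, percentile-interval sketch in Python; the function names, the number of resamples, and the example data are assumptions for illustration, and it is one of several bootstrap variants rather than a method prescribed by these cards.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        if len(set(bx)) < 2:  # slope is undefined if every resampled x is equal
            continue
        slopes.append(ols_slope(bx, [y[i] for i in idx]))
    slopes.sort()
    lower = slopes[int(alpha / 2 * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical data: slope near 2 with small non-normal noise in {-1, 0, 1}
x = list(range(20))
y = [2 * v + (v % 3 - 1) for v in x]
print("slope:", round(ols_slope(x, y), 3))
print("95% CI:", bootstrap_slope_ci(x, y))
```

Fixing the seed makes the interval reproducible; with a different seed the endpoints move slightly.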