Regression diagnostics Flashcards
linear regression assumptions
mean of the error distribution is zero, error distribution has constant variance, error distribution is normal, errors are independent
Mean of distribution of error is 0
The mean of the response is a linear function of x. Graphically, we expect the average value of y to increase or decrease linearly as x increases. A quadratic, logarithmic, or exponential pattern would suggest the model is not a good one.
Distribution of error has constant variance
As with the equal-variance assumption in analysis of variance and the t test, the variance of the response variable is the same regardless of the value of x. The spread of y should not change with x; it should be roughly constant.
Distribution of error is normal
Errors independent: for every observation (subject/sample in the study), the deviation from the regression line is independent from one subject to the next. (The error in overestimating one person's weight is independent of the error in estimating another person's weight.)
when is the errors-independent assumption not met
When is the 4th assumption not met? When the same stock price is measured from one day to the next, there will be temporal correlation. If a BP measurement is taken on one day and again on the same subject the following day, the two measurements are not independent. Repeated measurements over time are the most common setting where this assumption is suspect.
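One common numeric check for temporal correlation is the Durbin-Watson statistic, which is not mentioned in the cards and is offered here only as an illustrative sketch. It assumes the residuals (observed minus fitted values) are listed in time order; the numbers below are made up.

```python
# Hypothetical residuals in time order (made-up numbers, not from any study).
residuals = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.6, -0.9, -1.1]

def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

dw = durbin_watson(residuals)
print(round(dw, 3))
```

Values near 2 suggest little first-order correlation between neighboring residuals, while values well below 2 (as with this slowly drifting series) suggest positive temporal correlation.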
how to evaluate if assumptions hold
estimate the errors; the estimates are called residuals
residual formula
Observed value of y (e.g., 11) - fitted value on the regression line (e.g., 10), giving a residual of 1
can be positive or negative
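As a minimal sketch of the residual formula, the example below fits a least-squares line to made-up data and subtracts each fitted value from the observed value. The data and the helper name `fit_line` are hypothetical, introduced only for illustration.

```python
# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

# Residual = observed y minus fitted y; some are positive, some negative.
residuals = [yi - fi for yi, fi in zip(y, fitted)]
for xi, r in zip(x, residuals):
    print(f"x={xi:.0f}  residual={r:+.2f}")
```

With an intercept in the model, least-squares residuals always sum to zero (up to rounding), which is why they fall on both sides of the line.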
residual useful for
o Diagnostics: techniques for checking assumptions of the regression model
o Understanding the variation in Y that is unexplained by the regression model
o Identifying possible outliers
how to look at residuals
o Plot residuals vs. x_i values
o Plot residuals vs. predicted values (ŷ_i)
o Plot histogram or stem-and-leaf of residuals
o Q-Q plot of residuals
residual plot for functional form
bad if the residuals form a nonlinear pattern, such as an arch
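To see how an arch arises, the sketch below fits a straight line to made-up data that actually follows a curve, then prints the sign of each residual in order of x. The data and the helper name `fit_line` are hypothetical.

```python
# Hypothetical curved data: y = 10x - x^2, which rises and then falls.
x = [float(v) for v in range(11)]
y = [10 * xi - xi ** 2 for xi in x]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# A crude text version of the residual plot: the sign of each residual by x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

Negative residuals at both ends with positive residuals in the middle is the arch pattern: the straight line misses the curvature in a systematic way.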
residual plot for equal variance
bad if fan-shaped
e.g., for small values of x the spread of the residuals is small, and as x increases the spread gets larger
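A rough numeric stand-in for spotting a fan shape is to compare the typical residual size at low x with that at high x. The sketch below uses made-up data whose noise grows with x; the data, the helper name `fit_line`, and the half-and-half split are all assumptions chosen for illustration, not a formal test.

```python
# Hypothetical data: noise is proportional to x and alternates in sign.
x = [float(v) for v in range(1, 11)]
y = [2 * xi + 0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Mean absolute residual in the lower and upper halves of x (x is already sorted).
half = len(x) // 2
spread_low = sum(abs(r) for r in residuals[:half]) / half
spread_high = sum(abs(r) for r in residuals[half:]) / (len(x) - half)
print(round(spread_low, 2), round(spread_high, 2))
```

A clearly larger spread in the upper half than in the lower half is the numeric counterpart of a fan opening to the right.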
q-q plot
used to assess whether errors follow normal distribution
A Q-Q plot graphs the quantiles of the residuals against the expected quantiles for a sample from a normal distribution
what should Q-Q plot look like
Ideally, a Q-Q plot will be a straight line. Deviations from linearity indicate how the distribution of errors differs from normality
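The points of a Q-Q plot can be computed by hand, as in the sketch below. It pairs the sorted residuals with standard normal quantiles from Python's `statistics.NormalDist`. The residuals are made up, and the plotting positions (i + 0.5)/n are one common convention assumed here; other conventions exist.

```python
from statistics import NormalDist

# Hypothetical residuals (made-up numbers).
residuals = [0.06, -0.13, 0.18, -0.21, 0.10, -0.02, 0.25, -0.30, 0.04, 0.03]
n = len(residuals)

# Sample quantiles: the residuals sorted from smallest to largest.
sample_q = sorted(residuals)

# Expected quantiles for a normal sample, using plotting positions (i + 0.5) / n.
probs = [(i + 0.5) / n for i in range(n)]
theoretical_q = [NormalDist().inv_cdf(p) for p in probs]

# Each pair is one point on the Q-Q plot: (expected quantile, observed quantile).
qq_points = list(zip(theoretical_q, sample_q))
for t, s in qq_points:
    print(f"{t:6.2f}  {s:6.2f}")
```

If the residuals are roughly normal, these pairs fall close to a straight line when plotted.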
skewed residuals mean
when observations deviate far from the regression line, they tend to fall on one side of it (here, far above the line)
what to do if plots indicate problem
o Add or remove variables
o Transform variables or recode categorical variables
o Remove outliers (but be careful!)
o Use a different analytic approach
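As one illustration of transforming a variable, the sketch below uses made-up data that grow exponentially. A straight-line fit to the raw y leaves a curved residual pattern, while a fit to log(y) does not. The data, the helper name `fit_line`, and the choice of a log transform are assumptions for this example; the right remedy depends on the actual data.

```python
import math

# Hypothetical data following y = 2 * exp(0.5 * x) exactly.
x = [float(v) for v in range(1, 9)]
y = [2 * math.exp(0.5 * xi) for xi in x]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Fit on the raw scale: residuals are positive at the ends, negative in the middle.
b0_raw, b1_raw = fit_line(x, y)
raw_resid = [yi - (b0_raw + b1_raw * xi) for xi, yi in zip(x, y)]

# Fit on the log scale: log(y) = log(2) + 0.5 * x is exactly linear in x.
log_y = [math.log(yi) for yi in y]
b0_log, b1_log = fit_line(x, log_y)
log_resid = [ly - (b0_log + b1_log * xi) for xi, ly in zip(x, log_y)]

print([round(r, 1) for r in raw_resid])
print(round(b1_log, 3), round(max(abs(r) for r in log_resid), 6))
```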