Lecture 32 - Checking Our Assumptions Flashcards
What is linear regression good for?
- Estimate the mean response for any given predictor value (x)
- The fitted line/model lets us base predictions on the whole range of the data rather than on just one point (see the sketch below)
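A minimal illustrative sketch (toy data and variable names assumed, not from the lecture): fitting a least-squares line in Python and using it to estimate the mean response at any x in the observed range.

```python
import numpy as np

# Toy data (assumed for illustration): one predictor x and a response y
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)

# Least-squares fit of a straight line y = b0 + b1*x
# (np.polyfit returns the highest-degree coefficient first)
b1, b0 = np.polyfit(x, y, deg=1)

# The fitted line estimates the mean response at ANY x in the range,
# not just at x-values that were actually observed
x_new = 4.2
print(f"estimated mean response at x={x_new}: {b0 + b1 * x_new:.2f}")
```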
What are the four underlying assumptions of linear regression? (think of the acronym)
- Linearity: The relationship between the mean response µY and x is described by a straight line.
- Independence: The responses Y1, Y2, . . . , Yn are statistically independent.
- Normality: The error terms e1, e2, . . . , en come from a normal distribution.
- Equal variance: The error terms all have the same variance, σ² (‘homoscedastic’).
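Taken together, the four assumptions amount to the usual simple linear regression model. Written compactly (using the lecture's e_i and σ² notation):

$$Y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_1, \dots, e_n \ \text{independent}, \quad e_i \sim N(0, \sigma^2),$$

so that the mean response at any x is µY = β0 + β1 x, i.e. a straight line in x.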
What is the fitted line plot?
A plot of the regression line superimposed on a scatterplot of the data.
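A minimal matplotlib sketch of a fitted line plot, continuing the assumed toy fit from the earlier example (x, y, b0, b1 as defined there):

```python
import matplotlib.pyplot as plt

# Fitted line plot: scatterplot of the data with the fitted line drawn on top
plt.scatter(x, y, label="data")
plt.plot(x, b0 + b1 * x, color="red", label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Fitted line plot")
plt.legend()
plt.show()
```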
How are standardized residuals and raw residuals different? Which is preferred?
- A raw residual is the difference between an observed response and the fitted (predicted) value for that point.
- A standardized residual takes that raw residual and divides it by the standard deviation of the set of residuals, putting it on a common (roughly z-score) scale.
- Standardized residuals are usually preferred (see the sketch below).
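A short sketch of the calculation, continuing the toy fit above. This is the simple "divide by the residual standard deviation" version described here; some software instead reports a leverage-adjusted version, but the idea is the same.

```python
# Raw residuals: observed response minus fitted value
fitted = b0 + b1 * x
raw_resid = y - fitted

# Standardized residuals: raw residuals divided by the residual standard
# deviation, so they are on a common (roughly z-score) scale.
# ddof=2 because two parameters (intercept and slope) were estimated.
std_resid = raw_resid / raw_resid.std(ddof=2)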
Look at slides 608 and 609: which one shows the linearity assumption holding, and which shows it failing?
Slide 608: no quadratic pattern emerges in the residuals. Slide 609: the assumption does hold; notice in the plot of residuals against fitted values that the error variance is roughly equal.
How do you check the independence assumption?
- This check is not so much about looking at a graph as about thinking through the study design.
- You would not have independence if, for example, you take repeated measurements on the same person or include related individuals in the sample. Cluster sampling is also worth considering: could individuals in the same cluster be influencing each other?
How do you check the normality assumption?
- The histogram of the residuals should follow a roughly bell-shaped (normal) curve
- The points in the normal Q-Q plot should lie close to a straight line (see the sketch below)
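A hedged sketch of both checks in Python, continuing the toy example (std_resid as computed earlier):

```python
import matplotlib.pyplot as plt
from scipy import stats

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of residuals: should look roughly bell-shaped
axes[0].hist(std_resid, bins=10)
axes[0].set_title("Histogram of standardized residuals")

# Normal Q-Q plot: points should lie close to a straight line
stats.probplot(std_resid, dist="norm", plot=axes[1])
axes[1].set_title("Normal Q-Q plot")

plt.tight_layout()
plt.show()
```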
How do you check the equal variance assumption?
- Can examine the equal variance assumption by looking at the fitted line plot or (better) a plot of residuals against fitted values.
- If the error terms e1, e2, . . . , en have equal variance, then the magnitude of the spread of the data about the regression line should not change too much with x.
- In contrast, if (say) the variance of the error terms increases with x, then we would expect to see the data more dispersed about the regression line as x increases (see the sketch below).
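A minimal sketch of the residuals-versus-fitted plot, continuing the toy example:

```python
import matplotlib.pyplot as plt

# Residuals against fitted values: the vertical spread should stay roughly
# constant across the fitted values if the equal variance assumption holds.
# A funnel shape (spread growing with the fitted values) suggests it does not.
plt.scatter(fitted, std_resid)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Standardized residuals")
plt.show()
```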
Draw a quick sketch of what a graph of residuals would look like if the equal variance assumption fails…
See slide 615
What happens if the linearity assumption fails? How might you overcome this?
- All conclusions drawn from the linear model will be invalid; this is the most critical part of LINE.
- Can be rectified by transforming the response variable (e.g. taking the log) or by including polynomial terms (see the sketch below).
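A rough sketch of the two fixes, continuing the toy example (the log transform is only sensible when all responses are positive):

```python
import numpy as np

# Fix 1: transform the response, e.g. fit a straight line to log(y) instead of y
# (requires all responses to be positive)
b1_log, b0_log = np.polyfit(x, np.log(y), deg=1)

# Fix 2: include a polynomial term, e.g. fit y = b0 + b1*x + b2*x^2
b2_q, b1_q, b0_q = np.polyfit(x, y, deg=2)
```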
What happens if the independence or equal variance assumptions fail?
- Estimates of the parameters remain valid even when the independence or equal variance assumptions fail.
- However, the estimates can be inefficient; i.e. they can be improved.
- It follows that the fitted regression line is usable.
- Any test results or confidence intervals based on the regression model will be invalid.
- Failure of these assumptions can be rectified by sophisticated modelling techniques.
What happens if the normality assumption fails?
- Failure of the normality assumption is typically the least important.
- It affects the validity of confidence intervals and test results when the sample size n is small.
- In some cases failure of this assumption can be corrected by a transformation of the response variable.
What are outliers? What do you do about them?
- Outliers are data points that lie unusually far from the fitted regression line; sometimes a regression model will seem to fit the data well apart from one or two of them.
- If the outliers cannot be corrected (i.e. they were not due to a measurement mistake), then try refitting the regression line with them removed (see the sketch below).
- Still investigate the cause of those outliers, as they may be important.
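A brief sketch of refitting with outliers set aside, continuing the toy example. The |standardized residual| > 3 cut-off is a common rule of thumb assumed here, not taken from the lecture.

```python
import numpy as np

# Keep only points whose standardized residual is not extreme
keep = np.abs(std_resid) <= 3

# Refit the line without the flagged points and compare the two fits;
# a large change in slope suggests the outliers were influential
b1_clean, b0_clean = np.polyfit(x[keep], y[keep], deg=1)
print("slope with all data:   ", round(b1, 3))
print("slope without outliers:", round(b1_clean, 3))
```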