Lecture 32- Checking Our Assumptions Flashcards

1
Q

What is linear regression good for?

A
  • Estimate the mean response for any given predictor value (x)
  • The line/model lets us base predictions on the whole range of the data rather than on a single point
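As a minimal sketch of estimating a mean response, the snippet below fits a least-squares line and evaluates it at a predictor value that was never observed. The data and the helper name `fit_line` are made up for illustration and are not from the lecture.

```python
# Hypothetical data: predictor x and response y (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)

# Estimated mean response at x = 3.5, a value between the observed points.
mean_at_3_5 = b0 + b1 * 3.5
print(b0, b1, mean_at_3_5)
```

The estimate at x = 3.5 uses all five points through the fitted slope and intercept, not just the neighbouring observations.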
2
Q

What are the four underlying assumptions of linear regression? (think of the acronym)

A

-Linearity: The relationship between the mean response µY and x is
described by a straight line.

-Independence: The responses Y1, Y2, . . . , Yn are statistically
independent.

-Normality: The error terms e1, e2, . . . , en come from a normal
distribution.

-Equal variance: The error terms all have the same variance, σ² (‘homoscedastic’).

3
Q

What is the fitted line plot?

A

Plot of regression line superimposed on scatterplot of data.

4
Q

How are standardized residuals and raw residuals different? Which is preferred?

A
  • A raw residual is the difference between an observed response and the value the model predicts for that point (observed minus fitted).
  • A standardized residual is the raw residual divided by an estimate of its standard deviation, which puts all residuals on a common scale.
  • Standardized residuals are usually preferred
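A rough sketch of the two kinds of residual, using made-up data. It assumes the simple version of standardization, dividing every raw residual by the residual standard error s = sqrt(SSE / (n - 2)); many packages also adjust each residual for its leverage, which this sketch does not do.

```python
# Hypothetical data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Raw residual: observed response minus fitted value.
raw = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residual standard error; n - 2 because two parameters were estimated.
s = (sum(e ** 2 for e in raw) / (n - 2)) ** 0.5

# Simplified standardized residual (assumption: no leverage adjustment).
standardized = [e / s for e in raw]

for e, z in zip(raw, standardized):
    print(round(e, 3), round(z, 3))
```

Because the standardized values are unit-free, the same rough cut-offs (for example, flagging values beyond about 2 in absolute size) can be used whatever the units of the response.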
5
Q

Look at slides 608 and 609. Which one shows the linearity assumption holding, and which one shows it failing?

A
608 = does not hold: a quadratic pattern emerges
609 = does hold: in the plot of residuals against fitted values, the error variance is roughly equal
6
Q

How do you check the independence assumption?

A
  • This one isn’t so much about looking at a graph as about thinking through the study design
  • Independence can fail if you measure the same person more than once or have related individuals in the sample. Cluster sampling is also worth considering: are individuals in the same cluster influencing each other?
7
Q

How do you check the normality assumption?

A
  • The histogram of residuals should follow a normal, bell-shaped curve
  • The points on the Q-Q plot should fall close to a straight line
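One way to put a number on "the Q-Q plot looks straight" is to compute the Q-Q coordinates and their correlation. This is only a sketch: the residuals are made up, and the plotting positions (i - 0.5) / n are one common convention, assumed here rather than taken from the lecture.

```python
from statistics import NormalDist

# Hypothetical residuals from some fitted model (not from the lecture).
residuals = [0.4, -1.0, 1.6, -0.1, 0.7, -1.6, 0.1, -0.4, 1.0, -0.7]

n = len(residuals)
ordered = sorted(residuals)

# Theoretical standard normal quantiles at plotting positions (i - 0.5) / n.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

# Each (theoretical, ordered) pair is one point on the Q-Q plot.
qq_corr = correlation(theoretical, ordered)
print(round(qq_corr, 3))
```

A correlation close to 1 means the Q-Q points lie near a straight line. It complements the plot rather than replacing it, since curvature in the tails is easier to see by eye.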
8
Q

How do you check the equal variance assumption?

A

-Can examine equal variance assumption by looking at fitted line plot
or (better) plot of residuals against fitted values.

-If error terms e1, e2, . . . , en have equal variance, then magnitude of
spread of data about regression line should not change too much with
x.

-In contrast, if (say) variance of error terms increases with x, then we
would expect to see data more dispersed about regression line as x
increases.
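The visual check can be mimicked crudely by comparing the spread of the residuals at low and high x. This is a sketch with made-up residuals, not a formal test. For a line with positive slope, ordering by x is the same as ordering by fitted value.

```python
from statistics import pstdev

# Hypothetical predictor values and residuals (not from the lecture).
# The residuals were chosen to fan out as x increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.1, -1.0]

# Order the residuals by x, then split them into a lower and an upper half.
pairs = sorted(zip(x, residuals))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]

# Spread (standard deviation) of the residuals in each half.
low_spread = pstdev(low)
high_spread = pstdev(high)
ratio = high_spread / low_spread
print(round(low_spread, 3), round(high_spread, 3), round(ratio, 2))
```

A ratio near 1 is what equal variance would look like. A ratio well above 1, as here, matches the fanning-out pattern described above.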

9
Q

Draw a quick sketch of what a graph of residuals would look like if the equal variance assumption fails…

A

See slide 615

10
Q

What happens if the linearity assumption fails? How might you overcome this?

A
  • Conclusions drawn from the linear model cannot be trusted. This is the most critical part of LINE
  • Can be rectified by transforming the response variable (e.g. taking logs) or including polynomial terms
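A small sketch of the log-transformation remedy, assuming made-up data that grow roughly exponentially. It compares how well a straight line fits y against x with how well it fits log(y) against x, using R² as a rough yardstick. The helper name `r_squared` is hypothetical.

```python
import math

# Hypothetical data with roughly exponential growth (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.7, 7.4, 20.1, 54.6, 148.4]

def r_squared(x, y):
    """R^2 of the least-squares straight line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Straight line fitted to the original response.
r2_raw = r_squared(x, y)

# Straight line fitted to the log-transformed response.
log_y = [math.log(yi) for yi in y]
r2_log = r_squared(x, log_y)

print(round(r2_raw, 3), round(r2_log, 5))
```

On these data the line fits the log response far better than the raw response. In practice the residual plot for the transformed model should still be checked, since a high R² alone does not confirm linearity.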
11
Q

What happens if the independence or equal variance assumptions fail?

A

-Estimates of parameters remain valid even when independence or
equal variance assumptions fail.
-However, estimates can be inefficient; i.e. they can be improved.
-Follows that fitted regression line is useable.
-Any test results or confidence intervals based on the regression model
will be invalid.
-Failure of assumptions can be rectified by sophisticated modelling
techniques.

12
Q

What happens if the normality assumption fails?

A
  • Failure of normality assumption is typically least important.
  • Affects validity of confidence intervals and test results when the sample size n is small.
  • In some cases failure of this assumption can be corrected by transformation of response variable.
13
Q

What are outliers? What do you do about them?

A

-Sometimes a regression model will seem to fit the data well, apart
from one or two outliers.
-If outliers cannot be corrected (i.e. if they weren’t due to a measurement mistake), then try refitting the regression line with them removed
-Still investigate the cause of those outliers as they may be important
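To see how much a single outlier can move the line, the sketch below fits the regression with and without one suspect point. The data are made up, and the choice of which point counts as the outlier is an assumption made for illustration.

```python
# Hypothetical data (not from the lecture); the last point looks like an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Fit using every observation.
b0_all, b1_all = fit_line(x, y)

# Refit with the suspect point (index 5) removed.
outlier_index = 5
x_kept = [xi for i, xi in enumerate(x) if i != outlier_index]
y_kept = [yi for i, yi in enumerate(y) if i != outlier_index]
b0_kept, b1_kept = fit_line(x_kept, y_kept)

print(round(b1_all, 3), round(b1_kept, 3))
```

Reporting both fits shows how much the conclusions depend on the removed point, which is also a reason to keep investigating what caused it.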
