Lecture 10 Sabina Flashcards

1
Q

Why is the residual important?

A

The residual is whatever is left after the original predictor(s) have explained what they can, and this unexplained variance can be higher than what you DID explain. It carries important information.

2
Q

How do you know that a regression relationship is not linear?

A

By looking at the residuals in a scatterplot.

3
Q

How will you know if you’re testing the same population or different populations?

A

By looking at the residuals; you MUST test the assumptions.

4
Q

What are the assumptions underlying MR?

A
  1. The dependent variable is a linear function of the IVs
  2. Each observation is drawn independently
  3. Homoscedasticity of variance
  4. Errors are normally distributed, with mean = 0
5
Q

How do you know if you have a DV that is a linear function of the IVs? (first assumption)

Why is it important?

A

Can plot the DV against the IV

  • Can fit quadratic (etc) term(s) in the regression
  • A more detailed examination using scatterplots
  • Plot residuals
  • Plot the residuals against the IV

Why?

  • e = Y - Y' (the error in prediction: observed minus predicted)
  • If there is a departure from linearity, it is magnified in a plot of the residual terms
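The residual plot described above can be made concrete outside SPSS. A minimal sketch in Python/NumPy with made-up homework/grade data (the variable names and numbers are hypothetical, not the lecture's data set):

```python
import numpy as np

# Hypothetical data: homework hours (IV) and grades (DV)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 50 + 3 * x + rng.normal(0, 5, 100)   # a truly linear relationship

# Fit Y' = b0 + b1*X, then residuals e = Y - Y'
b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

# OLS residuals (with an intercept) always average essentially zero,
# so any leftover PATTERN of e against X is what signals non-linearity
print(abs(e.mean()) < 1e-8)   # True
```

Plotting `e` against `x` is the residual-versus-IV plot the card describes.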
6
Q

How do you find, in SPSS, the non-parametric best-fitting line, which need not be linear?

A

Use the Lowess plot function in SPSS; its fit is not constrained to be a straight line. If there is no pattern left in the residuals, the Lowess line should be approximately straight; if there is a pattern, this line is important to you.

7
Q

How do you diagnose a violation of the linearity assumption?

A

The horizontal line is the mean of the e's.
- It is flat because the effect of the IV of interest (homework) was removed, so homework now has nothing to do with the residuals
- When two variables are unrelated, the best-fitting line is just the mean of Y

› If the assumption of linearity is not violated, there is little or no departure from the regression line
› If the assumption of linearity is violated, the Lowess fit line will look somewhat curvilinear
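For intuition, the Lowess diagnostic can be mimicked with a crude local-averaging smoother. This NumPy sketch uses simulated data; `running_mean_smooth` is a hypothetical stand-in for real Lowess (which uses weighted local regression), not an SPSS routine:

```python
import numpy as np

def running_mean_smooth(x, y, frac=0.3):
    """Crude stand-in for a Lowess line (hypothetical helper): average y
    over roughly frac*N nearest neighbours in x. Real Lowess uses weighted
    local regression, but the diagnostic idea is the same."""
    k = max(2, int(frac * len(x)))
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    smooth = np.array([ys[max(0, i - k // 2): i + k // 2 + 1].mean()
                       for i in range(len(x))])
    return xs, smooth

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = 2 + x ** 2 + rng.normal(0, 0.3, 200)   # a truly quadratic DV

# Fit a straight line anyway, then smooth the residuals
b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
xs, smooth = running_mean_smooth(x, e)

# Linearity violated: the smoothed residual line bends well away from
# zero (a U shape) instead of staying flat
print(smooth.max() - smooth.min() > 1)   # True
```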

8
Q

If there is no departure from linearity in a plot of unstandardised residuals against predicted scores…

A

…the scatter should stay close to the regression line (the flat line at a residual of zero).

9
Q

What is the problem with checking the assumption of linearity underlying MR?

A

As you just saw, if there is only a slight departure from linearity, you can easily miss it when using scatterplots.
› So, it is beneficial to use all methods.
› If theory and data suggest non-linearity, build non-linear term(s) into the regression equation and test them for statistical significance

10
Q

What happens if the data are not drawn independently (e.g., there is a possibility of clusters)?

A

There is a risk of violating the assumption that the residual terms are independent.
› Violation of this assumption affects SEb
- Underestimating the errors is dangerous for hypothesis testing
› This danger lessens with larger N and "sophisticated" sampling techniques
Box plots (one per cluster) will show boxes of clearly different sizes.
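The danger to SEb can be demonstrated by simulation. This Monte Carlo sketch is an illustration added here, not part of the lecture: with fabricated clustered data where the IV is constant within clusters and errors share a cluster component, the textbook SEb formula (which assumes independent residuals) comes out far too small:

```python
import numpy as np

rng = np.random.default_rng(7)
n_clusters, per = 20, 10          # 20 clusters of 10 cases each
slopes, naive_ses = [], []
for _ in range(500):
    # IV constant within cluster; errors share a cluster component
    x = np.repeat(rng.normal(0, 1, n_clusters), per)
    e = (np.repeat(rng.normal(0, 1, n_clusters), per)
         + rng.normal(0, 1, n_clusters * per))
    y = 2 * x + e
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    # Textbook SEb, which assumes independent residuals
    se = np.sqrt(resid.var(ddof=2) / ((x - x.mean()) ** 2).sum())
    slopes.append(b1)
    naive_ses.append(se)

# The real sampling variability of b is much larger than the naive SEb
print(np.std(slopes) > 1.5 * np.mean(naive_ses))   # True
```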

11
Q

What do we need to watch out for with Homoscedasticity of variance?

A

A butterfly (fan) pattern, or two large clusters, violates the assumption: the variance figures are totally different across the range of predicted scores.

Keith's rule of thumb: if the ratio of the high to the low variance is < 10, it is not a problem.
But there are other tests.
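One way to operationalize the high-to-low variance ratio (the lecture does not say how to form the groups, so the split below is an assumption) is to divide cases on the IV and compare residual variances, here in NumPy with simulated heteroscedastic errors:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
# Heteroscedastic residuals: spread grows with X (the fan/butterfly shape)
e = rng.normal(0, x)              # SD proportional to X

# Split cases on the IV and compare residual variances; by Keith's rule
# of thumb a ratio of 10 or more would clearly be a problem
low, high = e[x < 5.5], e[x >= 5.5]
ratio = max(low.var(), high.var()) / min(low.var(), high.var())
print(ratio > 2)   # True: the variance figures are clearly different
```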

12
Q

if residuals are normally distributed…

A

the scatter stays close to the ‘ideal’ (diagonal) line in the normality plot.
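As a numeric companion to the normality plot (this screen is an addition for illustration, not the lecture's method), skewness and excess kurtosis of the residuals should both sit near zero when the residuals really are normal:

```python
import numpy as np

rng = np.random.default_rng(3)
e = rng.normal(0, 5, 1000)        # residuals that really are normal

# For normal residuals, skewness ~ 0 and excess kurtosis ~ 0
z = (e - e.mean()) / e.std()
skew = (z ** 3).mean()
kurt = (z ** 4).mean() - 3
print(abs(skew) < 0.3 and abs(kurt) < 0.5)
```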

13
Q

Most common errors in data

A
  1. Problems with coding
  2. Accidentally comparing different samples
  3. Subgroups that need to be eliminated
14
Q

Distance, leverage and influence

A

For some people the model will over-predict, for others it will under-predict. You should look at the size of these residuals.

The outlier with the largest residual is identified by its distance from the actual value.
Don't remove a case just because of its error component (that is unethical), but if you did remove it, the results would change. This person could, for instance, have a learning disability (a variable not controlled for).

15
Q

Leverage

A

Refers to a ‘suspicious’ pattern of values on the IVs.
Diagnostic techniques:
- Graph the IVs against each other
- SPSS provides leverage statistics, so you can examine them
- A rule of thumb is based on the statistic (k+1)/N, where k = number of IVs and N = sample size; see Keith for more details
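The (k+1)/N statistic is the average leverage: the diagonal of the hat matrix H = X(X'X)⁻¹X' always averages exactly (k+1)/N. A NumPy sketch with fabricated data and one planted high-leverage case:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 2                        # 50 cases, 2 IVs (fabricated data)
X = rng.normal(0, 1, (n, k))
X[0] = [6.0, -6.0]                  # one case with a 'suspicious' IV pattern

# Leverage = diagonal of the hat matrix H = X(X'X)^-1 X'
# (with a column of 1s added for the intercept)
Xd = np.column_stack([np.ones(n), X])
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
h = np.diag(H)

# Leverages always average exactly (k+1)/N, the statistic behind the rule
# of thumb attributed to Keith
print(np.isclose(h.mean(), (k + 1) / n))   # True
print(h[0] == h.max())                     # True: the planted case stands out
```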

16
Q

Standardised residuals

A

After you look at the grades, the IVs, and the predicted grades, you see the standardised residual. Standardising the units is important because it makes residuals comparable. If one person is two and a half SD above and beyond on ability, they need a scholarship to Cambridge.

If someone is 3 SD below average, go back to the original report: see if their data were miscoded, or if they are special in some way.
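Standardised residuals are just raw residuals divided by their standard deviation. A NumPy sketch with made-up grade data and one planted over-performer (the index and numbers are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 100)
y = 60 + 2 * x + rng.normal(0, 4, 100)   # fabricated grade data
y[10] += 25                              # plant one badly over-performing case

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

# Standardise so residuals are comparable regardless of the DV's units;
# a case beyond roughly +/-3 SD should be checked in the original records
z = e / e.std()
print(int(np.argmax(np.abs(z))))   # 10: the planted case
print(z[10] > 2)                   # True
```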

17
Q

(Multi)Collinearity

A
  • Refers to the problem of IVs correlating highly with each other
  • It results in bizarre and misleading results

You can request collinearity diagnostics in the SPSS output. If you have a highly correlated pair of IVs, the regression may give one of them a negative B because of multicollinearity.
So add the tolerance and VIF statistics. As researchers, we get worried if the tolerance is below .4.

To solve the problem, combine V1 and V2 into a composite; this is easy to fix. Example: if homework in and out of school correlated .900, make it just 'homework'.

Other examples: grade average as one IV with the individual subject grades as other IVs; or hours worked (affecting grades) as one IV and whether they worked at all as another IV.
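Tolerance and VIF can be computed by hand to see what SPSS reports: tolerance = 1 − R² from regressing one IV on the others, and VIF = 1/tolerance. A NumPy sketch with two deliberately collinear homework variables (the names and the near-.95 correlation are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
hw_in = rng.normal(0, 1, n)                          # homework in school
hw_out = 0.95 * hw_in + 0.1 * rng.normal(0, 1, n)    # collinear with hw_in

def tolerance(target, others):
    """Tolerance = 1 - R^2 from regressing one IV on the remaining IVs;
    VIF is its reciprocal (the statistics SPSS adds to the output)."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1 - r2

tol = tolerance(hw_in, [hw_out])
print(tol < 0.4)       # True: well below the .4 alarm level from the lecture
print(1 / tol > 2.5)   # True: the VIF is correspondingly large
```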