Lecture 10 Sabina Flashcards
Why is the residual important?
Whatever is left over after the predictors have done their work is the residual. This unexplained variance can be larger than the variance you DID explain, and it carries important information.
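A minimal sketch (the simulated homework/achievement data and the statsmodels library are my assumptions, not the lecture's) of pulling residuals out of a fitted model and comparing explained with unexplained variance:

```python
import numpy as np
import statsmodels.api as sm

# Simulated example: a fairly weak predictor, so unexplained variance exceeds explained variance
rng = np.random.default_rng(0)
homework = rng.normal(5, 2, 200)                               # hypothetical IV
achievement = 50 + 1.5 * homework + rng.normal(0, 10, 200)     # hypothetical DV

fit = sm.OLS(achievement, sm.add_constant(homework)).fit()

residuals = fit.resid                       # e = Y - Y': whatever the model did NOT explain
print("R^2 (explained):", round(fit.rsquared, 3))
print("1 - R^2 (unexplained):", round(1 - fit.rsquared, 3))
```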
How do you know that the regression relationship is not linear?
By looking at the residuals in a scatterplot.
How will you know if you’re testing the same population or different populations?
By looking at the residuals; this is why you MUST test the assumptions.
What are the assumptions underlying MR?
- The dependent variable is a linear function of the IVs
- Each observation is drawn independently
- Homoscedasticity of variance
- Errors are normally distributed, with the mean = 0
How do you know if you have a DV that is a linear function of the IVs? (first assumption)
Why is it important?
- Plot the DV against the IV
- Fit quadratic (etc.) term(s) in the regression and test them
- For a more detailed examination, use scatterplots of the residuals
- Plot the residuals against the IV
Why?
- e = Y - Y′ (the errors in prediction)
- Any departure from linearity is magnified in a plot of the residual terms (see the sketch below)
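A minimal sketch of plotting the residuals against the IV, using the same assumed simulated homework example, so that a departure from linearity stands out:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
homework = rng.uniform(0, 10, 200)                                            # hypothetical IV
achievement = 40 + 4 * homework - 0.3 * homework**2 + rng.normal(0, 5, 200)   # mildly curved DV

fit = sm.OLS(achievement, sm.add_constant(homework)).fit()   # straight-line model
residuals = fit.resid                                        # e = Y - Y'

plt.scatter(homework, residuals, alpha=0.5)
plt.axhline(0, color="grey")          # flat line at the mean of the residuals
plt.xlabel("homework (IV)")
plt.ylabel("residual")
plt.title("Curvature in this cloud signals a violation of linearity")
plt.show()
```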
How do you find a non-parametric best-fit line (one not forced to be straight) in SPSS?
Use the Lowess fit-line function in SPSS; it is not constrained to be a straight line. If there is no pattern left in the residuals, the Lowess line should be roughly straight. If there is a pattern, this line is telling you something important.
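The SPSS menu steps are not reproduced here; as a hedged stand-in, the lowess smoother in statsmodels plays the same role as the SPSS Lowess fit line:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# Reusing the hypothetical curved homework/achievement example from the previous sketch
rng = np.random.default_rng(1)
homework = rng.uniform(0, 10, 200)
achievement = 40 + 4 * homework - 0.3 * homework**2 + rng.normal(0, 5, 200)
fit = sm.OLS(achievement, sm.add_constant(homework)).fit()

smoothed = lowess(fit.resid, homework, frac=0.6)   # pairs of [x, smoothed residual], sorted by x

plt.scatter(homework, fit.resid, alpha=0.4)
plt.plot(smoothed[:, 0], smoothed[:, 1], color="red")   # Lowess line: roughly straight = no pattern left
plt.axhline(0, color="grey")
plt.xlabel("homework (IV)")
plt.ylabel("residual")
plt.show()
```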
How do you diagnose a violation of the linearity assumption?
The horizontal line is the mean of the e's.
- It is flat because the effect of the IV of interest (homework) was removed, so homework now has nothing to do with the residuals
- When two variables are unrelated, the best-fit line is just the mean of Y
› If the assumption of linearity is not violated, there is little or no departure from the regression line
› If the assumption of linearity is violated, the Lowess fit line will look somewhat curvilinear
If there is no departure from linearity in the plot of unstandardised residuals against predicted scores…
…the Lowess line should be close to the regression line (see the sketch below).
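A hedged sketch (simulated data with two made-up IVs) of the unstandardised-residuals-against-predicted-scores plot with a Lowess line laid over it:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))              # two hypothetical IVs plus a constant
y = X @ np.array([10.0, 2.0, -1.5]) + rng.normal(0, 3, 200)
fit = sm.OLS(y, X).fit()

predicted = fit.fittedvalues
smooth = lowess(fit.resid, predicted, frac=0.6)

plt.scatter(predicted, fit.resid, alpha=0.4)
plt.plot(smooth[:, 0], smooth[:, 1], color="red")   # should hug the flat zero line if linearity holds
plt.axhline(0, color="grey")
plt.xlabel("predicted scores")
plt.ylabel("unstandardised residuals")
plt.show()
```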
What is the problem with checking the assumption of linearity underlying MR?
As you just saw, if there is only a slight departure from linearity, you can easily miss it when using scatterplots.
› So, it is beneficial to use all of the methods.
› If theory and data suggest non-linearity, build non-linear term(s) into the regression equation and test them for statistical significance (a sketch follows).
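A minimal sketch, assuming simulated data and made-up variable names, of building a quadratic term into the equation and testing it for statistical significance:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
homework = rng.uniform(0, 10, 300)
achievement = 40 + 4 * homework - 0.3 * homework**2 + rng.normal(0, 5, 300)

# Restricted (straight-line) model vs. full model with a quadratic term
linear = sm.OLS(achievement, sm.add_constant(homework)).fit()
X_quad = sm.add_constant(np.column_stack([homework, homework**2]))
quadratic = sm.OLS(achievement, X_quad).fit()

# t test on the squared term, and an F test comparing the two nested models
print("p for the homework^2 term:", quadratic.pvalues[2])
print("F, p, df for adding the term:", quadratic.compare_f_test(linear))
```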
What happens if the data are not drawn independently (e.g., there is a possibility of clusters)?
There is a risk of violating the assumption that the residual terms are independent.
› Violation of this assumption affects SEb (the standard error of the regression coefficient)
- Underestimating the errors is dangerous for hypothesis testing
› This danger lessens with larger N and "sophisticated" sampling techniques (one quick check is sketched below)
Box plots of the residuals for different groups will show differently sized boxes.
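One quick, hedged check for non-independent residuals is the Durbin-Watson statistic (it assumes the cases have a meaningful order, e.g., time); genuinely clustered data usually calls for multilevel models, which are beyond this sketch:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
x = rng.normal(size=200)

# Hypothetical autocorrelated errors to imitate non-independent observations
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 3 + 2 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(fit.resid)   # ~2 suggests independence; values well below 2 suggest positive autocorrelation
print("Durbin-Watson:", round(dw, 2))
```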
What do we need to watch out for with Homoscedasticity of variance?
A butterfly (fan) pattern or two large clusters in the residual plot violates the assumption: the variances are totally different.
Keith's rule of thumb: if the ratio of the highest to the lowest variance is < 10, it is not a problem (see the sketch below).
But there are other tests.
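A hedged sketch of Keith's ratio rule of thumb; splitting the cases into thirds along the predicted scores is my assumption about how to form the "high" and "low" variance groups:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 300)
y = 5 + 2 * x + rng.normal(0, 1 + 0.8 * x, 300)   # error spread grows with x (heteroscedastic)

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Split the cases into thirds by predicted score and compare their residual variances
thirds = np.array_split(np.argsort(fit.fittedvalues), 3)
variances = [fit.resid[idx].var(ddof=1) for idx in thirds]
ratio = max(variances) / min(variances)
print("high/low variance ratio:", round(ratio, 1), "-> worry if it reaches 10 (rule of thumb)")
```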
If the residuals are normally distributed…
…the scatter (e.g., in a normal P-P or Q-Q plot) is close to the "ideal" line.
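A minimal sketch of checking this with a Q-Q plot of the residuals (SPSS's normal P-P plot serves the same purpose); the data are simulated:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(0, 3, 200)
fit = sm.OLS(y, sm.add_constant(x)).fit()

# Points close to the reference line indicate approximately normal residuals
sm.qqplot(fit.resid, line="s")
plt.show()
```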
Most common errors in data
- a problem with coding
- different samples compared accidentally
- some subgroups need to be eliminated
Distance, leverage and influence
For some people the model will over-predict or under-predict; you should look at the size of these residuals.
The outlier with the largest residual is identified by its distance from the actual value.
Don't remove a case just because of its error component (that is unethical), but if you did remove it, the results would change. Such a person could, for example, have a learning disability (a variable not controlled for).
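A hedged sketch of flagging cases by distance using externally studentized residuals; the |value| > 2 cut-off is a common convention, not necessarily the lecture's:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 2 + 1.5 * x + rng.normal(0, 1, 100)
y[0] += 8                                         # one badly under-predicted case

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()
student = influence.resid_studentized_external    # each case's distance from its predicted value, in SE units

flagged = np.where(np.abs(student) > 2)[0]
print("cases with unusually large residuals:", flagged)
```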
Leverage
Refers to a ‘suspicious’ pattern of values in IVs
Diagnostic techniques:
- Graph the IVs against each other
- SPSS provides leverage statistics, so you can examine them
- A rule of thumb is based on the statistic (k+1)/N, where k = number of IVs and N = sample size; see Keith for more details (a sketch follows)
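A minimal sketch of examining leverage (hat) values outside SPSS; the cut-off of 2(k+1)/N is one common variant of the (k+1)/N-based rule of thumb, so check Keith for the exact version:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
N, k = 100, 2
X = rng.normal(size=(N, k))
X[0] = [5, -5]                                    # a 'suspicious' combination of IV values
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(0, 1, N)

fit = sm.OLS(y, sm.add_constant(X)).fit()
leverage = fit.get_influence().hat_matrix_diag    # analogous to the leverage statistics SPSS saves

cutoff = 2 * (k + 1) / N                          # assumed variant of the (k+1)/N rule of thumb
print("high-leverage cases:", np.where(leverage > cutoff)[0])
```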