Regression Flashcards
What information does b tell us about the nature of the relationship between X & Y?
It tells us about the slope of the regression line
If given raw scores, which formula would we use to find b?;
What formula would we use if given r & standard deviations (Sx & Sy)?
b = SPxy / SSx; b = r (Sy / Sx)
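As a rough check that the two slope formulas agree, here is a minimal Python sketch. The scores in X and Y are invented for illustration and are not from the course material; all variable names are assumptions.

```python
import math

# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sum of products (SPxy) and sums of squares (SSx, SSy)
SP_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
SS_x = sum((x - mean_x) ** 2 for x in X)
SS_y = sum((y - mean_y) ** 2 for y in Y)

# Formula 1: slope from raw scores
b_raw = SP_xy / SS_x

# Formula 2: slope from r and the standard deviations (df = N - 1)
r = SP_xy / math.sqrt(SS_x * SS_y)
S_x = math.sqrt(SS_x / (n - 1))
S_y = math.sqrt(SS_y / (n - 1))
b_from_r = r * (S_y / S_x)

print(b_raw, b_from_r)
```

Both routes should give the same slope (about 0.6 for these made-up scores), apart from floating-point rounding.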
How do we find a?
Get the mean of Y & subtract b multiplied by the mean of X (a = Y bar - bX bar)
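A short sketch of the intercept calculation, reusing the same invented scores. The helper name predict is hypothetical.

```python
# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Slope: b = SPxy / SSx
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / sum(
    (x - mean_x) ** 2 for x in X
)

# Intercept: a = Y bar - b * X bar
a = mean_y - b * mean_x


def predict(x):
    """Return Y hat = bX + a for a given X."""
    return b * x + a


print(a, predict(mean_x))
```

Because a is built from the two means, the regression line always passes through the point (X bar, Y bar): predict(mean_x) returns the mean of Y.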
What does the regression equation become when it is standardised?
Z hat y = Beta (standardised regression coefficient) times Zx; or Z hat y = r xy times Zx
How do we find a z-score?;
If the mean of z-scores = 0 & SD = 1, what will the value of b become when standardising?;
What would a become?
Subtract the mean from X & divide by SD;
r (as the SDs cancel each other out: Sy / Sx = 1 / 1);
Zero (there is no intercept)
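The following sketch standardises some invented scores and then fits a regression to the z-scores, to illustrate that the slope comes out equal to r and the intercept equal to zero. The data and the helper z_scores are assumptions made for illustration.

```python
import statistics

# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def z_scores(values):
    """Subtract the mean from each score and divide by the SD (df = N - 1)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]


Zx = z_scores(X)
Zy = z_scores(Y)
n = len(X)

# r as the sum of z-score cross-products divided by N - 1
r = sum(zx * zy for zx, zy in zip(Zx, Zy)) / (n - 1)

# Regress Zy on Zx using the usual formulas (z-scores have mean 0)
mean_zx = statistics.mean(Zx)
mean_zy = statistics.mean(Zy)
SP = sum((zx - mean_zx) * (zy - mean_zy) for zx, zy in zip(Zx, Zy))
SS = sum((zx - mean_zx) ** 2 for zx in Zx)
beta = SP / SS
intercept = mean_zy - beta * mean_zx

print(r, beta, intercept)
```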
If a person is at the mean on X (Zx=0), where would we predict their Y to fall?;
If there’s no correlation between X & Y (r=0), where would we predict a person’s Y to fall?;
What if there’s a perfect correlation between X & Y (r=1)?
At the mean (Z hat y=0);
At the mean (Z hat y=0), regardless of score on X;
We’d predict a person is the same number of SDs from the mean on Y as they are on X (Z hat y = Zx)
If Y bar = 8, Sy = 2 & r = .50, what is the predicted Y score for someone who is 2 SDs above the mean on X?
Z hat y = .50 times 2 = 1; Y hat = Y bar + (Z hat y times Sy) = 8 + (1 times 2) = 10
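The arithmetic on this card can be written out step by step. This sketch assumes the 8 is the mean of Y and the SD of 2 belongs to Y, which is how the worked answer uses them.

```python
# Values from the card; reading 8 as the mean of Y is an assumption
mean_y = 8
s_y = 2
r = 0.50
z_x = 2  # the person is 2 SDs above the mean on X

# Step 1: predicted z-score on Y
z_hat_y = r * z_x

# Step 2: convert the predicted z-score back to raw Y units
y_hat = mean_y + z_hat_y * s_y

print(z_hat_y, y_hat)
```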
If X is not known or r = 0, what’s our best prediction of Y?;
If we use this as a prediction, what’s the average amount of error associated with this prediction?;
How do we find this?
Mean of Y;
Sy (the SD of y - maximum amount of error possible);
Sy = square root of (SSy / df), where df = N-1
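A small sketch of the Sy calculation on invented Y scores, cross-checked against the standard library's sample standard deviation.

```python
import math
import statistics

# Hypothetical Y scores, invented for illustration
Y = [2, 4, 5, 4, 5]

n = len(Y)
mean_y = sum(Y) / n

# SSy: squared deviations of each score from the mean of Y
SS_y = sum((y - mean_y) ** 2 for y in Y)

# Sy = square root of (SSy / df), with df = N - 1
S_y = math.sqrt(SS_y / (n - 1))

# statistics.stdev also uses N - 1, so the two should match
print(S_y, statistics.stdev(Y))
```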
If X is known & r doesn’t = 0, what’s our best prediction of Y?;
What’s the average amount of error associated with this called?;
How do we find this if given raw scores?;
What other method do we use?
Y hat;
Standard error of the estimate (Sy.x);
Sy.x = square root of (SS error / df), where SS error is the remaining variability after using X to predict Y & df = N-2;
Sy.x = Sy times square root of (1 - r squared) (slightly underestimates the amount of error)
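To compare the two methods for the standard error of the estimate, here is a sketch on invented scores. Variable names are assumptions.

```python
import math

# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
SP_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
SS_x = sum((x - mean_x) ** 2 for x in X)
SS_y = sum((y - mean_y) ** 2 for y in Y)

b = SP_xy / SS_x
a = mean_y - b * mean_x

# Method 1: from raw scores, using SS error and df = N - 2
SS_error = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
s_yx_raw = math.sqrt(SS_error / (n - 2))

# Method 2: from Sy and r
r = SP_xy / math.sqrt(SS_x * SS_y)
S_y = math.sqrt(SS_y / (n - 1))
s_yx_from_r = S_y * math.sqrt(1 - r ** 2)

print(s_yx_raw, s_yx_from_r)
```

The second value comes out smaller because it effectively divides SS error by N-1 rather than N-2. The gap shrinks as N grows.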
How do we partition variance in regression?;
How can SS predicted be interpreted?;
SS residual?
SS y (total variability in y) = SS y hat (predicted variability) + SS error (what we can’t account for);
Variability in the predicted scores, Y hat(i) = bX(i) + a;
Variability in the residuals, e(i)
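The partition can be checked numerically. This sketch uses invented scores and computes each sum of squares directly, so the equality is verified rather than assumed.

```python
# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

b = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / sum(
    (x - mean_x) ** 2 for x in X
)
a = mean_y - b * mean_x

Y_hat = [b * x + a for x in X]               # predicted scores
residuals = [y - yh for y, yh in zip(Y, Y_hat)]  # e(i)

SS_total = sum((y - mean_y) ** 2 for y in Y)           # SS y
SS_predicted = sum((yh - mean_y) ** 2 for yh in Y_hat)  # SS y hat
SS_error = sum(e ** 2 for e in residuals)               # SS error

print(SS_total, SS_predicted + SS_error)
```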
What is the standard error of the estimate?
Gives us an idea of the variability of the real scores around the regression line (aka standard deviation of errors of prediction or SD of the residuals)
Explain “regression towards the mean”;
If r = 1, then…;
If r < 1, then…;
The weaker the r, the more the mean becomes what?
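It’s a phenomenon of related (correlated) measurements: an extreme score on one measurement tends to be paired with a score closer to the mean on the other;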
Z hat y = Zx: no regression towards the mean;
|Z hat y| < |Zx|: regression towards the mean: we expect the 2nd measurement to be closer to the mean than the 1st;
The best predictor of Y
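A minimal sketch of how the predicted z-score shrinks towards the mean as r weakens, using Z hat y = r times Zx. The function name predicted_z and the chosen r values are illustrative assumptions.

```python
def predicted_z(r, z_x):
    """Standardised regression equation: Z hat y = r * Zx."""
    return r * z_x


z_x = 2.0  # a person 2 SDs above the mean on X

# As r drops from 1 towards 0, the prediction moves towards the mean (0)
for r in (1.0, 0.5, 0.0):
    print(r, predicted_z(r, z_x))
```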
What does not influence the magnitude of the correlation coefficient?;
What does influence it?
Measurement scales of the variables;
Sample size, restriction of range & extreme scores or outliers
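Two of these points can be illustrated with a small sketch: changing the measurement scale of X leaves r unchanged, while a single extreme score can change it a great deal. The data, the rescaling and the outlier are all invented for illustration, and pearson_r is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed as SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical raw scores
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

# Same X expressed on a different measurement scale
X_rescaled = [10 * x + 32 for x in X]

# Same data with one invented extreme score added
X_outlier = X + [10]
Y_outlier = Y + [1]

r_original = pearson_r(X, Y)
r_rescaled = pearson_r(X_rescaled, Y)
r_outlier = pearson_r(X_outlier, Y_outlier)

print(r_original, r_rescaled, r_outlier)
```

With these made-up numbers the single outlier is enough to turn a positive r into a negative one.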
If I know that the relationship between two variables is significantly different for men & women but I run a regression across genders anyway, why is my r value inaccurate?
Presence of heterogeneous subsamples
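An extreme, made-up illustration of heterogeneous subsamples: two groups with opposite relationships that cancel out when pooled. The group labels and scores are invented, and pearson_r is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed as SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical subsample A: perfect positive relationship
X_a = [1, 2, 3, 4, 5]
Y_a = [1, 2, 3, 4, 5]

# Hypothetical subsample B: perfect negative relationship
X_b = [1, 2, 3, 4, 5]
Y_b = [5, 4, 3, 2, 1]

r_a = pearson_r(X_a, Y_a)
r_b = pearson_r(X_b, Y_b)

# Pooling the two groups hides both relationships
r_pooled = pearson_r(X_a + X_b, Y_a + Y_b)

print(r_a, r_b, r_pooled)
```

The pooled r describes neither group, which is why a single regression across both is misleading here.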
How are residual scores represented?
e(i) = Y(i) - Y hat(i)
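A brief sketch of the residual formula on invented scores. For a least-squares line with an intercept, the residuals sum to zero (apart from rounding), which the code checks.

```python
# Hypothetical raw scores, invented for illustration
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

b = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / sum(
    (x - mean_x) ** 2 for x in X
)
a = mean_y - b * mean_x

# e(i) = Y(i) - Y hat(i), where Y hat(i) = bX(i) + a
residuals = [y - (b * x + a) for x, y in zip(X, Y)]

print(residuals, sum(residuals))
```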