Lecture 9 Flashcards
What is the Regression equation used for?
Used to describe the linear relationship between X and Y
What is the one assumption of Regression?
- X and Y are linearly related.
The assumptions of Regression are NOT the same as those of ANOVA.
Give the equation for Y’ (Y prime, i.e. the predicted Y)
Y’ = a + bx
Which of X or Y is the dependent variable? Which is the independent variable?
Y is always the dependent variable.
X is always the independent variable.
What if it is unclear which of the two variables is the dependent one?
The score that you are trying to predict goes on the Y axis.
Explain the “least squares line”
The method for determining a and b makes Σe² (the sum of squared deviations of Y from Y') as small as possible. The resulting line is therefore often called the “least squares line”.
Give the equation for the regression line, and explain each component part.
Y' = a + bX, where:
- Y' = Y predicted
- a = intercept (i.e. the value of Y when X = zero); a = Ybar - bXbar
- b = regression coefficient, i.e. the slope of the least squares line:
  b = σxy/σ²x = covariance of X and Y / variance of X
    = [Σ(X - Xbar)(Y - Ybar)/(n - 1)] / [Σ(X - Xbar)²/(n - 1)]
  The (n - 1) terms cancel each other out, so we end up with the slope of the line.
Give more detail for b (the slope).
The numerator tells us how much X and Y covary, i.e. the degree of statistical association.
The denominator adjusts the covariance (i.e. the rate of change information) so that it corresponds to a unit increase in X.
i.e. the slope indicates how many units Y increases for every one unit increase in X.
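The calculation of a and b can be sketched in Python. The data values below are made up for illustration; only the formulas (b = covariance of X and Y / variance of X, a = Ybar - bXbar) come from the lecture.

```python
# Least-squares slope (b) and intercept (a) from raw scores.
# The data values below are made up for illustration.

def regression_coefficients(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: how much X and Y covary; denominator: variability of X.
    # The 1/(n-1) factor in each would cancel, so both are omitted.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: units Y changes per one-unit increase in X
    a = y_bar - b * x_bar    # intercept: a = Ybar - b*Xbar
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_coefficients(xs, ys)
print(a, b)          # about 2.2 (intercept) and 0.6 (slope) for this data
y_pred = a + b * 4   # Y' = a + bX, the predicted score at X = 4
```

Note the slope here means: for every one-unit increase in X, Y' increases by 0.6 units.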
How are ANOVA and Regression similar?
They are mathematically identical. Compare and contrast in notes.
How do we calculate SS for a regression?
Conceptually, as in ANOVA, we sum the squared deviations.
- In practice we use computational formulas that don’t actually involve calculating the deviations. Calculating SS in regression is easier to track if we exploit our knowledge of the Pearson r.
How do we find SSreg?
r²xy = variability in Y attributable to X / total variability in Y
r²xy = Σ(Y' - Ybar)² / Σ(Y - Ybar)²
r²xy = SSreg/SSy
with algebra, we get:
SSreg = (r²xy)(SSY)
Explain SSresidual i.e. SSres
If r²xy = variability in y attributable to x, then 1 - r²xy must = variability in y NOT attributable to x. i.e. residual variability
Therefore SSres = (1 - r²xy)(SSy)
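A minimal sketch of these two identities; the r and SSy values below are made up, not from the lecture.

```python
# SSreg and SSres from r^2 and SSy; values below are illustrative.
r_xy = 0.5      # Pearson correlation between X and Y (made up)
ss_y = 100.0    # SSy, total variability in Y (made up)

ss_reg = (r_xy ** 2) * ss_y        # SSreg = (r^2)(SSy)
ss_res = (1 - r_xy ** 2) * ss_y    # SSres = (1 - r^2)(SSy)
print(ss_reg, ss_res)              # 25.0 and 75.0; they sum back to SSy
```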
This is the computational table.
Explain the F table as it relates to Regression.
Once we calculate the degrees of freedom, we can use the F ratio to determine if the statistical association is due to chance.
dfreg = 1
dfres = n - 2
Describe the equation for the F ratio.
F = (SSreg/dfreg)/(SSres/dfres) = [(r²xy)(SSy)/dfreg] / [(1 - r²xy)(SSy)/dfres] –> the SSy in numerator and denominator cancel each other out.
important:
F = (r²xy/dfreg) / [(1 - r²xy)/dfres]
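Since SSy cancels, the F ratio depends only on r² and the degrees of freedom. A quick sketch with made-up values of n and r:

```python
# F ratio for simple regression, computed from r^2 alone.
n = 20          # number of pairs of scores (made up)
r_xy = 0.5      # obtained correlation (made up)

df_reg = 1      # dfreg = 1 in simple regression
df_res = n - 2  # dfres = n - 2

F = (r_xy ** 2 / df_reg) / ((1 - r_xy ** 2) / df_res)
print(F)        # about 6.0 for these values
```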
Discuss “Test Significance of Slope”
- Could be that the slope (b) we estimate from our data is just due to chance.
- The significance test for slope is the F ratio, or the F test.
Give the hypotheses for testing the significance of the slope (b)
H0: population b = 0 (i.e. slope of regression line = 0)
H1: population b =/= 0 (i.e. slope of regression line =/= 0)
Describe testing the significance of r (correlation). Including hypotheses.
- Testing the significance of r (the correlation) is equivalent to testing the significance of b. Therefore do one or the other, not both.
- H0: population r = 0
- H1: population r =/= 0
Compare the obtained r (correlation) to rcrit.
- rcrit is a tabled value (Table B.6, yellow handout).
- to enter table, need to know degrees of freedom for r (dfr) = n - 2 (where n = pairs of scores, i.e. subjects)
- If robtained is greater than rcrit, reject H0
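The decision rule above can be sketched as follows. The n and robtained values are made up; the rcrit value would normally be read from Table B.6 (0.444 is a commonly tabled critical r for df = 18 at alpha = .05, two-tailed, but check the handout).

```python
# Decision rule for testing the significance of r.
n = 20              # pairs of scores, i.e. subjects (made up)
r_obtained = 0.52   # obtained correlation (made up)

df_r = n - 2        # degrees of freedom for r
r_crit = 0.444      # tabled value for df = 18, alpha = .05 (check Table B.6)

reject_h0 = abs(r_obtained) > r_crit
print(reject_h0)    # True here: r_obtained exceeds r_crit, so reject H0
```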
What is AdjR²? (Adjusted r²)
- Sample r or R fluctuates from sample to sample
- And we square r or R to get r² or R²
and R² can only be positive,
therefore all sampling fluctuations of R² are in a positive direction. Therefore, R² is biased: it tends to overestimate the population R².
So we make an adjustment.
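The flashcard doesn’t give the formula, but one common form of the adjustment (Wherry’s formula) is AdjR² = 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of predictors. A sketch with made-up values:

```python
# Adjusted R^2: shrinks R^2 to correct its positive bias.
# Wherry-style formula; n, k, and r2 below are made up.
n = 20       # number of subjects
k = 1        # number of predictors (simple regression)
r2 = 0.36    # sample R^2

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))   # smaller than r2, as the correction intends
```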