Lecture 3: Regression Intro Flashcards
The Regression Line (Definition, Calculation, and Interpretation), Explaining Variance using regression, and significance testing
What is the Centroid?
Point defined by plotting the means of 2 variables
• represents the center of the cloud of points
A regression line must …
• pass through the centroid
• come as close as possible to all points (so the sum of the squared residuals is as small as possible).
What does the slope (b) of a bivariate regression line tell us?
the amount Y is expected to change when X changes by 1 unit.
What does the intercept (a) of a bivariate regression line tell us?
the predicted value for Y when X = 0.
What does the Method of Least Squares guarantee?
the line will come as close as possible to all the data points (the sum of the squared e values, ∑e², will be as small as possible).
What is the calculation process for the Method of Least Squares?
1st - calculate the slope (b = ∑cross-products (SCP) / SSx)
2nd - calculate intercept (a = Y̅ - bX̅)
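A minimal Python sketch of the two steps, using a small made-up data set (the x and y values and all variable names are illustrative assumptions, not from the lecture):

```python
# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n  # X-bar
mean_y = sum(y) / n  # Y-bar

# Sum of cross-products (SCP) and sum of squares of X (SSx)
scp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = scp / ss_x           # 1st: slope
a = mean_y - b * mean_x  # 2nd: intercept

# Predicted Y at X-bar; it equals Y-bar, so the line passes through the centroid
y_hat_at_centroid = a + b * mean_x

print(f"b = {b:.2f}, a = {a:.2f}")
```

For this toy data the line is Y' = 2.20 + 0.60X, and the predicted value at X̅ = 3 is Y̅ = 4.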
Why doesn’t the calculation of the slope in Method of Least Squares use SSy in the denominator?
because it is using scores on X to predict scores on Y
∑e = ? (hint: Method of Least Squares)
∑e = 0 (always!)
Because of the Method of Least Squares, ∑e² is always …
the smallest possible value
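Both residual properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper `sse`; it fits the least-squares line, sums the residuals, and then nudges the coefficients to see whether ∑e² gets worse:

```python
# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x


def sse(intercept, slope):
    """Sum of squared residuals for the line Y' = intercept + slope * X."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Residuals e = observed Y minus predicted Y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sum_e = sum(residuals)  # zero, up to floating-point rounding

best = sse(a, b)
# Sum of squared residuals for eight nearby lines (each coefficient moved by 0.1)
worse = [sse(a + da, b + db)
         for da in (-0.1, 0.0, 0.1)
         for db in (-0.1, 0.0, 0.1)
         if (da, db) != (0.0, 0.0)]

print(abs(sum_e) < 1e-9, all(w > best for w in worse))
```

Every nudged line has a larger ∑e² than the least-squares line, which is what "smallest possible value" means here.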
Standard Error
SE = (∑e² / N)^(1/2)
• the amount, on average, that predicted Y values differ from observed Y scores (calculated much like a variance and SD)
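A short sketch of the SE calculation on made-up data, with the squared residuals summed over all cases. As a labeled side note, many textbooks divide by N − 2 instead of N; that variant is shown only for comparison and is not the formula on this card.

```python
import math

# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Sum of squared residuals
sum_e2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

se_card = math.sqrt(sum_e2 / n)        # divisor N, as on the card
se_alt = math.sqrt(sum_e2 / (n - 2))   # divisor N - 2 (common textbook variant)

print(f"SE (N) = {se_card:.3f}, SE (N - 2) = {se_alt:.3f}")
```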
What are 3 ways to think about variance?
• the average squared deviation of the subjects’ scores from the mean of their scores.
• a measure (in squared units) of how much the subjects differ amongst themselves.
• for a variable X, the expectation of its square minus the square of its expectation: Var(X) = E[X²] − (E[X])².
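The three views give the same number. The sketch below uses made-up scores and the population form (dividing by N). Reading "how much the subjects differ amongst themselves" as half the average squared difference between all pairs of scores is an assumption about what the card means.

```python
# Made-up scores (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
n = len(x)
mean_x = sum(x) / n

# 1) Average squared deviation from the mean
var_dev = sum((xi - mean_x) ** 2 for xi in x) / n

# 2) Half the average squared difference over all ordered pairs of scores
var_pairs = sum((xi - xj) ** 2 for xi in x for xj in x) / (2 * n ** 2)

# 3) Expectation of the square minus the square of the expectation
var_exp = sum(xi ** 2 for xi in x) / n - mean_x ** 2

print(var_dev, var_pairs, var_exp)
```

All three come out to 2.0 for these scores.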
How does sample size (N) affect the estimation of population variance?
A larger N improves accuracy (proportionally, the difference between dividing by 4 and dividing by 5 is large compared to the difference between dividing by 99 and dividing by 100)
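One reading of the 4-versus-5 comparison, assuming it refers to dividing SS by N − 1 rather than by N: the ratio of the two estimates depends only on N. The helper name `divisor_ratio` is hypothetical.

```python
def divisor_ratio(n):
    """How many times larger SS / (N - 1) is than SS / N for the same SS."""
    return n / (n - 1)


small = divisor_ratio(5)    # dividing by 4 vs. by 5
large = divisor_ratio(100)  # dividing by 99 vs. by 100

print(f"N = 5:   {small:.3f}")
print(f"N = 100: {large:.3f}")
```

With N = 5 the two estimates differ by 25%; with N = 100 they differ by about 1%.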
Is R² affected by sample size?
No.
R² = SS(regression) / SS(total)
What is the Mean Square (MS)?
A variance estimate
MS = SS/df
How do we interpret R²?
the proportion of variation in Y explained by variation in X.
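A minimal sketch of R² = SS(regression) / SS(total) on made-up data; the variable names are illustrative assumptions.

```python
# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]  # predicted Y values

ss_total = sum((yi - mean_y) ** 2 for yi in y)                 # variation in Y
ss_regression = sum((yh - mean_y) ** 2 for yh in y_hat)        # explained by X
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained

r_squared = ss_regression / ss_total
print(f"R^2 = {r_squared:.2f}")
```

Here SS(regression) + SS(residual) = SS(total), and R² = 0.60, so 60% of the variation in Y is explained by variation in X in this toy example.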
How do we test the Significance of the proportion of Variation explained (R²)?
With an F test
F = [SS(regression)/ df1 ] / [SS(residual)/ df2]
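A sketch of the F ratio on made-up data. It assumes the bivariate case, where df1 = 1 (one predictor) and df2 = N − 2. The critical value is hardcoded from a standard F table (α = .05, df = 1 and 3) rather than computed.

```python
# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

ss_regression = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

df1 = 1      # one predictor (bivariate regression, assumed)
df2 = n - 2  # N minus the two estimated coefficients (a and b)

ms_regression = ss_regression / df1  # MS = SS / df
ms_residual = ss_residual / df2
f_ratio = ms_regression / ms_residual

f_critical = 10.13  # table value for alpha = .05, df = (1, 3)
print(f"F = {f_ratio:.2f}, significant: {f_ratio > f_critical}")
```

For this tiny sample F = 4.50, which falls short of the critical value, so R² would not be judged significant at the .05 level.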
How do we test the Significance of a regression coefficient (b)?
with a t Test
t = b / SE(b)
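A sketch of the t test for the slope on made-up data. The card does not say how SE(b) is computed; the version below assumes the usual bivariate formula SE(b) = (MS(residual) / SSx)^(1/2) with df = N − 2.

```python
import math

# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
a = mean_y - b * mean_x

ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ms_residual = ss_residual / (n - 2)

se_b = math.sqrt(ms_residual / ss_x)  # assumed formula for SE(b)
t_stat = b / se_b

print(f"SE(b) = {se_b:.4f}, t = {t_stat:.4f}")
```

With a single predictor, t² equals the F ratio for the same data (both about 4.5 here), so the two tests agree.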
Confidence Interval (CI)
CI = b ± [t(critical) × SE(b)]
• an estimated range of values with a given high probability of covering the true population value
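A sketch of a 95% CI for the slope on made-up data. Two things are assumptions here: SE(b) is computed as (MS(residual) / SSx)^(1/2), and t(critical) = 3.182 is hardcoded from a standard t table (two-tailed, α = .05, df = N − 2 = 3).

```python
import math

# Made-up example data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
a = mean_y - b * mean_x

ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt((ss_residual / (n - 2)) / ss_x)  # assumed formula for SE(b)

t_critical = 3.182  # table value: two-tailed, alpha = .05, df = 3

margin = t_critical * se_b
ci_lower = b - margin
ci_upper = b + margin

print(f"95% CI for b: [{ci_lower:.2f}, {ci_upper:.2f}]")
```

The interval runs from roughly −0.30 to 1.50. It contains 0, which matches a slope that is not significant at the .05 level for this toy sample.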