Lecture 3: Regression Intro Flashcards
The Regression Line (Definition, Calculation, and Interpretation), Explaining Variance Using Regression, and Significance Testing
What is the Centroid?
The point (X̅, Y̅) defined by the means of the two variables
• represents the center of the cloud of points
A regression line must …
• pass through the centroid
• come as close as possible to all points (therefore, the sum of the squared residuals will be as small as possible).
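A minimal Python sketch of the centroid and the first requirement, using a hypothetical set of five paired scores: it computes (X̅, Y̅), fits the least-squares line, and checks that the line's prediction at X̅ equals Y̅.

```python
x = [1, 2, 3, 4, 5]  # hypothetical X scores
y = [2, 4, 5, 4, 5]  # hypothetical Y scores
n = len(x)

# Centroid: the point (mean of X, mean of Y)
mean_x = sum(x) / n
mean_y = sum(y) / n
centroid = (mean_x, mean_y)

# Least-squares slope (SCP / SSx) and intercept
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# The fitted line passes through the centroid: Y' at X = mean_x equals mean_y
print(centroid, a + b * mean_x)
```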
What does the slope (b) of a bivariate regression line tell us?
the amount Y is expected to change when X increases by 1 unit.
What does the intercept (a) of a bivariate regression line tell us?
the predicted value for Y when X = 0.
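The two interpretations can be checked directly on a fitted line. This is a sketch with a hypothetical intercept and slope (a = 2.2, b = 0.6) and a hypothetical helper `predict`; any values would show the same pattern.

```python
a, b = 2.2, 0.6  # hypothetical intercept and slope


def predict(x):
    """Predicted Y' for a given X on the line Y' = a + bX."""
    return a + b * x


print(predict(0))               # equals a: the predicted Y when X = 0
print(predict(4) - predict(3))  # approximately b: change in Y' for a 1-unit increase in X
```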
What does the Method of Least Squares guarantee?
the line will come as close as possible to all the data points, in the sense that the sum of the squared residuals (∑e²) is as small as possible.
What is the calculation process for the Method of Least Squares?
1st - calculate the slope: b = SCP / SSx, where SCP = ∑(X − X̅)(Y − Y̅) is the sum of cross-products and SSx = ∑(X − X̅)²
2nd - calculate the intercept: a = Y̅ − bX̅
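The two steps can be sketched in Python as follows. The function name `least_squares_fit` and the example scores are hypothetical, chosen only for illustration.

```python
def least_squares_fit(x, y):
    """Return (a, b) for the line Y' = a + bX using the two-step method."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 1: slope b = SCP / SSx
    scp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = scp / ss_x
    # Step 2: intercept a = mean(Y) - b * mean(X)
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)
```

For these scores SCP = 6 and SSx = 10, so b = 0.6 and a = 4 − 0.6 × 3 ≈ 2.2.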
Why doesn’t the calculation of the slope in Method of Least Squares use SSy in the denominator?
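because it uses scores on X to predict scores on Y: the slope is the change in Y per unit of X, so the cross-products are scaled by the variability of the predictor (SSx), not of Y.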
∑e = ? (hint: Method of Least Squares)
∑e = 0 (always!)
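A quick numerical check, sketched with hypothetical scores: fit the least-squares line, compute the residuals e = Y − Y′, and sum them. Floating-point arithmetic means the sum is zero only up to a tiny rounding error.

```python
x = [1, 2, 3, 4, 5]  # hypothetical X scores
y = [2, 4, 5, 4, 5]  # hypothetical Y scores
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope (SCP / SSx) and intercept
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# Residuals e = Y - Y' (observed minus predicted)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(sum(residuals))  # zero, apart from floating-point rounding
```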
Because of the Method of Least Squares, ∑e² is always …
the smallest possible value (no other straight line gives a smaller sum of squared residuals for the same data)
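A sketch of what "smallest possible" means in practice, using hypothetical scores and a hypothetical helper `sse`: nudging the least-squares intercept or slope in any direction makes ∑e² larger.

```python
def sse(a, b, x, y):
    """Sum of squared residuals for the line Y' = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]  # hypothetical X scores
y = [2, 4, 5, 4, 5]  # hypothetical Y scores
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x
best = sse(a, b, x, y)

# Any other line (intercept and/or slope changed) has a larger sum of squared residuals
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.2, -0.1)]:
    assert sse(a + da, b + db, x, y) > best
print(best)  # about 2.4 for this data
```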
Standard Error (of the estimate)
SE = √(∑e² / N)
• roughly the average amount by which observed Y scores differ from predicted Y values (computed like the variance and SD, but from the residuals e)
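A minimal sketch of the SE calculation with hypothetical scores, dividing by N as on the card. Many textbooks divide by N − 2 instead when estimating the population value; this sketch follows the card.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical X scores
y = [2, 4, 5, 4, 5]  # hypothetical Y scores
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# Residuals e = Y - Y'
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# SE = sqrt(sum(e^2) / N): like an SD, but computed from residuals
se = math.sqrt(sum(e ** 2 for e in residuals) / n)
print(se)  # sqrt(2.4 / 5), about 0.69
```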
What are 3 ways to think about variance?
• the average squared deviation of the subjects' scores from the mean of their scores.
• a measure (in squared units) of how much the subjects differ amongst themselves.
• for a variable X, the expectation of its square minus the square of its expectation: Var(X) = E[X²] − (E[X])².
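The three descriptions give the same number. This sketch uses hypothetical scores and a hypothetical function `variance_three_ways`, and computes the "divide by N" variance. For the second description it uses one concrete reading of our own choosing: half the average squared difference over all ordered pairs of subjects, which is a standard identity for the variance.

```python
def variance_three_ways(scores):
    """Return the (divide-by-N) variance computed three equivalent ways."""
    n = len(scores)
    mean = sum(scores) / n
    # 1. Average squared deviation from the mean
    v1 = sum((s - mean) ** 2 for s in scores) / n
    # 2. How much subjects differ amongst themselves:
    #    half the average squared difference over all ordered pairs
    v2 = sum((si - sj) ** 2 for si in scores for sj in scores) / (2 * n * n)
    # 3. Expectation of the square minus the square of the expectation
    v3 = sum(s ** 2 for s in scores) / n - mean ** 2
    return v1, v2, v3


print(variance_three_ways([2, 4, 5, 4, 5]))  # all three are about 1.2
```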
How does sample size (N) affect the estimation of population variance?
A larger N improves accuracy (for example, dividing by N − 1 instead of N matters a lot for small samples, 4 vs. 5, but hardly at all for large ones, 99 vs. 100)
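A sketch of the parenthetical, assuming it refers to dividing the sum of squares by N − 1 rather than N. The function `variance_estimates` and the scores are hypothetical.

```python
def variance_estimates(scores):
    """Return (SS / N, SS / (N - 1)) for a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((s - mean) ** 2 for s in scores)
    return ss / n, ss / (n - 1)


print(variance_estimates([2, 4, 5, 4, 5]))  # about (1.2, 1.5): a 25% difference

# Ratio of the two estimates is N / (N - 1): large for small N, negligible for large N
for n in (5, 100):
    print(n, n / (n - 1))  # 1.25 for N = 5, about 1.0101 for N = 100
```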
Is R² affected by sample size?
No, not directly: the formula contains no N or df term.
R² = SS(regression) / SS(total)
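A minimal sketch of R² for a bivariate least-squares fit, using hypothetical scores and a hypothetical function `r_squared`. Duplicating every case doubles N but leaves R² unchanged, which illustrates that N does not appear in the formula.

```python
def r_squared(x, y):
    """R² = SS(regression) / SS(total) for a bivariate least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    predicted = [a + b * xi for xi in x]
    # SS(regression): spread of predicted scores around the mean of Y
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    # SS(total): spread of observed scores around the mean of Y
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return ss_regression / ss_total


x = [1, 2, 3, 4, 5]  # hypothetical X scores
y = [2, 4, 5, 4, 5]  # hypothetical Y scores
print(r_squared(x, y))          # about 0.6
print(r_squared(x + x, y + y))  # same value with N doubled
```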
What is the Mean Square (MS)?
A variance estimate
MS = SS/df
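A sketch of MS = SS/df for a bivariate regression with an intercept, using hypothetical scores and a hypothetical function `mean_squares`. The df values assume one predictor: 1 for regression, N − 2 for the residuals (a and b are estimated), and N − 1 for the total, whose MS is the usual sample variance of Y.

```python
def mean_squares(x, y):
    """Return (MS_regression, MS_residual, MS_total) for a bivariate regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    predicted = [a + b * xi for xi in x]
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # MS = SS / df
    ms_regression = ss_regression / 1    # df = 1 (one predictor)
    ms_residual = ss_residual / (n - 2)  # df = N - 2 (a and b estimated)
    ms_total = ss_total / (n - 1)        # df = N - 1 (sample variance of Y)
    return ms_regression, ms_residual, ms_total


print(mean_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # roughly (3.6, 0.8, 1.5)
```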
How do we interpret R²?
the proportion of variation in Y explained by variation in X.