lecture 3 - simple linear regression Flashcards
correlation - how to you calculate pearsons
- Calculate covariance between the X and Y variables, and then standardize
- Convert the X and Y scores to z-scores (standard scores), then divide by n
what is r (correlation
r is the change in SD units of Y that occurs for every 1 SD change in X
correlation vs regression
Correlation: is there a relationship between 2 variables?
Regression: how well does one variable predict the other variable?
what is required when predicting regression
Prediction requires calculating a line of best fit (an equation)
You can then use this equation to obtain a best-fit estimate for any new data point (X) within the range of the original data set.
variables
dv
iv
-simple linear vs multiple regression
Dependent variable (DV) or
criterion variable or
response variable
= the variable that you are
trying to predict
Independent variable (IV) or predictor variable or regressor
= the variable that you are trying to predict from
Simple linear regression = 1 predictor variable
Multiple regression = 1+ predictor variables
what is the regression equation / straight line
Y = a + bX
a = intercept
b = slope
error in regression
In psychological research, we never get such perfect relationships (because determinants of any psychological variable are very complex and because of error in instruments).
A more likely scenario: Y = a + bX
+ error
Goal: find a regression line that provides the best prediction possible
i.e., a regression line that minimizes error
what is the best regression line in terms of sum of squared deviations
The best regression line is the line that minimizes the sum of squared deviations
(i.e., a line that satisfies the “least squares” criterion)
best fit line / regression line steps
Deviations (i.e., residuals) = predicted value minus observed value.
Step1: for each data point, calculate the deviation, then square it.
Step2: across the dataset, add up all deviations (→ sum of squared deviations).
Best fit: the equation that produced the smallest SSERROR
smaller least squares = ….
Smaller Least Squares = less prediction error
= relatively better fit.
poor fitting lines generate ______ predictions
Remember: Poor-fitting lines generate poor predictions.
how to calculate r
step 1 : convert X and Y into z-scores
step2: multiply z(X) by z(Y)
step 3 : add up
step 4 : divide by n -1
What are the values of a (the intercept) and b (the slope) that satisfy the Least Squares criterion
explained variance
How “good” are these predictions? How good is the fit of the regression model?
How much variance does X explain in Y
how do we see how good the predictions are / the fit of the regression model
Calculate how much variance there is in Y in total: Sum of squares Y (SSTOTAL)
Calculate how much variance X can explain: Sum of squares X (SSREGRESSION)
Calculate how much variance is not explained: Sum of squares Residual (SSERROR)
SSTOTAL = SSREGRESSION + SSERROR
SSY) (SSX) (SSRESIDUAL)
Goal:
SSX as high as possible + SSERROR as low as possible
how to calculate regression using explained variance
Step1: calculate the difference
(deviation) between each score
and the mean
Step2: square the deviations
Step3: add up
use equation
what is the easier way to adress how much varince x ecplains
Calculate R2 (the coefficient of determination):
F statistic
After you compute a regression equation and R2, you still have to determine whether the model
accounts for a statistically significant amount of variance.
→ SPSS output:
you get an F statistic, which tells you whether the predictions of the model are
better than if you were to use only the sample mean to predict performance.
F =
what the regression can explain (i.e., SSREGRESSION ) /////
what the regression cannot explain (i.e., SSERROR )
reporting linear regression from SPPS output
“The number of years of education [i.e., variable X] significantly predicted memory
scores [i.e., variable Y], b=3.98, accounting for 57% of the variance in memory scores,
F(1, 15)= 19.21, p<.05. This supports/confirms the hypothesis that…”
If there is no relationship:
“There was no relationship between [variable X] and [ variable Y], b=xxx, p>.05, R2 = xxx,
suggesting that…