Linear Regression Flashcards
Linear regression
analysis used to predict the value of a variable (the DV) based on the value of another variable (the IV)
Regression line/line of best fit
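can be calculated whenever the correlation (r) between the two variables can be calculated
when r = ±1, the regression line produces perfect predictions (unlikely in practice); when |r| is between 0 and 1, predictions contain some error, and less of it as |r| approaches 1
when r = 0, the regression line won’t help at all in predicting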
used to predict Yi given Xi
What does Pearson r tell us?
- how helpful the regression line will be in predicting Yi given Xi
- the extent to which differences in Y can be explained (mathematically) by differences in X
- how much of the variability in Y is accounted for by (the variability in) X; this is a statement about shared variability, not about causation or average values
r²
proportion of the variability of Y accounted for by X
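A minimal sketch of how r and r² can be computed from paired scores, using the deviation-score form of the Pearson formula. The function name pearson_r and the hours/scores data are hypothetical, made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
r_squared = r ** 2  # proportion of the variability of Y accounted for by X
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Squaring r discards its sign, so r² says how much variability is shared but not whether the relationship is positive or negative.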
What kind of relationships is Pearson r used to describe?
linear relationships, when X and Y are both measured on interval or ratio scales
Least-squares regression line
prediction line that minimizes the total error of prediction, according to the least-squares criterion of ∑(Y−Y’)²
for any linear relationship, there is only one line that will minimize ∑(Y−Y’)²
How do you calculate the least-squares regression line?
Y’ = bX + a
(in full: Y’ = byX + ay)
Y’ = predicted or estimated value of Y
by = slope of the line for minimizing errors in predicting Y; by = r(sy / sx), where sy and sx are the standard deviations of Y and X
ay = Y axis intercept for minimizing errors in predicting Y
ay = ȳ − (by)(x̄)
ȳ = ∑Y / N, and likewise x̄ = ∑X / N
*lowercase y is a subscript (as in by and ay); uppercase Y is the variable
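The calculation above can be sketched in a few lines. The slope is computed here as SPxy / SSx (sum of cross-products over sum of squares for X), a standard form assumed for this sketch that is algebraically the same as r(sy / sx). The function names and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b_y, a_y) for the prediction line Y' = b_y * X + a_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b_y = sp_xy / ss_x              # slope
    a_y = mean_y - b_y * mean_x     # intercept: a_y = mean of Y - b_y * mean of X
    return b_y, a_y

def sum_squared_error(xs, ys, b, a):
    """Least-squares criterion: sum of (Y - Y')^2 for the line Y' = b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

b_y, a_y = least_squares_line(hours, scores)
sse_best = sum_squared_error(hours, scores, b_y, a_y)
# Any other line, e.g. one with a slightly steeper slope, gives a larger total
sse_other = sum_squared_error(hours, scores, b_y + 0.5, a_y)
print(f"Y' = {b_y:.2f}X + {a_y:.2f}")
print(f"SSE (least-squares line) = {sse_best:.2f}, SSE (other line) = {sse_other:.2f}")
```

Comparing the two error totals illustrates the least-squares criterion: changing the slope or intercept away from the fitted values can only increase ∑(Y−Y’)².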
What are the limitations of linear regression?
only appropriate:
- for linear relationships
- when the sample is representative
- for predictions within the range of the original X values (the line should not be extended beyond the observed data)
What is the standard error of estimate for?
to tell us how much error we can expect when we use the regression line (prediction error)
value can be interpreted as an estimate of how many units the prediction will be off, on average
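A rough sketch of one common way to compute the standard error of estimate: the square root of ∑(Y−Y’)² divided by N − 2. That formula is not given on these cards and is assumed here as the usual textbook definition. The function name and the data are hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """sqrt( sum((Y - Y')^2) / (N - 2) ) for the least-squares line of Y on X."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x            # slope of the least-squares line
    a = mean_y - b * mean_x     # intercept
    sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

se = standard_error_of_estimate(hours, scores)
print(f"standard error of estimate = {se:.2f} score points")
```

The result is in the units of Y, which is why it can be read as a typical size of the prediction error.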
Homoscedasticity
homogeneity of variances
a condition in which the variance of the error is constant
i.e., the variability of Y around the regression line stays constant across X values
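An informal way to look for a violation is to compare the size of the prediction errors at low and high X values. The sketch below is only a rough check, not a formal test; the helper names and the data (built so that the scatter fans out as X increases) are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    return b, mean_y - b * mean_x

def mean_squared_residual(pairs, b, a):
    """Average of (Y - Y')^2 over a list of (X, Y) pairs."""
    return sum((y - (b * x + a)) ** 2 for x, y in pairs) / len(pairs)

# Hypothetical data in which the scatter around the line grows with X
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 12.0, 10.0, 17.0, 13.0]

b, a = fit_line(xs, ys)
pairs = sorted(zip(xs, ys))          # order the pairs by X
half = len(pairs) // 2
low_spread = mean_squared_residual(pairs[:half], b, a)    # lower half of X
high_spread = mean_squared_residual(pairs[half:], b, a)   # upper half of X
print(f"low-X spread = {low_spread:.2f}, high-X spread = {high_spread:.2f}")
```

If the two values are of similar size, the data are at least roughly consistent with homoscedasticity; a large gap, as in this made-up example, suggests the error variance changes with X.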
Multiple regression
analysis of the relationship between a single DV and multiple IVs
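A minimal sketch of the two-IV case, Y’ = b1X1 + b2X2 + a, solved from the least-squares normal equations using only the standard library. The function name, variable names, and data are hypothetical; real analyses would normally use a statistics package.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares b1, b2, a for Y' = b1*X1 + b2*X2 + a."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]   # deviations from the mean of X1
    d2 = [v - m2 for v in x2]   # deviations from the mean of X2
    dy = [v - my for v in y]    # deviations from the mean of Y
    s11 = sum(p * p for p in d1)
    s22 = sum(q * q for q in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    s1y = sum(p * r for p, r in zip(d1, dy))
    s2y = sum(q * r for q, r in zip(d2, dy))
    # Solve the 2x2 system  s11*b1 + s12*b2 = s1y,  s12*b1 + s22*b2 = s2y
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("the two IVs are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a

# Hypothetical data constructed as Y = 2*X1 + 3*X2 + 5 with no noise,
# so the fit should recover those coefficients up to rounding error
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [13, 12, 23, 22, 33]

b1, b2, a = fit_two_predictors(x1, x2, y)
print(f"Y' = {b1:.2f}*X1 + {b2:.2f}*X2 + {a:.2f}")
```

Each slope describes the predicted change in the DV for a one-unit change in that IV while the other IV is held constant.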