Weeks 4 & 5 - Regression Flashcards
Which variable is also known as the predictor variable
independent variable (X)
Which variable is also known as the outcome variable
dependent variable (Y)
What kind of relationship is there between variables in a regression analysis
An asymmetrical relationship - scores on one variable (IV) predict scores on the other (DV)
Formula for a straight line function
y = mx + b, where y is the DV, x is the IV, m is the constant slope of the straight line, and b is a constant for the y intercept (the value of y when x = 0)
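A minimal sketch of the straight-line function (the slope and intercept values below are made up for illustration):

```python
def straight_line(x, m, b):
    """Straight-line function: y = m*x + b."""
    return m * x + b

# With hypothetical slope m = 2 and intercept b = 1:
print(straight_line(0, 2, 1))  # 1 — at x = 0, y equals the intercept b
print(straight_line(3, 2, 1))  # 7
```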
Why use a straight line?
Because a straight line gives the simplest summary of a linear relationship between the variables; the points in a scatter plot will not all fall exactly on the line unless there is a perfect correlation between the variables.
Does it matter which variable is on the X and Y axes?
Yes, because one variable is predicting the other, the predictor variable (IV) should be on the X axis and the outcome variable (DV) should be on the Y axis.
What does the regression line mean?
It measures the summary characteristics of the relationship between two variables.
What method is used to find the regression line of best fit?
The least squares regression line ensures the smallest deviation between the observed scores and the scores predicted by the regression line.
What is minimised in the least squares regression line?
The sum of squared residuals (must be squared because the sum of residuals is equal to zero). The line of best fit has the smallest SSres.
What is the method of least squares?
The method of obtaining the line of best fit in a regression model that has the smallest possible SSres.
What is the least squares estimator?
The estimator used to obtain the line of best fit in a regression model. This estimator finds the estimated value for the slope and y intercept constant that minimises the SSres for the set of observed scores on the X and Y axis
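A minimal sketch of the least squares estimator using the standard closed-form solution (the data below are hypothetical):

```python
def least_squares(x, y):
    """Return (a, b): the intercept and slope that minimise SSres."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = sum of cross-products / sum of squared deviations in x
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

a, b = least_squares([1, 2, 3], [2, 4, 6])  # perfectly linear toy data
print(a, b)  # 0.0 2.0
```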
What is the least squares linear regression function?
The line of best fit produced by the method of least squares
The full regression equation (full simple regression equation)
Y = a + bX + e (where a = constant intercept parameter; b = slope/regression coefficient; X = score on IV; e = residual score)
Regression model equation (simple regression model equation)
Y_hat = a + bX (excludes the residual score and places the predicted Y score (Y_hat) on the left-hand side of the equation)
What is the regression coefficient?
The slope parameter b in the regression model & full regression equation
What is a negative residual?
A residual score obtained when the predicted score is greater than the observed score
What is a positive residual?
A residual score where the observed score is greater than the predicted score
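Both kinds of residual can be seen in a small sketch; the data are made up, and the fitted line Y_hat = 1 + 0.6X is the least-squares fit for them. Note the residuals sum to (approximately) zero, which is why SSres squares them:

```python
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]                              # observed scores (hypothetical)
a, b = 1.0, 0.6                               # least-squares fit for these data
predicted = [a + b * xi for xi in x]          # approx [1.6, 2.2, 2.8, 3.4]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
# positive residual: observed > predicted; negative: predicted > observed
print(residuals)                              # approx [0.4, -1.2, 1.2, -0.4]
print(round(sum(residuals), 10))              # 0.0 — residuals sum to zero
```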
What is SSTotal?
The total variation in observed scores on the dependent variable (Y).
Measures the sum of squared differences between the observed Y scores and the mean (average).
What is SSReg?
Variation in the predicted scores in Y (DV).
Sum of squared deviations between the predicted scores and the mean.
Represents the variation in predicted scores accounted for by the model. Bigger SSReg means that the regression model is a good predictor.
What is SSres?
Variation in the difference between observed and predicted scores.
Represents the sum of squared deviations between the observed and predicted data (the residuals). A large SSres means the regression model is not a good predictor. An SSres of zero means a perfect correlation: observed and predicted scores fall exactly along a straight line (but not v. likely to happen!)
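The three sums of squares can be computed and checked against each other; with hypothetical data and its least-squares fit (Y_hat = 1 + 0.6X), SSTotal = SSReg + SSres:

```python
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]                          # observed scores (hypothetical)
mean_y = sum(y) / len(y)                  # 2.5
predicted = [1.0 + 0.6 * xi for xi in x]  # least-squares fit for these data

ss_total = sum((yi - mean_y) ** 2 for yi in y)                # 5.0
ss_reg = sum((pi - mean_y) ** 2 for pi in predicted)          # approx 1.8
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # approx 3.2
print(ss_total, ss_reg, ss_res)
print(abs(ss_total - (ss_reg + ss_res)) < 1e-9)               # True
```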
What is R squared (R2)?
The proportion of total variation in Y accounted for by the model.
Measures the overall strength of prediction.
An R squared value can range between zero and +1 (can’t be negative because it’s a squared value).
An R squared value of zero means that the IV does not predict the DV (they are independent of each other)
An R squared value of 1 means that 100% of the variability in Y can be predicted by X (also v. unlikely), the larger the R squared value the greater the strength of prediction in the regression model.
How is R squared calculated?
R squared is calculated by dividing the SSReg by the SSTotal.
What are the alternative ways of calculating R squared?
By subtracting the SSres from the SSTotal (which produces the SSReg) and dividing this by SSTotal
By dividing the SSReg by SSReg + SSres (which equals the SSTotal)
These methods all produce the same measure of strength of prediction, but use the three measures of variability found in the regression model differently to obtain the same results.
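All three routes give the same R squared; a quick check using hypothetical sums of squares (SSTotal = 5.0, SSReg = 1.8, SSres = 3.2):

```python
ss_total, ss_reg, ss_res = 5.0, 1.8, 3.2    # hypothetical values

r2_a = ss_reg / ss_total                    # SSReg / SSTotal
r2_b = (ss_total - ss_res) / ss_total       # (SSTotal - SSres) / SSTotal
r2_c = ss_reg / (ss_reg + ss_res)           # SSReg / (SSReg + SSres)
print(r2_a, r2_b, r2_c)                     # all approx 0.36
```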
Provide a good way of showing the relationship between these measures of variability.
SSTotal = SSReg + SSres (the total variation splits into the part accounted for by the regression model and the residual part)
What is R?
The multiple correlation coefficient (Multiple R) = square root of R squared.
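Multiple R as the square root of R squared, with a hypothetical R squared of 0.36:

```python
import math

r_squared = 0.36                   # hypothetical proportion of variance
multiple_r = math.sqrt(r_squared)
print(multiple_r)                  # approx 0.6
```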