Weeks 4 & 5 - Regression Flashcards
Which variable is also known as the predictor variable
independent variable (X)
Which variable is also known as the outcome variable
dependent variable (Y)
What kind of relationship is there between variables in a regression analysis
An asymmetrical relationship - scores on one variable (IV) predict scores on the other (DV)
Formula for a straight line function
y = mx + b, where y is the DV, x is the IV, m is the constant slope of the straight line, b is a constant for the y intercept (the value of y when x =0)
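The straight-line function can be sketched in code. A minimal Python example, using an illustrative slope and intercept (the values 2 and 1 are invented for demonstration):

```python
def straight_line(x, m, b):
    """Return y for a given x, slope m, and y-intercept b."""
    return m * x + b

# With slope m = 2 and intercept b = 1:
# at x = 0 the line passes through the y-intercept (y = 1),
# and each one-unit increase in x raises y by the slope (2).
print(straight_line(0, 2, 1))  # 1
print(straight_line(3, 2, 1))  # 7
```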
Why use a straight line?
Because a straight line provides the simplest summary of a linear relationship between the variables. The observed points in a scatter plot will not all fall exactly on the line (unless there is a perfect correlation between the variables), so the line summarises the overall trend rather than passing through every point.
Does it matter which variable is on the X and Y axes?
Yes, because one variable is predicting the other, the predictor variable (IV) should be on the X axis and the outcome variable (DV) should be on the Y axis.
What does the regression line mean?
It summarises the characteristics of the relationship between the two variables.
What method is used to find the regression line of best fit?
The method of least squares: the least squares regression line has the smallest deviation between the observed and predicted scores (the observations and the points on the regression line).
What is minimised in the least squares regression line?
The sum of squared residuals (must be squared because the sum of residuals is equal to zero). The line of best fit has the smallest SSres.
What is the method of least squares?
The method of obtaining the line of best fit in a regression model that has the smallest possible SSres.
What is the least squares estimator?
The estimator used to obtain the line of best fit in a regression model. This estimator finds the estimated value for the slope and y intercept constant that minimises the SSres for the set of observed scores on the X and Y axis
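The least squares estimator can be sketched with the standard closed-form solutions for simple regression: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and a = ȳ − b·x̄. A minimal Python example (the data are toy scores invented for illustration):

```python
def least_squares(xs, ys):
    """Return (a, b): the intercept and slope that minimise SSres."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    a = y_mean - b * x_mean
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical IV scores
ys = [2, 4, 5, 4, 5]   # hypothetical DV scores
a, b = least_squares(xs, ys)
print(a, b)  # intercept 2.2, slope 0.6 for these toy data
```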
What is the least squares linear regression function?
The line of best fit produced by the method of least squares
The full regression equation (full simple regression equation)
Y = a + bX + e (where a = constant intercept parameter; b = slope/regression coefficient; X = score on IV; e = residual score)
Regression model equation (simple regression model equation)
Y_hat = a + bX (excludes the residual score from the right-hand side of the equation and uses the predicted Y score (Y_hat) on the left-hand side of the equation)
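The two equations can be sketched together in code: the regression model gives the predicted scores Y_hat = a + bX, and the residual e = Y − Y_hat completes the full equation. A minimal Python example (the data and fitted values are hypothetical, taken from a toy least squares fit):

```python
a, b = 2.2, 0.6                      # hypothetical fitted intercept and slope
xs = [1, 2, 3, 4, 5]                 # toy IV scores
ys = [2, 4, 5, 4, 5]                 # toy observed DV scores

y_hat = [a + b * x for x in xs]      # regression model: predicted scores
residuals = [y - yh for y, yh in zip(ys, y_hat)]  # e = Y - Y_hat

# The residuals sum to (essentially) zero, which is why SSres squares them.
print(abs(sum(residuals)) < 1e-9)  # True
```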
What is the regression coefficient?
The slope parameter b in the regression model & full regression equation
What is a negative residual?
A residual score obtained when the predicted score is greater than the observed score
What is a positive residual?
A residual score where the observed score is greater than the predicted score
What is SSTotal?
The total variation in observed scores on the dependent variable (Y).
Measures the sum of squared differences between the observed Y scores and the mean (average).
What is SSReg?
Variation in the predicted scores in Y (DV).
Sum of squared deviations between the predicted scores and the mean.
Represents the variation in predicted scores accounted for by the model. A bigger SSReg (relative to SSTotal) means that the regression model is a good predictor.
What is SSres?
Variation in the difference between observed and predicted scores.
Represents the sum of squared deviations between the observed and predicted data (the residuals). A large SSres (relative to SSTotal) means the regression model is not a good predictor. An SSres of zero means a perfect correlation, i.e. the observed and predicted scores fit perfectly along a straight line (but not v. likely to happen!)
What is R squared (R2)?
The proportion of total variation in Y accounted for by the model.
Measures the overall strength of prediction.
An R squared value can range between zero and +1 (can’t be negative because it’s a squared value).
An R squared value of zero means that the IV does not predict the DV (they are independent of each other)
An R squared value of 1 means that 100% of the variability in Y can be predicted by X (also v. unlikely); the larger the R squared value, the greater the strength of prediction in the regression model.
How is R squared calculated?
R squared is calculated by dividing the SSReg by the SSTotal.
What are the alternative ways of calculating R squared?
By subtracting the SSres from the SSTotal (which produces the SSReg) and dividing this by SSTotal
By dividing the SSReg by SSReg + SSres (which equals the SSTotal)
These methods all produce the same measure of strength of prediction, but use the three measures of variability found in the regression model differently to obtain the same results.
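The three calculations above can be sketched and checked in code. A minimal Python example (toy data and a hypothetical least squares fit, invented for illustration):

```python
xs = [1, 2, 3, 4, 5]                 # toy IV scores
ys = [2, 4, 5, 4, 5]                 # toy observed DV scores
a, b = 2.2, 0.6                      # least squares fit for these toy data

y_mean = sum(ys) / len(ys)
y_hat = [a + b * x for x in xs]

ss_total = sum((y - y_mean) ** 2 for y in ys)             # observed vs mean
ss_reg = sum((yh - y_mean) ** 2 for yh in y_hat)          # predicted vs mean
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # observed vs predicted

r2_a = ss_reg / ss_total                 # SSReg / SSTotal
r2_b = (ss_total - ss_res) / ss_total    # (SSTotal - SSres) / SSTotal
r2_c = ss_reg / (ss_reg + ss_res)        # SSReg / (SSReg + SSres)
print(r2_a, r2_b, r2_c)  # all three agree
```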
Provide a good way of showing the relationship between these measures of variability.
SSTotal = SSReg + SSres (the total variation in Y splits into the variation accounted for by the model plus the residual variation)
What is R?
The multiple correlation coefficient (Multiple R) = square root of R squared.
What does Multiple R measure in a regression analysis?
The extent to which higher predicted scores (Y hat) for the DV are associated with higher observed scores (Y) on the DV.
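This interpretation can be sketched in code: Multiple R equals the correlation between the observed Y scores and the predicted Y_hat scores. A minimal Python example (toy data and a hypothetical fit; the Pearson correlation helper is written out by hand):

```python
import math

xs = [1, 2, 3, 4, 5]                 # toy IV scores
ys = [2, 4, 5, 4, 5]                 # toy observed DV scores
a, b = 2.2, 0.6                      # least squares fit for these toy data
y_hat = [a + b * x for x in xs]      # predicted DV scores

def corr(u, v):
    """Pearson correlation between two lists of scores."""
    n = len(u)
    um, vm = sum(u) / n, sum(v) / n
    num = sum((x - um) * (y - vm) for x, y in zip(u, v))
    den = math.sqrt(sum((x - um) ** 2 for x in u)
                    * sum((y - vm) ** 2 for y in v))
    return num / den

r_squared = 0.6                      # SSReg / SSTotal for these toy data
print(corr(ys, y_hat))               # correlation of observed and predicted
print(math.sqrt(r_squared))          # the same value: Multiple R
```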
What are the df reg?
The number of independent variables
What are the df res?
df res = n - no. of IVs - 1
What are the df total?
df total = df reg + df res
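The degrees-of-freedom bookkeeping can be sketched in a few lines of Python (n = 5 observations and one IV are assumed here for illustration):

```python
n, n_ivs = 5, 1              # hypothetical sample size and number of IVs

df_reg = n_ivs               # df reg = number of independent variables
df_res = n - n_ivs - 1       # df res = n - no. of IVs - 1
df_total = df_reg + df_res   # df total, which works out to n - 1

print(df_reg, df_res, df_total)  # 1 3 4
```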
What theoretical probability distribution is equivalent to R squared as an estimator of the overall strength of prediction in the regression model at a population level?
The F distribution
What techniques are used to make inferences about the overall strength of prediction in a regression model at a population level?
Null Hypothesis significance testing of R squared
What is the population parameter that corresponds to R squared?
ρ² (rho squared)
What does the null hypothesis state for R squared at the population level?
H0: ρ² (rho squared) = 0
What does the alternative hypothesis state when making inferences from R squared to P squared?
Ha: ρ² (rho squared) ≠ 0 (or, since a squared value cannot be negative, ρ² > 0)
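The test of this null hypothesis uses the F statistic, F = (SSReg / df reg) / (SSres / df res). A minimal Python sketch (the sums of squares and df are hypothetical values from a toy fit; the p-value would come from the F distribution with (df reg, df res) degrees of freedom, e.g. via scipy.stats.f, and is not computed with the standard library here):

```python
ss_reg, ss_res = 3.6, 2.4    # hypothetical sums of squares from a toy fit
df_reg, df_res = 1, 3        # one IV, n = 5 observations

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res

f_stat = ms_reg / ms_res     # F statistic for testing H0: rho squared = 0
print(f_stat)  # 4.5
```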
What do sums of squares (SS) measure?
SS measures variation (they are squared deviation scores)