Linear and Multiple Regression Flashcards
What are regressions used for?
To predict an outcome from a predictor variable.
What is the equation of a straight line?
y = mx + c (in regression notation: Y = b0 + b1X)
What does b0 represent?
The intercept
What does b1 represent?
The slope
What does the regression line aim to do?
To find the line of best fit: the line that minimises the sum of squared residuals (the differences between observed and predicted values).
What does the R^2 value tell us?
How good the model/regression is: the proportion of variance in the outcome explained by the model.
What does the F ratio tell us about a linear regression model?
Whether the model is significant, i.e. whether the regression predicts a significant amount of the variation.
How do you find the difference between the observed and predicted Y scores? (SS residual)
Take each predicted Y score (Y′, from the regression equation) minus the actual Y score, square the differences so they don't cancel each other out, then sum them.
∑(Y′ − Y)^2
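As a sketch in Python (the data here are hypothetical; NumPy's `polyfit` gives the least-squares line):

```python
import numpy as np

# Hypothetical data: five X scores and their observed Y scores.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit the least-squares line Y' = b0 + b1*X.
b1, b0 = np.polyfit(x, y, 1)
y_pred = b0 + b1 * x

# SS residual: sum of squared differences between predicted and observed Y.
ss_residual = np.sum((y_pred - y) ** 2)  # here: 2.4
```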
How do you find the total variance of Y scores in the data set? (SS Total)
Each Y score minus the mean of all Y scores, squared, then summed.
∑(Y − M)^2
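In Python (same idea, with hypothetical Y scores):

```python
import numpy as np

# Hypothetical Y scores.
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# SS total: each Y minus the mean M, squared, then summed.
ss_total = np.sum((y - y.mean()) ** 2)  # here: 6.0
```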
What does the SS Residual tell us?
An estimate of the amount of variation that is NOT predicted by our regression
How do you work out the SS regression?
SS Total - SS Residual
What does the SS Regression tell us?
An estimate of the amount of variance explained by the regression model.
What affects the estimate of the SS regression?
Sample size and the amount of total variation in the sample.
How do you work out the R^2 value using SS values?
SS regression divided by the SS total
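A worked sketch with hypothetical data, putting the three SS values together:

```python
import numpy as np

# Hypothetical data for a single predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_pred = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)    # total variation in Y
ss_residual = np.sum((y - y_pred) ** 2)   # variation NOT predicted
ss_regression = ss_total - ss_residual    # variation explained

r_squared = ss_regression / ss_total  # here: 0.6
```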
What is another representation of SS regression?
SS_M (the model sum of squares).
What is the F Ratio?
The ratio between two variances (predicted and error)
How to work out the F ratio?
Divide SS regression by its degrees of freedom (giving MS regression) and SS residual by its degrees of freedom (giving MS residual); then F = MS regression ÷ MS residual.
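Continuing the hypothetical example above (for one predictor, df for the regression is k = 1 and df for the residuals is n − k − 1):

```python
import numpy as np

# Hypothetical simple regression: n = 5 cases, k = 1 predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, 1)
y_pred = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)
ss_residual = np.sum((y - y_pred) ** 2)
ss_regression = ss_total - ss_residual

ms_regression = ss_regression / k         # df(regression) = k
ms_residual = ss_residual / (n - k - 1)   # df(residual) = n - k - 1
f_ratio = ms_regression / ms_residual     # here: 4.5
```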
What does the p-value mean?
The probability of obtaining a result at least this extreme if the null hypothesis were true. If it is below the significance level (e.g. .05), the result is significant and we reject the null hypothesis.
What assumptions have to be met for a simple regression? (6)
- Outcome must be continuous
- Predictors must not have zero variance
- All values of the outcome should come from a different person/item
- The relationship we model in reality must be linear
- Homoscedasticity
- Residuals must be normally distributed
What are the 2 types of multiple regression?
- Forced Entry Regression
- Hierarchical Regression
What is the adjusted R^2?
R^2 overestimates the true R^2 in the population, so it is adjusted downwards to correct for this overestimation.
What are the unstandardised (b) and standardised (beta) coefficients used for?
Unstandardised - used within any equation.
Standardised - allows us to make comparisons across the predictors.
What is the difference between Forced Entry and Hierarchical regression?
In forced entry, all the predictors are entered into the analysis at once. In hierarchical regression, predictors are entered in blocks, so known variables can be controlled for before new predictors are added.
How do we compare models in hierarchical regression?
We check to see if more variance is explained in the second model compared to the first.
If the beta value is positive for a dummy variable, what does this mean?
The category coded as ‘1’ is higher (scores higher) than the category coded as ‘0’.
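This can be seen with a hypothetical dummy-coded predictor: with a single dummy variable, the slope equals the difference between the two group means.

```python
import numpy as np

# Hypothetical dummy-coded predictor: 0 = group A, 1 = group B.
group = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
score = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# For a single dummy predictor, the slope b1 equals
# mean(score | group = 1) - mean(score | group = 0).
b1, b0 = np.polyfit(group, score, 1)  # here: b1 = 7 - 4 = 3 (positive)
```

A positive b1 here means the category coded 1 scores higher, matching the rule on the card.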
What is multicollinearity?
When predictors are highly correlated with each other.
What numbers cause alarm in the collinearity VIF column?
Anything close to 10. (Anything over 5 is quite problematic).
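A minimal sketch of where a large VIF comes from (hypothetical data; in the two-predictor case the VIF reduces to 1 / (1 − r^2), where r is the correlation between the predictors):

```python
import numpy as np

# Hypothetical predictors; x2 is almost an exact multiple of x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2 * x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])

# With exactly two predictors, VIF = 1 / (1 - r^2).
r = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r ** 2)  # far above 10 -> serious multicollinearity
```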
How much of the sample is allowed to be over 2SD away from the mean?
About 5%.
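The 5% figure follows from the normal distribution: about 95% of a normal sample falls within 2 SD of the mean. A quick check with simulated (hypothetical) data:

```python
import numpy as np

# Hypothetical roughly-normal sample; seed fixed for reproducibility.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=1000)

# Standardise, then find the proportion of cases beyond 2 SD.
z = (sample - sample.mean()) / sample.std()
prop_outside = np.mean(np.abs(z) > 2)
# For normal data this proportion sits near 0.05 (about 5%).
```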
What should the Cook’s distance be?
All below 1.