Simple Regression Flashcards
Explain what the regression line can be used for
- Prediction
- Estimating the magnitude of effects of the predictor on the outcome.
Define the regression line
A straight line drawn through a scatterplot of two variables that comes as close to the data points as possible
Line of best fit
Method of least squares
Method used to find the regression line
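As a sketch of how least squares finds the line, using a small made-up dataset (the x and y values are hypothetical, not from the deck):

```python
# Least-squares fit of a straight line: b = Sxy / Sxx, a = mean(y) - b * mean(x)
x = [1, 2, 3, 4, 5]          # hypothetical predictor scores
y = [2, 4, 5, 4, 5]          # hypothetical outcome scores

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sum of cross-products and sum of squared x-deviations
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

b = sxy / sxx                # slope (regression coefficient)
a = mean_y - b * mean_x      # intercept

print(round(a, 10), round(b, 10))  # -> 2.2 0.6
```

Any other line through these points would have a larger sum of squared residuals than this one.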
What is the intercept in regression analysis?
Point at which the regression line cuts through the Y-axis or a in the regression equation
E.g. with no practice at all on a test, the score would be 2.45
Slope
Another name for the regression coefficient or b in the regression equation
The number of units that the regression line moves on the Y-axis for each unit it moves along the x-axis.
What is the linear regression equation and what can it be used for?
y = a + b * x
The value of y is equal to a (intercept) plus b (slope) multiplied by the value of x for the given point
Use to predict how a case with a given score on x will score on Y
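The prediction step can be sketched as follows; the intercept and slope values are hypothetical:

```python
# Predict y from x using the regression equation y = a + b * x
a = 2.45   # hypothetical intercept (e.g. score with no practice)
b = 0.6    # hypothetical slope

def predict(x):
    """Predicted outcome for a case with a given score on x."""
    return a + b * x

print(round(predict(10), 10))  # -> 8.45
```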
What do you compare the line of best fit with when assessing the significance of the effects of the predictor on the outcome?
- A regression line that is flat
- A line based on the mean value of the outcome
- A line indicating that the value of Y is always the same regardless of changes in the value of X
What is the value of the regression coefficient when the regression line is flat?
0
A flat line (equivalent to the line based on the mean) implies that the two variables have no relationship.
Define what is meant by model sum of square (SSm)
The portion of total variance that the regression line accounts for
The difference between the total variance in Y scores (SSt) and the variance in Y scores not accounted for by the regression line (SSr).
Obtained by taking the difference between the mean of Y and each value of Y as predicted by the regression line, squaring each difference, and summing all the squared differences
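The steps above can be sketched with hypothetical data and hypothetical fitted predictions:

```python
# SSm: squared differences between predicted Y values and the mean of Y, summed
y = [2, 4, 5, 4, 5]                    # hypothetical observed outcome scores
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]      # hypothetical predictions from a fitted line

mean_y = sum(y) / len(y)               # mean of the observed Y scores
ss_m = sum((p - mean_y) ** 2 for p in y_hat)
print(round(ss_m, 10))  # -> 3.6
```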
When performing a simple regression, what does the F-value in the ANOVA table show?
The ratio between the portion of total variance accounted for by the regression line and the variance not accounted for by the regression line.
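A sketch of the F-ratio computation, assuming hypothetical sums of squares and n = 5 cases with one predictor:

```python
# F = (SSm / df_model) / (SSr / df_residual)
ss_m = 3.6        # variance explained by the line (hypothetical)
ss_r = 2.4        # variance not explained by the line (hypothetical)
n = 5             # number of cases
df_model = 1      # one predictor in simple regression
df_resid = n - 2  # n minus the two estimated parameters (a and b)

f_value = (ss_m / df_model) / (ss_r / df_resid)
print(round(f_value, 10))  # -> 4.5
```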
R-square
- Also known as coefficient of determination
- The proportion of variance in Y explained by X
- The variance explained by the regression line divided by the total variance in Y to be explained.
- Proportion of total variance in Y explained by the regression line/model (SSm), relative to how much variation there was to explain in the first place (SSt)
- Correlation coefficient squared
- SSM/SSt
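As a sketch, with hypothetical sums of squares:

```python
# R-square = SSm / SSt: proportion of total variance explained by the model
ss_m = 3.6   # hypothetical model sum of squares
ss_t = 6.0   # hypothetical total sum of squares

r_square = ss_m / ss_t
print(round(r_square, 10))  # -> 0.6
```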
Adjusted r square
An adjusted measure of R square that corrects for possible overestimation.
A reduced value of R squared that attempts to estimate what R squared would be in the population.
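One common adjustment (often attributed to Wherry) can be sketched as follows; treat the numbers as hypothetical:

```python
# Adjusted R-square = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r_square = 0.6   # hypothetical R-square from the sample
n = 5            # number of cases
k = 1            # number of predictors (one in simple regression)

adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
print(round(adj_r_square, 4))  # -> 0.4667
```

The adjusted value is always smaller than R square, and the gap shrinks as the sample grows.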
Total sum of squares (SSt)
The total variance in Y to be explained; based on the line drawn at the mean of the Y scores and its residuals.
Calculated by taking the difference between each actual value of Y and the mean of Y, squaring each difference, and summing the squared differences.
Explain the difference between SSR, SSM and SST
SSR (Sum of squared residuals) - Variance in Y that is not explained by the regression line. Represents the degree of inaccuracy when the best model is fitted to the data. Uses the differences between the observed data and the model.
SSM (Model sum of squares) - Variance in Y that is explained by the regression line. Uses the differences between the mean value of Y and the model.
SST (Total sum of squares) - Total variance in Y to be explained, i.e. the variance that the line based on the mean cannot account for. Uses the differences between the observed data and the mean value of Y
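The three quantities partition the total variance (SST = SSM + SSR when the line is the least-squares fit). A check with hypothetical data:

```python
# With observed data, model predictions, and the mean: SST = SSM + SSR
y = [2, 4, 5, 4, 5]                  # hypothetical observed Y scores
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]    # hypothetical least-squares predictions
mean_y = sum(y) / len(y)

ss_t = sum((yi - mean_y) ** 2 for yi in y)              # observed vs mean
ss_m = sum((p - mean_y) ** 2 for p in y_hat)            # model vs mean
ss_r = sum((yi - p) ** 2 for yi, p in zip(y, y_hat))    # observed vs model

print(round(ss_t, 10), round(ss_m + ss_r, 10))  # -> 6.0 6.0
```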
When performing a simple regression, what does the coefficients table tell you?
Provides further information about the magnitude of the effects of X on Y
Beta = the standardised regression coefficient
B on constant = value of the intercept
B on variable = value of the slope (regression coefficient)
Beta
Refers to how much the value of the outcome increases or decreases as the value of the predictor increases by 1 standard deviation unit.
In a scatterplot describing the relationship between the two standardised variables, beta is the slope of the regression line.
- Slope refers to the number of SDs that the regression line moves on the Y-axis for each SD it moves along the X-axis.
- Express in SD units to make it more comparable. You then know which one is impacted more strongly.
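A sketch of obtaining beta by standardising both variables first (the data are hypothetical):

```python
# Beta: the slope after converting both variables to standard-deviation units
x = [1, 2, 3, 4, 5]   # hypothetical predictor scores
y = [2, 4, 5, 4, 5]   # hypothetical outcome scores

def standardise(v):
    """Convert scores to z-scores (mean 0, SD 1; sample SD with n - 1)."""
    m = sum(v) / len(v)
    sd = (sum((s - m) ** 2 for s in v) / (len(v) - 1)) ** 0.5
    return [(s - m) / sd for s in v]

zx, zy = standardise(x), standardise(y)
# Least-squares slope on the standardised scores
beta = sum(a * b for a, b in zip(zx, zy)) / sum(a ** 2 for a in zx)
print(round(beta, 4))  # -> 0.7746
```

For these data beta equals the correlation coefficient r, as expected in simple regression.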
When is the value of beta the same as the value of r?
Value of beta is the same as the value of r in simple regression, but not in multiple regression.
Provide an example of the relationship between b, r, beta and r-square
The value of b depends on the steepness of the slope while the value of Beta depends on how closely clustered around the line the data points are.
Discuss causality in regard to simple regression
If you find that one variable has an effect on another, this does not mean that changes in one determine/cause changes in the other.
It simply allows us to estimate the effect of a variable and whether the variables are related; it does not establish that one is the determinant of the other.
There is a difference between correlational studies and experiments. A correlational study is where you measure the variables of interest without any form of manipulation.
In order to assume causality you need to do an experiment. This is when the IV is manipulated to observe the effects on the DV. Then you can say that it has caused something.
Provide an example of when you can claim causation from correlational evidence
If you are confident that the found relationship between X and Y is not due to the fact that a third variable determines both X and Y.
Residual
The difference between the Y value of the actual case and the Y value that the case would take if lying on the line
- Difference between what the model predicts and the observed data
If you calculate each residual, then square it and add them all up you obtain the sum of squared residuals (SSr)
The line with the lowest SSR is the line of best fit.
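The residual computation can be sketched with hypothetical data, comparing a fitted line against the flat line based on the mean:

```python
# SSr: sum of squared differences between observed Y and a line's predictions
x = [1, 2, 3, 4, 5]   # hypothetical predictor scores
y = [2, 4, 5, 4, 5]   # hypothetical outcome scores

def ss_residual(predict):
    """Sum of squared residuals for a given prediction rule."""
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))

fitted = ss_residual(lambda xi: 2.2 + 0.6 * xi)  # hypothetical best-fit line
mean_line = ss_residual(lambda xi: 4.0)          # flat line at the mean of Y

print(round(fitted, 10), round(mean_line, 10))  # -> 2.4 6.0
```

The line of best fit has the lower SSr of the two, which is what "best" means here.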
Explain why you need to assess the goodness-of-fit of the regression line
The regression line is the best line available for fitting the data. This means that it allows you to make the best possible predictions about how a case with a given score on X will score on Y
But we also need to assess how well the regression line fits the actual data so that we know how accurately the values of X can predict values of Y.
How do you assess the goodness of fit of the regression line?
Look at how much more variability in the outcome variable the regression line is able to explain by comparison with the line based on the mean. Then divide this amount by the variance unexplained by the regression line.
This is a way of assessing how well the model fits the observed data
What is the difference between a simple and multiple regression?
Simple regression is when you have one predictor variable whereas in multiple regression you have several predictors.
When performing a simple regression, what is a b-value?
Tells us the gradient of the regression line and the strength of the relationship between a predictor and the outcome variable.