L6 - Linear Regression Flashcards
What is the purpose of a simple linear regression?
Simple linear regression involves fitting a line of best fit to a scatterplot of data points representing two linearly related variables, X and Y.
What are the 6 features of a simple linear regression?
- Line of best fit
- Method of least squares
- Variance partitioning and residuals
- Coefficient of determination (R-squared)
- Regression equations/interpretation
- Coefficients and beta values
How do we develop a ‘line of best fit’?
Find the slope of the line and the Y-intercept (where the line crosses the Y axis)
Y = mX + c
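A minimal sketch in Python (NumPy, with made-up example data) of how a line of best fit Y = mX + c can be obtained:

```python
import numpy as np

# Hypothetical example data: X = hours studied, Y = exam score
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([52, 55, 61, 64, 70, 73], dtype=float)

# np.polyfit with degree 1 returns the least-squares slope (m) and intercept (c)
m, c = np.polyfit(X, Y, 1)
print(f"Line of best fit: Y = {m:.2f}X + {c:.2f}")

# Predicted values on the regression line
Y_pred = m * X + c
```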
What is the difference between a multiple regression and a linear regression?
Multiple regression has multiple predictors
Linear regression has a single predictor
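As a rough sketch (NumPy, invented data), the only structural difference is how many predictor columns go into the design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                        # predictor 1
x2 = rng.normal(size=n)                        # predictor 2
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # made-up outcome

# Simple linear regression: intercept column plus ONE predictor
X_simple = np.column_stack([np.ones(n), x1])
coef_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: intercept column plus SEVERAL predictors
X_multi = np.column_stack([np.ones(n), x1, x2])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print("simple   (intercept, b1):", coef_simple)
print("multiple (intercept, b1, b2):", coef_multi)
```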
What is the total variance in a regression?
How far each actual score is from Ybar (the mean of Y, the original estimated value)
Total variance is based on Y − Ybar for each person (Y = actual score, Ybar = the mean of Y, i.e. the original estimate)
- Y actual = top
- Y1 (predicted score on the regression line) = middle
- Y original prediction (i.e. class average) = Ybar
- Y1 is an improvement on the original predictor (Ybar)
- This improvement is the explained variance: the “regression component”, the bit you have explained
What is the difference between Y and Y1 in this example called?
The Residual (Error) of the model
How far off the model’s prediction is from the actual score
If you take Y1 (predicted) away from Y actual for every person, square the differences and then add them up, what do you get?
The Error Sum of Squares
Amount of variance we have not been able to explain with the variables
“Error variance”
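A small sketch (same made-up data as the earlier example) of the residuals and the Error Sum of Squares:

```python
import numpy as np

# Made-up data and fitted line (as in the earlier sketch)
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([52, 55, 61, 64, 70, 73], dtype=float)
m, c = np.polyfit(X, Y, 1)
Y1 = m * X + c                        # predicted values (regression line)

residuals = Y - Y1                    # actual minus predicted, for each person
error_ss = np.sum(residuals ** 2)     # Error Sum of Squares: unexplained variance
print(f"Error SS = {error_ss:.2f}")
```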
What is the total sum of squares?
Tot SS = the sum of squared differences between the actual scores and the mean value of Y.
For every person in the study, take their difference from the mean (Y − Ybar), square it, and then add all the squared differences up.
What is the Regression Sum of Squares (RSS)?
The Explained Variance.
The difference between the original average estimate and the closer estimate after a regression has been done
Calculation: sum of (predicted value − the original rough estimate, i.e. the mean)², in other words Σ(Y1 − Ybar)²
What is the calculation of R2 (R-squared)?
RSS/TSS
(TSS = total sum of squares)
(RSS = regression sum of squares)
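A minimal sketch (same made-up data) of the variance partition, showing that TSS = RSS + error SS and R² = RSS/TSS:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([52, 55, 61, 64, 70, 73], dtype=float)
m, c = np.polyfit(X, Y, 1)
Y1 = m * X + c                        # predicted values (regression line)
Y_bar = Y.mean()                      # the original "rough" estimate: the mean of Y

TSS = np.sum((Y - Y_bar) ** 2)        # total sum of squares
RSS = np.sum((Y1 - Y_bar) ** 2)       # regression (explained) sum of squares
ESS = np.sum((Y - Y1) ** 2)           # error (residual) sum of squares

print(f"TSS = {TSS:.2f}, RSS + ESS = {RSS + ESS:.2f}")   # these should match
print(f"R-squared = RSS/TSS = {RSS / TSS:.3f}")
```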
What is the multiple coefficient of determination (R2)?
Coefficient of determination (R2) is the proportion of total variance explained by the model
- When it is high, the regression line is really close to the actual points: the predictors are capturing the variance and the model is doing really well at improving on the rough estimate (the mean). When it is low, it is not.
- Tells us how strongly the Y variable is related to the X variable(s).
How do you know if (R2) is meaningful (significant)? (e.g. is 20% of the variance meaningful?)
F test
The F ratio is the ratio of systematic (explained) variance to error variance
In ANOVA terms, the between-groups sum of squares versus the within-groups sum of squares; in regression, the regression sum of squares versus the error sum of squares
If the explained (“between”) part is sufficiently bigger than the error (“within”) part, relative to their degrees of freedom, the F ratio is large and the result is significant
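A hedged sketch (same made-up data; SciPy for the F distribution) of how the F ratio for a regression can be computed:

```python
import numpy as np
from scipy import stats

X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([52, 55, 61, 64, 70, 73], dtype=float)
n, k = len(Y), 1                      # k = number of predictors
m, c = np.polyfit(X, Y, 1)
Y1 = m * X + c

RSS = np.sum((Y1 - Y.mean()) ** 2)    # explained (regression) sum of squares
ESS = np.sum((Y - Y1) ** 2)           # error sum of squares

F = (RSS / k) / (ESS / (n - k - 1))   # ratio of explained to error variance
p = stats.f.sf(F, k, n - k - 1)       # p-value from the F distribution
print(f"F({k}, {n - k - 1}) = {F:.2f}, p = {p:.4f}")
```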
What does a significant F ratio mean?
Your model is significant in explaining a meaningful amount of variance in the results (Y)
When is a small F ratio still likely to be significant?
When the sample size is large (with more degrees of freedom the critical value of F is smaller, so even a small F ratio can reach significance)
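One way to see this (a sketch using SciPy's F distribution): the critical value of F at alpha = .05 shrinks as the error degrees of freedom (i.e. the sample size) grow, so a small F can still be significant in a large sample:

```python
from scipy import stats

# Critical F at alpha = .05 for one predictor, as error df (sample size) grows
for df_error in [10, 50, 200, 1000]:
    crit = stats.f.ppf(0.95, dfn=1, dfd=df_error)
    print(f"df_error = {df_error:4d}: critical F = {crit:.2f}")
```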
How do you interpret a regression equation?
Example: Y = 45.67 + 0.67 × Age
(There is always an exam question on this.)
This means that each 1-unit increase in Age is associated with a 0.67-unit increase in Y (the coefficient), and 45.67 is the predicted Y when Age = 0 (the intercept).
What does the coefficient (.67) mean?
Each unit increase in age is associated with a .67 increase in Y
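A tiny sketch of the worked example (the equation from the card; the helper function is hypothetical):

```python
def predict_y(age):
    # Example equation from the card: Y = 45.67 + 0.67 * Age
    return 45.67 + 0.67 * age

print(predict_y(30))                  # predicted Y at age 30
print(predict_y(31) - predict_y(30))  # a 1-unit increase in Age -> 0.67 increase in Y
```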
What is a Beta Value?
Standardised values that can be compared.
What happens when you have a strong beta value?
The larger the beta value (in absolute terms), the greater the relative importance of that predictor
How do you calculate a Beta Value (standardise the coefficients)?
Take the SD of the particular predictor, divide it by the SD of the dependent measure, and multiply that by the coefficient.
What is the main issue with a regression coefficient?
It doesn’t tell you which variable is most important; there is no comparable effect size
The size of the coefficient will differ depending on the scale (units) of the measure
Comparing raw regression coefficients therefore tells you nothing about the relative importance of the variables.
How do you understand its importance?
You have to standardise the coefficients
Take the SD of the particular predictor, divide it by the SD of the dependent measure, and multiply that by each coefficient.
This is a standardised beta.
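A minimal sketch (NumPy, invented data) of standardising a coefficient, checked against regressing z-scores on z-scores:

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.normal(40, 10, size=200)            # made-up predictor
y = 45 + 0.67 * age + rng.normal(0, 5, 200)   # made-up outcome

b, intercept = np.polyfit(age, y, 1)          # unstandardised coefficient

# Standardised beta: multiply b by (SD of the predictor / SD of the dependent measure)
beta = b * (age.std(ddof=1) / y.std(ddof=1))

# Check: the same value comes from regressing z-scores on z-scores
z_age = (age - age.mean()) / age.std(ddof=1)
z_y = (y - y.mean()) / y.std(ddof=1)
beta_check, _ = np.polyfit(z_age, z_y, 1)

print(f"b = {b:.3f}, beta = {beta:.3f}, check = {beta_check:.3f}")
```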
What are residuals in regression?
The variance you haven’t explained
The actual score minus the predicted score
The name for (R2) is…
Coefficient of determination
What are the two “principles” of multiple regression?
Explanation vs prediction
Prediction: doesn’t care about theory, just about predicting the outcome (e.g. what group you belong to)
(e.g. gathering data online to predict behaviour, as Google does)
More practical: identifying which people are most likely to be, e.g., problem gamblers
Explanation: explaining which variable is the best predictor
E.g. Bronfenbrenner’s model: what are the variables that are most likely to impact child behaviour?
Which of the levels of influence is MOST influential when tested against the others, and how much variance is attributed to each predictor variable?
What is Multicollinearity?
Means that your predictor variables are correlated with each other
When X1, X2, and X3 are all related to each other, each one individually correlates with Y, but when you put them together in the same model they eat up each other’s variance, because the variance they explain overlaps.
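A rough demonstration (NumPy, simulated data): two highly correlated predictors each explain a lot of variance in Y on their own, but entering them together adds almost nothing, because the variance they explain overlaps:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)       # x2 is almost the same as x1
y = x1 + rng.normal(scale=1.0, size=n)

def r_squared(design, outcome):
    # Fit by least squares and return the proportion of variance explained
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    pred = design @ coef
    return 1 - np.sum((outcome - pred) ** 2) / np.sum((outcome - outcome.mean()) ** 2)

ones = np.ones(n)
print(f"R2 with x1 only: {r_squared(np.column_stack([ones, x1]), y):.3f}")
print(f"R2 with x2 only: {r_squared(np.column_stack([ones, x2]), y):.3f}")
print(f"R2 with both:    {r_squared(np.column_stack([ones, x1, x2]), y):.3f}")
```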
What is the function for a simple linear regression?
Y = mX + c
Where m is the slope
and c= the Y-intercept
X is the independent variable
Y is the dependent variable.
How do we obtain a “line of best fit”?
The method of least squares
This involves minimising the squared deviations between the actual scores of Y and those predicted by the resultant regression equation (Y').
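A minimal sketch (NumPy, made-up data) of the least-squares solution written out explicitly, i.e. the m and c that minimise Σ(Y − Y')²:

```python
import numpy as np

# Made-up data
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
Y = np.array([52, 55, 61, 64, 70, 73], dtype=float)

# Closed-form least-squares estimates that minimise sum((Y - Y')**2)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
c = Y.mean() - m * X.mean()

Y_prime = m * X + c                   # predicted scores (Y')
sse = np.sum((Y - Y_prime) ** 2)      # the quantity being minimised
print(f"Y' = {m:.2f}X + {c:.2f}, SSE = {sse:.2f}")
```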