Linear regression Flashcards
In linear regression, what is the notation used to represent the intercept and the slope (regression coefficient), respectively (based on sample data)
a and b
In a regression equation, what does Yi denote?
Observed scores on the DV
In a regression equation, what does Xi denote?
Observed scores on the IV
In a regression equation, what does Y(hat) denote?
Predicted scores on the DV
In a regression equation, what does ei denote?
Residual scores in the regression model (ie difference between observed and predicted scores)
In regression analysis, what does OLS stand for?
Ordinary Least Squares
Ordinary Least Square (OLS) estimates are biased T/F
FALSE
They are unbiased
In a regression equation, what does p denote?
Number of partial regression coefficients
this applies in multiple regression analysis, where you have multiple IVs
Ordinary Least Square (OLS) estimates are very efficient T/F
TRUE
What metric do we use to calculate the strength of prediction of our overall regression model?
R2
What does r2 actually tell us
The proportion of the variance accounted for by our regression model
What is the range of possible values for r squared?
0-1
To calculate the confidence interval on r squared, you need the upper and lower degrees of freedom associated with the F statistic, T/F?
TRUE
R squared is biased and consistent, T/F
TRUE
The bias means you need to get the adjusted R squared too
If you have lots of IVs and a small sample size, what should you do to that r2
Adjust it
So you want to compare the strength of two partial regression coefficients. What are your two options?
- STANDARDISE IT
2. Use a SEMI PARTIAL CORRELATION
How can you tell you are looking at an R output containing standardised regression coefficients?
There will be no intercept presented
When looking at standardised regression coefficients, what is the unit they are expressed in?
SDs
What are you looking at when you’re looking at (squared) semi partial correlation?
The proportion of the variance in the DV explained by the IV you have isolated, assuming all other IVs are held constant
‘This IV uniquely accounts for x % of variation in the DV’
When commenting on the proportion of the variance that a given IV accounts for in the DV, should you used semipartial correlation or SQUARED semipartial correlation?
SQUARED!
What are the four statistical assumptions underlying the linear regression model?
- Independence of scores
- Linearity (of what?)
- Homoscedasticity (constant variance of residuals)
- Normality of residual scores
In the context of assumptions for regression models… what does LINEARITY refer to
The assumption that scores on the DV are a linear function of scores on the IV
In the context of assumptions for regression models… what does HOMOSCEDASTICITY refer to
It means…
The variance of the residual scores… is the same for any score on each dependent variable
In the context of regression analysis, how do we check for LINEARITY …
and what are we looking for when we do the looking
- Scatterplot matrix of… ALL DVs and IVs
- Scatterplot matrix of… residual scores and observed IVs AND residual scores and predicted IVs
- Marginal model plots of… scores on DV and observed IVs AND predicted IV
We are looking for straight lines
In the context of regression analysis, how do we check for HOMOSCEDASTICITY …
and what are we looking for when we do the looking
- Scatterplot matrix of… residuals and observed IVs
- Scatterplot matrix of… residuals and predicted IVs
And in both cases, we’re looking for a shaft not a triangle
- Bruesch-Pagan test
In which case, we’re looking for a large P value
In the context of assumptions for regression models… what does NORMALITY OF RESIDUALS refer to
Residual scores are normally distributed
In the context of regression analysis, how do we check for NORMALITY OF RESIDUALS …
and what are we looking for when we do the looking
The usual ways… histograms, qqplots, boxplots
What happens if one the four assumptions for regression analysis are violated?
It will fuck with your CIs
When do we use a Breusch-Pagan test?
In the context of regression analysis, when checking your HOMOSCEDASTICITY assumption (constancy of variance of residuals)
When doing a Breusch-Pagan test, what are we actually looking for
The p value, and we want it to be LARGE
What is the Bruesch-Pagan test actually applied to? (3)
- each IV alone
- all IVs together
- the regression model itself
What’s the R function for Bruesch-Pagan?
ncvTest()
In the context of regression analysis, how do we look for outliers
Using Studentized residuals
When do we use STUDENTISED RESIDUALS
When checking for outliers (in the context of regression analysis)
What counts as a large score STUDENTISED RESIDUALS
3 in large to moderate samples
maybe 3.5 to 4 in smaller samples
In the context of regression analysis, how do we look for influential cases
Using COOK’s d!
What do we use Cook’s d for?
When checking for influential cases (in the context of regression analysis)
What counts as a large score Cook’s d?
Anything more than 1
In regression notation, what does k represent
Number of IVs
How do you calculate degrees of freedom for a linear regression?
df = n - k - 1
If you have a linear regression with three IVs and 100 observations, how many degrees of freedom do you have?
df = n - k - 1
df = 100 - 3 - 1
df = 96
If your degrees of freedom decreases, what happens to your (unadjusted) r squared value?
It must increase…
hence the need for adjusting…
What does adjusting r squared do for you?
Compensates for the reduced power of the model that occurs when you have low degrees of freedom