Linear Regression Flashcards
What is a simple linear regression?
It predicts ONE variable from another
Can a significant value for a coefficient (p < .05) tell us about the magnitude of an effect?
NO. It tells us whether the estimate is significantly different from ZERO, but not about the magnitude of the effect.
What does a p value really tell us in a coefficient table?
Whether the coefficient (the estimated effect of the predictor) is significantly different from zero, i.e. unlikely to have arisen just by chance. It’s just a YES or NO.
It does not tell us about the magnitude of the effect, though.
In the coefficient output for a simple linear regression, the value of Y when X is 0 is the intercept. Which coefficient value do we look at?
The unstandardised B next to the word “intercept” in the output.
The intercept is a CONSTANT value - remember this.
And remember, the unstandardised B next to the predictor variable's name, beneath the “intercept” row, is the slope. Its sign tells us the direction of the relationship, and its value is the change in Y for a one-unit change in X. The standardised beta (β) gives the magnitude of the effect.
Remember our model has this formula: positive affect (the outcome variable) is predicted by, first, the intercept, which in this output is the CONSTANT UNSTANDARDISED B of 2.853. That is the value of Y where X is 0. Sometimes it's called b naught (b0).
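To see where the intercept and slope in a coefficient table come from, here is a minimal sketch that fits a simple regression by hand. The data and the function name fit_simple_regression are made up for illustration; they are not the positive affect output mentioned above.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of the predictor
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                  # unstandardised B for the predictor
    intercept = y_bar - slope * x_bar  # unstandardised B for the constant
    return intercept, slope

# Hypothetical data: y rises by 0.5 for every 1-unit rise in x
x = [0, 1, 2, 3, 4]
y = [3.0, 3.5, 4.0, 4.5, 5.0]
b0, b1 = fit_simple_regression(x, y)
print(f"intercept (value of Y when X is 0): {b0:.3f}")
print(f"slope (change in Y per 1-unit change in X): {b1:.3f}")
```

The intercept matches the first y value because that is the observation where x is 0, and the positive slope shows a positive relationship.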
Why is it better to use the standardised coefficient instead of unstandardised coefficient when looking at effect size?
Because the standardised coefficient shows how many SDs the outcome variable changes for every 1 SD change in the predictor variable, as opposed to how many units it changes (unstandardised). Using SDs helps when comparing predictors or models, because effects are always expressed in SDs rather than in arbitrary units that depend on the measures/variables.
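A small sketch of why the standardised coefficient is easier to compare: rescaling the predictor changes the unstandardised B but not the standardised beta. It assumes the usual conversion beta = B * SD(x) / SD(y) for a single predictor; the data and variable names (hours, minutes, score) are invented for illustration.

```python
from statistics import mean, stdev

def slope(x, y):
    """Unstandardised slope of the least-squares line for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def standardised_beta(x, y):
    """Slope expressed in SD units: SDs of y per 1 SD of x."""
    return slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: the same predictor measured in hours and in minutes
hours = [1, 2, 3, 4, 5]
minutes = [h * 60 for h in hours]
score = [2, 4, 5, 4, 5]

b_hours, b_minutes = slope(hours, score), slope(minutes, score)
beta_hours = standardised_beta(hours, score)
beta_minutes = standardised_beta(minutes, score)
print(f"unstandardised B: {b_hours:.4f} per hour vs {b_minutes:.4f} per minute")
print(f"standardised beta: {beta_hours:.4f} vs {beta_minutes:.4f}")
```

The two unstandardised slopes differ by a factor of 60 purely because of the units, while the standardised betas are identical.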
So, we can get effect sizes from standardised coefficients. How can we get effect sizes by examining variance?
Through looking at the r squared.
R squared indicates the proportion or percentage of total variance accounted for by the model.
What is R squared?
The squared CORRELATION between the ACTUAL DV scores and the PREDICTED DV scores.
Essentially it is the proportion of variance explained by the model.
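The two descriptions of R squared above give the same number, which a short sketch can confirm. The data and helper names (pearson_r, r2_from_correlation, r2_from_variance) are made up for illustration, and the fit is a plain least-squares line with one predictor.

```python
from math import sqrt
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line and get the predicted DV scores
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * xi for xi in x]

# 1) Squared correlation between ACTUAL and PREDICTED scores
r2_from_correlation = pearson_r(y, predicted) ** 2

# 2) Proportion of variance explained: 1 - residual SS / total SS
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2_from_variance = 1 - ss_res / ss_tot

print(f"{r2_from_correlation:.3f} {r2_from_variance:.3f}")
```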
What is another word for R?
Correlation
What is another word for R2?
Squared correlation
The variance of an outcome variable is 5. The regression tells us the variance of the residuals is 4. We then subtract the residual variance from the total variance, which leads us to…?
The variance explained by the model: 5 - 4 = 1.
Dividing this by the total variance gives R2, the squared correlation: 1 / 5 = .20.
What does an R2 of 0 indicate?
What does an R2 of 1 indicate?
0 indicates NONE of the variance is explained by the model
1 indicates ALL of the variance is explained by the model
Do people consider an r2 of .25 small?
No.
.04 is considered small (4%)
.09 is medium (9%)
.25 is large (25%)
Are effect size guidelines for r squared definitive, or are they t-shirt sizes?
T-shirt sizes. There are no set rules.
So what is adjusted r square?
As opposed to the r2, which gives the proportion of variance explained by a model derived from a specific sample, ADJUSTED r square gives an estimate of the r2 in the population!
Meaning, how much variability would be explained if the MODEL was derived from the population rather than the sample.
It is more conservative.
Why would an adjusted r square be important - why can’t you just use the r squared as an estimate for the population?
Because the regression model might overfit your particular data set. Therefore it may not work as well with other samples as it does with YOUR data.
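The flashcards do not give a formula for adjusted r square, so the sketch below assumes the commonly reported one: 1 - (1 - R2)(n - 1) / (n - k - 1), where n is the sample size and k the number of predictors. The function name adjusted_r2 and the example numbers are illustrative only.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for sample size n and k predictors.

    Assumes the common formula 1 - (1 - R2) * (n - 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same sample R2 of .25 with 3 predictors, small vs large sample
small_sample = adjusted_r2(0.25, n=30, k=3)
large_sample = adjusted_r2(0.25, n=300, k=3)
# Same sample size, more predictors
more_predictors = adjusted_r2(0.25, n=30, k=6)

print(f"n=30,  k=3: {small_sample:.3f}")
print(f"n=300, k=3: {large_sample:.3f}")
print(f"n=30,  k=6: {more_predictors:.3f}")
```

Under this formula the adjustment is larger when the sample is small or the number of predictors is high, which is why the adjusted value is the more conservative one.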
Why can r2 be expected to vary?
Because sample correlations vary around the population correlation
True or false: the sample becomes less representative as the sample size decreases
TRUE
What is sampling error?
The discrepancy between sample and population
Why does sampling error increase as sample size decreases AND as the number of predictors increases?
Regarding predictors, because there is error associated with each predictor.
Regarding sample size, because a smaller sample is less representative of the population.
The R2 is likely to overestimate the size of the effect because:
A. Sampling error decreases as sample size decreases
B. Sampling error increases as sample size decreases
C. Sampling error increases as the number of predictors increases
D. B and C
D
Regression chooses the ____ therefore it is prone to overfitting the data
Best fit
What is failure to replicate the r2 called?
Shrinkage
How is shrinkage best evaluated?
Cross Validation Study
If there is a large effect size in a regression model, does this mean the same model will show that effect size in a different sample?
NO. An effect size from one sample tends to be overestimated because of overfitting issues. Therefore the same model run on a new sample might have a smaller effect size, or one that is not statistically significant.
What is it called when the model is run on a new sample and has a smaller effect size than it did in the previous sample?
SHRINKAGE
What does a cross validation study assess?
How generalisable your model is
Estimated shrinkage in your r2 value and adjusted r2 value
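A rough sketch of the idea behind a cross validation study: fit the model on one sample, apply the same coefficients to a second sample, and compare the two r2 values. This is one simple way to do it, not a prescribed procedure. Both samples are made up, and r2 is computed here as 1 - residual SS / total SS using the first sample's coefficients.

```python
from statistics import mean

def fit(x, y):
    """Least-squares intercept and slope for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b1 * x_bar, b1

def r2(x, y, b0, b1):
    """Proportion of variance in y explained by the given coefficients."""
    y_bar = mean(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical original sample (used to build the model)
x_train, y_train = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
# Hypothetical new sample (used only to check the model)
x_new, y_new = [1, 2, 3, 4, 5], [3, 3, 5, 6, 4]

b0, b1 = fit(x_train, y_train)
r2_train = r2(x_train, y_train, b0, b1)
r2_new = r2(x_new, y_new, b0, b1)  # same coefficients, new data
shrinkage = r2_train - r2_new

print(f"r2 in original sample: {r2_train:.3f}")
print(f"r2 in new sample:      {r2_new:.3f}")
print(f"shrinkage:             {shrinkage:.3f}")
```

The drop in r2 when the model meets data it was not fitted to is the shrinkage; with samples this tiny the drop is large, which fits the point that small samples overfit.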
Is a small or a large data set more likely to highlight odd things about the data?
A small data set
What is the word to describe a large discrepancy between r2 and adjusted r2?
Shrinkage
What does shrinkage indicate?
The regression model does not generalise well to the population
A difference of about 0.5% between r2 and adjusted r2 is probably acceptable. The larger the difference is, the less our model will _____
generalise
More than a few percent (3% or more) between r2 and adjusted r2 is considered unacceptable
Are there guidelines about how much shrinkage is too much?
No
Some people argue that shrinkage can be evaluated as ___% being acceptable if r2 is .50. What is the advantage of this?
5%.
More leeway for shrinkage.
What is the most useful effect size to be looking at for multiple regression?
f2! It reflects the unique variance explained by each predictor.
How did f2 come about?
It’s based on R2. It tells us the unique effect of a variable on the outcome.
The f2 gives an effect size for the proportion of residual variance explained - for unique effects of a predictor
When an overall model explains a lot of variance, would we expect the effect size for the same amount of unique variance to go down or up?
UP. There is less residual variance left, so the same unique variance is a larger proportion of it.
SO, f2 gives an effect size for the PROPORTION of residual variance explained.
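The flashcards do not state the formula, so this sketch assumes the usual form of Cohen's f2 for one predictor: (R2 with the predictor - R2 without it) / (1 - R2 with the predictor). The function name cohens_f2 and the R2 values are invented for illustration.

```python
def cohens_f2(r2_full, r2_reduced):
    """Effect size for a predictor's unique variance.

    r2_full:    R2 of the model including the predictor
    r2_reduced: R2 of the model without the predictor
    Assumes f2 = (r2_full - r2_reduced) / (1 - r2_full).
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1 - r2_full)

# The same unique variance (.05) in a weak and a strong overall model
f2_weak_model = cohens_f2(r2_full=0.20, r2_reduced=0.15)
f2_strong_model = cohens_f2(r2_full=0.80, r2_reduced=0.75)

print(f"overall R2 = .20: f2 = {f2_weak_model:.4f}")
print(f"overall R2 = .80: f2 = {f2_strong_model:.4f}")
```

The predictor adds .05 to R2 in both cases, but f2 is larger in the model that already explains a lot, because the .05 is divided by a smaller amount of residual variance.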
What is the difference between linear regression and multiple regression?
If two or more explanatory variables have a linear relationship with the dependent variable, the regression is called a multiple linear regression.
Simple linear regression has one predictor variable and one dependent variable.
When we run a regression, we hope to be able to _____ the ____ model to the _____?
Generalise the sample model to the population
Why do assumptions need to be met in order to be able to generalise to the population?
Because violating these assumptions can affect how well the regression model fits the data and how well it can be generalised. This calls the validity of the model into question.