Linear Regression Flashcards
What is a simple linear regression?
It predicts ONE variable from another
Can a significant value for a coefficient (p < .05) tell us about magnitude of effect?
NO. That tells us whether estimates are significantly different from ZERO, but not about the magnitude of the effect.
What does a p value really tell us in a coefficient table?
It tells us whether the predicted relationship with the outcome variable is significantly different from zero, i.e. more than just by chance. It's just a YES or NO.
It does not tell us about magnitude of this effect though.
In the coefficient output, for a simple linear regression, when we are looking to see what is the value of Y when X is 0, we are looking at the intercept. What coefficient value do we look to?
Unstandardised B next to the “intercept” word in the output.
The intercept is a CONSTANT value - remember this.
And remember, the unstandardised coefficient next to the "variable" beneath the "intercept" word is the slope: that figure tells us the direction of the relationship. The standardised beta (b) would give the magnitude of the effect.
Remember our model has this formula: positive affect (outcome variable) is predicted by, first, the intercept, which in this output is the CONSTANT UNSTANDARDISED B 2.853. That is the value of Y where X is 0. Sometimes it's called b naught (b0).
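The intercept-and-slope reading above can be checked numerically. A minimal sketch of ordinary least squares with made-up data (not the course dataset):

```python
# Minimal sketch: computing the intercept (b0) and slope (b1) of a simple
# linear regression by hand. The data values are made up for illustration.
xs = [1, 2, 3, 4, 5]            # predictor (X)
ys = [3.0, 3.4, 3.9, 4.5, 4.9]  # outcome (Y), e.g. positive affect

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: b1 = cov(X, Y) / var(X); b0 = mean(Y) - b1 * mean(X)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

print(f"intercept b0 (value of Y when X = 0): {b0:.3f}")
print(f"slope b1 (direction of relationship): {b1:.3f}")
```

b0 here plays the role of the "constant" row in the coefficient table, and b1 the row for the predictor variable.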
Why is it better to use the standardised coefficient instead of unstandardised coefficient when looking at effect size?
Because the standardised coefficient shows us, for every 1 SD the predictor variable changes, how many SDs the outcome variable changes, as opposed to how many raw units change (unstandardised). It helps to use SDs when comparing models, as the units will always be SDs rather than arbitrary units that depend on the measures/variables.
So, we can get effect sizes from standardised coefficients. How can we get effect sizes by examining variance?
Through looking at the r squared.
R squared indicates the proportion or percentage of total variance accounted for by the model.
What is R squared?
The squared CORRELATION between the ACTUAL DV scores and the PREDICTED DV scores.
Essentially it is the proportion of variance explained by the model.
What is another word for R?
Correlation
What is another word for R2?
Squared correlation
The variance of an outcome variable is 5. The regression tells us the variance of the residuals is 4. We then subtract residual variance from total variance, which leads us to…?
The variance explained by the model (5 - 4 = 1).
The model-explained variance, divided by the total variance, gives R2, the squared correlation (1/5 = .20).
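The arithmetic on this card can be checked directly:

```python
# Worked arithmetic from the card: total variance 5, residual variance 4.
total_variance = 5.0
residual_variance = 4.0

explained_variance = total_variance - residual_variance  # variance the model explains
r_squared = explained_variance / total_variance          # proportion explained

print(explained_variance)  # 1.0
print(r_squared)           # 0.2
```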
What does 0 in R2 indicate?
What does 1 in R2 indicate?
0 indicates NONE of the variance is explained by the model
1 indicates ALL of the variance is explained by the model
Do people consider a .25 r2 as small?
No.
.04 is considered small (4%)
.09 is medium (9%)
.25 is large (25%)
Are effect sizes with r squared definitive, or are they "t-shirt sizes"?
T-shirt sizes. There are no set rules.
So what is adjusted r square?
As opposed to r2, which looks at the proportion of variance explained by a model derived from data from a specific sample, ADJUSTED r square gives an estimate of the r2 in the population!
Meaning, how much variability would be explained if the MODEL was derived from the population rather than the sample.
It is more conservative.
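A minimal sketch of the standard adjusted-r2 formula; the sample sizes and r2 value below are made up for illustration:

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1),
# where n = sample size and k = number of predictors.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Estimate of the population R2, adjusted for sample size and predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Small sample, several predictors: the estimate shrinks a lot.
print(adjusted_r2(r2=0.25, n=30, k=5))
# Large sample, same model: adjusted R2 stays close to R2.
print(adjusted_r2(r2=0.25, n=300, k=5))
```

This shows why adjusted r2 is more conservative: with a small n and many predictors it pulls the estimate down hard.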
Why would an adjusted r square be important? Why can't you just use the r squared provided as an estimate for the population?
Because the regression model might overfit your particular data set. Therefore it may not work as well with other samples as it does with YOUR data.
Why can r2 be expected to vary?
Because sample correlations vary around the population correlation
True or false: the sample becomes less representative as the sample size decreases
TRUE
What is sampling error?
The discrepancy between sample and population
Why does sampling error increase as sample size decreases AND as the number of predictors increase ?
Regarding predictors, because there is error associated with each predictor.
And because a smaller sample isn't as representative of the population, a small sample size is no good.
The R2 is likely to overestimate the size of the effect because:
A. Sampling error decreases as sample size decreases
B. Sampling error increases as sample size decreases
C. Sampling error increases as the number of predictors increase
D. B and C
D
Regression chooses the ____ therefore it is prone to overfitting the data
Best fit
What is failure to replicate the r2 called?
Shrinkage
How is shrinkage best evaluated?
Cross Validation Study
If there is a large effect size in a regression model, does this mean the same model will show that effect size in a different sample?
NO. Because an effect size from one sample is overestimated due to overfitting issues. Therefore the next model you run on a new sample might have a smaller effect size, or one that is not statistically significant.
What does it mean if another model is run on a new sample, that has a smaller effect size to the previous model?
SHRINKAGE
What does a cross validation study assess?
How generalisable your model is
Estimated shrinkage in your r2 value and adjusted r2 value
Is a small or a large dataset more likely to highlight odd things about the data?
A small dataset
What is the word to describe a large discrepancy between r2 and adjusted r2?
Shrinkage
What does shrinkage indicate?
The regression model does not generalise well to the population
A difference of about 0.5% between r2 and adjusted r2 is probably acceptable. The larger the difference is, the less our model will _____
generalise
More than a few percent (3%+) between r2 and adjusted r2 is considered unacceptable
Are there guidelines about how much shrinkage is too much?
No
Some people argue that shrinkage can be evaluated as ___% being acceptable if r2 is .50. What is the advantage of this?
5%.
More leeway for shrinkage.
What is the most useful effect size to be looking at for multiple regression?
F2! Unique variance for predictors.
How did f2 come about?
It's based on R2. It tells us the unique effect of a variable on the outcome.
The f2 gives an effect size for the proportion of residual variance explained, i.e. for the unique effects of a predictor.
When an overall model explains a lot of variance, would we expect the effect size for the same amount of unique variance to go down or up?
UP.
SO, f2 gives an effect size for the PROPORTION of residual variance explained.
What is the difference between linear regression and multiple regression?
If two or more explanatory variables have a linear relationship with the dependent variable, the regression is called a multiple linear regression.
Simple linear regression is one predictor variable with a dependent variable.
When we run a regression, we hope to be able to _____ the ____ model to the _____?
Generalise the sample model to the population
Why do assumptions need to be met in order to be able to generalise to the population?
Because violating these assumptions can affect how well the regression model fits the data and how well it can be generalised. It calls the validity into question.
Does population mean everybody or just the people you are interested in?
The latter
What are the assumptions in a regression that we have to meet?
L.I.N.E: Linearity, Independence of errors, Normality of errors, Equal variances (homoscedasticity)
What do regression plots show us?
If data is linear or not
What is the partial regression plot (residuals vs. predicted) and why is it important in multiple regression?
It's a plot of fitted values, not about prediction.
If you have multiple predictors in your model, you may want to look at just one.
BUT this plot is NOT the effect of X on Y. It is just a plot of FITTED VALUES.
The fitted values go on the X axis; on the Y axis we have the residuals (regular or standardised), i.e. the error: how far away observed data points are from the line of best fit.
On the residuals vs predicted plot, is the data standardised?
Yes
Why do you NOT want to see the data fitting a line when looking at residuals in the partial regression plot?
We want to see the residual data points randomly dispersed around a horizontal line going through 0 on the Y axis. That means the model is probably OK for linearity.
IF there was a clear pattern in the residuals vs. predicted, what could you do to deal with this?
Try transforming the data with Log etc.
Why is it not usually a great idea to transform data with Log, in terms of what this would mean for the variable in the model?
Because the model would then be about the transformed variable rather than the actual one. And there may not be a significant relationship between the untransformed predictor and the outcome variable. So perhaps look at a different test to linear regression.
The tricky assumptions for regression that involve the error terms are
Independent errors
Normally distributed errors
and Homoscedasticity
How do we check this?
- To check error terms are not correlated (independent errors), you can look at the correlation coefficient, as mentioned in the correlations week.
- For normally distributed errors, look at the partial regression plot (residuals vs. predicted) and see if there is an absence of a pattern. We want residuals to be random and normally distributed with a mean of 0. BUT the predictors do NOT need to be normally distributed; that just improves the chances of this assumption being met.
- Homoscedasticity: for each value or level of the predictor, the variance of the error term is constant (residuals should have the SAME variance). This is looked at visually.
True or false: predictors have to be normally distributed for assumption of normal distribution being met
False - know this
When assessing the error assumptions and checking normally distributed errors visually, why won't the error-term data perfectly fit a straight line?
Because an error term represents what is left over, so it won't be exactly on the line (or the error term would be zero). The closer the better, yes, but each individual data point has its own error.
The linear equation has an error term with subscript i: Y(i) = b0 + b1*X(i) + e(i). Person i's outcome equals the intercept, plus the slope times their predictor score, plus i's residual (error).
Can histograms and QQ plots show us about whether errors are normally distributed?
Yes
What do we expect to see when looking at homoscedasticity of residuals?
That at each level of the predictor, residuals have the same variance. So equal variances. Heteroscedasticity = unequal.
The residuals at value X should be the ____ at each level of the predictor to meet the assumption of homoscedasticity?
SAME
What do the residuals do?
They tell us how good the model is and if it’s modelling what we think it is modelling.
What are predicted values?
The estimated outcome values
What are residuals?
The deviations of observed from predicted values
Why are standardised predicted and residual values preferred?
They allow for easier interpretation: as z-scores they are on a common scale, so rule-of-thumb cut-offs (e.g. standardised residuals beyond about ±3 suggest outliers) apply regardless of the variables' original units.
The most commonly reported statistics for a multiple regression, historically, was:
unstandardised coefficient, the standard error, and the p-value
What are the most commonly reported statistics for multiple regression now?
STANDARDISED coefficient, AND confidence intervals instead of the standard error
What does a smaller confidence interval indicate?
A more precise estimate of the true population value, whereas a wide CI indicates more uncertainty about the true value, usually due to sample size.
What is the common threshold for 95% CI?
1 - .05
What is preferred to be reported in results for multiple regression?
A. unstandardised coefficient
B. standardised coefficient
C. standard error
D. confidence interval and standardised coefficient
D
For categorical predictors, is it best to report standardised or unstandardised beta?
Unstandardised (B), because 1 unit = the difference between male and female, as an example.
Whereas beta (b), which refers to a 1 SD change, makes it more difficult to show the difference between male and female.
A multiple regression is used to predict values of an outcome from ____ predictors
several
Is it always best to report standardised (b) effect sizes?
No. Unstandardised (B) can be useful for categorical predictors, because a 1 unit change gives more helpful information than a 1 SD change.
Why will each X variable’s coefficient in a multiple regression equation be different?
Because in a MR, each coefficient is adjusted for all the other predictors in the model. MR takes into account variance explained by other IVs in the model.
Why, in the equation for multiple regression, is the e(i) not included for prediction?
Because when formulating the prediction equation we are PREDICTING: we cannot predict what someone's error term would be because we don't know it. It does not exist yet; we are just using the regression line to predict the score. We are not observing a score and seeing how far off the model is (we did that with observed data). So this is prediction.
What does Y hat (the ^ symbol above the Y) refer to?
The predicted score. It means it wasn't something observed, but something you think will happen if you use the model. Y(i), in contrast, is the actual observed score.
Why can’t we compare the unstandardised coefficients (B) to see which predictor has most influence on the outcome variable?
SCALE issue.
Because unstandardised coefficients do not have a mean of zero and are not z-score versions, so they cannot be compared directly, as predictors might be measured on different scales. Standardised coefficients (beta values) are the z-score version of B, which means they all have the same mean of zero and an SD of one, so we can compare them directly.
Why can we compare the standardised regression coefficient directly and therefore see which predictor has the biggest effect on the DV?
Because the standardised coefficient values are the z-score version of the B values (the unstandardised values), which all have the same mean (0) and standard deviation (1). So we can compare them directly.
When looking at a coefficient table, which figure tells us which variable is the best predictor of the DV in the model?
The standardised coefficient. It tells us how much the outcome changes for each 1 SD increase IN THE PREDICTOR.
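The B-to-beta conversion behind these cards can be sketched with the standard formula beta = B * (SD of predictor / SD of outcome). The coefficients and SDs below are made up for illustration:

```python
# Minimal sketch: converting unstandardised coefficients (B) into
# standardised ones (beta) so predictors on different scales can be
# compared. All values are made-up illustrations.
def standardised_beta(b: float, sd_x: float, sd_y: float) -> float:
    """beta = B * (SD of predictor / SD of outcome)."""
    return b * sd_x / sd_y

# Two predictors measured in very different raw units:
beta_income = standardised_beta(b=0.0001, sd_x=12000.0, sd_y=2.0)  # income in dollars
beta_age    = standardised_beta(b=0.0500, sd_x=10.0,    sd_y=2.0)  # age in years

# On the raw B scale income looks tiny (0.0001 vs 0.05), but in SD units
# it is the stronger predictor.
print(beta_income, beta_age)
```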
What does multicollinearity show us?
That two predictor variables overlap a lot, i.e. HIGH intercorrelations between predictor variables.
If you have a VIF of more than 10, something is multicollinear.
Tolerance (the reciprocal of VIF) tells the same story: a tolerance below .10 flags the same problem.
What would a high VIF (over 10) suggest?
That perhaps these variables, given their overlap, are measuring something similar conceptually
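The VIF/tolerance relationship can be sketched directly. R2_j below stands for the R2 from regressing predictor j on all the other predictors; the value is made up for illustration:

```python
# Minimal sketch of the VIF / tolerance relationship for one predictor.
r2_j = 0.92  # predictor j is largely explained by the other predictors (made up)

tolerance = 1 - r2_j   # low tolerance (< .10) signals overlap
vif = 1 / tolerance    # VIF over 10 flags multicollinearity

print(round(tolerance, 3), round(vif, 2))
```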
Why might we include multiple indicators or variables into a latent or combined variable?
Multiple ways of measuring the same conceptual thing can give rich data, and help explain a construct with more variance.
What does a squared semi partial correlation (sr2) tell us?
The proportion of variability in the outcome uniquely accounted for by that predictor. So, just like correlations, it is the unique shared variance of that predictor, taking into account the shared variance of the other predictors.
What do we use to calculate f2?
sr2 and r2
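The sr2-and-r2 calculation is Cohen's f2 for one predictor's unique effect: f2 = sr2 / (1 - R2). A minimal sketch with made-up values:

```python
# Minimal sketch of Cohen's f2 for a single predictor's unique effect,
# using the squared semi-partial correlation (sr2) and the model R2.
# The values below are made up for illustration.
def f_squared(sr2: float, r2: float) -> float:
    """f2 = sr2 / (1 - R2): unique variance relative to residual variance."""
    return sr2 / (1 - r2)

# The same unique variance (sr2 = .05) gives a LARGER effect size when
# the overall model already explains a lot of variance:
print(f_squared(sr2=0.05, r2=0.20))
print(f_squared(sr2=0.05, r2=0.60))
```

This matches the earlier card: when the overall model explains more variance, the effect size for the same amount of unique variance goes UP.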
If your sample size is small in a regression model, and there are quite a few predictor variables, why is this problematic?
Each time you add a predictor variable you decrease your degrees of freedom. This means your r2 value will approach 1, and r2 approaching 1 means your model will explain 100% of the variance. But the model will probably be fitting noise. It might just be a model specific to your sample and not generalisable to the population.
Just like Pearson correlations are tested for significance, regression equations should also be tested for signifiance to show if predictions are significantly better than chance. How do we compute this?
By computing an F ratio.
What does a significant F ratio indicate?
That the equation predicts a significant proportion of the variability in the Y scores (i.e more than would be expected by chance alone)
It examines whether total model variance accounted for is significantly greater than 0.
True or false: Total variance is ALWAYS greater than 0 because R2 cannot be less than 0, but greater than 0 does NOT indicate significance
True
The variance accounted for by the regression model DIVIDED by the total variance is also known as…
R squared
When looking at the ANOVA table, what does the F ratio show?
The F ratio compares the variance predicted by the model with the variance that’s left over. The residual variance. Or the variance not predicted by the model.
The F ratio in an ANOVA table, when looking at the output, is calculated using mean squares (MS). How are mean squares calculated?
By taking the sum of squares and dividing by the degrees of freedom.
True or false: we expect our model's sum of squares to be much greater than the residual or error sum of squares
TRUE. Because this means a good model.
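The mean-squares arithmetic behind the ANOVA-table F ratio can be sketched as follows; the SS values, n, and k are made up for illustration:

```python
# Minimal sketch of how the ANOVA-table F ratio is built from sums of
# squares: MS = SS / df, then F = MS(model) / MS(residual).
ss_model = 40.0      # sum of squares predicted by the model (made up)
ss_residual = 60.0   # leftover (error) sum of squares (made up)
n, k = 103, 2        # observations and predictors (made up)

ms_model = ss_model / k                  # model df = number of predictors (k)
ms_residual = ss_residual / (n - k - 1)  # residual df = n - k - 1

f_ratio = ms_model / ms_residual
print(round(f_ratio, 2))
```

A large F means the variance the model predicts dwarfs the leftover residual variance, which is what a "good model" looks like in the ANOVA table.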
When looking at the output for a regression, if we had to find out whether the model overall resulted in a significantly better prediction of the outcome (Y) than simply using the mean, what would we look at?
The ANOVA table - specifically the F ratio and the P value.
What makes up total variance in data of a whole model?
Whatever variance the model we made explains AND the variance in the data that is left over.
What is another word for the variance in the data left over from the model?
residual variance
If, after running a regression model, there is a better prediction than just using the mean, what would we expect the model's sum of squares to be in comparison to the residual or error sum of squares?
Expect model sum of squares to be GREATER than residual sum of squares
If we looked at the ANOVA table output and wanted to see if it was a good model, what would we hope to see when looking at the model sum of squares?
That it is greater than the error sum of squares
Do the outcomes in a linear regression have to be continuous?
Yes
Can predictors in a regression be categorical?
Yes
If all of your predictors are categorical, you should use ANOVA. True or False.
True
Predictor variables in a multiple regression should be:
A. Continuous
B. Dichotomous/Binary (0,1)
C. Recoded if categorical
D. All of the above
D
is gender a nominal or ordinal categorical variable?
Nominal
What should you do with a nominal variable like gender in a regression?
Dummy code them, using 0 and 1. If 1 refers to woman, then the first recoded variable is Woman: YES/NO. Then Man: YES/NO.
If dummy coding for gender, and 1 refers to a woman, what would the first variable recording be?
Woman YES NO
When dummy coding a gender variable with three categories, how many dummy variables do you create?
Two: one category serves as the reference group, and the other two (e.g. Woman and Man) each get a YES/NO dummy. Those two go into the model as predictors.
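The dummy-coding rule (k categories gives k - 1 dummies, with the left-out category as the reference group) can be sketched as follows. The category labels are made up for illustration:

```python
# Minimal sketch of dummy coding a nominal variable: with k categories you
# enter k - 1 indicator (0/1) variables, and the left-out category is the
# reference group. Labels are made-up illustrations.
def dummy_code(value: str, categories: list) -> list:
    """Return k - 1 indicator variables; the FIRST category is the reference."""
    return [1 if value == c else 0 for c in categories[1:]]

categories = ["other", "woman", "man"]  # "other" is the reference group

print(dummy_code("woman", categories))  # [1, 0]
print(dummy_code("man", categories))    # [0, 1]
print(dummy_code("other", categories))  # [0, 0] -> the reference group
```

The reference group is the row of all zeros, so each dummy's B is the difference between that category and the reference.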