M1 - Multiple Regression Flashcards
Multiple regression can be used when IVs are 1.________________ variables and DVs are 2.________________ variables.
Multiple regression can be used to determine which variables are important for
3.________________ and which 4.________________ variable explains the most 5.________________ variance in the 6.________________ variable.
Options: 1. prediction; 2. ratio; 3. between groups; 4. discrimination; 5. continuous; 6. unique; 7. independent; 8. continuous or categorical; 9. dependent
1 - continuous or categorical; 2 - continuous; 3 - prediction; 4 - independent; 5 - unique; 6 - dependent
When would hierarchical regression analysis be preferred over standard regression analysis? (Choose all that apply)
- The researcher is interested only in the change in R2 statistic.
- The researcher would like to check whether the variance explained in a categorical DV is increased following the inclusion of several continuous IVs.
- The researcher wishes to determine the effects of one particular IV relative to all others.
- The researcher is interested in examining the effects of three IVs on the DV whilst controlling for the variance explained by two others.
- The researcher is interested in semi-partial correlations and the overall R2.
- The researcher is interested only in the change in R2 statistic. (as there is a change from step 1 of the hierarchy to step 2)
- The researcher is interested in examining the effects of three IVs on the DV whilst controlling for the variance explained by two others.
What is the semi-partial correlation for an IV and how is it represented in SPSS output?
The semi-partial correlation describes the unique relationship between the IV and the DV after controlling for the other IVs in the model.
In SPSS it is reported in the Coefficients table under the ‘Part’ heading
In a scatterplot, what sort of graph indicates there is an issue with collinearity?
If a scatterplot of one IV against another shows a clear, discernible pattern (i.e. the IVs are strongly related), there is a potential collinearity issue
CLO 1
What is Multiple Regression?
What variables work in MR?
What four questions is MR useful in answering?
Multiple regression is a statistical analysis that examines the predictive relationships between variables where there are two or more IVs (continuous or categorical) and a single continuous DV.
MR is useful in answering questions about
- the combined effect of the IVs on the variance of the DV (Multiple R2)
- the relative strength of different predictors in contributing to DV variance
- the unique variance explained by each predictor (semi-partial correlation, sr2)
- prediction improvement (hierarchical regression)
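The unit's examples use SPSS, but the same statistics can be made concrete with a short Python/statsmodels sketch. A minimal, hypothetical standard multiple regression (the variable names wellbeing, stress and support and the data are invented purely for illustration):
```python
# Minimal sketch of a standard multiple regression on made-up data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "wellbeing": [4.2, 3.8, 5.1, 2.9, 4.7, 3.5, 4.0, 5.3],  # continuous DV
    "stress":    [3.0, 3.5, 2.0, 4.5, 2.5, 4.0, 3.2, 1.8],  # IV 1
    "support":   [4.0, 3.0, 5.0, 2.0, 4.5, 2.5, 3.5, 5.0],  # IV 2
})

model = smf.ols("wellbeing ~ stress + support", data=df).fit()
print(model.rsquared)   # combined effect of the IVs (R2)
print(model.params)     # unstandardised b weights
print(model.summary())  # full output, including the F test and t tests
```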
CLO 2 - How do various methods of MR differ? (standard, hierarchical, stepwise)
Standard
- Forced entry of all variables at once - unconcerned about order
- used when the research question does not specify that certain variables should be entered before others
- answers questions about the overall combined, relative and unique importance of the IVs on the DV
Hierarchical
- IVs entered in blocks - order determined by the research question
- answers questions about the overall combined, relative and unique importance of the IVs on the DV
- PLUS prediction improvement (additional variance explained after controlling for the IVs entered in previous blocks) - see the R2 change sketch after this list
Stepwise
- Statistical approach where IVs are entered (or removed) based on statistical criteria, e.g. the p value of the t test for each IV; an IV is removed if it no longer meets the criterion
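A sketch of the hierarchical (blockwise) approach, continuing the hypothetical df above and adding an invented control variable age; the difference in R2 between blocks is the prediction-improvement statistic:
```python
# Sketch of hierarchical regression: Block 1 = control variable, Block 2 adds the IVs of interest.
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df["age"] = [23, 31, 45, 29, 52, 38, 26, 41]  # hypothetical control variable

block1 = smf.ols("wellbeing ~ age", data=df).fit()
block2 = smf.ols("wellbeing ~ age + stress + support", data=df).fit()

print(block2.rsquared - block1.rsquared)  # R2 change (prediction improvement)
print(anova_lm(block1, block2))           # F test of the R2 change
```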
CLO 3 - What are the five key pieces of output for overall model and how do they relate to my research question?
Key output for overall model
R - correlation between all IVs and DV
R2 - overall variance explained in the DV by all IVs
Adjusted R2 - adjusts R2 for sample size and number of predictors (a larger number of predictors tends to inflate R2) - see the sketch after this list
F test - significance test for R2 and R2 change
R2 change - difference between steps (blocks) in explained variance for Hierarchical regression (prediction improvement questions)
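The adjusted R2 correction follows the standard shrinkage formula; a small sketch with invented values:
```python
# Adjusted R2 corrects R2 for sample size (n) and number of predictors (k).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.30, n=100, k=5))  # ~0.26 - R2 shrinks once predictors are penalised
```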
CLO 3 - What are the five key pieces of output for individual predictors in model and how do they relate to my research question?
Key output for Individual predictors
b weight - unstandardised; the amount of change in the DV for every 1 unit change in the predictor
Beta weight - standardised b weight, so predictors can be compared with each other. Useful for questions about the relative importance of each predictor (see the sketch after this list)
r - zero order correlation between IV and DV
sr - used to determine sr2 - semi-partial correlation, unique variance explained
t test - significance test of b weight for individual predictors
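A sketch of how the beta weight and sr2 for one predictor can be reproduced by hand, continuing the hypothetical df and model from the standard-regression sketch above:
```python
# Individual-predictor statistics for the hypothetical predictor 'stress'.
import statsmodels.formula.api as smf

# Standardised beta: unstandardised b scaled by SD(IV) / SD(DV)
beta_stress = model.params["stress"] * df["stress"].std() / df["wellbeing"].std()

# Squared semi-partial correlation (sr2): drop in R2 when 'stress' is removed
reduced = smf.ols("wellbeing ~ support", data=df).fit()
sr2_stress = model.rsquared - reduced.rsquared

print(beta_stress, sr2_stress)
```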
CLO4 - What are the six key multiple regression assumptions?
1 - Normal distribution - the residuals are normally distributed - outliers not present
2 - Linearity - the relationship between the IV and the DV is linear
3 - Independence of Errors - residuals are not correlated (violation inflates SEs -> affects CIs and sig tests)
4 - Homoscedasticity - variability in residuals should be constant at each level of predictor
5 - Singularity and multicollinearity - IVs are not strongly or perfectly correlated
6 - Sample size - n is large enough to detect the effect
CLO4 How are the key MR assumptions and influencers checked?
1 - Normal distribution
1 - Normal distribution - the residuals are normally distributed - outliers not present
Overall model checks
- check residuals for outliers
- Standardised residuals outside of +/-3.29 are cause for concern
- Cook's distance > 1 suggests outliers may be present
- Leverage > 3(k + 1)/n (k = number of predictors) suggests outliers may be present
Individual Case and predictor check
- Standardised DFBeta outside +/-1 indicates substantial influence. Useful for seeing, case by case, whether the influence is on the intercept or on the b weight of a particular IV
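A sketch of how these residual and influence checks could be pulled out with statsmodels, reusing the hypothetical model fitted in the earlier sketch:
```python
# Residual / influence diagnostics for the hypothetical 'model' above.
influence = model.get_influence()

std_resid = influence.resid_studentized_internal  # concern if outside +/-3.29
cooks_d   = influence.cooks_distance[0]           # concern if > 1
leverage  = influence.hat_matrix_diag             # compare against 3(k + 1)/n
dfbetas   = influence.dfbetas                     # concern if outside +/-1

print(abs(std_resid).max(), cooks_d.max(), leverage.max(), abs(dfbetas).max())
```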
CLO4 How are the key MR assumptions and influencers checked?
2 - Linearity
2 - Linearity - the relationship between the IV and the DV is linear
Assess through scatterplots of residuals and predicted scores
CLO4 How are the key MR assumptions and influencers checked?
3 - Independence of Errors
3 - Independence of Errors - residuals are not correlated
Violation inflates SEs -> impacts CIs and sig tests
Check with the Durbin-Watson test
- tests the serial correlation between residuals for adjacent cases in the dataset
- no correlation is desirable
- range 0-4
- < 2: positive correlation -> increased Type I error (SEs underestimated)
- 2: no correlation
- > 2: negative correlation -> increased Type II error (SEs overestimated)
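A sketch of the Durbin-Watson check in statsmodels, reusing the hypothetical model fitted earlier:
```python
from statsmodels.stats.stattools import durbin_watson

# Roughly 2 = no serial correlation; < 2 positive, > 2 negative.
print(durbin_watson(model.resid))
```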
CLO4 How are the key MR assumptions and influencers checked?
4 - Homoscedasticity
4 - Homoscedasticity - variability in residuals should be constant at each level of predictor
check with a scatterplot of standardised residuals against standardised predicted values
- most standardised residuals should fall within -2 to 2
- no discernible pattern is desirable
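A sketch of that residual scatterplot, reusing the hypothetical model fitted earlier; the same plot doubles as the linearity check:
```python
import matplotlib.pyplot as plt

fitted  = model.fittedvalues
z_pred  = (fitted - fitted.mean()) / fitted.std()            # standardised predicted values
z_resid = model.get_influence().resid_studentized_internal   # standardised residuals

plt.scatter(z_pred, z_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardised predicted value")
plt.ylabel("Standardised residual")
plt.show()  # want a random cloud, mostly within -2 to 2, with no funnel or curve
```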
CLO4 How are the key MR assumptions and influencers checked?
6 - Sample size
Sample size needs to be large enough to detect an effect of the expected strength
Rules of thumb are based on moderate effect size
N>= 50 + 8k for overall model
N>=104 + k for individual predictors
these rules of thumb are likely to overestimate the n required if the expected effect is large and to underestimate it if the expected effect is small
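The two rules of thumb as a quick calculation (k is the number of predictors; the helper names are invented for illustration):
```python
def n_overall_model(k):          # rule of thumb for testing the overall model
    return 50 + 8 * k

def n_individual_predictors(k):  # rule of thumb for testing individual predictors
    return 104 + k

k = 5
print(n_overall_model(k), n_individual_predictors(k))  # 90, 109 - plan for the larger value
```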
CLO4 How are the key MR assumptions and influencers checked?
5 - Singularity and multicollinearity
5 - Singularity and multicollinearity - IVs are not strongly or perfectly correlated
Perfect or very strong correlations (around .8 or .9) suggest one of the variables is redundant
- inflates SEs -> impacts CIs and sig tests
- results become less stable
Check bivariate correlations for simple relationships
For more complex relationships in MR, check Tolerance and VIF (Variance Inflation Factor)
Tolerance - range 0-1
- higher scores desirable, indicating the variable makes a unique contribution
- < .1 is serious (only 10% of the IV's variance is unique)
- < .2 is problematic
VIF = 1/Tolerance
> 10 serious
> 5 problematic
sqrt(VIF) = the factor by which the SE is inflated
i.e. a VIF of 4 means the SE is double what it would be if the IV were uncorrelated with the other IVs (r = 0)
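A sketch of the Tolerance / VIF check with statsmodels, reusing the hypothetical df from the earlier sketches:
```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["stress", "support"]])  # predictors plus intercept
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, vif, 1 / vif, np.sqrt(vif))  # VIF, Tolerance, SE inflation factor
```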