hierarchical regression Flashcards
1
Q
R2
A
- This explains how much variance in our dependent variable is explained by our regression model
- Regression model refers to all the predictors considered together
- The value of R2 ranges between 0 and 1; the higher it is, the more variance the model explains. It is often expressed as a percentage
- R2 = .7 means 70% of variance is explained
- R2 =.05 means 5% of variance is explained
- R2 =.21 means 21% of variance is explained
- R2 =.006 means 0.6% of variance is explained
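As an illustrative sketch (the data below are invented), R2 can be computed by hand for a tiny one-predictor model as 1 minus the ratio of residual to total sum of squares:

```python
# Hypothetical data: stress scores (IV) and anxiety scores (DV) for 5 people.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for a one-predictor model.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]

# R2 = 1 - (residual sum of squares / total sum of squares)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # about 0.727, i.e. roughly 73% of variance explained
```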
2
Q
adjusted R2
A
- Very similar to the R2 statistic.
- It is always lower, because it takes into account the number of predictors
- It punishes R2 for each predictor added to the model
- This discourages people from throwing in extra variables just to improve the fit of the model
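A minimal sketch of the standard adjustment formula, using hypothetical values (R2 of .727 from a one-predictor model with 5 cases):

```python
# Hypothetical values: R2 = .727 from a model with k = 1 predictor, n = 5 cases.
r2, n, k = 0.727, 5, 1

# Adjusted R2 shrinks R2 according to predictors relative to sample size.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.636 - lower than R2, as expected
```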
3
Q
regression coefficients
A
- Tells us the association between our IVs and the DV
- It can be positive (positive association) or negative (negative association) like a correlation.
- Unlike correlation coefficients it does not range between -1 and 1
- It is a description of an IV-DV association in terms of unit changes
- The regression coefficient is how much the DV changes when the IV is increased by one unit.
- I measure stress using a questionnaire (IV) and anxiety using a questionnaire (DV) and my regression coefficient is 1.5
- This would mean that for each increase of 1 on the stress questionnaire, scores on the anxiety questionnaire go up by 1.5.
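The unit-change interpretation can be sketched with the hypothetical coefficient of 1.5 from the example (intercept value invented for illustration):

```python
# Hypothetical fitted model: predicted anxiety = intercept + 1.5 * stress
intercept, coef = 3.0, 1.5

def predicted_anxiety(stress):
    return intercept + coef * stress

# A 1-point increase on the stress questionnaire raises predicted anxiety by 1.5.
print(predicted_anxiety(11) - predicted_anxiety(10))  # 1.5
```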
4
Q
regression coefficients and standard error
A
- Regression coefficients always come with a standard error (SE)
- This is how precise your estimate (regression coefficient) is
- Big SE means it’s not precise
- Small SE means it’s precise
- A big regression coefficient and a small SE = significant effect.
- The p value is based on the ratio of the regression coefficient to its SE
- Rule of thumb: if the regression coefficient is at least twice its standard error, it will usually be statistically significant
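The rule of thumb is just the t ratio; a sketch with invented coefficient and SE values:

```python
# Hypothetical output: coefficient b = 1.5 with standard error SE = 0.6.
b, se = 1.5, 0.6

# The test statistic is the ratio of the coefficient to its SE.
t = b / se
print(round(t, 2))  # 2.5

# Rule of thumb: |t| greater than ~2 is significant at p < .05
# provided the sample is reasonably large.
print(abs(t) > 2)  # True
```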
5
Q
standardised regression coefficients
A
- β (beta) values
- You cannot directly compare unstandardised regression coefficients and conclude that a bigger coefficient means a bigger effect
- This is because they are expressed in unit changes and the IVs are likely to be measured in different ways
○ DV – Anxiety
○ IV 1- whether you have done a regression class (1) or not (0)
○ IV 2 – Stress on a questionnaire scored from 0-50.
- Using unstandardised regression coefficients (B), it is difficult to compare the effects of IV1 and IV2 on the DV. We can use standardised regression coefficients (β) instead.
- The standardised regression coefficient is interpreted in SD units: for every one SD change in the IV, the DV changes by the number of SDs the coefficient indicates
○ β =0.5 means for every one standard deviation increase in the IV the DV increases by 0.5 standard deviations
- β =-0.2 means for every one standard deviation increase in the IV the DV decreases by 0.2 standard deviations
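A hypothetical sketch of standardising a coefficient by hand: β = B × (SD of IV / SD of DV). The data are invented; note that with a single predictor, β equals the Pearson correlation:

```python
import statistics

# Hypothetical data: stress (0-50 questionnaire) predicting anxiety.
x = [10, 20, 30, 40, 50]
y = [15, 20, 28, 30, 37]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Unstandardised coefficient B (least-squares slope).
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)

# Standardise: beta = B * (SD of IV / SD of DV), putting the effect in SD units.
beta = b * statistics.stdev(x) / statistics.stdev(y)
print(round(b, 2), round(beta, 3))  # 0.54 0.989
```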
6
Q
simple, multiple regression vs hierarchical regression
A
- Simple and multiple regression give us model fit and R squared which accounts for all the predictors in our model.
- Simple/Multiple regression is a simultaneous model
- Hierarchical models – there is some strategy (or specified hierarchy) which is dictated in advance by the purpose / logic of the research.
- Allows us to be more theory driven in our statistical analysis
- Allows us to adjust for variables
- Partitions our explained variance
○ Put variables into a regression model to see how much variance they explain, then add further variables to see if they explain additional variance
7
Q
steps
A
- Groups of variables that we wish to look at as a distinct set of predictors
- Often people have variables they wish to adjust for in step one e.g. age and gender.
- You may wish to put questionnaire measures in one step and behavioural measures in others
- Known predictors before hypothesised predictors
8
Q
R2 change
A
- Essentially this analysis will tell us the amount of variance that the first block predicts…
- ..and then the additional amount of variance subsequent blocks predict
- It is this concept of additional variance that matters, the variance block one has accounted for is ‘removed’ and the next block(s) can only predict residual variance (that not already accounted for by earlier blocks).
- Adding up the R2 changes gives the model R2 (the total variance explained by all the variables together).
- Note: there is no adjusted R2 change statistic
9
Q
F change
A
- This is simply an F test that enables you to ascertain if the additional amount of variance your block predicts is statistically significant.
- It is reported like any other F statistic, with two degrees of freedom (the number of predictors added in the block, and the residual df of the full model).
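R2 change and F change can be sketched from summary output alone; the figures below are invented for illustration (block 1 = age and gender, block 2 adds job satisfaction):

```python
# Hypothetical hierarchical regression output with n = 100 participants.
# Block 1: age, gender (2 predictors). Block 2 adds job satisfaction (1 more).
n = 100
r2_block1, r2_full = 0.10, 0.25
k_added, k_full = 1, 3

# Additional variance explained by block 2.
r2_change = r2_full - r2_block1

# F change tests whether that additional variance is significant.
# Degrees of freedom: (k_added, n - k_full - 1), here (1, 96).
f_change = (r2_change / k_added) / ((1 - r2_full) / (n - k_full - 1))
print(round(r2_change, 2), round(f_change, 1))  # 0.15 19.2
```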
10
Q
choosing blocks
A
- Groups of variables that we wish to look at as a distinct set of predictors
- Often people have variables they wish to control for in block one e.g. age and sex.
- You may wish to put questionnaire measures in one block and behavioural measures in another
- There are no rules about what goes in a block – blocks are chosen for a theoretical rationale.
- You are interested in the contribution of certain groups of variables to the explained variance so they go in blocks together.
- For example, you may hypothesise that after controlling for age, gender and stress, job satisfaction predicts a significant amount of variance in work absence
11
Q
terminology
A
- Cumulative: Refers to the R2 Change and F change values for the blocks of the model, tells you how much additional variance each block adds to the model and whether this additional variance is significant.
- Simultaneous: Refers to the coefficients for all the variables in the regression equation when they are all considered together
12
Q
Cooks distance
A
- Residuals= The difference between observed and predicted values. I.e., how much people differ from the regression slope.
- If any outcome variables have high residuals, they may distort the accuracy of a regression.
- Cook’s distance tells us how much predicted Y values will move on average if the data point is removed (a lot of change indicates a large amount of influence)
- As with VIF, there is no universally accepted cut-off. Common rules of thumb flag a case as influential if its Cook’s distance is:
○ greater than 1
○ greater than 4/n
○ greater than 4/(n – k – 1)
○ more than 3 times larger than the mean Cook’s distance
- K= number of independent variables
- N= number of participants
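A sketch of the leave-one-out idea behind Cook’s distance, using invented data with one deliberately influential case: refit the model without each point and measure how far all the predicted values shift, scaled by the model’s mean squared error.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical data; the last case is an influential outlier.
x = [1, 2, 3, 4, 5, 10]
y = [2, 4, 5, 4, 6, 30]

p = 2  # parameters estimated (intercept + slope)
a, b = fit_line(x, y)
pred_full = [a + b * xi for xi in x]
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred_full)) / (len(x) - p)

# Cook's distance: how much every predicted value shifts when case i is dropped.
cooks = []
for i in range(len(x)):
    a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    shift = sum((pf - (a_i + b_i * xi)) ** 2 for pf, xi in zip(pred_full, x))
    cooks.append(shift / (p * mse))

print([round(d, 2) for d in cooks])  # the outlier's distance dwarfs the rest
```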
13
Q
assumptions
A
- Normally distributed (ish) residuals for continuous outcome.
- Independent data
- Interval/ratio predictors
- Nominal predictors with two categories (dichotomous)
- No multicollinearity
- Careful of influencing cases
- Linearity
- Homogeneity of variance (homoscedasticity) – residuals should have similar spread across predicted values; its violation is heteroscedasticity
14
Q
linearity
A
- A regression analysis assumes that there should be a linear relationship between the IVs and DV.
- We can use a residuals vs fitted graph to check this.
- If the red line is not roughly straight, then there may be an issue.
15
Q
normality of residuals
A
- The residuals should be roughly normally distributed
- Not normally an issue with large samples
- Can assess this using a Q-Q plot.
- If the residuals follow the line, then this suggests normality