Multiple Regression Flashcards

• demonstrate an understanding of the similarities and differences between Simple and Multiple regression
• demonstrate an understanding of the key statistical elements of Forced Entry Multiple Regression
• demonstrate an understanding of the key statistical elements of Hierarchical Multiple Regression
• complete and interpret Multiple Regression analyses on SPSS

1
Q

what is the basis of multiple regression?

A

one outcome, multiple predictors
-> multiple variables (predictors) predict one outcome

2
Q

what is R-squared?

A
  • amount of variance explained by the regression/model
  • correlation coefficient squared
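
In formula terms (standard definition, consistent with both bullets above):

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \]
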
3
Q

what is simple linear regression?

A
  • build a model to explain the variance using an equation with one predictor
  • test how well the variability of the scores is explained by the model (R^2)
  • significance of F: is the variance explained significant (i.e. not zero)?
  • b1: slope, b0: intercept (constant)
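
For reference, the simple regression model in equation form (standard notation matching b0 and b1 above):

\[ Y_i = b_0 + b_1 X_i + \varepsilon_i \]
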
4
Q

how is multiple regression similar to simple regression?

A
  • builds a model to explain the variance using a linear equation
  • test how well the variability of the scores is explained by the model
  • R^2: how much of the variance is explained by our model
  • significance of F: is the variance explained significant (not zero)
  • the usual assumptions, including homoscedasticity and normally distributed residuals, apply
5
Q

BUT what is new for multiple regression?

A
  • using an equation with more than one predictor
  • examine how much each predictor contributes to predicting the variability of the outcome measure (forced entry and hierarchical regression)
  • compare different models predicting the same outcome (hierarchical regression) and see which model predicts most of the variance
6
Q

R^2

A

tells us the estimate for our sample
-> will naturally overestimate the ‘real’ R^2 (in the population)

7
Q

Adjusted R^2

A

estimate for the population (probably a more accurate measure) -> more likely to be accurate because it takes sample size into account
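
A common form of the adjustment (Wherry's formula, with n = sample size and k = number of predictors):

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \]
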

8
Q

why is R^2 adjusted?

A
  • adjusted down to allow for the overestimation of R^2
    -> better reflection of the ‘real’ R^2
9
Q

what does the adjustment relate to?

A

sample size
-> generally the bigger the sample size, the less need for adjustment

10
Q

should you report R^2 or adjusted?

A

report both
-> for simple regression as well

11
Q

what is the F ratio?

A
  • we can test if our model accounts for a significant amount of the variance as we did before
  • it is based on the variance predicted by the model with all predictors included (the ratio of explained to unexplained variance)
12
Q

In multiple regression, a significant R squared tells us…

A
  • our model accounts for a significant amount of variance in the outcome
    -> the ratio of explained to unexplained variance is high
13
Q

Unlike multiple regression, in simple regression

A

You know which variable predicts the outcome from the R-squared, because there is only one predictor

14
Q

The return of the B (characterises the relationship of a predictor)

A
  • we get individual b’s for each of our predictors
  • the b’s relate to each other, because the other variables/predictors are taken into consideration as controls
15
Q

what does B do?

A
  • estimate of contribution while ‘controlling’ for other variables
  • we have an estimate of how much each variable contributes on its own with the others held constant -> similar to partial correlation
  • estimate of the individual contribution of each predictor
16
Q

Multiple Regression

A

how much variance does the overall model, with all of its predictors, account for?

17
Q

components in multiple regression

A
  • b0
  • more than one predictor, i.e. b1(x1) [regression coefficient for predictor 1] + b2(x2) [regression coefficient for predictor 2] + … + bn(xn) [regression coefficient for the nth predictor]
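
Putting the components together (standard notation):

\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n \]
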
18
Q

what is the issue with normal b’s?

A

affected by the distributions and type of score
-> you can use them in an equation, but you can’t compare them, especially if they come from different measures and scales

19
Q

what is the solution to the B issue?

A

standardise b (turn it into a beta weight, β) -> by expressing the slope in standard deviation units
-> a standardised score is simply the number of standard deviations above or below the mean
-> you can then compare how much each predictor is contributing to the prediction
-> standardising b allows us to compare the contribution of each variable to the outcome in terms of standard deviations
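
In formula terms (a standard result, not from the original card), the standardised beta rescales b by the ratio of standard deviations of predictor j and the outcome:

\[ \beta_j = b_j \times \frac{s_{X_j}}{s_Y} \]
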

20
Q

what does b1 = 0.594 mean if it is a standardised beta weight?

A

as the predictor increases by one SD, the outcome increases by 0.594 of a standard deviation

-> a slope we can compare across different predictors
-> beta tells us about the contribution of each individual predictor to the model - and usually the betas are quite variable

21
Q

how can we test whether each predictor is significantly different from zero?

A

a t-test
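
Specifically (standard formula): each coefficient is divided by its standard error and compared against a t distribution with n - k - 1 degrees of freedom:

\[ t = \frac{b}{SE_b} \]
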

22
Q

what is the output of a multiple regression?

A
  • each variable has an unstandardised (b) and standardised coefficient (beta or β)
  • t value derived from b
    -> the associated p-value tells you if the coefficient estimate is significantly different from zero (i.e. whether that variable is a significant predictor)
23
Q

what does the unstandardised value allow you to do?

A

be used within any equation

24
Q

what does the standardised value allow us to do?

A

make comparisons across the predictors

25
Q

what does negative β indicate?

A

negative relationship (even if it’s not significant)

26
Q

why can we not trust correlations?

A

estimate of the two variable relationship without other variables taken into consideration

27
Q

how do you report your interpretations?

A
  • Extraversion: b = 1.40, β = .594, t = 6.95, p < .001 (extraversion is a significant predictor of wellbeing)
  • Agreeableness: b = -0.48, β = -.018, t = -.222, p = .83 (agreeableness is not a significant predictor of wellbeing)

AND so on…
* telling us about the individual contribution of each predictor in the model

28
Q

what are b and β better at:

A

estimates of the contributions of individual predictors

29
Q

why can’t you trust correlations?

A

they are just estimates of a two-variable relationship without the other variables taken into consideration
-> they are uncontrolled, as it were (just an estimate of a two-variable relationship)
-> always do a regression, because correlations won’t give you the answer that you need

30
Q

We are looking at how IQ, Age and working memory predict reading scores. This means:

A

We are looking for 3 beta weights from the analysis, and the df for SSreg in the overall ANOVA is 3.

31
Q

Reading Score = 22 + (.03)IQ + (.06)Age + (-0.3)WM. What is the predicted reading score for a 19-year-old with an IQ of 110 and a WM score of 30?

A

17.4
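
Working, plugging the given values into the equation:

\[ 22 + (.03)(110) + (.06)(19) + (-0.3)(30) = 22 + 3.3 + 1.14 - 9 = 17.44 \approx 17.4 \]
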

32
Q

A beta of -0.58 means what?

A

for every SD our variable increases, Y decreases by .58 of an SD

33
Q

What is Hierarchical Regression?

A

When we need to control for a third/important variable (i.e. controlling for age while seeing if personality predicts wellbeing)

34
Q

How do we conduct Hierarchical Regression?

A
  • add the variables into the equation in steps:
    1. add the control variable(s) in first -> making sure that you account for any variance that might be explained by these; examine R^2 and its significance
    2. add the variables we are interested in -> the model still includes the ones you want to control for; run the analysis again (see the sketch below)

Giving you 2 models
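
A minimal sketch of these two steps in Python with statsmodels (the deck itself uses SPSS; the data and variable names here are illustrative assumptions, not from the cards):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: wellbeing as the outcome, age as the control,
# extraversion as the predictor of interest
df = pd.DataFrame({
    'wellbeing':    [4, 7, 6, 8, 5, 9, 6, 7, 5, 8],
    'age':          [21, 34, 28, 45, 23, 52, 31, 38, 26, 41],
    'extraversion': [2, 5, 4, 6, 3, 7, 4, 5, 3, 6],
})

# Model 1: control variable only
X1 = sm.add_constant(df[['age']])
model1 = sm.OLS(df['wellbeing'], X1).fit()

# Model 2: control variable plus the predictor of interest
X2 = sm.add_constant(df[['age', 'extraversion']])
model2 = sm.OLS(df['wellbeing'], X2).fit()

# F-change test: does Model 2 explain significantly more variance?
f_change, p_value, df_diff = model2.compare_f_test(model1)
print(f"R^2 change: {model2.rsquared - model1.rsquared:.3f}")
print(f"F change = {f_change:.2f}, p = {p_value:.3f}")
```
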

35
Q

What are the two models and what are we looking for?

A
  • Age on its own (Model 1)
  • Age and Personality Traits (Model 2)

We are interested in the change in predictive power from Model 1 to Model 2 -> want to see what changes from step 1 to step 2
-> we’re looking at the predictive power of Model 2 and seeing whether it is significantly better than the predictive power of Model 1

36
Q

How can we compare both models?

A

using F ratio changes

  1. enter the first set of variables into the analysis for the first model -> get the R^2 and F ratio telling us how much variance is accounted for by the model
  2. next batch of variables are added in a second model
    -> R^2 and F ratio telling us how much variance is accounted by the model (which includes both sets of predictors)
  3. want to compare the models
    * we make a call based on the F ratio and whether there is a significant change -> i.e. the F change compares the models and tells us whether there is a significant improvement in variance explained in Model 2 (the model that explains significantly more variance)
    * We can see if more variance is explained in the second model compared to the first model
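
The F-change test has a standard formula (not from the original card): with R^2_1 and R^2_2 for Models 1 and 2, k_1 and k_2 predictors respectively, and n cases,

\[ F_{\text{change}} = \frac{(R^2_2 - R^2_1)/(k_2 - k_1)}{(1 - R^2_2)/(n - k_2 - 1)} \]
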
37
Q

ANOVA tables for hierarchical regression

A
  • each model has a separate ANOVA table which tells us whether the variance explained is different from zero for each separate model
  • it does not compare the models; it is what you report for each individual model (whether the amount of variance accounted for differs from zero)
38
Q

An example of a Hierarchical Regression

A

“A hierarchical regression was carried out with Age in step 1, and mean scores of the different personality scales; Extraversion, Agreeableness, Openness, Conscientiousness, and Neuroticism in step 2. A significant model was found at step 1, F(1,84) = 6.57, p <.05 and explained 7.3% of the variance. The inclusion of the five personality traits significantly increased the amount of variance explained to 52.6% (p<.001) and was a significant model, F(6,79)=14.61, p <.001…”
* Model 1 and Model 2 are both significant
* Age predicts wellbeing
* Age and the personality scales (as a group) predict wellbeing
* Model 2 accounts for significantly more variance
* The addition of personality improves our ability to explain the variance

39
Q

what coefficients should you focus on?

A
  • if Model 2 is significant, focus your interpretation on it
  • when a predictor predicts the outcome in Model 1 but stops doing so in Model 2, that is something to focus on
40
Q

what are dummy variables?

A
  • you can introduce categorical variables into regression using dummy coding (0’s and 1’s)
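
A minimal sketch of dummy coding in Python with pandas (the 'gender' column and the 0/1 codes are illustrative assumptions; the deck itself uses SPSS):

```python
import pandas as pd

# Hypothetical data with a categorical predictor and a continuous outcome
df = pd.DataFrame({'gender': ['male', 'female', 'female', 'male'],
                   'wellbeing': [7, 5, 6, 8]})

# Code female = 0 (reference category), male = 1
df['gender_dummy'] = (df['gender'] == 'male').astype(int)
print(df)
```
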
41
Q

cases where you’d use dummy variables

A
  • binary gender categories or experimental conditions
42
Q

what about the outcome numbers in dummy variables?

A

the outcome variable has to be continuous, but your predictors do not have to be (they can be 0’s and 1’s, allowing us to code for 2 different categories in our analysis -> we can still get a score which will help predict the outcome)

43
Q

If the b/beta value is positive

A

the category coded as ‘1’ is higher on the outcome variable than the category coded as ‘0’ (i.e. males are scoring higher than females)

44
Q

if the b/beta value is negative

A

the category coded as ‘0’ is higher on the outcome variable (i.e. females are scoring higher than males)

45
Q

what is another word for multiple regression?

A

forced entry regression

46
Q

where can you find whether the F change is significant?

A

in the Model Summary table (under the Change Statistics columns, Sig. F Change) - the ANOVA table reports each model separately rather than comparing them

47
Q

Male is coded as 1 and Female is coded as 0 in this analysis. With a positive coefficient of 0.86, what would this mean?

A

males are scoring higher on this measure than females
* the positive coefficient means males score higher on Wellbeing (though in this example it is not significant)

48
Q

A b/beta value of -.45

A

would mean that year one students (0) are happier than year 2 students (1)

49
Q

what are the assumptions with multiple regression?

A
  • Variable Type: Outcome must be continuous (Predictors can be continuous or discrete e.g. dummy variables).
  • Non-Zero Variance: Predictors must not have zero variance.
  • Independence: All values of the outcome should come from a different person or item.
  • Linearity: The relationship we model is, in reality, linear
  • Homoscedasticity: For each value of the predictors the variance of the error term should be constant.
  • Normally-distributed Errors: The residuals must be normally distributed
50
Q

What is a type of bias we need to be cautious of?

A

Multicollinearity

51
Q

Multicollinearity

A
  • exists when predictors are highly correlated with each other
    -> look for medium-to-strong correlations between predictors
52
Q

what are some issues with multicollinearity?

A

it can undermine your findings:
* the b’s can be unstable (they vary across samples)
* it is difficult to say which predictor is important
* it artificially limits R^2 -> when predictors are all correlated with each other, each additional predictor adds little unique explained variance

53
Q

how can multicollinearity be checked?

A

collinearity diagnostics

54
Q

collinearity diagnostics

A
  • VIF (variance inflation factor) is a measure of each predictor’s relationship with the other predictors
  • you want it to be as low as possible (a low VIF tells you that your predictors are reasonably independent of one another)
  • anything close to 10 is an issue / problematic
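
In formula terms (standard definition, where R_j^2 is the R^2 from regressing predictor j on all the other predictors):

\[ \mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \text{Tolerance}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 \]
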
55
Q

how to calculate tolerance?

A

1 divided by the VIF
-> should be above 0.2

56
Q

which can affect our results

A

extreme outliers

57
Q

how can we check for extreme outliers?

A

standardised residuals

58
Q

issue with very high standardised residuals?

A

actual score is very different from predicted

59
Q

how do we check for outliers?

A
  • look/check for high standardised residuals (looking for the minimum and maximum)
  • only about 5% should be beyond ±2 SD (substantially more than that suggests outliers)
  • look in the Residuals Statistics table to see what the minimum and maximum residuals are
60
Q

how many residuals should be over 2SD

A

5%

61
Q

how can we measure the influence of outliers?

A

Cook’s distance

62
Q

Cook’s distance

A
  • measures the influence each case has on the model
    -> most if not all Cook’s distances should be below 1 (you want Cook’s distance to be as low as possible) - look at the maximum Cook’s distance in the Residuals Statistics table