Week 3 Multiple Regression Flashcards

1
Q

general multiple regression equation

A

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + ei

b0 = the Y intercept (the predicted Y when all X scores equal zero)

the b's are all unstandardised regression coefficients

X1, X2, etc. are the independent variables

ei = the error (residual) term for case i

Note this equation is unstandardised.

2
Q

R2

A

in this instance, is technically multiple R2 = the amount of variance in the DV predicted by using all of the IV's together.

R2=SSM/SST

R (unsquared) is the multiple correlation coefficient; R2, the squared multiple correlation, determines how well the linear regression equation fits the data.

r2 = the coefficient of determination for a single predictor.
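As a minimal sketch (made-up data; the variable names are illustrative), R2 = SSM/SST can be computed by hand from a least-squares fit:

```python
import numpy as np

# Made-up data: two IV's predicting one DV.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=50)

# Design matrix with a leading column of ones for the intercept (b0).
X = np.column_stack([np.ones(50), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b  # predicted Y values

ss_total = np.sum((y - y.mean()) ** 2)      # SST
ss_model = np.sum((y_hat - y.mean()) ** 2)  # SSM
r_squared = ss_model / ss_total             # R2 = SSM / SST
```

Because the model includes an intercept, R2 here is guaranteed to fall between 0 and 1.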

3
Q

research uses of multiple regression analysis

A

1. Combined predictive utility.

How much variability in the DV can we explain by knowing the scores on all the predictor variables? Does knowing the predictor scores tell us anything meaningful about the DV, or is prediction no better than chance?

2. Importance of the IV's.

Which variable contributes the most to the prediction of the DV?

Do we need all the IV's, or are a few just as good?

3. Uniqueness. How much unique (non-overlapping) variance does each variable explain?

4. Can we improve the prediction of a DV by adding one or more IV's to the equation (sequential multiple regression)?
4
Q

Standard multiple regression versus sequential multiple regression

A

In standard MR, all IV’s are entered in the equation simultaneously. In sequential MR, the IV’s are added to the equation in specific stages.

5
Q

Evaluating predictive importance of the independent variables

A

To do this, we need the beta weights. Beta weight = the standardised coefficient: a b weight converted into standardised (z-score) form. It is used for comparing the relative IV contributions to the prediction of the DV.

E.g. if beta = .53, then a one standard deviation increase in the IV results in a .53 standard deviation increase in the predicted value of the DV.

In raw units: as the IV increases by 1 standard deviation, the DV changes by beta × the standard deviation of the DV.

As beta values can range from negative infinity to infinity, they cannot be meaningfully squared and CANNOT tell us exactly how much variance an IV uniquely contributes to the DV.

But if one IV's beta value is larger (in absolute magnitude) than another IV's, the former is contributing more to the prediction of the DV.
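The b-to-beta conversion can be sketched for the simple one-predictor case (made-up data); the point is that beta equals the slope you would get after z-scoring both variables:

```python
import numpy as np

# Made-up one-predictor example: beta = b * (SD of X / SD of Y).
rng = np.random.default_rng(3)
x = rng.normal(loc=10, scale=4, size=80)
y = 5 + 0.6 * x + rng.normal(size=80)

b = np.polyfit(x, y, 1)[0]                  # unstandardised slope
beta = b * (x.std(ddof=1) / y.std(ddof=1))  # standardised coefficient

# beta equals the slope obtained after z-scoring both variables.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_z = np.polyfit(zx, zy, 1)[0]
```

With a single predictor, beta also equals the Pearson correlation, so it stays within -1 to 1; with multiple correlated IV's it can fall outside that range.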

6
Q

Significance testing for evaluating unique predictiveness

A

For individual IV predictors, the significance test evaluating the unique part of an IV's association with a DV is usually done via a t-test.

t = b weight / standard error of the b weight.

The t-test can be applied using either the b weight or the beta weight. Because the t-test only evaluates the unique predictiveness, the full correlation should always be assessed to see if there is any overlap. When reporting the results of the t-test, the values of r and sr2 must be reported also.

7
Q

Multiple R

A

Multiple R (unsquared) ranges from 0 to 1. It is the correlation between the observed and predicted Y values. If R = 1, all predicted Y values fall exactly on the regression line; if R = 0, there is no linear relationship.

R itself does not come with a significance test attached. To test the significance of R,

we need an F ratio, which tests the null hypothesis that no linear relationship exists between the independent and dependent variables.

8
Q

SST

A

SST = SSM + SSR

SST = Σ(Yi − Ȳ)²   (total sum of squares)

SSM = Σ(Ŷi − Ȳ)²   (model sum of squares)

SSR = Σ(Yi − Ŷi)²   (residual sum of squares)
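A tiny worked check of the decomposition, using made-up numbers and a one-predictor fit:

```python
import numpy as np

# Minimal sketch verifying SST = SSM + SSR on a made-up dataset.
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b                       # predicted Y values

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
ssm = np.sum((y_hat - y.mean()) ** 2)  # model sum of squares
ssr = np.sum((y - y_hat) ** 2)      # residual sum of squares
```

The identity SST = SSM + SSR holds exactly whenever the model includes an intercept.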

11
Q

Tolerance and R2A

A

E.g. we have IV's A, B and C, and DV Y.

R2 tells us how much variance in Y is explained by the combination of A, B and C.

To calculate the tolerance of variable A, run a new multiple regression in which A becomes the DV, with B and C remaining as IV's. The new R2 is R2A, and 1 − R2A = the tolerance of A. This can be repeated for the other IV's.
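A sketch of that calculation (A, B and C are simulated here; the key step is regressing A on the remaining IV's):

```python
import numpy as np

# Simulated IV's: A overlaps with B, C is independent.
rng = np.random.default_rng(1)
b_iv = rng.normal(size=100)
c_iv = rng.normal(size=100)
a_iv = 0.7 * b_iv + rng.normal(scale=0.5, size=100)

# Regress A on the other IV's (B and C); tolerance of A = 1 - R2A.
X = np.column_stack([np.ones(100), b_iv, c_iv])
coef, *_ = np.linalg.lstsq(X, a_iv, rcond=None)
a_hat = X @ coef
r2_a = (np.sum((a_hat - a_iv.mean()) ** 2)
        / np.sum((a_iv - a_iv.mean()) ** 2))
tolerance_a = 1.0 - r2_a
```

Because A was built to share variance with B, its tolerance comes out well below 1.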

12
Q

Adjusted R2

A

Adjusted R2 is a modified measure of R2. It is deemed necessary to adjust R2 because of a couple of inherent biases:

1. R2 tends to be erroneously inflated when we have a small sample size. As the sample size increases, we become more confident in R2's accuracy.

and

2. R2 tends to be erroneously inflated with a large number of IV's (R2 will increase even if we add IV's to the model that have no correlation at all with the DV!).

Adjusted R2 = 1 − (1 − R2)(N − 1)/(N − k − 1)

If R2 is very close to adjusted R2, it is OK to make a judgement using it, BUT if there is a big difference, R2 is likely to be misleading, in which case BOTH should be reported.
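The formula can be sketched directly; the N and k values below are illustrative and show both biases at work:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2)(N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Large N, few IV's: adjustment is tiny (adjusted R2 stays near .50).
small_penalty = adjusted_r2(0.50, 200, 3)

# Small N, many IV's: the same R2 = .50 shrinks substantially.
big_penalty = adjusted_r2(0.50, 30, 10)
```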

13
Q

dummy variable coding

A

Multiple regression is designed primarily for continuous data, but it can handle discrete data if converted into dichotomous variables (note: the DV must be continuous). If a nominal variable has more than 2 categories, e.g. Muslim, Christian, Jewish, Other, it MUST first be broken down via dummy variable coding.

If we have c categories, we must create c − 1 new variables

(df = categories − 1)

e.g. Muslim vs non-Muslim

Jewish vs non-Jewish

Christian vs non-Christian

(note: 'Other' from the original coding is the same as non-Muslim, non-Jewish and non-Christian.)

E.g. if originally Other = 0, Muslim = 1, Jewish = 2 and Christian = 3, then in the dummy-coded version each new variable is scored 0 = no and 1 = yes.

Dummy coding e.g.:

Original  Muslim  Jewish  Christian
0         0       0       0
1         1       0       0
2         0       1       0
3         0       0       1
3         0       0       1
2         0       1       0   etc.
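A sketch of the same coding scheme in code (category codes follow the example above; the function name is made up):

```python
# c categories -> c - 1 dummy variables; 'Other' (code 0) is the
# reference group, scored 0 on every dummy variable.
CATEGORY_CODES = {1: "muslim", 2: "jewish", 3: "christian"}

def dummy_code(value):
    """Return (Muslim, Jewish, Christian) 0/1 indicators for one case."""
    return tuple(1 if value == code else 0 for code in CATEGORY_CODES)

# Reproduces the table above: original codes 0, 1, 2, 3, 3, 2.
rows = [dummy_code(v) for v in [0, 1, 2, 3, 3, 2]]
```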

14
Q

rules for sample size in multiple regression

A

There are 2 general rules of thumb recommended for determining the appropriate sample size in multiple regression:

1. When considering the overall multiple correlation:

N ≥ 50 + 8m

m = number of IV's in the model.

I.e. with 10 predictors, we would require a sample size of 50 + 80 = 130, or more.

2. When considering the predictive influence of individual IV's:

N ≥ 104 + m. If we have 10 predictors, we need 114 or more cases.

These are general rules of thumb, based on the assumption that the IV's are moderately correlated with the DV. If the correlations are much larger, arguably one could use a slightly smaller sample size, and if the correlations are much smaller, arguably one would need a much larger sample size.
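Both rules of thumb are simple arithmetic; a sketch that checks each and takes the stricter of the two:

```python
def n_overall(m):
    """Rule of thumb for testing the overall multiple R: N >= 50 + 8m."""
    return 50 + 8 * m

def n_individual(m):
    """Rule of thumb for testing individual predictors: N >= 104 + m."""
    return 104 + m

m = 10  # number of IV's
required = max(n_overall(m), n_individual(m))  # satisfy both rules
```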

15
Q

multicollinearity and singularity

A

Multicollinearity exists where 2 or more IV’s are highly linearly related to each other.

When considering whether we have multicollinearity (based on the correlation between IV's):

> .90 = multicollinearity

1 = singularity

.70 could already be a strong relationship with a similar informational yield.

Because SPSS finds it difficult to measure each IV's unique contribution if there is strong multicollinearity, SPSS assesses whether or not there is a potential issue by giving values of:

1. tolerance

and

2. variance inflation factor
16
Q

types of multiple regression

A
  1. standard
  2. sequential
17
Q

standard multiple regression

A

All IV's enter the equation simultaneously.

We are interested in:

a) the overall multiple correlation (R)

and

b) the unique part of each IV's association with the DV (sr2), which tells us how much R2 would be reduced if that IV were removed from the equation.

18
Q

sequential multiple regression

A

Also called hierarchical multiple regression. IV's are entered into the model in a hierarchy of blocks/steps/stages. The order of IV entry is determined by the research question and design. The factor we are most interested in is entered last.

19
Q

semi partial correlation (sr)

A

sr measures the unique relationship between an IV and the DV after removing the overlap between that IV and the other IV's. The other IV's are partialled out of the IV only, not out of the DV (partialling them out of the DV as well gives the partial correlation instead).

sr ranges from -1 to 1.

sr in SPSS is called 'part'.

20
Q

squared semi partial correlation (sr2)

A

sr2 is the amount of unique variance an IV explains in the DV.

21
Q

R2 change

A

In sequential multiple regression, R2 change tells us how much is added to R2 by entering an additional IV or collection of IV's. This is the unique contribution of that IV (or group of IV's) at that point of entry.

An F test is done to assess the significance of the change in R2.

Fchange(kstep, N − ktotal − 1), where kstep = number of IV's added at that step, N = number of cases, ktotal = total number of IV's entered up to that point.

E.g. N = 50, 2-stage sequential multiple regression:

Step 1: 2 IV's added.

Step 2: 3 IV's added. To evaluate R2 change at step 2:

Fchange(3, 44)  (50 − 5 − 1 = 44)
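Assuming the standard F-change formula (the R2 values below are made up; the degrees of freedom match the example on this card):

```python
def f_change(r2_full, r2_reduced, n, k_total, k_added):
    """F for the R2 increment when k_added IV's enter the equation.

    Standard F-change formula:
    F = (delta R2 / k_added) / ((1 - R2_full) / (N - k_total - 1)).
    """
    df1 = k_added
    df2 = n - k_total - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2

# N = 50, 2 IV's at step 1, 3 more at step 2 -> Fchange(3, 44).
f, df1, df2 = f_change(r2_full=0.40, r2_reduced=0.25,
                       n=50, k_total=5, k_added=3)
```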

22
Q

significance testing for multiple regression

A

The test of R is conducted to test the significance of the multiple regression. It tests the null hypothesis that no linear relationship exists between the dependent and independent variables. This is assessed via an F ratio from the ANOVA table.

F = MSM/MSR

MSM = SSM/dfM and MSR = SSR/dfR

F has (k, N − k − 1) degrees of freedom, where k = number of IV's = dfM

and N − k − 1 = dfR
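A sketch of the overall F ratio on simulated data (coefficients and sample size are made up):

```python
import numpy as np

# Simulated data: y depends on x1; x2 is an unrelated IV.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=60), rng.normal(size=60)
y = 1.0 + 0.9 * x1 + rng.normal(size=60)

X = np.column_stack([np.ones(60), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

k, n = 2, 60                              # k IV's, N cases
ssm = np.sum((y_hat - y.mean()) ** 2)     # SSM
ssr = np.sum((y - y_hat) ** 2)            # SSR
f_ratio = (ssm / k) / (ssr / (n - k - 1)) # F = MSM/MSR, df = (k, N-k-1)
```

The observed F would be compared against the F distribution with (2, 57) degrees of freedom.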

23
Q

tolerance

A

SPSS gives a tolerance value based on how correlated an IV is with the other IV's.

Any tolerance value < .3 is problematic: it means that the results for an individual IV's contribution will be biased and we cannot be confident in the obtained values. Consider removing the IV from the equation and re-checking.

24
Q

variance inflation factor score

A

The variance inflation factor (VIF) is another measure of collinearity.

It is a ratio of the overall model variance to the variance if only the single IV is included.

Any VIF score > 3 is problematic; consider removing the IV from the equation and re-running.
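VIF is commonly computed as the reciprocal of tolerance, which links these two cards; a minimal sketch:

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance: VIF = 1 / (1 - R2_i)."""
    return 1.0 / tolerance

# Tolerance .25 (i.e. R2_i = .75) gives VIF = 4,
# above this deck's cutoff of 3.
v = vif_from_tolerance(0.25)
```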

25
Q

How many IV’s?

A

Ideally we want as few IV's as possible while still predicting as much variance in the DV as possible. 40 IV's explaining 40% of the variance is not actually better than 4 IV's predicting 35%: the additional benefit is unlikely to be worth the mountain of extra paperwork/survey scores etc.

26
Q

suppressor variable

A

Occasionally we have an IV with a small sr2 value yet a large beta weight. This is a suppressor variable. It doesn't directly account for much prediction in the DV, but it makes other IV's in the model better predictors.