Week 3 Multiple Regression Flashcards
general multiple regression equation
Yi = (b0 + b1x1i + b2x2i + … + bnxni) + ei
b0 = the Y intercept (the predicted Y when all x scores equal zero)
The b's are all unstandardised regression coefficients.
x1, x2 etc. are the independent variables.
Note this is the unstandardised form of the equation.
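A minimal Python sketch of this prediction equation, with made-up coefficients and scores purely for illustration:

```python
# Sketch of the unstandardised prediction equation Yi = b0 + b1x1 + ... + bnxn
# (made-up coefficients and scores, purely for illustration).
import numpy as np

b0 = 2.0                          # intercept: predicted Y when all x's are 0
b = np.array([0.5, -1.2, 0.8])    # unstandardised b weights for x1, x2, x3
x = np.array([3.0, 1.0, 2.0])     # one person's scores on x1, x2, x3

y_hat = b0 + np.dot(b, x)         # predicted Y for this person
print(y_hat)                      # 2.0 + 1.5 - 1.2 + 1.6 = 3.9
```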
R2
In this instance it is technically multiple R2 = the proportion of variance in the DV predicted by using all of the IV's.
R2=SSM/SST
R is the multiple correlation coefficient; R2 is the coefficient of multiple determination.
It indicates how well the linear regression equation fits the data.
r2 = the coefficient of determination for a single predictor (simple regression).
research uses of multiple regression analysis
1.combined predictive utility
How much variability in the DV can we explain by knowing the scores on all of the predictor variables? Does knowing the predictor scores tell us anything meaningful about the DV, or is it as good as chance?
2.Importance of the IV’s.
Which variable contributes the most to the prediction of the DV?
Do we need all the IV's, or are a few just as good?
- Uniqueness: how much unique (non-overlapping) variance does each variable explain?
- Can we improve the prediction of a DV by adding one or more IV’s to the equation (sequential multiple regression)?
Standard multiple regression versus sequential multiple regression
In standard MR, all IV’s are entered in the equation simultaneously. In sequential MR, the IV’s are added to the equation in specific stages.
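A rough sketch of the difference, using simulated data and a small helper function (fit_r2, our own, not a library routine) to compare R2 when the IV's are entered all at once versus in stages:

```python
# Sketch contrasting standard MR (all IV's at once) with sequential MR
# (IV's added in stages), using simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.6 * x1 + 0.3 * x2 + rng.normal(size=n)

def fit_r2(y, *ivs):
    """Fit OLS with an intercept and return R2."""
    X = np.column_stack([np.ones(len(y))] + list(ivs))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid.var() / y.var()

# Standard MR: both IV's entered simultaneously
r2_full = fit_r2(y, x1, x2)

# Sequential MR: x1 entered at step 1, x2 added at step 2
r2_step1 = fit_r2(y, x1)
print(r2_step1, r2_full, r2_full - r2_step1)   # R2 change = x2's added contribution
```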
Evaluating predictive importance of the independent variables
To do this, we need the beta weights. Beta weight = the standardised coefficient. It is a b weight converted via z-scores into standardised form. It is used for comparing the relative IV contributions to the prediction of the DV.
E.g. if beta = .53, then a one standard deviation increase in the IV results in a .53 standard deviation increase in the predicted value of the DV.
As the IV increases by 1 standard deviation, the DV changes by beta × the standard deviation of the DV.
As beta values range from negative infinity to infinity, they cannot be meaningfully squared and CANNOT tell us exactly how much variance an IV uniquely contributes to the DV.
But if an IV's beta value is larger (in absolute terms) than another IV's, the former is contributing more to the prediction of the DV.
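A small sketch of the b-to-beta conversion (beta = b × SD of the IV ÷ SD of the DV), using simulated data where the IV's are on very different raw scales:

```python
# Sketch of converting b weights to beta weights: beta = b * SD(X) / SD(Y).
# The data are simulated, purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(scale=2.0, size=n)     # IV's on different raw scales
x2 = rng.normal(scale=10.0, size=n)
y = 1.5 * x1 + 0.2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]      # [b0, b1, b2] unstandardised

betas = b[1:] * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print(b[1:])   # raw b weights: not comparable (different units)
print(betas)   # beta weights: comparable relative contributions
```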
Significance testing for evaluating unique predictiveness
For individual IV predictors, the significance test evaluating the unique part of an IV's association with the DV is usually done via a t-test:
t = b weight / standard error of b.
The t-test can be applied using either the b weight or the beta weight. Because the t-test evaluates only the unique predictiveness, the full correlation should always be assessed as well to see whether there is any overlap. When reporting the results of the t-test, the values of r and sr2 should also be reported.
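A sketch of the t-test computed by hand on simulated data (in practice SPSS reports b, the standard error and t directly):

```python
# Sketch of t = b / SE(b) for each predictor, computed by hand with numpy.
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
df = n - X.shape[1]                          # N - k - 1
mse = resid @ resid / df
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))   # standard errors of b
t = b / se
print(t)      # compare against the t critical value with df = N - k - 1
```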
Multiple R
Multiple R (unsquared) ranges from 0 to 1. It is the correlation between the observed and predicted Y values. If R = 1, the observed Y values all lie exactly on the regression line (perfect prediction); if R = 0, there is no linear relationship.
The size of R alone does not tell us whether it is significant. To test the significance of R,
we need an F ratio, which tests the null hypothesis that no linear relationship exists between the independent and dependent variables.
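A sketch, on simulated data, of multiple R as the correlation between observed and predicted Y, and the F ratio computed from R2, k IV's and N cases:

```python
# Sketch: multiple R as the correlation between observed and predicted Y,
# and the F ratio testing whether R differs significantly from zero.
import numpy as np

rng = np.random.default_rng(3)
N, k = 120, 3
X = rng.normal(size=(N, k))
y = X @ np.array([0.4, 0.0, 0.3]) + rng.normal(size=N)

Xd = np.column_stack([np.ones(N), X])
b = np.linalg.lstsq(Xd, y, rcond=None)[0]
y_hat = Xd @ b

R = np.corrcoef(y, y_hat)[0, 1]              # multiple R (0 to 1)
R2 = R ** 2
F = (R2 / k) / ((1 - R2) / (N - k - 1))      # F with (k, N - k - 1) df
print(R, F)
```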
SST
SST=SSM+SSR
SST = Σ(Y − Ȳ)²
SSM = Σ(Ŷ − Ȳ)²
SSR = Σ(Y − Ŷ)²
(Ȳ = mean of Y, Ŷ = predicted Y)
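A quick sketch verifying the decomposition SST = SSM + SSR (and R2 = SSM/SST) on simulated data:

```python
# Sketch verifying SST = SSM + SSR and R2 = SSM / SST on simulated data.
import numpy as np

rng = np.random.default_rng(4)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.7, -0.4]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
y_bar = y.mean()

SST = np.sum((y - y_bar) ** 2)
SSM = np.sum((y_hat - y_bar) ** 2)
SSR = np.sum((y - y_hat) ** 2)
print(np.isclose(SST, SSM + SSR))   # True
print(SSM / SST)                    # R2
```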
Tolerance and R2A
E.g. we have IV's A, B and C, and DV Y.
R2 tells us how much variance in Y is explained by the combination of A, B and C.
To calculate the tolerance of variable A, run a new multiple regression in which A becomes the DV, with B and C remaining as IV's. The new R2 is R2A, and 1 − R2A = the tolerance of A. This can be repeated for the other IV's.
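A sketch of this tolerance calculation for IV A, using simulated data in which A deliberately overlaps with B and C:

```python
# Sketch of tolerance for IV A: regress A on the other IV's (B, C)
# and take 1 - R2_A. Data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n = 150
B = rng.normal(size=n)
C = rng.normal(size=n)
A = 0.8 * B + 0.3 * C + rng.normal(scale=0.5, size=n)   # A overlaps with B and C

X = np.column_stack([np.ones(n), B, C])
coef = np.linalg.lstsq(X, A, rcond=None)[0]
resid = A - X @ coef
R2_A = 1 - np.sum(resid ** 2) / np.sum((A - A.mean()) ** 2)

tolerance_A = 1 - R2_A     # low tolerance = A is largely redundant with B and C
print(R2_A, tolerance_A)
```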
Adjusted R2
Adjusted R2 is a modified measure of R2. It is necessary to adjust R2 because of a couple of inherent biases:
1.R2 tends to be erroneously inflated when we have a small sample size. As the sample size increases, we become more confident in R2’s accuracy.
and
2. R2 tends to be erroneously inflated with a large number of IV's. (R2 will increase even if we add more IV's to the model which have no correlation at all with the DV!)
Adjusted R2 = 1 − (1 − R2)(N − 1)/(N − k − 1), where N = sample size and k = number of IV's.
If R2 is very close to adjusted R2, it is OK to make a judgement using it, BUT if there is a big difference, R2 is likely to be misleading, in which case BOTH should be reported.
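A tiny sketch of the adjusted R2 formula, with made-up values of R2, N and k to show how the adjustment matters less as the sample grows:

```python
# Sketch of the adjusted R2 formula, with made-up values of R2, N and k.
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2)(N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.40, n=30, k=5))    # small N, many IV's: noticeable drop
print(adjusted_r2(r2=0.40, n=300, k=5))   # large N: barely changes
```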
dummy variable coding
Multiple regression is designed primarily for continuous data but can handle discrete data if it is converted into dichotomous variables (note the DV must be continuous). If a nominal variable has more than 2 categories, e.g. Muslim, Christian, Jewish, Other, it MUST first be broken down via dummy variable coding.
If there are c categories, we must create c − 1 new variables
(df=categories-1)
eg Muslim vs non Muslim
Jewish vs non Jewish
Christian vs non Christian
(note Other from the original coding is the same as non-Muslim, non-Jewish and non-Christian.)
E.g. if originally Other = 0, Muslim = 1, Jewish = 2 and Christian = 3, then in the dummy-coded version each new variable is coded 0 = no and 1 = yes.
Dummy coding eg:
Original   Muslim   Jewish   Christian
0          0        0        0
1          1        0        0
2          0        1        0
3          0        0        1
3          0        0        1
2          0        1        0        etc.
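A sketch of this dummy coding in pandas (column names are simply the category labels; "Other" is dropped so it serves as the reference category):

```python
# Sketch of dummy coding a 4-category religion variable into c - 1 = 3
# dichotomous (0/1) variables, using pandas.
import pandas as pd

df = pd.DataFrame({"religion": ["Other", "Muslim", "Jewish", "Christian",
                                "Christian", "Jewish"]})

# Drop "Other" explicitly so it becomes the reference category (all zeros).
dummies = pd.get_dummies(df["religion"]).drop(columns="Other").astype(int)
print(dummies)   # columns: Christian, Jewish, Muslim, each coded 0 = no, 1 = yes
```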
rules for sample size in multiple regression
There are 2 general rules of thumb recommended for determining the appropriate sample size in multiple regression:
- when considering the overall multiple correlation;
N ≥ 50 + 8m
m=number of IV’s in the model.
i.e. with 10 predictors, we would require a sample size of 50 + 80 = 130, or more.
- When considering the predictive influence of individual IV’s;
N ≥ 104 + m. With 10 predictors, we need 114 or more cases.
These are general rules of thumb, based on the idea that the IV’s are moderately correlated with the DV. If the correlation is much larger, arguably one could have a slightly smaller sample size, and if the correlations are much smaller, arguably one would need a much larger sample size.
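A trivial sketch of the two rules of thumb as functions (m = number of IV's):

```python
# Sketch of the two rule-of-thumb sample size checks (m = number of IV's).
def min_n_overall(m):
    """Minimum N for testing the overall multiple correlation: 50 + 8m."""
    return 50 + 8 * m

def min_n_individual(m):
    """Minimum N for testing individual predictors: 104 + m."""
    return 104 + m

m = 10
print(min_n_overall(m), min_n_individual(m))   # 130 114
```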
multicollinearity and singularity
Multicollinearity exists where 2 or more IV’s are highly linearly related to each other.
When considering whether we have multicollinearity:
A correlation > .90 between IV's = multicollinearity.
A correlation of 1 = singularity.
A correlation of .70 could still be a strong relationship with a similar informational yield.
Because SPSS finds it difficult to measure each IV's unique contribution if there is strong multicollinearity, SPSS assesses whether or not there is a potential issue by giving values of:
1. Tolerance
and
2. Variance inflation factor (VIF)
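A sketch computing tolerance and VIF (= 1/tolerance) for each IV by hand, on simulated data where two IV's are nearly redundant (the helper function is our own, not SPSS output):

```python
# Sketch of a multicollinearity check: tolerance and VIF (= 1 / tolerance)
# for each IV, computed by regressing that IV on all the others.
import numpy as np

def tolerance_and_vif(X):
    """X: n x k matrix of IV scores. Returns (tolerance, VIF) per IV."""
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ coef
        r2_j = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        tol[j] = 1 - r2_j
    return tol, 1 / tol

rng = np.random.default_rng(6)
A = rng.normal(size=200)
B = A + rng.normal(scale=0.1, size=200)     # nearly redundant with A
C = rng.normal(size=200)
tol, vif = tolerance_and_vif(np.column_stack([A, B, C]))
print(tol, vif)   # low tolerance / high VIF flags A and B
```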