Regression Flashcards

1
Q

What is Multiple Regression?

A

An extension of bivariate regression and correlation.

Allows us to predict scores of a DV from a set of IVs.

2
Q

Standard multiple regression:

A
  • All independent variables simultaneously enter the equation
  • Standard multiple regression is indicated when you simply intend to evaluate the interrelationships between variables and to obtain a multiple correlation.
3
Q

The Multiple Regression Equation:

A

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + εi

Where b0 is the Y-intercept (the Y value when all X scores equal 0) and the bs are the weightings or unstandardised regression coefficients by which the independent or X variables are multiplied.

As an example of how the equation works, imagine that we wished to predict the frequency of GP visits (GP VISITS) for chronic low back pain patients, using the independent variables: duration of the disorder (DURATION), age (AGE), and sex (SEX). Predicted GP VISITS are obtained as follows:

(GP VISITS) = b0 + b1(DURATION) + b2(AGE) + b3(SEX)

A patient’s DURATION, AGE, and SEX scores are multiplied by the corresponding regression coefficients. The resulting values are then added together with the intercept value (b0) to produce a predicted value for GP VISITS.
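As a sketch of how this works in practice (the coefficient values below are hypothetical, chosen purely for illustration; real values would come from fitting the model):

```python
# Hypothetical unstandardised regression coefficients (illustration only)
b0 = 2.0           # intercept: predicted GP visits when all IVs are 0
b_duration = 0.5   # change in visits per year of disorder duration
b_age = 0.05       # change in visits per year of age
b_sex = 1.0        # shift for sex (coded, e.g., 0 = male, 1 = female)

def predict_gp_visits(duration, age, sex):
    """Apply the multiple regression equation to one patient's scores."""
    return b0 + b_duration * duration + b_age * age + b_sex * sex

# A 40-year-old female patient with a 4-year history of low back pain:
print(predict_gp_visits(duration=4, age=40, sex=1))  # 2.0 + 2.0 + 2.0 + 1.0 = 7.0
```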

4
Q

Adjusted R2:

A
  • R2 is known to be artificially inflated by a large number of IVs in the model.
  • Adjusted R2 has other advantages: (1) it adjusts for potential bias due to small sample size (smaller sample sizes have a knack of obtaining larger effects than they should!), and (2) it gives a projected R2 value that you might expect from the population that the sample was drawn from.
5
Q

Considerations for Multiple Regression analysis:

A
  • Check the independent variables for multicollinearity and singularity by means of an intercorrelation matrix. Multicollinearity is present if two independent variables correlate at .90 or above (-.90 is also evidence of multicollinearity) and singularity is present if the correlation is perfect. In both circumstances, one variable is redundant because it carries virtually the same information as the other. Delete one variable from the analysis or combine the variables.
  • Multiple regression is based upon the General Linear Model and data should be screened to ensure that the model’s assumptions are met. In particular, ensure that variables are normally distributed and that relationships between variables are linear.
6
Q

Sequential multiple regression:

A

In sequential multiple regression, the researcher controls the entry order of the independent variables into the equation. Each independent variable’s predictive importance is based upon its non-overlapping (unique) contribution to the equation at the point of entry.

However, once additional IVs are entered, the contributions of the earlier entered IVs are corrected for overlap with these new IVs. So, it is possible to see a predictor as important in an early step, only to become non-significant in later steps due to redundancy with other IVs in the model.

7
Q

Name two research questions that can be answered by multiple regression analysis.

A

Any two of the following would be sufficient:

How much variance do the IVs account for (in combination) in the DV? [Consult R2 value]

What is the relative importance of each of the IVs in the model? [Compare beta weights]

Which IV contributes the most unique variance to prediction of the DV? [Check sr2]

How much improvement in the model occurs when we add an additional IV (or group of IVs)? [Check R2 change]

8
Q

Define singularity and explain how it should be dealt with.

A

Singularity occurs when two variables are perfectly correlated. It occurs when two variables measure exactly the same construct (e.g., height in metres vs. height in inches).

Two variables that are singular have completely overlapping variance. Therefore, it would be redundant to include both measures in a regression analysis. The best approach is to use only one of the variables as an IV and to exclude the other.

9
Q

Calculation of Sample Size

A

Field (2009) gives two rules of thumb for determining the required sample size, which assume a medium effect size, an alpha level of .05, and a beta level of .20 (the probability of a Type II error).

  • The first rule, N > 50 + 8m (m is the number of independent variables) is used for testing the significance of the multiple correlation coefficient (R); that is, it tests the overall model (combined predictiveness of multiple IVs).
  • The second rule, N > 104 + m, is used for testing individual predictors. For example, if you have 10 independent variables, you need 50 + (8)(10) = 130 subjects to test the multiple correlation and 104 + 10 = 114 subjects to test the individual predictors.
  • In cases where you are interested in both the overall correlation and the individual predictors, you need to calculate N in both ways and choose the larger value.
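The two rules (and the take-the-larger-value advice) can be sketched as a small helper:

```python
def required_n(m, test="both"):
    """Field's (2009) rules of thumb for multiple regression sample size.

    m    -- number of independent variables
    test -- "overall" (testing R), "individual" (testing predictors),
            or "both" (the larger of the two)
    """
    overall = 50 + 8 * m
    individual = 104 + m
    if test == "overall":
        return overall
    if test == "individual":
        return individual
    return max(overall, individual)

print(required_n(10, "overall"))     # 130
print(required_n(10, "individual"))  # 114
print(required_n(10))                # 130: interested in both, take the larger
```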
10
Q

Explain what the multiple correlation R represents.

A

Multiple correlation R literally represents the strength of the relationship between the DV and predicted scores on the DV.

However, as the predicted scores on the DV are calculated from scores on the IVs, multiple correlation R is also considered the strength of the relationship between the DV and the combined IVs.

11
Q

Describe the difference in entry of the independent variables between standard multiple regression and sequential multiple regression.

A

IVs are entered all at the same time in standard multiple regression.

In sequential multiple regression, IVs are entered in blocks (based on theory, prior research and current research questions).

At a bare minimum, sequential multiple regression has two steps (step 1: some IVs entered, step 2: remaining IVs are entered).

Standard regression always has one step (step 1: all IVs entered).

12
Q

Identify and explain the statistic that provides the unique variance explained by an IV for the DV in standard multiple regression.

A

sr (the semipartial correlation) represents the unique relationship between an IV and the DV. sr2 represents the unique variance that the IV contributes to prediction of the DV.
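One way to think of sr2 (a sketch with made-up R2 values): it equals the drop in model R2 when the IV is removed, or equivalently the R2 change if it were entered last:

```python
def unique_variance(r2_full, r2_without_iv):
    """sr2 for an IV: the drop in model R2 when that IV is removed
    (equivalently, the R2 change if it were entered last)."""
    return r2_full - r2_without_iv

# Made-up values: model R2 = .55 with the IV included, .48 without it,
# so the IV uniquely accounts for about 7% of DV variance
print(round(unique_variance(0.55, 0.48), 2))  # 0.07
```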

13
Q

Show the standardised regression equation for three independent variables, and label all terms.

A

The unstandardised regression equation for bivariate regression is Y = bX + a. This can be expanded for a situation with 3 IVs (i.e., a multiple regression context) as:

Y = b1X1 + b2X2 + b3X3 + a

Now, you’ll recall from the lecture that b-weights are unstandardised indicators of the relationship between an IV and the DV. The standardised indicator of this relationship is the Beta weight. So, our equation becomes:

Y = β1X1 + β2X2 + β3X3 + a

But, since a = 0 within the standardized regression equation, the equation is often simply written as:

Y = β1X1 + β2X2 + β3X3

This is why there is no Beta weight for the constant in the output (in jamovi, the Beta weights appear in the “Stand. Estimate” column).

14
Q

Under what conditions do the Beta weights directly reflect the relative importance of the independent variables in multiple regression?

A

Under all conditions. Beta weights are used to compare the relative importance of each of the IVs in your regression model.

For a more meaningful task, let’s suppose that the question read: ‘Under what conditions do the B weights directly reflect the relative importance of the independent variables in multiple regression?’

The answer to this question is: when the IVs are measured on the same scale of measurement. B weights reflect the relative contribution of an IV to prediction of the DV, but they are also affected by scale of measurement (IVs with a larger scale of measurement, such as 1–100, tend to have smaller B weights than IVs with a small scale of measurement, such as 1–5).

15
Q

Identify and explain the statistic that provides the unique variance explained by an IV for the DV in sequential multiple regression.

A

sr2 could be used in this context.

If an IV is entered by itself at a particular step in the sequential multiple regression, you may also use the R2 change value to reflect the unique contribution of the IV when entered into the model at that point.

However, if multiple IVs are entered into the model at a given step, the R2 change value represents their combined improvement in model R2 and does not indicate their individual contributions (in which case, you’d resort to sr2).

16
Q

Describe how R is evaluated for significance in multiple regression.

A

In order to work out the significance of R given: (1) size of R, (2) sample size, and (3) number of IVs, we must first convert R into an F value and test the significance of this F value.

In Jamovi, this process and the significance test is reported in our overall model test.

17
Q

Why is R2 adjusted in multiple regression, and how is this achieved?

A

R2 is adjusted because it is subject to biases that distort the actual variance explained by the IVs for the DV.

First, in smaller samples, R2 tends to be inflated.

Second, R2 is also inflated as the number of IVs in the model increases.

So, adjusted R2 corrects for these potential biases and provides a more accurate R2 value.

18
Q

If R2 = .60, N = 50, and k = 8, determine the simpler Adjusted R2 which corrects for the number of IVs.

A

Adjusted R2 = 1 − [(1 − R2)(N − 1)]/(N − k − 1)

= 1 − [(1 − .6)(50 − 1)]/(50 − 8 − 1)
= 1 − (.4 × 49)/41
= 1 − 19.6/41
= 1 − .48
= .52
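The same arithmetic as a small helper (replicating the worked example above):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2: corrects R2 for sample size N and number of IVs k."""
    return 1 - ((1 - r2) * (n - 1)) / (n - k - 1)

print(round(adjusted_r2(0.60, 50, 8), 2))  # 0.52, matching the hand calculation
```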

19
Q

Adjusted R2 Formula:

A

Adjusted R2 = 1 − [(1 − R2)(N − 1)]/(N − k − 1)

Where k is the number of predictor variables and N is the number of cases.