Regression Flashcards

1
Q

What is Multiple Regression?

A

An extension of bivariate regression and correlation.

Allows us to predict scores of a DV from a set of IVs.

2
Q

Standard multiple regression:

A
  • All independent variables simultaneously enter the equation
  • Standard multiple regression is indicated when you simply intend to evaluate the interrelationships between variables and to obtain a multiple correlation.
3
Q

The Multiple Regression Equation:

A

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + εi

Where b0 is the Y-intercept (the Y value when all X scores equal 0) and the bs are the weightings or unstandardised regression coefficients by which the independent or X variables are multiplied.

As an example of how the equation works, imagine that we wished to predict the frequency of GP visits (GP VISITS) for chronic low back pain patients, using the independent variables: duration of the disorder (DURATION), age (AGE), and sex (SEX). Predicted GP VISITS are obtained as follows:

(GP VISITS) = b0 + b1(DURATION) + b2(AGE) + b3(SEX)

A patient’s DURATION, AGE, and SEX scores are multiplied by the corresponding regression coefficients. The resulting values are then added together with the intercept value (b0) to produce a predicted value for GP VISITS.
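As a sketch of how this works in practice (the coefficient values below are hypothetical, chosen purely for illustration; real values would come from fitting the model):

```python
# Hypothetical unstandardised regression coefficients (illustration only)
b0 = 2.0           # intercept: predicted GP visits when all IVs are 0
b_duration = 0.5   # change in visits per year of disorder duration
b_age = 0.05       # change in visits per year of age
b_sex = 1.0        # shift for sex (coded, e.g., 0 = male, 1 = female)

def predict_gp_visits(duration, age, sex):
    """Apply the multiple regression equation to one patient's scores."""
    return b0 + b_duration * duration + b_age * age + b_sex * sex

# A 40-year-old female patient with a 4-year history of low back pain:
print(predict_gp_visits(duration=4, age=40, sex=1))  # 2.0 + 2.0 + 2.0 + 1.0 = 7.0
```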

4
Q

Adjusted R2:

A
  • R2 is known to be artificially inflated by a large number of IVs in the model.
  • Adjusted R2 has other advantages: (1) it adjusts for potential bias due to small sample size (smaller sample sizes have a knack of obtaining larger effects than they should!), and (2) it gives a projected R2 value that you might expect from the population that the sample was drawn from.
5
Q

Considerations for Multiple Regression analysis:

A
  • Check the independent variables for multicollinearity and singularity by means of an intercorrelation matrix. Multicollinearity is present if two independent variables correlate at .90 or above (-.90 is also evidence of multicollinearity) and singularity is present if the correlation is perfect. In both circumstances, one variable is redundant because it carries virtually the same information as the other. Delete one variable from the analysis or combine the variables.
  • Multiple regression is based upon the General Linear Model and data should be screened to ensure that the model’s assumptions are met. In particular, ensure that variables are normally distributed and that relationships between variables are linear.
6
Q

Sequential multiple regression:

A

In sequential multiple regression, the researcher controls the entry order of the independent variables into the equation. Each independent variable’s predictive importance is based upon its non-overlapping (unique) contribution to the equation at the point of entry.

However, once additional IVs are entered, the contributions of the earlier entered IVs are corrected for overlap with these new IVs. So, it is possible to see a predictor as important in an early step, only to become non-significant in later steps due to redundancy with other IVs in the model.

7
Q

Name two research questions that can be answered by multiple regression analysis.

A

Any two of the following would be sufficient:

How much variance do the IVs account for (in combination) in the DV? [Consult R2 value]

What is the relative importance of each of the IVs in the model? [Compare beta weights]

Which IV contributes the most unique variance to prediction of the DV? [Check sr2]

How much improvement in the model occurs when we add an additional IV (or group of IVs)? [Check R2 change]

8
Q

Define singularity and explain how it should be dealt with.

A

Singularity occurs when two variables are perfectly correlated. It occurs when two variables measure exactly the same construct (e.g., height in metres vs. height in inches).

Two variables that are singular have completely overlapping variance. Therefore, it would be redundant to include both measures in a regression analysis. The best approach is to use only one of the variables as an IV and to exclude the other.

9
Q

Calculation of Sample Size

A

Field (2009) gives two rules of thumb for determining the required sample size, which assume a medium effect size, an alpha level of .05, and a beta level of .20 (the probability of a Type II error).

  • The first rule, N > 50 + 8m (m is the number of independent variables) is used for testing the significance of the multiple correlation coefficient (R); that is, it tests the overall model (combined predictiveness of multiple IVs).
  • The second rule, N > 104 + m, is used for testing individual predictors. For example, if you have 10 independent variables, you need 50 + (8)(10) = 130 subjects to test the multiple correlation and 104 + 10 = 114 subjects to test the individual predictors.
  • In cases where you are interested in both the overall correlation and the individual predictors, you need to calculate N in both ways and choose the larger value.
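The two rules (and the take-the-larger-value advice) can be sketched as a small helper:

```python
def required_n(m, test="both"):
    """Field's (2009) rules of thumb for multiple regression sample size.

    m    -- number of independent variables
    test -- "overall" (testing R), "individual" (testing predictors),
            or "both" (the larger of the two)
    """
    overall = 50 + 8 * m
    individual = 104 + m
    if test == "overall":
        return overall
    if test == "individual":
        return individual
    return max(overall, individual)

print(required_n(10, "overall"))     # 130
print(required_n(10, "individual"))  # 114
print(required_n(10))                # 130: interested in both, take the larger
```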
10
Q

Explain what the multiple correlation R represents.

A

Multiple correlation R literally represents the strength of the relationship between the DV and predicted scores on the DV.

However, as the predicted scores on the DV are calculated from scores on the IVs, multiple correlation R is also considered the strength of the relationship between the DV and the combined IVs.

11
Q

Describe the difference in entry of the independent variables between standard multiple regression and sequential multiple regression.

A

IVs are entered all at the same time in standard multiple regression.

In sequential multiple regression, IVs are entered in blocks (based on theory, prior research and current research questions).

At a bare minimum, sequential multiple regression has two steps (step 1: some IVs entered, step 2: remaining IVs are entered).

Standard regression always has one step (step 1: all IVs entered).

12
Q

Identify and explain the statistic that provides the unique variance explained by an IV for the DV in standard multiple regression.

A

sr (the semipartial correlation) represents the unique relationship between an IV and the DV. sr2 represents the unique variance that the IV contributes to prediction of the DV.
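One way to think of sr2 (a sketch with made-up R2 values): it equals the drop in model R2 when the IV is removed, or equivalently the R2 change if it were entered last:

```python
def unique_variance(r2_full, r2_without_iv):
    """sr2 for an IV: the drop in model R2 when that IV is removed
    (equivalently, the R2 change if it were entered last)."""
    return r2_full - r2_without_iv

# Made-up values: model R2 = .55 with the IV included, .48 without it,
# so the IV uniquely accounts for about 7% of DV variance
print(round(unique_variance(0.55, 0.48), 2))  # 0.07
```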

13
Q

Show the standardised regression equation for three independent variables, and label all terms.

A

The unstandardised regression equation for bivariate regression is Y = bX + a. This can be expanded for a situation with 3 IVs (i.e., a multiple regression context) as:

Y = b1X1 + b2X2 + b3X3 + a

Now, you’ll recall from the lecture that b-weights are unstandardised indicators of the relationship between an IV and the DV. The standardised indicator of this relationship is the Beta weight. So, our equation becomes:

Y = β1X1 + β2X2 + β3X3 + a

But, since a = 0 within the standardized regression equation, the equation is often simply written as:

Y = β1X1 + β2X2 + β3X3

This is why there is no Beta weight for the constant in the output (in jamovi, the Beta weights appear in the “Stand. Estimate” column).

14
Q

Under what conditions do the Beta weights directly reflect the relative importance of the independent variables in multiple regression?

A

Under all conditions. Beta weights are used to compare the relative importance of each of the IVs in your regression model.

For a more meaningful task, let’s suppose that the question read: ‘Under what conditions do the B weights directly reflect the relative importance of the independent variables in multiple regression?’

The answer to this question is: when the IVs are measured on the same scale of measurement. B weights reflect the relative contribution of an IV to prediction of the DV, but they are also affected by scale of measurement (IVs with a larger scale of measurement, such as 1–100, tend to have smaller B weights than IVs with a small scale of measurement, such as 1–5).

15
Q

Identify and explain the statistic that provides the unique variance explained by an IV for the DV in sequential multiple regression.

A

sr2 could be used in this context.

If an IV is entered by itself at a particular step in the sequential multiple regression, you may also use the R2 change value to reflect the unique contribution of the IV when entered into the model at that point.

However, if multiple IVs are entered into the model at a given step, the R2 change value represents their combined improvement in model R2 and does not indicate their individual contributions (in which case, you’d resort to sr2).

16
Q

Describe how R is evaluated for significance in multiple regression.

A

In order to work out the significance of R given: (1) size of R, (2) sample size, and (3) number of IVs, we must first convert R into an F value and test the significance of this F value.

In Jamovi, this process and the significance test is reported in our overall model test.

17
Q

Why is R2 adjusted in multiple regression, and how is this achieved?

A

R2 is adjusted because it is subject to biases that distort the actual variance explained by the IVs for the DV.

First, in smaller samples, R2 tends to be inflated.

Second, R2 is also inflated as the number of IVs in the model increases.

So, adjusted R2 corrects for these potential biases and provides a more accurate R2 value.

18
Q

If R2 = .60, N = 50, and k = 8, determine the simpler Adjusted R2 which corrects for the number of IVs.

A

Adjusted R2 = 1 − [(1 − R2)(N − 1)]/(N − k − 1)

= 1 − [(1 − .6)(50 − 1)]/(50 − 8 − 1)
= 1 − (.4 × 49)/41
= 1 − 19.6/41
= 1 − .48
= .52
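The same arithmetic as a small helper (replicating the worked example above):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2: corrects R2 for sample size N and number of IVs k."""
    return 1 - ((1 - r2) * (n - 1)) / (n - k - 1)

print(round(adjusted_r2(0.60, 50, 8), 2))  # 0.52, matching the hand calculation
```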

19
Q

Adjusted R2 Formula:

A

Adjusted R2 = 1 − [(1 − R2)(N − 1)]/(N − k − 1)

Where k is the number of predictor variables and N is the number of cases.