Section 33-34, 36-37 Pearson r; Coefficient of Determination; Multiple Correlation Flashcards

1
Q

Pearson r

A

PEARSON *r* is simply the correlation coefficient that tells you the DIRECTION and STRENGTH of the relationship between two variables.

  • Thus, *r* can be POSITIVE (indicating a POSITIVE or DIRECT RELATIONSHIP) or NEGATIVE (indicating a NEGATIVE or INVERSE RELATIONSHIP).
  • r², on the other hand, is NEVER NEGATIVE and indicates the PERCENT of VARIANCE in a PREDICTED VARIABLE that is ACCOUNTED FOR by a given PREDICTOR.
    • EX. an r² of 59% means that the PREDICTOR VARIABLE accounts for 59% of the VARIANCE in the PREDICTED VARIABLE, an improvement of 59% over predicting by RANDOM CHANCE.
2
Q

Pearson r Computation

A

COMPUTATION of PEARSON *r* – As with many mathematical formulas, this one looks pretty daunting, but as always, it is simple when taken 1 STEP AT A TIME.
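
The card's own formula is not reproduced here, so the following is only a minimal sketch of one way to take the computation a step at a time. It assumes the deviation-score form of Pearson r, which is algebraically equivalent to the raw-score form. The function name `pearson_r` and the sample scores are hypothetical and are not taken from the card or its tables.

```python
import math

def pearson_r(x, y):
    """Pearson r computed one step at a time (deviation-score form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two scores")
    n = len(x)
    # Step 1: the mean of each variable
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 2: each score's deviation from its own mean
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Step 3: sum of cross-products and sums of squared deviations
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    # Step 4: divide the cross-product sum by the square root of the product
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

Reversing the sign of every y score flips the sign of r but leaves its size unchanged, which matches the DIRECTION and STRENGTH reading of the coefficient.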

3
Q

Coefficient of Determination (r²)

A

The COEFFICIENT OF DETERMINATION (r²) is used to INTERPRET PEARSON r.

  • Its symbol, r², explains how it is computed: simply square r. Thus, for a Pearson r = .60, r² = .36 (.60 × .60 = .36).

_INTERPRETATION of *r*²_ – Look at Table 1 below and notice the differences among the scores on Variable X. These differences are referred to as VARIANCE. There is also variance in the scores on Variable Y.

  • Q: What percentage of the variance on one variable is accounted for by the variance on the other?
  • If we are trying to predict Variable Y (which might be college GPAs) from Variable X (which might be SATs), the question might be phrased as follows: What percentage of the variance on Y is predicted by the variance on X?
  • THE ANSWER to the question is found by calculating r² and multiplying it by 100.
    • For the scores shown in Table 1, r = -.77
    • So r² = (-.77) × (-.77) = .59, and .59 × 100 = 59%
    • This result indicates that 59% (not 77%) of the variance on one variable is accounted for by the variance on the other.
    • In other words, in this example, Variable X accounted for 59% of the variance on Variable Y, which is 59% better than using a random process to make predictions.
      • IMPORTANT: If we can account for 59% of the variance, the remaining 41% (100% - 59% = 41%) of the variance is not accounted for, so there is much room for improvement in our ability to predict.
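
The arithmetic above can be sketched in a few lines. This is an illustrative helper only; the name `variance_accounted_for` is hypothetical, and the only value taken from the card is r = -.77.

```python
def variance_accounted_for(r):
    """Return (percent accounted for, percent not accounted for) for a Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    r_squared = r * r            # squaring removes the sign of r
    accounted = r_squared * 100  # convert the proportion to a percentage
    return accounted, 100 - accounted

accounted, unaccounted = variance_accounted_for(-0.77)
print(f"{accounted:.0f}% accounted for, {unaccounted:.0f}% not accounted for")
# prints "59% accounted for, 41% not accounted for"
```

Because r is squared, r = -.77 and r = .77 give the same percentages.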

Look at Table 2 to see how selected values of r, and the corresponding values of r², translate to the percentage of variance accounted for and not accounted for.

  • Notice that small values of r shrink dramatically when converted to r², indicating that we should be very cautious when interpreting small values of r. They are further from perfection (r = 1.00) than they might seem at first and show VERY WEAK relationships.
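
A table of this kind can be regenerated with a short sketch. The values of r below are hypothetical choices made for illustration and are not necessarily the ones listed in Table 2; the function name `r_squared_table` is likewise an assumption.

```python
def r_squared_table(r_values):
    """Build rows of (r, percent accounted for, percent not accounted for)."""
    rows = []
    for r in r_values:
        accounted = round(r * r * 100)  # r squared, as a whole percent
        rows.append((r, accounted, 100 - accounted))
    return rows

# Hypothetical selection of r values, from weak to perfect
selected_r = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
rows = r_squared_table(selected_r)

print("   r  accounted  not accounted")
for r, accounted, unaccounted in rows:
    print(f"{r:.2f}  {accounted:>8}%  {unaccounted:>12}%")
```

In the output, r = .10 accounts for only 1% of the variance and r = .50 for only 25%, which illustrates how quickly small values of r shrink when squared.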

NOTE: The difference between r and r²:

  • r can be POSITIVE (indicating a POSITIVE or DIRECT RELATIONSHIP) or NEGATIVE (indicating a NEGATIVE or INVERSE RELATIONSHIP).
  • r², on the other hand, is NEVER NEGATIVE and indicates the PERCENT of VARIANCE in a PREDICTED VARIABLE that is ACCOUNTED FOR by a given PREDICTOR.
    • EX. an r² of 59% means that the PREDICTOR VARIABLE accounts for 59% of the VARIANCE in the PREDICTED VARIABLE, an improvement of 59% over predicting by RANDOM CHANCE.
4
Q

Multiple Correlation

A

MULTIPLE CORRELATION tells us to what extent TWO VARIABLES, in combination, PREDICT a THIRD VARIABLE.

  • Ex: You might want to know how well high school GPAs in combination with SAT SCORES predict college GPAs. In this instance, there are three variables and, thus, three scores per subject:
    • Variable 1: college GPAs (the variable being predicted) – called the CRITERION VARIABLE
    • Variable 2: high school GPAs (a predictor)
    • Variable 3: SAT Scores (a predictor)
  • In order to use the formula below, name the variable being predicted as Variable 1. It does not matter which of the others is named Variable 2 and which is named Variable 3.
  • Because there are three variables, THREE VALUES of PEARSON r should be computed, each identified with subscripts; for example, r12 stands for the relationship (correlation) between Variables 1 and 2. Consider these values of r:
    • r12 = .55 (relationship between Variables 1 and 2)
      • How correlated high school GPAs are with college GPAs
    • r13 = .44 (relationship between Variables 1 and 3)
      • How correlated SATs are with college GPAs.
    • r23 = .38 (relationship between Variables 2 and 3)
      • The extent to which the two predictors are correlated.
    • Remember, Variable 1 is the one being predicted.
    • These values show that high school GPA is more highly correlated with college GPA than SAT scores are.
    • The .38 correlation between the two predictors indicates that, to a modest extent, there is overlap between them.
    • In general, the greater the correlation between the two predictors, the smaller the increase obtained when using them in combination.
  • The question, then, is the extent to which Variables 2 and 3, in combination, will PREDICT Variable 1.
  • To answer the question, compute a MULTIPLE CORRELATION COEFFICIENT, whose symbol is R. The formula for R (when the variable being predicted has been named Variable 1) is as follows:
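
The formula itself did not survive on this card. As a hedged sketch, the code below assumes the standard two-predictor formula, R = sqrt((r12² + r13² − 2·r12·r13·r23) / (1 − r23²)), and applies it to the three values of r listed above. The function name `multiple_r` is hypothetical, and the inputs are assumed to be a consistent set of correlations.

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation of Variable 1 with Variables 2 and 3 combined.

    Assumes the standard two-predictor formula; Variable 1 is the one
    being predicted, and r23 is the correlation between the predictors.
    """
    if abs(r23) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    numerator = r12**2 + r13**2 - 2 * r12 * r13 * r23
    denominator = 1 - r23**2
    return math.sqrt(numerator / denominator)

# Values from the card: r12 = .55, r13 = .44, r23 = .38
R = multiple_r(0.55, 0.44, 0.38)
print(f"R = {R:.2f}")                       # prints "R = 0.60"
print(f"R squared = {R**2 * 100:.0f}%")     # prints "R squared = 36%"

# Effect of overlap between the predictors (r23 values are hypothetical)
for r23 in (0.0, 0.38, 0.70):
    print(f"r23 = {r23:.2f} -> R = {multiple_r(0.55, 0.44, r23):.2f}")
```

Under this assumption, the two predictors together give R of about .60, a modest gain over the .55 obtained from high school GPA alone. Raising r23 from 0 to .38 to .70 lowers R from about .70 to .60 to .56, consistent with the point that greater overlap between predictors yields a smaller increase.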

NOTE: Differences Between R and R²:

  • *R* ranges from 0 to 1.00 and is NEVER NEGATIVE, so unlike Pearson *r* it does not indicate the DIRECTION of a relationship, only its STRENGTH.
  • R² is likewise NEVER NEGATIVE and indicates the PERCENT of VARIANCE in a PREDICTED VARIABLE (College GPA) that is ACCOUNTED FOR by MULTIPLE defined PREDICTORS (High School GPA and SAT scores).
    • EX. an R² of 59% means that the COMBINED PREDICTOR VARIABLES account for 59% of the VARIANCE in the PREDICTED VARIABLE, an improvement of 59% over predicting by RANDOM CHANCE.
