Lecture 2 Flashcards

1
Q

What is the Pearson Correlation r?

A

Pearson’s correlation coefficient r is a standardized index of the linear relationship between 2 continuous (or dichotomous*) variables

  • *In the case of 2 dichotomous variables (= nominal variables with only 2 categories), r is algebraically equal to Φ (the phi coefficient)
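
The equality of r and Φ can be checked with a small sketch. The 0/1 scores below are made up for illustration, and the helper name `pearson_r` is an assumption of this example, not something from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up dichotomous scores (coded 0/1) for 8 cases.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]

# Cell counts of the 2x2 table.
a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)

# Phi coefficient from the cell counts.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

r = pearson_r(x, y)
print(r, phi)  # both values are the same
```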
2
Q

Assumptions of Pearson correlation r

A

Scores of the X and Y variables:
1. are quantitative (or both dichotomous)
2. are linearly related
3. have a bivariate normal distribution
4. do not have extreme outliers
5. Homoscedasticity: Y-scores have the same variance across levels of X (and vice versa)

3
Q

Influence of outliers

A
4
Q

Distributional assumptions

A
5
Q

Homoscedasticity/heteroscedasticity

A
6
Q

Computation of Pearson’s r

A
7
Q

Strength of Pearson’s r is related to patterns in the data

A

It is helpful to divide an X,Y scatterplot into 4 quadrants, using the mean of X and the mean of Y as dividing lines
We can then divide the data points into:
◦ Concordant data points: data points that lie above the mean of both X and Y, or below the mean of both X and Y
◦ Discordant data points: data points that lie below the mean of X and above the mean of Y, or above the mean of X and below the mean of Y
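
The link between the quadrants and the size of r can be sketched as follows. The numerator of r is the sum of cross-products of deviations from the means: concordant points add a positive product, discordant points a negative one. The data are made up for illustration, and all variable names are assumptions of this example.

```python
import math

# Made-up scores; both means equal 4, and no point lies exactly on a mean.
x = [1, 2, 3, 5, 6, 7]
y = [2, 1, 5, 3, 7, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Cross-product of deviations for every data point.
products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]

# Positive product: same side of both means (concordant).
# Negative product: opposite sides of the means (discordant).
concordant = sum(1 for p in products if p > 0)
discordant = sum(1 for p in products if p < 0)

# Pearson's r: summed cross-products, standardized by the spread of X and Y.
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sum(products) / math.sqrt(sxx * syy)

print(concordant, discordant, round(r, 3))
```

Here 4 concordant points outweigh 2 discordant points, so r is positive; with mostly discordant points r would be negative.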

8
Q

‘The cross’

A
9
Q

Factors that can influence Pearson’s r

A
  1. Data patterns in X,Y plot (see above)
  2. Biased sample selection
    - Restricted Range
    - Selection of Extreme Groups
  3. Correlations of samples with combined groups
  4. Extent to which r is controlled by other variables (more on this later, under regression)
  5. Bivariate outliers
  6. Different shapes of distribution of X and Y
  7. Curvilinear or nonlinear relationships
  8. Transformation of data (e.g. log)
  9. Attenuation as a result of unreliability of measurement*; unreliable measurements weaken the correlation between such measurements
  10. Artificial part-whole correlations
  11. Aggregated data
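
As an illustration of one factor from this list, restricted range (item 2), here is a minimal sketch with constructed data: Y equals X plus a fixed alternating disturbance, so the underlying relationship is identical in the full and the restricted sample. The data, the cut-offs 40 and 59, and the helper name `pearson_r` are assumptions of this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Full sample: X runs from 0 to 99; Y = X plus an alternating +20 / -20 disturbance.
x_full = list(range(100))
y_full = [xi + (20 if xi % 2 == 0 else -20) for xi in x_full]

# Restricted sample: only cases with X between 40 and 59 are selected.
pairs = [(xi, yi) for xi, yi in zip(x_full, y_full) if 40 <= xi <= 59]
x_restricted = [p[0] for p in pairs]
y_restricted = [p[1] for p in pairs]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)

# The same relationship yields a much smaller r in the narrow X range.
print(round(r_full, 2), round(r_restricted, 2))
```

The disturbance is equally large in both samples, but in the restricted sample X varies much less, so the disturbance dominates and r drops.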
10
Q

Pearson’s rxy and causal conclusions

A

A large r can have many causes
We can only conclude that ‘X is the cause of Y’ if we can eliminate all these alternative explanations; a non-experimental design does not fully allow this
Therefore, a large correlation between X and Y does not provide sufficient proof of a causal relationship!

11
Q

Some reasons for a large Pearson’s rxy

A
  1. X causes Y
  2. Y causes X
  3. X causes Z, and Z causes Y; variable Z mediates effect of X on Y
  4. X is related to another variable Z, and Z causes Y
  5. X and Y measure the same construct
  6. X and Y are both influenced by Z
  7. Chance (sampling error)
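
Reason 6 (X and Y are both influenced by Z) can be sketched with constructed data in which X has no effect on Y at all. Everything below is an assumption of this example: the data, the variable names, and the helper `pearson_r`.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Z is the common cause.
z = list(range(100))

# Unique parts of X and Y, built so that they are unrelated to each other.
x_unique = [10 if i % 2 == 0 else -10 for i in range(100)]
y_unique = [10 if i % 4 < 2 else -10 for i in range(100)]

# X and Y each depend on Z plus their own unique part; X never enters Y.
x = [zi + u for zi, u in zip(z, x_unique)]
y = [zi + u for zi, u in zip(z, y_unique)]

r_xy = pearson_r(x, y)                    # large, driven entirely by Z
r_unique = pearson_r(x_unique, y_unique)  # zero once Z's share is left out

print(round(r_xy, 2), r_unique)
```

The large r between X and Y disappears when only the parts not due to Z are correlated, which is why a large r alone cannot separate ‘X causes Y’ from a common cause.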