Lecture 2 Flashcards
What is the Pearson Correlation r?
Pearson’s correlation coefficient r is a standardized index of the linear relationship between 2 continuous (or dichotomous*) variables.
- In the case of 2 dichotomous variables (= nominal variables with only 2 categories), r is algebraically equal to Φ (the phi coefficient)
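A minimal sketch of the claim that r equals Φ for 2 dichotomous variables. The 0/1 coding, the made-up scores, and the helper names pearson_r and phi_coefficient are assumptions for illustration only; Φ is computed here from the cell counts of the 2x2 table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phi_coefficient(x, y):
    """Phi from the 2x2 table of two variables coded 0/1 (illustrative helper)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Made-up dichotomous scores, coded 0/1 (assumption for the example)
x = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

r = pearson_r(x, y)
phi = phi_coefficient(x, y)
print(r, phi)  # the two values agree up to floating-point rounding
```

Any 2 distinct codes would give the same absolute value of r; 0/1 is used only because it makes the 2x2 table easy to count.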
Assumptions of Pearson correlation r
Scores of the X and Y variables:
1. are quantitative (or both dichotomous)
2. are linearly related
3. have a bivariate normal distribution
4. do not have extreme outliers
5. Homoscedasticity: Y-scores have the same variance across levels of X (and vice versa)
Influence of outliers
Distributional assumptions
Homoscedasticity/heteroscedasticity
Computation of Pearson’s r
Strength of Pearson’s r is related to patterns in the data
It is helpful to divide an X,Y scatterplot into 4 quadrants
We can divide the data points into:
◦ Concordant data points: data points that lie above the mean of both X and Y or below the mean of both X and Y
◦ Discordant data points: data points that lie below the mean of X and above the mean of Y, or above the mean of X and below the mean of Y
‘The cross’
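A minimal sketch of how the quadrants connect to the computation of r, using the usual deviation-score formula r = Σ(X - Mx)(Y - My) / √(Σ(X - Mx)² · Σ(Y - My)²). Concordant points contribute positive cross-products to the numerator and discordant points contribute negative ones, so r grows with the share and extremity of concordant points. The data and variable names are made up for illustration.

```python
from math import sqrt

# Made-up scores (assumption for the example)
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 5, 3, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Cross-product of deviations for each data point
cross = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]

# Positive cross-product: same side of both means (concordant)
# Negative cross-product: opposite sides of the means (discordant)
concordant = sum(1 for c in cross if c > 0)
discordant = sum(1 for c in cross if c < 0)

sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sum(cross) / sqrt(sxx * syy)

print(concordant, discordant, round(r, 3))
```

Points that lie exactly on a mean line (on the cross itself) have a cross-product of 0 and count toward neither group.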
Factors that can influence Pearson’s r
- Data patterns in X,Y plot (see above)
- Biased sample selection
- Restricted Range
- Selection of extreme groups
- Correlations of samples with combined groups
- Extent to which r is controlled by other variables (more on this later, under regression)
- Bivariate outliers
- Different shapes of distribution of X and Y
- Curvilinear or nonlinear relationships
- Transformation of data (e.g. log)
- Attenuation as a result of unreliability of measurement*; unreliable measurements weaken the correlation between such measurements
- Artificial part-whole correlations
- Aggregated data
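As an illustration of one factor from the list above, restricted range, here is a small sketch. The data are constructed (a linear trend plus a fixed alternating error of +3 and -3) and the cut-off 8 to 13 is an arbitrary assumption; the point is only that the same X,Y relationship yields a smaller r when X covers a narrow range.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Full range: X = 1..20, Y = X plus a fixed alternating error (made-up data)
x_full = list(range(1, 21))
y_full = [x + (3 if x % 2 == 1 else -3) for x in x_full]
r_full = pearson_r(x_full, y_full)

# Restricted range: keep only cases with 8 <= X <= 13 (arbitrary cut-off)
restricted = [(x, y) for x, y in zip(x_full, y_full) if 8 <= x <= 13]
x_res = [x for x, _ in restricted]
y_res = [y for _, y in restricted]
r_restricted = pearson_r(x_res, y_res)

print(round(r_full, 2), round(r_restricted, 2))  # about 0.88 versus about 0.67
```

The error variance is the same in both samples; only the spread of X shrinks, which is why r drops.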
Pearson’s rxy and causal conclusions
A large r can have many causes
We can only conclude that ‘X is the cause of Y’ if we can eliminate all these alternative explanations; a non-experimental design does not fully allow this
Therefore, a large correlation between X and Y does not provide sufficient proof of a causal relationship!
Some reasons for a large Pearson’s rxy
- X causes Y
- Y causes X
- X causes Z, and Z causes Y; variable Z mediates effect of X on Y
- X is related to another variable Z, and Z causes Y
- X and Y measure the same construct
- X and Y are both influenced by Z
- Chance (sampling error)
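A small simulation sketch of the reason ‘X and Y are both influenced by Z’. All numbers (sample size, standard deviations, seed) are assumptions chosen for illustration. X and Y are each generated from Z plus independent noise, so neither causes the other, yet r between them is large.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable
n = 2000

# Z is the common cause; X and Y depend on Z only, never on each other
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# Population value under these assumptions: cov = 1, var = 1.25, so r = 0.8
r_xy = pearson_r(x, y)
print(round(r_xy, 2))
```

Intervening on X in this setup would leave Y unchanged, which is exactly what a large r alone cannot reveal.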