L06 Correlations Flashcards
Correlations - what?
Associations between variables
How closely the data points fall to the line of best fit
4 measures of association
- Covariance
- Correlation:
- Pearson’s coefficient of correlation
- Variance accounted for (or explained)
- Spearman’s coefficient of rank correlation
Covariance - formula
Mean of the Product of Deviations
Cov(X,Y) = Σ[(xi - x̄)(yi - ȳ)] / n
Unit: product of the units of x and y
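A minimal Python sketch of the formula above, assuming the divide-by-n version on this card (many textbooks divide by n - 1 for a sample estimate). The data and the function name `covariance` are made up for illustration.

```python
def covariance(xs, ys):
    """Mean of the products of deviations (divides by n, as on the card)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

cov_xy = covariance(hours, score)
print(cov_xy)  # unit here would be "hours * score points"
```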
Covariance - features
- Crude measure of association, calculated as a step towards Pearson’s r
- Parametric test
- Unit depends on x and y: not standardized
- Sign (+/-) gives the direction; a larger magnitude means stronger covariation, but only when compared on the same measurement scale
2 key tests of correlation
1. Pearson’s coefficient of correlation
2. Spearman’s Rho Correlation Coefficient
Pearson’s Coefficient of Correlation - when?
Linear relationship
2 continuous variables
Spearman’s Rho Correlation Coefficient - when?
Correlation between ranked or ordinal data
Non-parametric; so no assumption of normality
No assumption of even spacing
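One common way to see what Spearman's rho does: rank each variable, then compute Pearson's r on the ranks. The sketch below assumes that approach, with tied values given their average rank. The helper names and data are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(xs, ys):
    # Pearson's r applied to the ranks instead of the raw values.
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: order is preserved, spacing is very uneven.
x = [10, 20, 30, 40, 1000]
y = [1, 2, 3, 4, 5]

print(spearman_rho(x, y))  # only the ordering matters
print(pearson_r(x, y))     # pulled down by the uneven spacing
```

Because only the order of the values is used, the extreme value 1000 does not affect rho, which is the "no assumption of even spacing" point.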
Line of Best Fit
Added in a scatterplot
Line that best represents data
Correlation: how closely data points fall to this line
Regression: the characteristics or equation of the line
A good line of best fit - very small residuals for each point
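A sketch of fitting the line and inspecting the residuals. It assumes the usual least-squares line (the cards do not say how the line is fitted). The data is hypothetical.

```python
# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

# Least-squares slope and intercept (assumed fitting method).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))
sxx = sum((x - x_bar) ** 2 for x in hours)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, score)]

print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print([round(e, 2) for e in residuals])
```

The equation printed is the regression part; how small the residuals are relative to the spread of the scores is the correlation part.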
Correlation of magnitude 1? 0?
1 - perfect linear relationship (positive or negative)
0 - no linear relationship
Pearson’s correlation coefficient symbol
r
Pearson’s correlation coefficient formula
r = Cov(X,Y) / (sx * sy)
where Cov(X,Y) = covariance of X and Y
sx and sy = standard deviations of the variables
Unit: no unit; standardized
Independent of overall variability
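A sketch of the formula in Python, assuming the same divide-by-n convention for both the covariance and the standard deviations. It also rescales one variable (hours to minutes) to illustrate that the covariance changes with the unit while r does not. Function names and data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Mean of the products of deviations (divide by n).
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

def sd(values):
    # Standard deviation with the same divide-by-n convention.
    return math.sqrt(covariance(values, values))

def pearson_r(xs, ys):
    return covariance(xs, ys) / (sd(xs) * sd(ys))

hours = [1, 2, 3, 4, 5]            # hypothetical data
score = [2, 4, 5, 4, 5]
minutes = [h * 60 for h in hours]  # same data, different unit

cov_hours = covariance(hours, score)
cov_minutes = covariance(minutes, score)
r_hours = pearson_r(hours, score)
r_minutes = pearson_r(minutes, score)

print(cov_hours, cov_minutes)  # covariance scales with the unit
print(r_hours, r_minutes)      # r stays the same
```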
Pearson’s correlation coefficient features
Linear relationship between 2 continuous variables
r = covariance of the two variables divided by the product of their SDs
No unit; standardized
Because the covariance is divided by the SDs, r is independent of overall variability (it shows only how strongly the variables are related, not how much they vary)
What is R squared?
Variance accounted for
Variance accounted for
Proportion of the variance in one variable that is explained by (shared with) the other variable
Expressed as % or a fraction
R squared = square of Pearson’s r
What does R squared close to 1 mean?
Variance accounted for near 1 -> almost all the variation in one variable is shared with/explained by the other variable
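A sketch showing R squared two ways for hypothetical data: as the square of Pearson's r, and as the share of the variation in y that a least-squares line on x accounts for. For a simple two-variable linear fit the two agree. The least-squares line is an assumption, not something stated on the cards.

```python
hours = [1, 2, 3, 4, 5]   # hypothetical data
score = [2, 4, 5, 4, 5]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(score) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in score)  # total variation in y

# Route 1: square of Pearson's r.
r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# Route 2: 1 - (unexplained variation / total variation).
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, score))
explained = 1 - ss_res / syy

print(f"r^2 = {r_squared:.2f}, explained share = {explained:.2f}")
```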
Partial correlation
“Quantifies the relationship between two variables while accounting for the effects of a THIRD variable on BOTH variables in the original correlation”
The part of correlation that is not related to variable C
Semi-partial correlation
“Quantifies the relationship between two variables while accounting for the effects of a third variable on only ONE of the variables in the original correlation”
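A sketch of both quantities computed from the three simple pairwise correlations, using the standard textbook formulas (these are not given on the cards). The correlation values are invented for illustration; in the semi-partial case, C is removed from B only.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Correlation of A and B with C removed from BOTH A and B."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def semi_partial_r(r_ab, r_ac, r_bc):
    """Correlation of A with B after C is removed from B ONLY."""
    return (r_ab - r_ac * r_bc) / math.sqrt(1 - r_bc ** 2)

# Hypothetical pairwise correlations.
r_ab, r_ac, r_bc = 0.5, 0.4, 0.6

partial = partial_r(r_ab, r_ac, r_bc)
semi = semi_partial_r(r_ab, r_ac, r_bc)

print(f"partial = {partial:.3f}, semi-partial = {semi:.3f}")
```

The two share a numerator; the partial correlation divides by one extra term that is at most 1, so its magnitude is never smaller than the semi-partial's.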
Variable C in a partial correlation is called
Variable that is partialized out or held constant
Zero order correlation
Another term for a simple bivariate correlation
as opposed to partial correlations
First or second order correlation
Partial correlation that partials out one (first order) or two (second order) variables
Is significance of correlation equal to importance?
No. A correlation can be significant but weak.
Strength and significance are different things: significance also depends on sample size.
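A sketch of why this happens, using the usual t statistic for testing r against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The r and n values are invented, and the critical values are approximate two-tailed 5% values from standard t tables, so treat them as assumptions.

```python
import math

def t_statistic(r, n):
    """t value for testing whether a correlation r from n pairs differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Weak correlation, large sample (hypothetical).
t_weak = t_statistic(0.1, 1000)
crit_weak = 1.962    # approx. two-tailed 5% critical t, df = 998

# Strong correlation, small sample (hypothetical).
t_strong = t_statistic(0.5, 10)
crit_strong = 2.306  # approx. two-tailed 5% critical t, df = 8

print(f"r = 0.1, n = 1000: t = {t_weak:.2f}, significant: {t_weak > crit_weak}")
print(f"r = 0.5, n = 10:   t = {t_strong:.2f}, significant: {t_strong > crit_strong}")
```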
Why do we use correlations?
- May highlight possible causal relations (though correlation doesn’t imply causation)
- Null effects / absence of correlations allow us to discount some theories
- Presence of a correlation -> investigate why with an experiment
Biserial correlation
For when a dichotomy has an underlying continuum
One variable is dichotomous (two categories), the other continuous
When the dichotomy is a true one (no underlying continuum), r is called a “point-biserial r”
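A sketch of the point-biserial case: code the two categories as 0 and 1 and compute an ordinary Pearson's r. The same value comes out of the group-means form (M1 - M0) / s * sqrt(p * q), where s is the divide-by-n SD of all scores and p, q are the group proportions. Data and names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: group membership coded 0/1 and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]

# Route 1: Pearson's r on the 0/1 coding.
r_pb = pearson_r(group, score)

# Route 2: difference between group means, scaled by the overall SD.
scores_0 = [s for g, s in zip(group, score) if g == 0]
scores_1 = [s for g, s in zip(group, score) if g == 1]
m0 = sum(scores_0) / len(scores_0)
m1 = sum(scores_1) / len(scores_1)
n = len(score)
grand_mean = sum(score) / n
s_all = math.sqrt(sum((s - grand_mean) ** 2 for s in score) / n)
p = len(scores_1) / n
q = 1 - p
r_from_means = (m1 - m0) / s_all * math.sqrt(p * q)

print(f"{r_pb:.4f} {r_from_means:.4f}")
```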
Small, medium and large correlation values
+/- 0.1 small effect
+/- 0.3 medium effect
+/- 0.5 large effect
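A small sketch that labels a correlation using the values above. Treating each value as a lower bound, and calling anything below 0.1 "negligible", are assumptions made for illustration; the cards only give the three reference points.

```python
def effect_size_label(r):
    """Label |r| using 0.1 / 0.3 / 0.5 as lower bounds (an assumed convention)."""
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

for r in (-0.62, 0.35, -0.12, 0.04):
    print(r, effect_size_label(r))
```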