Linear Associations Flashcards
describe R: (Pearson’s) correlation coefficient
- indicates the strength and direction of a linear association
- ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation)
- based on creating a “line of best fit” that minimizes the total squared distance from the line
- thus the direction of the distance doesn’t matter, but extreme points affect the line more
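A minimal sketch of how the coefficient can be computed from its definition, and of how a single extreme point can change it. The function name `pearson_r` and the height/weight numbers are made up for illustration; they are not from the flashcards.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from the means.
    # Squaring means the direction of a deviation does not matter,
    # but large deviations count disproportionately.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights (cm) and weights (kg) of five subjects
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 85]

r_clean = pearson_r(heights, weights)                  # about 0.99
# Same data, but the last weight is replaced by an extreme value
r_outlier = pearson_r(heights, [52, 58, 69, 74, 30])   # about -0.26

print(round(r_clean, 2), round(r_outlier, 2))
```

In this made-up example one extreme point is enough to turn a strong positive correlation into a weak negative one.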
describe R²: the square of the correlation
coefficient of determination
- answers the question, “how much of the variance in the outcome variable is explained by variance in the predictor variable?”
- R² ranges from 0 if there is no correlation, to 1 if the correlation is perfect (either positive or negative)
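One way to see the "variance explained" reading is to fit the least-squares line and compare the leftover (residual) variation with the total variation in the outcome. This is only a sketch; the function name `r_squared` and the data are assumptions for illustration.

```python
def r_squared(xs, ys):
    """Share of the variation in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Line of best fit: y = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line, and total variation
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: heights (cm) and weights (kg) of five subjects
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 85]

print(round(r_squared(heights, weights), 3))   # about 0.987
```

For a single predictor this value equals the square of Pearson's correlation, which is why a perfect negative correlation also gives 1.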
describe the general rule of R² and strength of association
Pearson correlation is based on the _____; it makes certain assumptions. If the assumptions are not met, the model will still run but the conclusions may be invalid.
Pearson correlation is based on the normal distribution; it makes certain assumptions. If the assumptions are not met, the model will still run but the conclusions may be invalid.
name the assumptions of the Pearson correlation
- normality of variables
  - data must be interval: non-interval data cannot be normal
  - data must be centrally and symmetrically distributed, with a single mode and neither too few nor too many extreme values
- linearity of association
  - associations must be monotonic (not changing direction)
  - the line of best fit through the scatterplot should be a nearly straight line and not a curve
- oval scatterplot
  - the scatterplot should form an oval, not a triangle
describe rank correlations
- Pearson correlation minimizes the total squared distance from the line
- in contrast, Spearman’s rank correlation ignores the size of differences and uses only the rank order:
- it doesn’t matter if one subject is a lot taller and heavier than another, or only a little bit
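The contrast above can be sketched by computing Spearman's rank correlation as a Pearson correlation on ranks. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the data are hypothetical, chosen only to illustrate the idea.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: in weights_b the tallest subject is MUCH heavier
heights = [150, 160, 170, 180, 190]
weights_a = [52, 58, 69, 74, 85]
weights_b = [52, 58, 69, 74, 140]

# Pearson changes with the size of the difference (about 0.99 vs 0.86)
print(round(pearson_r(heights, weights_a), 2), round(pearson_r(heights, weights_b), 2))
# Spearman sees the same ordering in both cases and stays at 1.0
print(spearman_rho(heights, weights_a), spearman_rho(heights, weights_b))
```

Because only the ordering enters the calculation, making the heaviest subject even heavier leaves Spearman's value unchanged while Pearson's drops.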
describe advantages and disadvantages for Pearson vs. Spearman
- Spearman’s rank is less statistically powerful than Pearson correlation
- statistical models (e.g. regression) use Pearson rather than Spearman
- S for Spearman for Safe
- P for Pearson for Powerful
summarize the difference between Pearson’s correlation and Spearman’s rank