Correlation Redux Flashcards
what is Pearson’s r?
tells you the relationship between two variables: strength (‘size of effect’) and direction [+ or -]
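A minimal sketch of how r could be computed by hand, using only the Python standard library. The function name `pearson_r` and the toy data are assumptions made for illustration, not part of the original notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # How much the two variables vary together
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: the sign gives the direction, the size gives the strength
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # both increase together
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # one goes up, the other goes down
print(r_up, r_down)
```

Here `r_up` comes out at +1 (perfect positive) and `r_down` at -1 (perfect negative).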
what does the p value tell us?
whether r is significantly different from 0
what is a ‘small effect’ size?
± 0.1
what is a ‘medium effect’ size?
± 0.3
what is a ‘large effect’ size?
± 0.5
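The three effect-size cards can be sketched as a small lookup. This assumes the values are treated as lower cut-offs on the absolute value of r; the function name `effect_size_label` and the "negligible" label for anything below 0.1 are assumptions, not from the notes.

```python
def effect_size_label(r):
    """Label a correlation by size, ignoring its sign."""
    size = abs(r)  # direction does not affect the size of the effect
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label, not part of the original cards

print(effect_size_label(-0.35))  # medium
```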
what is significance?
testing the null hypothesis that r = 0 (correlation value is zero)
when can we reject the null hypothesis? / suggest there’s a significant difference
when p < 0.05 we can reject the null hypothesis
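One way to see what "testing the null hypothesis that r = 0" means is a permutation test: shuffle one variable many times, which breaks any real relationship, and count how often the shuffled data give a correlation at least as strong as the observed one. Note that this is a swapped-in technique chosen to stay within the standard library; statistics packages usually report a p value based on a t-test instead. The data, function names, and the choice of 5000 shuffles are all assumptions for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Two-sided permutation p value for the null hypothesis r = 0."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # shuffling destroys any real x-y pairing
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # +1 on both sides counts the observed arrangement itself
    return (extreme + 1) / (n_shuffles + 1)

# Made-up data: hours revised and test score for ten students
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [52, 55, 61, 58, 66, 70, 68, 75, 79, 83]
p = permutation_p_value(hours, score)
print("reject null hypothesis:", p < 0.05)
```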
what is a positive correlation?
both variables increase together
the interpretation of correlation and causality:
- correlation = not sufficient evidence for causality between variables
- does not imply causation
- gives no indication of the direction of causality
what is a negative correlation?
one goes up, the other goes down
what does a correlation coefficient give you?
- How much the two variables vary together
- The further the score from zero, the stronger the relationship
Why can’t we infer causation from a correlation?
The relationship between the two variables is often due to each variable’s relationship with a third variable
what can happen if we take account of this relationship with the third variable (‘control for’)?
the original relationship may disappear, showing that it was spurious (not genuine)
what may the third variable be?
a confounding variable
how can we examine third variables?
partial correlations
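A sketch of a first-order partial correlation, which removes the influence of a third variable z from the correlation between x and y. It uses the standard formula built from the three pairwise correlations; the function name `partial_r` and the example values are assumptions for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up correlations: x and y look related (r = 0.5),
# but both are also related to a third variable z
r_xy, r_xz, r_yz = 0.5, 0.7, 0.6
controlled = partial_r(r_xy, r_xz, r_yz)
print(round(controlled, 2))  # about 0.14, far weaker than the original 0.5
```

In this made-up case most of the original relationship goes away once z is controlled for, which is what a largely spurious correlation looks like.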
what are some issues with correlation?
- shape of the relationship
- outliers
- restricted ranges
- sample size
- reliability of measures
what should relationships be like?
linear relationship (similar to regression)
what does Pearson’s Correlation Coefficient measure?
linear relationships
what doesn’t Pearson’s Correlation Coefficient measure?
non-linear relationships
* relationships can be non-linear, but Pearson’s r cannot pick this up/do anything about it
non-linear relationships
will reduce the value of Pearson’s r
* still a relationship, just a different shape
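A small sketch of this point, using made-up data where y depends on x perfectly but along a U-shaped curve. The helper `pearson_r` is an illustrative name, not from the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x, but not linearly

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0: the falling half and rising half of the U cancel out
```

A scatterplot would show the relationship straight away, which is one reason to plot the data before trusting r.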
Outliers
Individual scores can reduce or enhance the ‘r’
* e.g. adding one pair of scores can turn ‘no relationship’ into a small/medium relationship
* e.g. adding a different pair of scores to the data set can remove a weak but significant relationship
outliers can cut both ways, what does this mean?
can artificially enhance the relationship or reduce it
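Both directions can be sketched with made-up data: one extra data point creates a strong correlation where there was none, and one extra data point wrecks a perfect one. The numbers are deliberately extreme, and `pearson_r` is an illustrative helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]

# Enhancing: no relationship at first, then one extreme pair (20, 20) is added
flat = [3, 1, 2, 2, 1, 3]
r_before = pearson_r(xs, flat)
r_enhanced = pearson_r(xs + [20], flat + [20])
print(round(r_before, 2), round(r_enhanced, 2))  # 0.0 -> about 0.96

# Reducing: a perfect relationship, then one extreme pair (20, 0) is added
line = [1, 2, 3, 4, 5, 6]
r_perfect = pearson_r(xs, line)
r_reduced = pearson_r(xs + [20], line + [0])
print(round(r_perfect, 2), round(r_reduced, 2))  # 1.0 -> about -0.38
```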
what do we do about outliers?
we tend to remove them from the data set
how can we fix a stunted relationship?
by having a wide range of scores on all variables to get a clear view of the relationship between your variables
what is a stunted relationship?
when the scores collected cover only a restricted range (e.g. the sample comes from one specific demographic), we risk a stunted relationship, i.e. a correlation that looks weaker than it really is
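A sketch of a restricted range at work, using made-up data: y rises with x across the full range, with some scatter added, but looking only at a narrow band of x values makes the relationship look much weaker. The data and the names `pearson_r`, `r_full`, and `r_restricted` are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Full range: x from 1 to 12, y = x plus a fixed scatter of +2 / -2
xs = list(range(1, 13))
ys = [x + (2 if x % 2 else -2) for x in xs]
r_full = pearson_r(xs, ys)

# Restricted range: keep only the pairs where x is between 5 and 8
kept = [(x, y) for x, y in zip(xs, ys) if 5 <= x <= 8]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(round(r_full, 2), round(r_restricted, 2))  # about 0.85 vs about 0.12
```

The underlying relationship is the same in both cases. The narrow band just leaves too little variation in x for it to show through the scatter.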