Pearson’s Correlation Flashcards
Correlations vs Scatterplots
Scatterplots allow us to visualize our data
Observe relationships between 2 variables
Correlations quantify relationships
How much of a relationship (strength)
The probability of finding a relationship at least this strong in the sample, if there were no relationship in the population (significance)
Correlation: explaining variability
How much of a relationship?
Correlations measure how much of the variance in one variable can be explained by another variable
In other words: how much of the variance is explained by our model
Any variance not explained by our model is down to other variables or random error
Can determine whether our model explains a meaningful amount of variance
Variance explained by our model
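Straight line drawn through the cloud of data points (it rarely touches every point)
Smallest overall distance from the points to the line = “Line of best fit”
Uses “least squares” method (minimises the sum of squared vertical distances)
Smaller distances = stronger relationship (bigger effect)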
Models the relationship
(remember models are simplified versions of reality)
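A minimal sketch of the least squares idea in R, assuming made-up hunger data (the variable names and values are hypothetical, not from the course dataset). It fits the line of best fit with lm() and checks that another, arbitrarily chosen line leaves larger squared distances:

```r
# Hypothetical data: hours since last meal and hunger rating
hours  <- c(1, 2, 3, 4, 5, 6)
hunger <- c(2, 3, 5, 4, 7, 8)

# lm() fits the least-squares line: hunger = intercept + slope * hours
fit <- lm(hunger ~ hours)
coef(fit)

# Residuals are the vertical distances from each point to the fitted line
sse_fit <- sum(residuals(fit)^2)

# An arbitrary alternative line (intercept 1, slope 1) for comparison
other_line <- 1 + 1 * hours
sse_other <- sum((hunger - other_line)^2)

# The line of best fit has the smaller sum of squared distances
sse_fit < sse_other
```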
A perfect model
If our data were perfectly linear:
Data would all lie on the line of best fit
We could work out one variable as long as we know the other
(we will cover this in more detail in term 2)
Correlation – Effect Size
Correlation coefficient = size of the effect
No relationship: r = 0
Perfect relationship: r = +1 or r = -1
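A short sketch of these boundary values using R's cor() function, assuming made-up numbers. The third example is a symmetric curve, included to show that r = 0 here means no straight-line relationship:

```r
x <- 1:10

# Points lying exactly on an upward-sloping line
r_pos <- cor(x, 2 * x + 3)     # 1 (perfect positive)

# Points lying exactly on a downward-sloping line
r_neg <- cor(x, -2 * x + 3)    # -1 (perfect negative)

# A symmetric U-shaped curve: no linear relationship
r_none <- cor(x, (x - 5.5)^2)  # essentially 0

c(r_pos, r_neg, r_none)
```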
Pearson’s r
How much of a relationship?
Correlations measure how much of the variance in one variable can be explained by another variable
Hypothesis: there will be a positive relationship between hunger levels and time since last meal
Variance Explained
Precisely how much variance in our data is explained by our model?
Squaring the correlation coefficient (r) tells us this
Can convert this to percentage (%)
Pearson’s r = 1
100% of the variance in hunger levels is explained by the amount of time since the last meal
Pearson’s r = .89
79% of the variance in hunger levels is explained by the amount of time since the last meal
Pearson’s r = -.08
<1% of the variance in hunger levels is explained by the amount of time since the last meal
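The three examples above can be checked with a few lines of R. This is just the arithmetic of squaring r and converting to a percentage:

```r
# Correlation coefficients from the three examples
r <- c(1, 0.89, -0.08)

# Squaring r gives the proportion of variance explained
variance_explained <- r^2

# Convert to a percentage, rounded to one decimal place
percent_explained <- round(variance_explained * 100, 1)

data.frame(r, variance_explained, percent_explained)
# percent_explained: 100.0, 79.2, 0.6
```

Note that the sign of r disappears when squaring, so r = -.08 and r = +.08 explain the same amount of variance.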
Statistical Significance
The probability of observing a relationship at least this strong, if there’s no real relationship in the world
We tend to use 𝛼 = .05 in Psychology
Reject the null hypothesis if that probability (the p-value) is less than 5%
In other words: how surprising would our data be if there were no relationship in the population? (This is not the probability that the relationship is real)
A bigger sample size gives more power to detect a relationship that really exists
But significant relationships aren’t the same as strong relationships: with a large enough sample, even a very weak correlation can be significant
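A rough sketch of why significance and strength differ, in R. The helper p_for_r() is a hypothetical name written for this illustration; it uses the standard t-test for a Pearson correlation (n - 2 degrees of freedom) to get a two-tailed p-value from r and the sample size:

```r
# Two-tailed p-value for a Pearson correlation r in a sample of size n
# (hypothetical helper, based on the t distribution with n - 2 df)
p_for_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# The same weak correlation (r = .10, about 1% of variance explained)
p_small <- p_for_r(r = 0.10, n = 30)    # not significant at .05
p_large <- p_for_r(r = 0.10, n = 1000)  # significant at .05

c(p_small, p_large)
```

The effect size is identical in both cases; only the sample size changes the p-value.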
Correlation – Summary
Correlations in a nutshell
Used to measure the strength of relationship between two variables
Tells us how much of the variance in one variable is explained by another variable
correlation coefficient (r) is the size of the effect in our sample
Value between -1 and +1
+ values are positive relationships
- values are negative relationships
0 means no relationship
Squaring the correlation coefficient reveals what proportion of the variance is explained by the model
p-value is the probability of observing a relationship at least this strong if there’s no real relationship in the population
Making an inference from our sample to the population
Correlations in R
Formatting your data
Each participant on a separate row
Each variable in a separate column
Meaningful variable (column) labels
Save as .csv
Import into your R workspace
I’ll call my data “df”
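A minimal sketch of this layout and import step, assuming made-up data and hypothetical column names. It writes a small .csv to a temporary file so the example runs on its own; with real data you would pass your own file path to read.csv():

```r
# Hypothetical data: one row per participant, one column per variable
example <- data.frame(
  participant      = 1:5,
  hours_since_meal = c(1, 2, 4, 5, 6),
  hunger           = c(2, 3, 6, 6, 9)
)

# Save as .csv (a temporary file here; normally your own file)
csv_path <- tempfile(fileext = ".csv")
write.csv(example, csv_path, row.names = FALSE)

# Import into the R workspace and call the data "df"
df <- read.csv(csv_path)

# Check the structure: rows are participants, columns are variables
str(df)
```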
Correlation Functions
You need 1 function to calculate Pearson’s correlation in R:
cor.test()
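A short usage sketch of cor.test(), assuming a data frame called df with two hypothetical columns, hours_since_meal and hunger (the values are made up for illustration):

```r
# Hypothetical data standing in for an imported .csv
df <- data.frame(
  hours_since_meal = c(1, 2, 3, 4, 5, 6, 7, 8),
  hunger           = c(2, 3, 5, 4, 7, 8, 8, 10)
)

# Pearson's correlation between the two variables
result <- cor.test(df$hours_since_meal, df$hunger, method = "pearson")

result               # full printout: t, df, p-value, confidence interval, r
result$estimate      # correlation coefficient (r)
result$p.value       # statistical significance
result$estimate^2    # proportion of variance explained
```

Pearson is the default method in cor.test(), so method = "pearson" is spelled out here only for clarity.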