Correlation Flashcards
What is a correlation?
The bedrock of a regression analysis.
Definition: assesses the degree to which scores (from a set of respondents) on two variables co-relate (how a change in one variable is associated with a change in the other, without implying that one causes the other)
The types of correlations:
3 kinds
- Pearson’s product-moment correlation coefficient (r)
- Spearman’s rank correlation coefficient (rs)
- Partial correlations - leads into regression
What is a correlation coefficient?
A correlation coefficient provides an index of the extent to which two variables are linearly correlated (i.e. the strength of the linear relationship between the two variables)
Illustrate, using an example, how correlation and correlation coefficients are different things…
Correlation - e.g. to what extent do depression and self-esteem co-relate, is there a linear relationship?
Correlation coefficient - the single number that indexes this, e.g. how strong is the linear relationship between depression and self-esteem?
What is a scatterplot?
A scatterplot plots scores on one variable against scores on another; each point represents a participant. This gives an idea of the relationship between the variables
What is a bivariate association?
It’s what we look at with correlations, the relationship between two variables
How does correlation analysis work? Outline
We work out the correlation coefficient, then say how probable it is that we would get a coefficient like this by chance, assuming that the variables are not at all associated with each other -> we want this probability to be as low as possible
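The outline above can be sketched in code. This is only an illustration, not necessarily the method used in the course: it approximates the "by chance" probability with a permutation test (shuffling one variable to break any association), whereas the p value for a Pearson's is more usually obtained from a t distribution. The function names and the scores are made up for the example.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about each variable's mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p(xs, ys, n_shuffles=2000, seed=0):
    """Approximate two-tailed p: how often shuffled data give an r at least as extreme."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # shuffling removes any real x-y association
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # add 1 to both counts so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical scores for eight participants (invented for illustration)
depression = [2, 4, 5, 7, 8, 10, 11, 13]
self_esteem = [14, 12, 13, 9, 8, 7, 5, 3]

r = pearson_r(depression, self_esteem)
p = permutation_p(depression, self_esteem)
print(f"r = {r:.2f}, approximate p = {p:.4f}")
```

A low p here means that very few of the shuffled data sets produced a coefficient as extreme as the observed one.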
PEARSONS:
Why do we want both variables on a common rubric?
How do we do this?
If the two variables are on different scales it’s like comparing apples and pears -> putting them on the same scale makes the variables easier to compare
Rather than building a new common scale for both variables, we convert the scores on each variable to Z scores
PEARSONS:
What is a Z score?
A standardised score: it expresses each value as a number of standard deviations above or below the mean, which gives variables a common scale to compare them on
Z = (x - μ) / σ
Z = (observed value - mean) / standard deviation
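As a minimal sketch of the formula above, the snippet below converts a small set of made-up scores to Z scores. It assumes the standard deviation that divides by N (`statistics.pstdev`); the helper name `z_scores` is invented for the example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z = (observed value - mean) / standard deviation, for every value."""
    mu = mean(values)
    sigma = pstdev(values)   # assumption: SD computed by dividing by N
    return [(x - mu) / sigma for x in values]

# Made-up scores with mean 5 and standard deviation 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(scores)
print(zs)
```

After the conversion the Z scores have a mean of 0 and a standard deviation of 1, whatever scale the original scores were on.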
PEARSONS:
What is a standard deviation?
A quantity expressing how much the values in a data set typically vary around the mean
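A short sketch of how a standard deviation can be worked out by hand, using made-up scores. It is an illustration only: it shows the version that divides by N next to the sample version that divides by N - 1 (`statistics.stdev`), since the card does not say which one is intended.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)

# Made-up scores for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd_pop = population_sd(scores)   # same calculation as statistics.pstdev
sd_sample = stdev(scores)        # divides by N - 1, so it comes out slightly larger
print(sd_pop, sd_sample)
```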
PEARSONS:
What do we do with Z scores?
For each participant, Zx and Zy are multiplied together, and these products are summed across participants
Σ ZxZy (the sum, over participants, of each participant’s z score on x multiplied by their z score on y)
A perfect positive correlation has a Z sum of (max) N. This happens when every participant’s Zx and Zy are equal.
A perfect negative correlation has a Z sum of (min) -N. This happens when every participant’s Zx and Zy are equal in size but opposite in sign.
N is the sample size (these limits hold when the Z scores are calculated with the standard deviation that divides by N)
So, when expressing the strength of the correlation/ its linearity (r), the Z sum is expressed as a proportion of its maximum possible size, N.
r = Σ ZxZy / N
So when there’s a perfect positive correlation, Σ ZxZy = N and r = N/N = 1. When there’s a perfect negative correlation, Σ ZxZy = -N and r = -N/N = -1.
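The calculation above can be sketched as follows. This is an illustration under one assumption: the Z scores are computed with the standard deviation that divides by N (`statistics.pstdev`), which is what makes the sum of ZxZy reach exactly N or -N for a perfect correlation. The function names and data are invented for the example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise scores using the SD that divides by N."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson_r_from_z(xs, ys):
    """r = (sum over participants of Zx * Zy) / N."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Made-up scores for five participants
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]      # rises perfectly with x
y_down = [10, 8, 6, 4, 2]    # falls perfectly as x rises
y_noisy = [2, 1, 4, 3, 5]    # rises with x, but not perfectly

r_up = pearson_r_from_z(x, y_up)
r_down = pearson_r_from_z(x, y_down)
r_noisy = pearson_r_from_z(x, y_noisy)
print(r_up, r_down, r_noisy)
```

The first two cases hit the limits of +1 and -1, and the third falls in between.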
PEARSONS:
What is r?
r is the correlation coefficient. It ranges from -1 to +1: the closer it is to -1 or +1, the stronger the linear relationship, and 0 means no linear relationship.
Want the p value to be as low as possible - then the result is unlikely to have been found by chance, assuming the null hypothesis is true.
The degrees of freedom used here are N-2
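A hedged sketch of where the N-2 degrees of freedom come in. The card does not spell out the test, so this assumes the usual conversion of r to a t statistic, t = r * sqrt(df / (1 - r^2)), whose p value is then read from a t distribution with df = N-2. The helper name and the example numbers are made up.

```python
from math import sqrt

def t_from_r(r, n):
    """Convert r to a t statistic with df = N - 2 (undefined when r is exactly 1 or -1)."""
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical result: r = 0.8 from a sample of 5 participants
t, df = t_from_r(0.8, 5)
print(f"t({df}) = {t:.2f}")
```

The same r gives a larger t, and so a lower p, as the sample size grows.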
PEARSONS:
Reporting the results of a Pearson’s…
- Significant or not?
- Direction?
- Between? (state variables)
- r(df)=___, p=___, this means…
Use a table where possible. If something is reported in a table it doesn’t need to be repeated in the text - it won’t be negatively marked but it’s a waste of time.
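The checklist above can be turned into a small sentence builder. This is a sketch only: the function name, the .05 significance cut-off, the wording and the example numbers are all assumptions made for illustration, and the number formatting may need adjusting to the style guide in use.

```python
def report_pearson(r, n, p, var_x, var_y, alpha=0.05):
    """Build a one-sentence report: significance, direction, variables, r(df), p."""
    df = n - 2
    significance = "significant" if p < alpha else "non-significant"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"
    return (f"There was a {significance} {direction} correlation between "
            f"{var_x} and {var_y}, r({df}) = {r:.2f}, p = {p:.3f}.")

# Hypothetical result, invented for the example
text = report_pearson(-0.62, 30, 0.001, "depression", "self-esteem")
print(text)
```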
PEARSONS:
Assumptions underlying a Pearson’s correlation?
How can these assumptions be violated?
What do you do if assumptions are violated?
Assumptions:
- Variables must be measured on interval scales (equal spacing between intervals)
- The two variables must be linearly related
Violations:
- non-linear relationship
- outliers
- ordinal data (in order but not equally spaced)
What to do:
If the assumptions are violated then… a non-parametric correlation, which uses rank scores, must be used: Spearman’s rank (rs)
SPEARMAN’S:
Spearman’s rank correlation coefficient/ Spearman’s rho, when is it used?
This correlation can be used when the assumptions of a Pearson’s are violated - when the scatterplot shows a monotonic but non-linear relationship (a curve) between the variables, when there are outliers, or when the data are ordinal
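A minimal sketch of the rank-based idea, assuming the common approach of converting each variable to ranks (tied scores share the average of their ranks) and then running a Pearson's on the ranks. The function names and data are invented for the example.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (lowest); tied values share the mean of the ranks they cover."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r from deviations about each variable's mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman's rs as a Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up monotonic but curved relationship: y is x cubed
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]

rs = spearman_rs(x, y)   # the ranks agree perfectly
r = pearson_r(x, y)      # lower, because the relationship is not a straight line
print(rs, r)
```

Because only the order of the scores matters, the curve and any extreme values have much less influence on rs than on r.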