3. Investigating Associations Flashcards
Investigating Associations
When two measurements are taken on each member of a single sample, two different things can be investigated: whether the means of the two measurements differ and whether the two measurements are associated.
Scatter Plot
A scatter plot is the most obvious way to compare two sets of measurements: plot one against the other. This makes it easier to see whether the measurements are associated.
Limitations to visual examination
An apparent association can arise just by chance, if the points sampled happen to follow a straight line.
In order to determine whether an association is likely to be real, a statistical test needs to be carried out.
- Work out the correlation coefficient.
Correlation Coefficient (r)
This is a single number describing the strength and direction of an association.
-1 -> perfect negative association
+1 -> perfect positive association
0 -> no association
The further r is from 0 and the larger the sample size, the less likely it is that the association is due to chance.
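Calculation:
1) Calculate the means of x and y, $\bar{x}$ and $\bar{y}$.
2) Add up all the products $(x_i - \bar{x})(y_i - \bar{y})$ and divide by a scaling factor:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$
Equivalently, using standardized values:
- $(z_x)_i = (x_i - \bar{x}) / s_x$ and $(z_y)_i = (y_i - \bar{y}) / s_y$ -> calculate a standardized value for each $x_i$ and $y_i$, where $s_x$ and $s_y$ are the sample standard deviations.
- Multiply each pair $(z_x)_i (z_y)_i$ and add the products together.
- Divide by $n - 1$ -> n is the total number of points.
$$r = \frac{1}{n - 1} \sum_i (z_x)_i (z_y)_i$$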
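The calculation above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names and the sample data are made up for the example, and it assumes neither set of measurements is constant. It computes r from the deviations about the means and, as a cross-check, from standardized values divided by n - 1.

```python
import math

def pearson_r(xs, ys):
    """r from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def pearson_r_standardized(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical measurements, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(pearson_r(xs, ys))
print(pearson_r_standardized(xs, ys))
```

Both functions should give the same value up to floating-point rounding, since the n - 1 terms cancel.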
Why does calculating the correlation coefficient work?
positive association:
Most points lie either above and to the right of the centre of the distribution (x̄, ȳ), or below and to the left, so most products (x – x̄)(y – ȳ) are positive and their sum is large and positive.
negative association:
Most points lie either above and to the left of the centre of the distribution, or below and to the right, so most products are negative and their sum is large and negative.
no association:
Points are scattered randomly all around the centre of the distribution.
Positive and negative values of (x – x̄)(y – ȳ) will cancel each other out, so their sum is close to 0.
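A small sketch of this sign argument, using made-up data sets (the names `rising`, `falling` and `scattered` are illustrative only): each product of deviations is positive when a point is up-right or down-left of the centre and negative otherwise, so the sum shows the direction of the association.

```python
def deviation_products(xs, ys):
    """Return (x - mean_x) * (y - mean_y) for every point."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
rising = [1, 2, 3, 4, 5]      # points lie down-left or up-right of the centre
falling = [5, 4, 3, 2, 1]     # points lie up-left or down-right of the centre
scattered = [2, 4, 3, 4, 2]   # points on all sides of the centre

print(sum(deviation_products(xs, rising)))     # 10.0
print(sum(deviation_products(xs, falling)))    # -10.0
print(sum(deviation_products(xs, scattered)))  # 0.0
```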
Carrying out a test
1) Construct a null hypothesis: there is no association between the two measurements.
2) Calculate the correlation coefficient (r).
3) Calculate the probability of obtaining an r this far from 0 just by chance if there is no real association.
4) If the probability is >0.05, there is a greater than 5% probability of this happening by chance => no evidence to reject the null hypothesis.
If the probability is <0.05, there is a less than 5% probability of this happening by chance => evidence to reject the null hypothesis; the two measurements can be said to be significantly correlated.
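One way to estimate the probability in step 3 without statistical tables is a permutation test. This is a swapped-in technique for illustration, not necessarily the method these notes intend. The sketch shuffles one set of measurements many times, which breaks any real association, and counts how often the shuffled r is at least as far from 0 as the observed r. The function names, the number of shuffles, the seed and the data are all assumptions made for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def permutation_p_value(xs, ys, n_shuffles=10000, seed=0):
    """Estimate the two-sided probability of |r| this large by chance."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # pair each x with a randomly chosen y
        # Small tolerance guards against floating-point rounding.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical measurements, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]

p = permutation_p_value(xs, ys)
if p < 0.05:
    print("evidence to reject the null hypothesis, p =", p)
else:
    print("no evidence to reject the null hypothesis, p =", p)
```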
Cautions about using correlations
Correlation analysis should not be used if one measurement is an independent variable, meaning that it cannot be altered by the other.
Examples:
a) Measuring cell mass at different times (time is not affected by cell mass).
b) One variable is controlled by the experimenter, e.g. measuring the metabolic rate of cells at set temperatures.
If there is a correlation, it does not necessarily imply a causal relationship between the two sets of measurements. E.g. a correlation between heart rate and blood pressure would not show that heart rate controls blood pressure.
The size of the correlation coefficient doesn’t reflect the slope of the relationship, just how closely the points follow a straight line.
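A quick sketch of the difference between slope and closeness, using invented data. The least-squares slope is brought in here only to put a number on "slope"; it is an assumption of this example, not part of the notes. A shallow but perfectly straight relationship gives r = 1, while a much steeper but noisier one gives a smaller r.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    return sum_xy / sum_xx

xs = [1, 2, 3, 4, 5]
shallow_exact = [0.1 * x for x in xs]  # small slope, points exactly on a line
steep_noisy = [10, 35, 20, 55, 40]     # large slope, points well off the line

for name, ys in [("shallow_exact", shallow_exact), ("steep_noisy", steep_noisy)]:
    print(name, "slope:", least_squares_slope(xs, ys), "r:", pearson_r(xs, ys))
```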
Correlation is only useful for linear relationships, so always look at graphs of the results first.
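As an illustration of why the graph matters, the sketch below uses made-up data in which y depends completely on x (y = x squared) but the relationship is a curve rather than a straight line. The correlation coefficient comes out at essentially 0, so r on its own would wrongly suggest no association.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # perfectly determined by x, but not linear

# Positive and negative products cancel, so r is (essentially) zero.
print(pearson_r(xs, ys))
```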