Week 3: Correlation Flashcards
A general approach is that our outcomes can be predicted by a model and what remains
is the error
The i in the general model shows
that each entity i has its own outcome and its own error, e.g., outcome 1 is equal to the model plus error 1, outcome 2 is equal to the model plus error 2, and so on…
For correlation, the outcome is modelled by
scaling (multiplying by a constant) another variable
Equation of correlation
$\text{outcome}_i = (b_1 X_i) + \text{error}_i$
What does this equation of correlation mean and what does b1 mean? - (2)
‘the outcome for an entity is predicted from their score on the predictor variable plus some error’.
The model is described by a parameter, b1, which in this context represents the relationship between the predictor variable (X) and the outcome.
If you have one continuous predictor variable and a continuous outcome that meet the assumptions of parametric tests, you can conduct a
Pearson correlation or regression
Variance is a feature of the outcome measurements we have obtained, which we want to predict with a model that
captures the effect of the predictor variables we have manipulated or measured
Variance of a single variable represents the
average amount that the data vary from the mean
Variance is the standard deviation
squared (s squared)
Variance formula - (2)
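Subtract the mean of all participants’ scores from each participant’s score (x_i minus x̄) and square the result
Do this for every participant, add the squared deviations together (Σ) and divide by the number of participants minus 1: $s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1}$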
Variance is the SD squared, meaning that it captures the
average of the squared differences between the outcome values and the mean of all outcomes (this is what the variance formula computes)
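A minimal Python sketch of the variance formula, assuming a short list of illustrative scores (the numbers are an assumption, not data from these notes). It squares each deviation from the mean, sums them and divides by N - 1; the standard deviation is then the square root of that value.

```python
from math import sqrt


def variance(scores):
    """Sample variance: sum of squared deviations from the mean, divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)


def standard_deviation(scores):
    """The SD is the square root of the variance."""
    return sqrt(variance(scores))


scores = [5, 4, 4, 6, 8]  # illustrative scores (assumption)
print(round(variance(scores), 2))            # 2.8
print(round(standard_deviation(scores), 2))  # 1.67
```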
Covariance gathers information on whether
one variable covaries with another
If we are interested in whether two variables are related, we are interested in whether changes in one variable are met with similar changes in the other
therefore... - (2)
when one variable deviates from its mean we
would expect the other variable to deviate from its mean in a similar way.
So, if one variable increases, the other, related variable should also increase (or, for a negative relationship, decrease) by a similar amount.
The simplest way to look at whether two variables are associated is to look at whether they... which means...
covary
look at the relationship between the 2 variables
If one variable covaries with another variable then it means these 2 variables are
related
To get the SD from the variance, you would
take the square root of the variance
What does the covariance formula do, in words? - (5)
- Calculate the error (deviation) between the mean and each subject’s score for the first variable (x).
- Calculate the error between the mean and their score for the second variable (y).
- Multiply these error values together.
- Add these products up to get the sum of cross-product deviations.
- The covariance is the average of the cross-product deviations (the sum divided by N - 1).
Example of calculating covariance: what does the answer tell you?
The answer is positive: that tells us the x and y values tend to rise together.
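The five steps can be written as a short Python function. The data below are an assumption: five hypothetical participants’ adverts watched (x) and packets of crisps bought (y), chosen so that the result matches the covariance of 4.25 quoted in these cards. This is a sketch of the textbook formula, not any particular package’s implementation.

```python
def covariance(x, y):
    """Sample covariance: the average cross-product deviation, dividing by N - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Steps 1 and 2: deviation of each score from its own variable's mean
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Step 3: multiply each participant's pair of deviations
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Step 4: sum of cross-product deviations
    sum_of_products = sum(products)
    # Step 5: average, dividing by N - 1 because the means are estimated from the sample
    return sum_of_products / (n - 1)


adverts = [5, 4, 4, 6, 8]     # hypothetical x scores (assumption)
packets = [8, 9, 10, 13, 15]  # hypothetical y scores (assumption)
print(round(covariance(adverts, packets), 2))  # 4.25
```

A positive result means participants who were above (or below) the mean on one variable tended to be above (or below) the mean on the other.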
What does each element of covariance formula stand for? - (5)
x = the value of the ‘x’ variable
y = the value of the ‘y’ variable
x̄ (x-bar) = the mean of ‘x’, e.g., the green line
ȳ (y-bar) = the mean of ‘y’, e.g., the blue line
N = the number of items in the data set
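Covariance will be large (and positive) when values below
the mean for one variable are paired with values below the mean for the other variable (and values above the mean with values above the mean)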
What does a positive covariance indicate?
as one variable deviates from the mean, the other
variable deviates in the same direction.
What does this diagram show? - (5)
- Green line is the average number of packets bought
- Blue line is the average number of adverts watched
- Vertical lines represent the deviations (residuals) between the observed values (circles) and the means
- There is a similar pattern of deviations for both variables: when a person’s score is below the mean for one variable, their score on the other variable is below the mean too
- We quantify this similarity between the two variables by calculating the covariance: the sum of the cross-product deviations (products of the deviations of the two variables) divided by the number of observations minus 1
- We divide by N - 1 because we do not know the true population mean (it is estimated from the sample); this relates to degrees of freedom.
What does negative covariance indicate?
a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
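A quick check of the sign of covariance, using made-up data (an assumption): one y variable that tends to rise with x and one that falls as x rises. The function is a compact sketch of the same formula (cross-product deviations divided by N - 1).

```python
def covariance(x, y):
    """Sample covariance: sum of cross-product deviations divided by N - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


x = [1, 2, 3, 4, 5]        # hypothetical scores (assumption)
rising = [2, 4, 5, 4, 5]   # tends to deviate in the same direction as x
falling = [5, 4, 3, 2, 1]  # deviates in the opposite direction to x

print(covariance(x, rising) > 0)   # True: positive covariance
print(covariance(x, falling) < 0)  # True: negative covariance
```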
What is the problem of covariance as a measure of the relationship between 2 variables? - (5)
It depends on the units/scales of measurement used
So covariance is not a standardised measure
e.g., if two variables are measured in miles and their covariance is 4.25, then converting the data to kilometres and recalculating the covariance increases it to 11.
Dependence on the measurement scale is a problem because we cannot compare covariances objectively: we cannot say whether a covariance is large or small relative to another data set unless both data sets are measured in the same units
So we need to STANDARDISE it.
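The miles-to-kilometres example can be checked numerically. This sketch assumes hypothetical distances in miles (chosen so the covariance is 4.25) and shows that converting both variables changes the covariance, while dividing by the product of the two SDs gives a value that does not change.

```python
from math import sqrt

MILES_TO_KM = 1.609344  # kilometres in one international mile


def covariance(x, y):
    """Sample covariance: sum of cross-product deviations divided by N - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Standardised covariance: divide by the product of the two SDs."""
    # covariance(v, v) is just the variance of v, so its square root is the SD
    return covariance(x, y) / (sqrt(covariance(x, x)) * sqrt(covariance(y, y)))


# Hypothetical distances in miles (assumption), chosen to give a covariance of 4.25
x_miles = [5, 4, 4, 6, 8]
y_miles = [8, 9, 10, 13, 15]
x_km = [v * MILES_TO_KM for v in x_miles]
y_km = [v * MILES_TO_KM for v in y_miles]

print(round(covariance(x_miles, y_miles), 2))  # 4.25
print(round(covariance(x_km, y_km), 2))        # 11.01
print(round(pearson_r(x_miles, y_miles), 2), round(pearson_r(x_km, y_km), 2))  # 0.87 0.87
```

Because both variables are converted, the covariance is multiplied by 1.609344² (about 2.59), whereas the standardised value is identical in either unit.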
What is the process of standardisation?
To overcome the problem of dependence on the measurement scale, we need to convert
the covariance into a standard set of units
How to standardise the covariance?
by dividing the covariance by the product of the standard deviations of both variables.
Formula of standardising covariance
Same formula as covariance, but divided by the product of the SD of x and the SD of y: $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$
Formula of Pearson’s correlation coefficient, r
$r = \frac{\text{cov}_{xy}}{s_x s_y}$
Example of calculating Pearson’s correlation coefficient, r - (5)
The standard deviation of the number of adverts watched (sx) was 1.67.
The SD of the number of packets of crisps bought (sy) was 2.92.
If we multiply these together we get 1.67 × 2.92 = 4.88.
Now all we need to do is take the covariance, which we calculated earlier as 4.25, and divide it by these multiplied standard deviations.
This gives us r = 4.25 / 4.88 = .87.
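The same arithmetic in Python, as a sketch. The raw scores are an assumption (five hypothetical participants chosen because they reproduce the SDs of 1.67 and 2.92 and the covariance of 4.25 used above); only the summary numbers come from these notes.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation (divides by N - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def covariance(x, y):
    """Sample covariance (divides by N - 1)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


adverts = [5, 4, 4, 6, 8]     # hypothetical scores (assumption)
packets = [8, 9, 10, 13, 15]  # hypothetical scores (assumption)

s_x, s_y = sd(adverts), sd(packets)
cov_xy = covariance(adverts, packets)
r = cov_xy / (s_x * s_y)  # standardise the covariance
print(round(s_x, 2), round(s_y, 2), round(s_x * s_y, 2), round(r, 2))  # 1.67 2.92 4.88 0.87
```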
The standardised version of covariance is the
correlation coefficient, or Pearson’s r
Pearson’s r is the … version of covariance, meaning it is independent of the units of measurement
standardised
What does correlation describe? - (2)
Describes a relationship between variables
If one variable increases, what happens to the other variable?
Pearson’s correlation coefficient r is also called the
product-moment correlation
A linear relationship, normally distributed data and interval/ratio (continuous) data are assumed in
Pearson’s r correlation coefficient
Pearson Correlation Coefficient varies between
-1 and +1 (direction of relationship)
The larger the absolute value of r, the closer the data points will
be to each other and to the straight line that best fits them
Smaller values of r indicate
more unexplained variance in the data, so the data points are more spread out.
What do these two graphs show? - (2)
- The graph on the left is an example of a high negative correlation: the data points are close together and close to the line of best fit.
- On the other hand, the graph on the right shows a low positive correlation: the data points are more spread out and deviate more from the line of best fit.
The Pearson correlation coefficient measures the strength of the relationship
between one variable and another, hence its use as a measure of effect size
A Pearson’s correlation coefficient of +1 indicates
the two variables are perfectly positively correlated, so as one variable increases, the other increases by a
proportionate amount.
A Pearson’s correlation coefficient of -1 indicates
a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount.
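A sketch showing that an exactly proportionate relationship gives r = +1 or -1. The data are made up (an assumption): y_up rises by 2 for every 1-unit rise in x, and y_down falls by 0.5. The N - 1 terms in the covariance and the two SDs cancel, so the function works with sums directly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of deviations (the N - 1 terms cancel out)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]                   # hypothetical scores (assumption)
y_up = [2 * xi + 3 for xi in x]       # increases by a proportionate amount
y_down = [10 - 0.5 * xi for xi in x]  # decreases by a proportionate amount

print(round(pearson_r(x, y_up), 6))    # 1.0
print(round(pearson_r(x, y_down), 6))  # -1.0
```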
Pearson’s r of ±0.1 means
small effect
Pearson’s r of ±0.3 means
medium effect
Pearson’s r of ±0.5 means
large effect
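The 0.1 / 0.3 / 0.5 values are rough benchmarks rather than strict cut-offs. A hypothetical helper, `effect_size_label`, that treats each value as the lower bound of its band (an assumption) and ignores the sign of r makes the mapping explicit.

```python
def effect_size_label(r):
    """Label |r| with the 0.1 / 0.3 / 0.5 benchmarks (each treated as a lower bound)."""
    size = abs(r)  # the sign gives direction, not size
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"


for r in (0.87, -0.35, 0.12, 0.05):  # hypothetical coefficients
    print(r, effect_size_label(r))
```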
In Pearson’s correlation, we can test the hypothesis that - (2)
correlation coefficient is different from zero
(i.e., different from ‘no relationship’)
In Pearson’s correlation coefficient, we can test the hypothesis that the correlation is different from 0
If we find that our observed coefficient would be very unlikely to occur if there were no effect in the population, we gain confidence that
the relationship that
we have observed is statistically meaningful.
There are 2 ways to test this hypothesis
- z-scores
- t-statistic
From z-scores we can find the probability of a given z-score occurring if the distribution it comes from is
normal
The problem with using z-scores for Pearson’s r is that its sampling distribution is known not to be
normally distributed
- This can be fixed by adjusting r so that its sampling distribution is normal (Fisher’s transformation): $z_r = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right)$
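Steps to calculate a z-score for Pearson’s r
- Convert r to z_r: $z_r = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right)$
- Calculate the standard error of z_r: $SE_{z_r} = \frac{1}{\sqrt{N - 3}}$
- Divide z_r by its standard error: $z = \frac{z_r}{SE_{z_r}}$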
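These steps can be sketched in Python with the standard `math` module. The values r = .87 and N = 30 are hypothetical (an assumption); the two-tailed p-value uses the standard normal identity P(|Z| ≥ |z|) = erfc(|z| / √2).

```python
from math import erfc, log, sqrt


def fisher_z(r):
    """Fisher's transformation of r: z_r = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * log((1 + r) / (1 - r))


def z_test_for_r(r, n):
    """Return (z_r, SE, z, two-tailed p) for testing whether r differs from 0."""
    z_r = fisher_z(r)
    se = 1 / sqrt(n - 3)  # standard error of z_r
    z = z_r / se          # z-score for the observed correlation
    # Two-tailed p from the standard normal distribution
    p = erfc(abs(z) / sqrt(2))
    return z_r, se, z, p


# Hypothetical values (assumption): r = .87 from N = 30 participants
z_r, se, z, p = z_test_for_r(0.87, 30)
print(round(z_r, 3), round(se, 3), round(z, 3), p)
```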
The hypothesis that the correlation coefficient is different from 0 can also be tested using a t-statistic with N - 2 degrees of freedom: $t_r = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$
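A sketch of the t-statistic for r, again with hypothetical values (r = .87, N = 30; an assumption). Turning t into a p-value needs the t-distribution, which the Python standard library does not provide, so the sketch stops at t and its degrees of freedom; these could then be compared with a critical value from t-tables or from a statistics package such as SciPy.

```python
from math import sqrt


def t_for_r(r, n):
    """t-statistic for the hypothesis that the correlation differs from 0 (df = N - 2)."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


t, df = t_for_r(0.87, 30)  # hypothetical r and N (assumption)
print(round(t, 3), df)
```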
When computing Pearson’s correlation coefficient, r, SPSS does not compute
confidence intervals for r
Confidence intervals tell us something about the
likely correlation in the population
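We can calculate confidence intervals for Pearson’s correlation coefficient by applying the usual CI formula to the transformed value z_r, i.e. $z_r \pm 1.96 \times SE_{z_r}$ for a 95% CI, and then transforming the limits back into r with $r = \frac{e^{2z_r} - 1}{e^{2z_r} + 1}$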
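A minimal sketch of that procedure, assuming hypothetical values r = .87 and N = 30 and the usual 1.96 multiplier for a 95% interval: transform r to z_r, build the interval on the z scale, then convert both limits back to r.

```python
from math import exp, log, sqrt


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate CI for r via Fisher's z (z_crit = 1.96 gives a 95% interval)."""
    z_r = 0.5 * log((1 + r) / (1 - r))  # transform r to z_r
    se = 1 / sqrt(n - 3)                # standard error of z_r
    lower_z, upper_z = z_r - z_crit * se, z_r + z_crit * se

    def back_to_r(z):
        # inverse of Fisher's transformation: (e^(2z) - 1) / (e^(2z) + 1)
        return (exp(2 * z) - 1) / (exp(2 * z) + 1)

    return back_to_r(lower_z), back_to_r(upper_z)


lower, upper = r_confidence_interval(0.87, 30)  # hypothetical r and N (assumption)
print(round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the z scale rather than the r scale, the limits for r are not equally far from .87.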