Correlation Flashcards
How can we assess the relationship/correlation between two variables?
Pictorially - Scatterplot. Useful when there is a wide range of scores or a large sample size. Allows you to see the level of association between variables
Numerically - Correlation coefficient
What happens when there is a negative association between two variables?
Higher values on variable A correspond to lower values on variable B
As one variable deviates from the mean, the other deviates from the mean in the opposite direction.
What happens when there is a perfect positive association between two variables?
Higher values on variable A correspond perfectly to higher values on variable B.
As one variable deviates from the mean, the other variable deviates in the same direction.
What happens when there is a perfect negative association between two variables?
Higher values on variable A perfectly correspond to lower values on variable B
What happens when there is no association between two variables?
Higher values on variable A correspond to either high or low values on variable B
What happens when there is a non-linear association between two variables?
- There is an association, but it is not linear
- E.g. if practice too much then performance decreases
What does the strength of a relationship refer to?
How closely bunched around the imaginary line the dots are in the scattergraph
What are the two axes on a scattergram called?
- Ordinate and abscissa
- Vertical and horizontal axis
- Y-axis and x-axis
What does the direction of a relationship refer to?
- If the points run upward from bottom left to top right = positive relationship
- If the points run downward from top left to bottom right = negative relationship
What does the form of a relationship refer to?
- Linear = Straight line cutting across dots fits data nicely
- Non-linear = If a curve fits better
Correlation coefficient
Numerical way of expressing a linear relationship
Tells you how strong the relationship is between variables
What are the steps for calculating Pearson’s r?
- Transform raw scores into z-scores
- Multiply the two z-scores for each participant
- Sum all of the products of the paired z-scores, then divide the result by the number of cases - 1
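The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the data are hypothetical, and the z-scores are assumed to use the sample standard deviation (the one that divides by N - 1), which is what makes the result fall between -1 and 1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r as the sum of paired z-score products divided by N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample SD (divides by N - 1): an assumption
    # Step 1: transform raw scores into z-scores
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    # Step 2: multiply the two z-scores for each participant
    products = [a * b for a, b in zip(zx, zy)]
    # Step 3: sum the products and divide by the number of cases - 1
    return sum(products) / (len(xs) - 1)

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 3))
```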
Provide an example of a negative relationship from Pearson’s r
-0.61
Provide an example of no relationship from Pearson’s r
0.00
Provide an example of a small positive relationship from Pearson’s r
0.06
Pearson’s r formula
The mean of the products of paired z-scores
Numerator = Sum of all the products of the paired z-scores
Denominator = Number of cases - 1
Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation). A value of 0 implies no linear correlation. The further the value is from 0, the stronger the relationship.
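Written out in symbols, the definition above looks roughly as follows. The notation is an assumption made for illustration: N is the number of cases, and each z-score is taken to use the sample standard deviation s (the one that divides by N - 1).

```latex
r = \frac{\sum_{i=1}^{N} z_{X_i}\, z_{Y_i}}{N - 1},
\qquad
z_{X_i} = \frac{X_i - \bar{X}}{s_X},
\qquad
z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y},
\qquad
-1 \le r \le 1
```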
Which values can the Pearson’s r take?
From -1 to 1
Undefined Correlation
- When you assess the correlation between two variables and one of them is constant.
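- The points on the scatterplot will fall on a perfectly horizontal line (or a perfectly vertical one if the constant variable is on the x-axis)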
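A short sketch of why the correlation is undefined here: a constant variable has a standard deviation of 0, so its z-scores would require dividing by zero. The helper name `pearson_r_or_none` and the data are hypothetical, and returning `None` is just one way to signal the undefined case.

```python
from statistics import mean, stdev

def pearson_r_or_none(xs, ys):
    """Return Pearson's r, or None when either variable has zero variance."""
    sx, sy = stdev(xs), stdev(ys)
    if sx == 0 or sy == 0:
        return None  # z-scores cannot be computed: division by zero
    mx, my = mean(xs), mean(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical data: y never changes, so r is undefined.
x = [1, 2, 3, 4, 5]
y_constant = [7, 7, 7, 7, 7]
result = pearson_r_or_none(x, y_constant)
print(result)
```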
One-tailed
Hypotheses specify direction
Two-tailed
Hypotheses don’t specify direction
How do you report the results of correlation?
- Describe magnitude of relationship, direction, whether statistically significant
- E.g. “There was a strong, positive, statistically significant relationship between class test scores and number of revision scores; r(10) = 0.82, p = 0.001.”
What happens when the relationship between two variables departs from linearity?
When it is U-shaped - Pearson’s r will underestimate the correlation between X and Y
When the departure is serious - Pearson’s r is not appropriate
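A small sketch of the U-shaped case, using invented data in which Y is completely determined by X (Y = X squared) and yet Pearson's r comes out at zero. The helper `pearson_r` is hypothetical. It uses sums of deviation products, which is algebraically equivalent to the z-score method.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r from sums of deviation products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical U-shaped data: y depends perfectly on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]
r_u_shaped = pearson_r(x, y)
print(r_u_shaped)
```

The positive and negative halves of the curve cancel out, so r badly understates how strongly the two variables are related.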
What happens when you calculate the correlation for two variables using a restricted range of scores?
Correlation is…
- Reduced
- Inflated
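The sketch below illustrates only the "reduced" case, using invented data. The correlation is computed once over the full range of X and once over the middle of the range. The helper `pearson_r` and the cut-offs 4 and 7 are assumptions chosen for the example. Other data sets can show the opposite (inflated) effect.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r from sums of deviation products (equivalent to the z-score method)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 3, 2, 5, 3, 7, 5, 8, 7, 10]

r_full = pearson_r(x, y)

# Keep only the cases whose X score falls in the middle of the range (4 to 7).
kept = [(a, b) for a, b in zip(x, y) if 4 <= a <= 7]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 2), round(r_restricted, 2))
```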
What happens to the value of Pearson’s r when there are outliers?
Value of r will be inflated or reduced
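As a rough illustration with invented data, a single extreme case can change r a great deal. In the sketch below one outlier turns a strong positive correlation into a negative one. The helper `pearson_r` is hypothetical. An outlier lying along the trend could instead inflate r.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r from sums of deviation products (equivalent to the z-score method)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data without an outlier.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_clean = pearson_r(x, y)

# The same data plus one extreme case far below the trend.
r_with_outlier = pearson_r(x + [6], y + [-10])

print(round(r_clean, 2), round(r_with_outlier, 2))
```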
When does a perfect positive correlation occur?
r = 1
X and Y distributions have exactly the same shape.
Skewed Distribution
A distribution where the most frequently occurring scores are clustered at one end of the scale
Not a symmetrical distribution
Kurtosis
Peakedness of a distribution
Distributions can vary in terms of how peaked they are.
What should you do if the assumptions for the test of inference regarding Pearson’s r are not satisfied?
- Ignore if violations are not severe
- Lower alpha level if not severe
- Use tests designed for data that are not interval if severe violations.
What happens to the correlation in a population in different hypotheses?
Alternative hypothesis = Correlation in the population is different from 0
Null hypothesis = Correlation in the population is 0
When Pearson’s r is being used to measure the degree of correlation between 2 variables in a sample, what assumptions need to be satisfied?
Data measured on interval scale
When Pearson’s r is being used to measure the degree of correlation between 2 variables in a population, what assumptions need to be satisfied?
- Each variable is normally distributed
- Each variable is normally distributed at all levels of the other variable.
Is there a difference between a population and a sample? Explain why.
Sample = Actual participants in the study. Doesn’t necessarily represent the population at large.
Population = Broader group of people to whom you intend to generalise the results of the study.
What does a statistically significant correlation between two variables suggest?
Two variables are unlikely to be uncorrelated in the population.
Does not concern the strength of the correlation.
It addresses the question of whether you can believe the correlation is real.
When do you look at r?
- To see the strength of the correlation
- The larger the sample size, the smaller the value of r you need in order to obtain statistical significance.
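One way to see the link between sample size and significance is the standard t-test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below holds r fixed at a hypothetical .30 and varies n. The function name `t_from_r` and the sample sizes are assumptions chosen for illustration.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Same r, three hypothetical sample sizes: t grows as n grows.
for n in (10, 30, 100):
    print(n, round(t_from_r(0.30, n), 2))
```

Because t grows with n while the critical value for a two-tailed test at alpha = .05 stays near 2 once n is moderately large, the same r = .30 can be non-significant in a small sample and significant in a large one.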
Cohen (1988)
Suggested the following values for the strength of a correlation coefficient.
- .10 = Small correlation
- .30 = Moderate correlation
- .50 = Large correlation
R-squared
Another way to judge the size of a correlation.
Multiply r-squared by 100 to obtain the percentage of variance that X and Y share in common.
Correlation of .40 means 16% of variance shared in common by X and Y.
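The arithmetic behind the .40 example can be checked with a couple of lines of Python. The function name `shared_variance_percent` is hypothetical. Rounding is used only to hide floating-point noise.

```python
def shared_variance_percent(r):
    """Percentage of variance shared by X and Y: r squared times 100."""
    return (r ** 2) * 100

percent = round(shared_variance_percent(0.40), 1)
print(percent)  # 16.0
```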
Does a statistically significant result tell us that X has a strong effect on Y?
No
Does not say what the size of the effect is likely to be in the population.
Tells us that the effect is unlikely to be a null effect.
Does p = .03 mean that the null hypothesis has a 3% chance of being true and the research hypothesis has a 97% chance of being correct?
No
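The p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. It gives no information about the probability that either hypothesis is correct.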
What does a non-significant difference mean?
A null effect in the population is statistically consistent with the observed results
What is the best way to assess normality?
Histogram
Allows you to see whether the distribution deviates from normality
Name the factors which can affect the size of the correlation coefficient
- They hide the true nature of the relationship between the two variables being correlated, misleading the researcher
- Inverted U-shaped relationship
- Restricted range
- Outliers
- Shape of the X and Y distributions
When does a perfect negative correlation occur?
r = -1
Can only occur when X and Y distributions have exactly the same shape or when X and Y distributions are oppositely skewed.
Variance
The average of the squared deviations of the data from the mean
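A minimal sketch of the calculation, using invented data and Python's standard `statistics` module. Two versions are shown because textbooks differ: `pvariance` divides the sum of squared deviations by N, while `variance` divides by N - 1 (the sample estimate).

```python
from statistics import mean, pvariance, variance

# Hypothetical scores, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(data)
squared_deviations = [(score - m) ** 2 for score in data]

population_variance = sum(squared_deviations) / len(data)    # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by N - 1

# The library functions give the same answers as the manual calculation.
print(population_variance, pvariance(data))
print(round(sample_variance, 3), round(variance(data), 3))
```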