Term 2: Lecture 8 Correlations and Linear Regression Flashcards
Relationships between variables: What's the difference between
association and correlation?
What are they?
What levels of data are they appropriate for?
Association:
• When two variables are related to one another, in the sense that they vary together
• Appropriate for nominal, ordinal, interval or ratio-level variables.
Correlation:
• A correlation is a linear association between variables: a relationship that can be represented by a straight line.
It can be measured by Pearson’s correlation coefficient (Pearson’s r). Pearson’s r is appropriate for interval and ratio-level variables.
A correlation is…
a linear association between variables: a relationship that can be represented by a straight line.
What can a correlation be measured by? And what level of data is it appropriate for?
It can be measured by Pearson’s correlation coefficient (Pearson’s r). Pearson’s r is appropriate for interval and ratio-level variables.
An association is…
And what levels of data is it appropriate for?
When two variables are related to one another, in the sense that they vary together.
Nominal, ordinal, interval or ratio levels.
What is Pearson’s r?
What does it range between?
What numbers denote perfect positive, perfect negative, and no correlation?
Is a measure of a linear relationship
(correlation) between two variables
• Ranges between -1 and 1
• Tells us how well the data fit a straight line
r = 1 → a perfect positive correlation
r = –1 → a perfect negative correlation
r = 0 → no correlation
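As a rough sketch of what r measures, the function below computes Pearson's r from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by both spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares for x
    syy = sum(b * b for b in dy)              # sum of squares for y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # perfect positive: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))  # perfect negative: -1.0
print(pearson_r(x, [3, 1, 2, 1, 3]))   # U-shaped: 0.0, no *linear* relationship
```

The last case is a reminder that r = 0 only rules out a straight-line relationship: the U-shaped data are clearly related, just not linearly.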
The correlation coefficient (r) is a ds (like the mean or the standard deviation).
So what do we need to be careful of?
What is it subject to?
What do we need?
descriptive statistic
Therefore, we need to be careful when drawing conclusions from a correlation coefficient computed from sample data:
the correlation coefficient is subject to random sampling error.
We need a significance test.
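A small simulation can make "random sampling error" concrete. The sketch below assumes a made-up population in which X and Y are independent standard normal variables (so the true correlation is exactly 0), draws many samples of size 20, and records the sample r each time. The names `simulate_sample_rs`, `n`, `reps` and the seed are hypothetical choices for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_sample_rs(n, reps, seed=1):
    """Sample r for `reps` samples of size n from a population with zero correlation."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
        rs.append(pearson_r(x, y))
    return rs

rs = simulate_sample_rs(n=20, reps=1000)
# The true correlation is 0, yet individual samples give r well away from 0
print(f"smallest r = {min(rs):.2f}, largest r = {max(rs):.2f}")
```

Because some samples produce sizeable r values purely by chance, a single sample r cannot be taken at face value, which is why a significance test is needed.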
What would the null hypothesis of a correlation analysis be?
The two variables are linearly independent in the population.
That is, the hypothesis we will be testing is that the correlation
is zero in the population.
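One standard way to test this null hypothesis (assumed here; the lecture does not name the formula) uses t = r√(n − 2) / √(1 − r²) on n − 2 degrees of freedom. The sketch below computes t and a two-sided p-value using only the standard library, via the identity P(|T| > |t|) = I_x(df/2, 1/2) with x = df / (df + t²), where I_x is the regularised incomplete beta function, integrated numerically here. The function names and the `steps` parameter are hypothetical; it requires |r| < 1 and n ≥ 4.

```python
import math

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value for Student's t with df degrees of freedom (df >= 2)."""
    if df < 2:
        raise ValueError("this sketch needs df >= 2")
    a, b = df / 2.0, 0.5
    x = df / (df + t * t)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint rule: never evaluates at u = 0
        # Beta(a, b) density at u, computed on the log scale to avoid overflow
        total += math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1 - u) - log_beta)
    return total * h

def correlation_test(r, n):
    """Test H0: population correlation = 0, returning (t, df, two-sided p)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df, t_two_sided_p(t, df)

t, df, p = correlation_test(0.222, 184)  # r and n from the confidence-interval card
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

In practice a statistics package does this for you; the point of the sketch is only to show that the test depends on both r and n.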
Correlations: Statistical significance and strength
It is important to distinguish the SS of a correlation from the S of a correlation.
statistical significance
strength
Statistical significance means…
This says X about the strength of the correlation.
“we have evidence against the
null hypothesis that the correlation is zero in the population”.
nothing
(A correlation may be non-zero, but small)
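To see why significance says nothing about strength, the sketch below keeps the same small correlation (r = 0.10) and varies only the sample size. It uses the t statistic t = r√(n − 2) / √(1 − r²) and, as a simplifying assumption, the large-sample two-sided 5% cut-off of 1.96; for small n the exact t critical value is somewhat larger. The sample sizes are made up for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: population correlation = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

for n in (30, 100, 1000):
    t = t_statistic(0.10, n)
    # 1.96 is the normal (large-sample) cut-off; an approximation for small n
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"r = 0.10, n = {n}: t = {t:.2f} -> {verdict} at the 5% level")
```

The same small r becomes "significant" once n is large enough, even though the strength of the relationship has not changed at all.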
The following values are often used to evaluate the strength (the effect size) of the correlation coefficient:
Small
Medium
Large
Small .10
Medium .30
Large .50
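A minimal sketch applying these benchmarks, assuming each value acts as a lower bound and that the sign of r is ignored (strength depends on |r| only). The function name and the "below small" label are hypothetical.

```python
def strength_label(r):
    """Label the strength of a correlation using the .10 / .30 / .50 benchmarks."""
    size = abs(r)  # a negative r is just as strong as a positive one
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"

print(strength_label(0.222))  # small
print(strength_label(-0.45))  # medium
```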
Confidence Interval for a Correlation Coefficient
It is possible to calculate confidence intervals for a correlation coefficient.
The 95% confidence interval for the correlation between Psych Distress at 16 and Psych Distress at 34 is (0.080 to 0.355). How do we interpret this interval?
For a given pe, the confidence interval of a Pearson correlation will be X, the X the sample size.
Note: SPSS does not have an automatic function to calculate confidence intervals for Pearson correlations. There are online calculators that can work out confidence intervals given the PE (here: r = 0.222) and the SS (here: n = 184).
We are 95% confident that the interval between 0.080 and 0.355 contains the true correlation coefficient.
point estimate
narrower
larger
Point estimate
Sample size
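Calculators of this kind commonly use Fisher's z transformation: transform r to z = atanh(r), build a normal interval with standard error 1/√(n − 3), then transform back with tanh. We assume that method here; it is not stated in the lecture, but with r = 0.222 and n = 184 it reproduces the interval 0.080 to 0.355 quoted in the card. The function name `pearson_ci` and the `level` parameter are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Confidence interval for Pearson's r via Fisher's z transformation."""
    z = math.atanh(r)                            # point estimate on the z scale
    se = 1 / math.sqrt(n - 3)                    # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2) # about 1.96 for a 95% interval
    lo, hi = z - crit * se, z + crit * se
    return math.tanh(lo), math.tanh(hi)          # back-transform to the r scale

lo, hi = pearson_ci(0.222, 184)
print(f"95% CI: {lo:.3f} to {hi:.3f}")  # 95% CI: 0.080 to 0.355
```

Calling `pearson_ci(0.222, 1000)` keeps the same point estimate but gives a narrower interval, in line with the card: for a given point estimate, the larger the sample size, the narrower the interval.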
What are the assumptions of Pearson’s r?
When do we not need to make assumptions? Why?
When do we need to make assumptions?
CI ST
What do we assume? BND
When we only use r to describe the data, we do not make any assumptions about the distribution of the two variables, X and Y, whose correlation we measure.
X and Y don’t need to be normally distributed for Pearson’s r to be a meaningful measure of correlation, as it only measures the linear relationship between two variables.
When we want to calculate a confidence interval for, or perform a significance test on, the correlation.
We then assume that the two variables follow a
bivariate normal distribution. If X and Y are jointly bivariate normal, then each of them is normally distributed; the reverse does not always hold, since two individually normal variables need not be jointly bivariate normal.
Alternative to Pearson’s r
When would you use it?
want to carry out ST
LoD
NND
If X and Y are ordinal measures (rather than interval or ratio), or if either X or Y is not normally distributed, but we want to carry out a significance test.
What are spurious correlations?
When a correlation is found between things that have no causal relationship with each other.
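One common way a spurious correlation arises is a shared cause (a confounder). The simulation below is a made-up illustration: a hypothetical variable Z drives both X and Y, while X has no effect on Y and vice versa, yet X and Y end up clearly correlated. With this set-up the population correlation between X and Y works out to 0.5. The variable names, sample size and seed are assumptions for the example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical common cause
x = [zi + rng.gauss(0, 1) for zi in z]   # X depends only on Z plus noise
y = [zi + rng.gauss(0, 1) for zi in z]   # Y depends only on Z plus noise

# Clearly positive (around 0.5) although neither X nor Y causes the other
print(round(pearson_r(x, y), 2))
```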
What are the npe of Pearson’s r?
What type of data do they work on?
What do the coefficients vary between?
- Spearman’s correlation coefficient (“Spearman’s rho”)
- Kendall’s tau
Both Spearman’s coefficient and Kendall’s Tau work on ranked data (Spearman’s rho is, in fact, the Pearson correlation coefficient carried out on ranked data).
Spearman’s coefficient and Kendall’s tau vary between -1 and +1, and their interpretation is analogous to Pearson’s r.
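The sketch below shows both ideas on ranked data. Spearman's rho is computed literally as Pearson's r on the ranks (tied values get the average of their ranks). For Kendall's tau it uses the simple tau-a form, (concordant pairs − discordant pairs) / number of pairs; statistics packages usually report tau-b, which adjusts for ties, and the two coincide when there are no ties. All function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j) / 2 + 1           # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tied
    return score / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # always increasing, but not along a straight line
print(round(pearson_r(x, y), 3), spearman_rho(x, y), kendall_tau_a(x, y))
```

For these data Pearson's r is below 1 because the points do not lie on a straight line, whereas both rank-based coefficients equal 1 because the ordering of Y matches the ordering of X perfectly.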