8: CORRELATION AND PARTIAL CORRELATION Flashcards
bivariate linear correlation
- examines the relationship between 2 variables
- relationships vary in: form, direction, magnitude / strength
- +1 / -1 represent a perfect correlation (positive / negative)
- 0 represents no linear correlation
correlation: hypothesis testing
- Linear correlation measures the relationship between two variables in a sample
- But crucially, we’re interested in whether there’s a relationship between the equivalent population variables
- We use sample statistics to estimate the population parameters
- Always start by assuming the null hypothesis is true: there is no relationship between the population variables
- Once we’ve measured the relationship in our sample, inferential analyses tell us the probability of obtaining a relationship of at least that magnitude if the null hypothesis is true
correlation: p-values
what is the chance of measuring a relationship of that magnitude (or larger) when the null is true?
- answer this in terms of probability
- p-value: the probability of measuring a relationship of at least that magnitude when the null is true
- we set a threshold probability (alpha, conventionally .05) below which we are willing to reject the null
- if p is less than alpha, we reject the null
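The p-value idea can be made concrete with a permutation test. This is a swapped-in technique, not the parametric r-distribution approach: shuffling y breaks any pairing with x, so every shuffled sample is one where the null is true, and p is the proportion of shuffles giving an r at least as extreme as the observed one. A minimal sketch; `hours` and `score` are made-up data and `permutation_p_value` is a hypothetical helper.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=5000, seed=1):
    """Two-tailed p: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)          # seeded so the result is repeatable
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)        # random re-pairing: H0 is true by construction
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # count the observed pairing as one permutation

alpha = 0.05                                       # threshold set before seeing the data
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]            # hypothetical data
score = [52, 55, 61, 58, 66, 70, 68, 75, 79, 83]
p = permutation_p_value(hours, score)
print(f"r = {pearson_r(hours, score):.3f}, p = {p:.4f}, reject H0: {p < alpha}")
```

A strong relationship like this one is almost never matched by chance re-pairings, so p falls well below alpha.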
bivariate linear correlation: parametric assumptions
- both variables should be continuous (interval or ratio level of measurement); if one (or both) is ordinal, use a non-parametric alternative
- related pairs: each participant should have a pair of values
- absence of outliers
- linearity: points in the scatterplot should be best described by a straight line
- additional point to note: r is sensitive to range restriction (e.g. floor and ceiling effects), which can shrink r towards 0
- if the data violate these assumptions, use Spearman’s rho (or Kendall’s tau if there are fewer than 20 cases)
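Spearman’s rho is Pearson’s r computed on the ranks of each variable instead of the raw scores, so it only needs a consistently increasing or decreasing (monotonic) relationship. A minimal sketch, assuming tied scores get the average of the ranks they span (the usual convention); the data and helper names are hypothetical, and Kendall’s tau is not shown.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank scores from 1 (smallest); tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                # extend over a run of tied scores
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1    # positions i..j hold ranks i+1..j+1
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 8, 40]   # hypothetical: always increasing, but not along a straight line
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the extreme score of 40 only counts as "rank 5", rho reaches 1 while Pearson’s r is pulled down by the non-linearity.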
floor and ceiling effects
Ceiling and floor effects occur when a considerable proportion of participants obtain the maximum (ceiling) or minimum (floor) possible score, leaving the measure unable to discriminate between participants at either extreme of the scale
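A small made-up illustration of how a ceiling restricts the range and pulls r towards 0: the underlying relationship is perfectly linear, but once scores are capped at a hypothetical maximum of 20, over half of the sample sits at the ceiling and r drops below 1.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ability = list(range(1, 21))                   # hypothetical true ability, 1 to 20
uncapped = [2 * a for a in ability]            # scores could in principle reach 40
ceiling = 20                                   # hypothetical maximum possible score
capped = [min(s, ceiling) for s in uncapped]   # anyone above 20 is recorded as 20

r_full = pearson_r(ability, uncapped)
r_capped = pearson_r(ability, capped)
at_ceiling = sum(s == ceiling for s in capped)
print(f"r without ceiling = {r_full:.3f}")
print(f"r with ceiling    = {r_capped:.3f} ({at_ceiling} of {len(capped)} at the maximum)")
```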
Pearson’s correlation coefficient
- investigates the relationship between 2 quantitative, continuous variables
- the resulting correlation coefficient (r) is a measure of the strength of association between the 2 variables
- r reflects how well a straight line fits the data points (i.e. the strength of the correlation)
- if points cluster closely around the line, r is further from 0
- if points are scattered far from the line, r will be closer to 0
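Pearson’s r can be computed directly from deviation scores. A minimal sketch with hypothetical data: `tight` sits close to a straight line and `scattered` follows the same upward trend with much more spread, so its r lands closer to 0.

```python
import math

def pearson_r(x, y):
    """Pearson's r: cross-products of deviations, scaled by both spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                     # spread of x
    syy = sum((b - my) ** 2 for b in y)                     # spread of y
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # close to the line y = 2x
scattered = [3, 9, 2, 11, 5, 12]           # same upward trend, far from any line
print(f"tight: r = {pearson_r(x, tight):.3f}")
print(f"scattered: r = {pearson_r(x, scattered):.3f}")
```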
covariance
- provides a measure of how much our x and y variables vary together (the variance they share)
- the correlation coefficient (r) is a ratio of the covariance (shared variance) to the separate variances (strictly, to the product of the two standard deviations)
- if covariance is large relative to the separate variances, r will be further from 0
- if the covariance is small …, r will be close to 0
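In symbols (standard textbook formulas; the notation is assumed here, with s_x and s_y the sample standard deviations and N the number of pairs): dividing the covariance by s_x s_y, the square root of the product of the separate variances, is what keeps r between -1 and +1.

```latex
\operatorname{cov}_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
\qquad
r = \frac{\operatorname{cov}_{xy}}{s_x \, s_y} = \frac{\operatorname{cov}_{xy}}{\sqrt{s_x^{2} \, s_y^{2}}}
```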
degrees of freedom for r
N - 2 (where N is the number of pairs of scores)
report degrees of freedom when reporting r
sampling distribution of correlation coefficients
H0 states there is no relationship between the population variables
so under the null, the sampling distribution of correlation coefficients has a mean of 0
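The null sampling distribution can be approximated by simulation: repeatedly draw two unrelated samples, compute r each time, and look at the spread of the results. A sketch assuming normally distributed populations, N = 20 and 2000 replications (all arbitrary choices); `null_r_distribution` is a hypothetical helper.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_r_distribution(n=20, reps=2000, seed=42):
    """Sample r repeatedly from two unrelated normal populations (H0 true)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]   # drawn independently of x
        rs.append(pearson_r(x, y))
    return rs

rs = null_r_distribution()
print(f"mean r = {statistics.mean(rs):.3f}, SD of r = {statistics.stdev(rs):.3f}")
```

The mean of the simulated r values lands close to 0. With normal data the variance of this distribution is 1/(N - 1), so it narrows as N grows, which is why the same r is more convincing in a larger sample.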
r distribution
- the r distribution has a mean of 0
- the extent to which an individual sample’s correlation coefficient (r) deviates from 0 can be expressed in standard error units
- using the r distribution, and what we know about the proportion of scores falling under each area of a sampling distribution, we can determine the probability of obtaining an r value of a given magnitude when the null is true (the p-value)
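The usual textbook way to express r in standard error units is a t statistic: the standard error of r is sqrt((1 - r^2) / (N - 2)), and t = r / SE with df = N - 2; t is then compared against a t distribution with those df to get p. A minimal sketch with hypothetical values r = .50 and N = 27; `r_to_t` is a hypothetical helper, and it fails for r = +1 or -1 (zero standard error).

```python
import math

def r_to_t(r, n):
    """Express r in standard error units: t = r / SE_r, with df = N - 2."""
    df = n - 2
    se_r = math.sqrt((1 - r ** 2) / df)   # standard error of r
    return r / se_r, df

t, df = r_to_t(0.5, 27)
print(f"t({df}) = {t:.3f}")
```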
confidence intervals around r
we are really interested in the correlation in the population, not just in our sample
“we have 95% confidence that the population correlation coefficient falls between ____ and ____”
not done by SPSS
report to 3 d.p.
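The card doesn’t say how the interval is calculated, so this sketch swaps in a common approach, Fisher’s z transformation: convert r to z, build a normal-theory interval with SE = 1 / sqrt(N - 3), then convert the limits back to r. A minimal sketch with hypothetical values r = .50 and N = 50; `r_confidence_interval` is a hypothetical helper.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for the population correlation via Fisher's z."""
    z = math.atanh(r)                           # Fisher's z transformation of r
    se = 1 / math.sqrt(n - 3)                   # standard error of z
    lower, upper = z - z_crit * se, z + z_crit * se
    return math.tanh(lower), math.tanh(upper)   # back-transform limits to the r scale

lo, hi = r_confidence_interval(0.5, 50)
print(f"r = .50, 95% CI [{lo:.3f}, {hi:.3f}]")  # reported to 3 d.p.
```

The back-transformation makes the interval asymmetric around r, which is expected because r is bounded at -1 and +1.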
shared variance
r²: expresses the proportion of variance that is shared between the two variables
effect size: r
r is another useful measure of effect size; it can be squared (r²) to give a measure of shared variance, expressed as a proportion of the separate variances
- it tells us how much of the variance in y can be ‘explained by’ x
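Squaring r gives the shared variance directly. A tiny sketch with hypothetical r values; `shared_variance` is a hypothetical helper.

```python
def shared_variance(r):
    """r squared: proportion of variance in y shared with (explained by) x."""
    return r ** 2

for r in (0.3, 0.6, -0.6):
    rsq = shared_variance(r)
    print(f"r = {r:+.1f} -> r^2 = {rsq:.2f} ({rsq:.0%} of variance shared)")
```

Squaring drops the sign, so r = .60 and r = -.60 both correspond to 36% shared variance.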
partial correlation
- allows us to examine the relationship between 2 variables, while removing the influence of a 3rd variable
- we want to control for the effect of this variable: ‘partial it out’, or ‘hold it constant’
- comparing the partial correlation with the original correlation shows whether this 3rd variable actually influences the relationship
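With three variables, the first-order partial correlation can be computed from the three ordinary (zero-order) correlations using the standard formula in the code comment. A minimal sketch; the correlations are hypothetical (think ice cream sales x, drownings y, temperature z), chosen so that controlling for z almost removes the relationship between x and y.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    numerator = r_xy - r_xz * r_yz          # remove the part explained via z
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

r_partial = partial_r(r_xy=0.50, r_xz=0.70, r_yz=0.70)
print(f"zero-order r = 0.500, partial r (z held constant) = {r_partial:.3f}")
```

Here the zero-order r of .50 falls to about .02 once z is held constant, suggesting the 3rd variable accounts for most of the original relationship.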