Correlation and Partial Correlation Flashcards
Bivariate linear correlation
- examines the relationship between two variables
what can bivariate relationships vary in:
- form (linear, curvilinear)
- direction (positive / negative)
- magnitude/strength
magnitude/strength in bivariate relationships
- r = -1 : perfect negative relationship
- r = +1 : perfect positive relationship
- r = 0 : no relationship
positive or negative correlation
- whichever the direction, correlation does not mean causation
strength of correlation : strong negative/positive
+/- 0.9, 0.8, 0.7
strength of correlation: moderate negative/positive
+/- 0.6, 0.5, 0.4
strength of correlation: weak negative/positive
+/- 0.3, 0.2, 0.1
correlation hypothesis testing
- linear correlation measures the relationship between two variables in a sample
- we are usually more interested in whether there is a relationship between the equivalent population variables
- use sample statistics to estimate population parameters
- H0: there is no relationship between the population variables
p-value in correlation
what is the probability of measuring a relationship of that magnitude (or larger) when the null hypothesis is true?
- reject null if p < .05
parametric assumptions
- both variables should be CONTINUOUS (if not use non-parametric alternative)
- related PAIRS: each participant should have a pair of values
- absence of outliers
- linearity: points in the scatterplot should be best explained with a STRAIGHT line
- sensitive to range restrictions: floor and ceiling effects
non-parametric alternative
- if assumptions violated
- Spearman’s rho (or Kendall’s Tau if fewer than 20 cases)
(also use one of these if a variable is measured on a Likert scale with fewer than 7 points)
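A minimal sketch of running these alternatives in Python (using scipy, with made-up data):

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]          # made-up data
y = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p_rho = stats.spearmanr(x, y)    # Spearman's rho and its p-value
tau, p_tau = stats.kendalltau(x, y)   # Kendall's tau, preferred for small samples
print(rho, p_rho, tau, p_tau)
```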
floor effect
cluster of scores at bottom of scale
- form of range restriction
ceiling effect
clustering of scores at top of scale
- form of range restriction
PPMCC
pearson product-moment correlation coefficient
what does Pearson’s correlation coefficient investigate
the relationship between two quantitative, continuous variables
what does Pearson’s produce
a correlation coefficient ‘r’ which is a measure of the strength of association between the two variables
Covariance
- for each data point, calculate the difference from the mean of x, and the difference from the mean of y
- multiply the differences
- sum the multiplied differences
- divide by N -1
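These steps amount to the sample covariance formula, cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N − 1). A minimal Python sketch of the steps (the function name is just for illustration):

```python
import numpy as np

def covariance(x, y):
    """Sample covariance, following the four steps above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx = x - x.mean()                      # difference from the mean of x
    dy = y - y.mean()                      # difference from the mean of y
    return np.sum(dx * dy) / (len(x) - 1)  # sum the products, divide by N - 1

print(covariance([2, 4, 5, 7, 9], [1, 3, 4, 6, 8]))  # matches np.cov(x, y)[0, 1]
```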
what does covariance do
- provides a measure of the variance shared between x and y variables
correlation coefficient and covariance
- ‘r’ is a ratio of covariance (shared variance) to separate variances
- we can obtain a measure of the separate variances by multiplying the standard deviations of x and y
- DON’T NEED TO DO THIS BY HAND
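For illustration only (as the card says, you don't need to do this by hand), a sketch of r as that ratio, checked against scipy's pearsonr with made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # made-up data
y = np.array([1.0, 3.0, 4.0, 6.0, 8.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]       # covariance (shared variance)
separate = x.std(ddof=1) * y.std(ddof=1)  # product of the separate SDs
r = cov_xy / separate
print(r, stats.pearsonr(x, y)[0])         # the two values agree
```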
correlation coefficient strength
- ‘r’ is a ratio
- if covariance is large relative to separate variances, r will be further from 0
- if covariance is small relative to separate variances, r will be closer to 0
what can r represent
it can tell us how well a straight line fits the data points i.e. the strength of correlation
- if data points cluster around the line, r is further from 0
- if data points are scattered around the line, r is closer to 0
SPSS output for correlation
- r tells you strength
- p tells you if correlation is significant
- N helps you calculate d.f.
degrees of freedom for r
N - 2
- report when reporting r
e.g. r(23) = .522, p = .007
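SPSS reports the p-value directly; for reference, the standard conversion from r and its degrees of freedom to a two-tailed p (not covered on these cards) goes through a t statistic, t = r·√(df / (1 − r²)). A sketch reproducing the example above:

```python
import math
from scipy import stats

r, df = .522, 23                    # values from the example above
t = r * math.sqrt(df / (1 - r**2))  # convert r to a t statistic
p = 2 * stats.t.sf(abs(t), df)      # two-tailed p-value
print(round(p, 3))                  # ≈ .007, matching the reported value
```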
sampling error
- r value obtained from another sample from the same population would likely be different
- reflects sampling error
Sampling Distribution of Correlation Coefficients
- if we obtained r for all possible samples drawn from the population of interest … the mean of the resulting distribution would be equivalent to the true population correlation coefficient
- H0: no relationship between the population variables (i.e. r = 0)
- so under the null, the sampling distribution of the correlation coefficient will have a mean of 0
(approximately normal distribution)
r distribution
- has a mean of 0
- extent to which an individual sampled r deviates from 0 can be expressed in standard error units
- distribution depends on the r value of the underlying population and the sample size
confidence interval around r
- r is a point estimate of the underlying population r-value and is subject to sampling error
- ‘we have 95% confidence that the population correlation coefficient falls between __ and __’
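One common way to build such an interval is the Fisher z transformation (SPSS may compute its CIs differently); a sketch with an assumed N of 25:

```python
import math
from scipy import stats

def r_confidence_interval(r, n, level=0.95):
    """Approximate CI around r via the Fisher z transformation."""
    z = math.atanh(r)                    # transform r to Fisher's z
    se = 1 / math.sqrt(n - 3)            # standard error of z
    crit = stats.norm.ppf(1 - (1 - level) / 2)
    lo, hi = z - crit * se, z + crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

print(r_confidence_interval(.522, 25))   # assumed sample size of 25
```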
SPSS output for CIs
(image: SPSS output showing the CIs around r)
(for CIs for the individual variables, look at the main descriptive statistics table at the top of the SPSS output)
shared variance
- r^2
- expresses the proportion of the separate variances that is shared
- e.g. r = .8, r^2 = .64, variables share 64% variance
… meaning that 36% of the variance of each variable is not shared
note on r and shared variance - relative strength
an r of .4 vs an r of .8 means
.8 is 4x as strong as .4 (shared variance: .64 vs .16)
we use r^2 to talk about relative strength e.g. how strong .4 is compared to .8 (see the check below)
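The arithmetic behind that claim, as a quick check:

```python
r_weak, r_strong = .4, .8
shared_weak, shared_strong = r_weak ** 2, r_strong ** 2  # .16 vs .64 shared variance
print(round(shared_strong / shared_weak, 2))             # 4.0: .8 is 4x as strong as .4
```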
effect size
r is a measure of effect size. Once squared to give shared variance, it can be expressed as a proportion of the separate variances, telling us how much of the variance in y can be ‘explained’ by x (similar to partial eta squared, which tells us how much of the variance in the DV can be explained by the manipulation of the IV)
partial correlation
allows us to examine the relationship between two variables, while removing the influence of a third variable
e.g. we want to control for a confounding variable: when looking at the relationship between IQ and grade, we want to remove/control the influence of test motivation
how can we control for a third variable (confounding)
- recruit participants who have the same level of the third variable e.g. the same level of motivation
- control the variable through statistical means e.g. ‘partial out’ the variable or ‘hold the variable constant’ (see the sketch below)
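Statistically partialling out Z can be done with the standard first-order partial correlation formula, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). A sketch with made-up values (the helper name is hypothetical; SPSS handles this for you):

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# e.g. IQ (x) and grade (y), controlling for test motivation (z); values made up
print(round(partial_r(r_xy=.60, r_xz=.50, r_yz=.55), 3))  # smaller than the raw .60
```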
‘partialling’ out a variable and SPSS output
- if, after removing the third variable, the correlation between x and y is reduced and no longer significant, you say:
‘the relationship between X and Y may well have been explained by the influence of Z (the third variable) on both X and Y’
- if the relationship had remained significant (though reduced), it would suggest the relationship was only partially explained by the third variable
- if the correlation does not decrease, the relationship is not explained by the third variable
write up: correlation and partial correlation: design
no design
write up correlation and partial correlation: results section: step one: descriptive statistics
DS for each variable
- measure of central tendency (CT): mean
- measure of spread: SD
- interval estimate: 95% CIs (lower and upper)
- ‘Descriptive statistics and bivariate correlations between study variables are presented in Table 1’
write up correlation and partial correlation: results: step 2: inferential analysis
- state test used
- always: ‘Pearson’s correlation coefficient revealed a (significant/non-significant) (direction, if significant) relationship between X and Y.’
- report: r(df) = .__, p = .__, 95% CIs [.__, .__] (CIs around r) if there are only 2 variables; for a partial correlation, report the Pearson’s correlations in a table (see image) but always include the partial correlation in the text
- (if needed) ‘partial correlation revealed that the relationship between X and Y was (significant/non-significant) when controlling for Z, r(df) = .__, p = .__’
- report df (N - 2), not N
Discussion for partial correlation
While X increased/decreased with Y, this relationship was eliminated/not affected when Z was controlled. These findings suggest that the relationship measured between X and Y may be explained / not explained / partially explained by Z.
discussion for correlation
X and Y were related, with high/low Xs associated with high/low Ys