Week 4 - Correlations and linear regression Flashcards
If a research question has the words “relationship between X and Y” or “after controlling for Z, is there an association between X and Y?”
What kind of statistical analysis would be appropriate for these questions?
Correlations and regression
What is a correlation?
A statistical technique for measuring the extent to which two variables are associated/ related.
Measures the pattern of responses across variables.
Assumes linear association between variables
Changes in one variable are accompanied by predictable changes in the other variable
Usually a bivariate association
How is an association/ relationship determined?
When changes in one variable are accompanied by persistent and predictable changes in the other variable
Does correlation always mean causation?
No, because a third variable might be causing the observed associations
What is the range of values for a correlation?
-1 (perfect negative) to +1 (perfect positive)
0 = no association therefore represents the null hypothesis
How is the significance of a correlation determined?
- sample size (n)
- alpha value (one-tailed (0.05) vs two-tailed (0.05))
- –> e.g. predicting one direction (positive or negative) or both directions
Which type of test has greater statistical power: one-tailed or two-tailed?
A one-tailed test, because it only tests in one direction (use it only when you are very confident the effect lies in that direction, and back this up with theory)
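A rough illustration of why the one-tailed test has more power. This is only a sketch: it uses Fisher's z-transformation (a large-sample normal approximation, not the exact t-test a stats package would report), and the values r = .30 and n = 40 are made up for the example.

```python
import math
from statistics import NormalDist


def approx_p_values(r, n):
    """Approximate p-values for H0: rho = 0 via Fisher's z-transformation.

    Large-sample normal approximation, used here only for illustration.
    Returns (one_tailed_p, two_tailed_p); the one-tailed test predicts r > 0.
    """
    z = math.atanh(r) * math.sqrt(n - 3)  # roughly standard normal under H0
    one_tailed_p = 1 - NormalDist().cdf(z)
    two_tailed_p = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_tailed_p, two_tailed_p


# Hypothetical result: r = .30 from n = 40 participants
one_tailed_p, two_tailed_p = approx_p_values(r=0.30, n=40)
print(f"one-tailed p = {one_tailed_p:.3f}, two-tailed p = {two_tailed_p:.3f}")
```

With these made-up values the one-tailed p falls below .05 while the two-tailed p does not, because the two-tailed test has to cover both directions. Increasing n with the same r makes both p-values smaller, which is why sample size matters.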
What does variance measure?
How much the scores deviate from the mean of the distribution (one-variable)
variance = average squared distance from the mean
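A small worked example of the variance calculation, using made-up scores. Note the assumption in the comments: dividing by N gives the literal "average" squared distance, while sample estimates usually divide by N - 1.

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores on one variable

mean = sum(scores) / len(scores)

# Squared distance of each score from the mean
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

population_variance = sum_of_squares / len(scores)    # divide by N
sample_variance = sum_of_squares / (len(scores) - 1)  # divide by N - 1

print(mean, sum_of_squares, population_variance, round(sample_variance, 3))
```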
What does covariance measure?
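How much TWO variables vary together, i.e. whether deviations from the mean on one variable are matched by deviations from the mean on the other
Instead of the sum of squares, the sum of cross-products is used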
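The cross-product idea can be sketched like this. The data are made up, and the N - 1 divisor is the usual sample version (an assumption, since the card does not say which divisor is used).

```python
x = [1, 2, 3, 4, 5]  # hypothetical variable X
y = [2, 4, 5, 4, 5]  # hypothetical variable Y

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Cross-product: (deviation of X from its mean) * (deviation of Y from its mean)
cross_products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
sum_cross_products = sum(cross_products)

covariance = sum_cross_products / (len(x) - 1)  # sample covariance

print(cross_products, sum_cross_products, covariance)
```

A positive covariance means the two variables tend to deviate from their means in the same direction; a negative one means they deviate in opposite directions.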
What are the problems with covariance?
How are they fixed?
UNIT OF MEASUREMENT - covariance depends on the units of measurement, e.g. a covariance of 4.25 when the variables are measured in miles becomes about 11 when they are converted to km
–> standardise it (divide by the product of the SDs of the two variables)
The standardised version of a covariance is known as the ____
correlation coefficient
- unaffected by units of measurement
- makes the variances equal
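A sketch of the unit problem and its fix, assuming both variables are measured in miles and converted to km with a factor of 1.609 (the data are made up). The covariance is multiplied by 1.609 squared (about 2.59, which is how 4.25 becomes roughly 11), while the correlation coefficient does not change.

```python
import math


def covariance(x, y):
    """Sample covariance: sum of cross-products divided by N - 1."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # covariance of a variable with itself = variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


MILES_TO_KM = 1.609

miles_x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical distances in miles
miles_y = [2.0, 4.0, 5.0, 4.0, 5.0]
km_x = [v * MILES_TO_KM for v in miles_x]
km_y = [v * MILES_TO_KM for v in miles_y]

print("covariance:", covariance(miles_x, miles_y), "->", round(covariance(km_x, km_y), 3))
print("r:", round(pearson_r(miles_x, miles_y), 3), "->", round(pearson_r(km_x, km_y), 3))
```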
covariance = standardised/unstandardised
whereas Pearson correlation = standardised/unstandardised
covariance = unstandardised
Pearson correlation= standardised
What does Pearson Correlation tell us?
Direction + strength of linear relationship between two interval/ratio variables (continuous data)
What symbol denotes Pearson Correlation
r
r = strength and direction
What does the size/magnitude of ‘r’ denote?
Degree to which the points fit on a straight line. Closer to +1 or -1 = closer to a straight line, indicating a stronger linear relationship
+1 = perfect positive relationship
-1 = perfect negative relationship
0 = no linear relationship between the two variables
What is a correlation matrix?
Shows the correlation for each pairwise combination of variables.
Can be used for descriptive statistics/ exploratory analysis
In a correlation matrix, each correlation is a separate test. What is the issue with multiple testing?
How can we fix this?
More tests (without separate justifiable hypotheses) increase the risk of a false positive
–> post hoc analysis = reporting associations found after data collection
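To get a feel for how quickly the false positive risk grows, here is a rough sketch. It assumes the tests are independent and every null hypothesis is true, which is a simplification (correlations in the same matrix share variables, so they are not really independent).

```python
def n_pairwise_correlations(n_variables):
    """Number of distinct correlations in a matrix of n_variables."""
    return n_variables * (n_variables - 1) // 2


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests.

    Assumes independent tests with all null hypotheses true.
    """
    return 1 - (1 - alpha) ** n_tests


for n_variables in (3, 5, 10):
    n_tests = n_pairwise_correlations(n_variables)
    rate = familywise_error_rate(n_tests)
    print(f"{n_variables} variables -> {n_tests} correlations -> risk of about {rate:.2f}")
```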
What are the assumptions of a Pearson Correlation?
- Parametric test, therefore it assumes the variables are normally distributed
- linear association
- variables measured on an interval or ratio scale
How to deal with violation to the assumption of normality in a Pearson Correlation?
- if N > 30, use the CLT (central limit theorem) to justify proceeding despite the violation
- Spearman correlation
How to deal with violation to the assumption of linearity in a Pearson Correlation?
- If relationship is monotonic, use Spearman correlation
- Otherwise, transformation to achieve linearity
What are the two situations where you can use Spearman Correlation (r_s or rho)?
- to find the association between two ordinal variables (X & Y consist of ranks)
- to measure the consistency of direction of the association between two interval/ratio variables
- -> variables converted to ranks first before Spearman is used
- Measures the degree of monotonic relationship between variables
Do the Spearman Correlation Coefficient and Pearson Correlation Coefficient use the same formula?
Does this make the analysis more or less powerful?
Yes, except that in Spearman the calculations are performed on ranked data
Less powerful, because information is lost when the data are converted into ordinal data
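The "same formula, ranked data" point can be sketched as follows. The helper names are made up for this example, ties are given the average of their ranks (a common convention, assumed here), and the data are hypothetical: Y rises with X every time, but along a curve rather than a straight line.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson formula applied to the ranks instead of the raw scores."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # always increasing, but strongly curved

print("Pearson r:", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

Spearman comes out as a perfect 1 because the ranks line up exactly, while Pearson is lower because the raw points do not sit on a straight line.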
What is a monotonic relationship?
A relationship where, even though the data points don’t fit on a straight line, they consistently move in the same direction (as X increases, Y only increases or only decreases).
As Pearson correlation assumes linearity, you can use Spearman if the data are non-linear but monotonic (consistently moving in the same direction)
What do you use to find:
The proportion of variability in Y variable that can be attributed to variability in X
Coefficient of determination (r2/ r squared)
Shows how accurately one variable predicts the other
For Spearman’s Correlation, what does the coefficient of determination tell us?
The proportion of variance in the RANKS that the two variables share
r = .411, r2 = .169 (16.9%)
“16.9% of the variability in exam performance can be explained by (overlaps with) variability in revision time.”
What is the X variable/ Predictor?
Y variable/ Outcome ?
What type of variance does this represent?
X = revision time
Y = exam performance
Shared variance
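The arithmetic behind the 16.9% figure, as a quick check (r = .411 is the value from the card above):

```python
r = 0.411  # correlation between revision time and exam performance

r_squared = r ** 2             # coefficient of determination
unexplained = 1 - r_squared    # variance in Y not shared with X

print(f"r^2 = {r_squared:.3f}, i.e. {r_squared:.1%} shared variance")
print(f"{unexplained:.1%} of the variability is left unexplained")
```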
What is a partial correlation?
Measures association between two variables, controlling for the effect a third variable has on both
What is a semi-partial/ part correlation?
Measures the relationship between two variables, controlling for the effect that a third variable has on ONE of the other variables
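Both can be computed from the three zero-order correlations using the standard formulas. This is only a sketch with made-up correlation values, and the function names are invented for the example; here Z is removed from both X and Y for the partial correlation, and from X only for the semi-partial.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z removed from BOTH X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation between Y and the part of X that is unrelated to Z
    (Z removed from X only)."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)


# Hypothetical zero-order correlations
r_xy, r_xz, r_yz = 0.50, 0.40, 0.30

print("partial:", round(partial_r(r_xy, r_xz, r_yz), 3))
print("semi-partial:", round(semipartial_r(r_xy, r_xz, r_yz), 3))
```

If Z is unrelated to both X and Y (r_xz = r_yz = 0), both formulas simply return the zero-order correlation r_xy.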
What is a zero order correlation?
correlation between two variables when you do not control for anything
What is a first order correlation?
partial correlation that controls for one variable
What is a 2nd order correlation?
partial correlation that controls for two variables
What is the directionality problem for correlations?
correlation =/= causation
third variable
it is not possible to determine which variable is the cause and which is the effect
Can you compare correlation coefficients?
The best way is to use multiple regression to test whether the association between two variables differs by group
Can partial correlations be performed on non-parametric data?
Yes, you can do Spearman partial correlations.
Spearman’s partial rank order correlation
What is linear regression?
A model used to predict the value of one variable from another; describe the relationship using the equation of a straight line
Straight line equation
Yi = b0 + b1Xi + Ei
Yi = the ith person’s score on the outcome variable
Xi = the ith person’s score on the predictor variable
b0 = Y-intercept (value of Y when X = 0); the point at which the regression line crosses the y-axis
b1 = regression coefficient for the predictor
- gradient (slope) of the regression line
- direction/ strength of relationship
Ei = the difference between the actual and predicted value of Y for the ith person
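A minimal sketch of fitting the straight line by ordinary least squares, using made-up data. The slope is the sum of cross-products divided by the sum of squares of X, and the intercept makes the line pass through the point (mean of X, mean of Y).

```python
x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical outcome scores

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

sum_cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sum_squares_x = sum((a - mean_x) ** 2 for a in x)

b1 = sum_cross_products / sum_squares_x  # slope of the regression line
b0 = mean_y - b1 * mean_x                # intercept: predicted Y when X = 0

predicted = [b0 + b1 * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]  # actual minus predicted

print(f"Y = {b0:.1f} + {b1:.1f}X")
print("residuals:", [round(e, 1) for e in residuals])
```

The residuals are the error term for each person; with a least-squares line they sum to zero (up to rounding).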