Correlations and linear regression Flashcards
Define
Correlations
Statistical technique for measuring the extent to which two variables are associated. Measures the pattern of responses across variables.
Define
Linear regression
a linear model to predict the value of one variable from another
Define
One-tailed
a statistical test in which the critical area of a distribution is one-sided so that it is either greater than or less than a certain value, but not both
Define
Two-tailed
a statistical test in which the critical area of a distribution is two-sided, so it tests whether a value falls either above or below a certain range of values
Define
Variance
tells us how much scores deviate from the mean
Define
Covariance
similar to the variance, but tells us how much scores on two variables deviate together from their respective means
Define
Correlation coefficient
The standardised version of covariance
Define
Pearson correlation coefficient
a parametric statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1.
Define
Correlation matrix
a table showing correlation coefficients between variables
Define
Post hoc
statistical analyses that were specified after the data were seen
Define
Spearman correlation
a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function
Define
Monotonic
relationships that are consistently one-directional
Define
Coefficient of determination (r^2)
the proportion of the variance in the dependent variable that is predictable from the independent variable
Define
Shared variance
the extent to which two variables vary together
Define
Partial correlation
Measures the relationship between two variables, controlling for the effect that a third variable has on them both
Define
Semi-partial (part) correlation
Measures the relationship between two variables, controlling for the effect that a third variable has on one of the others
Define
Zero order correlation
the correlation between two variables when you do not control for anything
Define
Directionality problem
it is not possible to determine which variable is the cause, and which is the effect
Define
Residual
The difference between the observed value of the dependent variable (y) and the predicted value (ŷ)
____________exists when changes in one variable tend to be accompanied by persistent and predictable changes in the other variable
Association/Relationship exists when changes in one variable tend to be accompanied by persistent and predictable changes in the other variable
What is the range of correlation values?
-1 to +1
What does a correlation of 0 indicate?
No linear association (this is what the null hypothesis usually states)
What does the significance of a correlation depend on?
Sample size (n)
Alpha level, and whether the test is one-tailed or two-tailed
Size of correlation (ignoring direction), must be ___ critical value given for that df
Size of correlation (ignoring direction), must be ≥ critical value given for that df
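As a rough illustration of how sample size and the size of r combine, the sketch below converts a Pearson r into a t statistic with df = n - 2, which is the usual test of the null hypothesis of zero correlation. The function name and the sample size of 30 are assumptions made for the example; the critical value would still be looked up in a table for the chosen alpha and number of tails.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: no correlation, given Pearson r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)
    return t, df

# Hypothetical example: r = .411 with an assumed sample of n = 30.
t_value, df = correlation_t_statistic(0.411, 30)
print(f"t({df}) = {t_value:.2f}")

# The same r with a larger assumed sample gives a larger t.
t_large, df_large = correlation_t_statistic(0.411, 100)
print(f"t({df_large}) = {t_large:.2f}")
```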
The _______tells us how much scores deviate from the mean
The variance tells us how much scores deviate from the mean
The ___________ of the two variables is similar to the variance, but tells us how much scores on two variables deviate together from their respective means
The covariance of the two variables is similar to the variance, but tells us how much scores on two variables deviate together from their respective means
What is the variance equation?
Sample variance: $s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$
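What is the covariance equation?
Sample covariance: $\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$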
What are some of the issues with covariance?
It depends upon the units of measurement
e.g., The covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance is 11
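The unit dependence can be checked numerically. The sketch below uses made-up scores, chosen only so that the covariance comes out at 4.25 when both variables are treated as miles; converting both to kilometres multiplies the covariance by the square of the conversion factor. The variable and function names are illustrative.

```python
import numpy as np

KM_PER_MILE = 1.609344

def sample_covariance(x, y):
    """Sample covariance with an N - 1 denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Illustrative scores (assumed data), both treated as distances in miles.
x_miles = np.array([5.0, 4.0, 4.0, 6.0, 8.0])
y_miles = np.array([8.0, 9.0, 10.0, 13.0, 15.0])

cov_miles = sample_covariance(x_miles, y_miles)
cov_km = sample_covariance(x_miles * KM_PER_MILE, y_miles * KM_PER_MILE)

print(f"covariance in miles: {cov_miles:.2f}")
print(f"covariance in km:    {cov_km:.2f}")
```

Same scores, same relationship, but a different covariance purely because the measurement units changed.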
How do you get around the issue of units in covariance?
Standardise it
(divide the covariance by the product of the standard deviations of both variables)
The standardised version of covariance is known as the ______________
The standardised version of covariance is known as the correlation coefficient
What happens to the range when you standardise the covariance?
It goes from being unrestricted to between -1 and +1
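A minimal sketch of that standardisation, using made-up scores: dividing the covariance by the product of the two standard deviations gives a value that matches NumPy's `corrcoef` and does not change when the units are rescaled. The data and the rescaling factor are assumptions for illustration.

```python
import numpy as np

# Illustrative scores (assumed data).
x = np.array([5.0, 4.0, 4.0, 6.0, 8.0])
y = np.array([8.0, 9.0, 10.0, 13.0, 15.0])

def pearson_from_covariance(x, y):
    """Standardise the sample covariance by both sample standard deviations."""
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

r = pearson_from_covariance(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

# Rescaling both variables (e.g. miles to kilometres) leaves r unchanged.
r_rescaled = pearson_from_covariance(x * 1.609344, y * 1.609344)

print(f"r from covariance: {r:.3f}")
print(f"r from np.corrcoef: {r_numpy:.3f}")
print(f"r after rescaling: {r_rescaled:.3f}")
```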
What is the parametric correlation coefficient?
Pearson
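What is the formula for the Pearson/Spearman correlation coefficient?
$r = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$ (for Spearman, apply the same formula to the ranked scores)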
Why is it not necessarily a good idea to run a correlation matrix for all your variables?
The more tests you run, the more likely a false positive will occur
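To get a feel for the scale of the problem, the sketch below counts the pairwise correlations in a matrix and estimates the chance of at least one false positive. It assumes the tests are independent, which is a simplification for correlations computed on the same sample, so the figure is indicative only. Function names are illustrative.

```python
def n_pairwise_correlations(n_variables):
    """Number of distinct variable pairs in a correlation matrix."""
    return n_variables * (n_variables - 1) // 2

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for k in (3, 5, 10):
    tests = n_pairwise_correlations(k)
    rate = familywise_error_rate(tests)
    print(f"{k} variables -> {tests} correlations -> familywise error ~ {rate:.2f}")
```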
What are the assumptions of a Pearson correlation?
Both variables measured on an interval or ratio scale
Both variables should be normally distributed (For statistical inference)
Relationship between the variables must be linear
What happens if normality is violated when you want to run a Pearson correlation?
If N > 30, can use central limit theorem to justify proceeding, despite violation
Otherwise, use a Spearman correlation
What happens if linearity is violated when you want to run a Pearson correlation?
If relationship is monotonic, use a Spearman correlation
Otherwise, try transforming the data to achieve linearity
What does a Spearman correlation measure?
It measures the association between two ordinal variables; that is, X and Y both consist of ranks
It measures the consistency of direction of the association between two interval/ratio variables
Why is a Spearman correlation less powerful than a Pearson's?
The continuous data must be converted into ranks before conducting the correlation, which discards information about the distances between scores
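The ranking step can be sketched directly: Spearman's coefficient is Pearson's r computed on the ranks. The example below uses made-up data with a monotonic but non-linear relationship, and the simple ranking helper assumes there are no tied scores (ties would need average ranks).

```python
import numpy as np

def ranks_no_ties(values):
    """Rank scores from 1..N. Assumes no tied values."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def spearman_no_ties(x, y):
    """Spearman correlation as Pearson's r on the ranked scores."""
    return np.corrcoef(ranks_no_ties(x), ranks_no_ties(y))[0, 1]

# Monotonic but clearly non-linear relationship (assumed data).
x = np.arange(1.0, 11.0)
y = np.exp(x)

pearson_r = np.corrcoef(x, y)[0, 1]
spearman_rho = spearman_no_ties(x, y)

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

Because y always increases with x, the ranks agree perfectly even though the raw scores do not fall on a straight line.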
True or False:
You could conduct a Spearman correlation on this data

False
It is nonmonotonic
What statistic do you use to measure the relationship strength of a correlation?
Coefficient of determination (r^2)
If r = .411, what is the coefficient of determination?
r^2 = .411 × .411 = .169
Therefore, 16.9% shared variability
a ____________ is the correlation between two variables when you control (i.e., hold constant) the effects of a third variable on both of the other variables
a partial correlation is the correlation between two variables when you control (i.e., hold constant) the effects of a third variable on both of the other variables
What does this equation show?

Shows the association between Y and X1 after removing the overlap of X2 with X1, and with Y
Why is semi-partial correlation used in multiple regression?
Used in multiple regression because if you square the semi-partial correlation, it tells you the proportion of variability in the outcome uniquely accounted for by one specific predictor variable
i.e., controls for the relationships between predictors, so the outcome variable’s relationship with other predictors is still taken into account
What does this equation show?

How much of the total variability in Y is uniquely explained by X1
What is the difference between partial correlation and semi-partial correlation?
Partial correlation:
Measures the relationship between two variables, controlling for the effect that a third variable has on them both
Semi-partial correlation:
Measures the relationship between two variables, controlling for the effect that a third variable has on only one of the others
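One way to see the difference is to compute both by residualising: for the partial correlation, X2 is regressed out of both Y and X1; for the semi-partial correlation, X2 is regressed out of X1 only. The sketch below uses simulated data and illustrative function names, and compares the results with the standard formulas based on the three zero-order correlations.

```python
import numpy as np

def residualise(target, control):
    """Residuals of target after a simple least-squares regression on control."""
    target = np.asarray(target, dtype=float)
    control = np.asarray(control, dtype=float)
    slope, intercept = np.polyfit(control, target, 1)
    return target - (intercept + slope * control)

def partial_correlation(y, x1, x2):
    """Correlate Y and X1 after removing X2 from BOTH of them."""
    return np.corrcoef(residualise(y, x2), residualise(x1, x2))[0, 1]

def semi_partial_correlation(y, x1, x2):
    """Correlate Y and X1 after removing X2 from X1 ONLY."""
    return np.corrcoef(np.asarray(y, dtype=float), residualise(x1, x2))[0, 1]

# Simulated data (assumed, for illustration only).
rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x1 = 0.6 * x2 + rng.normal(size=200)
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=200)

pr = partial_correlation(y, x1, x2)
sr = semi_partial_correlation(y, x1, x2)

# Cross-check against the formulas built from zero-order correlations.
r_y1 = np.corrcoef(y, x1)[0, 1]
r_y2 = np.corrcoef(y, x2)[0, 1]
r_12 = np.corrcoef(x1, x2)[0, 1]
pr_formula = (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
sr_formula = (r_y1 - r_y2 * r_12) / np.sqrt(1 - r_12 ** 2)

print(f"partial: {pr:.3f}  semi-partial: {sr:.3f}  sr^2: {sr ** 2:.3f}")
```

Squaring `sr` gives the proportion of the total variability in Y that is uniquely explained by X1.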
What is a 1st and 2nd order correlation?
1st order correlation = partial correlation that controls for 1 variable
2nd order correlation = partial correlation that controls for 2 variables
If I find that the Pearson correlation between exam performance and revision time is .384 for males and .442 for females, can I test whether the correlation is ‘significantly’ stronger for females than males?
The best way to test whether the association between two variables differs by group is to use multiple regression with interactions, which we will discuss later
Can partial correlations be performed on non-parametric data? (i.e., is there a way to do Spearman partial correlations?)
Yes! Use Spearman’s partial rank order correlation
What are the two options for dealing with missing data?
Exclude cases pairwise
Exclude cases listwise
What happens if we exclude cases pairwise?
For EACH correlation, exclude participants who do not have a score for both variables
What happens if we exclude cases listwise?
Across ALL correlations, exclude participants who do not have a score for every variable
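The two options can lead to different sample sizes for the same correlation. The sketch below uses a small made-up data set with missing values coded as NaN; the helper names are illustrative.

```python
import numpy as np

# Rows are participants, columns are three variables; NaN marks a missing score.
data = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 5.0],
    [3.0, 4.0, np.nan],
    [4.0, 5.0, 8.0],
    [5.0, 7.0, 9.0],
])

def listwise_n(data):
    """Participants with a score on EVERY variable."""
    complete = ~np.isnan(data).any(axis=1)
    return int(complete.sum())

def pairwise_n(data, i, j):
    """Participants with a score on BOTH variables i and j."""
    both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
    return int(both.sum())

def pairwise_corr(data, i, j):
    """Pearson r for variables i and j using pairwise exclusion."""
    both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
    return np.corrcoef(data[both, i], data[both, j])[0, 1]

print("listwise n for every correlation:", listwise_n(data))
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"pairwise n for variables {i} and {j}:", pairwise_n(data, i, j))
```

Pairwise exclusion keeps more data per correlation, but each coefficient in the matrix may then be based on a different set of participants.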
What is the linear regression equation?
Yi = b0 + b1X1 + εi
What do the dots and the lines represent?

Dots = actual scores
lines = difference between actual scores and predicted scores (residuals)
What is the regression line equation of this data?

Yi = b0 + b1X1 + εi
Yi = 43.9 + 0.65X + εi
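As a sketch of how such a line is obtained and used: the least-squares slope is the covariance of X and Y divided by the variance of X, and the intercept makes the line pass through the two means. The check data and the single observed score below are made up; only the coefficients 43.9 and 0.65 come from the card above.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

def predict(b0, b1, x):
    """Predicted value of Y for a given X."""
    return b0 + b1 * np.asarray(x, dtype=float)

# Sanity check on made-up data that lie exactly on Y = 1 + 2X.
b0_check, b1_check = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])

# Using the coefficients from the card: predicted Y when X = 20.
predicted = float(predict(43.9, 0.65, 20))

# Residual for a hypothetical observed score of 60.
residual = 60.0 - predicted

print(f"check fit: b0 = {b0_check:.1f}, b1 = {b1_check:.1f}")
print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
```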
What does this value tell us?

Here the standardised coefficient of .411 means that a 1 SD increase in revision time is associated with an expected .411 SD increase in exam performance
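With a single predictor, the standardised slope equals the Pearson correlation, which is why the same .411 appears in both places. A minimal sketch with simulated data (the variable names and generating values are assumptions, so the resulting coefficient will not be .411):

```python
import numpy as np

# Simulated scores (assumed, for illustration only).
rng = np.random.default_rng(1)
revision_time = rng.normal(20, 5, 100)
exam_score = 40 + 0.6 * revision_time + rng.normal(0, 8, 100)

# np.polyfit with degree 1 returns [slope, intercept].
b1, b0 = np.polyfit(revision_time, exam_score, 1)

# Standardise the slope: rescale by SD of the predictor over SD of the outcome.
beta = b1 * revision_time.std(ddof=1) / exam_score.std(ddof=1)
r = np.corrcoef(revision_time, exam_score)[0, 1]

print(f"unstandardised slope b1: {b1:.3f}")
print(f"standardised slope beta: {beta:.3f}")
print(f"Pearson r:               {r:.3f}")
```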