Correlation and Regression Flashcards
Is there a relationship between the amount of time spent revising for an exam and exam performance?
Correlation
After controlling for exam anxiety, is there an association between revision time and exam performance?
Correlation
When seeing the terms relationships/associations/controlling, what general category of analysis would you choose?
Correlation and regression
When would you use a correlation over a regression considering they measure similar things?
Correlation is used when you want a quick summary of the direction and strength of the relationship between two or more numeric variables.
When you’re looking to PREDICT or optimise/explain a number response between the variables (how X influences Y) then you are looking at regression.
Regression = how one variable affects another
Correlation = the degree of relationship between two variables so strength and direction.
What does a -1 correlation tell us regarding the association between variables?
There is a perfect negative correlation. Therefore there is an association.
What does a +1 correlation tell us regarding the association between variables?
There is a perfect positive correlation.
What does a positive correlation mean?
As the FIRST (X) variable increases, the SECOND (Y) variable also increases.
What does a negative correlation mean?
As one variable increases, the second one decreases.
What do correlations measure?
The pattern of responses across variables.
As you get closer to a perfect negative or positive correlation (a true -1 or +1), does the association get weaker or stronger?
Stronger
What does an association of 0.0 indicate?
No association, which is consistent with the null hypothesis.
How many tails can a significance test have?
Either one tailed or two tailed
How is the sample size and alpha/error rate important regarding whether a correlation is statistically significant or not?
Because the critical value that a correlation must exceed to be statistically significant depends on both the degrees of freedom (which come from the sample size) and the chosen alpha/error rate.
In a one tailed correlation what is the alpha level?
0.05, placed entirely in one tail, because the effect is tested in one direction only.
In a two tailed correlation, the alpha level is 0.025 in each tail. Why?
Because it is testing the correlation in EITHER direction, so the overall 0.05 alpha is split between the two tails.
Which is more powerful - one tailed or two tailed correlation?
One tailed, because you are more certain about the direction of the hypothesis (e.g. from prior empirical evidence).
Why would you use a two tailed correlation?
When uncertain about your hypothesis.
Sample size and alpha value need to be considered when looking at correlation significance. What is true regarding assessing if a Pearson r is significant, in terms of the NUMBER OF DEGREES OF FREEDOM?
The size of the correlation (regardless of direction) must be MORE THAN the critical value given for that degree of freedom.
Would you expect to see a higher Pearson r value for a big n or a small n?
A higher r value for a small n, because with few people in a study a moderate correlation is more likely to be due to chance than in a design with a large sample, so a larger r is needed to reach significance.
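One way to see why sample size matters: the t statistic used to test a Pearson r, t = r × √((n − 2) / (1 − r²)), grows with n for the same r. A minimal sketch with made-up values:

```python
import math

# t statistic for testing a Pearson correlation:
# t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
def t_from_r(r: float, n: int) -> float:
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The same moderate correlation gives much weaker evidence with few
# participants than with many.
t_small = t_from_r(0.4, 10)    # n = 10
t_large = t_from_r(0.4, 100)   # n = 100
```

With r = .4, t is roughly 1.23 at n = 10 (short of the two-tailed .05 critical value for 8 df) but over 4 at n = 100.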
What does variance tell us?
A. How much scores deviate from the mean of the distribution
B. Variance is the average squared distance from the mean
C. Both
C
It is essentially the measure of how far away the data points are from the mean.
Why do we have to square the distances from the mean of the distribution when it comes to variance?
Because the data points fall BOTH above and below the mean. If we averaged those deviations WITHOUT squaring them, the positive and negative distances would cancel each other out.
So why is the standard deviation (SD) the square root of the variance?
Because the deviations are squared before they are averaged, so taking the square root of the variance returns the SD to the original units of the data.
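A minimal sketch of the two definitions (population form, illustrative numbers only):

```python
# Variance: the average squared distance of scores from the mean.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(data) / len(data)                               # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 4.0
# SD: square root of the variance, back in the original units.
sd = variance ** 0.5                                       # 2.0
```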
How is the covariance of the two variables similar to the variance?
It tells us how much two variables differ from their means.
So instead of variance telling us how far data points are from mean for one variable, covariance shows how much TWO variables together differ from their means.
When dealing with covariance, why is it important to standardise it?
Because the units of measurement affect the size of the covariance, so the same relationship can produce different covariance values.
How do we standardise the variables to deal with covariance problems, such as different units of measurement leading to different covariance outcomes?
We divide by the standard deviations of both variables. This is the correlation coefficient which is relatively unaffected by units of measurement
The standardised version of covariance is known as…?
The correlation coefficient (Pearson)
Covariance is unstandardised and correlation is standardised. Why?
Because the problem with covariance is the units of measurement: with raw scores the covariance changes when the units change. You standardise to get around this, and the covariance of standardised variables is a correlation.
This is advantageous because standardising continuous variables puts them on the SAME scale, which makes two or more variables easier to compare. You might also hear it called SCALING. It makes the variances equal, so you are no longer looking at COVARIANCE in raw units.
What is the covariance of standardised variables essentially?
A correlation. When you calculate this you get a correlation coefficient.
How do we standardise a variable?
By subtracting the mean and dividing by the SD
If we already have the co variance of X and Y, what do we need to do to standardise them?
Divide by the SD of X and Y
What does dividing the covariance by the SD do in terms of the RANGE of the correlation coefficient?
We force it to be between -1 and +1, which describes how well a straight line fits the data. So correlation coefficients cannot be less than -1 or more than +1. BUT, covariance can be anything!
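A sketch of both routes to the same coefficient, with made-up data: dividing the covariance by both SDs, and equivalently taking the covariance of the z-scored variables.

```python
# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5

# Route 1: covariance divided by both standard deviations.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
r = cov / (sx * sy)

# Route 2: z-score both variables first, then take their covariance.
zx = [(a - mx) / sx for a in x]
zy = [(b - my) / sy for b in y]
r_z = sum(a * b for a, b in zip(zx, zy)) / n
```

The raw covariance here (1.2) could be any size depending on the units, but the standardised version is forced into the -1 to +1 range.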
True or false: correlations AND covariances must be between -1 and +1
False - covariance can be anything
Why is keeping correlations between -1 and +1 advantageous?
For comparisons
What does a pearson correlation do regarding what it measures with variables?
It measures the DIRECTION and DEGREE of linear relationship between two interval/ratio variables. The + or - denotes the direction of the relationship, so whether it is positive or negative.
Can both covariance and correlation tell us about the direction of any linear relationship?
Yes
What is the diff between correlation and covariance regarding what it shows with linear relationships?
Correlation shows not only the direction of a linear relationship but also its STRENGTH, which covariance cannot show in a comparable way.
What type of data is required for covariance/correlation?
Continuous (interval or ratio)
What is more important when looking at a correlation coefficient: whether the data points fit the line better or if positive/negative association?
If the data fit the line better, because this tells us there is a strong relationship, so a change in one variable is associated with a change in another variable (it says nothing about what caused it).
Why would we use a correlation matrix?
Because if we have a dataset with many variables, a correlation matrix gives the coefficient for each pairwise combination of variables across the whole dataset.
What is a problem for running correlations in exploratory analyses?
The more tests you run, the greater the chance of a false positive. Also, people can run these tests without a prior hypothesis and then try to come up with a justification afterwards.
Why would a person running a correlation without thinking of research questions need to be careful?
Because justifying a finding after the fact, without considering the research question, is not sound scientific practice.
True or false: variables should be normally distributed in a Pearson correlation.
True, for statistical inference.
If, for a pearson correlation, assumption of normality appears violated
(the sample size is less than 30, variables do not appear to be normally distributed)
what other test can be used?
Spearman correlation
How can normality be violated for a pearson correlation?
The sample size is less than 30, or the variables do not appear to be normally distributed.
If linearity is violated for a pearson correlation, what would this look like?
The relationship between the variables would be monotonic (or otherwise non-linear) rather than linear. Therefore use Spearman or try transforming the data.
What type of test is a pearson correlation?
Parametric. Parametric tests make assumptions about the data, such as the distribution being normal. You would prefer a parametric test where its assumptions hold, as parametric tests have more power.
What is the next step of a correlation?
Regression!
What is a non parametric correlation?
Spearman
What does a spearman correlation measure?
The association between TWO ORDINAL VARIABLES: X and Y both consist of ranks. Specifically, the degree of monotonic relationship between two variables, because the assumption of linearity does not need to be met.
It measures the consistency of the DIRECTION of the association between two interval/ratio variables. Therefore, interval/ratio data must be converted to ranks before conducting a Spearman correlation.
True or false: the Spearman correlation coefficient uses the same formula as Pearson, only the calculations are performed on rank data instead
TRUE
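That fact can be checked with a sketch: rank both variables, then run the Pearson formula on the ranks (illustrative data with no ties).

```python
# Spearman's rho as Pearson's r computed on ranks.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotonic in x but not linear

def ranks(values):
    # Rank 1 = smallest value; assumes no ties for simplicity.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    sa = (sum((p - ma) ** 2 for p in a) / n) ** 0.5
    sb = (sum((q - mb) ** 2 for q in b) / n) ** 0.5
    return cov / (sa * sb)

rho = pearson(ranks(x), ranks(y))   # perfectly monotonic, so rho is 1
r_raw = pearson(x, y)               # below 1: the raw relationship is not linear
```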
The assumption of a spearman correlation is that the data is…
ORDINAL
Because the data is ordinal in spearman correlation, how does this suggest test is less powerful than pearson?
Converting data to ordinal means you lose variability and richness of the data.
What does it mean for a relationship to be monotonic?
The variables have a monotonic relationship or association, which means the relationship is consistently one-directional, but not necessarily linear.
Whereas a Pearson correlation assumes linearity, a Spearman correlation assumes not that the data points fit a straight line but that one variable is consistently increasing or decreasing as the other increases.
Alongside data being ordinal in spearman, what is the assumption regarding the data?
That the data is monotonic.
What is a non monotonic relationship between two variables?
A relationship whose direction changes: at some points, as one variable increases the other increases, but at other points, as it increases the other decreases (the association goes up and down).
Part two: shared variance/partial correlations?
What does shared variance mean and why would we want to calculate partial correlations?
How can we assess the relationship strength in correlation?
Using the COEFFICIENT OF DETERMINATION (r2)
Which is an effect size
What is the R in spearman correlation?
Spearman’s rho
How do we calculate the coefficient of determination (an effect size) which assess the relationship strength?
By simply squaring r (the correlation coefficient)
the coefficient of determination tell us:
A. the proportion of variability in Y that is accounted for by variability in X
B. How accurately one variable predicts another
C. A and B
C
In a spearman correlation, the coefficient of determination tells us:
A. the proportion of variance in the RANKS that the two variables share
B. How accurately one variable predicts another
C. A and B
C
what is this? r2
the coefficient of determination (calculated by squaring the correlation coefficient); an effect size
It tells us how strongly the two variables are associated
How would we find the proportion of overlapping variance?
BY r2 (coefficient of determination).
To find the proportion of overlapping variance amongst variables, and r = .5, how would we work this out?
By squaring it: (.5)² = .25, so 25% of the variance is shared.
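As a one-line sketch:

```python
# Coefficient of determination: square the correlation coefficient.
r = 0.5
r_squared = r ** 2   # .25, i.e. 25% of the variance is shared
```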
You’re looking at a table, specifically at r squared (r²). You have the variables CHEESE and BREAD and can see that the coefficient of determination for these is .6. What is this saying?
That 60% of the variability in cheese can be explained by variability in bread.
If a correlation is measuring the degree of overlap between variables, what is shared variance really showing?
How much the VARIANCES of each variable overlap. That is what coefficient of determination is showing us.
If 6% of the variance in the outcome variable is accounted for by the predictor variable, does this mean one variable is causing another?
NO. it is about being accounted for, not caused by.
If you have three variables (X1, X2, Y) and C shows the variance shared by ALL three variables, to see the UNIQUE associations between, say, x1 and Y, what needs to be removed?
C, by investigating partial correlations.
You need to control for that third variable when looking at associations among more than two variables.
Why would we do partial correlations?
Because a partial correlation measures the association between two variables while controlling for the effect that a third variable has on them both.
What is a partial correlation
The correlation between two variables when you hold constant the effects of a third variable on both of the other variables
If we wanted to examine the unique effect of revision time on exam performance while controlling for the effects of another variable, like exam anxiety, on both revision time and exam performance, what type of correlation would we be looking at?
Partial correlation because it allows us to control for exam anxiety
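A first-order partial correlation can be computed from the three zero-order correlations with the standard formula; the values below are made up for illustration.

```python
import math

# Zero-order correlations among X (revision time), Y (exam performance)
# and Z (exam anxiety) -- illustrative values, not real data.
r_xy, r_xz, r_yz = 0.60, 0.40, 0.50

# Partial correlation of X and Y, controlling for Z on both.
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Here the .60 zero-order correlation drops to about .50 once the shared influence of Z is removed.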
*** What does a SEMI partial correlation do compared to a partial correlation?
Where a partial correlation controls for the effect of a third variable on BOTH of the correlated variables, a semi-partial correlation controls for the effect the third variable has on only ONE of them.
When is a semi-partial correlation used?
It is used in multiple regression because if you square the semi-partial correlation, it tells you the variability in the outcome uniquely accounted for by one specific predictor variable
Which does the below - a partial correlation or a semi partial correlation?
Controls for the relationships between the predictors, so the outcome variable's relationship with the other predictors is still taken into account.
E.g. it tells us the unique effect of revision time on exam performance whilst controlling for the effect of exam anxiety on revision time only.
Semi-partial correlation
What happens when we square a semi-partial correlation? especially pertaining to multiple regression
It tells us how much of the TOTAL variability in Y is uniquely accounted for by ONE SPECIFIC PREDICTOR VARIABLE, controlling for the associations between the predictor variables.
So it takes the X1 and X2 association into account whilst also giving the unique effect of, say, X1 on Y.
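A sketch of the corresponding semi-partial (part) correlation formula, removing the second predictor from the first predictor only; the zero-order correlations are made up.

```python
import math

# Zero-order correlations: Y with X1, Y with X2, and X1 with X2
# (illustrative values only).
r_y1, r_y2, r_12 = 0.60, 0.50, 0.40

# Semi-partial correlation of Y with X1, partialling X2 out of X1 only.
sr = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)

# Squared, it gives the proportion of TOTAL variance in Y uniquely
# accounted for by X1.
sr_squared = sr ** 2
```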
What are zero order correlations and which analyses would we see them in
Zero order is the correlation between two variables when you DO NOT Control for any other variables.
So we do this in pearson and spearman correlations - just looking at relationship between two variables without controlling for anything.
Do partial correlations control for the effects of one or more variables?
Yes.
1st order correlation: a partial correlation that controls for ONE variable
2nd order correlation: a partial correlation that controls for TWO variables
What is the directionality problem?
The idea that it is not possible to determine which variable is the cause and which is the effect; research often supports bidirectional effects between variables.
Therefore correlations with cross sectional data are limited
What is the third variable problem?
A relationship established between two variables DOES NOT mean there is a DIRECT relationship as a third variable may be responsible for the relationship
If we wanted to test the correlation strength between exam performance and revision time between males and females, what test would we use?
A multiple regression with interactions because we are looking at whether an association between two variables differs by group.
True or false: Partial correlations can be performed on non-parametric data
True. Spearman’s partial rank order correlation can be used :)
When dealing with missing values in correlations, what does exclude cases pairwise mean?
When, for each correlation, we exclude participants who do not have a score for both variables in that pair. If more than one correlation is reported, the sample size may vary across the different correlations.
When dealing with missing values in correlations, what does exclude cases listwise mean?
Across ALL correlations, exclude participants who do not have a score for every variable. The sample size WILL be the same for all reported correlations.
Listwise deletion is generally not recommended, as it removes a participant from the entire dataset even if they are missing only one value.
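A sketch of the two deletion strategies, assuming a small dataset with missing values coded as None:

```python
# Missing values coded as None; each dict is one participant.
rows = [
    {"x": 1.0, "y": 2.0, "z": None},
    {"x": 2.0, "y": None, "z": 3.0},
    {"x": 3.0, "y": 4.0, "z": 5.0},
    {"x": 4.0, "y": 5.0, "z": 6.0},
]

# Pairwise: for the x-y correlation, keep rows complete on x AND y only.
xy_pairs = [r for r in rows if r["x"] is not None and r["y"] is not None]

# Listwise: keep only rows complete on EVERY variable, for all correlations.
complete = [r for r in rows if all(v is not None for v in r.values())]

n_pairwise_xy = len(xy_pairs)   # 3 participants usable for the x-y pair
n_listwise = len(complete)      # only 2 participants usable everywhere
```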
What is a regression?
A way of modelling the association or relationship between the variables. We’re looking at modelling the data we have in a linear fashion.
It is a model used to predict the value of one variable from another. It is a linear one
Describing the relationship using the equation of a straight line
Which axis does the outcome or DV go on and which axis predictor?
Predictor: X
Outcome/DV: Y
What is the equation for describing a straight line or the line of best fit?
Y(i) = b0 + b1X(i) + e(i)
b0 = y intercept: the value of Y when X = 0.
b1 = regression coefficient for the predictor: the slope, giving the strength and direction of the relationship.
e(i) = error term: the difference between the ACTUAL (data point) and PREDICTED value of Y for the ith person.
What do we need to know about a line to predict outcome variable Y?
The intercept, the slope for a predictor variable, and what the error is
Why do we want small error terms?
Because smaller error terms mean the difference between actual scores and predicted scores are less, which mean the model is more accurately predicting scores. Less difference between ACTUAL scores and PREDICTED scores.
Why do we square residuals?
Because some are above and below the line and if didnt square, would cancel each other out. After squaring, sum to get sum of squares residual.
What does the method of least squares do?
It uses calculus to determine the regression line that minimises the sum of squared residuals.
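The least-squares slope and intercept have closed-form solutions: the slope is the covariance of X and Y divided by the variance of X, and the intercept is chosen so the line passes through the means. A sketch with made-up data:

```python
# Simple linear regression by least squares (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Slope: covariance of X and Y over the variance of X.
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
# Intercept: forces the line through (mean of X, mean of Y).
b0 = my - b1 * mx

# Residuals and the quantity least squares minimises.
predicted = [b0 + b1 * a for a in x]
ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
```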
True or false: A regression line does not pass through the means of the predictor and the outcome.
False. A regression line ALWAYS passes through the point (mean of X, mean of Y): when X equals its mean, Y is predicted to equal the mean of Y.
What is the equation of the line of best fit?
Y(i) = b0 + b1X(i) + e(i)
In the coefficients table, we have unstandardised score (B), standard error, standardised, etc.
What are we looking at to find out what the modelling is predicting?
The unstandardised regression coefficient (B). Not the intercept, but the value next to the predictor variable.
The unstandardised value (B) is the slope. If positive, the slope is positive. It says that as X increases by 1 unit, Y increases by B (the unstandardised regression coefficient).
Rather than doing the linear equation manually, using statistical software we are looking at the coefficient table for what exactly?
- The direction
- Magnitude
- Whether a variable is a statistically significant predictor
we have unstandardised coefficient (B) and standardised coefficient (beta, little b). What is the difference between the two?
B and beta tell us similar information.
However, B says that for a ONE UNIT change in predictor variable X, this will be the change in Y, whereas beta says:
this is the expected STANDARD DEVIATION change in Y for a 1 SD change in X.
So standardised beta is per 1 SD change, and unstandardised B is per 1 unit change.
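For a single predictor, the two coefficients are linked by the SDs of the variables: beta = B × (SD of X / SD of Y). A sketch with made-up numbers:

```python
# Converting an unstandardised slope (B) to a standardised beta.
# Illustrative values: B = 2.0 units of Y per unit of X.
B = 2.0
sd_x, sd_y = 1.5, 5.0
beta = B * (sd_x / sd_y)   # 0.6 SD change in Y per 1 SD change in X
```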
If we wanted to look at the DIRECTION of relationship between variables, would we look at B or beta?
Look at B. Unstandardised. That tells us about the slope.
Unstandardised is looking at unit changes, standardised is looking at SD changes
If we wanted to look at the MAGNITUDE/STRENGTH of relationship between variables, would we look at B or beta?
Beta. Standardised tells us about effect sizes.
Unstandardised is looking at unit changes, standardised is looking at SD changes
When looking at the coefficients table, what does the p value tell us?
Significance: whether the variable is a statistically significant predictor.
Specifically, it indicates that our model explains an association between variables in a statistically significant way, meaning the two variables are related more than by chance.