Correlation and Regression Flashcards

1
Q

Is there a relationship between the amount of time spent revising for an exam and exam performance?

A

Correlation

2
Q

After controlling for exam anxiety, is there an association between revision time and exam performance?

A

Partial correlation, because we are controlling for a third variable (exam anxiety)

3
Q

When seeing the terms relationships/associations/controlling, what general category of analysis would you choose?

A

Correlation and regression

4
Q

When would you use a correlation over a regression considering they measure similar things?

A

Correlation gives a quick summary of the direction and strength of the relationship between two numeric variables.

When you’re looking to PREDICT or explain a numeric response from another variable (how X influences Y), then you are looking at regression.

Regression = how one variable affects another

Correlation = the degree of relationship between two variables, i.e. strength and direction.

5
Q

What does a -1 correlation tell us regarding the association between variables?

A

There is a perfect negative correlation. Therefore there is an association.

6
Q

What does a +1 correlation tell us regarding the association between variables?

A

There is a perfect positive correlation.

7
Q

What does a positive correlation mean?

A

As the FIRST (X) variable increases, the SECOND (Y) variable also increases.

8
Q

What does a negative correlation mean?

A

As one variable increases, the second one decreases.

9
Q

What do correlations measure?

A

The pattern of responses across variables

10
Q

As you get closer to a perfect negative or positive correlation (-1 or +1), does the association get weaker or stronger?

A

Stronger

11
Q

What does an association of 0.0 indicate?

A

No linear association between the variables, which is what the null hypothesis states.

12
Q

How many tails can a significance test have at a given alpha?

A

Either one tailed or two tailed

13
Q

How is the sample size and alpha/error rate important regarding whether a correlation is statistically significant or not?

A

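Because the critical value that r must exceed depends on both. A larger sample (more degrees of freedom) lowers the critical value, and a stricter alpha raises it.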

14
Q

In a one tailed correlation what is the alpha level?

A

0.05. The whole alpha sits in one tail because we are testing an effect in one direction only

15
Q

In a two tailed correlation, the alpha level is 0.025 in each tail. Why?

A

Because it is testing the correlation in EITHER direction, so the overall alpha of 0.05 is split across the two tails.

16
Q

Which is more powerful - one tailed or two tailed correlation?

A

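One tailed. The whole alpha is placed in the predicted direction, so use it only when you are more sure about the direction of the hypothesis, e.g. from prior empirical evidence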

17
Q

Why would you use a two tailed correlation?

A

When uncertain about the direction of your hypothesis.

18
Q

Sample size and alpha value need to be considered when looking at correlation significance. What is true regarding assessing if a Pearson r is significant, in terms of NUMBER OF DEGREES OF FREEDOM?

A

The absolute size of the correlation (regardless of direction) must be MORE THAN the critical value given for that number of degrees of freedom.

19
Q

Would you expect to need a higher Pearson r value to reach significance with a big n or a small n?

A

A higher r value is needed for a small n. With few people in a study, a moderate correlation is more likely to be due to chance than in a design with a large sample
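
One way to see this is the t statistic that is commonly used to test Pearson's r against zero, t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2. The sketch below is only an illustration: the value r = 0.5 and both sample sizes are made up, and `t_for_r` is a hypothetical helper name.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing Pearson's r against zero, with df = n - 2."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r ** 2)

r = 0.5                       # hypothetical correlation
t_small = t_for_r(r, n=10)    # df = 8
t_large = t_for_r(r, n=100)   # df = 98

print(f"n = 10:  t = {t_small:.2f}")
print(f"n = 100: t = {t_large:.2f}")
```

The same r gives a much larger t with the bigger sample. With df = 8 the two tailed critical t at alpha = .05 is about 2.31, so r = .5 would not be significant with only 10 people.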

20
Q

What does variance tell us?

A. How much scores deviate from the mean of the distribution
B. Variance is the average squared distance from the mean
C. Both

A

C

It is essentially the measure of how far away the data points are from the mean.

21
Q

Why do we have to square the distance from the average of the distribution when it comes to variance?

A

Because the data points will be BOTH above and below the mean. If we average those distances WITHOUT squaring them, the positive and negative distances cancel each other out and sum to zero.

22
Q

So why is the standard deviation (SD) the square root of the variance?

A

Because the distances are squared before averaging, the variance is in squared units. Taking the square root undoes the squaring and puts the SD back in the original units of the data
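
A minimal sketch of the last three cards, using made-up scores. It uses the "average squared distance" definition (dividing by n); many textbooks and packages divide by n - 1 for a sample instead.

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical exam scores
mean = sum(scores) / len(scores)

# Raw distances from the mean: positives and negatives cancel out.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

# Variance: the average squared distance from the mean (n denominator).
variance = sum(d ** 2 for d in deviations) / len(scores)

# SD: square root of the variance, back in the original units.
sd = sqrt(variance)

print(sum_of_deviations, variance, sd)
```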

23
Q

How is the covariance of the two variables similar to the variance?

A

It tells us how much two variables vary together around their means.
So instead of variance telling us how far data points are from the mean for one variable, covariance shows how much TWO variables together differ from their means.

24
Q

When dealing with covariance, why is it important to standardise it?

A

Because covariance depends on the units of measurement, so the same relationship gives a different covariance when the variables are measured in different units.

25
Q

How do we standardise the variables to deal with covariance problems such as measurements leading to different covariance outcomes?

A

We divide the covariance by the product of the standard deviations of both variables. This gives the correlation coefficient, which is unaffected by the units of measurement
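
A small sketch of why this matters, with made-up data. The same revision times are expressed in hours and in minutes: the covariance changes with the unit, the correlation does not. The function names are illustrative, and the sample (n - 1) denominator is an assumption.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: summed cross-products of deviations over n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sd(x):
    return sqrt(covariance(x, x))   # variance is the covariance of x with itself

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / (sd(x) * sd(y))

hours = [1, 2, 3, 4, 5]              # hypothetical revision time in hours
marks = [52, 55, 61, 64, 70]         # hypothetical exam marks
minutes = [h * 60 for h in hours]    # same data, different unit

print(covariance(hours, marks), covariance(minutes, marks))   # differ by a factor of 60
print(pearson_r(hours, marks), pearson_r(minutes, marks))     # the same
```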

26
Q

The standardised version of covariance is known as…?

A

The correlation coefficient (Pearson)

27
Q

Covariance is unstandardised and correlation is standardised. Why?

A

Because the problem with covariance is units of measurement: with raw scores the covariance changes whenever the scale changes. So you standardise to get around this, and the covariance of standardised variables is the correlation.

This is advantageous because standardising continuous variables puts them on the SAME scale, which makes it easier to compare two or more variables. You might also hear it called SCALING. It makes the variances equal (both become 1), so you’re no longer looking at a COVARIANCE but at a correlation

28
Q

What is the covariance of standardised variables, essentially?

A

A correlation. When you calculate this you get a correlation coefficient.

29
Q

How do we standardise a variable?

A

By subtracting the mean and dividing by the SD
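
As a rough illustration with made-up data, the sketch below turns two variables into z-scores and checks that the covariance of the z-scores matches the correlation computed directly. The helper names and the sample (n - 1) denominator are assumptions.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def standardise(values):
    """z-scores: subtract the mean, then divide by the SD."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

x = [1, 2, 3, 4, 5]            # hypothetical revision hours
y = [52, 55, 61, 64, 70]       # hypothetical exam marks

zx, zy = standardise(x), standardise(y)
r_from_z = covariance(zx, zy)                                  # covariance of z-scores
r_direct = covariance(x, y) / (sample_sd(x) * sample_sd(y))    # covariance / (SD * SD)

print(r_from_z, r_direct)
```

After standardising, each variable has mean 0 and SD 1, which is why the two routes agree.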

30
Q

If we already have the covariance of X and Y, what do we need to do to standardise it?

A

Divide by the product of the SDs of X and Y

31
Q

What does dividing the covariance by the SD do in terms of the RANGE of the correlation coefficient?

A

We force it to be between -1 and +1, reflecting how well a straight line fits the data. So correlation coefficients can’t be less than -1 or more than +1. BUT, covariance can be anything!

32
Q

True or false: correlations AND covariance must be between -1 and +1

A

False - covariance can be anything

33
Q

Why is keeping correlations between -1 and +1 advantageous?

A

For comparisons

34
Q

What does a Pearson correlation measure?

A

It measures the DIRECTION and DEGREE of linear relationship between two interval/ratio variables. The + or - denotes the direction of the relationship, so whether it is positive or negative.

35
Q

Can both covariance and correlation tell us about the direction of any linear relationship?

A

Yes

36
Q

What is the difference between correlation and covariance regarding what they show about linear relationships?

A

Correlation shows not only the direction of a linear relationship but also its STRENGTH. Covariance can’t, because its size depends on the units of measurement.

37
Q

What type of data is required for covariance/correlation?

A

Continuous (interval or ratio)

38
Q

What is more important when looking at a correlation coefficient: whether the data points fit the line better or if positive/negative association?

A

Whether the data fits the line better. This tells us there is a strong relationship, so a change in one variable is associated with a change in another variable (not about what caused it).

39
Q

Why would we use a correlation matrix?

A

Because if we have a dataset with many variables, we want the coefficients between each combination of those variables.
It shows the correlation for each pairwise combination of variables in the whole dataset.
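
A minimal sketch of building such a matrix by hand. The variable names and scores are invented for illustration; statistical packages produce the same table directly.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

data = {   # hypothetical scores for five students
    "revision_hours": [1, 2, 3, 4, 5],
    "exam_mark": [52, 55, 61, 64, 70],
    "anxiety": [9, 7, 8, 4, 3],
}

# Start with the diagonal: every variable correlates 1 with itself.
matrix = {name: {name: 1.0} for name in data}

# Fill in one coefficient per pair of variables; the matrix is symmetric.
for a, b in combinations(data, 2):
    r = pearson_r(data[a], data[b])
    matrix[a][b] = matrix[b][a] = r

for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

With k variables there are k(k - 1)/2 distinct pairs, so the number of coefficients grows quickly as variables are added.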

40
Q

What is a problem for running correlations in exploratory analyses?

A

The more tests you run, the higher the chance of getting a false positive. Also, people can run these tests without a prior hypothesis and then try to come up with a justification afterwards
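
To put a rough number on this, the sketch below computes the chance of at least one false positive across several tests. It assumes the tests are independent, which correlations from one dataset usually are not, so treat it as an approximation. `familywise_error` is a hypothetical helper name.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_variables = 10
n_pairs = n_variables * (n_variables - 1) // 2   # correlations in a 10-variable matrix

for k in (1, 5, n_pairs):
    print(k, round(familywise_error(0.05, k), 3))
```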

41
Q

Why would a person running a correlation without thinking of research questions need to be careful?

A

Because justifying a finding after the fact, without a research question set in advance, is not sound scientific practice.

42
Q

True or false: variables should be normally distributed in a Pearson correlation

A

True. This is needed for statistical inference

43
Q

If, for a pearson correlation, assumption of normality appears violated

(the sample size is less than 30, variables do not appear to be normally distributed)

what other test can be used?

A

Spearman correlation

44
Q

How can normality be violated for a pearson correlation?

A

The sample size is less than 30, or the variables do not appear to be normally distributed

45
Q

If linearity is violated for a Pearson correlation, what would this look like?

A

A non-linear relationship between the variables, e.g. a curved but still monotonic one. In that case use Spearman or try transforming the data

46
Q

What type of test is a pearson correlation?

A

Parametric. Parametric tests make assumptions about the distribution of the data, here that it is normal. When those assumptions are met, go for the parametric test as it has more power.

47
Q

What is the next step of a correlation?

A

Regression!

48
Q

What is a non parametric correlation?

A

Spearman

49
Q

What does a spearman correlation measure?

A

The association between TWO ORDINAL VARIABLES. X and Y both consist of ranks. Specifically, the degree of monotonic relationship between two variables, because the assumption of linearity does not need to be met

The consistency of the DIRECTION of the association between two interval/ratio variables. Therefore, interval/ratio data must be converted to ranks before conducting a Spearman correlation.

50
Q

True or false: the Spearman correlation coefficient uses the same formula as Pearson, only the calculations are performed on rank data instead

A

TRUE
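
A minimal sketch of that idea: rank each variable (ties share the average rank), then apply the Pearson formula to the ranks. The data are invented, and `average_ranks` is an illustrative helper rather than any package's function.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Pearson's formula applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]    # consistently increasing, but curved

print(pearson_r(x, y))     # below 1: the points do not sit on a straight line
print(spearman_rho(x, y))  # 1: the ordering is perfectly consistent
```

The curved example shows the difference between a linear and a monotonic relationship: the ranks line up perfectly even though the raw scores do not.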

51
Q

The assumption of a spearman correlation is that the data is…

A

ORDINAL

52
Q

Because the data is ordinal in spearman correlation, how does this suggest test is less powerful than pearson?

A

Converting data to ordinal means you lose variability and richness of the data.

53
Q

What does it mean for a relationship to be monotonic?

A

The variables have a monotonic relationship or association, which means the relationship is consistently one directional, but not necessarily linear.

Whereas Pearson correlation assumes linearity, Spearman does not assume that the data points fit a straight line, only that the data is consistently increasing or decreasing

54
Q

Alongside data being ordinal in spearman, what is the assumption regarding the data?

A

That the data is monotonic.

55
Q

What is a non monotonic relationship between two variables?

A

As one variable increases, the other sometimes increases and sometimes decreases (up and down), so the direction is not consistent

56
Q

Part two: shared variance/partial correlations?

A

What does shared variance mean and why would we want to calculate partial correlations?

57
Q

How can we assess the relationship strength in correlation?

A

Using the COEFFICIENT OF DETERMINATION (r²)

Which is an effect size

58
Q

What is the R in spearman correlation?

A

Spearman’s rho

59
Q

How do we calculate the coefficient of determination (an effect size), which assesses the relationship strength?

A

By simply squaring r (the correlation coefficient)

60
Q

The coefficient of determination tells us:

A. the proportion of variability in Y that is accounted for by variability in X
B. How accurately one variable predicts another
C. A and B

A

C

61
Q

In a spearman correlation, the coefficient of determination tells us:

A. the proportion of variance in the RANKS that the two variables share
B. How accurately one variable predicts another
C. A and B

A

C

62
Q

What is this? r²

A

The coefficient of determination (calculated by squaring the correlation coefficient). It is an effect size

It tells us how strongly the two variables are associated

63
Q

How would we find the proportion of overlapping variance?

A

By r² (coefficient of determination).

64
Q

To find the proportion of overlapping variance amongst variables when r = .5, how would we work this out?

A

(.5)² = .25, so 25% of the variance is shared

65
Q

You’re looking at a table, specifically at r squared (r²). You have the variables CHEESE and BREAD and can see that the coefficient of determination for these both is .6. What is this saying?

A

That 60% of the variability in cheese can be explained by variability in bread.

66
Q

If a correlation is measuring the degree of overlap between variables, what is shared variance really showing?

A

How much the VARIANCES of each variable overlap. That is what coefficient of determination is showing us.

67
Q

If 6% of the variance in the outcome variable is accounted for by the predictor variable, does this mean one variable is causing another?

A

No. It is about being accounted for, not caused by.

68
Q

If you have three variables (X1, X2, Y) and C shows the variance shared by ALL three variables, to see the UNIQUE associations between, say, x1 and Y, what needs to be removed?

A

C. Partial correlations investigated

Because we need to control for that third variable when looking at the association between more than two variables

69
Q

Why would we do partial correlations?

A

Because it measures the association between two variables, controlling for the effect that a third variable has on them both

70
Q

What is a partial correlation

A

The correlation between two variables when you hold constant the effects of a third variable on both of the other variables
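
For three variables, a first order partial correlation can be worked out from the three zero order correlations with the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below uses made-up correlations for revision time, exam performance and exam anxiety.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant for both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero order correlations (not taken from any real study)
r_revision_exam = 0.40
r_revision_anxiety = -0.70
r_exam_anxiety = -0.45

r_partial = partial_r(r_revision_exam, r_revision_anxiety, r_exam_anxiety)
print(round(r_partial, 3))
```

In this invented example the association between revision time and exam performance shrinks once anxiety is controlled for, because anxiety is related to both.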

71
Q

If we wanted to examine the unique effect of revision time on exam performance while controlling for the effects of another variable, like exam anxiety, on both revision time and exam performance, what type of correlation would we be looking at?

A

Partial correlation because it allows us to control for exam anxiety

72
Q

*** What does a SEMI partial correlation do compared to a partial correlation?

A

Where a partial correlation controls for the effect of a third variable on the correlations between two variables, a semi partial correlation controls for the effect a third variable has on ONE of the others.

73
Q

When is a semi-partial correlation used?

A

It is used in multiple regression because if you square the semi-partial correlation, it tells you the variability in the outcome uniquely accounted for by one specific predictor variable
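
A minimal sketch with two predictors, using the standard formula for the semi-partial (part) correlation of Y with X1 when X2 is removed from X1 only: sr = (r_y1 - r_y2 * r_12) / sqrt(1 - r_12^2). The correlations are made up for illustration.

```python
from math import sqrt

def semipartial_r(r_y1, r_y2, r_12):
    """Correlation of Y with X1 after removing X2 from X1 only."""
    return (r_y1 - r_y2 * r_12) / sqrt(1 - r_12 ** 2)

# Hypothetical zero order correlations
r_exam_revision = 0.40       # Y with X1
r_exam_anxiety = -0.45       # Y with X2
r_revision_anxiety = -0.70   # X1 with X2

sr = semipartial_r(r_exam_revision, r_exam_anxiety, r_revision_anxiety)
unique_variance = sr ** 2    # share of ALL variability in Y unique to X1

print(round(sr, 3), round(unique_variance, 4))
```

Squaring sr gives the proportion of the total variability in the outcome that only revision time accounts for in this invented example.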

74
Q

Which does the below - a partial correlation or a semi partial correlation?
Controls for the relationships between predictors so the outcome variables relationship with other predictors is still taken into account

e.g tells unique effect of revision time on exam performance whilst controlling for effect of exam anxiety on revision time

A

Semi-partial correlation

75
Q

What happens when we square a semi-partial correlation? especially pertaining to multiple regression

A

It tells us how much of the TOTAL variability in Y is uniquely accounted for by ONE SPECIFIC PREDICTOR VARIABLE. It controls for the associations between predictor variables.

So it takes into account X1 AND X2 associations in semi partial correlations whilst also giving unique effect for, say, X1 on Y.

76
Q

What are zero order correlations and which analyses would we see them in

A

Zero order is the correlation between two variables when you DO NOT Control for any other variables.

So we do this in pearson and spearman correlations - just looking at relationship between two variables without controlling for anything.

77
Q

Do partial correlations control for the effects of one or more variables?

A

Yes.
1st order correlation: partial correlation that controls for ONE variable
2nd order correlation: partial correlation that controls for TWO variables

78
Q

What is the directionality problem?

A

The idea it is not possible to determine which variable is the cause and which is the effect. Research supports bidirectional effects across variables.

Therefore correlations with cross sectional data are limited

79
Q

What is the third variable problem?

A

A relationship established between two variables DOES NOT mean there is a DIRECT relationship as a third variable may be responsible for the relationship

80
Q

If we wanted to test the correlation strength between exam performance and revision time between males and females, what test would we use?

A

A multiple regression with interactions because we are looking at whether an association between two variables differs by group.

81
Q

True or false: Partial correlations can be performed on non-parametric data

A

True. Spearman’s partial rank order correlation can be used :)

82
Q

When dealing with missing values in correlations, what does exclude cases pairwise mean?

A

When for each correlation, we exclude participants who do not have a score for both variables. If there is more than 1 correlation reported, sample size may vary across different correlations

83
Q

When dealing with missing values in correlations, what does exclude cases listwise mean?

A

Across ALL correlations, exclude participants who do not have a score for every variable. The sample size WILL be the same for all reported correlations.

Listwise is not recommended, as it means removing the participant from the entire dataset.
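
The difference between the two options can be sketched with a tiny made-up dataset, where `None` marks a missing score. The helper names are illustrative only.

```python
rows = [   # hypothetical participants; None marks a missing score
    {"revision": 5, "exam": 60, "anxiety": 7},
    {"revision": 3, "exam": None, "anxiety": 8},
    {"revision": None, "exam": 70, "anxiety": 4},
    {"revision": 8, "exam": 75, "anxiety": 3},
]

def pairwise_cases(rows, a, b):
    """Keep participants who have a score on BOTH variables in this pair."""
    return [r for r in rows if r[a] is not None and r[b] is not None]

def listwise_cases(rows):
    """Keep only participants who have a score on EVERY variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

n_revision_exam = len(pairwise_cases(rows, "revision", "exam"))
n_revision_anxiety = len(pairwise_cases(rows, "revision", "anxiety"))
n_exam_anxiety = len(pairwise_cases(rows, "exam", "anxiety"))
n_listwise = len(listwise_cases(rows))

print(n_revision_exam, n_revision_anxiety, n_exam_anxiety, n_listwise)
```

Pairwise exclusion gives a different n for different correlations here, while listwise exclusion uses the same smaller set of participants everywhere.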

84
Q

What is a regression?

A

A way of modelling the association or relationship between the variables. We’re looking at modelling the data we have in a linear fashion.

It is a model used to predict the value of one variable from another. It is a linear one

Describing the relationship using the equation of a straight line

85
Q

Which axis does the outcome or DV go on and which axis predictor?

A

Predictor: X

Outcome/DV: Y

86
Q

What is the equation for describing a straight line or the line of best fit?

A

Y_i = b0 + b1*X_i + e_i

b0 = y intercept. Value of Y when X = 0.

b1 = regression coefficient (slope) for the predictor. Strength and direction of the relationship.

e_i = error term. Difference between the ACTUAL (data point) and PREDICTED value of Y for the ith person.

87
Q

What do we need to know about a line to predict outcome variable Y?

A

The intercept, the slope for a predictor variable, and what the error is

88
Q

Why do we want small error terms?

A

Because smaller error terms mean the difference between actual scores and predicted scores are less, which mean the model is more accurately predicting scores. Less difference between ACTUAL scores and PREDICTED scores.

89
Q

Why do we square residuals?

A

Because some points are above the line and some are below, and if we didn’t square the residuals they would cancel each other out. After squaring, sum them to get the residual sum of squares.

90
Q

What does the method of least squares do?

A

It uses calculus to determine the regression line that minimises the sum of squared residuals, i.e. the line with the least squared error
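
For one predictor, the calculus works out to two closed-form results: the slope is the summed cross-products divided by the summed squared deviations of X, and the intercept is mean(Y) minus slope times mean(X). The sketch below applies them to made-up data; `fit_line` and `ss_residual` are illustrative names.

```python
def fit_line(x, y):
    """Least squares intercept (b0) and slope (b1) for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def ss_residual(x, y, b0, b1):
    """Sum of squared differences between actual and predicted Y."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

hours = [1, 2, 3, 4, 5]          # hypothetical revision hours (X)
marks = [52, 55, 61, 64, 70]     # hypothetical exam marks (Y)

b0, b1 = fit_line(hours, marks)
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, marks)]

best = ss_residual(hours, marks, b0, b1)
steeper = ss_residual(hours, marks, b0, b1 + 0.5)   # any other line fits worse

print(b0, b1, best, steeper)
```

The residuals of the fitted line sum to zero, and the predicted mark at the mean number of hours equals the mean mark.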

91
Q

True or false: A regression line does not pass through the mean of the predictor and the outcome. Hence when X is equal to its mean, Y is predicted to be equal to the mean of Y

A

False. A regression line DOES always pass through the mean of the predictor and the mean of the outcome

92
Q

What is the equation of the line of best fit?

A

Y_i = b0 + b1*X_i + e_i

93
Q

In the coefficients table, we have unstandardised score (B), standard error, standardised, etc.

What are we looking at to find out what the modelling is predicting?

A

The unstandardised regression coefficient (B). Not the intercept but the value next to the predictor variable.

The unstandardised value (B) is the slope. If it is positive, the slope is positive. It is saying that as X increases by 1 unit, Y is expected to increase by the unstandardised value, which is the regression coefficient

94
Q

Rather than doing the linear equation manually, using statistical software we are looking at the coefficient table for what exactly?

A
  • The direction
  • Magnitude
  • Whether a variable is a statistically significant predictor
95
Q

We have the unstandardised coefficient (B) and the standardised coefficient (beta, little b). What is the difference between the two?

A

B tells us similar information to b.

However, B is the change in Y for a ONE UNIT CHANGE in predictor variable X, whereas b is the expected STANDARD DEVIATION change in Y for a 1 SD change in X.

So standardised beta is a 1 SD change, and unstandardised B is a 1 unit change.
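
With a single predictor the two are linked by beta = B * SD(X) / SD(Y). The sketch below uses made-up data and illustrative helper names, and shows that changing the unit of X changes B but leaves beta alone.

```python
from math import sqrt

def sample_sd(values):
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def slope(x, y):
    """Unstandardised coefficient B for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardised_beta(x, y):
    """Expected SD change in Y for a 1 SD change in X."""
    return slope(x, y) * sample_sd(x) / sample_sd(y)

hours = [1, 2, 3, 4, 5]              # hypothetical revision time in hours
minutes = [h * 60 for h in hours]    # same data, different unit
marks = [52, 55, 61, 64, 70]         # hypothetical exam marks

B_hours, B_minutes = slope(hours, marks), slope(minutes, marks)
beta_hours = standardised_beta(hours, marks)
beta_minutes = standardised_beta(minutes, marks)

print(B_hours, B_minutes)        # marks per hour vs marks per minute
print(beta_hours, beta_minutes)  # identical
```

This unit-free property is why beta is easier to compare across predictors measured on different scales.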

96
Q

If we wanted to look at the DIRECTION of relationship between variables, would we look at B or beta?

A

Look at B. Unstandardised. That tells us about the slope.

Unstandardised is looking at unit changes, standardised is looking at SD changes

97
Q

If we wanted to look at the MAGNITUDE/STRENGTH of relationship between variables, would we look at B or beta?

A

Beta. Standardised tells us about effect sizes.

Unstandardised is looking at unit changes, standardised is looking at SD changes

98
Q

When looking at the coefficients table, what does p tell us?

A

Significance. Whether the variable is a significant predictor.

Specifically, a p below alpha indicates that the association between the predictor and the outcome is statistically significant, meaning an association this strong would be unlikely by chance alone if the variables were unrelated.