Correlation Flashcards

1
Q

What is a correlation?

A

The bedrock of a regression analysis.

Definition: assesses the degree to which scores (from a set of respondents) on two variables co-relate (how a change in one variable is associated with a change in the other)

2
Q

The types of correlations:

3 kinds

A
  1. Pearson’s product-moment correlation coefficient (r)
  2. Spearman’s rank correlation coefficient (rs)
  3. Partial correlations - leads into regression
3
Q

What is a correlation coefficient?

A

A correlation coefficient provides an index of the extent to which two variables are linearly correlated (i.e. the strength of the linear relationship between the two variables)

4
Q

Illustrate, using an example, how correlation and correlation coefficients are different things…

A

Correlation - the relationship itself, e.g. do depression and self-esteem co-relate; is there a linear relationship between them?

Correlation coefficient - a number indexing that relationship, e.g. how strong is the linear relationship between depression and self-esteem?

5
Q

What is a scatterplot?

A

A scatterplot plots scores on one variable against scores on another; each point represents a participant. This gives an idea of the relationship between the variables

6
Q

What is a bivariate association?

A

It’s what we look at with correlations: the relationship between two variables

7
Q

How does correlation analysis work? Outline

A

We work out the correlation coefficient, say what the probability is that we got this coefficient by chance assuming that the variables are not at all associated with each other –> we want this to be
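One way to make “by chance” concrete is a permutation test. This is a minimal sketch, not the method the card describes (statistics packages normally get p from a t distribution). It shuffles one variable many times to break any association, then counts how often the shuffled data give a coefficient at least as large as the observed one. The data, the function names and the number of shuffles are all made up for illustration.

```python
import random
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson's r: mean product of deviations divided by both population SDs."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def permutation_p(x, y, n_shuffles=5000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)    # shuffling y breaks any link with x
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical scores for 8 participants
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
p = permutation_p(x, y)
print(round(pearson_r(x, y), 3), p)
```

A small p here means that shuffled (unassociated) data rarely produce a coefficient as large as the one observed.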

8
Q

PEARSON’S:

Why do we want both variables on a common rubric?

How do we do this?

A

If the two variables are on different scales it’s like comparing apples and pears –> same scale for easier comparison of variables

Rather than making a common scale for both variables, we look at Z scores for each variable

9
Q

PEARSON’S:

What is a Z score?

A

A standardised score for variables to create a scale to compare them on

Z = (x - μ) / σ

Z = (observed value - mean) / standard deviation
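As a minimal sketch of the formula above, the snippet below standardises a small set of made-up scores. It assumes the population standard deviation (dividing by N); the card does not say which version is meant, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise each value: (observed value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD (divides by N)
    return [(x - mu) / sigma for x in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores: mean 5, SD 2
z = z_scores(scores)
print(z)
```

After standardising, a score of 9 becomes 2 (two standard deviations above the mean), whatever scale the raw scores were on.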

10
Q

PEARSON’S:

What is a standard deviation?

A

A standard measure of how much the values in a data set vary about the mean

11
Q

PEARSON’S:

What do we do with Z scores?

A

Each participant’s two Z scores are multiplied together, and the products are summed

Σ ZxZy (sum, across participants, of Z score on x multiplied by Z score on y)

A perfect positive correlation has a Z sum of (max) N. This happens when every participant’s Zx and Zy are equal.

A perfect negative correlation has a Z sum of (min) -N. This happens when every participant’s Zx and Zy are equal in size but opposite in sign.

N is the sample size

So, when expressing the strength of the correlation/ its linearity (r), the Z sum is expressed as a ratio of its max/min, N.

r = Σ ZxZy / N

So when there’s a perfect positive correlation, Σ ZxZy = N and r = N/N = 1. When there’s a perfect negative correlation, Σ ZxZy = -N and r = -N/N = -1.
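The calculation can be sketched in a few lines. This assumes Z scores built from the population standard deviation (dividing by N), which is what makes the maximum Z sum exactly N; if the Z scores use the sample standard deviation (dividing by N - 1), the sum is divided by N - 1 instead. The data and function names are made up.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z = (observed value - mean) / standard deviation, using the population SD."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson_r(x, y):
    """r = sum of ZxZy across participants, divided by N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])  # y rises in a straight line with x
r_neg = pearson_r(x, [10 - v for v in x])     # y falls in a straight line with x
print(r_pos, r_neg)
```

The first pair gives r of 1 and the second r of -1 (up to floating-point rounding), matching the max and min described above.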

12
Q

PEARSON’S:

What is r?

A

r is the correlation coefficient; the closer it is to 1 or -1, the stronger the linear relationship (0 means no linear relationship).

Want the p value to be as low as possible - a low p means the result is unlikely to have been found by chance, assuming the null hypothesis is true.

The degrees of freedom used here is N-2
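To show where the N-2 degrees of freedom enter, here is a small sketch of the standard conversion from r to a t statistic, t = r × √(df / (1 - r²)). The card itself does not give this formula; the function name and the example values are hypothetical. Looking up p from t needs a t distribution table or a statistics package, which is left out here.

```python
import math

def t_from_r(r, n):
    """Convert Pearson's r from a sample of size n into a t statistic with df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical result: r = .50 from 27 participants
t, df = t_from_r(0.5, 27)
print(f"r({df}) = 0.50, t = {t:.2f}")
```

For the same r, a larger sample gives a larger t and therefore a smaller p.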

13
Q

PEARSON’S:

Reporting the results of a Pearson’s…

A
  • Significant or not?
  • Direction?
  • Between? (state variables)
  • r(df)=___, p=___, this means…

Use a table where possible. If something is reported in a table it doesn’t need to be reported in the text - won’t be negatively marked but it’s a waste of time.

14
Q

PEARSON’S:

Assumptions underlying a Pearson’s correlation?

How can these assumptions be violated?

What do you do if assumptions are violated?

A

Assumptions:

  1. Variables must be measured on interval scales (equal spacing between intervals)
  2. The two variables must be linearly related

Violations:

  • non-linear relationship
  • outliers
  • ordinal data (in order but not equally spaced)

What to do:
If the assumptions are violated, a non-parametric correlation that uses rank scores must be used: Spearman’s rank (rs)

15
Q

SPEARMAN’S:

Spearman’s rank correlation coefficient/ Spearman’s rho, when is it used?

A

This correlation can be used when the assumptions of a Pearson’s are violated - when the scatterplot shows a monotonic but non-linear relationship (a curve) between variables, when there are outliers, or when the data is ordinal

16
Q

SPEARMAN’S:

How does it work?

A

Instead of using raw scores, here scores are assigned a rank e.g. highest = 1, lowest = 10. Then the ranks are correlated, so if there’s a perfect positive correlation each person should have the same rank on both variables…
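A minimal sketch of the ranking step, followed by a Pearson correlation on the ranks. Giving tied scores the average of the ranks they span is a common convention and is an assumption here, since the card does not mention ties. All names and data are made up.

```python
from statistics import mean, pstdev

def ranks(values):
    """Rank scores so the highest gets rank 1; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of scores tied with the score at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on the ranks instead of the raw scores."""
    return pearson_r(ranks(x), ranks(y))

print(ranks([10, 30, 20, 30]))                       # the two 30s share rank 1.5
print(spearman_rs([1, 2, 3, 4], [10, 20, 40, 80]))   # same order on both variables
```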

17
Q

SPEARMAN’S:

What does monotonic mean?

How does spearman’s “flatten” out monotonic relationships between variables?

A

Monotonic means the relationship between the variables goes in one direction only, even if it’s non-linear (not a straight line)

When the scores in a spearman’s are ranked, the relationship becomes linear, the curve is flattened.
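The flattening can be seen on a made-up curve where y is the cube of x: the relationship always goes upward but is not a straight line. This is an illustrative sketch only; the simple ranking helper assumes there are no tied scores.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def simple_ranks(values):
    """Highest score gets rank 1. Assumes no ties (an assumption of this sketch)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but curved

r_raw = pearson_r(x, y)                                  # on raw scores
r_ranked = pearson_r(simple_ranks(x), simple_ranks(y))   # on ranks
print(round(r_raw, 3), round(r_ranked, 3))
```

On the raw scores r falls short of 1 because the points curve away from a straight line; on the ranks the points line up perfectly and the coefficient is 1.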

18
Q

SPEARMAN’S:

What is the caveat of a Spearman’s correlation?

Non-monotonic relationships

A

Doesn’t work if the relationship is a U/V shape or inverted U/V (non-monotonic - two directions). The two linear relationships, when ranked, stay as two relationships - cannot draw a straight line through the graph

19
Q

SPEARMAN’S:

Outliers

How are they solved using spearman’s?

A

When using raw scores, outliers can skew the estimated strength of the relationship between the variables. The line of best fit drawn including outliers looks different to one drawn without them.

When ranked the outlier is brought down to the main body of results (now only one rank away from next value)
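A small sketch of that effect, using made-up scores in which 95 is the outlier. The ranking helper is hypothetical and assumes no tied scores.

```python
def simple_ranks(values):
    """Highest score gets rank 1. Assumes no ties (an assumption of this sketch)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

scores = [12, 15, 11, 14, 13, 95]   # 95 sits far above the rest
score_ranks = simple_ranks(scores)

raw_gap = scores[5] - scores[1]             # outlier vs next-highest raw score
rank_gap = score_ranks[1] - score_ranks[5]  # the same two participants, in ranks
print(score_ranks, raw_gap, rank_gap)
```

In raw scores the outlier is 80 points above the next value; in ranks it is one step away, so it can no longer dominate the correlation.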

20
Q

SPEARMAN’S:

Reporting the results of a spearman’s…

A
  • Why use this test? e.g. outliers
  • Significant or not?
  • Direction?
  • Between? (state variables)
  • rs(df)=___, p=___, this means…
21
Q

PARTIAL CORRELATION:

When is this method used?

A

A partial correlation can be used when you want to control for the effect of a ‘third (unknown) variable’

22
Q

PARTIAL CORRELATION:

What does this method investigate?

A

A partial correlation investigates the strength and direction of a linear relationship between two variables, whilst controlling for a third continuous variable

Continuous variable - a variable with an infinite number of possible values (opposite of discrete variable) e.g. length

23
Q

PARTIAL CORRELATION:

Example of when a partial correlation could be used…

A

“Is the relationship between attitude and intention due to past behaviour (e.g. past behaviour is correlated with both)?”

What happens to the correlation when the effect of past behaviour is controlled for?

It might be that past behaviour is driving people’s intentions rather than attitudes - want to control the effect of this variable

You’d come to this conclusion running a correlation analysis on the three variables:

  1. Run correlation
  2. attitude and intention are correlated
  3. past behaviour is strongly correlated with both attitude and intention
  4. Maybe the reason that att. and int. are correlated is because past behav. correlates with both –> run partial correlation

If there is still a correlation between att. and int. when past behav. is controlled for, then they are related. The correlation might not be as strong as before: some of the variance was explained by past behav., but not all of it
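With one control variable, the partial correlation can be worked out from the three ordinary correlations using the standard first-order formula r_xy.z = (r_xy - r_xz × r_yz) / √((1 - r_xz²)(1 - r_yz²)). The card does not state this formula, and the three correlation values below are invented purely for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the third variable z controlled for."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations (not real data)
r_att_int = 0.60    # attitude with intention
r_att_past = 0.50   # attitude with past behaviour
r_int_past = 0.50   # intention with past behaviour

r_partial = partial_r(r_att_int, r_att_past, r_int_past)
print(round(r_partial, 2))
```

With these invented values the attitude-intention correlation drops from .60 to about .47 once past behaviour is controlled for: weaker than before, but still present.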

24
Q

PARTIAL CORRELATION:

How is a partial correlation reported?

A

Simple - 3 variables:

  • Test conducted? To measure?
  • Significant or not?
  • Direction?
  • Between? (state variables)
  • r(df)=___, p=___, whilst controlling for… ___.

Complex - 4 variables:
  • Partial correlations were conducted to test the strength of the relationship between ___, ___, and ___, while controlling for ___. Sig, direction, found between ___ and ___, r(df)=___, p=___, and between ___ and ___, r(df)=___, p=___, whilst controlling for ___. But the correlation between ___ and ___ was non-significant, r(df)=___, p=___.

25
Q

Learning objectives of the lecture:

A
  1. Be able to produce and interpret scatterplots.
  2. Be able to conduct, interpret and report different correlation analyses (Pearson, Spearman, and partial correlations).
  3. Understand and be able to explain the assumptions underlying correlational analyses and what to do if they are violated.
26
Q

Filtering data:

How do you filter data? Why would you need to filter data?

A

Data –> Select cases –> Select variable from list –> Select “if condition is satisfied” –> Enter min/max value e.g. depression<15 –> Continue –> Output: filter out unselected cases –> OK