Correlation Flashcards

1
Q

Correlation

A

Correlation is not causation

Correlation coefficient = a measure of the strength and direction of the linear association between two numerical variables. It reflects the amount of scatter (variation) in the association; it does NOT “fit a line” to the data.
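
As a minimal sketch of how the sample correlation coefficient can be computed from its usual definition (sum of products of deviations divided by the square root of the product of the sums of squared deviations): the function name `pearson_r` and the toy data are hypothetical and used only for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r for paired numerical data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

The sign of r gives the direction of the association and its magnitude gives the strength; no line is fitted at any point.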

2
Q

Simpson's paradox

A

correlations appear in different groups of data but disappear or reverse when these groups are combined
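
A small made-up example may help show the reversal. In this sketch the two groups and their values are invented purely for illustration, and `pearson_r` is a hypothetical helper: each group shows a positive correlation on its own, but the pooled data show a negative one.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r for paired numerical data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented data: group A has low x and high y, group B has high x and low y
xa, ya = [1, 2, 3, 4], [11, 13, 12, 14]
xb, yb = [6, 7, 8, 9], [1, 3, 2, 4]

r_a = pearson_r(xa, ya)              # positive within group A
r_b = pearson_r(xb, yb)              # positive within group B
r_all = pearson_r(xa + xb, ya + yb)  # negative once the groups are pooled

print(r_a, r_b, r_all)
```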

3
Q

Calculating the correlation coefficient and CIs

A

1) Calculate the correlation coefficient (r for the sample; ρ, rho, for the population) using the equation -> can then calculate the SE of r

Can’t use this SE to estimate CIs: the sampling distribution of r is NOT normal, because the correlation coefficient is bounded between -1 and 1

2) Use Fisher’s z transformation to convert r to z: z = 0.5 ln((1 + r) / (1 - r)) (the sampling distribution of z is approximately normal)

3) Calculate the SE of z: SE_z = sqrt(1 / (n - 3))

4) Use the SE to calculate the 95% CI on the z scale (ζ = population value of z)

z - 1.96 SE_z < ζ < z + 1.96 SE_z

5) Back-transform the CI limits to the r scale to see where the CI for ρ actually lies.
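
The steps above can be sketched in a few lines. This is an illustrative sketch only: the function name `fisher_ci` and the example values r = 0.5, n = 28 are assumptions. It uses the standard Fisher transformation z = 0.5 ln((1 + r) / (1 - r)), SE_z = sqrt(1 / (n - 3)), and the inverse transformation r = tanh(z).

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for the population correlation via Fisher's z."""
    # Transform r to z (equivalent to math.atanh(r))
    z = 0.5 * math.log((1 + r) / (1 - r))
    # Standard error of z depends only on the sample size
    se_z = math.sqrt(1 / (n - 3))
    # Interval on the z scale
    lo_z = z - z_crit * se_z
    hi_z = z + z_crit * se_z
    # Back-transform both limits to the correlation scale
    return math.tanh(lo_z), math.tanh(hi_z)

# Hypothetical example values
lo, hi = fisher_ci(r=0.5, n=28)
print(round(lo, 3), round(hi, 3))
```

Note that the back-transformed interval is not symmetric around r, which reflects the bounded, skewed sampling distribution of r.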

4
Q

Bootstrap method

A

repeatedly draw samples with replacement from the data to create a sampling distribution
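
A minimal sketch of bootstrapping a correlation coefficient, under some assumptions not stated on the card: (x, y) pairs are resampled together, 2000 resamples are drawn, and a simple percentile interval is read off the sorted bootstrap values. The names `bootstrap_r` and `pearson_r` and the data are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Sample correlation coefficient r for paired numerical data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_r(x, y, n_boot=2000, seed=0):
    """Sorted bootstrap distribution of r (pairs resampled with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    boot_rs = []
    while len(boot_rs) < n_boot:
        # Draw n row indices with replacement, keeping each (x, y) pair intact
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # r is undefined when a resample has no variation
        boot_rs.append(pearson_r(bx, by))
    return sorted(boot_rs)

# Invented data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.5, 7.2, 7.9, 8.4, 10.3]

boot_rs = bootstrap_r(x, y)
# Simple percentile interval (one of several possible bootstrap intervals)
lo = boot_rs[int(0.025 * len(boot_rs))]
hi = boot_rs[int(0.975 * len(boot_rs)) - 1]
print(lo, hi)
```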

5
Q

Hypothesis testing: Is this correlation significantly different from 0?

A

Null hypothesis H0: ρ (rho) = 0
Alternative hypothesis HA: ρ ≠ 0

Test using a t-test based on Student’s t distribution with n - 2 degrees of freedom -> n - 2 df because two summaries of the data, X-bar and Y-bar, are used to calculate r

t = r / SE_r

SE_r = sqrt((1 - r^2) / (n - 2))
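
A short sketch of the test statistic, assuming hypothetical values r = 0.5 and n = 27 (the function name `correlation_t` is also an assumption). It only computes t and the degrees of freedom; the p-value or critical value would still be looked up in a t table or obtained from a statistics library.

```python
import math

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r under H0
    t = r / se_r
    df = n - 2
    return t, df

# Hypothetical example values
t, df = correlation_t(r=0.5, n=27)
print(round(t, 3), df)
# Compare |t| with the two-tailed critical value for df = 25
# (about 2.06 at alpha = 0.05) to decide whether to reject H0.
```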

6
Q

assumptions of testing correlations

A
  • Both variables are on an interval or ratio level of measurement
  • The data must have a bivariate normal distribution, i.e. both variables are normally distributed, or approach normality after data transformation
  • Your data have no outliers
  • Your data are from a random or representative sample
  • You expect a linear relationship between the two variables
  • Homogeneity of variance (e.g. the scatter is not funnel-shaped)
7
Q

Correlation analyses for non-linear relationships

A

Use a non-parametric method (no assumptions about the sampling distribution) -> Spearman’s rank correlation (non-parametric: uses ranks, not the raw numbers)

Spearman’s rank correlation measures the strength and direction of the linear association between the ranks of 2 variables

Method:
- Rank the data for each variable separately, smallest to largest
- R = rank for X variable
- S = rank for Y variable
- Calculate r_s (the correlation coefficient between R and S)

Hypothesis testing: is the Spearman correlation significantly different from 0?

t_s = r_s / SE_rs
SE_rs = sqrt((1 - r_s^2) / (n - 2))

When n is 100 or less, use the G table

When n is > 100, use the t table
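
The method above can be sketched as follows. The data are invented, and the helper names `ranks`, `pearson_r` and `spearman` are hypothetical. Giving tied values the average of the ranks they span is an assumption of this sketch, since the card does not say how ties are handled.

```python
import math

def ranks(values):
    """Rank values from smallest to largest; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's r_s: the linear correlation between the ranks R and S."""
    return pearson_r(ranks(x), ranks(y))

# Invented data, for illustration only
x = [10, 20, 30, 40, 50, 60]
y = [1, 3, 2, 5, 4, 6]

r_s = spearman(x, y)
n = len(x)
se_rs = math.sqrt((1 - r_s ** 2) / (n - 2))
t_s = r_s / se_rs  # undefined if r_s is exactly +1 or -1 (se_rs would be 0)
print(round(r_s, 4), round(t_s, 3))
```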

8
Q

Spearman’s rank assumptions

A
  • Random sample
  • Linear relationship between ranks of numerical variables
9
Q

Measurement error

A

Measurement error = when a variable is not measured perfectly

Measurement error in x, in y, or in both makes the observed correlation weaker than the true correlation

When the sample correlation coefficient r underestimates ρ (rho) in this way, it is called attenuation. It is caused by imprecise measurement, so r often underestimates ρ
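
Attenuation can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the sample size, the noise levels, the seed, and the hypothetical helper `pearson_r`. "True" values are generated first, then independent random error is added to both variables, and the two correlations are compared.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Simulated "true" values: y depends on x plus some biological scatter
x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [a + rng.gauss(0, 0.5) for a in x_true]

# Observed values: independent measurement error added to both variables
x_obs = [a + rng.gauss(0, 1) for a in x_true]
y_obs = [b + rng.gauss(0, 1) for b in y_true]

r_true = pearson_r(x_true, y_true)
r_obs = pearson_r(x_obs, y_obs)
print(round(r_true, 2), round(r_obs, 2))  # r_obs is the weaker of the two
```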
