lecture 14 - correlation - additional considerations and ordinal data Flashcards

Question 1

Q

Pearson’s r correlation coefficient

Answer

A

This asks: how strongly are two continuous variables, measured on an interval (or ratio) scale, linearly related?
Can take values from:
-1 (perfect negative relationship),
through 0 (no linear relationship),
to +1 (perfect positive relationship),
including all values between -1 and +1.

r = COVxy / (Sx × Sy)
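
A minimal Python sketch of this formula, using only the standard library. The function name `pearson_r` and the `hours`/`marks` scores are made up for illustration; they are not from the lecture.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: sample covariance divided by the product of the sample SDs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance: sum of cross-products of deviations over N - 1
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # stdev() is the sample standard deviation (it also divides by N - 1)
    return cov_xy / (stdev(xs) * stdev(ys))

# Hypothetical scores, for illustration only
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, marks), 3))
```

A perfectly increasing straight line such as `pearson_r([1, 2, 3], [2, 4, 6])` should come out at (or within rounding error of) +1, and a perfectly decreasing one at -1.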

Question 2

Q

understanding covariance - scatterplots

Answer

A

COVxy = ∑(X - X̄)(Y - Ȳ) / (N - 1)

X̄ and Ȳ are the means of each variable
graphs in notes

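top right quadrant = positive products
X - X̄ is positive (across, right of the mean) and Y - Ȳ is positive (up, above the mean)

bottom left quadrant = positive products
X - X̄ is negative (left of the mean) and Y - Ȳ is negative (down, below the mean)

top left quadrant = negative products
X - X̄ is negative (left of the mean) and Y - Ȳ is positive (up)

bottom right quadrant = negative products
X - X̄ is positive (right of the mean) and Y - Ȳ is negative (down)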
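
The quadrant signs can be checked with a small sketch. The four points below are invented so that one falls in each quadrant around the means (both means are 3); `cross_products` is a hypothetical helper name.

```python
from statistics import mean

def cross_products(xs, ys):
    """Return (X - mean of X) * (Y - mean of Y) for every point."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# One made-up point per quadrant: bottom left, top left, bottom right, top right
xs = [1, 2, 4, 5]
ys = [1, 5, 2, 4]

products = cross_products(xs, ys)
cov_xy = sum(products) / (len(xs) - 1)

for x, y, p in zip(xs, ys, products):
    print(f"point ({x}, {y}): product = {p}")
print("COVxy =", cov_xy)
```

Points in the top right and bottom left push the covariance up, and points in the other two quadrants pull it down, so the sign of COVxy reflects which pair of quadrants dominates.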

Question 3

Q

normalising the covariance

Answer

A

r = COVxy / (Sx × Sy)
Other things being equal, the larger Sx, the larger COVxy.
Other things being equal, the larger Sy, the larger COVxy.

COVxy = ∑(X - X̄)(Y - Ȳ) / (N - 1)

Sy = √( ∑(Y - Ȳ)² / (N - 1) ), and likewise for Sx
So if we divide COVxy by both Sx and Sy,
we ‘normalise’ its value.
After normalisation, its value cannot exceed +1
and cannot be smaller than -1.
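
A short sketch of why normalisation matters, assuming made-up height and weight values: re-expressing height in centimetres instead of metres multiplies Sx and COVxy by 100, but leaves r unchanged.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance with N - 1 in the denominator."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance normalised by both sample standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical data, for illustration only
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52, 61, 64, 75, 83]
height_cm = [h * 100 for h in height_m]  # same heights, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print("covariance (m):", round(cov_m, 3), " covariance (cm):", round(cov_cm, 3))
print("r (m):", round(r_m, 4), " r (cm):", round(r_cm, 4))
```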

Question 4

Q

Relationship effect sizes

Answer

A

Pearson’s r is already an effect size!
Cohen’s rules of thumb for the effect size of r:
r = 0.1 small effect size
r = 0.3 medium effect size
r = 0.5 large effect size.
More nuanced research conclusion: the correlation between tulip and rose ratings was very large and positive, and was marginally significantly different from zero, r(4) = 0.803, p = 0.054, two-tailed.
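
The reported p value can be roughly reproduced with the standard conversion t = r × √(df / (1 - r²)), where df = N - 2. The sketch below is one way to do it with the standard library only: it integrates the t density numerically with Simpson's rule rather than using a statistics package. The function names are hypothetical, and treating Cohen's values as cut-offs in `cohen_label` is an assumption about how the rules of thumb are applied.

```python
import math

def t_from_r(r, df):
    """Convert a correlation to a t statistic with df = N - 2."""
    return r * math.sqrt(df / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus twice the area between 0 and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def cohen_label(r):
    """Label |r| using Cohen's values as cut-offs (an assumed reading)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"

r, df = 0.803, 4
t = t_from_r(r, df)
print(cohen_label(r), round(t, 3), round(two_tailed_p(t, df), 3))
```

With r = 0.803 and df = 4 this should land close to the p = 0.054 quoted on the card.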

Question 5

Q

Assumptions for Pearson’s r correlation coefficient - a parametric test

Answer

A

Random and independent samples
Normality (of the residuals….) with interval or ratio data
This normality assumption is more complicated here, for reasons you’ll understand better later when you learn about the General Linear Model.
For now, checking whether the distributions of the variables (histograms) are reasonably normal is still a reasonable thing to do…
The variables have a linear (straight line) relationship
An additional worry: Look for outliers in the scatterplot and histograms….
Because Pearson’s r is parametric (involves means, etc.), outliers can have a strong influence on the outcome of the statistical test.
Practical considerations: it’s hard to assess normality with this little data, and what if the data is only ordinal and not interval?

Question 6

Q

wavy lines

Answer

A

Pearson’s r correlation coefficient assumes a linear (straight line) relationship between the variables.
BUT, there are many other possible relationships between the variables that aren’t linear.
You can see this if you’ve plotted a scatterplot :)
but if you’ve only calculated r you can’t :(
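
A small sketch of how r can hide a strong non-linear relationship. The data are invented: Y is exactly X squared, so Y is completely determined by X, yet the U shape is symmetric and the positive and negative cross-products cancel.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r from the sample covariance and sample SDs."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov_xy / (stdev(xs) * stdev(ys))

# Made-up U-shaped data: a perfect, but not linear, relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("r =", r)  # essentially zero despite the perfect curve
```

Only a scatterplot would reveal the curve here; the value of r alone suggests no relationship at all.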

Question 7

Q

Epidemics initially grow exponentially

Answer

A

However, in the longer term the disease dies out again, so the relationship over time is not a straight line.

Question 8

Q

What if the data is ordinal rather than interval? Spearman’s rho

Answer

A

Rank each variable separately.
Calculate the means and SDs of the ranks.
Work out COVxy of the ranks.
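
The steps above can be sketched as follows. Giving tied values the average of their ranks is a common convention but an assumption here, since the card does not say how ties are handled; the helper names and the cubic example data are also made up.

```python
from statistics import mean, stdev

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov_xy / (stdev(xs) * stdev(ys))

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: Y always increases with X, but not in a straight line
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print("rho =", spearman_rho(xs, ys), " r =", round(pearson_r(xs, ys), 3))
```

Because the ranks of X and Y agree exactly, rho is 1 even though r on the raw scores is below 1.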

Question 9

Q

Spearman’s rho for ordinal data

Answer

A

rho = COVxy / (Sx × Sy), calculated on the ranks
With rho, uncertainty about whether the data is interval or ordinal doesn’t matter in practice.

Question 10

Q

Note. We haven’t given you a table of critical values for Spearman’s rho…..

Practically when should you consider using Spearman’s rho?

Answer

A

When…
… you aren’t sure whether the data is interval or ordinal….
… when you’re unsure if the relationship is linear….
… when you’re uncertain about the normality assumption for Pearson’s r….
… when the data potentially has outliers….
There’s little downside to looking at rho almost any time you look at r….
If they’re similar….. Good :)
If they’re not …. think hard…. Talk to Field, etc.
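
A sketch of the kind of disagreement worth thinking hard about, using invented data in which five points trend downwards and one extreme point sits far from the rest. The simple ranking helper assumes there are no tied values.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov_xy / (stdev(xs) * stdev(ys))

def simple_ranks(values):
    """Rank from 1 (smallest). Assumes no ties, which holds for the data below."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Pearson's r applied to the ranks."""
    return pearson_r(simple_ranks(xs), simple_ranks(ys))

# Made-up data: a downward trend plus one outlier at (20, 20)
xs = [1, 2, 3, 4, 5, 20]
ys = [5, 4, 3, 2, 1, 20]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print("r =", round(r, 3), " rho =", round(rho, 3))
```

Here the single outlier drags r strongly positive, while rho, which only sees the ordering, stays slightly negative. A gap like this is a cue to go back to the scatterplot.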

(10 cards)