lecture 13 - correlation - a test of relationships Flashcards

1
Q

Behaviours: differences versus relationships

A

Differences?
* Are clinically depressed patients less depressed after taking a new drug than a placebo? (IV: discrete, DV: continuous)
* Are people more likely to order a salad at a restaurant with a green colour scheme than a blue one? (IV: discrete, DV: discrete)
* Do energy drinks result in better memory performance than coffee? (IV: discrete, DV: probably continuous)
* Does giving a significant other roses make them more likely to be happy than giving them tulips?

Relationships?
* Is the therapeutic effectiveness (continuous variable) of an antidepressant drug related to the dosage (continuous variable)?
* Does the brightness of the lights in a restaurant (continuous) relate to how fast people eat (continuous)?
* Does the amount of caffeine a person consumes in an energy drink (continuous) relate to memory performance (continuous)?
* Does the number of roses given to a significant other relate to their happiness?

2
Q

differences vs relationships

A

graphs in notes

not all that different and quite closely related

For example, say the variable on the y-axis is height (on some strange measurement scale where a typical height is “5”) and on the x-axis is foot length (again measured on some scale where a typical length is 1, i.e. one foot ;-)). I can ask the question: are height and foot size related? The first graph suggests that height might be related to foot size, and we might assess that in terms of a correlation coefficient to specify how much of a relationship there is. (We’ll cover correlation in detail in a later lecture.)

But I can, for example, take the foot size information (a continuous variable) and split it into just two categories, big feet and little feet, and plot the data as in the second graph. Then I can ask the question: is height different for people with big feet versus little feet? The second graph suggests there might be a difference. (We’ll cover assessing differences as well, for example with t-tests.)

Finally, I can take the height variable, also split it into two categories, and plot it as in the third panel. I might then ask whether the two variables are related. (This is covered by the chi-square test of contingency.)

The key point here is that relationships between things, e.g. height and foot size, also imply some kind of difference, e.g. height is different depending on foot size. And differences also imply some kind of relationship. So while we’re going to teach you various ways of assessing differences and relationships, and you’d be forgiven for thinking they’re all very different, they’re mostly related to the same underlying idea: the general linear model, which you’ll learn about more formally next year.

3
Q

Tests of relationship strength between two continuous variables: correlation

A
  • Interval or ratio data: Pearson’s r correlation coefficient
  • Ordinal data: Spearman’s rho correlation coefficient
  • Nominal variables (tests of relationships between categorical variables): chi-square test of contingency
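
As a minimal sketch (not part of the lecture; all data below are invented for illustration), each of these three tests can be run in Python with scipy.stats:

```python
# A minimal sketch (invented data) of running each test with scipy.stats.
import numpy as np
from scipy import stats

# Interval/ratio data: Pearson's r
dose = [10, 20, 30, 40, 50]                    # e.g. dosage (mg)
effect = [2.1, 3.0, 4.2, 4.8, 6.1]             # e.g. therapeutic effectiveness
r, p = stats.pearsonr(dose, effect)
print(f"Pearson's r = {r:.3f}, p = {p:.3f}")

# Ordinal data: Spearman's rho (based on ranks)
rho, p = stats.spearmanr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"Spearman's rho = {rho:.3f}, p = {p:.3f}")

# Nominal data: chi-square test of contingency on a frequency table
# (rows: green vs blue colour scheme; columns: salad ordered yes/no)
table = np.array([[30, 20],
                  [15, 35]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```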
4
Q

red line in graph

A

Note. The red line is a “regression line”, a line of best fit representing a best guess for the value of the variable on the y-axis based on the value of the variable on the x-axis.
Note. There are ways to formally calculate the exact form of a regression line from a given set of data, but you don’t need to know how to do that now. The emphasis here is on the regression line as a kind of central tendency of the data, analogous to the mean, and the variability around that line is analogous to the standard deviation.
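
For the curious, here is a minimal sketch (invented data, not from the notes) of how a line of best fit can be computed, echoing the height vs foot-length example:

```python
# A minimal sketch (invented data) of computing a least-squares
# regression line y = slope * x + intercept with numpy.
import numpy as np

foot_length = np.array([0.8, 0.9, 1.0, 1.1, 1.2])  # x-axis variable
height = np.array([4.6, 4.8, 5.0, 5.3, 5.5])       # y-axis variable

# Degree-1 polyfit returns the slope and intercept of the best-fit line
slope, intercept = np.polyfit(foot_length, height, 1)
print(f"predicted height = {slope:.2f} * foot_length + {intercept:.2f}")
```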

5
Q

correlations

A

graphs in notes

6
Q

Pearson’s r correlation coefficient

A

This asks, how strongly related are two continuous variables measuring interval (or ratio) data?
It can take any value from −1 (perfect negative relationship), through 0 (no relationship), to +1 (perfect positive relationship), including all values in between.

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y}$$
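
A minimal sketch (made-up data) of this formula in code; numpy’s cov and std with ddof=1 match the sample formulas used here:

```python
# A minimal sketch (invented data) of r as the covariance divided by
# the product of the two standard deviations.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

cov_xy = np.cov(x, y)[0, 1]                     # sample covariance (N - 1)
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(f"r = {r:.3f}")                           # same as np.corrcoef(x, y)[0, 1]
```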

7
Q

calculating covariance

A

$$\mathrm{cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$
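
A minimal sketch implementing this formula directly (the data in the last line are invented for illustration):

```python
# A minimal sketch of the sample covariance formula: the sum of
# cross-products of deviations from the means, divided by N - 1.
def covariance(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

print(covariance([1, 2, 3, 4, 5], [2.0, 1.5, 3.5, 3.0, 5.0]))  # 1.875
```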

8
Q

the correlation coefficient r

A

df = n − 2
Look up this df in a table of critical values at the 0.05 significance level to find the critical value of the correlation coefficient.

Alternatively, look at the SPSS output to see what the two-tailed p value is; if it is over 0.05, the relationship is not significant.
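
A minimal sketch (invented data) of this significance check in Python; scipy’s pearsonr returns the two-tailed p value that SPSS would report:

```python
# A minimal sketch (invented data) of testing r for significance.
from scipy import stats

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 4, 4, 6, 7, 9, 10]

r, p = stats.pearsonr(x, y)
df = len(x) - 2                    # df = n - 2 for a correlation test
print(f"r = {r:.3f}, df = {df}, two-tailed p = {p:.4f}")
print("significant" if p <= 0.05 else "not significant", "at the 0.05 level")
```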

9
Q

look at screenshot in notes

A
10
Q

standardisation

A

To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units. This process is known as standardization. We need a unit of measurement into which any variable can be converted, and typically we use the standard deviation. We came across this measure in Section 1.8.5 and saw that, like the variance, it is a measure of the average deviation from the mean. If we divide any distance from the mean by the standard deviation, it gives us that distance in standard deviation units. For example, for the data in Table 8.1, the standard deviation for the number of packets bought is approximately 3.0 (the exact value is 2.92). In Figure 8.2 we can see that the observed value for participant 1 was 3 packets less than the mean (so there was an error of −3 packets of sweets). If we divide this deviation, −3, by the standard deviation, which is approximately 3, then we get a value of −1. This tells us that the difference between participant 1’s score and the mean was −1 standard deviation. In this way we can express the deviation from the mean for a participant in standard units by dividing the observed deviation by the standard deviation.

It follows from this logic that if we want to express the covariance in a standard unit of measurement we can divide by the standard deviation. However, there are two variables and, hence, two standard deviations. When we calculate the covariance we calculate two deviations (one for each variable) and multiply them. We do the same for the standard deviations: we multiply them and divide the covariance by the product of this multiplication. The standardized covariance is known as a correlation coefficient and is defined as follows:

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y}$$

in which $s_x$ is the standard deviation of the first variable and $s_y$ is the standard deviation of the second variable (all other letters are the same as in the equation defining covariance). This coefficient, the Pearson product-moment correlation coefficient or Pearson’s correlation coefficient, r, was invented by Karl Pearson, with Florence Nightingale David² doing a lot of the hard maths to derive distributions for it (see Figure 8.3 and Jane Superbrain Box 8.1).³ If we look back at Table 8.1 we see that the standard deviation for the number of adverts watched ($s_x$) was 1.673, and for the number of packets of crisps bought ($s_y$) was 2.915. If we multiply these together we get 1.673 × 2.915 = 4.877. Now all we need to do is take the covariance, which we calculated a few pages ago as being 4.250, and divide by these multiplied standard deviations. This gives us r = 4.250/4.877 = 0.871.

² Not to be confused with the Florence Nightingale in Chapter 5, after whom she was named.

³ Pearson’s product-moment correlation coefficient is denoted by r but, just to confuse us, when we square r (as in Section 8.4.2.2) an upper-case R is typically used.

By standardizing the covariance we end up with a value that has to lie between −1 and +1 (if you find a correlation coefficient less than −1 or more than +1 you can be sure that something has gone hideously wrong). We saw in Section 3.7.2 that a coefficient of +1 indicates that the two variables are perfectly positively correlated: as one variable increases, the other increases by a proportionate amount. This does not mean that the change in one variable causes the other to change, only that their changes coincide (Misconception Mutt 8.1). Conversely, a coefficient of −1 indicates a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount. A coefficient of zero indicates no linear relationship at all, so as one variable changes, the other stays the same. We also saw that because the correlation coefficient is a standardized measure of an observed effect, it is a commonly used effect size measure, and that values of ±0.1 represent a small effect, ±0.3 a medium effect and ±0.5 a large effect (although we should interpret the effect size within the context of the research literature and not use these canned effect sizes).
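
As a quick check, the arithmetic from the worked example above can be reproduced from the quoted summary statistics (the raw Table 8.1 data are not reproduced in these notes):

```python
# Reproducing the arithmetic quoted in the excerpt above from its
# summary statistics.
s_x = 1.673      # SD of adverts watched
s_y = 2.915      # SD of packets bought
cov_xy = 4.250   # covariance calculated earlier in the text

print(f"s_x * s_y = {s_x * s_y:.3f}")        # 4.877
print(f"r = {cov_xy / (s_x * s_y):.3f}")     # 0.871
```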
