STATS 15- correlation and regression Flashcards

1
Q

Correlation research design

A
  • Experiments may be impractical or unethical for some research questions
  • “Does cholesterol affect the probability of heart disease?”
  • “Does smoking shorten people’s life expectancy?”
  • But we can look for relationships between such variables
2
Q

Relationship between 2 variables

A
  • Is there a relationship between IQ and exam marks?
  • Bivariate data: for each participant, two different variables are measured, and we look for a relationship between them (c.f. within-subject design)
3
Q

Start with a scatter plot

A
  • Height and intelligence
  • Data suggests no relationship between height & intelligence
4
Q

Strong positive correlation, r = +1 (perfect)

A
  • strong positive correlation
5
Q

Strong NEGATIVE correlation, r = -1 (perfect)

A
6
Q

No correlation- Height and intelligence

A
7
Q

Non-linear correlation

A
  • NB: a correlation is not about how steep the slope is but about how much the values vary around the line (how well the values fit it)
    • r = ±1: perfect fit
  • Strong positive 0-8
  • Strong negative 8-14
8
Q

Correlations: Hypothesis testing

A
  • Null hypothesis
  • No relationship between variables X and Y beyond that expected by chance alone
  • NB: a correlation is not about how steep the slope is, but about how much the values vary around the line
9
Q

Measuring the degree of relationship

A
  • Pearson product-moment correlation coefficient (r): how well does a straight line fit the data?
  • r = -1 (perfect negative relationship): Y decreases as X increases
  • r = +1 (perfect positive relationship): Y increases as X increases
  • r = 0 (no linear relationship)
  • r is affected by outliers and by the number of pairs of data
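
As a rough illustration of what r measures, the sketch below computes it straight from the definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data are made up for this example. Note that the shallow line and the steep line both give a perfect fit, since r reflects scatter around the line rather than its slope.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                        # illustrative data only
print(pearson_r(x, [2, 4, 6, 8, 10]))      # 1.0: points on a rising line
print(pearson_r(x, [10, 8, 6, 4, 2]))      # -1.0: points on a falling line
print(pearson_r(x, [20, 40, 60, 80, 100])) # steeper line, still a perfect fit
```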
10
Q

Pearson’s r

A
  • Assumptions
  • Linear relationship between X and Y
  • Continuous random variables
  • Both variables must be normally distributed
  • Each pair of observations must be independent of the other pairs
11
Q

Measuring the degree of the relationship

A
  • Pearson’s r
    • The + or - sign indicates the direction of the relationship
  • The value indicates strength. Values close to either -1 or +1 reflect a strong correlation; values close to 0 mean weak or no correlation
    • r = -0.65: quite strong negative correlation
    • r = +0.65: equally strong positive correlation
    • The p value indicates significance
12
Q

Some real data

A
  • Positive but not very strong relationship
  • With quite a lot of variability
13
Q

And if we collect more data

A


14
Q

The null hypothesis (r=0)

A
  • If there really is no correlation (r = 0), what is the chance of getting a value of r at least as extreme as ours by chance alone?
  • If that chance is small (p < 0.05), we reject the null hypothesis
15
Q

What it all means

A
  • r = +0.701; p < 0.001 (N = 30)
    • 0.701 => how close the points are to a straight line
  • p < 0.001 => how likely it is that we would get an r value this large by chance if the true correlation coefficient were actually zero (no correlation)
  • N = 30 => how many pairs of points there are
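
One way to make the meaning of p concrete is a permutation test: shuffle Y so that any real pairing with X is destroyed, recompute r, and see how often chance alone produces an r as extreme as the observed one. This is a swapped-in illustration, not the test the card's p value came from, and the helper names, the simulated data and the number of shuffles are all assumptions made for this sketch.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Two-sided p: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between X and Y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles

# Simulated data (hypothetical): 30 pairs with a built-in positive trend plus noise
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(30)]
y = [xi + data_rng.gauss(0, 1) for xi in x]
print("r =", pearson_r(x, y), "p =", permutation_p_value(x, y))
```

A small p here means that shuffled (unrelated) data rarely produce an r as large as the observed one, which is the sense in which the null hypothesis is rejected.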
16
Q

Interpret correlations cautiously

A
  • Correlation does NOT imply causality
  • Correlations are affected by range restrictions
    • E.g. Height… you don’t get many people above 7ft
  • Correlations are affected by outliers
  • Correlation only measures the degree of LINEAR relationships
    • Plot the graph to see if linear
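
Two of these cautions can be checked with a few numbers. The sketch below uses made-up data and a hand-written `pearson_r` helper (both are assumptions for illustration): a perfect U-shaped relationship gives r = 0 because r only measures linear association, and a single extreme point turns an r of about zero into a very high one.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Linear only: y depends perfectly on x, but along a curve
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
print(pearson_r(x_curve, y_curve))  # 0.0 despite the perfect U-shaped relationship

# Outliers: five unrelated points, then the same points plus one extreme pair
x_base = [1, 2, 3, 4, 5]
y_base = [2, 3, 1, 3, 2]
r_without = pearson_r(x_base, y_base)
r_with = pearson_r(x_base + [20], y_base + [20])
print(r_without, r_with)  # about zero without the outlier, close to 1 with it
```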
17
Q

Correlation and causation

A
  • Ice cream sales and the number of shark attacks on swimmers are correlated
  • The number of cavities in primary school children and their vocabulary size have a strong positive correlation
    • Can’t say vocabulary causes cavities (both probably increase with age)
  • The more TVs per citizen, the longer the average life expectancy of a country
  • Patients operated on by surgeons with cleaner hands live longer
18
Q

Non-parametric alternatives

A
  • Spearman’s Rho
    • Spearman’s rank correlation coefficient
  • Kendall’s Tau
    • Cross tabulation, c.f. chi-squared
  • Fewer assumptions, robust to outliers, but also less sensitive
19
Q

Advantages of non-parametric correlation (ranking) I

A
  • Ranking can turn a non-linear (but consistently rising or falling) relationship into a linear one, which allows us to perform a linear statistical test on the data
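
The effect of ranking can be seen on a small made-up example. In this sketch (the helper names and data are assumptions, and the `ranks` helper assumes no tied values), y rises exponentially with x, so a straight line fits the raw values poorly, yet the ranks of x and y fall exactly on a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]          # always rising, but along a curve
print(pearson_r(x, y))                # below 1: a straight line fits the raw values poorly
print(pearson_r(ranks(x), ranks(y)))  # 1.0: the ranks lie on a straight line
```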
20
Q

Advantages II

A
  • Less sensitive to outliers
  • ranking can distribute the data more evenly
21
Q

Spearman’s Rho (ρ, rs)

A
  • Tests for a relationship between the ranks of 2 variables
    • So put paired variables in tables and rank each one
    • Compare differences in ranks
  • E.g. Is there a relationship between age and shoe size?
22
Q

Spearman’s Rho- Formula

A
  • d = difference in rank
  • N = number of participants
  • Σ = sum of
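
The formula itself is missing from this card. The usual form when there are no tied ranks, which matches the symbols listed above, is rs = 1 - 6Σd² / (N(N² - 1)). The sketch below applies it to hypothetical ranks for five participants; these are not the card's own data and the function name is made up for illustration.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from two lists of ranks, assuming no tied ranks."""
    n = len(ranks_x)
    # d = difference in rank for each participant; sum the squared differences
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical ranks: only the two oldest participants swap places on shoe size
age_ranks = [1, 2, 3, 4, 5]
shoe_ranks = [1, 2, 3, 5, 4]
print(spearman_rho(age_ranks, shoe_ranks))  # 0.9, since the sum of d squared is 2 and N is 5
```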
23
Q

Example- Positive correlation

A
24
Q

Reporting the result

A
  • The Spearman’s Rho test was applied to the data and a significant positive correlation was found between age and shoe size
  • (rs= 0.9, n=5, p<0.05)
  • As shoe size increased, age increased
  • No implication of causality
25
Q

Another rank correlation: Suitable for contingency table data

A
  • A contingency table relates one variable to another
  • E.g. age of students vs. classification of the degree they got
26
Q

Why not use chi-squared

A
  • Because three cells have fewer than 5 cases
    • See nonparametric testing lectures
  • Can use Kendall’s Tau
  • A method for measuring the association between variables in cross tabulations
27
Q

How does Tau work

A
  • If there were a positive correlation between age and classification
  • Youngest = 1st / oldest = 3rd
  • We would expect most of the data to fall between the tram lines if it were positively correlated
28
Q

How does it work

A
  • We would expect most of the data to fall between the tram lines if it were negatively correlated
  • This doesn’t happen
  • If there were a positive correlation between age and classification
    • Youngest = 1st / oldest = 3rd

29
Q

How does it work

A
  • Essentially, Kendall’s Tau is a statistic which for each cell compares the number of cases below and to the right of the cell with those above and to the right
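
The same idea can be sketched on raw paired scores rather than table cells: every pair of cases is either concordant (both variables order the two cases the same way) or discordant (they order them in opposite ways), and tau compares the two counts. The code below is the simplest variant (often called tau-a) with no correction for ties. Contingency-table data contain many ties, so a tie-corrected variant would be needed there. The function name and data are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables order this pair in opposite ways
        # pairs tied on either variable count as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores: 9 of the 10 pairs are concordant, 1 is discordant
print(kendall_tau_a([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]))  # 0.8
```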
30
Q

How does it work

A
  • There is not a similar balance in the data meaning it is not correlated
31
Q

Reporting the result

A
  • Tau tells us both the size and direction (like Spearman’s Rho) of the correlation
    • Computers can also compute the significance value for the sample size used
  • E.g. A Kendall’s Tau test for ordered contingency tables suggested no significant relationship between age and degree class (tau = -0.44, N = 180, p > 0.05)
32
Q

Reporting correlations

A
  • Describe data (Including scatterplot)
  • Describe relationship in words
  • Quote N: with a large N we are more likely to detect a real correlation (or a real lack of one)
  • Quote the coefficient (with its p value), which shows how significant the result is
33
Q

Summary of correlations

A
  • Pearson’s product-moment: r
  • Spearman’s Rho: rs, suitable for non-parametric data
  • Kendall’s Tau: τ, for data represented in a contingency table
  • When to use each
    • What is the research design?
    • Are the data normally distributed?