STATS 15 - Correlation and Regression Flashcards
1
Q
Correlation research design
A
- Experiments may be impractical or unethical for some research questions
- “Does cholesterol affect the probability of heart disease?”
- “Does smoking shorten people’s life expectancy?”
- But we can look for relationships between such variables
2
Q
Relationship between 2 variables
A
- Is there a relationship between IQ and exam marks?
- Bivariate data: two different variables are measured for each participant, and we look for any relationship between them (cf. within-subject design)

3
Q
Start with a scatter plot
A
- Height and intelligence
- Data suggests no relationship between height & intelligence

4
Q
Perfect positive correlation, r = +1
A
- Perfect positive correlation: every point lies exactly on a straight line sloping upwards

5
Q
Perfect negative correlation, r = -1
A

6
Q
No correlation: height and intelligence
A

7
Q
Non-linear correlation
A
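- NB: correlation is not about how steep the slope is, but about how tightly the values cluster around the line (how well the values fit it)
- r = ±1 means a perfect fit
- Strong positive relationship over the range 0–8
- Strong negative relationship over the range 8–14
- Pearson's r only measures linear relationships, so a curved relationship like this can give r close to 0 even though the variables are strongly related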
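Both points on this card, that r measures fit rather than steepness and that a curved relationship can hide from r, can be checked numerically. Below is a minimal sketch in Python using only the standard library; `pearson_r` is a hypothetical helper written from the usual definitional formula, and the data are made up for illustration (they are not the data from the card).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(16))  # x = 0, 1, ..., 15 (made-up values)

# Two perfect straight lines with very different slopes: r = 1 for both
steep = [10 * x for x in xs]
shallow = [0.1 * x for x in xs]
print(round(pearson_r(xs, steep), 6))    # 1.0
print(round(pearson_r(xs, shallow), 6))  # 1.0

# An inverted U: y rises up to x = 7.5, then falls symmetrically
curve = [-(x - 7.5) ** 2 for x in xs]
print(pearson_r(xs, curve))              # 0.0, despite a perfect (curved) relationship
```

The steepness of the line makes no difference to r, while the inverted U, where y is completely determined by x, still gives r = 0 because the rising and falling halves cancel out.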

8
Q
Correlations: Hypotheses testing
A
- Null hypothesis
- No relationship between variables X and Y above that expected by chance alone
- NB: Correlations are not about how steep any slope is, but about the variation of values around the slope
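The idea of "no relationship above that expected by chance alone" can be made concrete with a permutation test. This is a swapped-in technique for illustration, not necessarily the method used in the course (which relies on the p value reported for Pearson's r). The sketch below uses only the Python standard library; `pearson_r` and `permutation_p` are hypothetical helpers, and the data are made up.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(xs, ys, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives an |r| at least as large as the observed one.

    Shuffling ys breaks any real pairing with xs, so the shuffled r values
    show what happens when the null hypothesis (no relationship) is true.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts treats the observed pairing as one of the arrangements
    return (hits + 1) / (n_shuffles + 1)


# Made-up data with a clear upward trend
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
print(permutation_p(xs, ys))
```

A small result means that random, unrelated pairings of the same values rarely produce a correlation as strong as the one observed.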
9
Q
Measuring the degree of relationship
A
- Pearson product-moment correlation coefficient (r): how well a straight line fits the data
- r = -1 (perfect negative relationship): Y decreases as X increases
- r = +1 (perfect positive relationship): Y increases as X increases
- r = 0 (no linear relationship)
- r is affected by outliers and by the number of pairs of data
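The Pearson coefficient can be computed directly from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch (standard library only); `pearson_r` is a hypothetical helper, and the data are made up to show the effect of a single outlier.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment r: co-variation of x and y relative to their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: 10 pairs lying exactly on a falling line
xs = list(range(1, 11))
ys = [20 - 2 * x for x in xs]
print(pearson_r(xs, ys))          # -1.0

# The same data plus one outlier far above the trend
xs_out = xs + [11]
ys_out = ys + [40]
print(pearson_r(xs_out, ys_out))  # about -0.027
```

On these made-up numbers, one extra point moves r from a perfect -1 to almost 0, which is why the scatter plot should be checked for outliers before r is interpreted.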
10
Q
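Pearson's r
A
- Assumptions:
- Linear relationship between X and Y
- Continuous random variables
- Both variables must be normally distributed
- The pairs of observations must be independent of each other (X and Y themselves need not be; whether they are related is what is being tested)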
11
Q
Measuring the degree of the relationship
A
- Pearson's r: the + or - sign indicates the direction of the relationship
- The value indicates the strength. Values closer to -1 or +1 reflect a strong correlation; values close to 0 mean a weak or no correlation
- r = -0.65: quite strong negative correlation
- r = +0.65: equally strong positive correlation
- The p value indicates statistical significance
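The sign and the size of r carry separate information. This small sketch (Python standard library only, made-up data, hypothetical `pearson_r` helper) mirrors Y so that the relationship runs the other way: the sign of r flips, but its size, and so the strength, stays the same.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data with a moderate positive relationship
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 1, 4, 2, 6, 4, 7, 5]
r_pos = pearson_r(xs, ys)
# Multiplying y by -1 reverses the direction of the relationship
r_neg = pearson_r(xs, [-y for y in ys])
print(round(r_pos, 3), round(r_neg, 3))  # 0.7 -0.7
```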
12
Q
Some real data
A
- Positive but not very strong relationship
- With quite a lot of variability

13
Q
And if we collect more data?
A

14
Q
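The null hypothesis (r = 0)
A
- Suppose there really is no correlation in the population (r = 0)
- How likely is it that we would get a value of r at least this far from 0 by chance alone?
- If that probability is small (p < 0.05), we reject the null hypothesis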
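A common way to obtain the p value for Pearson's r (a standard result, not stated on these cards) is to convert r to a t statistic with N - 2 degrees of freedom and compare it with Student's t distribution. The worked line uses the values from card 15.

```latex
t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}, \qquad df = N - 2

% Worked example: r = 0.701, N = 30
t = \frac{0.701\sqrt{28}}{\sqrt{1-0.701^{2}}} \approx \frac{3.709}{0.713} \approx 5.20, \qquad df = 28
```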

15
Q
What it all means
A
- r = +0.701; p < 0.001 (N = 30)
- 0.701 => how close the points are to a straight line
- p < 0.001 => if the true correlation coefficient were zero (no correlation), how likely it is that we would get an r value this large by chance
- N = 30 => how many pairs of data points there are
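The numbers on this card can be reproduced approximately with a short script. This is a minimal sketch, assuming the usual t test for a correlation coefficient (t = r * sqrt((N - 2) / (1 - r^2)) on N - 2 degrees of freedom); `t_from_r`, `t_pdf` and `two_tailed_p` are hypothetical helpers, and the p value comes from numerically integrating the t density with Simpson's rule rather than from a statistics package, so treat it as approximate.

```python
import math


def t_from_r(r, n):
    """t statistic for testing H0: population correlation = 0, on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area_0_to_t = total * h / 3
    # The t density is symmetric, so P(|T| < |t|) = 2 * area from 0 to |t|
    return 1 - 2 * area_0_to_t


r, n = 0.701, 30
t = t_from_r(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.2f}, df = {n - 2}, p = {p:.6f}")
```

The printed p comes out well below 0.001, in line with the card's "p < 0.001".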