unit 3 - ch 13 - correlation Flashcards
It’s all about relationships:
x - y
Correlation coefficient: terms
X-variable: independent variable, predictor variable
Y-variable: dependent variable, criterion variable, variable of interest
ANOVA:
2 variables ~ 1 nominal (factor), 1 at least interval (criterion variable)
Correlation:
2 variables ~ both variables at least interval
Notation:
Sample = r
Population = ρ (rho)
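r =
Correlation coefficient formula
Sample correlation coefficient
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
X variable: x - x bar (deviation of each x from the mean of x)
Y variable: y - y bar (deviation of each y from the mean of y)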
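A minimal Python sketch of the deviation-score calculation, assuming the standard Pearson definition of r. The function name pearson_r and the numbers are made up for illustration; they are not from the notes.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient built from deviation scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]          # x - x bar
    dy = [y - y_bar for y in ys]          # y - y bar
    shared = sum(a * b for a, b in zip(dx, dy))
    total = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if total == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return shared / total

# Made-up data: y rises (or falls) in lockstep with x
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # perfect positive relationship
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # perfect negative relationship
print(r_up, r_down)
```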
As the shared variation between the X variable and the Y variable increases, r approaches its upper limit (+1) or its lower limit (-1)
+1 = perfect positive relationship
-1 = perfect negative relationship
0 = no linear relationship
r is a measure of both
The strength of the relationship and
The direction of the X - Y relationship
r has no unit of measurement
= unitless
r is not affected by the scale of the data
r values can be compared to each other
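A short Python sketch of why r is unitless: converting one variable to a different unit leaves r unchanged. The gas price and mileage numbers are hypothetical, and pearson_r is a helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviation scores."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    shared = sum(a * b for a, b in zip(dx, dy))
    return shared / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical data: gas price (dollars per gallon) and miles driven per day
gas_price = [2.5, 2.8, 3.1, 3.6, 3.9]
miles = [41.0, 33.0, 20.0, 12.0, 5.0]

# Same distances expressed in kilometers instead of miles
kilometers = [m * 1.609344 for m in miles]

r_miles = pearson_r(gas_price, miles)
r_km = pearson_r(gas_price, kilometers)
print(math.isclose(r_miles, r_km))  # the change of scale does not change r
```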
example of x and y
X variable: gas prices
Y variable: miles driven
correlation coefficient examples
EX 1
Income y axis
Height x axis
Positive correlation
Taller people make more money on average ):
EX 2
Customer satisfaction y axis
Difficulty in product setup x axis
Negative
Ikea
EX 3
GPA y axis
Hat size x axis
No correlation
EX 4
Control y axis
Speed x axis
Negative
Not all relationships are linear
Exponential, linear etc
EX 5
Performance y axis
Emotional involvement (stress) x axis
Curve (upside down U)
Two ends that are low
High end/peak
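A Python sketch of why a curved relationship can hide from r. The stress and performance scores below are invented so that they form a symmetric upside-down U; pearson_r is a helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviation scores."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    shared = sum(a * b for a, b in zip(dx, dy))
    return shared / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Invented data: performance is low at both ends and peaks at moderate stress
stress = list(range(11))                           # 0, 1, ..., 10
performance = [25 - (s - 5) ** 2 for s in stress]  # upside-down U, peak at 5

r = pearson_r(stress, performance)
print(abs(r) < 1e-9)  # r is essentially zero despite a strong curved pattern
```

The two variables are clearly related, but r only measures the linear part of a relationship, so it misses the curve.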
Step 6 of hypothesis testing (HT)
EX: Are married men living longer or dying slower? Why?
EX 1
Data:
Alcohol content and calories for 10 beers
Calculating
X = alcohol content
Y = calories
r = 0.957
Testing
Step 1
4 facts of the null = everything is unrelated
H0: ρ = 0.00
H1: ρ ≠ 0.00
Step 2
α = 0.01
Step 3
TS = (observed - expected) / chance
TS = (r - ρ) / standard error of the correlation coefficient
TS = 9.97
ρ = 0.00 (from H0)
Step 4
df = n - 2
df = 9
CV = +/- 2.62
Step 5
9.79 > 2.62 = reject
TS > CV = reject
Step 6
As the alcohol content of beer increases, the calories also increase. That is not to say alcohol causes calories; both are results of the beer-making process. The conversion of sugar into alcohol during fermentation results in both alcohol and calories. It is not a perfect correlation because the carbohydrates in beer also contain calories.
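The test steps above can be sketched in Python. This assumes the usual standard error for r when testing ρ = 0, sqrt((1 - r^2) / (n - 2)), which the notes do not spell out. The function names and the input values (r = 0.80, n = 12) are hypothetical and are not the beer data; the critical value 3.169 is the two-tailed t table value for df = 10 at α = 0.01.

```python
import math

def correlation_t(r, n, rho0=0.0):
    """Return (TS, df) for TS = (r - rho0) / standard error of r."""
    if n <= 2:
        raise ValueError("need more than 2 data points")
    if abs(r) >= 1:
        raise ValueError("standard error is zero when r is exactly +1 or -1")
    df = n - 2
    se = math.sqrt((1 - r ** 2) / df)   # assumed standard error of r
    return (r - rho0) / se, df

def decide(ts, cv):
    """Step 5: reject H0 when the test statistic exceeds the critical value."""
    return "reject H0" if abs(ts) > abs(cv) else "fail to reject H0"

# Hypothetical inputs, not taken from the notes
ts, df = correlation_t(r=0.80, n=12)
decision = decide(ts, cv=3.169)          # t table: df = 10, alpha = 0.01, two-tailed
print(round(ts, 2), df, decision)
```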
Correlation vs. causation
r = 0.957
r increases =/= causation
High r does not mean x is causing y
X variable: length of our left arm
Y variable: length of our right arm
cautionary tales
- sample size
- relationship change
- correlation is not causation
- not all relationships are linear
cautionary notes: sample size
At least 10 data points for each x-variable and 10 for the y-variable
There can be multiple x-variables but only one y-variable
EXAMPLE
X = age of car
X = odometer miles
Y = selling price
10 points per x and 10 per Y = 30 data points
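The counting rule in this example can be written as a one-line Python calculation. The function name min_data_points is made up for illustration, and the rule of 10 per variable is the one stated in these notes.

```python
def min_data_points(num_x_vars, per_variable=10):
    """Minimum data points: 10 per x-variable plus 10 for the single y-variable."""
    if num_x_vars < 1:
        raise ValueError("need at least one x-variable")
    return per_variable * (num_x_vars + 1)

# Two x-variables (age of car, odometer miles) and one y-variable (selling price)
needed = min_data_points(2)
print(needed)  # 30
```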
cautionary notes: relationships change
Over time
Outside the range of data
Don’t apply a relationship found in a sample of younger people to a sample of older people
Across space
Geographical
A model dropped into a new space sometimes doesn’t hold up (American customers vs. Spanish customers)