Lesson 2.3: Covariance and Correlation Flashcards
Relationship Between Variables
Bivariate Variables
Variables
1. Response variable (dependent)
2. Explanatory or predictor variable (independent)
Response variable
- value can be explained by the explanatory variable or predictor variable
Scatter Plot / Diagram
- a graph that shows the relationship between 2 quantitative variables measured on the same individual
- explanatory variable on the x axis, response variable on the y axis
Covariance
definition
- measures the direction of the linear relationship between two quantitative variables.
- size depends on the units of x and y: the larger the scale of the values, the larger the covariance (in absolute value)
- if all x values and/or all y values are multiplied by a constant, the covariance changes by the same factor
- doesn’t have to be between -1 and +1
- sign shows the direction (positive or negative), but the size doesn’t give a real sense of how strongly the variables are related
Cov(X,Y) = (sum of (x - mean x)(y - mean y)) / (n - 1)
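As a rough illustration of the formula above, here is a minimal Python sketch. Python is used only for illustration; the helper name sample_covariance and the data are made up for this example. It also checks the scaling property: multiplying every x value by 10 multiplies the covariance by 10.

```python
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance: sum of (x - mean x)(y - mean y), divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
cov_scaled = sample_covariance([10 * xi for xi in x], y)  # every x times 10

print(cov_xy)      # 1.5
print(cov_scaled)  # 15.0
```

The value 1.5 is positive, so the direction is positive, but the number by itself says little about strength because it depends on the units.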
Computing Covariance
Excel and R
- Excel: =COVARIANCE.S(x data range, y data range)
- R: cov(x vector, y vector)
Correlation
definition
- measures the strength and direction of the linear relationship between two quantitative variables.
- r = sample correlation
- ρ (rho) = population correlation
- can vary between -1 and +1
- doesn’t change when x and/or y are multiplied by a positive factor (a negative factor flips the sign)
r = (sum of (x - mean x)(y - mean y)) / ((n - 1)(sd of x)(sd of y))
OR
r = sample covariance / ((sd of x)(sd of y))
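The second form of the formula can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the helper name sample_correlation and the data are made up. It also checks that rescaling x by a positive factor leaves r unchanged, while a negative factor only flips its sign.

```python
from statistics import mean, stdev

def sample_correlation(x, y):
    """r = sample covariance / ((sd of x)(sd of y))."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))  # stdev is the sample sd (n - 1)

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
r_rescaled = sample_correlation([10 * xi for xi in x], y)  # positive factor
r_flipped = sample_correlation([-1 * xi for xi in x], y)   # negative factor

print(round(r, 3))           # 0.775
print(round(r_rescaled, 3))  # 0.775
print(round(r_flipped, 3))   # -0.775
```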
Correlation Values
ranges
Positive Correlation (as x increases, y tends to increase)
- r = 1: perfect
- r = 0.9: strong
- r = 0.4: moderate
Negative Correlation (as x increases, y tends to decrease)
- r = -1: perfect
- r = -0.9: strong
- r = -0.4: moderate
No correlation
- r close to 0: no linear relationship (points are scattered, or follow a nonlinear pattern such as a parabola)
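The parabolic case can be checked with a small Python sketch. The helper name sample_correlation and the data are made up for illustration. Here y is completely determined by x (y = x squared), yet r comes out as 0 because the relationship is not linear.

```python
from statistics import mean, stdev

def sample_correlation(x, y):
    """r = sample covariance / ((sd of x)(sd of y))."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Made-up data: x values symmetric around 0, y follows a parabola
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = sample_correlation(x, y)
print(abs(r) < 1e-9)  # True: no linear relationship despite a perfect pattern
```

This is why a scatter plot is worth checking alongside r: a value near 0 rules out a linear relationship, not every relationship.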
Computing Correlation
Excel and R
- Excel: =CORREL(x data range, y data range)
- R: cor(x vector, y vector)
Correlation vs Causation
- Correlation near -1 or 1 = strong linear relationship
- If 2 variables are correlated, we cannot conclude that they have a causal relationship
- Lurking variable: a third variable that explains the relationship between the two variables
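A toy Python sketch of a lurking variable, under stated assumptions: the variable z, the noise values, and the helper name sample_correlation are all made up, and z's contribution to x and y is assumed to be known exactly, which is rarely true in practice. x and y never influence each other, but both depend on z, so they still come out strongly correlated.

```python
from statistics import mean, stdev

def sample_correlation(x, y):
    """r = sample covariance / ((sd of x)(sd of y))."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical lurking variable z that drives both x and y
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]   # made-up, unrelated to noise_y
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]

x = [zi + e for zi, e in zip(z, noise_x)]      # x depends only on z
y = [2 * zi + e for zi, e in zip(z, noise_y)]  # y depends only on z

r_xy = sample_correlation(x, y)

# Subtract z's (assumed known) contribution; what remains is unrelated
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - 2 * zi for yi, zi in zip(y, z)]
r_rest = sample_correlation(x_rest, y_rest)

print(round(r_xy, 2))      # 0.9
print(abs(r_rest) < 1e-9)  # True
```

The strong r between x and y here comes entirely from z, so concluding that x causes y (or the reverse) would be a mistake.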