Topic 8: Correlation And Linear Regression Flashcards
Correlation features (2)
Strength of linear relationship between only 2 variables.
Does not imply causation. E.g., ice cream sales and sunburn are correlated, but neither causes the other; the weather causes both!
Regression
Estimate one variable on the basis of another
Assumes there IS a causal effect from one variable to the other
Attempts to describe the dependence of a variable on another variable
Coefficient of correlation ranges from…
-1 to 1
Perfect positive vs perfect negative correlation
r = 1: perfect positive
r = -1: perfect negative
r = 0: no linear association.
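The three landmark values of r can be checked with a minimal Python sketch. The helper name `pearson_r` and the small data sets are made up for illustration; the last data set follows a perfect curve (y = (x - 3)²), which shows that r = 0 only rules out a linear association.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                       # illustrative data only
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # points on a rising line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # points on a falling line
r_zero = pearson_r(x, [4, 1, 0, 1, 4])    # y = (x - 3)^2, a symmetric curve

print(r_pos, r_neg, r_zero)
```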
How to test significance of correlation
Hypothesis testing
H₀: ρ = 0
H₁: ρ ≠ 0
Null is that the correlation in the population (ρ) is 0 (no correlation)
Steps
Set null/alternate
Choose significance level
Select test statistic
Formulate decision rule
Calculate statistic.
What is the usual significance level
5%
Test statistic formula
T = r × √(n − 2) / √(1 − r²)
Degrees of freedom: n − 2, so under H₀, T ~ t(n − 2)
Formulate decision rule meaning
Two-tailed test: reject the null if |T| > critical value
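A worked sketch of the test statistic and the two-tailed decision rule, in Python. The sample values (r = 0.8, n = 10) are hypothetical, and the critical value 2.306 is the two-tailed 5% value for 8 degrees of freedom read from a standard t-table rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: correlation of 0.8 observed in 10 pairs
r, n = 0.8, 10
t_stat = correlation_t_statistic(r, n)

# Two-tailed 5% critical value for t with n - 2 = 8 degrees of freedom,
# taken from a t-table (an assumption of this sketch, not computed here)
critical_value = 2.306

# Decision rule: reject the null if |T| exceeds the critical value
reject_null = abs(t_stat) > critical_value
print(f"T = {t_stat:.3f}, reject H0: {reject_null}")
```

Here T comes out at about 3.77, which exceeds 2.306, so the null of zero population correlation is rejected at the 5% level.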
Dependent variable
The variable being predicted
Independent variable
Predictor variable, provides the basis for estimation
Bivariate regression analysis
2 variables, independent variable estimates the dependent
Two assumptions for bivariate regression analysis
Relationship is linear
Both variables are interval or ratio scale
What do we need in bivariate regression analysis
Slope and y-intercept
Regression equation
Y = a + bX + e
Y = dependent variable (the one we predict)
X = independent variable (basis of prediction)
a = intercept term
b = slope of the regression line
e = error term (distance between the actual point and the predicted point, as we cannot predict Y perfectly)
Principle of least squares
Chooses the line of best fit that minimises the sum of squared errors
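A minimal Python sketch of the least squares principle for one predictor. It uses the standard closed-form results b = Sxy / Sxx and a = mean(Y) − b × mean(X), which the cards above do not state; the function names and the data are made up for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (a, b): intercept and slope minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared gaps between each actual y and its prediction a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]           # illustrative data only
ys = [2, 4, 5, 4, 5]
a, b = least_squares_fit(xs, ys)

sse_best = sum_squared_errors(xs, ys, a, b)
sse_other = sum_squared_errors(xs, ys, a, b + 0.1)  # a slightly steeper line
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {sse_best:.2f} vs {sse_other:.2f}")
```

For this data the fitted line is Y = 2.2 + 0.6X, and nudging the slope away from 0.6 increases the sum of squared errors, which is what "line of best fit" means here.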