Week 2: Finding the quantitative relationship between two variables (flashcards)
What principle do we use when we estimate b0 and b1 (using their formulas)?
the least squares principle
What does the least squares principle guarantee?
that the regression line is the best fit to the data
What are the b0 and b1 equations derived from?
minimising the sum of the squares of the vertical distances between the observed Yi and predicted Ŷi values of the dependent variable:
min ∑(Yi − Ŷi)^2 = min ∑(Yi − (b0 + b1Xi))^2
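The card does not spell out the formulas that solve this minimisation. The standard closed-form solution is b1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)^2 and b0 = Ȳ − b1X̄. Below is a minimal Python sketch of those two formulas; the function name least_squares and the five data points are made up for illustration.

```python
# Minimal sketch: least squares estimates of b0 and b1 for simple linear regression.
# The function name and the data below are hypothetical, chosen for illustration.

def least_squares(xs, ys):
    """Return (b0, b1) minimising sum (Yi - (b0 + b1*Xi))^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b1 = sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # b0 = Ybar - b1 * Xbar, so the line passes through (Xbar, Ybar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With this made-up data the sketch prints b0 = 0.050, b1 = 1.990.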
What two things does the least squares principle guarantee?
- that the regression line obtained has the smallest sum of squared residuals of any straight line
- that the regression line is the best approximation to the quantitative relationship between the variables X and Y
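One way to see the first guarantee is to compute the sum of squared residuals (SSE) at the least squares estimates and then at nearby alternative lines. The sketch below does this on made-up data; the helper names sse and least_squares are hypothetical, and the grid of shifts is an arbitrary choice.

```python
# Sketch: the least squares line has a smaller SSE than nearby alternative lines.
# Data and helper names are made up for illustration.

def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line Y-hat = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# Shift the intercept and slope by small amounts; every alternative line does worse.
for d0 in (-0.5, -0.1, 0.0, 0.1, 0.5):
    for d1 in (-0.2, -0.05, 0.0, 0.05, 0.2):
        if d0 == 0.0 and d1 == 0.0:
            continue
        assert sse(b0 + d0, b1 + d1, xs, ys) > best
print(f"minimum SSE = {best:.4f}")
```

A grid search cannot prove the minimum, but here it agrees with the theory: SSE is a bowl-shaped (convex) function of b0 and b1, so any move away from the least squares values increases it.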
What assumptions underlie linear regression? (4)
Linearity
Independence of Errors
Normality of Error
Equal Variance (AKA homoscedasticity)
What is the linearity assumption?
the relationship between X and Y is linear
What is the ‘independence of errors’ assumption?
error values are statistically independent
this is particularly important when data is collected over a period of time
What is the ‘normality of error’ assumption?
error values are normally distributed
What is the ‘Equal Variance’ assumption?
the probability distribution of the errors has constant variance for all values of X
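To make the four assumptions concrete, here is a sketch that simulates data from a model satisfying all of them, then fits a line by least squares. The true intercept 2.0, slope 3.0, error standard deviation 1.0, sample size and seed are arbitrary assumptions, not values from the course.

```python
import random

# Sketch: simulate data that satisfy all four regression assumptions.
# BETA0, BETA1, SIGMA, the sample size and the seed are arbitrary choices.
random.seed(42)

BETA0, BETA1, SIGMA = 2.0, 3.0, 1.0
xs = [random.uniform(0, 10) for _ in range(1000)]
# Linearity: the mean of Y is a straight-line function of X.
# Independence: each error is a separate draw, unrelated to the others.
# Normality: errors are drawn from a normal distribution.
# Equal variance: the same SIGMA is used at every value of X.
errors = [random.gauss(0.0, SIGMA) for _ in xs]
ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]

# Least squares estimates should land close to the true values.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
print(f"estimated b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Replacing SIGMA in the errors line with something that grows with X (for example 0.3 * x) would produce data that break the equal variance assumption.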
What is the residual ei for observation i?
the difference between its observed and predicted values
ei = Yi - Ŷi
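A short sketch of the residual formula in Python. The five data points are made up; for this data the least squares coefficients work out to b0 = 0.05 and b1 = 1.99, so they are hard-coded here.

```python
# Sketch: residuals ei = Yi - Y-hat_i for a fitted line (made-up data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99  # least squares coefficients for this data

predicted = [b0 + b1 * x for x in xs]                  # Y-hat_i
residuals = [y - yh for y, yh in zip(ys, predicted)]   # ei = Yi - Y-hat_i
for x, y, yh, e in zip(xs, ys, predicted, residuals):
    print(f"X={x}  Y={y:5.2f}  Y-hat={yh:5.2f}  e={e:+.2f}")
```

For a least squares line with an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on a fit.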
How do you check the assumptions of regression?
by examining the residuals:
- examine the linearity assumption
- evaluate the independence assumption
- evaluate the normality assumption
- examine for constant variance at all levels of X (homoscedasticity)
How would you do a graphical analysis of residuals to investigate the assumptions?
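plot the residuals against X; a random scatter around zero with no pattern supports the linearity and equal variance assumptions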
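In practice this plot is drawn with a plotting library. The standard-library sketch below is a hypothetical text stand-in: each row is one X value, '|' marks a residual of zero and '*' marks the scaled residual. The data (Y = X squared) are made up so that a straight line is clearly the wrong model.

```python
# Sketch: a text-only "residuals vs X" plot (X runs down the rows).
# Data and function names are hypothetical, chosen to show a curved pattern.

def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

def text_residual_plot(xs, residuals, width=10):
    """Return one text row per X, with '*' placed at the scaled residual."""
    scale = max(abs(e) for e in residuals) or 1.0
    rows = []
    for x, e in sorted(zip(xs, residuals)):
        pos = round(e / scale * width)   # integer in -width .. +width
        cells = [" "] * (2 * width + 1)
        cells[width] = "|"               # zero line
        cells[width + pos] = "*"
        rows.append(f"{x:>4} " + "".join(cells))
    return rows

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
lines = text_residual_plot(xs, residuals)
print("\n".join(lines))
```

The residuals run positive, then negative, then positive again (a U shape) instead of scattering randomly around zero, which signals that the linearity assumption does not hold for this data.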
What happens to the histogram of the residuals when the assumption of Normality is satisfied?
the histogram of the residuals approximates the bell shape of a normal distribution
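A text histogram can illustrate what to look for. In this sketch the "residuals" are stand-in draws from a standard normal distribution, i.e. the idealised case where normality holds; the bin range, bin count, seed and the helper name histogram_counts are all arbitrary assumptions.

```python
import random

def histogram_counts(values, lo=-3.0, hi=3.0, bins=12):
    """Count values into equal-width bins on [lo, hi); values outside go into the end bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = int((v - lo) // width)
        k = min(max(k, 0), bins - 1)  # clamp outliers into the first/last bin
        counts[k] += 1
    return counts

random.seed(1)
# Stand-in residuals: made-up draws from N(0, 1), not residuals from a real fit.
residuals = [random.gauss(0.0, 1.0) for _ in range(500)]
counts = histogram_counts(residuals)
width = 6.0 / 12
for k, c in enumerate(counts):
    left = -3.0 + k * width
    # one '#' per 5 residuals in the bin
    print(f"[{left:+.1f}, {left + width:+.1f})  {'#' * (c // 5)}")
```

The bars are tallest near zero and fall away symmetrically in both tails; a strongly skewed or flat histogram of real residuals would cast doubt on the normality assumption.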
Why do we need to compare two or more different regression models?
different estimation methods (different formulas to calculate the slope and intercept)
different populations, different samples, different variables
What statistical instruments can be used to make a comparison?
total sum of squares
R^2
standard error
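As a sketch of such a comparison, the code below fits two simple regressions of the same Y on two different predictors and reports R^2 = 1 − SSE/SST and the standard error of the estimate, sqrt(SSE / (n − 2)). Both predictors and all data values are made up; fit_summary is a hypothetical helper name.

```python
import math

def fit_summary(xs, ys):
    """Fit Y-hat = b0 + b1*X by least squares; return (R^2, standard error of the estimate)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - sse / sst
    s_yx = math.sqrt(sse / (n - 2))  # n - 2 because two coefficients were estimated
    return r2, s_yx

ys = [2.1, 3.9, 6.2, 7.8, 10.1]
x_a = [1, 2, 3, 4, 5]  # hypothetical predictor A
x_b = [3, 1, 4, 1, 5]  # hypothetical predictor B
for name, xs in (("A", x_a), ("B", x_b)):
    r2, s_yx = fit_summary(xs, ys)
    print(f"model {name}: R^2 = {r2:.3f}, standard error = {s_yx:.3f}")
```

Model A has the higher R^2 and the lower standard error, so on this made-up data it describes Y better than model B.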
What equation do you use to work out total variation?
SST = SSR + SSE
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
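A quick numerical check of the identity, using the usual definitions SST = ∑(Yi − Ȳ)^2, SSR = ∑(Ŷi − Ȳ)^2 and SSE = ∑(Yi − Ŷi)^2. The data are made up, and the identity holds for a least squares line that includes an intercept.

```python
# Sketch: check SST = SSR + SSE for a least squares fit (made-up data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the regression
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (residual) variation
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, SSR + SSE = {ssr + sse:.3f}")
```

This prints SST = 39.708, SSR = 39.601, SSE = 0.107, SSR + SSE = 39.708.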