Week 2: finding the quantitative relationship between 2 variables Flashcards
What principle do we use when we estimate b0 and b1 (using their formulas)?
the least squares principle
What does the least squares principle guarantee?
that the regression line is the best fit to the data
What are the b0 and b1 equations derived from?
minimising the sum of the squares of the vertical distances between the observed Yi and predicted Ŷi values of the dependent variable:
min∑(Yi−Ŷi)^2 = min∑(Yi−(b0+b1Xi))^2
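Minimising this sum gives the usual closed-form estimates b1 = ∑(Xi−x̄)(Yi−ȳ) / ∑(Xi−x̄)^2 and b0 = ȳ − b1·x̄. A minimal Python sketch of those formulas follows. The data values and the function name fit_line are made up for illustration and are not part of the course material.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares line through the points (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Illustrative data only (assumed, not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(round(b0, 4), round(b1, 4))
```

For this data the intercept comes out at about 2.2 and the slope at about 0.6.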
What does the least squares principle guarantee? (2)
- that the regression line obtained has the smallest sum of squared residuals
- that the regression line is the best approximation to the quantitative relationship existing between the variables X and Y
What assumptions underlie linear regression? (4)
Linearity
Independence of Errors
Normality of Error
Equal Variance (AKA homoscedasticity)
What is the linearity assumption?
the relationship between X and Y is linear
What is the ‘independence of errors’ assumption?
error values are statistically independent
this is particularly important when data is collected over a period of time
What is the ‘normality of error’ assumption?
error values are normally distributed
What is the ‘Equal Variance’ assumption?
the probability distribution of the errors has constant variance
What is the residual for observation i, ei?
the difference between its observed and predicted value
ei = Yi - Ŷi
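As a small illustration of ei = Yi − Ŷi, the sketch below computes the residuals for a made-up data set. The values of x, y, b0 and b1 are assumptions chosen for the example (b0 and b1 are the least squares estimates for this particular data).

```python
# Illustrative data only (assumed, not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares intercept and slope for this data

# Predicted values on the regression line
y_hat = [b0 + b1 * xi for xi in x]

# Residual = observed value minus predicted value
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print([round(e, 2) for e in residuals])
print(round(sum(residuals), 10))
```

With a least squares line that includes an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the fit.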
How do you check the assumptions of regression?
by examining the residuals:
-examine for linearity assumption
-evaluate independence assumption
-evaluate normality assumption
-examine for constant variance for all levels of X (homoscedasticity)
How would you do a graphical analysis of residuals to investigate the assumptions?
plot residuals vs X
What happens to the histogram of the residuals when the assumption of Normality is satisfied?
the histogram of the residuals approximates the bell shape of a normal distribution
Why do we need to compare two or more different regression models?
because the models may come from:
different estimation methods (different formulas to calculate the slope and intercept)
different populations, different samples, different variables
What statistical instruments can be used to make a comparison?
total sum of squares
R^2
standard error
What equation do you use to work out total variation?
SST = SSR + SSE
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
What does SST stand for?
Total Sum of Squares
What does SSR stand for?
Regression Sum of Squares
What does SSE stand for?
Error Sum of Squares
How do you work out SST (Total Sum of Squares)?
SST = ∑(Yi - ȳ)^2
How do you work out SSR (Regression Sum of Squares)?
SSR = ∑(Ŷi - ȳ)^2
How do you work out SSE (Error Sum of Squares)?
SSE = ∑(Yi - Ŷi)^2
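The three formulas can be checked numerically. The sketch below, on made-up data, fits the least squares line and confirms that SST = SSR + SSE. The data values and variable names are illustrative assumptions.

```python
# Illustrative data only (assumed, not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation

print(round(sst, 4), round(ssr, 4), round(sse, 4))  # prints 6.0 3.6 2.4
```

The decomposition SST = SSR + SSE relies on the line being the least squares fit; for an arbitrary line the two parts need not add up to SST.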
What type of variation is SST (Total Sum of Squares)?
Total Variation
Measures the variation of the Yi values around their mean ȳ
What type of variation is SSR (Regression Sum of Squares)?
Explained Variation
Variation attributable to the relationship between X and Y
What type of variation is SSE (Error Sum of Squares)?
Unexplained Variation
Variation in Y attributable to factors other than X
What is the coefficient of determination?
the portion of the total variation in the dependent variable that is explained by variation in the independent variable
What is the coefficient of determination also known as?
R-square, denoted as R^2
What is the equation for R^2 (the Coefficient of Determination)?
R^2 = SSR / SST = regression sum of squares / total sum of squares
–> R^2 = ∑(Ŷi - ȳ)^2 / ∑(Yi - ȳ)^2
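A minimal sketch of this formula in Python. The function name r_squared and the data are assumptions for illustration; y_hat holds the predictions from the least squares line Ŷ = 2.2 + 0.6X fitted to this made-up data.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, computed as SSR / SST."""
    y_bar = sum(y) / len(y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # regression sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    return ssr / sst


# Illustrative data only (assumed, not from the flashcards)
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # predictions from the fitted line

r2 = r_squared(y, y_hat)
print(round(r2, 4))  # prints 0.6
```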
What does R^2 have to be between?
0 ≤ R^2 ≤ 1
If R^2 = 1
describe the relationship between X and Y and the variation.
there is a perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X
If R^2 = 0
describe the relationship between X and Y and the variation
no linear relationship between X and Y:
none of the variation in Y is explained by variation in X
If R^2 = 0.6
describe the relationship between X and Y and the variation.
Strong linear relationship between X and Y:
Most of the variation in Y is explained by variation in X
If R^2 = 0.4
describe the relationship between X and Y and the variation
Weaker linear relationship between X and Y:
Some but not all of the variation in Y is explained by variation in X
What is another way to work out R^2?
by working out the correlation coefficient (R) between X and Y and then squaring it (this holds for simple linear regression, with one X variable)
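To illustrate, the sketch below computes the correlation coefficient from its usual formula, R = ∑(Xi−x̄)(Yi−ȳ) / √(∑(Xi−x̄)^2 · ∑(Yi−ȳ)^2), and squares it. The data and variable names are made up for the example.

```python
import math

# Illustrative data only (assumed, not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_squared_from_r = r ** 2        # coefficient of determination

print(round(r, 4), round(r_squared_from_r, 4))  # prints 0.7746 0.6
```

For this data, squaring R gives 0.6, the same value as SSR / SST for the least squares line.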
If R^2 = 0.576, how could this be expressed as a proportion or percent?
57.6 percent of the variation in the Y variable is explained by the variation in the X variable
What is the equation for the Standard deviation of the variation of observations around the regression line?
(What does Syx = ?)
Syx = √(SSE / (n-2)) = √(∑(Yi - Ŷi)^2 / (n-2))
where SSE = error sum of squares
n = sample size
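A minimal sketch of the Syx calculation in Python. The function name standard_error and the data are illustrative assumptions; y_hat holds the predictions from the least squares line Ŷ = 2.2 + 0.6X fitted to this made-up data. Note that the division by n − 2 happens before the square root is taken.

```python
import math


def standard_error(y, y_hat):
    """Standard error of the estimate: sqrt(SSE / (n - 2))."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    return math.sqrt(sse / (n - 2))


# Illustrative data only (assumed, not from the flashcards)
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # predictions from the fitted line

syx = standard_error(y, y_hat)
print(round(syx, 4))  # prints 0.8944
```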
What are the steps for working out regression in excel?
1) select DATA from the Title bar Menu
2) click on DATA ANALYSIS button
3) select REGRESSION from the contextual menu
4) enter the Y range, the X range and any desired options
5) read off the coefficient values; the Intercept coefficient is listed before the X Variable 1 coefficient
Ŷi (e.g. sales) = Intercept coefficient + X Variable 1 coefficient × Xi (e.g. calls)
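Once the two coefficients have been read from the Excel output, a prediction is just the line equation evaluated at a chosen X. A small sketch, in which the function name predict and the coefficient values are hypothetical and stand in for whatever Excel reports:

```python
def predict(x_value, intercept, slope):
    """Predicted Y for a given X, using the fitted line Y-hat = intercept + slope * X."""
    return intercept + slope * x_value


# Hypothetical coefficients, standing in for the values in Excel's output table
intercept = 2.2   # "Intercept" coefficient
slope = 0.6       # "X Variable 1" coefficient

# Predicted value of Y (e.g. sales) when X (e.g. calls) is 10
prediction = predict(10, intercept, slope)
print(round(prediction, 4))
```

With these assumed coefficients the prediction for X = 10 is about 8.2.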
How do you get R^2 in excel?
1) select DATA from the Title bar Menu
2) click on DATA ANALYSIS button
3) select REGRESSION from the contextual menu
4) enter the Y range, the X range and any desired options
5) Look at the R square value in the Regression Statistics table
6) OR Look at ANOVA table, SS regression value = SSR and SS Total = SST
7) put values into equation R^2 = SSR / SST
How do you get the value for Syx (standard error) in excel?
1) select DATA from the Title bar Menu
2) click on DATA ANALYSIS button
3) select REGRESSION from the contextual menu
4) enter the Y range, the X range and any desired options
5) Look at regression statistics table, standard error value = Syx
How do you add the prediction line to the Plot of Fitted and observed data?
1) click on one of the observed values (Blue dots)
2) right click the mouse and select “Add Trendline” from the contextual menu