Week 10 Flashcards
What are the 5 regression assumptions?
Ass 1: For fixed values of the independent variables, the mean of the errors is 0
Ass 2: The variance (and standard deviation) of the errors is constant for all values of the independent variables (no fan shape) - homoscedasticity
Ass 3: For any values of the independent variables, the errors (and hence the dependent variable) are normally distributed (bell curve)
Ass 4: The errors are probabilistically independent, so one error provides no information about the others - problematic with time series data due to autocorrelation
Ass 5: No independent variable can be an EXACT linear combination of the other independent variables - otherwise there is exact multicollinearity
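A minimal sketch of Assumption 1, using hypothetical data: with an intercept, a least-squares fit forces the residuals to average to zero.

```python
# Hypothetical data: fit a simple OLS line and check that the
# residuals (estimated errors) average to zero (Assumption 1).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
# Least-squares slope and intercept
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / n
print(round(mean_resid, 10))  # least squares forces this to ~0
```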
What is Exact Multicollinearity?
When at least one of the independent variables is redundant: it can be computed exactly from the others, so it adds no new information
What is Multicollinearity?
When the independent variables are highly (but not exactly) correlated, which makes the coefficient estimates unreliable and difficult to interpret
What are the inferences about the regression coefficients?
The challenge is deciding which independent variables to include, since adding more always increases R Square
Check Adjusted R Square (higher is better) and Standard Error of Estimate (lower is better)
Relevance and data availability also come into play
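A quick sketch of why Adjusted R Square is the better yardstick, using hypothetical R Square values: it penalizes extra variables, so it can fall even when R Square rises.

```python
# Adjusted R-square = 1 - (1 - R^2)(n - 1)/(n - k - 1)
# n = sample size, k = number of independent variables.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: adding 3 more variables nudges R-square up from
# 0.80 to 0.81, but adjusted R-square goes DOWN - the extra
# variables don't earn their keep.
print(round(adjusted_r2(0.80, 30, 2), 4))  # 0.7852
print(round(adjusted_r2(0.81, 30, 5), 4))  # 0.7704
```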
What is Parsimony Principle?
Explain the most with the least: use the fewest independent variables that adequately explain the dependent variable
How do we estimate parameters?
- Point estimates -> use the sample statistics as estimates of the parameters (different samples give different point estimates)
- Build a confidence interval around the estimate using a t-value
What are the two different standard error of estimate?
One is at the top of the output, in the “Regression Statistics”
This measures the error when using the regression equation to predict the dependent variable
One is at the bottom of the output, in the variables table
This measures the accuracy of each coefficient estimate (one standard error per intercept/independent variable)
How do we calculate confidence interval?
b +- t-multiple x Sb
Where:
b is the coefficient of intercept/independent variables
t-multiple is the value from the t-table, looked up with the residual degrees of freedom (n - k - 1 for k independent variables) and the tail probability (0.025 in each tail for a 95% interval)
Sb is the standard error of the intercept/independent variable coefficient
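A sketch of b ± t-multiple × Sb with hypothetical numbers; the t-multiple 2.060 is the t-table value for 25 degrees of freedom with 0.025 in the upper tail (95% confidence).

```python
# Hypothetical coefficient and standard error from regression output
b = 1.50             # estimated coefficient
sb = 0.40            # its standard error (Sb)
t_multiple = 2.060   # t-table value, 25 df, 0.025 in each tail (95% CI)

lower = b - t_multiple * sb
upper = b + t_multiple * sb
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # (0.676, 2.324)
```

Since 0 is outside this interval, the variable would be judged significant at the 5% level.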
What is a t-value?
Ratio of the estimated coefficient to its standard error; as a rule of thumb, |t| above roughly 2 is significant at the 5% level
What is the important hypothesis test for a regression coefficient?
You want to decide if a particular independent variable belongs in the regression equation.
If the p-value < 0.05, reject the null (the variable is significant)
If the p-value > 0.05, fail to reject the null - consider eliminating the variable
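The test above can be sketched with hypothetical numbers; the two-sided p-value is approximated here with the normal distribution, which is close to the t-distribution for reasonably large samples.

```python
from statistics import NormalDist

# Hypothetical regression output: coefficient and its standard error
b, sb = 1.50, 0.40
t_value = b / sb  # ratio of coefficient to its standard error -> 3.75

# Two-sided p-value, normal approximation to the t-distribution
p_value = 2 * (1 - NormalDist().cdf(abs(t_value)))

keep_variable = p_value < 0.05
print(round(t_value, 2), round(p_value, 4), keep_variable)
```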
What is the ANOVA test?
Analysis of Variance: tests whether the regression explains a significant share of the variation in the dependent variable
The larger the F-ratio, the more variation the regression explains
If Significance F < 0.05, reject the null (significant): at least one independent variable helps explain changes in the dependent variable
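A sketch of where the F-ratio comes from, using hypothetical sums of squares: it is the explained mean square divided by the residual mean square.

```python
# Hypothetical ANOVA-table inputs
ssr = 800.0   # sum of squares explained by the regression
sse = 200.0   # residual (unexplained) sum of squares
n, k = 30, 2  # sample size and number of independent variables

msr = ssr / k            # explained mean square
mse = sse / (n - k - 1)  # residual mean square
f_ratio = msr / mse
print(round(f_ratio, 1))  # 54.0 - a large F-ratio
```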
What is multicollinearity?
When there is a fairly strong (but not exact) linear relationship among a set of explanatory variables -> highly correlated, problematic
Usual signs:
coefficients with “wrong” signs
t-values too small
p-values too large
coefficient estimates become unreliable and hard to interpret
How to test for multicollinearity?
VIF and R Square
VIF (Variance Inflation Factor) = 1/(1 - R Square), where R Square comes from regressing that independent variable on all the others
If:
VIF = 1 -> No Correlation
VIF > 4 -> Some Problem
VIF > 10 -> Serious multicollinearity problem
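The VIF cutoffs above can be sketched directly; the R Square values fed in are hypothetical.

```python
# VIF = 1 / (1 - R^2), where R^2 is from regressing one independent
# variable on the other independent variables (values hypothetical).
def vif(r2):
    return 1 / (1 - r2)

print(round(vif(0.0), 2))   # 1.0  -> no correlation with the others
print(round(vif(0.80), 2))  # 5.0  -> some problem (VIF > 4)
print(round(vif(0.95), 2))  # 20.0 -> serious multicollinearity (VIF > 10)
```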
How to decide to include/exclude variables?
- Check the associated t-values and p-values; if p-value > 0.05, exclude
- If |t-value| < 1, exclude
- Low t-values & high p-values -> exclude
- Use economic or physical theory (common sense)
- economic or physical theory (common sense)
When can we predict revenues?
When R Square is large and Se (the standard error of estimate) is small