01.02 Evaluating regression model fit and interpreting model results Flashcards
What is ANOVA tables
Statistical procedure that decomposes the total variation in the dependent variable into the explained and unexplained components to evaluate
Regression is explained component of the variation, while error is the unexplained component
What is the Coefficient of determination, R2
R2 evaluates the overall effectiveness of the entire set of independent variables in explaining the dependent variable
% of the variation in the dependent variable that is collectively explained by all the independent variable
What are the issues with R2?
R2 always increases as more independent variables are added to the model, even when the marginal contribution of new variables are not statistically significant
High R2 may reflect the impact of a large set of independent variables, rather than how efficiently the set explains the dependent variable - may result in overestimating or overfitting the regression model
What is R2a
tool used to overcome the issues with R2
R2a may increase or decrease as more independent variables are added – resolves the overfitting issue
If a new variable only has a small effect on R2, the value of R2a will decrease
Not indicative of the quality of the model fit, nor does it indicate the statistical significance of the slope coefficients
What is AIC?
Akaike’s information criterion (AIC)
AIC used if the goal is to have a better forecast
lower the value the better
What is BIC?
Schwarz’s Bayesian information criteria (BIC)
BIC is used if the goal is a better goodness of fit
Lower the value, the better the model
What are F-tests?
Formal F-test can be used to evaluate nested model
Nested models are such that one model has a higher number of independent variables (full model/ unrestricted model) while another model (restricted model) has only a subset of the independent variable
What is the F-statistic used to evaluate H0=b2=b3=0 and Ha:B2 or B3 not equal 0
F-tests evaluate whether the relative decrease in SSE due to the inclusion of Q additional variables is statistically justified
Individual slope coefficients can be tested using t-tests by F-tests provides more meaningful evaluation of the explanatory power of alternative models
F test evaluates the overall model fit – tests whether at least one of the independent variables explains a significant portion of a variation of the dependent variables
What is the F-statistic used to evaluate 3 variables H0=b1=b2=b3=0 and Ha:BJ not equal 0
- F test evaluates the overall model fit – tests whether at least one of the independent variables explains a significant portion of a variation of the dependent variables
What is predicting the dependent variable?
Regression equation used to make predictions about the dependent variable based on forecasted values of the independent variables
Requires more than one independent variable
Uses estimated intercept and estimated slope coefficients regardless of whether the estimated coefficients are statistically significant different from Zero