Multiple Regression Flashcards
We can use multiple regression models to:
1 - Identify relationships between variables
2 - Forecast variables
3 - Test existing theories
The general multiple linear regression model is:
Yi = b0 + b1X1i + b2X2i + … + bkXki + εi
The residual, εi, is
the difference between the observed value, Yi, and the predicted value from the regression, Ŷi
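As a minimal sketch of the model and the residual definition, the snippet below computes a predicted value and its residual for one observation. The coefficient and data values are hypothetical, chosen only for illustration; in practice the coefficients would come from an estimated regression.

```python
def predict(b, x):
    """Predicted value: b[0] + b[1]*x[0] + ... + b[k]*x[k-1]."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def residual(y, b, x):
    """Observed value minus the value predicted by the regression."""
    return y - predict(b, x)


# Hypothetical coefficients b0, b1, b2 and one hypothetical observation.
coefficients = [2.0, 0.5, -1.0]
x_obs = [4.0, 1.5]
y_obs = 3.0

y_hat = predict(coefficients, x_obs)      # 2.0 + 0.5*4.0 - 1.0*1.5
e = residual(y_obs, coefficients, x_obs)  # y_obs - y_hat
print(y_hat, e)
```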
The p-value is
the smallest level of significance for which the null hypothesis can be rejected
If the p-value is less than the significance level
the null hypothesis can be rejected
If the p-value is greater than the significance level
the null hypothesis cannot be rejected.
Intercept term
is the value of the dependent variable when the independent variables are all equal to zero.
Assumptions underlying a multiple regression model include
- a linear relationship exists between the dependent variable and the independent variables.
- the residuals are normally distributed.
- the variance of the error terms is constant.
- the residuals are not correlated with one another.
- the independent variables are not random, and no exact linear relationship exists among them.
R²
evaluates the overall effectiveness of the entire set of independent variables in explaining the dependent variable
R² = (total variation − unexplained variation) / total variation
R² = (SST − SSE) / SST
R² = explained variation / total variation
R² = RSS / SST
Adjusted R²
Adjusted R² = 1 − [((n − 1) / (n − k − 1)) × (1 − R²)], where n is the number of observations and k is the number of independent variables
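The two cards above can be checked with a short calculation. This is a sketch only: the sums of squares, the sample size n, and the number of independent variables k are hypothetical inputs, and the function names are made up for this example.

```python
def r_squared(sst, sse):
    """R-squared as (total variation - unexplained variation) / total variation."""
    return (sst - sse) / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


# Hypothetical inputs for illustration.
sst, sse = 100.0, 25.0
n, k = 30, 4

r2 = r_squared(sst, sse)
adj_r2 = adjusted_r_squared(r2, n, k)
print(r2, adj_r2)
```

With k at least 1 and R-squared below 1, the adjustment always pulls the value below the unadjusted figure, which is why adding a weak variable can lower adjusted R-squared even though R-squared itself never falls.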
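AIC
preferred when the goal is a better forecast; a lower AIC indicates a better model
BIC
preferred when the goal is goodness of fit; a lower BIC indicates a better model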
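The cards do not give formulas for AIC and BIC. The sketch below assumes the commonly used SSE-based forms, AIC = n ln(SSE/n) + 2(k + 1) and BIC = n ln(SSE/n) + ln(n)(k + 1); treat these as an assumption rather than something stated in this deck. The input values are hypothetical.

```python
import math


def aic(n, sse, k):
    """Assumed SSE-based form: n * ln(SSE / n) + 2 * (k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(n, sse, k):
    """Assumed SSE-based form: n * ln(SSE / n) + ln(n) * (k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# Hypothetical inputs: 50 observations, SSE of 200, 3 independent variables.
n, sse, k = 50, 200.0, 3
aic_value = aic(n, sse, k)
bic_value = bic(n, sse, k)
print(aic_value, bic_value)
```

Under these assumed forms the two measures differ only in the penalty per parameter, 2 for AIC versus ln(n) for BIC, so BIC penalizes extra variables more heavily once n is 8 or more.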
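Nested models
models in which one model, called the full or unrestricted model, has a larger set of independent variables and the other, the restricted model, has a subset of them
Restricted model
a nested model that includes only a subset of the unrestricted model's independent variables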
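F-statistic
F = [(SSE_R − SSE_U) / q] / [SSE_U / (n − k − 1)], where SSE_R and SSE_U are the sums of squared errors of the restricted and unrestricted models, q is the number of excluded variables, and k is the number of independent variables in the unrestricted model
For the joint test of all k slope coefficients (restricted model has the intercept only, so q = k): F = (RSS / k) / [SSE / (n − k − 1)]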
reject H0 if
F (test-statistic) > Fc (critical value)
F-test evaluates whether
the relative decrease in SSE due to the inclusion of q additional variables is statistically justified.
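A minimal sketch of the nested-model F-test follows. The statistic divides the drop in SSE per excluded variable by the unrestricted model's SSE per degree of freedom. All numeric inputs are hypothetical, and the critical value is passed in by the caller because it has to be looked up in an F table with q and n − k − 1 degrees of freedom.

```python
def nested_f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for q excluded variables; k counts the independent
    variables in the unrestricted model, n the observations."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


def reject_null(f_stat, f_critical):
    """Reject H0 when the test statistic exceeds the critical value."""
    return f_stat > f_critical


# Hypothetical inputs: dropping 2 of 5 variables raises SSE from 100 to 120.
f_stat = nested_f_statistic(120.0, 100.0, q=2, n=50, k=5)
print(f_stat)
```

A call such as `reject_null(f_stat, f_critical)` then applies the decision rule once the tabulated critical value for the chosen significance level is known.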
Regression model specification
selection of the explanatory (independent) variables to be included in a model
Examples of Misspecification of Functional Form
Misspecification #1: Omitting a Variable
Misspecification #2: Variable Should Be Transformed
Misspecification #3: Inappropriate Scaling of the Variable
Misspecification #4: Incorrectly Pooling Data
Effect of omitting important independent variable(s)
biased and inconsistent regression parameters; may lead to serial correlation or heteroskedasticity in the residuals
Effect of an inappropriate variable form
may lead to heteroskedasticity in the residuals
Effect of inappropriate variable scaling
may lead to heteroskedasticity in the residuals or multicollinearity
Effect of improperly pooled data
may lead to heteroskedasticity or serial correlation in the residuals