2. Multiple Regression And Issues In Regression Analysis Flashcards
For a t-test of a regression coefficient, what is the number of degrees of freedom?
Degrees of freedom = n-k-1, where n is the number of observations and k is the number of independent variables.
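A minimal sketch of the calculation, using hypothetical values for n and k:

```python
# Hypothetical example: n observations, k independent variables (slope coefficients).
n = 60   # number of observations (assumed)
k = 3    # number of independent variables (assumed)

df = n - k - 1   # degrees of freedom for each coefficient's t-test
print(df)        # 56
```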
What is the p-value and how do you interpret it? What if the p-value is less than the significance level?
The p-value is the smallest level of significance for which the null hypothesis can be rejected.
- if the p-value is less than the significance level, the null hypothesis can be rejected
- if the p-value is greater than the significance level, the null hypothesis cannot be rejected
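A short sketch of the decision rule. The coefficient estimate, its standard error, and the degrees of freedom are assumed values for illustration; scipy is used only to compute the two-tailed p-value.

```python
from scipy import stats

# Hypothetical coefficient t-test: estimated slope, its standard error, and df (assumed values)
b_hat, se_b, df = 0.52, 0.22, 56
t_stat = (b_hat - 0.0) / se_b                        # H0: coefficient = 0
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))     # two-tailed p-value

alpha = 0.05
reject_h0 = p_value < alpha                          # reject H0 if p-value < significance level
print(t_stat, p_value, reject_h0)
```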
Explain the assumptions of a multiple regression model.
As with simple linear regression, most of the assumptions made for the multiple regression model pertain to the error term.
- A linear relationship exists between the dependent and independent variables
- The independent variables are not random, and there is no exact linear relation between any two or more independent variables
- The expected value of the error term, conditional on the independent variables, is zero
- The variance of the error terms is constant for all observations
- The error term for one observation is not correlated with that of another observation
- The error term is normally distributed
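A rough way to check two of these assumptions (constant error variance and uncorrelated errors) on a fitted model. The data below is simulated purely for illustration, and the statsmodels diagnostics are one possible choice, not the only ones:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data for illustration: 2 independent variables, 100 observations
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=100)

X_const = sm.add_constant(X)               # add intercept column
results = sm.OLS(y, X_const).fit()

# Constant-variance assumption: Breusch-Pagan test on the residuals
bp_lm, bp_pvalue, _, _ = het_breuschpagan(results.resid, X_const)

# Uncorrelated-errors assumption: Durbin-Watson statistic (values near 2 suggest no serial correlation)
dw = durbin_watson(results.resid)

print(bp_pvalue, dw)
```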
Explain the F-statistic (what does an F-test assess?) and show the formula.
Is it a one-tailed or two-tailed test? What are the degrees of freedom?
An F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. The F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable.
F = MSR/MSE = (RSS/k)/[SSE/(n-k-1)]
ONE TAILED!!!
df(numerator) = k
df(denominator) = n-k-1
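A minimal sketch of the F-test calculation, assuming hypothetical values for RSS, SSE, n, and k:

```python
from scipy import stats

# Hypothetical ANOVA output: regression sum of squares (RSS) and sum of squared errors (SSE)
RSS, SSE = 120.0, 80.0      # assumed values
n, k = 60, 3                # assumed sample size and number of independent variables

MSR = RSS / k
MSE = SSE / (n - k - 1)
F = MSR / MSE

# One-tailed critical value at the 5% significance level
F_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)
print(F, F_crit, F > F_crit)   # reject H0 (all slope coefficients = 0) if F > F_crit
```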
Besides the F-test, another way to test all of the coefficients simultaneously is to just conduct all of the individual t-tests and see how many of them you can reject.
True or false?
False!
This is the wrong approach, however, because if you set the significance level for each t-test at 5%, for example, the significance level from testing them all simultaneously is NOT 5%, but rather some higher percentage.
Just remember to use the F-test on the exam if you are asked to test all of the coefficients simultaneously.
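A quick illustration of why the combined significance level is higher than 5% (this assumes the k t-tests are independent; alpha and k are hypothetical values):

```python
# If each of k t-tests uses a 5% significance level, the chance of at least one
# false rejection across all k tests (assuming independence) exceeds 5%.
alpha, k = 0.05, 3                      # assumed values
family_wise = 1 - (1 - alpha) ** k
print(family_wise)                      # ~0.143, not 0.05
```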
Why might R^2 by itself not be a reliable measure of the explanatory power of a multiple regression model? How do you calculate the adjusted R^2?
This is because R^2 almost always increases as variables are added to the model, even if the marginal contribution of the new variables is not statistically significant.
Adjusted R^2 = 1 - [(n-1)/(n-k-1)] * (1-R^2)
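A minimal sketch of the adjusted R^2 calculation, with assumed values for R^2, n, and k:

```python
# Hypothetical values: R^2 from the regression output, n observations, k independent variables
r2, n, k = 0.72, 60, 3                  # assumed values
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
print(adj_r2)                           # ~0.705, slightly below R^2
```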
What are the three broad categories of model misspecification and their subcategories?
1) The functional form can be misspecified
a. important variables are omitted
b. variables should be transformed
c. data is improperly pooled
2) Explanatory variables are correlated with the error term in time-series models
a. a lagged dependent variable is used as an independent variable
b. a function of the dependent variable is used as an independent variable ('forecasting the past'), e.g., using end-of-period market cap to predict the end-of-period P/E
c. independent variables are measured with error
3) Other time-series misspecifications that result in nonstationarity.