5. Multiple Regression Flashcards

1
Q

Heteroskedasticity

A

The property of having a nonconstant variance; refers to an error term whose variance differs across observations.

2
Q

3 Violations of Regression Assumptions

A

Heteroskedasticity

Serial Correlation

Multicollinearity

3
Q

6 Assumptions of Classical Normal Multiple Linear Regression

A
  1. The relationship between the dependent variable, Y, and the independent variables, X1, X2, …, Xk, is linear.
  2. The independent variables (X1, X2, …, Xk) are not random. Also, no exact linear relation exists between two or more of the independent variables.
  3. The expected value of the error term, conditioned on the independent variables, is 0: E(ε | X1, X2, …, Xk) = 0.
  4. The variance of the error term is the same for all observations.
  5. The error term is uncorrelated across observations: E(εi εj) = 0, j ≠ i.
  6. The error term is normally distributed.
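
For reference, a minimal statement of the model these assumptions describe (standard notation, not part of the original card):

$$
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i, \qquad i = 1, 2, \dots, n
$$
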
4
Q

F-statistic

A

To test the null hypothesis that all of the slope coefficients in the multiple regression model are jointly equal to 0 (H0: b1 = b2 = … = bk = 0) against the alternative hypothesis that at least one slope coefficient is not equal to 0, we use an F-test. The F-test is viewed as a test of the regression’s overall significance. The F-statistic is calculated from four inputs: the number of observations (n), the number of slope coefficients (k), the regression sum of squares (RSS), and the sum of squared errors (SSE). It measures how well the regression equation explains the variation in the dependent variable; it is the ratio of the mean regression sum of squares to the mean squared error.
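
In standard notation (not part of the original card), with n observations and k slope coefficients:

$$
F = \frac{\text{MSR}}{\text{MSE}} = \frac{RSS/k}{SSE/(n-k-1)}
$$

Under the null hypothesis, this statistic follows an F-distribution with k and n − k − 1 degrees of freedom.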

5
Q

Adjusted R^2

A

A measure of goodness-of-fit of a regression that is adjusted for degrees of freedom and hence does not automatically increase when another independent variable is added to a regression.
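
A standard formula (not in the original card) relating adjusted R² to the ordinary R², where n is the number of observations and k is the number of slope coefficients:

$$
\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\left(1 - R^2\right)
$$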

6
Q

Dummy Variables

A

A type of qualitative variable that takes on a value of 1 if a particular condition is true and 0 if that condition is false.

7
Q

Serial Correlation

A

With reference to regression errors, errors that are correlated across observations.

8
Q

Multicollinearity

A

A regression assumption violation that occurs when two or more independent variables (or combinations of independent variables) are highly but not perfectly correlated with each other.

9
Q

Serial Correlation Remedies

A

We have two alternative remedial steps when a regression has significant serial correlation. First, we can adjust the coefficient standard errors for the linear regression parameter estimates to account for the serial correlation. Second, we can modify the regression equation itself to eliminate the serial correlation. We recommend using the first method for dealing with serial correlation; the second method may result in inconsistent parameter estimates unless implemented with extreme care.
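
The first remedy (adjusting the coefficient standard errors) is commonly implemented with a serial-correlation-consistent (HAC, or Newey–West) covariance estimator. Below is a minimal sketch using statsmodels with simulated data; it illustrates the idea rather than the card’s prescribed procedure, and the lag length of 5 is arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)

# Build AR(1) errors so the regression has positively serially correlated residuals
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                          # conventional OLS standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})   # serial-correlation-robust (HAC)

print(ols.bse)   # typically too small when errors are serially correlated
print(hac.bse)   # adjusted standard errors
```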

10
Q

Multicollinearity Effects

A

OLS estimates of the regression coefficients remain consistent, but the estimates become extremely imprecise and unreliable.

It becomes practically impossible to distinguish the individual impacts of the independent variables on the dependent variable.

Inflated OLS standard errors for the regression coefficients. With inflated standard errors, t-tests on the coefficients have little power (ability to reject the null hypothesis).

The analyst should be aware that using the magnitude of pairwise correlations among the independent variables to assess multicollinearity, as has occasionally been suggested, is generally not adequate. Although very high pairwise correlations among independent variables can indicate multicollinearity, high pairwise correlations are not a necessary condition for a multicollinearity problem, and low pairwise correlations do not mean that multicollinearity is absent.

The only case in which correlation between independent variables may be a reasonable indicator of multicollinearity occurs in a regression with exactly two independent variables.

The classic symptom of multicollinearity is a high R2 (and significant F-statistic) even though the t-statistics on the estimated slope coefficients are not significant. The insignificant t-statistics reflect inflated standard errors. Although the coefficients might be estimated with great imprecision, as reflected in low t-statistics, the independent variables as a group may do a good job of explaining the dependent variable, and a high R2 would reflect this effectiveness.
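
A small simulation (all numbers are made up) that reproduces this symptom in Python with statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # highly, but not perfectly, correlated with x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(res.rsquared, res.fvalue)   # R^2 and the F-statistic are large
print(res.tvalues)                # individual slope t-statistics are small (inflated standard errors)
```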

11
Q

Serial Correlation Effects

A

The principal problem is an incorrect estimate of the regression coefficient standard errors computed by statistical software packages. As long as none of the independent variables is a lagged value of the dependent variable (a value of the dependent variable from a previous period), the estimated parameters themselves will be consistent and need not be adjusted for the effects of serial correlation. If, however, one of the independent variables is a lagged value of the dependent variable (for example, if the T-bill return from the previous month were an independent variable in the Fisher effect regression), then serial correlation in the error term will cause all the parameter estimates from linear regression to be inconsistent; they will not be valid estimates of the true parameters.

Although positive serial correlation does not affect the consistency of the estimated regression coefficients, it does affect our ability to conduct valid statistical tests. First, the F-statistic to test for overall significance of the regression may be inflated because the mean squared error (MSE) will tend to underestimate the population error variance. Second, positive serial correlation typically causes the ordinary least squares (OLS) standard errors for the regression coefficients to underestimate the true standard errors. As a consequence, if positive serial correlation is present in the regression, standard linear regression analysis will typically lead us to compute artificially small standard errors for the regression coefficients. These small standard errors will cause the estimated t-statistics to be inflated, suggesting significance where perhaps there is none. The inflated t-statistics may, in turn, lead us to incorrectly reject null hypotheses about population values of the parameters of the regression model more often than we would if the standard errors were correctly estimated. This Type I error could lead to improper investment recommendations.

12
Q

Heteroskedasticity Consequences

A

Although heteroskedasticity does not affect the consistency of the regression parameter estimators, it can lead to mistakes in inference. When errors are heteroskedastic, the F-test for the overall significance of the regression is unreliable. Furthermore, t-tests for the significance of individual regression coefficients are unreliable because heteroskedasticity introduces bias into estimators of the standard error of regression coefficients. If a regression shows significant heteroskedasticity, the standard errors and test statistics computed by regression programs will be incorrect unless they are adjusted for heteroskedasticity.

13
Q

Heteroskedasticity Remedies

A

We can use two different methods to correct the effects of conditional heteroskedasticity in linear regression models. The first method, computing robust standard errors, corrects the standard errors of the linear regression model’s estimated coefficients to account for the conditional heteroskedasticity. The second method, generalized least squares, modifies the original equation in an attempt to eliminate the heteroskedasticity. The new, modified regression equation is then estimated under the assumption that heteroskedasticity is no longer a problem.
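
A minimal sketch of the first remedy (robust standard errors) using the heteroskedasticity-consistent covariance options in statsmodels; the simulated data and the HC1 choice are illustrative assumptions, not part of the original card.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 5, size=n)
e = rng.normal(scale=x)          # error variance rises with x: conditional heteroskedasticity
y = 0.5 + 1.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                     # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust (White-type) standard errors

print(ols.bse)
print(robust.bse)
```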

14
Q

Unconditional Heteroskedasticity

A

Heteroskedasticity of the error term that is not correlated with the values of the independent variable(s) in the regression.

15
Q

Conditional Heteroskedasticity

A

Heteroskedasticity in the error variance that is correlated with the values of the independent variable(s) in the regression. This is the more problematic type.

16
Q

Robust standard errors

A

Standard errors of the estimated parameters of a regression that correct for the presence of heteroskedasticity in the regression’s error term.

17
Q

Generalized least squares

A

A regression estimation technique that addresses heteroskedasticity of the error term.

18
Q

Positive Serial Correlation

A

Serial correlation in which a positive error for one observation increases the chance of a positive error for another observation, and a negative error for one observation increases the chance of a negative error for another observation.

19
Q

First-Order Serial Correlation

A

Correlation between adjacent observations in a time series.

20
Q

Testing for Serial Correlation

A

We can choose from a variety of tests for serial correlation in a regression model, but the most common is based on a statistic developed by Durbin and Watson (1951); in fact, many statistical software packages compute the Durbin–Watson statistic automatically.
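
A minimal sketch of computing the statistic with statsmodels on simulated data (the AR(1) coefficient of 0.6 is arbitrary). Values near 2 indicate no first-order serial correlation; values well below 2 suggest positive serial correlation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 150
x = rng.normal(size=n)

# Positively serially correlated errors
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 2.0 + 0.8 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # expected to fall well below 2 for these data
```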

21
Q

Testing for Heteroskedasticity

A

Breusch–Pagan test

22
Q

Breusch–Pagan test

A

Regress the squared residuals from the estimated regression equation on the independent variables in the regression. If no conditional heteroskedasticity exists, the independent variables will not explain much of the variation in the squared residuals. If conditional heteroskedasticity is present in the original regression, however, the independent variables will explain a significant portion of the variation in the squared residuals. The independent variables can explain the variation because each observation’s squared residual will be correlated with the independent variables if the independent variables affect the variance of the errors.
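
A minimal sketch of this procedure on simulated data. The n·R² statistic from the auxiliary regression is the standard Breusch–Pagan test statistic, approximately chi-square with k degrees of freedom (k = number of independent variables) under the null of no conditional heteroskedasticity.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x)   # error variance depends on x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

aux = sm.OLS(resid ** 2, X).fit()         # regress squared residuals on the independent variables
bp_stat = n * aux.rsquared
p_value = chi2.sf(bp_stat, df=1)          # k = 1 independent variable here
print(bp_stat, p_value)
```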

23
Q

Principles of Model Specification

A

The model should be grounded in cogent economic reasoning.

The functional form chosen for the variables in the regression should be appropriate given the nature of the variables.

The model should be parsimonious. In this context, “parsimonious” means accomplishing a lot with a little. We should expect each variable included in a regression to play an essential role.

The model should be examined for violations of regression assumptions before being accepted.

The model should be tested and be found useful out of sample before being accepted.

24
Q

Misspecified Functional Form

A

One or more important variables could be omitted from the regression.

One or more of the regression variables may need to be transformed (for example, by taking the natural logarithm of the variable) before estimating the regression.

The regression model pools data from different samples that should not be pooled.

25
Q

Time-Series Misspecification (Independent Variables Correlated with Errors)

A

including lagged dependent variables as independent variables in regressions with serially correlated errors;

including a function of a dependent variable as an independent variable, sometimes as a result of the incorrect dating of variables; and

independent variables that are measured with error.

26
Q

Other Types of Time-Series Misspecification

A

Relations among time series with trends (for example, the relation between consumption and GDP).

Relations among time series that may be random walks (time series for which the best predictor of next period’s value is this period’s value). Exchange rates are often random walks.

27
Q

Qualitative dependent variables

A

Dummy variables used as dependent variables rather than as independent variables.

28
Q

Logistic Regression

A

Logistic regression involves modeling a dependent variable that is the natural logarithm of a ratio of probabilities: the probability that the event of interest happens divided by the probability that it does not happen.
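
In symbols (standard notation, not part of the original card), with p the probability that the event of interest occurs:

$$
\ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + \varepsilon
$$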

29
Q

Discriminant Analysis

A

A multivariate classification technique used to discriminate between groups, such as companies that either will or will not become bankrupt during some time frame.

30
Q

Probit regression (probit model)

A

A qualitative-dependent-variable multiple regression model based on the normal distribution.

31
Q

Logistic regression (logit model)

A

A qualitative-dependent-variable multiple regression model based on the logistic probability distribution.

32
Q

Testing whether All Population Regression Coefficients Equal Zero (F-Test)

A

To correctly calculate the test statistic for the null hypothesis, we need four inputs: total number of observations, n; total number of regression coefficients to be estimated, k + 1, where k is the number of slope coefficients; sum of squared errors or residuals, Σ(Yi − Ŷi)², abbreviated SSE, also known as the residual sum of squares (unexplained variation); and regression sum of squares, Σ(Ŷi − Ȳ)², abbreviated RSS. This amount is the variation in Y from its mean that the regression equation explains (explained variation).
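
A worked example with hypothetical numbers: suppose n = 66, k = 5, RSS = 120, and SSE = 180. The mean regression sum of squares is 120/5 = 24, the mean squared error is 180/(66 − 5 − 1) = 180/60 = 3, and F = 24/3 = 8, with 5 and 60 degrees of freedom.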

33
Q

Steps to Predicting Dependent Variable

A
34
Q

F-Test

A
35
Q

R^2 vs. adj R^2

A