Reading 12 - Multiple Regression and Issues in Regression Analysis Flashcards

1
Q

Describe what Multiple Regression is.

A

Regression analysis with more than one independent variable. It is used to quantify the influence of two or more independent variables on a dependent variable.

2
Q

What does the Residual represent in a regression equation?

A

The difference between the observed value Yi and the value predicted by the regression

3
Q

What is the intercept term in a multiple regression equation?

A

The value of the dependent variable when the independent variables are all equal to 0

4
Q

What is the formula to calculate the t-statistic in a multiple regression equation?

A

t = (estimated coefficient − hypothesized value) / coefficient standard error. It is used to test the significance of the individual coefficients in a multiple regression.

5
Q

How many degrees of freedom does the t-statistic have in a multiple regression?

A

n-k-1

k = the # of independent variables (slope coefficients) in the regression

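As a sketch of the two cards above, the t-statistic and its two-tailed p-value can be computed with scipy. The coefficient, standard error, and sample sizes below are hypothetical, not from the curriculum:

```python
from scipy import stats

# Hypothetical regression output (illustrative numbers only)
b_hat = 0.72       # estimated slope coefficient
b_null = 0.0       # hypothesized value under H0
se_b = 0.31        # coefficient standard error
n, k = 60, 5       # observations and independent variables

t_stat = (b_hat - b_null) / se_b           # t = (estimate - hypothesized) / std error
df = n - k - 1                             # degrees of freedom = 54
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed p-value
```

At a 5% significance level, the null is rejected when p_value < 0.05.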
6
Q

What are we doing when we test for statistical significance?

A

It means testing the null hypothesis that the coefficient is zero vs the alternative that it is not

7
Q

What is the p-value?

A

The smallest level of significance for which the null hypothesis can be rejected.

8
Q

What determines whether we can or cannot reject the hypothesis when comparing p-values to the significance level?

A
  • if the p-value is less than the significance level, the null hypothesis can be rejected
  • If the p-value is greater than the significance level, the null hypothesis cannot be rejected
9
Q

Using this data, test the null hypothesis that PR is equal to 0.20 vs the alternative that it is not equal to 0.20, using a 5% significance level.

The t-value is 2.02 from the back of the book.

A

Compute t = (estimated PR coefficient − 0.20) / (standard error of the PR coefficient), with n − k − 1 degrees of freedom. Reject the null if |t| > 2.02; otherwise, fail to reject.
10
Q

What does a confidence interval look like?

A

Estimated coefficient ± (critical t-value × coefficient standard error), i.e. b̂j ± (tc × sb̂j)
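A confidence interval for a coefficient is b̂j ± (tc × sb̂j). A sketch with hypothetical numbers:

```python
from scipy import stats

b_hat, se_b = 0.72, 0.31   # hypothetical coefficient and standard error
n, k = 60, 5
df = n - k - 1

t_crit = stats.t.ppf(0.975, df)   # two-tailed 95% critical value
lower = b_hat - t_crit * se_b
upper = b_hat + t_crit * se_b     # interval: b_hat +/- t_crit * se_b
```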
11
Q

What does a 10% significance level mean?

A

Is the same thing as a 90% confidence level

12
Q

What are the assumptions underlying the multiple regression model, particularly regarding the error term?

A
  • a linear relationship exists between the dependent and independent variables
  • the independent variables are not random, and there is no exact linear relation between any two or more independent variables
  • the expected value of the error term, conditional on the independent variable, is 0
  • the variance of error terms is constant for all observations
  • the error term is normally distributed
13
Q

What is the F-statistic?

A

Is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable

14
Q

Is an F-statistic a one-tailed or two-tailed test?

A

It is always a one-tailed test

15
Q

How is the F-statistic calculated?

A

F = MSR / MSE = (RSS / k) / (SSE / (n − k − 1))

where RSS = regression sum of squares, SSE = sum of squared errors, k = # of independent variables, and n = # of observations. The test has k and n − k − 1 degrees of freedom.
16
Q

What is the decision rule for an F-test?

A

Reject the null (all slope coefficients equal zero) if the F-statistic is greater than the one-tailed critical F-value with k and n − k − 1 degrees of freedom; otherwise, fail to reject.
17
Q

An analyst runs a regression of monthly value-stock returns on five independent variables over 60 months. The total sum of squares is 460, and the sum of squared errors is 170.

Test the null hypothesis at the 5% significance level that all five of the independent variables are equal to zero.

A

RSS = SST − SSE = 460 − 170 = 290

F = (290 / 5) / (170 / 54) = 58 / 3.15 ≈ 18.4

The critical F-value at the 5% level with 5 and 54 degrees of freedom is approximately 2.4. Since 18.4 > 2.4, reject the null; at least one of the independent variables explains a significant portion of the variation in the dependent variable.
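The arithmetic in the card above can be checked directly (SST = 460, SSE = 170, n = 60, k = 5); here the critical value comes from scipy rather than an F-table:

```python
from scipy import stats

sst, sse = 460.0, 170.0
n, k = 60, 5

rss = sst - sse                  # regression sum of squares = 290
msr = rss / k                    # mean square regression = 58
mse = sse / (n - k - 1)          # mean square error, about 3.15
f_stat = msr / mse               # about 18.42

f_crit = stats.f.ppf(0.95, k, n - k - 1)  # one-tailed 5% critical value
reject = f_stat > f_crit         # True: reject H0 that all slopes are zero
```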
18
Q

What does an R² of 0.63 indicate about a model?

A

that the model, as a whole, explains 63% of the variation in the dependent variable

19
Q

How is the R² calculated?

A

R² = RSS / SST = (SST − SSE) / SST, i.e. the explained variation divided by the total variation in the dependent variable
20
Q

Why is the R² not considered a reliable measure of the explanatory power of the regression model?

A

Because the R² almost always increases as variables are added to the model, even if the marginal contribution of the new variables is not statistically significant

21
Q

To overcome the deficits of the R², statisticians have created an adjusted R². What is its formula?

A

R²adjusted = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)
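The adjusted R² formula, 1 − [(n − 1) / (n − k − 1)] × (1 − R²), as a small helper; the numbers reuse the R² = 0.63, n = 60, k = 5 figures from nearby cards:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared = 1 - [(n - 1) / (n - k - 1)] * (1 - R-squared)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

# Penalizes added variables: adjusted R-squared is below the raw R-squared
print(round(adjusted_r2(0.63, 60, 5), 3))  # 0.596
```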
22
Q

What are dummy variables?

A

Occasions when an independent variable is binary: it is either “on” or “off”

Dummy variables are assigned a value of 0 or 1 only
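A dummy variable can be built in plain Python; the month labels below are a made-up example of a January-effect dummy:

```python
# Hypothetical monthly observations
months = ["Jan", "Feb", "Mar", "Jan", "Dec"]

# 1 when the dummy condition is "on" (January), 0 otherwise
jan_dummy = [1 if m == "Jan" else 0 for m in months]  # [1, 0, 0, 1, 0]
```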

23
Q

What are the three primary assumption violations that you encounter doing multiple regressions?

A
  1. Heteroskedasticity
  2. Serial correlation (i.e., autocorrelation)
  3. Multicollinearity
24
Q

What is heteroskedasticity?

A

occurs when the variance of the residuals is not the same across all observations in the sample

This happens when there are subsamples that are more spread out than the rest of the sample.

25
Q

What is unconditional heteroskedasticity?

A

occurs when the heteroskedasticity is not related to the level of the independent variables.

***Does not usually cause major problems with the regression

26
Q

What is conditional heteroskedasticity ?

A

The heteroskedasticity that is related to the level of the independent variables.

i.e., it exists if the variance of the residual term increases as the value of the independent variable increases

***Does create significant problems for statistical inference

27
Q

What are the 4 effects of heteroskedasticity on regression analysis?

A
  1. The standard errors are usually unreliable estimates
  2. The coefficient estimates (the bj) aren’t affected
  3. If the standard errors are too small while the coefficient estimates themselves are unaffected, the t-statistics will be too large and the null hypothesis of no statistical significance is rejected too often
  4. The F-test is also unreliable
28
Q

What are the 2 methods for detecting heteroskedasticity?

A
  1. Examining scatter plots of the residual
  2. Using the Breusch-Pagan chi-square (χ²) test
29
Q

Explain how to test for heteroskedasticity using the Breusch-Pagan chi-square test.

A
  • It calls for regressing the squared residuals on the independent variables
  • The test statistic is chi-square distributed

BP test statistic = n × R²resid, with k degrees of freedom

k = # of independent variables

***This is a one-tailed test b/c heteroskedasticity is only a problem if the R² and the BP test statistic are too large.
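The BP steps above, sketched with numpy and scipy on simulated data (statsmodels also ships a ready-made `het_breuschpagan`): regress the squared residuals on the independent variables, take that auxiliary regression's R², and form BP = n × R².

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))                       # independent variables
resid = rng.normal(size=n) * (1 + 0.5 * X[:, 0])  # simulated heteroskedastic residuals

# Auxiliary regression: squared residuals on X (with an intercept)
y = resid ** 2
Xc = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
fitted = Xc @ beta
r2_resid = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

bp = n * r2_resid                  # BP test statistic, chi-square with k df
p_value = stats.chi2.sf(bp, df=k)  # one-tailed: reject if p < significance level
```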

30
Q

How do you correct for heteroskedasticity?

A
  • **Calculate robust standard errors** (aka White-corrected standard errors)
  • then recompute each test statistic as coefficient / robust standard error

If the test statistic is less than the critical t-value, the null cannot be rejected

31
Q

What is serial correlation (aka autocorrelation) ?

A

The situation in which the residual terms are correlated to one another

Positive serial correlation : exists when a positive regression error in one time period increases the probability of observing a positive regression error in the next time period

Negative serial correlation : occurs when a positive error in one period increases the probability of observing a negative error in the next period

32
Q

What is the effect of Serial Correlation on Regression Analysis?

A

Positive serial correlation typically results in coefficient standard errors that are too small. This causes the t-statistics to be too large, which causes too many Type I errors (rejecting a true null)

** The F-test will also be unreliable b/c the MSE will be underestimated, leading to too many Type I errors

33
Q

What are the 2 methods for detecting serial correlation?

A
  1. Scatter plot of residuals
  2. Durbin-Watson statistic
34
Q

How is the Durbin-Watson statistic (DW) calculated for use in detecting the presence of serial correlation?

A

DW = Σ(ε̂t − ε̂t−1)² / Σε̂t² (summing over the residuals), which is approximately equal to 2(1 − r), where r is the correlation between consecutive residuals. DW ≈ 2 suggests no serial correlation.
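The DW statistic, Σ(ε̂t − ε̂t−1)² / Σε̂t², sketched with numpy on a short hypothetical residual series:

```python
import numpy as np

# Hypothetical residual series (made-up values)
resid = np.array([0.5, 0.3, -0.2, -0.4, 0.1, 0.6, -0.3, -0.1])

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
# DW ~ 2(1 - r): about 2 means no serial correlation,
# toward 0 positive correlation, toward 4 negative correlation
```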
35
Q

What are the two ways to correct for serial correlation?

A
  1. Adjust the coefficient standard errors, using the Hansen method.
    • The Hansen method also corrects for conditional heteroskedasticity.
  2. Improve the specification of the model
36
Q

What is multicollinearity ?

A

The condition when two or more of the independent variables in a multiple regression are highly correlated with each other

37
Q

What is multicollinearity’s effect on regression analysis?

A
  • Slope coefficients tend to be unreliable. Additionally, standard errors of the slope coefficients are artificially inflated
  • There is a greater probability that we will incorrectly conclude that a variable is not statistically significant (Type II error)
38
Q

What is the general rule to determine if multicollinearity is a potential problem?

A

If the absolute value of the sample correlation between any two independent variables in the regression is greater than 0.70
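The 0.70 screen above, sketched with numpy on hypothetical data where one predictor is nearly a rescaled copy of the other:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2 * x1 + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2])  # ~ 2 * x1 plus noise

r = np.corrcoef(x1, x2)[0, 1]  # sample correlation between the two predictors
flag = abs(r) > 0.70           # True -> potential multicollinearity
```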

39
Q

Based on the below data, is multicollinearity a problem in this regression ?

A

This is a clear indication of multicollinearity: none of the independent variables is statistically significant, since their p-values are all greater than 10%

40
Q

What is the most common way to detect multicollinearity?

A

Situations where the t-tests indicate that none of the individual coefficients is significantly different from zero, while the F-test is statistically significant and the R² is high

41
Q

What is the most common way to correct for multicollinearity?

A

To omit one or more of the correlated independent variables

42
Q

What are the broad categories in which the regression model can be specified incorrectly?

A
  • The functional form can be misspecified
  • Explanatory variables are correlated with the error term in time-series models
  • Other time-series misspecifications that result in nonstationarity
43
Q

What does **model misspecification** mean in regard to multiple regressions?

A

Ways in which the regression model can be specified incorrectly

44
Q

What can go wrong if there are model misspecifications?

A
  1. Regression coefficients are often biased and/or inconsistent
  2. We can have no confidence in hypothesis tests or in the predictions of the model
45
Q

What are **probit** and **logit** models?

A
  • Probit - based on the normal distribution
  • Logit - based on the logistic distribution
46
Q

What does it mean to transform a variable?

A

Certain variables are nonlinear; to make the relationship linear, you take the log of them
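A sketch of a log transform with numpy: a made-up exponential relationship y = 3e^(2x) becomes linear in x after taking logs.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * np.exp(2.0 * x)    # nonlinear in x

log_y = np.log(y)            # log(y) = log(3) + 2x, linear in x
slope = log_y[1] - log_y[0]  # 2 per unit step in x
```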