Regression Flashcards

1
Q

If the error term in a linear regression model is not normally distributed…
- The OLS estimator is biased
- Routinely calculated standard errors are incorrect
- We need to rely on asymptotic theory to perform valid tests
- We need to take the log of the dependent variable

A

If the error term in a linear regression model is not normally distributed…

  • We need to rely on asymptotic theory to perform valid tests
2
Q

In a linear regression model, if the slope coefficient of x has a t-statistic of 3.0
- We accept the hypothesis that x has an impact
- We accept the hypothesis that x is significant
- We reject the null hypothesis that x is insignificant
- We reject the null hypothesis that x has no impact

A

In a linear regression model, if the slope coefficient of x has a t-statistic of 3.0

  • We reject the null hypothesis that x has no impact

Never accept a hypothesis

3
Q

Which problem makes the OLS estimator biased?
- Simultaneity between x and y
- Heteroskedasticity
- A small sample
- All of these

A

Which problem makes the OLS estimator biased?
- Simultaneity between x and y

4
Q

Which statement is correct?
- R2 is the most important statistic of a regression model
- R2 tells us how well the model fits the data
- A larger R2 is always better
- If R2=0 we have a useless model

A
  • R2 tells us how well the model fits the data
  • If R2=0 we have a useless model

An R2 of 0 means that the regression line is flat.

5
Q

What increases the precision of the OLS estimator?
- Having more observations
- Having more variation in x
- Having less correlation between x and other regressors
- Having a smaller error variance

A

What increases the precision of the OLS estimator?
- Having more observations
- Having more variation in x
- Having less correlation between x and other regressors
- Having a smaller error variance

All of these are correct.

6
Q

Assume an estimated slope coefficient for x2 of 0.35, with a standard error of 0.15. Which statement is correct, assuming a 5% significance level?
- The most likely value for the true slope coefficient is 0.35
- The estimated coefficient differs significantly from 0
- The estimated coefficient does not differ significantly from 0
- x2 differs significantly from zero

A

Assume an estimated slope coefficient for x2 of 0.35, with a standard error of 0.15. Which statement is correct, assuming a 5% significance level?
- The estimated coefficient differs significantly from 0
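The arithmetic behind this answer can be checked directly (a small sketch; 1.96 is the large-sample two-sided critical value at the 5% level):

```python
beta_hat, se = 0.35, 0.15          # estimate and standard error from the card
t_stat = beta_hat / se             # about 2.33
significant = abs(t_stat) > 1.96   # two-sided 5% critical value, large sample
```

Since |2.33| > 1.96, the coefficient differs significantly from 0.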

7
Q

In the model explaining log house prices, we estimate a coefficient of 0.08 for the number of
bedrooms. What does this mean? Other things equal,
- one more bedroom increases the expected house price by 0.08%
- a house with one more bedroom is selling at an 8% higher price
- one more bedroom increases the expected house price by 8%
- one more bedroom increases the expected house price by 0.08 times the average price

A

In the model explaining log house prices, we estimate a coefficient of 0.08 for the number of
bedrooms. What does this mean? Other things equal,
- one more bedroom increases the expected house price by 8%
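The 8% reading is the usual small-coefficient approximation in a log-linear model; the exact effect of a one-unit change is exp(b) − 1. A quick sketch:

```python
import math

b = 0.08                              # coefficient on bedrooms in the log-price model
approx_pct = 100 * b                  # rule-of-thumb reading: about 8%
exact_pct = 100 * (math.exp(b) - 1)   # exact effect: about 8.33%
```

For small coefficients the approximation is close; it deteriorates as the coefficient grows.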

8
Q

Which assumption is NOT essential for routinely calculated standard errors to be correct?
- The error terms are homoscedastic
- The error terms are serially uncorrelated
- The error terms are normally distributed
- All three assumptions are essential

A

Which assumption is NOT essential for routinely calculated standard errors to be correct?
- The error terms are normally distributed

9
Q

We estimate y = 0 + 0.5x + 0.1d − 0.3(x × d). Which interpretation is correct?
- For firms with d=1, the impact of x on y is smaller than for firms with d=0
- For firms with d=1, the impact of x on y is negative
- Firms with d=1 have higher expected values of y
- For firms with d=1, the impact of x on y is larger than for firms with d=0

A

We estimate y = 0 + 0.5x + 0.1d − 0.3(x × d). Which interpretation is correct?
- For firms with d=1, the impact of x on y is smaller than for firms with d=0
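The slopes implied by the estimated equation can be read off directly (a sketch using the card's coefficients):

```python
b_x, b_d, b_xd = 0.5, 0.1, -0.3   # coefficients from y = 0 + 0.5x + 0.1d - 0.3(x*d)
slope_d0 = b_x                    # impact of x for firms with d=0: 0.5
slope_d1 = b_x + b_xd             # impact of x for firms with d=1: 0.2 (smaller, still positive)
```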

10
Q

Assume we wish to estimate the impact of x on y, separately for firms with d=1 and d=0. How do we do this in one regression?
- Regress y x d
- Regress y x x*d
- Regress y x d x*d
- Regress y d x*d

A

Assume we wish to estimate the impact of x on y, separately for firms with d=1 and d=0. How do we do this in one regression?

  • Regress y x d x*d
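A minimal sketch of this specification ("regress y on x, d, and x*d") using simulated data and plain numpy least squares — the data-generating numbers below are illustrative, not from the card:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n).astype(float)
# Simulated truth: slope 0.5 for d=0 firms, 0.2 for d=1 firms
y = 1.0 + 0.5 * x + 0.1 * d - 0.3 * x * d + rng.normal(scale=0.1, size=n)

# One regression with the interaction term recovers both group slopes
X = np.column_stack([np.ones(n), x, d, x * d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_d0 = beta[1]            # impact of x when d=0
slope_d1 = beta[1] + beta[3]  # impact of x when d=1
```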
11
Q

When estimating a panel model with firm fixed effects…
- We obtain more precise estimates of the slope coefficients
- We cannot include firm-invariant explanatory variables
- We cannot include time-invariant explanatory variables
- We cannot use standard errors clustered by firm

A

When estimating a panel model with firm fixed effects…

We cannot include time-invariant explanatory variables

12
Q

What is (are) the main reason(s) to include firm fixed effects in a panel regression?
- Improving precision of the estimation of the slope coefficients
- Obtaining appropriate standard errors for the slope coefficients
- Controlling for time-invariant firm-specific factors
- Reducing bias in the estimation of the slope coefficients

A

What is (are) the main reason(s) to include firm fixed effects in a panel regression?
- Controlling for time-invariant firm-specific factors
- Reducing bias in the estimation of the slope coefficients

13
Q

Consider a linear probability model explaining failing the MSc (y=1). The coefficient for female is -0.03. What does this mean?
- Female students are 0.03% less likely to fail
- Male students are 3% more likely to pass
- Female students are 3% more likely to pass

A

Consider a linear probability model explaining failing the MSc (y=1). The coefficient for female is -0.03. What does this mean?
- Female students are 3% more likely to pass

14
Q

Consider a logit model explaining failing the MSc (y=1). The coefficient for female is -0.03. What does this mean?
- Female students are more likely to pass
- Male students are more likely to pass
- Female students are 3% more likely to pass
- Don’t know. Need to calculate marginal effects

A

Consider a logit model explaining failing the MSc (y=1). The coefficient for female is -0.03. What does this mean?
- Female students are more likely to pass

15
Q

Consider a probit model explaining failing the MSc (y=1). The average marginal effect for female is -0.03. What does this mean?
- Females are 3% less likely to fail
- Males are 3% less likely to fail
- Females are 0.03% less likely to fail
- Don’t know. Depends upon the coefficient

A

Consider a probit model explaining failing the MSc (y=1). The average marginal effect for female is -0.03. What does this mean?
- Females are 3% less likely to fail

16
Q

Which of the following is NOT a symptom of multicollinearity?
- Low F-statistic
- Large but insignificant beta coefficients
- High variance inflation factors
- Low t-statistics for the independent variables

A

Which of the following is NOT a symptom of multicollinearity?
- Low F-statistic

17
Q

Assume we estimate a simple regression model explaining Y from X, i.e. Y = A + BX + u.

What will more variation in X lead to?

A

Other things equal, more variation in X will improve the precision of the OLS estimator for B

18
Q

Assume we estimate a linear model explaining the book-to-market ratio of a firm using a panel of
firms. We estimate the model with pooled OLS. As an alternative, we estimate the model including
firm fixed effects. The estimated slopes for the two methods appear to be significantly different.
What does this suggest?

A

The inclusion of firm fixed effects reduces omitted variable bias.

19
Q

When estimating a standard regression model, we need to make a number of assumptions to be able to argue that the OLS estimator is unbiased or consistent. Which assumption is necessary for the OLS estimator to be unbiased?

A

The disturbance terms in the model are uncorrelated with the regressors.

For OLS to be unbiased, the error terms must not be correlated with the independent variables. This is because if errors are correlated, it indicates that there is some systematic influence on the errors from the variables we are using to predict the outcome, which should not happen. For accurate predictions, we want our error terms to be random, not influenced by the predictors; otherwise, our estimates of the relationship between the predictors and the outcome will be off.

Other explanation:
By ensuring that the error term is uncorrelated with the regressors, we are essentially ensuring that the variation in the independent variables that we are using to predict the dependent variable is not systematically related to factors that we have not included in the model (captured in the error term). This is what allows us to argue that the OLS estimator is unbiased. If this assumption does not hold, the estimated coefficients may systematically differ from the true population parameters, leading to biased estimations.

20
Q

It is possible that the disturbance term in a regression model is not normally distributed. What is
NOT a potential cause of nonnormality?

- Outliers
- The sample is too small
- Skewness of the distribution
- Fat tails in the distribution (excess kurtosis)

A

The sample is too small

21
Q

Consider a linear regression model where Y is explained from X. It is suspected that the true
relationship between Y and X could be nonlinear. What is NOT appropriate to test for this?
- Perform a RESET test
- Check the variance inflation factors
- Add Ln X and test its significance
- Add X^2 and test its significance

A

Check the variance inflation factors

  • Perform a RESET test: The Regression Equation Specification Error Test (RESET) is specifically designed to test for misspecification in the model, including nonlinearity. It is an appropriate test for this purpose.
  • Checking the variance inflation factors: Variance Inflation Factors (VIFs) are used to check for multicollinearity among independent variables, not for nonlinearity. VIFs measure how much the variance of an estimated regression coefficient increases if your predictors are correlated. If you suspect nonlinearity, VIFs are not the correct tool to use.
  • Add Ln X and test its significance: Adding the natural logarithm of X (Ln X) and testing its significance is a way to see if a logarithmic transformation of the variable improves the model. This can address nonlinearity.
  • Add X^2 and test its significance: Adding the square of X (X^2) to the model and testing its significance is a common method for testing for a quadratic relationship, which is a form of nonlinearity.
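The "add X^2 and test its significance" route can be sketched with simulated data (numpy only; the quadratic data-generating process below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.4 * x**2 + rng.normal(size=n)  # true relation is quadratic

# Fit y on a constant, x, and x^2, then t-test the x^2 coefficient
X = np.column_stack([np.ones(n), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_x2 = beta[2] / se[2]  # a large |t| flags the neglected nonlinearity
```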
22
Q

In a linear regression model, the OLS estimator is unbiased only under a restrictive set of assumptions.
What is a potential cause of the OLS estimator being biased?

A

Bias in an OLS estimator can arise if a crucial variable is left out of the model.

In linear regression, the OLS (Ordinary Least Squares) estimator is used to find the line of best fit by minimizing the sum of squared residuals. For the OLS estimator to be unbiased, one key assumption is that all relevant explanatory variables are included in the regression. If an important variable that affects the dependent variable is omitted and is correlated with any of the included independent variables, it can cause omitted variable bias. This bias occurs because the omitted variable’s influence on the dependent variable is wrongly attributed to the included variables, leading to incorrect estimates of the coefficients and thus a biased estimator.

23
Q

Consider a linear model that includes an intercept term. Which of the following statement(s) about
the OLS residuals is (are) correct?
1. The average residual is by construction equal to 0
2. The residuals are uncorrelated with the fitted values of Y

A

Both statement 1 and 2 are correct
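Both properties follow mechanically from including an intercept, and are easy to verify numerically (a sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.7 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])      # model includes an intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

mean_resid = resid.mean()                 # zero by construction
corr = np.corrcoef(resid, fitted)[0, 1]   # zero: residuals orthogonal to fitted values
```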

24
Q

When estimating a standard regression model, we need to make several assumptions. Which
assumption is not necessary for the routinely calculated standard errors to be correct?
The disturbance terms in the model are normally distributed
The disturbance terms in the model are serially uncorrelated
The disturbance terms in the model are homoscedastic
The disturbance terms in the model have zero mean

A

When estimating a standard regression model, we need to make several assumptions. Which
assumption is not necessary for the routinely calculated standard errors to be correct?

The disturbance terms in the model are normally distributed

25
Q

In many applications in finance, the disturbance terms in a regression model are likely to suffer from heteroskedasticity. Which statement is NOT correct?

- Under heteroskedasticity, standard t-tests will be misleading
- Under heteroskedasticity, the OLS estimator is biased
- The use of heteroskedasticity-consistent standard errors adjusts for arbitrary forms of heteroskedasticity
- Transforming relevant variables into logs may reduce or eliminate the problem of heteroskedasticity

A

Under heteroskedasticity, the OLS estimator is biased

Heteroskedasticity does not cause the OLS estimators to be biased; it remains unbiased. However, heteroskedasticity does affect the efficiency of the estimators, meaning that they are no longer the Best Linear Unbiased Estimators (BLUE), and it affects the standard errors, leading to incorrect conclusions about the significance of the coefficients.
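A sketch of White's heteroskedasticity-consistent (HC0) standard errors next to the conventional ones, with simulated heteroskedastic errors (numpy only; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x))   # error variance grows with |x|
y = 1.0 + 0.3 * x + u

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS remains unbiased here
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Conventional standard errors (assume homoskedasticity)
s2 = resid @ resid / (n - X.shape[1])
se_conv = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) standard errors: robust to arbitrary heteroskedasticity
meat = X.T @ (X * (resid**2)[:, None])
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

Because the error variance here rises with |x|, the conventional slope standard error understates the true uncertainty, and the robust one comes out larger.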

26
Q

Which factors reduce the precision of the OLS estimator?

A

- Large error variance s^2
  • A large influence of other factors not included in the model

- Small number of observations
  • Less information available to estimate the parameters

- Little spread in the independent variables
  • Without variation in x, one cannot explain variation in y
  • Too much variation due to outliers is also harmful, however

27
Q

What is the purpose of the F-test in regression analysis?

A

The F-test in regression analysis is used to determine whether the model as a whole is statistically significant. It tests the null hypothesis that all regression coefficients are equal to zero, which implies that the predictor variables, as a group, do not have a statistically significant relationship with the dependent variable. A significant F-test indicates that the observed relationships are unlikely to be due to random chance.

28
Q

How do you interpret the results of the F-test in regression analysis?

A

In regression analysis, the F-test yields an F-statistic and a corresponding p-value. The F-statistic is the ratio of the mean square explained by the model to the residual mean square. To interpret the results:

- A high F-statistic (much larger than 1) suggests that the variation explained by the model is significantly greater than the unexplained variation.
- The p-value determines statistical significance: a small p-value (typically ≤ 0.05) indicates that the model as a whole is statistically significant, meaning the relationship between the dependent and independent variables is unlikely to be due to random chance.
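The overall-significance F-statistic can be computed by hand (a sketch with simulated data; k is the number of slope coefficients):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 300, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.4 * x1 + 0.2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

ess = ((fitted - y.mean()) ** 2).sum()   # explained sum of squares
rss = (resid ** 2).sum()                 # residual sum of squares
F = (ess / k) / (rss / (n - k - 1))      # H0: all slope coefficients are zero
```

With both slopes nonzero in the simulated truth, F comes out far above 1 and the null of a jointly useless model is rejected.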

29
Q

What is a weakness of the Fama-MacBeth approach? (What potential problem is not
accounted for?)

A

It does not allow for serial correlation (autocorrelation) in the error terms within firms, across time periods.

30
Q

In a linear regression model, the OLS estimator is unbiased only under a restrictive set of assumptions. What is a potential cause of the OLS estimator being biased?

A

An important explanatory variable is omitted (not included when it should be)

31
Q

Suppose you would like to investigate whether firm performance depends upon the size of the board of the company, which is assumed to be exogenous. You estimate a cross-sectional regression explaining firm performance from board size (with coefficient b2) and a range of control variables. How do you formulate your null hypothesis?

A

b2 = 0
