Regression Flashcards
If the error term in a linear regression model is not normally distributed…
- The OLS estimator is biased
- Routinely calculated standard errors are incorrect
- We need to rely on asymptotic theory to perform valid tests
- We need to take the log of the dependent variable
If the error term in a linear regression model is not normally distributed…
- We need to rely on asymptotic theory to perform valid tests
In a linear regression model, if the slope coefficient of x has a t-statistic of 3.0
- We accept the hypothesis that x has an impact
- We accept the hypothesis that x is significant
- We reject the null hypothesis that x is insignificant
- We reject the null hypothesis that x has no impact
In a linear regression model, if the slope coefficient of x has a t-statistic of 3.0
- We reject the null hypothesis that x has no impact
Note: never accept a hypothesis; we either reject the null or fail to reject it.
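As a numeric sketch of the rejection rule (assuming a two-sided test at the 5% level with a large sample, so the critical value is roughly 1.96):

```python
# Decision rule for a two-sided t-test at the 5% level (large-sample
# critical value ~1.96 from the standard normal).
t_stat = 3.0
critical_value = 1.96

reject_null = abs(t_stat) > critical_value
print(reject_null)  # True: reject H0 that x has no impact (never "accept")
```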
Which problem makes the OLS estimator biased?
- Simultaneity between x and y
- Heteroskedasticity
- A small sample
- All of these
Which problem makes the OLS estimator biased?
- Simultaneity between x and y
Which statement is correct?
- R2 is the most important statistic of a regression model
- R2 tells us how well the model fits the data
- A larger R2 is always better
- If R2=0 we have a useless model
- R2 tells us how well the model fits the data
- If R2=0 we have a useless model
An R2 of 0 means that the regression line is flat.
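A minimal illustration with made-up numbers: a flat regression line predicts the mean of y everywhere, so the explained sum of squares is zero and R2 is exactly 0.

```python
import numpy as np

# Hypothetical data where x has no explanatory power for y.
y = np.array([1.0, 3.0, 2.0, 4.0, 2.5])

# A flat regression line predicts the mean of y for every observation.
y_hat = np.full_like(y, y.mean())

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)  # 0.0: a flat line explains none of the variation in y
```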
What increases the precision of the OLS estimator?
- Having more observations
- Having more variation in x
- Having less correlation between x and other regressors
- Having a smaller error variance
What increases the precision of the OLS estimator?
- Having more observations
- Having more variation in x
- Having less correlation between x and other regressors
- Having a smaller error variance
Answer: All correct
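All four factors appear in the variance formula for the slope in a simple regression, Var(b) = s^2 / sum((x - mean(x))^2); the multiple-regression version adds the correlation-with-other-regressors term. A sketch with illustrative numbers:

```python
import numpy as np

# Simple-regression variance of the OLS slope:
# Var(b) = s2 / sum((x - mean(x))^2).  Illustrative numbers only.
def slope_variance(x, s2):
    x = np.asarray(x, dtype=float)
    return s2 / np.sum((x - x.mean()) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
base = slope_variance(x, s2=1.0)

more_obs    = slope_variance(np.tile(x, 2), s2=1.0)  # more observations
more_spread = slope_variance(2 * x, s2=1.0)          # more variation in x
smaller_err = slope_variance(x, s2=0.5)              # smaller error variance

print(base, more_obs, more_spread, smaller_err)  # each alternative is smaller
```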
Assume an estimated slope coefficient for x2 of 0.35, with a standard error of 0.15. Which statement is correct, assuming a 5% significance level?
- The most likely value for the true slope coefficient is 0.35
- The estimated coefficient differs significantly from 0
- The estimated coefficient does not differ significantly from 0
- x2 differs significantly from zero
Assume an estimated slope coefficient for x2 of 0.35, with a standard error of 0.15. Which statement is correct, assuming a 5% significance level?
- The estimated coefficient differs significantly from 0
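The arithmetic behind the answer:

```python
# t = estimate / standard error for H0: the slope is zero.
t_stat = 0.35 / 0.15
print(round(t_stat, 2))  # 2.33, above the 5% critical value of ~1.96
```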
In the model explaining log house prices, we estimate a coefficient of 0.08 for the number of
bedrooms. What does this mean? Other things equal,
- one more bedroom increases the expected house price by 0.08%
- a house with one more bedroom is selling at an 8% higher price
- one more bedroom increases the expected house price by 8%
- one more bedroom increases the expected house price by 0.08 times the average price
In the model explaining log house prices, we estimate a coefficient of 0.08 for the number of
bedrooms. What does this mean? Other things equal,
- one more bedroom increases the expected house price by 8%
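The 8% reading is the usual small-coefficient approximation in a log-linear model; the exact ceteris-paribus effect is exp(0.08) − 1:

```python
import math

# Log-linear model: ln(price) = ... + 0.08 * bedrooms + u.
coef = 0.08
approx_pct = 100 * coef                  # 8% rule-of-thumb reading
exact_pct = 100 * (math.exp(coef) - 1)   # exact ceteris-paribus effect
print(round(exact_pct, 2))  # 8.33: close to the 8% approximation
```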
Which assumption is NOT essential for routinely calculated standard errors to be correct?
- The error terms are homoscedastic
- The error terms are serially uncorrelated
- The error terms are normally distributed
- All three assumptions are essential
Which assumption is NOT essential for routinely calculated standard errors to be correct?
- The error terms are normally distributed
We estimate y = 0 + 0.5x + 0.1d − 0.3 x*d. Which interpretation is correct?
- For firms with d=1, the impact of x on y is smaller than for firms with d=0
- For firms with d=1, the impact of x on y is negative
- Firms with d=1 have higher expected values of y
- For firms with d=1, the impact of x on y is larger than for firms with d=0
We estimate y = 0 + 0.5x + 0.1d − 0.3 x*d. Which interpretation is correct?
- For firms with d=1, the impact of x on y is smaller than for firms with d=0
Assume we wish to estimate the impact of x on y, separately for firms with d=1 and d=0. How do we do this in one regression?
- Regress y x d
- Regress y x x*d
- Regress y x d x*d
- Regress y d x*d
Assume we wish to estimate the impact of x on y, separately for firms with d=1 and d=0. How do we do this in one regression?
- Regress y x d x*d
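A sketch of this regression in Python (numpy only, simulated data; variable names are illustrative). The coefficient on x is the d=0 slope, and adding the interaction coefficient gives the d=1 slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
d = (rng.random(n) < 0.5).astype(float)

# Simulated truth: slope 0.5 for d=0 firms, slope 0.5 - 0.3 = 0.2 for d=1.
y = 0.5 * x + 0.1 * d - 0.3 * x * d + rng.normal(scale=0.1, size=n)

# "Regress y x d x*d": constant, x, d and the interaction x*d.
X = np.column_stack([np.ones(n), x, d, x * d])
b = np.linalg.lstsq(X, y, rcond=None)[0]

slope_d0 = b[1]         # impact of x when d = 0
slope_d1 = b[1] + b[3]  # impact of x when d = 1
```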
When estimating a panel model with firm fixed effects…
- We obtain more precise estimates of the slope coefficients
- We cannot include firm-invariant explanatory variables
- We cannot include time-invariant explanatory variables
- We cannot use standard errors clustered by firm
When estimating a panel model with firm fixed effects…
- We cannot include time-invariant explanatory variables
What is (are) the main reason(s) to include firm fixed effects in a panel regression?
- Improving precision of the estimation of the slope coefficients
- Obtaining appropriate standard errors for the slope coefficients
- Controlling for time-invariant firm-specific factors
- Reducing bias in the estimation of the slope coefficients
What is (are) the main reason(s) to include firm fixed effects in a panel regression?
- Controlling for time-invariant firm-specific factors
- Reducing bias in the estimation of the slope coefficients
Consider a linear probability model, explaining failing (y=1) the MSc. The coefficient for female is
-0.03. What does this mean?
- Female students are 0.03% less likely to fail
- Male students are 3% more likely to pass
- Female students are 3% more likely to pass
Consider a linear probability model, explaining failing (y=1) the MSc. The coefficient for female is
-0.03. What does this mean?
- Female students are 3% more likely to pass
Consider a logit model, explaining failing (y=1) the MSc. The coefficient for female is -0.03. What
does this mean?
- Female students are more likely to pass
- Male students are more likely to pass
- Female students are 3% more likely to pass
- Don’t know. Need to calculate marginal effects
Consider a logit model, explaining failing (y=1) the MSc. The coefficient for female is -0.03. What
does this mean?
- Female students are more likely to pass
Consider a probit model, explaining failing (y=1) the MSc. The average marginal effect for female is
-0.03. What does this mean?
- Females are 3% less likely to fail
- Males are 3% less likely to fail
- Females are 0.03% less likely to fail
- Don’t know. Depends upon the coefficient
Consider a probit model, explaining failing (y=1) the MSc. The average marginal effect for female is
-0.03. What does this mean?
- Females are 3% less likely to fail
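For a binary regressor such as female, the probit average marginal effect is the average discrete change in the predicted probability. A sketch with hypothetical coefficients, chosen so the effect comes out near −0.03 (here female is the only regressor, so no averaging over other covariates is needed):

```python
import math

# Probit: P(fail = 1) = Phi(b0 + b1 * female), Phi the standard normal CDF.
# AME for the binary regressor female = Phi(b0 + b1) - Phi(b0).
# b0 and b1 are hypothetical, for illustration only.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b1 = -0.5, -0.08
ame = Phi(b0 + b1) - Phi(b0)
print(round(ame, 3))  # -0.028: females about 2.8 pp less likely to fail
```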
Which of the following is NOT a symptom of multicollinearity?
- Low F-statistic
- Large but insignificant beta coefficients
- High variance inflation factors
- Low t-statistics for the independent variables
Which of the following is NOT a symptom of multicollinearity?
- Low F-statistic
Assume we estimate a simple regression model explaining Y from X, i.e. Y = A + BX + u.
What will more variation in X lead to?
Other things equal, more variation in X will improve the precision of the OLS estimator for B
Assume we estimate a linear model explaining the book-to-market ratio of a firm using a panel of
firms. We estimate the model with pooled OLS. As an alternative, we estimate the model including
firm fixed effects. The estimated slopes for the two methods appear to be significantly different.
What does this suggest?
The inclusion of firm fixed effects reduces omitted variable bias.
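A sketch of why the two estimates can differ (simulated panel in which a firm effect is correlated with x): the within (fixed-effects) transformation removes the firm effect and recovers the true slope, while pooled OLS suffers omitted variable bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 50, 10
firm = np.repeat(np.arange(n_firms), n_years)

# Firm effect alpha is correlated with x, so pooled OLS is biased;
# demeaning within firms removes alpha entirely.
alpha = rng.normal(size=n_firms)
x = alpha[firm] + rng.normal(size=firm.size)
y = 2.0 * x + alpha[firm] + rng.normal(scale=0.1, size=firm.size)

def demean_by(v, group):
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

x_w, y_w = demean_by(x, firm), demean_by(y, firm)
beta_fe = np.sum(x_w * y_w) / np.sum(x_w ** 2)   # within estimator, ~2.0

xc, yc = x - x.mean(), y - y.mean()
beta_pooled = np.sum(xc * yc) / np.sum(xc ** 2)  # pooled OLS, biased upward
```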
When estimating a standard regression model, we need to make a number of assumptions to be able to argue that the OLS estimator is unbiased or consistent. Which assumption is necessary for the OLS estimator to be unbiased?
The disturbance terms in the model are uncorrelated with the regressors.
For OLS to be unbiased, the error terms must not be correlated with the independent variables. This is because if errors are correlated, it indicates that there is some systematic influence on the errors from the variables we are using to predict the outcome, which should not happen. For accurate predictions, we want our error terms to be random, not influenced by the predictors; otherwise, our estimates of the relationship between the predictors and the outcome will be off.
Other explanation:
By ensuring that the error term is uncorrelated with the regressors, we are essentially ensuring that the variation in the independent variables that we are using to predict the dependent variable is not systematically related to factors that we have not included in the model (captured in the error term). This is what allows us to argue that the OLS estimator is unbiased. If this assumption does not hold, the estimated coefficients may systematically differ from the true population parameters, leading to biased estimations.
It is possible that the disturbance term in a regression model is not normally distributed. What is
NOT a potential cause for nonnormality?
- Outliers
- The sample is too small
- Skewness of the distribution
- Fat tails in the distribution (excess kurtosis)
- The sample is too small
Consider a linear regression model where Y is explained from X. It is suspected that the true
relationship between Y and X could be nonlinear. What is NOT appropriate to test for this?
- Perform a RESET test
- Checking the variance inflation factors
- Add Ln X and test its significance
- Add X^2 and test its significance
- Checking the variance inflation factors
- Perform a RESET test: The Regression Equation Specification Error Test (RESET) is specifically designed to test for misspecification in the model, including nonlinearity. It is an appropriate test for this purpose.
- Checking the variance inflation factors: Variance Inflation Factors (VIFs) are used to check for multicollinearity among independent variables, not for nonlinearity. VIFs measure how much the variance of an estimated regression coefficient increases if your predictors are correlated. If you suspect nonlinearity, VIFs are not the correct tool to use.
- Add Ln X and test its significance: Adding the natural logarithm of X (Ln X) and testing its significance is a way to see if a logarithmic transformation of the variable improves the model. This can address nonlinearity.
- Add X^2 and test its significance: Adding the square of X (X^2) to the model and testing its significance is a common method for testing for a quadratic relationship, which is a form of nonlinearity.
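The X^2 route can be sketched as follows (simulated data with a genuine quadratic term; the t-statistic on X^2 is built from the usual OLS covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.8 * x ** 2 + rng.normal(size=n)  # truly quadratic

# Fit y on [1, x, x^2] and t-test the coefficient on x^2.
X = np.column_stack([np.ones(n), x, x ** 2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])
cov_b = s2 * np.linalg.inv(X.T @ X)
t_x2 = b[2] / np.sqrt(cov_b[2, 2])  # large |t|: reject linearity
```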
In a linear regression model, the OLS estimator is unbiased under a restrictive set of assumptions.
Which is a potential cause of bias in the OLS estimator?
Bias in an OLS estimator can arise if a crucial variable is left out of the model.
In linear regression, the OLS (Ordinary Least Squares) estimator is used to find the line of best fit by minimizing the sum of squared residuals. For the OLS estimator to be unbiased, one key assumption is that all relevant explanatory variables are included in the regression. If an important variable that affects the dependent variable is omitted and is correlated with any of the included independent variables, it can cause omitted variable bias. This bias occurs because the omitted variable’s influence on the dependent variable is wrongly attributed to the included variables, leading to incorrect estimates of the coefficients and thus a biased estimator.
Consider a linear model that includes an intercept term. Which of the following statement(s) about
the OLS residuals is (are) correct?
1. The average residual is by construction equal to 0
2. The residuals are uncorrelated with the fitted values of Y
Both statement 1 and 2 are correct
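Both properties can be verified numerically (simulated data; they hold by construction, up to floating-point rounding, whenever the model includes an intercept):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# OLS with an intercept.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b
resid = y - fitted

mean_resid = resid.mean()  # ~0 by construction (intercept included)
dot = resid @ fitted       # ~0: residuals orthogonal to the fitted values
```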
When estimating a standard regression model, we need to make several assumptions. Which
assumption is not necessary for the routinely calculated standard errors to be correct?
- The disturbance terms in the model are normally distributed
- The disturbance terms in the model are serially uncorrelated
- The disturbance terms in the model are homoscedastic
- The disturbance terms in the model have zero mean
When estimating a standard regression model, we need to make several assumptions. Which
assumption is not necessary for the routinely calculated standard errors to be correct?
- The disturbance terms in the model are normally distributed
In many applications in finance the disturbance terms in a regression model are likely to suffer from heteroskedasticity. Which statement is NOT correct?
- Under heteroskedasticity, standard t-tests will be misleading
- Under heteroskedasticity, the OLS estimator is biased
- The use of heteroskedasticity-consistent standard errors adjusts for arbitrary forms of heteroskedasticity
- Transforming relevant variables into logs may reduce or eliminate the problem of heteroskedasticity
- Under heteroskedasticity, the OLS estimator is biased
Heteroskedasticity does not cause the OLS estimators to be biased; it remains unbiased. However, heteroskedasticity does affect the efficiency of the estimators, meaning that they are no longer the Best Linear Unbiased Estimators (BLUE), and it affects the standard errors, leading to incorrect conclusions about the significance of the coefficients.
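A sketch of White (HC0) heteroskedasticity-consistent standard errors, the "sandwich" adjustment the card refers to (simulated data with error variance that grows with |x|):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x))  # heteroskedastic errors
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y   # OLS remains unbiased under heteroskedasticity
e = y - X @ b

# White (HC0) sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
meat = X.T @ (X * (e ** 2)[:, None])
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```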
Factors reducing the precision of OLS?
- Large error variance s^2
  * Large influence of other factors not included in the model
- Small number of observations
  * Less information is available to estimate the parameters
- Little spread in the independent variables
  * Without variation in x, one cannot explain variation in y
  * Too much variation due to outliers is harmful, though
Purpose of the F-test in Regression Analysis
Question: What is the purpose of the F-test in regression analysis?
The F-test in regression analysis is used to determine whether the model as a whole is statistically significant. It tests the null hypothesis that all regression coefficients are equal to zero, which implies that the predictor variables, as a group, do not have a statistically significant relationship with the dependent variable. A significant F-test indicates that the observed relationships are unlikely to be due to random chance.
How do you interpret the results of the F-test in regression analysis?
Answer: In regression analysis, the F-test yields an F-statistic and a corresponding p-value. The F-statistic is the ratio of the explained (regression) mean square to the residual mean square. To interpret the results:
A high F-statistic value (much larger than 1) suggests that the variation explained by the model is significantly greater than the unexplained variation.
The p-value helps to determine the statistical significance. A small p-value (typically ≤ 0.05) indicates that the model as a whole is statistically significant, meaning the relationship between the dependent and independent variables is not due to random chance.
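The overall F-statistic can also be computed directly from R2; with illustrative numbers:

```python
# Overall F-test from R^2, with k regressors (excluding the constant) and
# n observations: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
# Illustrative numbers only.
r2, k, n = 0.30, 3, 104
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat, 2))  # 14.29: well above typical 5% critical values
```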
What is a weakness of the Fama-MacBeth approach? (What potential problem is not
accounted for?)
It does not allow for serial correlation (autocorrelation) in the error terms (within firms, across periods).
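A sketch of the Fama-MacBeth procedure (simulated panel): one cross-sectional regression per period, then the time-series mean and standard error of the slopes. That standard error treats the per-period slopes as independent, which is exactly where serial correlation is ignored.

```python
import numpy as np

rng = np.random.default_rng(5)
n_firms, n_periods = 200, 20

# One cross-sectional OLS per period; collect the slope on x each time.
slopes = []
for t in range(n_periods):
    x = rng.normal(size=n_firms)
    y = 1.0 + 0.5 * x + rng.normal(size=n_firms)
    X = np.column_stack([np.ones(n_firms), x])
    slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

slopes = np.array(slopes)
fm_slope = slopes.mean()                         # Fama-MacBeth estimate
fm_se = slopes.std(ddof=1) / np.sqrt(n_periods)  # assumes slopes are i.i.d.
```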
In a linear regression model, the OLS estimator is unbiased under a restrictive set of assumptions. Which is a potential cause of bias in the OLS estimator?
An important explanatory variable is omitted (=not included while it should)
Suppose you would like to investigate whether firm performance depends upon the size of the board of the company, which is assumed to be exogenous. You estimate a cross-sectional regression explaining firm performance from board size (with coefficient b2) and a range of control variables. How do you formulate your null hypothesis?
b2 = 0