Model Misspecification Flashcards

1
Q

What is model specification?

A

Model specification involves choosing the set of variables to include in a regression.

2
Q

What are 5 key principles analysts can use to minimize the potential for specification errors?

A
  • The regression model should be grounded in economic reasoning, which reduces the risk of finding spurious relationships by simply mining the data.
  • A well-specified model should be parsimonious: it should accomplish a lot with a little.
  • The model should perform well when applied to out-of-sample data; a model has no practical use if it is overfit to in-sample data.
  • The functional form should be appropriate for the nature of the variables; if the relationship between variables is non-linear, appropriate adjustments to the data are necessary.
  • The model should satisfy the assumptions of multiple regression; revisions may be needed if violations are detected.
3
Q

What are 4 ways a regression's functional form can be misspecified?

A
  • Omitted variables
  • Inappropriate form of variables (e.g., one variable is linear and another is non-linear; adjustments such as taking natural logarithms may be needed)
  • Inappropriate scaling of data (e.g., some statements report values in millions and others in billions; the data must be made consistent and rescaled)
  • Inappropriate pooling of data (data that should be split must not be pooled; e.g., if a country's central bank made an abrupt, significant change to its monetary policy 25 years ago, a 50-year series of fixed-income returns is likely to contain two distinct clusters of data points with little or no correlation between them)
4
Q

What is model misspecification?

A
  • When the model doesn't capture the true relationship between the variables, leading to biased and unreliable results, or when any of the underlying assumptions of multiple linear regression are violated.
5
Q

What’s the difference between Unconditional heteroskedasticity and Conditional heteroskedasticity, and which one is more problematic?

A
  • Unconditional heteroskedasticity: the variance of the residuals does not increase with the independent variable; it varies randomly, unrelated to the independent variables.
  • Conditional heteroskedasticity: the variance of the residuals increases as the independent variable increases (i.e., the error variance is correlated with the values of the independent variables).

Conditional heteroskedasticity is the more problematic of the two.

6
Q

What is the Breusch-Pagan test, and what is its formula?

A

A test for conditional heteroskedasticity (a one-sided, right-tailed chi-square test).

Breusch-Pagan statistic = n × R²

n = number of observations; R² comes from a second regression of the squared residuals on the independent variables. The statistic has k degrees of freedom, where k is the number of independent variables.
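The two-step procedure can be sketched in plain NumPy: fit the original regression, then regress the squared residuals on the independent variable(s) and form n × R². The simulated data below (with error variance growing in x) is purely illustrative, not from the curriculum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Simulate conditional heteroskedasticity: error variance grows with |x|
e = rng.normal(size=n) * (1 + np.abs(x))
y = 2 + 3 * x + e

# Step 1: fit the original regression and collect residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Step 2: regress the squared residuals on the independent variable(s)
u = resid ** 2
g = np.linalg.lstsq(X, u, rcond=None)[0]
fitted = X @ g
r2 = 1 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)

# BP statistic: n * R-squared, chi-square with k (= 1 here) degrees of freedom
bp = n * r2
print(bp)
```

The statistic would then be compared against the right tail of the chi-square distribution with k degrees of freedom.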

7
Q

What is serial correlation (autocorrelation)? How does it affect the t-statistic and F-statistic?

A
  • Autocorrelation is present when the errors in one period are correlated with the errors in other periods (i.e., residuals are correlated across time or across observations).
  • With positive serial correlation, t-statistics and the F-statistic are artificially inflated because the true standard errors are understated.
8
Q

What is the difference between positive and negative serial correlation?

A
  • Positive serial correlation: an error in one direction for one observation increases the chance of an error in the same direction in a subsequent observation.
  • Negative serial correlation: the opposite effect; a positive error in one period increases the likelihood of a negative error in a subsequent period.
9
Q

What is the Durbin-Watson (DW) test vs the Breusch-Godfrey (BG) test?

A

DW test: a method to assess the presence of serial correlation; it tests only for first-order serial correlation.

BG test: a more general test that can detect serial correlation at higher lag orders, both positive and negative.

10
Q

What are 3 effects of positive serial correlation vs 3 effects of negative serial correlation?

A

positive serial correlation
- standard errors are underestimated
- t-statistics are inflated
- Type I errors increase

negative serial correlation
- standard errors are overstated
- t-statistics (and the F-statistic) are understated
- Type II errors increase

11
Q

What’s the difference between type 1 and type 2 errors?

A

Type I: false positive (e.g., a test indicates disease when there is none); rejecting a true null hypothesis.

Type II: false negative (e.g., a test indicates no disease when there is one); failing to reject a false null hypothesis.

12
Q

What is multicollinearity, and what are 3 of its effects?

A
  • Multicollinearity occurs when two or more independent variables are significantly correlated with each other.
  • It makes slope coefficients unreliable.
  • It makes it hard to distinguish how much each independent variable affects the dependent variable.
  • It inflates standard errors and increases Type II errors.
13
Q

What is the name and formula of the measure used to detect multicollinearity?

A

variance inflation factor (VIF) = 1 / (1 − Rj²)

Rj² = the R² from regressing independent variable j on the remaining independent variables

VIF = 1: no correlation among the independent variables
VIF > 1: some correlation among the independent variables (above 5, investigate; above 10, serious multicollinearity that requires a change to the model)
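A minimal NumPy sketch of the VIF calculation (the variable names and simulated data are illustrative): each Rj² comes from regressing one independent variable on the others, so a variable that nearly duplicates another gets a large VIF.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # highly correlated with x1
x3 = rng.normal(size=n)                    # unrelated to the others

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    independent variable j on the remaining independent variables."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

print(vif(x1, [x2, x3]))  # large: x1 nearly duplicates x2
print(vif(x3, [x1, x2]))  # near 1: x3 is independent of the others
```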

14
Q

How do you correct for serial correlation and heteroskedasticity?

A
  • Newey-West method: computes robust standard errors that adjust for both serial correlation and heteroskedasticity; the coefficient estimates are unchanged, only the standard errors are corrected.

(e.g., imagine trying to measure the height of a bouncy kid. A single snapshot (plain OLS standard errors) gives a bad estimate because the kid keeps jumping; Newey-West smooths out the bounces by accounting for past movements and how wild the jumps are.)
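A hand-rolled sketch of the Newey-West adjustment under stated assumptions: simulated AR(1) errors, a Bartlett kernel, and an arbitrary lag choice of 4 (in practice a statistics library would be used; this only illustrates the mechanics of correcting the standard errors while leaving the coefficients alone).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
# AR(1) errors: plain OLS standard errors would be understated
e = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + shocks[t]
y = 1 + 2 * x + e

# OLS fit: coefficients are kept as-is, only the covariance is adjusted
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Newey-West "meat": weighted sum of autocovariances of x_t * e_t
L = 4  # lag length (a modeling choice, assumed here for illustration)
scores = X * resid[:, None]
S = scores.T @ scores
for lag in range(1, L + 1):
    w = 1 - lag / (L + 1)                  # Bartlett kernel weight
    gamma = scores[lag:].T @ scores[:-lag]
    S += w * (gamma + gamma.T)

cov_hac = XtX_inv @ S @ XtX_inv
se_hac = np.sqrt(np.diag(cov_hac))         # robust standard errors
print(se_hac)
```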

15
Q

What are 3 possible solutions for addressing multicollinearity?

A
  1. Excluding one or more of the regression variables.
  2. Using a different proxy for one of the variables (e.g., square footage and number of bedrooms are highly correlated; number of bedrooms could be replaced with:
    • lot size (less directly related to square footage), or
    • house age (captures home value in a different way).

This reduces multicollinearity because the new variable isn't as strongly correlated with square footage.)

  3. Increasing the sample size.