Econometrics - Panel/TS & hetero/autocorrelation Flashcards
What is heteroskedasticity and why is it a problem?
Non-constant variance in errors - violates a classical assumption
What are 6 reasons heteroskedasticity occurs?
- Error-learning models
- Real income grows through time
- Improved data collection over time
- Outliers in a sample of data
- An incorrectly specified model
- Skews in distribution in an X variable
What are the consequences of heteroskedasticity?
OLS coefficients are no longer minimum variance, so OLS is an inefficient estimator: it is not BEST anymore, so another estimator can produce a smaller variance. Inference also breaks down, because the usual standard errors are biased and t-tests become unreliable.
What are the 3 main tests for heteroskedasticity?
- Goldfeld-Quandt Test (GQ test)
- Breusch-Pagan Test (BP test)
- White’s Test
How does the Breusch-Pagan test work?
Fit the regression, calculate the squared residuals, and fit an auxiliary regression of the squared residuals on the regressors. Then calculate the chi-square test statistic and p-value, and compare with the significance level. The null hypothesis is homoskedasticity.
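The steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `breusch_pagan_one_regressor` and the simulated data are assumptions made for the example. It uses the LM form of the statistic (n times the R-squared of the auxiliary regression) and, because there is a single regressor, a chi-square distribution with 1 degree of freedom.

```python
import math
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals for y = X b + u."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def breusch_pagan_one_regressor(x, y):
    """LM version of the BP test with one regressor (1 degree of freedom)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    _, resid = ols(X, y)
    u2 = resid ** 2
    # Auxiliary regression: squared residuals on the original regressor
    _, aux_resid = ols(X, u2)
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((u2 - u2.mean()) ** 2)
    lm = max(n * r2, 0.0)
    # Upper-tail probability of a chi-square(1) variable
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Simulated data (hypothetical): error spread grows with x in the first series
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, 500)
y_hetero = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 500) * x
y_homo = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 500)

lm_het, p_het = breusch_pagan_one_regressor(x, y_hetero)
lm_hom, p_hom = breusch_pagan_one_regressor(x, y_homo)
print(f"heteroskedastic data: LM={lm_het:.2f}, p={p_het:.4f}")
print(f"homoskedastic data:   LM={lm_hom:.2f}, p={p_hom:.4f}")
```

A small p-value leads to rejecting the null of homoskedasticity.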
How can you correct for heteroskedasticity?
- Transform model e.g. logs/squares/inverse
- Robust standard errors
- Generalised Least Squares/Weighted Least Squares (GLS/WLS)
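Two of these corrections can be sketched in NumPy. The example below is illustrative only: it assumes the error standard deviation is proportional to x (so WLS divides every term by x), and it uses White's HC0 formula for the robust standard errors. The function names and the simulated data are assumptions made for the example.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with classical and White (HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical formula: s^2 (X'X)^-1
    s2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(s2 * XtX_inv))
    # White formula: (X'X)^-1 X' diag(u^2) X (X'X)^-1
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_robust

def wls(X, y, sd):
    """WLS by transformation: divide each row by the assumed error sd."""
    Xw = X / sd[:, None]
    yw = y / sd
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

# Simulated data (hypothetical): error sd proportional to x
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n) * x
X = np.column_stack([np.ones(n), x])

beta_ols, se_classical, se_robust = ols_with_robust_se(X, y)
beta_wls = wls(X, y, sd=x)
print("OLS coefficients:", beta_ols)
print("classical SEs:   ", se_classical)
print("robust SEs:      ", se_robust)
print("WLS coefficients:", beta_wls)
```

Robust standard errors leave the OLS coefficients unchanged and only adjust the inference, whereas WLS changes the estimator itself and relies on the assumed variance structure being right.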
What is autocorrelation/serial correlation and why is it a problem?
Errors correlated with their previous value - violates classical assumption
When does serial correlation occur?
- Time-series data
- Spatially organised data
- Can be in cross section but less common
What causes serial correlation?
- Omitted lagged variables
- Economic shocks that have persistent effects
- Transformations applied to data
- Model misspecification
- Error term being truly dynamic
What is first-order autocorrelation?
Assumes that the error is linearly correlated only with its value in the previous period
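In symbols, this is usually written as an AR(1) process for the error. The notation below (u for the error, rho for the autocorrelation coefficient, epsilon for a white-noise shock) is assumed here and is not taken from the cards:

```latex
u_t = \rho\, u_{t-1} + \varepsilon_t, \qquad |\rho| < 1
```

Positive rho means errors tend to keep the same sign from one period to the next; negative rho means they tend to alternate.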
What are the consequences of autocorrelation?
OLS estimators no longer have minimum variance, so OLS isn’t BLUE.
R-squared may seem high
Standard errors may be biased downwards - OLS is inefficient and incorrect inferences may be made
What does the Durbin Watson Test test for?
First-order correlation
What values of the Durbin Watson test statistic indicate first-order autocorrelation?
DW -> 0 = positive autocorrelation
DW -> 2 = no autocorrelation
DW -> 4 = negative autocorrelation
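A short NumPy sketch shows where these values come from. The statistic is the sum of squared changes in the residuals divided by the sum of squared residuals, which is roughly 2(1 - rho). The simulated AR(1) series below stand in for regression residuals and are assumptions made for the example.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared first differences / sum of squared residuals."""
    diff = np.diff(resid)
    return float(diff @ diff / (resid @ resid))

def ar1(rho, n, rng):
    """Simulate an AR(1) series with coefficient rho."""
    shocks = rng.normal(size=n)
    out = np.zeros(n)
    out[0] = shocks[0]
    for t in range(1, n):
        out[t] = rho * out[t - 1] + shocks[t]
    return out

rng = np.random.default_rng(2)
dw_pos = durbin_watson(ar1(0.8, 2000, rng))    # positive autocorrelation
dw_zero = durbin_watson(ar1(0.0, 2000, rng))   # no autocorrelation
dw_neg = durbin_watson(ar1(-0.8, 2000, rng))   # negative autocorrelation
print(f"rho=0.8: DW={dw_pos:.2f}, rho=0: DW={dw_zero:.2f}, rho=-0.8: DW={dw_neg:.2f}")
```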
What are 3 limitations of the Durbin Watson test?
- Not valid in dynamic models (with a lagged dependent variable) as the test stat is biased towards 2
- Only applies to first-order autocorrelation
- Bounds test doesn’t offer exact critical values so an element of doubt
What is a better test for serial correlation?
Breusch-Godfrey Lagrange Multiplier Test
How is a Breusch-Godfrey LM test conducted?
Estimate OLS and obtain the residuals, estimate an auxiliary regression of the residuals on the original regressors and lagged residuals, and then either compute the LM test stat and compare to the chi-squared distribution OR use an F-test
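The LM route can be sketched in NumPy for the simplest case of one lag. This is an illustration under stated assumptions: the function name `breusch_godfrey_order1`, the choice to set the missing first lagged residual to zero, and the simulated data are all choices made for the example. With one lagged residual the statistic is compared with a chi-square distribution with 1 degree of freedom.

```python
import math
import numpy as np

def breusch_godfrey_order1(x, y):
    """LM test for first-order serial correlation with one regressor."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Lagged residual, with the unavailable first value set to zero
    lagged = np.concatenate([[0.0], resid[:-1]])
    # Auxiliary regression: residuals on the regressors plus the lagged residual
    Z = np.column_stack([X, lagged])
    gamma, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    aux_resid = resid - Z @ gamma
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((resid - resid.mean()) ** 2)
    lm = max(n * r2, 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value

# Simulated data (hypothetical): AR(1) errors versus independent errors
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
shocks = rng.normal(size=n)
u_ar = np.zeros(n)
u_ar[0] = shocks[0]
for t in range(1, n):
    u_ar[t] = 0.7 * u_ar[t - 1] + shocks[t]

lm_ar, p_ar = breusch_godfrey_order1(x, 1.0 + 2.0 * x + u_ar)
lm_iid, p_iid = breusch_godfrey_order1(x, 1.0 + 2.0 * x + rng.normal(size=n))
print(f"AR(1) errors: LM={lm_ar:.2f}, p={p_ar:.4f}")
print(f"iid errors:   LM={lm_iid:.2f}, p={p_iid:.4f}")
```

Testing higher orders means adding more lagged residuals to the auxiliary regression and raising the degrees of freedom to match.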
What can be done to correct for serial correlation?
Employ robust standard errors (HAC standard errors)
What do heteroskedasticity and autocorrelation consistent (HAC) standard errors do?
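They correct the standard errors for heteroskedasticity and autocorrelation while leaving the coefficient estimates unchanged. The corrected standard errors are typically larger, so there is less statistical significance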
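One common HAC estimator is Newey-West with Bartlett weights. The sketch below is a minimal NumPy version: the function name `newey_west_se`, the lag length of 20 and the simulated data are assumptions made for the example, and no small-sample correction is applied.

```python
import numpy as np

def ar1(rho, n, rng):
    """Simulate an AR(1) series with coefficient rho."""
    shocks = rng.normal(size=n)
    out = np.zeros(n)
    out[0] = shocks[0]
    for t in range(1, n):
        out[t] = rho * out[t - 1] + shocks[t]
    return out

def newey_west_se(X, resid, max_lag):
    """HAC standard errors using Bartlett weights up to max_lag."""
    XtX_inv = np.linalg.inv(X.T @ X)
    g = X * resid[:, None]            # row t holds x_t * u_t
    S = g.T @ g                       # lag-0 term (White's estimator)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)
        G = g[lag:].T @ g[:-lag]      # cross-products between t and t - lag
        S += weight * (G + G.T)
    return np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))

# Simulated data (hypothetical): persistent regressor and persistent errors
rng = np.random.default_rng(4)
n = 2000
x = ar1(0.8, n, rng)
u = ar1(0.8, n, rng)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
se_ols = np.sqrt(np.diag(resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)))
se_hac = newey_west_se(X, resid, max_lag=20)
print("coefficients:", beta)
print("OLS SEs:", se_ols)
print("HAC SEs:", se_hac)
```

Setting `max_lag=0` reduces the formula to White's heteroskedasticity-robust standard errors.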
What is the difference between a True and a Natural experiment?
True = observations randomly assigned to different groups
Natural = not randomly assigned
What is the equation for a Difference-in-Difference model?
Y = b0 + b1*G + b2*R + b3*(G*R) + error, where G = 1 if the observation is in the treatment group, otherwise 0, and R = 1 if the observation is observed in period 2, otherwise 0
What is the interpretation of b3 in a general D-in-D?
Average treatment effect (ATE), captures the policy effect
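A small simulation makes the interpretation concrete: with only the two dummies and their interaction in the model, the OLS estimate of b3 equals the difference between the before/after change in the treatment group and the before/after change in the control group. The data below are simulated and the true effect of 1.5 is an assumption made for the example.

```python
import numpy as np

# Simulated data (hypothetical): G = treatment group dummy, R = period 2 dummy
rng = np.random.default_rng(5)
n = 4000
G = rng.integers(0, 2, n)
R = rng.integers(0, 2, n)
true_effect = 1.5
y = 1.0 + 0.5 * G + 0.8 * R + true_effect * G * R + rng.normal(0.0, 1.0, n)

# OLS on a constant, G, R and the interaction G*R
X = np.column_stack([np.ones(n), G, R, G * R])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(g, r):
    """Mean outcome for group g in period r."""
    return y[(G == g) & (R == r)].mean()

# Difference-in-differences computed directly from the four cell means
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(f"b3 from OLS: {b[3]:.4f}, difference of differences: {did:.4f}")
```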
What are the advantages of using panel data? (5)
- More information, more variability, less collinearity, greater degrees of freedom - hence more efficient
- Consider dynamic changes
- Detect/measure effects that can’t be observed in other data types
- Better suited to modelling specific types of economic behaviour
- Large panels less likely to produce biased estimates
What are the 4 main types of panel models?
- Pooled OLS
- Fixed Effects Least Squares Dummy Variable Model (LSDV)
- Fixed Effects Within-Group Model
- Random Effects Model
Explain pooled OLS?
Pools data and estimates simple OLS - disregards time and entity dimensions
Explain LSDV model?
Pools data and gives each entity its own intercept dummy
Explain fixed effects (within group) ?
Each entity is given its own intercept, but each variable is expressed as a deviation from its entity mean value
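The LSDV and within-group approaches give the same slope estimate, which a short NumPy sketch can confirm. The panel below is simulated, with an entity effect that is correlated with the regressor; all names and parameter values are assumptions made for the example.

```python
import numpy as np

# Simulated panel (hypothetical): 50 entities observed over 8 periods
rng = np.random.default_rng(6)
n_entities, n_periods = 50, 8
entity = np.repeat(np.arange(n_entities), n_periods)
alpha = rng.normal(0.0, 2.0, n_entities)             # entity-specific intercepts
x = alpha[entity] + rng.normal(size=entity.size)     # regressor correlated with them
y = alpha[entity] + 2.0 * x + rng.normal(size=entity.size)

# Pooled OLS: ignores the entity dimension
X_pooled = np.column_stack([np.ones(entity.size), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# LSDV: one dummy per entity, no common intercept
D = (entity[:, None] == np.arange(n_entities)).astype(float)
b_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# Within-group: subtract each entity's mean from x and y
def demean(v):
    means = np.bincount(entity, weights=v) / np.bincount(entity)
    return v - means[entity]

xd, yd = demean(x), demean(y)
b_within = (xd @ yd) / (xd @ xd)

print(f"pooled: {b_pooled:.3f}, LSDV: {b_lsdv:.3f}, within: {b_within:.3f}")
```

In this setup pooled OLS overstates the slope because it attributes part of the entity effect to x.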
Explain random effects model?
Assumes intercepts are random draws from a bigger population
Problems of LSDV model?
- Time-invariant variables - can’t estimate the effect of variables that do not change over time, as they are collinear with the entity dummies
- Too many dummies can reduce degrees of freedom
- Lots of dummies = multicollinearity
- Can have issues with error term
How is random effects different from fixed effects?
Random effects doesn’t estimate an individual effect for each entity. It estimates a single intercept that captures the average effect within the sample of data, and treats each entity's deviation from it as part of the error term
What are the 2 error terms in a random effects model?
- cross section/individual specific error component
- combined time-series and cross-section error component - idiosyncratic error term (always in a regression)
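Written out, the composite error of a random effects model with one regressor looks as follows. The symbols are assumed notation for illustration: epsilon_i is the individual-specific component and u_it the idiosyncratic component.

```latex
y_{it} = \beta_0 + \beta_1 x_{it} + w_{it}, \qquad w_{it} = \varepsilon_i + u_{it}
```

Because epsilon_i appears in every period for entity i, the composite errors w_it are correlated over time within an entity.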
When should REM be used over FEM?
If you think differences across firms influence the dependent variable and are uncorrelated with the regressors
What does the Hausman test test for? What is the null/alternate hypothesis?
Tests whether the unique errors are correlated with the regressors. The null hypothesis is that there is no correlation, so the estimates are consistent and REM is preferred. The alternate hypothesis is that they are correlated, so FEM is preferred.
What type of estimator is used for REM and why?
Generalised Least Squares (GLS) - OLS cannot be used as it would yield inefficient estimators.
What models can be employed if you have a binary/limited dependent variable?
- Linear Probability Model
- Logit Model
- Probit Model
- Tobit Model
What type of estimation is used for the Probit Model?
Maximum-Likelihood estimator
What is the interpretation of a Probit Model?
The coefficients are not changes in probability, so you have to calculate marginal effects, which give the change in probability for a change in X. In software output, the marginal effects are the reported slope values
What is the interpretation of a Logit Model?
Coefficients are partial slope coefficients: they measure the change in the logit (log-odds) for a unit change in X
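The difference between a logit coefficient and a marginal effect can be sketched in NumPy. The example fits a logit by Newton-Raphson maximum likelihood and then computes the average marginal effect as the slope coefficient times the average of p(1 - p). The function name `fit_logit` and the simulated data are assumptions made for the example; a probit would follow the same pattern with the normal distribution in place of the logistic.

```python
import numpy as np

def fit_logit(X, y, iterations=25):
    """Maximum likelihood logit estimates via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        information = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(information, gradient)
    return beta

# Simulated data (hypothetical): true log-odds are -0.5 + 1.0 * x
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)
X = np.column_stack([np.ones(n), x])

beta = fit_logit(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))

# Coefficient: change in the log-odds for a one-unit change in x
# Average marginal effect: average change in probability for a small change in x
ame = float(np.mean(p_hat * (1.0 - p_hat)) * beta[1])
print(f"logit slope: {beta[1]:.3f}, average marginal effect: {ame:.3f}")
```

The marginal effect is always smaller in magnitude than the logit coefficient, since p(1 - p) can be at most 0.25.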
How do you decide between a Logit and a Probit?
- Measure of fit (pseudo R-squared)
- Hypothesis test
- Model interpretation (marginal effects)
What is a Tobit Model used for?
Limited dependent variables, where the value is continuous but is cut off/censored at a particular value
How does the Tobit Model work?
Uses maximum likelihood estimation that treats the cutoff/censor values differently
What is the general equation for a simultaneous equation model?
How do you estimate a reduced form equation?
Use the relationship between the 2 simultaneous equations e.g. Qd = Qs and rearrange to get a reduced form equation. Treat as a normal simultaneous equation
What is the problem with using OLS with simultaneous equation models? What is the solution?
Simultaneity bias, use two stage least squares (2SLS) instead
How do you conduct a 2SLS?
Identify exogenous and endogenous variables, use an instrumental variables approach where regressors = endogenous variables and the instruments = exogenous variables
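The two stages can be sketched in NumPy with one endogenous regressor and one instrument. The data are simulated so that x is correlated with the error through a common shock v, while the instrument z affects x but not the error; all names and parameter values are assumptions made for the example. Only point estimates are computed, since standard errors taken directly from the second-stage regression would not be valid.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y = X b + u."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data (hypothetical): true slope on x is 2.0
rng = np.random.default_rng(8)
n = 5000
z = rng.normal(size=n)                   # instrument (exogenous)
v = rng.normal(size=n)                   # shock shared by x and the error
x = 0.8 * z + v + rng.normal(size=n)     # endogenous regressor
u = v + rng.normal(size=n)               # error term, correlated with x
y = 1.0 + 2.0 * x + u

const = np.ones(n)

# Stage 1: regress the endogenous variable on the instrument, keep fitted values
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Stage 2: regress y on the fitted values from stage 1
b_2sls = ols(np.column_stack([const, x_hat]), y)

# Plain OLS for comparison
b_ols = ols(np.column_stack([const, x]), y)
print(f"OLS slope: {b_ols[1]:.3f}, 2SLS slope: {b_2sls[1]:.3f}")
```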
What makes a good instrument?
Z (instrument) must be exogenous in the equation, but related to X (endogenous variable)
What is the difference between static and dynamic time series data?
Static = change in X at time t has an immediate effect on y e.g. Phillips curve
Dynamic = change in X at time t doesn’t have an immediate effect on y, so lags are included in the model to account for the time it takes for the change in X to be fully absorbed by Y
In TS data - what additional classical assumption is needed?
Strict exogeneity = for each time period the expected value of the error term given all explanatory variables for all time periods is 0
What problem will you have if strict exogeneity is not achieved?
Biased OLS estimates
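In symbols, with T time periods and notation assumed here for illustration (u_t for the error and x_s for the explanatory variables in period s):

```latex
E\left(u_t \mid x_1, x_2, \ldots, x_T\right) = 0, \qquad t = 1, \ldots, T
```

The conditioning set includes past and future values of the explanatory variables, not just those from period t.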
How can you prove consistency?
Stationarity and weak dependence
What does stationarity mean?
Probability distribution is stable over time - the statistical properties of a process generating a time series do not change over time. (the series changes over time but the WAY it changes does not itself change over time)
What are 2 problems with dynamic models?
- High multicollinearity
- Loss of degrees of freedom
What is an autoregressive model? Why is it used?
Replace lagged values of independent variables with lagged dependent variables. Used to mitigate issues with dynamic models
What happens when TS data is not weakly dependent?
Random walk - which is highly persistent and non-stationary
What is it called when you have highly persistent TS data with a trend?
Random walk with drift
How can you make a problematic series stationary and have weak dependence?
First-differencing the data series
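A quick simulation shows the effect of first-differencing a random walk with drift. The drift of 0.2, the sample size and the helper `lag1_autocorr` are assumptions made for the example.

```python
import numpy as np

def lag1_autocorr(series):
    """Sample first-order autocorrelation."""
    s = series - series.mean()
    return float((s[1:] @ s[:-1]) / (s @ s))

# Simulated random walk with drift (hypothetical): y_t = drift + y_{t-1} + shock_t
rng = np.random.default_rng(9)
n = 2000
drift = 0.2
shocks = rng.normal(size=n)
y = np.cumsum(drift + shocks)

# First difference: y_t - y_{t-1} = drift + shock_t
dy = np.diff(y)

print(f"levels:      lag-1 autocorrelation = {lag1_autocorr(y):.3f}")
print(f"differences: lag-1 autocorrelation = {lag1_autocorr(dy):.3f}, mean = {dy.mean():.3f}")
```

The levels are highly persistent, while the differenced series is just the drift plus an uncorrelated shock.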
What model can you employ if you have serial correlation in TS?
Feasible Generalized Least Squares
How do you test for a unit root? What is the null/alternate hypothesis?
Dickey-Fuller test (uses special DF critical values)
H0 = there is a unit root, data is non-stationary
H1 = there is not a unit root, the data is stationary
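The basic (non-augmented) version of the test can be sketched in NumPy: regress the change in y on a constant and the lagged level, then compare the t-statistic on the lagged level with a DF critical value. The function name, the simulated series and the use of roughly -2.86 as the large-sample 5% critical value for the constant-only case are assumptions made for the example; in practice the critical value should be taken from DF tables for the relevant sample size and specification.

```python
import numpy as np

DF_CRITICAL_5PCT = -2.86  # approximate large-sample value, constant, no trend

def dickey_fuller_tstat(y):
    """t-statistic on y_{t-1} in the regression of (y_t - y_{t-1}) on [1, y_{t-1}]."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov[1, 1]))

# Simulated series (hypothetical): a stationary AR(1) and a random walk
rng = np.random.default_rng(10)
n = 500
shocks = rng.normal(size=n)
stationary = np.zeros(n)
stationary[0] = shocks[0]
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + shocks[t]
random_walk = np.cumsum(rng.normal(size=n))

t_stationary = dickey_fuller_tstat(stationary)
t_random_walk = dickey_fuller_tstat(random_walk)
print(f"stationary AR(1): t = {t_stationary:.2f}")
print(f"random walk:      t = {t_random_walk:.2f}")
print(f"reject the unit root at 5% if t < {DF_CRITICAL_5PCT}")
```

A t-statistic more negative than the critical value leads to rejecting H0, so the series is treated as stationary.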