6. Panel Data Flashcards
What is panel data?
Repeated observations over time for the same individual/city/country
Pooled cross section dataset
A dataset that combines cross sections from different years, where the samples in any two years are independent, i.e. made up of different individuals
Possible solutions to finding a relationship between two variables in panel data
- add more explanatory variables
- focus on the second year and add a lagged dependent variable
- pool across years so we have more observations and therefore more power
- exploit the panel structure and think about the structure of the errors
What is the fixed effect part of the error term?
The part of the error term that is common to all of the observations relating to a particular city and doesn’t vary over time
Idiosyncratic error
The part of the error term that varies across cities and across time even for a given city
What is the first differenced equation?
It is where we subtract the first-period equation from the second-period equation, so that the fixed effect drops out
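As a sketch of the differencing step, assume a two-period model with a single explanatory variable x, fixed effect a_i and idiosyncratic error u_it. The period-2 intercept shift δ0 is an assumption added for generality, not something stated in these cards. Taking period 2 minus period 1:

```latex
\begin{align*}
y_{i2} &= (\beta_0 + \delta_0) + \beta_1 x_{i2} + a_i + u_{i2} \\
y_{i1} &= \beta_0 + \beta_1 x_{i1} + a_i + u_{i1} \\
\Delta y_i &= \delta_0 + \beta_1 \Delta x_i + \Delta u_i
\end{align*}
```

The fixed effect a_i appears in both periods, so it cancels in the difference.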
What are the advantages of the first differenced equation?
It reduces omitted variable bias, increases the accuracy of our estimates and increases the power of our hypothesis tests
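The bias-reduction point can be illustrated with a small simulation. This is a minimal sketch on made-up data, not an example from the course: the true slope of 2.0, the number of cities and the way x is tied to the fixed effect are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_two_period_panel(n_cities=5000, beta1=2.0, seed=0):
    """Simulate y_it = 1 + beta1 * x_it + a_i + u_it for t = 1, 2 (hypothetical data)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_cities)            # fixed effect a_i, unobserved
    x1 = a + rng.normal(size=n_cities)       # x is correlated with a_i in period 1
    x2 = a + rng.normal(size=n_cities)       # ... and in period 2
    y1 = 1.0 + beta1 * x1 + a + rng.normal(size=n_cities)
    y2 = 1.0 + beta1 * x2 + a + rng.normal(size=n_cities)
    return x1, x2, y1, y2

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))

x1, x2, y1, y2 = simulate_two_period_panel()

# Pooled OLS ignores a_i, so the correlation between x and a_i biases the slope.
pooled_slope = ols_slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

# First differencing removes a_i before the regression is run.
fd_slope = ols_slope(x2 - x1, y2 - y1)

print(f"pooled OLS slope:       {pooled_slope:.3f}")
print(f"first-difference slope: {fd_slope:.3f}")
```

In this setup the pooled slope drifts away from the true value because a_i is left in the error term, while the first-difference slope stays close to it.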
Downsides of panel data
- costly to collect
- sample attrition (some people included in earlier rounds aren’t found in later rounds)
- over time sample may become less representative
- differencing doesn’t help if the variables of interest don’t change or change very little over time
- the potential for omitted variable bias still exists (from omitted variables that change over time)
For the fixed effects estimation, what do we need to get unbiased estimates of the standard error of β1?
We need to address heteroscedasticity and we need the idiosyncratic errors to be serially uncorrelated
What is a drawback of first differencing and fixed effects estimations?
We can’t use them to investigate the effects of explanatory variables that are time invariant
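A short sketch shows why this happens. The panel below is hypothetical (3 cities, 4 years, invented variable names and values): both the within (fixed effects) transformation and first differencing turn a time-invariant variable into a column of zeros, leaving nothing to estimate its effect from.

```python
import numpy as np

# Hypothetical panel: 3 cities observed in 4 years, rows ordered by city then year.
city = np.repeat(np.arange(3), 4)
coastal = np.repeat(np.array([1.0, 0.0, 1.0]), 4)   # time-invariant dummy
unemployment = np.array([5.0, 5.5, 6.0, 5.8,
                         7.1, 6.9, 7.4, 7.0,
                         4.2, 4.0, 4.5, 4.8])       # varies over time

def within_transform(values, groups):
    """Subtract each group's time average from its own observations."""
    group_means = np.bincount(groups, weights=values) / np.bincount(groups)
    return values - group_means[groups]

coastal_demeaned = within_transform(coastal, city)
unemployment_demeaned = within_transform(unemployment, city)

# First differences within each city (one row per city, one column per year).
coastal_diff = np.diff(coastal.reshape(3, 4), axis=1)

print(coastal_demeaned)        # all zeros: no variation left
print(coastal_diff)            # all zeros as well
print(unemployment_demeaned)   # still varies, so its effect can be estimated
```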
When do we use GLS?
When estimating the random effects model, as long as N is large relative to T
What assumption do we have to make for the random effects model?
That all explanatory variables are uncorrelated with both the idiosyncratic errors u_it and the time-invariant unobservables a_i in every year
How does the fixed effects model compare to the random effects model?
Random effects is more efficient, giving better coefficient estimates and better inference, as long as its assumptions hold
What is the Hausman test?
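A test of whether the explanatory variables are correlated with the time-invariant unobservables, done by comparing the estimated coefficients on the time-varying explanatory variables in the FE and RE models. A large difference between the two sets of estimates suggests the random effects assumption does not hold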
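The comparison can be sketched with the usual textbook form of the statistic, H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), which is compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients compared. The coefficient and variance values below are invented for illustration, and `hausman_statistic` is a hypothetical helper, not a library function.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Quadratic form in the FE-RE coefficient difference (hypothetical helper)."""
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    return float(diff @ np.linalg.solve(v_diff, diff))

# Invented estimates for a single time-varying explanatory variable.
b_fe, v_fe = [0.50], [[0.0064]]   # fixed effects estimate and its variance
b_re, v_re = [0.30], [[0.0028]]   # random effects estimate and its variance

CHI2_1DF_5PCT = 3.841             # 5% critical value, chi-squared with 1 df

h = hausman_statistic(b_fe, b_re, v_fe, v_re)
reject_re = h > CHI2_1DF_5PCT

print(f"Hausman statistic: {h:.2f}")
print("Reject the RE assumption at 5%:", reject_re)
```

With these made-up numbers the statistic is well above the critical value, so the FE and RE estimates differ by more than sampling error would explain and fixed effects would be the safer choice.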