Panel data Flashcards
what is panel data
a panel data set contains repeated observations over time for individuals, firms, countries etc
what is heterogeneity
the quality or state of being diverse in character or content
what is the main advantage of panel data
we are able to allow for certain forms of unobserved individual heterogeneity that are constant over time, which cannot be done with a single cross-section or a single time series
what is the panel data equation
yit = xit’β + vit
= xit’β + αi + uit,
αi are unobserved individual effects that are constant over time,
uit is the idiosyncratic shock
how do Pooled OLS and Random Effects GLS deal with panel data
Pooled OLS and Random Effects GLS are biased and inconsistent if the unobserved fixed individual components αi are correlated with the explanatory variables xit
why are Fixed-Effects OLS and First-Differenced OLS better with panel data
they are also consistent when there is correlation between the αi and the xit.
They solve the endogeneity without need for instrumental variables
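A minimal simulation sketch of this point, assuming a single regressor and made-up parameter values (n, T, β and the way xit loads on αi are all illustrative choices, not from the notes). Pooled OLS should drift away from the true β, while the fixed-effects and first-differenced estimates should stay close to it:

```python
import random

rng = random.Random(0)
n, T, beta = 500, 5, 1.0   # illustrative values only

# Simulate y_it = beta * x_it + alpha_i + u_it, with x_it correlated with alpha_i.
x, y = [], []
for i in range(n):
    alpha = rng.gauss(0, 1)
    xi = [alpha + rng.gauss(0, 1) for _ in range(T)]
    yi = [beta * xit + alpha + rng.gauss(0, 1) for xit in xi]
    x.append(xi)
    y.append(yi)

def slope(xs, ys):
    """OLS slope of ys on xs with an intercept (single regressor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Pooled OLS: stack all n*T observations and ignore alpha_i.
pooled = slope([v for xi in x for v in xi], [v for yi in y for v in yi])

# Fixed effects: demean x and y within each individual, then run OLS.
xd = [v - sum(xi) / T for xi in x for v in xi]
yd = [v - sum(yi) / T for yi in y for v in yi]
fe = slope(xd, yd)

# First differences: x_it - x_i,t-1 and y_it - y_i,t-1 also remove alpha_i.
dx = [xi[t] - xi[t - 1] for xi in x for t in range(1, T)]
dy = [yi[t] - yi[t - 1] for yi in y for t in range(1, T)]
fd = slope(dx, dy)

print(f"pooled={pooled:.3f}  FE={fe:.3f}  FD={fd:.3f}  (true beta={beta})")
```

In this design the pooled slope converges to β + cov(xit, αi)/var(xit) = 1.5 rather than 1, which is the bias the card describes.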
what does the pooled OLS do
simply treats the panel as a very large cross-section with nT observations
what are the four estimators for panel data
pooled OLS,
random effects GLS,
Fixed effects OLS,
First differenced OLS
what is the main condition for pooled OLS
E(vi|Xi)=0,
this means E(αi|Xi)=0 and E(uit|xit)=0,
for pooled OLS to be consistent there cannot be correlation between the unobserved fixed component αi and xit
for pooled OLS with vit=αi+uit what is the variance
E(vitvis)≠0 for s≠t,
if E(uituis)=0 for s≠t then,
E(vitvis)=E(αi^2)=σα^2 for s≠t (α is a subscript),
standard errors that do not take the serial correlation of vit into account will be wrong (try deriving this from E(vitvis) in the notes)
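A sketch of that derivation, under the added assumptions that αi and uit have mean zero and are uncorrelated with each other, and writing σ_α² for E(αi²):

```latex
\begin{aligned}
E(v_{it} v_{is}) &= E\big[(\alpha_i + u_{it})(\alpha_i + u_{is})\big] \\
&= E(\alpha_i^2) + E(\alpha_i u_{is}) + E(u_{it}\alpha_i) + E(u_{it}u_{is}) \\
&= \sigma_\alpha^2 + 0 + 0 + 0 = \sigma_\alpha^2 \qquad \text{for } s \neq t .
\end{aligned}
```

So the composite errors of the same individual are correlated across periods whenever σ_α² > 0, even if the uit themselves are not.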
do you need cluster robust standard errors for pooled OLS
yes, standard errors that do not take the serial correlation of vit into account will be wrong,
need cluster-robust standard errors unless σα^2=0 (α is a subscript)
what is the main assumption of the random effects GLS
E(vi|Xi)=0,
E(αi|Xi)=0 and xit strictly exogenous,
E(uit|xis)=0
how is the random effects GLS better than pooled OLS
it improves on the efficiency of pooled OLS by exploiting the within-individual (cluster) correlation of vit
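what is the process of random effects GLS
it takes a stylised error model, assumes homoskedasticity and no serial correlation in uit, derives the implied variance-covariance matrix and plugs an estimate of it into the GLS estimator (like WLS, it reweights the data by the inverse of the error variance)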
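One way to see the GLS step is as OLS on quasi-demeaned data, yit − θ·ȳi on xit − θ·x̄i, with θ = 1 − sqrt(σu² / (σu² + T·σα²)). The sketch below assumes a single mean-zero regressor, no intercept, and variance components that are treated as known (in practice they would be estimated); all numbers are illustrative:

```python
import math
import random

rng = random.Random(1)
n, T, beta = 400, 4, 1.0
sigma_a2, sigma_u2 = 1.0, 1.0   # variance components, treated as known here

# Random-effects world: alpha_i is uncorrelated with x_it.
x = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(n)]
y = []
for xi in x:
    alpha = rng.gauss(0, math.sqrt(sigma_a2))
    y.append([beta * v + alpha + rng.gauss(0, math.sqrt(sigma_u2)) for v in xi])

def quasi_demeaned_ols(x, y, theta):
    """OLS (no intercept) of y_it - theta*ybar_i on x_it - theta*xbar_i."""
    sxy = sxx = 0.0
    for xi, yi in zip(x, y):
        xbar, ybar = sum(xi) / len(xi), sum(yi) / len(yi)
        for a, b in zip(xi, yi):
            xs, ys = a - theta * xbar, b - theta * ybar
            sxy += xs * ys
            sxx += xs * xs
    return sxy / sxx

theta = 1 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))
re_gls = quasi_demeaned_ols(x, y, theta)
pooled = quasi_demeaned_ols(x, y, 0.0)   # theta = 0 is pooled OLS
within = quasi_demeaned_ols(x, y, 1.0)   # theta = 1 is the within (FE) estimator
print(f"theta={theta:.3f}  RE={re_gls:.3f}  pooled={pooled:.3f}  FE={within:.3f}")
```

The two limiting cases show where random effects sits: θ = 0 reproduces pooled OLS and θ = 1 reproduces the within estimator.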
what are the assumptions for the consistency and efficiency of random effects GLS (equations)
E(vi|Xi)=0
E(αi|Xi)=0
E(ui|Xi)=0
E(vivi’|Xi)=Ω
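For reference, a sketch of the structure usually assumed for Ω in the random-effects model, under the assumptions that uit is homoskedastic and serially uncorrelated with variance σ_u², that αi has variance σ_α², and that the two are uncorrelated (ι_T denotes a T×1 vector of ones):

```latex
\Omega = E(v_i v_i' \mid X_i)
= \sigma_u^2 I_T + \sigma_\alpha^2 \iota_T \iota_T'
= \begin{pmatrix}
\sigma_\alpha^2 + \sigma_u^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\
\sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_u^2 & \cdots & \sigma_\alpha^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_u^2
\end{pmatrix}
```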
what is the case in a random effects GLS when var=Ω
when var=Ω the estimator is efficient
what is the case in random effects GLS when var≠Ω
if not, as long as strict exogeneity holds (together with E(αi|Xi)=0) the estimator is still consistent and asymptotically normal but no longer efficient
what are the assumptions in words for the efficiency of random effects GLS
strict exogeneity,
no correlation between α and x (E(αi|Xi)=0),
specific variance covariance structure = Ω
what is the main thing that the fixed effects OLS allows for
correlation between αi and the regressors,
E(αi|Xi)≠0,
αi can be any (unspecified) function of Xi
how does fixed effects solve the endogeneity problem of alpha
by estimating the model including n individual intercepts, or a constant and n-1 dummies
what is still assumed for the fixed effects OLS even though it allows for correlation between α and x
FE OLS estimator is consistent, allowing for general, unspecified correlations between α and x, under the assumption of strict exogeneity of the regressors,
E(uit|xis)=0
does the full model with n intercepts need to be estimated for fixed effects OLS
no,
the estimator only uses the deviations of the variables from their individual-specific means (the within transformation), which removes αi
describe fixed effects ols in a nice sentence
we simply regress the dependent variable, in deviations from its individual-specific mean, on the regressors, in deviations from their respective individual-specific means
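A tiny worked sketch of that sentence, using a made-up noise-free panel (two individuals, three periods, true slope 2, individual effects 10 and −5). Because there is no noise, the within estimator recovers the slope exactly:

```python
# Toy panel, values invented for illustration: y_it = 2 * x_it + alpha_i.
x = {1: [1.0, 2.0, 3.0], 2: [2.0, 4.0, 9.0]}
alpha = {1: 10.0, 2: -5.0}
y = {i: [2.0 * v + alpha[i] for v in xs] for i, xs in x.items()}

def demean(values):
    """Deviations from the mean of one individual's series."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Within transformation: demean x and y separately for each individual.
xd = [v for i in x for v in demean(x[i])]
yd = [v for i in x for v in demean(y[i])]

# OLS on the demeaned data (no intercept needed, both series have mean zero).
fe = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
print(fe)  # 2.0, the true slope
```

The individual effects never appear in the regression: subtracting each individual's own mean removes αi from both sides.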
what is one disadvantage of fixed effects ols compared to first differenced OLS
fixed effects OLS needs strict exogeneity, so it requires no correlation between xit and any of the idiosyncratic shocks uis,
so this excludes feedback, i.e. inputs xit must not respond to shocks in the past,
E(xituis)=0 for all s and t
what is an assumption of fixed effects OLS in order for it to be efficient
If strict exogeneity holds, E(ui|Xi)=0, and the ui are further conditionally homoskedastic and not serially correlated,
E(uiui’|Xi)=σu^2 I_T (I_T is the T×T identity matrix),
fixed effects is the efficient estimator when allowing for general correlations between α and x
how does first differenced OLS solve the α issue
taking first differences over time for an individual eliminates αi
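Written out, using the model yit = xit’β + αi + uit from the earlier card:

```latex
\begin{aligned}
y_{it} - y_{i,t-1} &= (x_{it} - x_{i,t-1})'\beta + (\alpha_i - \alpha_i) + (u_{it} - u_{i,t-1}) \\
\Delta y_{it} &= \Delta x_{it}'\beta + \Delta u_{it}, \qquad t = 2, \dots, T .
\end{aligned}
```

Because αi does not vary over time, it cancels in the difference, and OLS on the differenced equation no longer involves it.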
what are the consequences of first difference OLS to do with exogeneity
it needs a weaker assumption than strict exogeneity: Δxit must be uncorrelated with Δuit,
cannot have ut-1 affecting xt but can have ut-2 affecting xt,
first differencing drops the first period of each individual, so n observations are lost and n(T-1) remain
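A small sketch of the first-differenced estimator on a made-up noise-free panel (two individuals, three periods, true slope 2). Each individual contributes T − 1 differences, so n(T − 1) = 4 observations are used:

```python
# Toy panel, values invented for illustration: y_it = 2 * x_it + alpha_i.
x = {1: [1.0, 2.0, 3.0], 2: [2.0, 4.0, 9.0]}
alpha = {1: 10.0, 2: -5.0}
y = {i: [2.0 * v + alpha[i] for v in xs] for i, xs in x.items()}

def diff(values):
    """First differences z_t - z_{t-1}; the first period is dropped."""
    return [values[t] - values[t - 1] for t in range(1, len(values))]

# Difference within each individual, then stack.
dx = [v for i in x for v in diff(x[i])]
dy = [v for i in x for v in diff(y[i])]

# OLS of dy on dx without an intercept (alpha_i has been differenced out).
fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(len(dx), fd)  # 4 2.0
```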
do you have to do clustering or robust standard errors for first differenced
yes, use cluster-robust standard errors,
they allow for general serial correlation patterns and general forms of heteroskedasticity in the differenced errors
how are common time effects taken care of in the panel data estimators
common time effects can be taken care of by time specific coefficients, inclusion of time dummies:
yit=xit’β+λt+αi+uit
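For a balanced panel, including both individual and time dummies gives the same slope as the two-way within transformation zit − z̄i − z̄t + z̄. A simulation sketch, assuming a single regressor and invented parameter values, where xit is correlated with both αi and λt:

```python
import random

rng = random.Random(2)
n, T, beta = 300, 6, 1.0                       # illustrative values only
lam = [rng.gauss(0, 2) for _ in range(T)]      # common time effects lambda_t

x, y = [], []
for i in range(n):
    alpha = rng.gauss(0, 1)
    # x_it is correlated with both alpha_i and lambda_t
    xi = [alpha + lam[t] + rng.gauss(0, 1) for t in range(T)]
    yi = [beta * xi[t] + lam[t] + alpha + rng.gauss(0, 1) for t in range(T)]
    x.append(xi)
    y.append(yi)

def two_way_demean(z):
    """z_it - zbar_i - zbar_t + zbar (valid for a balanced panel)."""
    n, T = len(z), len(z[0])
    row = [sum(zi) / T for zi in z]
    col = [sum(z[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row) / n
    return [z[i][t] - row[i] - col[t] + grand for i in range(n) for t in range(T)]

xd, yd = two_way_demean(x), two_way_demean(y)
b = sum(a * c for a, c in zip(xd, yd)) / sum(a * a for a in xd)
print(f"two-way within estimate: {b:.3f} (true beta = {beta})")
```

With an unbalanced panel the simple double-demeaning formula no longer applies and explicit time dummies are the safer route.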
what do clustered standard errors do
they allow for arbitrary correlation between the errors of observations within the same cluster (states, schools or individuals), as well as heteroskedasticity, while treating different clusters as independent; used when each unit is observed repeatedly over time
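A sketch of the idea for pooled OLS with one regressor and no intercept, on simulated data with invented parameters. The cluster-robust variance sums the products xit·v̂it within each individual before squaring, whereas the conventional formula treats every observation as independent. No finite-sample correction is applied here, which is a simplification:

```python
import math
import random

rng = random.Random(3)
n, T, beta = 300, 5, 1.0   # illustrative values only

# x_it has a persistent individual component z_i that is independent of alpha_i,
# so pooled OLS is consistent but v_it = alpha_i + u_it is serially correlated.
x, y = [], []
for i in range(n):
    z, alpha = rng.gauss(0, 1), rng.gauss(0, 1)
    xi = [z + rng.gauss(0, 1) for _ in range(T)]
    x.append(xi)
    y.append([beta * v + alpha + rng.gauss(0, 1) for v in xi])

# Pooled OLS without intercept (all variables have mean zero by construction).
sxx = sum(v * v for xi in x for v in xi)
b = sum(a * c for xi, yi in zip(x, y) for a, c in zip(xi, yi)) / sxx
resid = [[c - b * a for a, c in zip(xi, yi)] for xi, yi in zip(x, y)]

# Conventional SE: assumes v_it is homoskedastic and uncorrelated over time.
s2 = sum(e * e for ri in resid for e in ri) / (n * T - 1)
se_naive = math.sqrt(s2 / sxx)

# Cluster-robust SE: sum x_it * vhat_it within each individual first,
# then square, so any within-individual correlation is kept.
meat = sum(sum(a * e for a, e in zip(xi, ri)) ** 2 for xi, ri in zip(x, resid))
se_cluster = math.sqrt(meat) / sxx

print(f"b={b:.3f}  naive SE={se_naive:.4f}  cluster SE={se_cluster:.4f}")
```

In this design the conventional standard error understates the sampling variability, so the clustered one should come out noticeably larger.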
which estimators need cluster robust standard errors
pooled OLS and first-differenced OLS always,
RE GLS and FE OLS do if you want to allow for heteroskedasticity or serial correlation in uit, as their conventional standard errors assume homoskedastic, serially uncorrelated errors