term 2: regression with panel data (revision flashcards)
what is a panel dataset?
a panel dataset contains observations on multiple entities, where each entity is observed at two or more points in time
what is a balanced panel?
no missing observations; that is, the variables are observed for all entities and all time periods
what is an unbalanced panel?
some observations are missing; that is, the variables are not observed for some entities in some time periods
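A tiny check of the two definitions, as a sketch: the panel is described by a set of hypothetical (entity, period) pairs for which data are present, and `is_balanced` is an illustrative helper, not a library function.

```python
def is_balanced(observed, entities, periods):
    """Return True if every entity is observed in every period.

    observed: set of (entity, period) pairs for which the data are present.
    entities, periods: the full lists of entities and time periods in the study.
    """
    return all((e, t) in observed for e in entities for t in periods)


entities, periods = ("A", "B"), (1, 2, 3)
full = {(e, t) for e in entities for t in periods}
print(is_balanced(full, entities, periods))                 # balanced panel
print(is_balanced(full - {("B", 3)}, entities, periods))    # B missing in period 3
```

The study's entities and periods are passed in explicitly so that a period missing for every entity is still detected.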
what can we control for with panel data?
we can control for factors that:
vary across entities but do not vary over time
could cause omitted variable bias if they are omitted
are unobserved or unmeasured, and therefore cannot be included in an ordinary multiple regression
how does panel data control for omitted variable bias?
if an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable
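A small simulation can illustrate this with two time periods. It is a sketch under assumed values (true B1 = 2, an omitted variable Z_i that is constant over time and correlated with X, standard normal noise, seed 0); none of these numbers come from the notes. Regressing the change Y_i2 - Y_i1 on X_i2 - X_i1 removes Z_i, while a pooled regression in levels that ignores Z_i does not.

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)
n, beta1, gamma = 500, 2.0, 3.0   # hypothetical true values

x1, x2, y1, y2 = [], [], [], []
for i in range(n):
    z = rng.gauss(0, 1)           # omitted variable, constant over time
    xa = z + rng.gauss(0, 1)      # X in period 1, correlated with Z
    xb = z + rng.gauss(0, 1)      # X in period 2, correlated with Z
    x1.append(xa)
    x2.append(xb)
    y1.append(beta1 * xa + gamma * z + rng.gauss(0, 1))
    y2.append(beta1 * xb + gamma * z + rng.gauss(0, 1))

# Pooled regression in levels ignores Z, so it suffers omitted variable bias.
pooled = ols_slope(x1 + x2, y1 + y2)

# Changes regression: gamma * Z_i cancels in Y_i2 - Y_i1.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
changes = ols_slope(dx, dy)
print(f"pooled: {pooled:.2f}, changes: {changes:.2f}")
```

With these assumed values the pooled slope should land near 2 + 3 × 1/2 = 3.5 (the omitted variable bias), while the changes slope should land near the true value of 2.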
what if there are more than two time periods?
if there are more than two time periods, you can write the regression in two equivalent ways:
1) the “n-1 binary regressor” regression model
2) the “fixed effects” regression model
express the fixed effects model in n-1 binary regressor form?
Y_it = B0 +B1X_it + s_2 D_2i +… + s_n D_ni + u_it
where D_2i = 1 if i = 2 and 0 otherwise, D_3i = 1 if i = 3 and 0 otherwise, and so on up to D_ni
express the fixed effects model in the fixed effects form?
Y_it = a_i + B1 X_it + u_it, where a_i is called a state (entity) fixed effect and is the constant effect of being in state i
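Plugging each entity's values of the binary regressors into the n-1 binary regressor form shows how the two formulations line up (same symbols as the two cards above; entity 1 is the omitted group):

```latex
\begin{aligned}
i = 1:&\quad Y_{1t} = B_0 + B_1 X_{1t} + u_{1t}
  &&\Rightarrow\; a_1 = B_0 \\
i \ge 2:&\quad Y_{it} = (B_0 + s_i) + B_1 X_{it} + u_{it}
  &&\Rightarrow\; a_i = B_0 + s_i
\end{aligned}
```

So both forms have the same slope B1; they differ only in how the entity-specific intercepts are written.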
when is the n-1 binary regressors OLS regression practical?
it is only practical when n isn't too big
how do you do n-1 binary regressors OLS regression?
first create the binary variables D_2i, …, D_ni
then estimate the n-1 binary regressor model by OLS
inference (hypothesis tests, confidence intervals) is as usual (using heteroskedasticity-robust standard errors)
this is impractical when n is large, e.g. n = 1000
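The recipe above can be sketched in plain Python with no libraries: build the constant, X and the binary regressors D_2i, …, D_ni, then solve the OLS normal equations. The panel (5 entities, 4 periods, true B1 = 1.5, entity effects 2i, seed 1) is made up for illustration, and the Gaussian-elimination solver is a hand-rolled helper; in practice you would use a statistics package.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]        # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):                     # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical panel: entities i = 1..n, T periods each.
rng = random.Random(1)
n, T, beta1 = 5, 4, 1.5
data = []                                  # (entity, X, Y) triples
for i in range(1, n + 1):
    a_i = 2.0 * i                          # entity fixed effect
    for t in range(T):
        x = 0.5 * i + rng.gauss(0, 1)      # X correlated with a_i
        data.append((i, x, a_i + beta1 * x + rng.gauss(0, 0.5)))

# Columns: constant, X, then D_2i ... D_ni (entity 1 is the omitted group).
rows = [[1.0, x] + [1.0 if i == j else 0.0 for j in range(2, n + 1)]
        for i, x, _ in data]
y = [yv for _, _, yv in data]
coef = ols(rows, y)
print("B0 =", round(coef[0], 3), "B1 =", round(coef[1], 3))
print("s_2..s_n =", [round(c, 3) for c in coef[2:]])
```

Because entity 1 is the omitted group, coef[0] estimates entity 1's intercept and coef[j] for j = 2, …, n estimates s_j, the shift of entity j's intercept relative to entity 1. The number of columns grows with n, which is why this approach becomes impractical for large n.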
how do you run an entity-demeaned OLS regression?
first construct the entity-demeaned variables Y~_it = Y_it - Ȳ_i and X~_it = X_it - X̄_i, where Ȳ_i and X̄_i are entity i's averages over time
then estimate the fixed effects model by regressing Y~_it on X~_it using OLS
(it is like the changes approach, but Y_it is deviated from the state average Ȳ_i instead of from Y_i1.)
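A minimal sketch of the entity-demeaned estimator, assuming the data arrive as (entity, X, Y) triples (a hypothetical layout; `entity_demeaned_slope` is an illustrative name). No intercept is needed after demeaning because both demeaned variables have mean zero within each entity.

```python
from collections import defaultdict


def entity_demeaned_slope(data):
    """Fixed effects (within) estimate of B1 from (entity, X, Y) triples.

    Subtracts each entity's own time average from X and Y, then runs OLS
    of Y~ on X~ without an intercept.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # entity -> [sum X, sum Y, count]
    for e, x, y in data:
        s = sums[e]
        s[0] += x
        s[1] += y
        s[2] += 1
    xbar = {e: s[0] / s[2] for e, s in sums.items()}
    ybar = {e: s[1] / s[2] for e, s in sums.items()}
    xt = [x - xbar[e] for e, x, _ in data]      # X~_it = X_it - Xbar_i
    yt = [y - ybar[e] for e, _, y in data]      # Y~_it = Y_it - Ybar_i
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)


# Hypothetical data where Y_it = a_i + 1.5 * X_it exactly, with a_i = 10 * i.
data = [(e, x, 10.0 * e + 1.5 * x) for e in range(3) for x in (1.0, 2.0, 4.0)]
print(entity_demeaned_slope(data))
```

Each entity is demeaned with its own number of observations, so the same function also works for an unbalanced panel.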
give examples of an omitted variable that might vary over time but not across states?
safer cars such as airbags; changes in national law
these produce intercepts that change over time
what are the two formulations of regression with time fixed effects?
T-1 binary regressor formulation
time effects formulation
what is the T-1 binary regressor formulation?
Y_it = B0 + B1X_it + 𝛿_2 B_2t + … + 𝛿_T B_Tt + u_it
where B_2t = 1 when t = 2 and 0 otherwise, and so on for B_3t up to B_Tt
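A sketch of how the time binary regressors are coded for one observation, with period 1 as the omitted base period; `time_dummies` is a hypothetical helper name.

```python
def time_dummies(t, T):
    """Binary regressors [B_2t, ..., B_Tt] for an observation in period t (1..T).

    Period 1 is the omitted base period, so it gets all zeros; in any other
    period exactly one entry (the one for period t) equals 1.
    """
    return [1 if t == s else 0 for s in range(2, T + 1)]


for t in range(1, 5):
    print(t, time_dummies(t, 4))
```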
what is the time effects formulation?
Y_it = B0 + B1X_it + 𝜆_t + u_it, where 𝜆_t is the time fixed effect for period t
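One way to estimate the time effects formulation, sketched below under the assumption that the data arrive as hypothetical (t, X, Y) triples, is to demean X and Y by their cross-sectional average in each period. This removes B0 + 𝜆_t, just as entity demeaning removes a_i. Because B0 and 𝜆_t enter only as a sum, the sketch recovers the combined intercept B0 + 𝜆_t for each period rather than the two pieces separately.

```python
from collections import defaultdict


def time_effects_fit(data):
    """Estimate Y_it = B0 + B1*X_it + lambda_t + u_it from (t, X, Y) triples.

    Returns B1 and a dict of the combined period intercepts B0 + lambda_t.
    """
    groups = defaultdict(list)
    for t, x, y in data:
        groups[t].append((x, y))
    # Cross-sectional means of X and Y in each period t.
    means = {t: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
             for t, obs in groups.items()}
    num = sum((x - means[t][0]) * (y - means[t][1]) for t, x, y in data)
    den = sum((x - means[t][0]) ** 2 for t, x, _ in data)
    b1 = num / den
    intercepts = {t: my - b1 * mx for t, (mx, my) in means.items()}
    return b1, intercepts


# Hypothetical data: Y_it = 5 + 2 * X_it + lambda_t exactly, with lambda_1 = 0.
lam = {1: 0.0, 2: 3.0, 3: -1.0}
data = [(t, x, 5.0 + 2.0 * x + lam[t]) for t in lam for x in (0.0, 1.0, 3.0)]
b1, intercepts = time_effects_fit(data)
print(b1, intercepts)
```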
under the panel data version of the least squares assumptions, is the OLS fixed effects estimator of B1 normally distributed?
yes, in large samples (large n) its sampling distribution is approximately normal
what are the fixed effects regression assumptions?
1) E(u_it | X_i1, …, X_iT, a_i) = 0
2) (X_i1, …, X_iT, u_i1, …, u_iT), i = 1, …, n are iid draws from their joint distribution
3) large outliers are unlikely: (X_it, u_it) have nonzero finite fourth moments
4) there is no perfect multicollinearity (relevant when there are multiple X's)
when is assumption 2 of the fixed effects regression assumptions satisfied?
(X_i1, …, X_iT, u_i1, …, u_iT), i = 1, …, n, are iid draws from their joint distribution.
it is satisfied when entities are randomly sampled from their population by simple random sampling
what does assumption 2 of the fixed effects regression assumptions not require?
the observations do not need to be iid over time for the same entity
what is autocorrelation?
autocorrelation means correlation with itself. suppose a variable Z is observed at different dates, so the observations are Z_t, t = 1, …, T; then Z is autocorrelated (serially correlated) if Corr(Z_t, Z_t+j) ≠ 0 for some lag j ≠ 0
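A sketch of the definition for a single series. `autocorr` is a hypothetical helper that computes Corr(Z_t, Z_t+j) as the ordinary Pearson correlation of the pairs (Z_t, Z_t+j); textbook sample autocorrelations often use the full-sample mean and variance instead, which gives slightly different numbers in short series. The AR(1) series (coefficient 0.8, seed 2) is made up for illustration.

```python
import math
import random


def autocorr(z, j):
    """Corr(Z_t, Z_{t+j}) for a lag j >= 1, as the Pearson correlation of the pairs."""
    a, b = z[:-j], z[j:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


# Hypothetical AR(1) series Z_t = 0.8 * Z_{t-1} + e_t, positively autocorrelated.
rng = random.Random(2)
z = [0.0]
for _ in range(999):
    z.append(0.8 * z[-1] + rng.gauss(0, 1))
print(round(autocorr(z, 1), 2), round(autocorr(z, 5), 2))
```

A series that flips sign every period, such as 1, -1, 1, -1, …, has lag-1 autocorrelation -1 and lag-2 autocorrelation +1.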
if omitted factors are serially correlated, what happens to the error term?
the error term is also serially correlated
why are the usual OLS standard errors in general wrong for panel data?
they assume that the error term is serially uncorrelated. in reality the error term is often serially correlated within an entity, and the usual OLS standard errors then often underestimate the true sampling uncertainty
what are clustered standard errors?
clustered standard errors estimate the variance of the estimator of B1 when the variables are iid across entities but are potentially autocorrelated within an entity
what is the equation for the clustered SE of Ȳ (the sample mean of Y)?
SE(Ȳ) = square root[s^2/n], where Ȳ_i is the sample mean of Y for entity i over its T time periods and s^2 = 1/(n-1) * Σ_i (Ȳ_i - Ȳ)^2 is the sample variance of the entity means
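A direct translation of the formula, assuming a balanced panel stored as a hypothetical dict from entity to its list of Y values (so Ȳ equals the average of the entity means); `clustered_se_of_mean` is an illustrative name.

```python
import math


def clustered_se_of_mean(panel):
    """Return (Ybar, clustered SE of Ybar) for a balanced panel.

    panel: dict mapping each entity to the list of its Y observations over time.
    Each entity is collapsed to its own mean first, so any serial correlation
    within an entity is absorbed into that mean rather than assumed away.
    """
    ybar_i = [sum(ys) / len(ys) for ys in panel.values()]   # entity means
    n = len(ybar_i)
    ybar = sum(ybar_i) / n
    s2 = sum((m - ybar) ** 2 for m in ybar_i) / (n - 1)     # variance of entity means
    return ybar, math.sqrt(s2 / n)


panel = {"A": [1.0, 3.0], "B": [5.0, 7.0], "C": [9.0, 11.0]}
print(clustered_se_of_mean(panel))
```

Here the entity means are 2, 6 and 10, so s^2 = (16 + 0 + 16)/2 = 16 and the clustered SE is the square root of 16/3.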
what is the key feature of clustered SEs?
in the clustered SE derivation we never assumed that observations are iid within an entity, so we have implicitly allowed for serial correlation within an entity
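The same idea applied to the fixed effects slope, as a sketch: after entity demeaning, the heteroskedasticity-robust variance squares each observation's score X~_it × û_it separately, while the clustered variance first sums the scores within each entity and then squares, which lets within-entity serial correlation enter. The function name, data layout and toy data are hypothetical, and both SEs omit the finite-sample corrections that statistical software usually applies.

```python
import math


def fe_slope_and_ses(panel):
    """Within estimate of B1 with robust and clustered SEs (no small-sample corrections).

    panel: dict mapping entity -> list of (X_it, Y_it) pairs.
    Returns (b1, heteroskedasticity-robust SE, clustered SE).
    """
    demeaned = {}
    for e, obs in panel.items():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        demeaned[e] = [(x - xbar, y - ybar) for x, y in obs]
    sxx = sum(x * x for obs in demeaned.values() for x, _ in obs)
    b1 = sum(x * y for obs in demeaned.values() for x, y in obs) / sxx
    # Robust: square each observation's score X~ * u^ on its own.
    robust = sum((x * (y - b1 * x)) ** 2 for obs in demeaned.values() for x, y in obs)
    # Clustered: sum the scores within each entity, then square.
    clustered = sum(sum(x * (y - b1 * x) for x, y in obs) ** 2 for obs in demeaned.values())
    return b1, math.sqrt(robust) / sxx, math.sqrt(clustered) / sxx


panel = {"state1": [(0.0, 0.0), (2.0, 3.0)], "state2": [(0.0, 1.0), (2.0, 2.0)]}
b1, se_robust, se_clustered = fe_slope_and_ses(panel)
print(b1, se_robust, se_clustered)
```

The only difference between the two SEs is where the squaring happens: if the within-entity cross products of the scores are zero, the two coincide.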
why might panel data help?
it controls for potential OV bias from variables that vary across states but are constant over time
it controls for potential OV bias from variables that vary over time but are constant across states
what are the advantages of fixed effects regression?
you can control for unobserved variables that vary across states but not over time and/or vary over time but not across states
more observations give you more information
estimation involves relatively straightforward extensions of multiple regression
what are the limitations of regression with fixed effects?
need variation in X over time within entities
time lag effects can be important