Lecture 10 Flashcards
Difference between this week's and last week's regression models
We now extend the model to the case with more than 2 time periods
- ai is the unobserved, individual-specific fixed effect
- uit is the idiosyncratic error term, which varies over time and is assumed to be mean independent of xit
How do we first difference now, with T time periods?
To remove ai, subtract the (t-1)th equation from the tth one, for t = 2,…,T
- we now have a new pooled regression in differences, with redefined parameters
- pooled OLS in the differenced regression will be consistent if E[change(uit)|change(xit)] = 0, which is implied by the strict exogeneity assumption E[uit|xi1,…,xiT] = 0
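The differencing step above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the notes: a single regressor, no intercept, a true B1 of 2, and xit built to be correlated with ai. The names `simulate_panel`, `first_difference_slope` and `pooled_levels_slope` are hypothetical helpers.

```python
import random


def simulate_panel(n_units=500, n_periods=4, beta1=2.0, seed=0):
    """Simulate y_it = beta1 * x_it + a_i + u_it with x_it correlated with a_i."""
    rng = random.Random(seed)
    panel = {}
    for i in range(n_units):
        a_i = rng.gauss(0, 1)                # unobserved fixed effect
        obs = []
        for _ in range(n_periods):
            x = a_i + rng.gauss(0, 1)        # assumption: x depends on a_i
            u = rng.gauss(0, 1)              # idiosyncratic error
            obs.append((x, beta1 * x + a_i + u))
        panel[i] = obs
    return panel


def pooled_levels_slope(panel):
    """Pooled OLS slope (no intercept) in levels; ignores a_i."""
    sxy = sum(x * y for obs in panel.values() for x, y in obs)
    sxx = sum(x * x for obs in panel.values() for x, _ in obs)
    return sxy / sxx


def first_difference_slope(panel):
    """Pooled OLS slope (no intercept) on first-differenced data."""
    num = den = 0.0
    for obs in panel.values():
        # Pair each period with the previous one: (t-1, t).
        for (x_prev, y_prev), (x_curr, y_curr) in zip(obs, obs[1:]):
            dx, dy = x_curr - x_prev, y_curr - y_prev   # a_i cancels here
            num += dx * dy
            den += dx * dx
    return num / den


panel = simulate_panel()
beta_pooled = pooled_levels_slope(panel)   # picks up the correlation with a_i
beta_fd = first_difference_slope(panel)    # should be close to the true 2.0
print(round(beta_pooled, 3), round(beta_fd, 3))
```

In this setup the levels regression is pulled away from 2 because ai sits in the error and is correlated with xit, while the differenced regression is not affected by ai.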
The role of autocorrelation
- how does this change for repeated cross sections and panel data?
If errors in one time period are correlated with errors in a different time period, the errors are autocorrelated
- in repeated cross sections, a new sample is drawn independently in each period, which means errors from different time periods are uncorrelated
- in panel data, the same individuals are followed over time, so the outcomes of an individual in one period are likely to be correlated with their outcomes in another
This means the usual SEs will be biased
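When will the first-differenced errors be autocorrelated?
First differencing creates overlapping terms in the differenced errors: change(uit) and change(ui,t-1) both contain ui,t-1, with opposite signs. When uit itself is serially uncorrelated, this introduces negative autocorrelation in change(uit)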
- with an autoregressive model, autocorrelation becomes stronger when errors follow a persistent process.
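The overlap argument can be written out as a short derivation. It covers only the simplest case, assuming uit is serially uncorrelated with constant variance \sigma_u^2 (an assumption made here for illustration, not the autoregressive case):

```latex
\begin{aligned}
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1})
  &= \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-1} - u_{i,t-2}) \\
  &= -\operatorname{Var}(u_{i,t-1}) = -\sigma_u^2, \\
\operatorname{Var}(\Delta u_{it}) &= 2\sigma_u^2, \\
\operatorname{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) &= -\tfrac{1}{2}.
\end{aligned}
```

The only surviving term is the shared u_{i,t-1}, which enters the two differences with opposite signs.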
But what are the consequences of autocorrelation?
- variances of OLS estimators will contain additional terms due to autocorrelation
- HR (heteroskedasticity-robust) SEs rely on the assumption that all observations in our sample are mutually independent, but autocorrelation violates this
- therefore, HR SEs become inconsistent in the presence of autocorrelation
With panel data, how can we adjust the HR SEs to account for autocorrelation?
- in panel data, each unit (say a city, individual or firm) has T observations across time
- within a unit, observations are likely to be autocorrelated, forming a cluster for that unit
- clustered SEs account for these intra-cluster dependencies, while assuming independence between clusters
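To make the clustering idea concrete, here is a minimal sketch for a single-regressor model without an intercept. It compares the HR variance, which sums squared scores observation by observation, with the clustered variance, which first sums the scores within each unit. No small-sample corrections are applied, and the function name `slope_with_ses` and the toy numbers are hypothetical.

```python
import math


def slope_with_ses(panel):
    """OLS slope (no intercept) with HR and cluster-robust standard errors.

    panel: dict mapping unit id -> list of (x, y) observations over time.
    Returns (beta, se_hr, se_clustered). No finite-sample adjustment is used.
    """
    sxx = sum(x * x for obs in panel.values() for x, _ in obs)
    sxy = sum(x * y for obs in panel.values() for x, y in obs)
    beta = sxy / sxx

    hr_meat = 0.0   # sum over observations of (x * residual)^2
    cl_meat = 0.0   # sum over units of (sum over time of x * residual)^2
    for obs in panel.values():
        scores = [x * (y - beta * x) for x, y in obs]
        hr_meat += sum(s * s for s in scores)
        cl_meat += sum(scores) ** 2

    return beta, math.sqrt(hr_meat) / sxx, math.sqrt(cl_meat) / sxx


# Toy panel (made-up numbers): residuals share a sign within each unit,
# so observations inside a cluster are positively correlated.
toy = {
    "a": [(1.0, 1.2), (1.0, 1.3)],
    "b": [(2.0, 1.7), (2.0, 1.8)],
    "c": [(3.0, 3.2), (3.0, 3.1)],
}
beta, se_hr, se_cl = slope_with_ses(toy)
print(beta, se_hr, se_cl)
```

When each unit has only one observation the two formulas coincide, which matches the idea that clustering only matters through within-unit dependence.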
Fixed effects/Within estimator:
-> yit = B1xit + ai + uit
-> mean(yi) = B1mean(xi) + ai + mean(ui)
Subtract bottom from top:
yit’’ = B1xit’’ + uit’’, where yit’’ = yit - mean(yi) is the time-demeaned value (and likewise for xit’’ and uit’’)
- unbiased/consistent if E[uit’’|xit’’] = 0, which is implied by the strict exogeneity assumption
- we need variation in xit over time within individuals; if xit is constant over time, the deviations from the mean are all 0 and B1 cannot be estimated
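The within transformation can be sketched in a few lines. This is an illustration for the single-regressor model only; `within_slope` is a hypothetical helper and the toy data are made up so that y = 2x + ai holds exactly.

```python
def within_slope(panel):
    """Within (fixed effects) slope: OLS on time-demeaned x and y.

    panel: dict mapping unit id -> list of (x, y) observations over time.
    """
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)   # a_i drops out of y - y_bar
            den += (x - x_bar) ** 2
    if den == 0.0:
        # No unit has any time variation in x, so B1 is not identified.
        raise ValueError("x does not vary over time within any unit")
    return num / den


# Made-up data: unit_1 has a_i = 5, unit_2 has a_i = -3, slope is 2.
toy_panel = {
    "unit_1": [(1.0, 7.0), (2.0, 9.0), (4.0, 13.0)],
    "unit_2": [(0.0, -3.0), (3.0, 3.0), (6.0, 9.0)],
}
beta_within = within_slope(toy_panel)
print(beta_within)
```

The `ValueError` branch mirrors the last bullet: with no variation in xit over time, every deviation from the mean is 0 and the slope is undefined.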
Within or first-differencing estimator - which to use?
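If T = 2, the two estimators are identical
- under FE.5 and FE.6 (uit homoskedastic and serially uncorrelated), the within estimator is BLUE for any T >= 2, and in large samples the usual t and F statistics can be used
- under FD.5 and FD.6 (change(uit) homoskedastic and serially uncorrelated), the first-difference estimator is BLUE for T >= 2, and again in large samples the usual t and F statistics can be used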
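A short derivation shows why the two estimators coincide when T = 2. It assumes the single-regressor model without an intercept or time effects, and writes the time-demeaned variables (xit’’ in the notes) as \ddot{x}_{it}:

```latex
\begin{aligned}
\bar{x}_i &= \tfrac{1}{2}(x_{i1} + x_{i2}), \qquad
\ddot{x}_{i1} = -\tfrac{1}{2}\Delta x_i, \qquad
\ddot{x}_{i2} = \tfrac{1}{2}\Delta x_i, \\
\sum_{t=1}^{2} \ddot{x}_{it}\,\ddot{y}_{it} &= \tfrac{1}{2}\,\Delta x_i\,\Delta y_i, \qquad
\sum_{t=1}^{2} \ddot{x}_{it}^{2} = \tfrac{1}{2}\,(\Delta x_i)^2, \\
\hat{\beta}_1^{FE}
  &= \frac{\sum_i \sum_t \ddot{x}_{it}\,\ddot{y}_{it}}{\sum_i \sum_t \ddot{x}_{it}^{2}}
   = \frac{\sum_i \Delta x_i\,\Delta y_i}{\sum_i (\Delta x_i)^2}
   = \hat{\beta}_1^{FD}.
\end{aligned}
```

Here \Delta x_i = x_{i2} - x_{i1}, and the factor of one half cancels between numerator and denominator.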
Scenario 1: change(ui2) and change(ui3) are uncorrelated
The usual SEs can be used
Scenario 2: change(ui2) and change(ui3) are correlated
The differenced errors are autocorrelated, so we must use clustered SEs
How do we derive the large-sample distribution of the estimator and its standard errors?
- start from B1^ = (…/…)
- sub in change(Yit) = B1change(xit) + change(uit)
- introduce Vi_ and Zi_
- can now use the CLT to derive the large-sample distribution
- the variance takes a different form depending on whether the errors are autocorrelated or not
Random effects model
Unlike the fixed effects model, the RE model assumes that ai is uncorrelated with xit across all time periods
- under this assumption, ai is treated like a random variable rather than a fixed parameter
- if this holds, the RE estimator is more efficient than FE because it uses both within-group and between-group variation in xit
Within-group variation shows
How changes in xit over time affect yit for the same individual
Between-group variation shows
How differences in xit between individuals affect the outcome
RE: the errors in the regression are autocorrelated because the same ai is present in the composite error vit = ai + uit in every time period for an individual. How do we deal with this?
- the covariance of vit and vis is not 0 for different time periods t and s, so vit and vis are autocorrelated
- use a suitable transformation: subtract a fraction theta of the individual's time average from each variable (quasi-demeaning)
- this leaves serially uncorrelated errors; theta determines the degree of adjustment
- estimate theta by estimating the variance components using residuals from pooled OLS, compute theta, apply the transformation and run OLS
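As a sketch of the transformation step, the snippet below uses the textbook formula theta = 1 - sqrt(var(u) / (var(u) + T * var(a))), which is not written out in the notes and is stated here as an assumption. The variance components are taken as given inputs; `re_theta` and `quasi_demean` are hypothetical helper names.

```python
import math


def re_theta(sigma2_u, sigma2_a, n_periods):
    """Quasi-demeaning weight for random effects.

    sigma2_u: variance of the idiosyncratic error u_it
    sigma2_a: variance of the individual effect a_i
    n_periods: number of time periods T per unit
    """
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + n_periods * sigma2_a))


def quasi_demean(values, theta):
    """Subtract theta times the unit's time average from each observation."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


theta_none = re_theta(1.0, 0.0, 5)    # no individual effect: theta = 0 (pooled OLS)
theta_half = re_theta(1.0, 1.0, 3)    # 1 - sqrt(1/4) = 0.5
transformed = quasi_demean([1.0, 2.0, 3.0], theta_half)
print(theta_none, theta_half, transformed)
```

With theta = 0 nothing is subtracted and RE reduces to pooled OLS; as theta approaches 1 the transformation approaches full demeaning, that is, the within estimator.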
FE or RE?
If cov(ai,xit) = 0
- both FE and RE are consistent
- but RE is more efficient
If not
- FE still consistent
- RE is biased and inconsistent
So can we determine whether cov(ai,xit) is 0 or not?
Hausman test
To decide which model to use:
H0: Cov(ai,xit) = 0, if true then RE is BLUE
H1: not 0, so only FE is consistent
t test: t = (B1^re - B1^fe) / se(B1^re - B1^fe)
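The statistic can be computed as in the sketch below. One assumption goes beyond the notes: the variance of the difference is taken as Var(B1^fe) - Var(B1^re), which is the standard result under H0 when RE is efficient. The function name `hausman_t` and the input numbers are hypothetical.

```python
import math


def hausman_t(beta_fe, se_fe, beta_re, se_re):
    """t-version of the Hausman test for a single coefficient.

    Assumes Var(beta_re - beta_fe) = se_fe^2 - se_re^2, valid under H0
    when the RE estimator is efficient.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then not computable.
        raise ValueError("estimated variance of the difference is not positive")
    return (beta_re - beta_fe) / math.sqrt(var_diff)


# Made-up estimates purely for illustration.
t_stat = hausman_t(beta_fe=1.50, se_fe=0.20, beta_re=1.20, se_re=0.12)
reject_h0 = abs(t_stat) > 1.96   # two-sided test at the 5% level, large samples
print(t_stat, reject_h0)
```

A large |t| is evidence against cov(ai,xit) = 0 and so points towards FE; otherwise RE is retained.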