Pooled Cross-sectional Analysis & Simple Panel Data Methods Flashcards
List the characteristics of pooled-cross-sectional data.
- An independent pooled cross-section consists of cross-section samples that have been randomly drawn at different time periods.
- With PCS, often a goal is to see how the mean value of a variable has changed over time in ways that cannot be explained by observable variables.
- Data obtained by pooling cross-sections are very useful for establishing trends and conducting policy analysis.
- Regression analysis is based on the classical assumptions of OLS linear regression. The resulting estimation methods are referred to as pooled OLS (POLS).
What is the importance of including period dummies (time effects/components) in PCS analysis? (Refer to the slides for generic representations of the regression equation)
- If we have T years of data, we include T-1 year dummies, which allows the intercept to change over time. The excluded (base) period is typically the first; however, this is not a set rule.
- Generally, the analysis begins by examining whether intercepts change across time.
- The PCS framework also allows us to test whether slope coefficients change across time (via interacting time dummies with the independent variable of interest).
- If the entire regression function (the intercept and every slope) changes over time, pooling offers no advantage over estimating each period separately.
- An F test (possibly made robust to heteroscedasticity) is used to test whether the intercepts change over time.
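The slides are not reproduced in these cards, so the following is only a sketch of a generic representation. It assumes k explanatory variables, year dummies d2, ..., dT with the first period as the base, and symbols (delta for the period effects, gamma for the interaction) chosen here for illustration:

```latex
y_{it} = \beta_0 + \delta_2\, d2_t + \dots + \delta_T\, dT_t
       + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + u_{it}

% Intercepts do not change over time under:
H_0:\ \delta_2 = \delta_3 = \dots = \delta_T = 0

% Letting the slope on x_{itj} differ in period s adds an interaction term:
\dots + \gamma_{sj}\,(ds_t \cdot x_{itj}) + \dots
```

The F test on the card is the joint test of H0; a significant gamma would indicate that the effect of that variable has changed relative to the base period.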
How would you implement the Chow test to test for structural change in a two-year framework? Method 1
- The Chow test is used to determine whether the regression function (intercept and slopes) differs across two time periods.
- Obtain the restricted SSR from the pooled estimation (SSRp). Calculate the unrestricted SSR from the two separate regressions (one for each time period): SSRur = SSRt’78 + SSRt’80.
- To test the hypothesis that the coefficients in the model are the same across the two time periods, compute the F-statistic (the paper flashcard has the formula).
- This test is valid only when the error terms are assumed to be homoscedastic.
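The steps above can be sketched in code. This is a minimal illustration, not the formula from the paper flashcard: it assumes a single explanatory variable (k = 1), uses the standard Chow statistic F = [(SSRp - SSRur)/(k+1)] / [SSRur/(n - 2(k+1))], and runs on made-up numbers. The function names `ssr_simple_ols` and `chow_f` are hypothetical.

```python
def ssr_simple_ols(x, y):
    """Sum of squared residuals from regressing y on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2):
    """Chow F statistic for two periods with one regressor (k = 1)."""
    k = 1
    ssr_p = ssr_simple_ols(x1 + x2, y1 + y2)                   # restricted: pooled
    ssr_ur = ssr_simple_ols(x1, y1) + ssr_simple_ols(x2, y2)   # unrestricted: by period
    n = len(x1) + len(x2)
    q = k + 1               # restrictions: equal intercepts and equal slopes
    df = n - 2 * (k + 1)    # degrees of freedom of the unrestricted model
    return ((ssr_p - ssr_ur) / q) / (ssr_ur / df)


# Made-up data: the slope is roughly 2 in the first period and roughly 1 in the second.
x_78 = [1.0, 2.0, 3.0, 4.0, 5.0]
y_78 = [2.1, 3.9, 6.2, 7.8, 10.1]
x_80 = [1.0, 2.0, 3.0, 4.0, 5.0]
y_80 = [1.2, 1.9, 3.1, 4.2, 4.8]

f_stat = chow_f(x_78, y_78, x_80, y_80)
print(f"Chow F statistic: {f_stat:.1f}")
```

A large F relative to the F(k+1, n-2(k+1)) critical value leads to rejecting the hypothesis of a stable regression function. As the card notes, this version relies on homoskedastic errors.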
How would you implement the Chow test to test for structural change in a two-year framework? Method 2
- Interact each variable with a year dummy for one of the two years.
- Test the joint significance of the year dummy and all the interaction terms.
- Estimate the restricted model by running a pooled regression that allows for different time intercepts, and obtain the restricted SSR (SSRr).
- Run a regression for each of the T time periods and sum their SSRs to obtain the unrestricted SSRur = SSR1 + SSR2 + … + SSRT.
- Compute the F statistic (see the paper flashcard). This test is not heteroskedasticity robust. To obtain a heteroskedasticity-robust test, we must construct the interaction terms and run a robust pooled regression.
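The paper flashcard is not reproduced here. As a hedged reference, the usual textbook form of this F statistic, assuming k explanatory variables, T time periods and n total observations across all periods, is:

```latex
F = \frac{(SSR_r - SSR_{ur}) / \big((T-1)k\big)}{SSR_{ur} / (n - T - Tk)},
\qquad SSR_{ur} = SSR_1 + SSR_2 + \dots + SSR_T
```

The numerator degrees of freedom, (T-1)k, count the slope restrictions being tested; the denominator, n - T - Tk, subtracts the T intercepts and Tk slopes estimated in the unrestricted model.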
Explain policy analysis using two-period CSD.
- A minimum of two independently sampled cross-sections are needed for policy/event assessments.
- The pooled cross-sectional framework provides the set-up to assess the impact of a policy or an event.
- Assumption: Policy or event should be exogenous to the variable of interest.
- One group (the treatment group) is exposed to a ‘treatment’ in the second period but not in the first. The other group (the control group) is not exposed to the treatment in either period.
- For such an assessment, it is important for the experimenter to identify: a. a control group that is assumed not to be affected by the policy change/event/intervention and b. a treatment group that is assumed to be affected by the policy change/event/intervention.
What is pooling in natural experiments?
- A natural (quasi-) experiment is a study in which the investigator analyses the effect of an exogenous shock on some entity.
- We refer to it as natural because the event was not premeditated by the investigator who is analysing its effect.
- In natural experiments we have a. a control group that is assumed not to be affected by the policy change and b. a treatment group that is assumed to be affected by the policy change.
Explain an application of diff-in-diff technique in natural experiments.
- Define dT as a dummy for membership of the treatment group and d2 (period 2) as a dummy for respondents in the post-experiment (after) cross-section.
- Estimate the naive regression - paper flashcard
- Compare the difference in outcomes between the units that are affected by the policy change (treatment group) and those that are not affected (control group), before and after the policy was enacted.
- Difference-in-difference only works if the difference in outcomes between the two groups is not changed by factors other than the policy change: there must be no differential trends (the parallel-trends assumption).
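The comparison above can be sketched with four group means. The regression on the paper flashcard is not reproduced here; in the standard textbook form y = b0 + d0·d2 + b1·dT + d1·(d2·dT) + u, the OLS estimate of d1 equals the difference of differences computed below. The data are made up, and the helper names `mean` and `diff_in_diff` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(rows):
    """rows holds (y, dT, d2) tuples: outcome, treatment-group dummy, after-period dummy."""
    cells = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for y, dT, d2 in rows:
        cells[(dT, d2)].append(y)
    change_treated = mean(cells[(1, 1)]) - mean(cells[(1, 0)])
    change_control = mean(cells[(0, 1)]) - mean(cells[(0, 0)])
    return change_treated - change_control


# Made-up outcomes from two independently sampled cross-sections.
rows = [
    (10.0, 0, 0), (12.0, 0, 0),   # control group, before
    (13.0, 0, 1), (15.0, 0, 1),   # control group, after
    (20.0, 1, 0), (22.0, 1, 0),   # treatment group, before
    (28.0, 1, 1), (30.0, 1, 1),   # treatment group, after
]

did = diff_in_diff(rows)
print("Difference-in-differences estimate:", did)
```

In this example the control group's change (3) stands in for what would have happened to the treatment group without the policy, so the estimate attributes the remaining part of the treatment group's change (8) to the policy. That reading is only justified if the parallel-trends assumption holds.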
What are the characteristics of panel data?
- This is a special type of pooled data in which the same cross-sectional unit/element is surveyed over time.
- Examples of the cross-sectional units that may make up panel data series include countries, firms, individuals or demographic groups.
- Panel data can be stored in one of two ways:
a. Long-format panel: there is a row for each cross-sectional unit in each time period.
b. Wide-format panel: there is only one row per cross-sectional unit and a separate column for each measure and time point.
- Many econometrics software packages can reshape data between the two formats.
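To make the two layouts concrete, here is a small sketch in plain Python. The column names (`unit`, `year`, `wage`), the `measure_year` naming of wide columns and the helper functions are all assumptions made for illustration; statistical packages provide their own reshape commands.

```python
def long_to_wide(rows):
    """One output row per unit, with a column such as 'wage_1978' per measure and year."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row["unit"], {"unit": row["unit"]})
        for name, value in row.items():
            if name not in ("unit", "year"):
                record[f"{name}_{row['year']}"] = value
    return list(wide.values())


def wide_to_long(rows):
    """One output row per unit and year, rebuilt from 'measure_year' column names."""
    long_form = {}
    for row in rows:
        for name, value in row.items():
            if name == "unit":
                continue
            measure, year = name.rsplit("_", 1)
            key = (row["unit"], int(year))
            record = long_form.setdefault(key, {"unit": row["unit"], "year": int(year)})
            record[measure] = value
    return [long_form[key] for key in sorted(long_form)]


# Made-up long-format panel: two units observed in two years.
long_rows = [
    {"unit": "A", "year": 1978, "wage": 10.0},
    {"unit": "A", "year": 1980, "wage": 12.0},
    {"unit": "B", "year": 1978, "wage": 8.0},
    {"unit": "B", "year": 1980, "wage": 9.5},
]

wide_rows = long_to_wide(long_rows)
round_trip = wide_to_long(wide_rows)
print(wide_rows)
```

Long format is usually what panel estimators expect, while wide format makes it easy to compute changes between periods for each unit.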
What is unobserved heterogeneity?
Unobserved heterogeneity, or cross-sectional unit fixed effects, refers to characteristics that affect the dependent variable but are not observed and do not change over time (they are time-invariant).
The main reason for collecting panel data is to allow the unobserved effect (ai) to be correlated with the explanatory variables (Xit).
The behaviour and reasoning behind most panel applications imply some correlation between ai and Xit.
We therefore need alternative estimation methods that yield unbiased, consistent and efficient parameter estimates.
How do you deal with unobserved heterogeneity?
Improve estimation by:
1. Including other control variables, such as historical rates, although data availability and other problems may make that difficult.
2. Applying an appropriate panel technique:
a. In such experiments we should treat some city-specific factors as heterogeneous and unobservable.
b. Then use panel data from repeated measurements on the same city to capture these effects.
See the following slide for more detail.
How do we deal with unobserved heterogeneity in panel data?
- First differencing
- See slides for the equations, but differencing eliminates the unobserved fixed effects ai.
- Applying OLS to the differenced equation then gives consistent estimates of beta, provided that Xit is strictly exogenous (the change in Xit is uncorrelated with the change in Uit) and there is no lagged Yit among the explanatory variables.
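A small numerical sketch shows why differencing helps. It assumes a two-period model y_it = beta·x_it + a_i with beta = 2, no idiosyncratic error (so the arithmetic is exact), and made-up values in which a_i rises with x_it. The helper `ols_slope` is hypothetical and fits a single regressor with an intercept.

```python
def ols_slope(x, y):
    """Slope from regressing y on an intercept and a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


beta = 2.0
a = [0.0, 5.0, 10.0, 15.0]    # unobserved unit effects, correlated with x
x1 = [1.0, 2.0, 3.0, 4.0]     # period 1 regressor
x2 = [2.0, 4.0, 5.0, 7.0]     # period 2 regressor
y1 = [beta * x + ai for x, ai in zip(x1, a)]
y2 = [beta * x + ai for x, ai in zip(x2, a)]

# Pooled OLS in levels ignores a_i and picks up its correlation with x.
pooled_slope = ols_slope(x1 + x2, y1 + y2)

# First differencing: a_i drops out because it does not vary over time.
dx = [later - earlier for later, earlier in zip(x2, x1)]
dy = [later - earlier for later, earlier in zip(y2, y1)]
fd_slope = ols_slope(dx, dy)

print("pooled OLS slope:", pooled_slope)
print("first-difference slope:", fd_slope)
```

The pooled slope overstates beta because units with larger a_i also have larger x, while the first-difference slope recovers beta. With real data the differenced errors remain, so the estimate is consistent rather than exact, and only under the strict exogeneity condition on the card.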
What are the problems with first differenced models?
- First differencing removes variables that don’t vary with time (e.g. gender, race, etc.).
- The effective sample size is reduced, because one time period is lost for each unit.
- Information contained in the levels of the data is lost, since only changes are analysed.
What does the first differenced model explain?
The first differenced model explains variation in changes in the dependent variable across units and not variations in levels.
This means the regression only provides an explanation of the effects in terms of their changes over time. Changes in the independent variable should therefore be linked to changes in the dependent variable.
There may be an arbitrary correlation between the unobserved time-invariant characteristics and the included explanatory variables.
OLS in the original equation would therefore be inconsistent.
The first-differenced panel estimator is thus a way to consistently estimate causal effects in the presence of time-invariant endogeneity.
A general two-period framework for policy evaluation:
See the slide that contains the data structure
- In many cases a researcher working for a company, in academia or for an international development partner may want to assess the effectiveness of a policy or programme implemented in a country.
- One way is to compare outcomes in the year the programme was implemented with those in a reference year, when the programme was not implemented.
- If programme participation occurred only in the second period, the OLS estimate of B1 in the differenced equation is B1 hat = the average change in y for the treatment group - the average change in y for the control group. This is the panel data version of the diff-in-diff estimator.
- If programme participation took place in both periods, B1 hat no longer has this simple form but is interpreted in the same way: it represents the change in the average value of Yit due to programme participation.
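Since the slide with the data structure is not reproduced here, the following is a sketch of the two-period setup in its usual textbook form. The participation dummy `prog`, the second-period dummy `d2` and the intercept symbol delta_0 are assumed names:

```latex
% Levels equation with an unobserved effect a_i:
y_{it} = \beta_0 + \delta_0\, d2_t + \beta_1\, prog_{it} + a_i + u_{it}, \qquad t = 1, 2

% First-differencing removes a_i:
\Delta y_i = \delta_0 + \beta_1\, \Delta prog_i + \Delta u_i

% If participation occurs only in period 2, then \Delta prog_i = prog_{i2} and
\hat{\beta}_1 = \overline{\Delta y}_{treat} - \overline{\Delta y}_{control}
```

Because the same units are observed twice, each unit's own change is used, which removes a_i before the treatment and control groups are compared.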
When is strict exogeneity false?
Strict exogeneity requires that the explanatory variables are uncorrelated with the idiosyncratic errors in all time periods once the unobserved effect, ai, has been taken out. It fails:
- If we have omitted an important time-varying variable.
- If there is measurement error in one or more explanatory variables.
- If we have a lagged Yit as an explanatory variable, or more generally if explanatory variables react to changes in the idiosyncratic errors.