Different Data Structures Flashcards
Explain what a time series is and list its benefits and issues.
A time series is a set of observations on the values that a variable takes at different times. It is collected at regular time intervals. Sampling cannot be assumed to be random. Data features such as seasonality, stationarity and trends are expected.
Some benefits: a. forecasting b. examining single event shocks on a variable of interest c. causal analysis
Some issues: a. other factors that can affect future values may be left out of the forecast b. many shocks may affect a variable, but some are overlooked c. causal analysis with time series models is questionable.
Variables in the regression equation are indexed by t
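As a minimal sketch of what "indexed by t" means in practice, the snippet below fits a linear trend y_t = a + b*t + u_t by ordinary least squares. The numbers are made up for illustration and the function name fit_trend is hypothetical; a real series would usually also need checks for seasonality and stationarity.

```python
# Hypothetical observations y_t recorded at regular intervals t = 1..5 (made-up numbers).
t = [1, 2, 3, 4, 5]
y = [2.0, 4.0, 6.0, 8.0, 10.0]


def fit_trend(t, y):
    """OLS estimates (a, b) for y_t = a + b*t + u_t using the closed-form formulas."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Slope: sample covariance of (t, y) divided by sample variance of t.
    cov_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    var_t = sum((ti - t_bar) ** 2 for ti in t)
    b = cov_ty / var_t
    a = y_bar - b * t_bar
    return a, b


a, b = fit_trend(t, y)
```

Here every variable carries the time index t, and the slope b measures the trend per period.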
What is cross-sectional data? List its benefits and issues.
Refers to data collected from a population at a given point in time. The problem of heterogeneity exists. Random sampling is assumed.
Benefits: a. Data is easy to gather and analyse, b. Information on variables is captured at a point in time.
Issues: a. Cannot be used to analyse behavioural changes over time, b. cannot be used to analyse the effect of policies over time, c. causal analyses are questionable.
Variables in the regression equation are indexed by i.
What is pooled cross-sectional data?
Pooled cross-sectional data have both cross-sectional and time-series features.
An independent pooled cross-section consists of cross-section samples that have been randomly drawn at different periods of time: i.e. different elements are sampled across time periods.
Variables in the regression equation are indexed with i and t.
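The sampling scheme can be sketched as follows, assuming a hypothetical population of 100 unit ids and two survey years. A fresh random sample is drawn in each period, so a given unit i need not appear in both years (it may by chance). The function name and parameters are illustrative assumptions.

```python
import random


def draw_pooled_sample(population_ids, years, n_per_year, seed=0):
    """Draw an independent random cross-section in each year and stack the rows."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rows = []
    for t in years:
        # A new, independent draw in every period: units are not followed over time.
        for i in rng.sample(population_ids, n_per_year):
            rows.append({"i": i, "t": t})
    return rows


pooled = draw_pooled_sample(list(range(1, 101)), [1978, 1980], n_per_year=3)
```

Each row carries both an i and a t index, but unlike a panel there is no guarantee that the same i is observed in more than one period.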
What is the motivation for pooling cross-sections over time?
- To increase the sample size.
- To obtain more precise estimators, assuming that the relationship between the dependent and explanatory variables remains constant over time.
- To obtain test statistics with more power.
What are the limitations of pooled cross-sectional data?
The populations may have different distributions in different time periods. However, this may cause only minor statistical complications. To account for this problem, we sometimes allow the intercept to vary over time by including year dummies.
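A minimal sketch of the year-dummy idea, assuming NumPy is available and using made-up, noise-free data: the slope is common to both periods, while a dummy for the second period shifts the intercept. Variable names are illustrative.

```python
import numpy as np

# Hypothetical pooled sample: three units drawn in period 1, three other units in period 2.
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
year2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # dummy = 1 for the second period

# Made-up outcome: common slope 2; intercept 1 in period 1 and 1 + 0.5 in period 2.
y = 1.0 + 2.0 * x + 0.5 * year2

# Design matrix: constant, explanatory variable, year dummy.
X = np.column_stack([np.ones_like(x), x, year2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The coefficient on the dummy (the third element of beta) is the estimated intercept shift between the two periods.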
What is panel/longitudinal data?
This is a special type of pooled data in which the same cross-sectional unit/element is surveyed over time.
A panel data set consists of a time series for each cross-sectional member in the data set.
We follow the same randomly sampled cross-sectional units (e.g. individuals) over time.
E.g. investment data of the same set of firms over a five-year time period.
Elaborate further on the characteristics of panel data.
In a panel data set there are N cross-sectional units (i=1,2,…,N) and T time periods (t=1,2,…,T)
If the time periods for which we have data are the same for all N individuals then we have balanced data.
In practice, it is common that the length of the time series and/or the time periods differ across individuals. In such a case, the panel is unbalanced.
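The balanced/unbalanced distinction can be checked mechanically. The sketch below assumes the panel is stored as (i, t) pairs, one per observation; the function name is_balanced and the sample data are hypothetical.

```python
def is_balanced(rows):
    """Return True if every unit i is observed in the same set of time periods."""
    periods = {t for _, t in rows}  # all periods that appear anywhere in the data
    seen = {}
    for i, t in rows:
        seen.setdefault(i, set()).add(t)
    # Balanced: each unit's observed periods equal the full set of periods.
    return all(ts == periods for ts in seen.values())


balanced = [(1, 1978), (1, 1980), (2, 1978), (2, 1980)]
unbalanced = [(1, 1978), (1, 1980), (2, 1978)]  # unit 2 is missing in 1980
```

Calling is_balanced(balanced) gives True, while is_balanced(unbalanced) gives False because unit 2 has no 1980 observation.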
A simple two-year panel example.
T=2 (for instance measurements of attributes for each cross-section observed in 1978 and 1980)
Information for both years is used for the regression analysis.
Usually, in panel analysis, our population regression model of interest is similar to:
Y_it = β0 + β1 X_it + β2 Z_i + u_it, where:
X_it is a matrix of time-varying variables (e.g. GDP, exchange rates and PPP)
Z_i is a matrix of variables that do not vary over time (e.g. laws, culture, policies, race, gender)
u_it is the error term
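To make the indexing concrete, here is a minimal sketch, assuming NumPy and made-up, noise-free data, of a model with one time-varying regressor X_it and one time-invariant regressor Z_i, stacked in long format (one row per unit-period) and estimated by pooled OLS. Pooled OLS is used only for illustration; it ignores unit-specific effects, which dedicated panel estimators are designed to handle.

```python
import numpy as np

N, T = 3, 2  # three units, two periods (a balanced panel)

# x[i, t]: time-varying regressor; z[i]: time-invariant regressor (made-up values).
x = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0]])
z = np.array([0.0, 1.0, 0.0])

# Long format: rows ordered as (i=1,t=1), (i=1,t=2), (i=2,t=1), ...
x_long = x.reshape(N * T)
z_long = np.repeat(z, T)  # Z_i is copied into every period of unit i

# Made-up outcome with coefficients (1, 2, 3) and no error term.
y_long = 1.0 + 2.0 * x_long + 3.0 * z_long

X = np.column_stack([np.ones(N * T), x_long, z_long])
beta, *_ = np.linalg.lstsq(X, y_long, rcond=None)
```

Note how z_long repeats each unit's value across its T rows, which is exactly what "does not vary over time" means in the stacked data.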
What are the limitations of panel data?
- Design and data collection problems: a. incomplete account of the population of interest, b. nonresponse due to the lack of cooperation of the respondents, c. respondent not remembering, etc.
- Measurement errors due to faulty responses, memory errors, deliberate distortion of responses, typographical errors, etc.
- Selectivity problem: Self selectivity, nonresponse, attrition.
- Short time-series dimension, typically in micro panels, makes asymptotic arguments crucially rely on N approaching infinity.
- Cross-section dependence, typically for macro-panels, which may lead to misleading inference if not accounted for.
Give examples of panel analysis applications.
- Policy analysis between different periods
- Individual earnings - the impact of education, province, etc.
- Household expenditure - the effect of income, tax rates, etc.
- Firm investment - the effects of debt, revenue, value of firms.