10 Flashcards

1
Q

What is panel data?

A

Panel (or longitudinal) data – combines time-series and cross-sectional data: it contains observations on the same variables for the same cross-sectional sample (e.g. firms, countries, individuals) in two or more time periods.

2
Q

What are the different types of panel data layout?

A
  1. Stacked time-series panel data – the data is ordered by the cross-sectional variable, and each entity's time series is stacked one below another.
  2. Stacked cross-sectional panel data – the data is ordered by the time-series variable, and each period's cross-section is stacked one below another.
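The two layouts contain the same rows and differ only in the sort order. A minimal sketch with a hypothetical toy panel of two firms over three years (the function names and numbers are illustrative assumptions):

```python
# Toy panel: (entity, year, value) for two hypothetical firms; the numbers are made up.
records = [
    ("B", 2021, 7.0), ("A", 2020, 1.0), ("B", 2020, 5.0),
    ("A", 2022, 3.0), ("A", 2021, 2.0), ("B", 2022, 9.0),
]

def stacked_time_series(rows):
    """Order by entity, then year: each entity's time series sits below the previous entity's."""
    return sorted(rows, key=lambda r: (r[0], r[1]))

def stacked_cross_section(rows):
    """Order by year, then entity: each year's cross-section sits below the previous year's."""
    return sorted(rows, key=lambda r: (r[1], r[0]))

for row in stacked_time_series(records):
    print(row)
```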
3
Q

What are the 4 different kinds of variables encountered in panel data?

A
  1. Variables that differ between cross-sectional entities but do not change over time (e.g. organisation type, region of origin).
  2. Variables that change over time but are the same for all cross-sectional entities (e.g. a global crisis).
  3. Variables that vary both over time and between cross-sectional entities (e.g. income, investments).
  4. Trend variables that change in a predictable way for all entities (e.g. days, months, years, age of a company).
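The four kinds can be told apart by checking how a variable changes within entities and within periods. The sketch below is a heuristic under stated assumptions: a balanced panel stored as a dict `{(entity, period): value}`, and a "trend" defined here as a variable that changes by the same fixed step every period for every entity. With very few periods a non-trend variable can be misclassified as a trend. The function name and data are hypothetical.

```python
from collections import defaultdict

def classify(panel):
    """Heuristically classify one panel variable given as {(entity, period): value}."""
    by_entity = defaultdict(list)   # each entity's values in period order
    by_period = defaultdict(set)    # distinct values observed in each period
    for (entity, period), value in sorted(panel.items()):
        by_entity[entity].append(value)
        by_period[period].add(value)
    # Kind 1: constant over time within every entity (e.g. region of origin).
    if all(len(set(vals)) == 1 for vals in by_entity.values()):
        return "1: differs across entities, constant over time"
    # Kind 4: changes by one common fixed step each period for every entity (e.g. company age).
    steps = {b - a for vals in by_entity.values() for a, b in zip(vals, vals[1:])}
    if len(steps) == 1:
        return "4: trend variable"
    # Kind 2: identical for all entities within each period (e.g. a global-crisis dummy).
    if all(len(vals) == 1 for vals in by_period.values()):
        return "2: changes over time, common to all entities"
    return "3: varies over time and across entities"

years = (2020, 2021, 2022)
region = {(e, t): ("north" if e == "A" else "south") for e in "AB" for t in years}
crisis = {(e, t): (1 if t == 2021 else 0) for e in "AB" for t in years}
income = {("A", 2020): 10, ("A", 2021): 12, ("A", 2022): 15,
          ("B", 2020): 4, ("B", 2021): 3, ("B", 2022): 6}
age = {(e, t): t - (2010 if e == "A" else 2015) for e in "AB" for t in years}

for name, var in [("region", region), ("crisis", crisis), ("income", income), ("age", age)]:
    print(name, classify(var))
```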
4
Q

Why use panel data?

A
  1. It can answer questions that cannot be answered accurately with purely cross-sectional or purely time-series data.
  2. It allows a relatively easy increase in sample size.
  3. It helps avoid omitted-variable problems that would otherwise cause bias.
5
Q

What are the types of panel regression models?

A
  1. Pooled OLS (POLS) model – ignores the panel structure of the data by “throwing everything together” and running OLS.
  2. Fixed (unobserved) effects (FE) model – gives each cross-sectional entity (and, optionally, each time period) its own intercept, which absorbs unobserved characteristics that do not change over time; these effects may be correlated with the independent variables, while the slope coefficients are the same for all entities.
  3. Random effects (RE) model – treats the entity-specific effects as a random component of the error term that is assumed to be uncorrelated with the independent variables; the slope coefficients are again common to all entities.
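A minimal pure-Python sketch contrasting pooled OLS with the fixed effects "within" estimator for a single regressor. The helper names and the toy data are hypothetical: the data are generated as y = α_i + 2x with an entity effect α_i that is negatively correlated with x, so pooled OLS is biased while demeaning each entity's data removes α_i. Random effects is not shown, since it additionally requires estimating variance components.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Pooled OLS: slope of y on x with a single common intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(entity, x, y):
    """FE (within) estimator: demean x and y by entity, then regress without an intercept."""
    groups = defaultdict(list)
    for i, e in enumerate(entity):
        groups[e].append(i)
    x_dm, y_dm = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            x_dm[i], y_dm[i] = x[i] - mx, y[i] - my
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)

# Entity A has alpha = 0, entity B has alpha = -10 but larger x values; true slope is 2.
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [2.0, 4.0, 6.0, 2.0, 4.0, 6.0]

pooled = ols_slope(x, y)          # biased towards zero by the omitted entity effect
fe = within_slope(entity, x, y)   # recovers the true slope
print(round(pooled, 3), fe)
```

Demeaning by entity gives the same slope as including a dummy variable for every entity, which is why the two descriptions of FE are interchangeable.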
6
Q

What are the advantages and disadvantages of Pooled OLS?

A

Advantage: a large sample size, leading to precise estimators and test statistics with more power.
Disadvantage: it does not capture the different trends that may be present in the data in different time periods. This can be mitigated by including dummy variables for the different years, but then you are effectively running a fixed effects model.
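A small sketch of building the year dummies mentioned above, assuming years are stored as integers. One year is left out as the baseline so that the dummies are not perfectly collinear with the intercept; the function name and column names are hypothetical.

```python
def year_dummies(years, baseline=None):
    """Return {column_name: [0/1, ...]} with one dummy per year except the baseline year."""
    levels = sorted(set(years))
    if baseline is None:
        baseline = levels[0]  # omitted category: its level is captured by the intercept
    return {
        f"year_{level}": [1 if y == level else 0 for y in years]
        for level in levels
        if level != baseline
    }

years = [2020, 2021, 2022, 2020, 2021, 2022]
dummies = year_dummies(years)
print(dummies)
```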

7
Q

What is important to do when creating a Pooled OLS model?

A

You need to check for and fix where necessary:
1. (Multi)collinearity
2. Heteroscedasticity of residuals
3. Normality of residuals
4. Autocorrelation of residuals

8
Q

Advantage of FE models over POLS

A

Avoids bias due to omitted variables that do not change over time (like geography) or, if time fixed effects are included, that change over time equally for all entities (like a speed limit that applies to all of them). It does so by including dummy variables: the entity dummies let each entity's intercept, and the time dummies let each time period's intercept, vary around the baseline of the omitted category (when all the fixed effect dummies equal zero).
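As a worked illustration (the notation is an assumption introduced here), a two-way fixed effects model with one regressor x, dummies D for entities 2,…,n and dummies B for periods 2,…,T can be written as:

```latex
y_{it} = \beta_0 + \sum_{j=2}^{n} \gamma_j D_{j,it} + \sum_{s=2}^{T} \delta_s B_{s,it} + \beta_1 x_{it} + u_{it},
\qquad
D_{j,it} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases},
\qquad
B_{s,it} = \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}

\text{intercept of entity } i \text{ in period } t = \beta_0 + \gamma_i + \delta_t, \qquad \gamma_1 = \delta_1 = 0
```

Entity 1 and period 1 are the omitted baseline. Any omitted variable that is constant within an entity is absorbed by its γ_i, and any omitted variable that is common to all entities within a period is absorbed by δ_t, so neither can bias β₁.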

9
Q

Disadvantages of FE model

A
  1. Degrees of freedom tend to be low, because one degree of freedom is lost for every dummy variable in the equation, which reduces precision.
  2. Independent variables that vary across entities but do not vary over time within each entity cannot be used, because they would be perfectly collinear with the entity dummies.
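Both disadvantages can be seen directly in a short sketch (function names and numbers are hypothetical). Demeaning a time-invariant variable by entity, which is what the FE transformation does, leaves only zeros, so there is no variation left to estimate its coefficient. The residual degrees of freedom of a regression with an intercept, n − 1 entity dummies and k slopes on a balanced panel are nT − n − k, compared with nT − k − 1 for pooled OLS.

```python
from collections import defaultdict

def demean_by_entity(entity, values):
    """Subtract each entity's mean from its own observations (the FE within transformation)."""
    groups = defaultdict(list)
    for e, v in zip(entity, values):
        groups[e].append(v)
    means = {e: sum(vs) / len(vs) for e, vs in groups.items()}
    return [v - means[e] for e, v in zip(entity, values)]

def lsdv_residual_df(n_entities, n_periods, n_regressors):
    """Residual df with an intercept, (n - 1) entity dummies and k slopes on a balanced panel."""
    return n_entities * n_periods - 1 - (n_entities - 1) - n_regressors

entity = ["A", "A", "A", "B", "B", "B"]
region = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # time-invariant: 1 = north, 0 = south
print(demean_by_entity(entity, region))    # -> [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(lsdv_residual_df(n_entities=50, n_periods=3, n_regressors=2))  # -> 98 (pooled OLS: 147)
```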
10
Q

Advantages and disadvantage of the RE model

A

Advantages: more degrees of freedom than FE; you can estimate coefficients for explanatory variables that are constant over time.
Disadvantage: to avoid omitted variable bias, we must assume that the unobserved impact of the omitted variables is uncorrelated with the independent variables (the 𝑋s).
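One way to see why RE keeps time-constant variables is the standard quasi-demeaning form of the RE estimator for a balanced panel: each variable is transformed as v_it − θ·v̄_i with θ = 1 − √(σ²_ε / (σ²_ε + T·σ²_α)), where σ²_α is the variance of the entity effect and σ²_ε that of the idiosyncratic error. The sketch assumes both variances are known (in practice they are estimated) and uses hypothetical function names. Since θ < 1, a time-invariant variable becomes (1 − θ)·z_i rather than zero; θ = 0 gives pooled OLS and θ close to 1 approaches FE.

```python
import math
from collections import defaultdict

def re_theta(sigma2_eps, sigma2_alpha, n_periods):
    """Quasi-demeaning weight of the random effects (GLS) estimator, balanced panel."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + n_periods * sigma2_alpha))

def quasi_demean(entity, values, theta):
    """Subtract theta times each entity's mean (theta = 1 would be the full FE demeaning)."""
    groups = defaultdict(list)
    for e, v in zip(entity, values):
        groups[e].append(v)
    means = {e: sum(vs) / len(vs) for e, vs in groups.items()}
    return [v - theta * means[e] for e, v in zip(entity, values)]

entity = ["A", "A", "A", "B", "B", "B"]
z = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # time-invariant regressor
theta = re_theta(sigma2_eps=1.0, sigma2_alpha=1.0, n_periods=3)
print(theta, quasi_demean(entity, z, theta))
```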

11
Q

How do you select the best panel regression model?

A

By running three tests:
1. Joint significance of differing group means – tests whether pooled OLS is better than the fixed effects model; if H₀ is rejected, fixed effects is better.
2. Breusch-Pagan test – tests whether pooled OLS is better than the random effects model; if H₀ is rejected, random effects is better.
3. Hausman test – tests whether random effects is better than fixed effects; if H₀ is rejected, fixed effects is better.
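The three statistics can be sketched from their textbook formulas. Assumptions: a balanced panel, a single slope for the Hausman test, and residual sums of squares, residuals and estimates already obtained from fitted models. The function names are hypothetical and software such as gretl may apply small variations. The chi-squared(1) p-value uses the identity P(Z² > h) = erfc(√(h/2)); the F statistic should be compared with an F(n − 1, nT − n − k) critical value.

```python
import math

def group_means_f(rss_pooled, rss_fe, n_entities, n_obs, n_regressors):
    """F statistic for H0: all entity intercepts are equal (pooled OLS is adequate)."""
    q = n_entities - 1
    df_resid = n_obs - n_entities - n_regressors
    return ((rss_pooled - rss_fe) / q) / (rss_fe / df_resid)

def breusch_pagan_lm(residuals_by_entity):
    """LM statistic for H0: no random entity effect, from pooled OLS residuals.
    Asymptotically chi-squared with 1 df."""
    n = len(residuals_by_entity)
    t = len(residuals_by_entity[0])
    num = sum(sum(e) ** 2 for e in residuals_by_entity)
    den = sum(v * v for e in residuals_by_entity for v in e)
    return n * t / (2 * (t - 1)) * (num / den - 1) ** 2

def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one slope; under H0 (RE consistent) Var(b_FE) > Var(b_RE)."""
    return (b_fe - b_re) ** 2 / (var_fe - var_re)

def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared(1) statistic."""
    return math.erfc(math.sqrt(stat / 2))

h = hausman_one_coef(b_fe=0.8, b_re=0.5, var_fe=0.02, var_re=0.01)
print(h, chi2_1_pvalue(h))   # small p-value: reject H0, prefer fixed effects
```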

12
Q

Why is stationarity often not a major concern for panel data?

A
  1. Panel data sets often have short time frames, which do not allow non-stationarity issues to materialise (rule of thumb: with 5 or fewer time periods you do not need to worry about non-stationarity).
  2. Because of how they operate, fixed and random effects models are less prone to non-stationarity issues.
  3. The variables in panel models are often cointegrated, in which case non-stationarity is not an issue.

However, there is also a lot of evidence that with relatively long time frames non-stationarity can cause serious problems, so in that case it is important to test for it and fix it.

13
Q

How can non-stationarity be detected and fixed in panel data?

A

Levin-Lin-Chu test – essentially a panel modification of the ADF test: it checks the stationarity of the series in each cross-sectional entity and aggregates the results into one test for the whole panel (H₀: the series contain a unit root).

If non-stationarity is identified, it is fixed the same way as for non-stationary time-series data: by taking the difference between consecutive observations (within each entity).

However, be careful here, as differencing can considerably shrink the sample (taking differences removes the first observation of each cross-sectional entity).
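A minimal sketch of first differencing within entities, assuming rows of `(entity, period, value)` with consecutive periods and no gaps (the function name and data are hypothetical). Differences are never taken across the boundary between two entities, and each entity loses its first observation, so a panel of n entities shrinks by n rows.

```python
from collections import defaultdict

def first_difference(rows):
    """rows: (entity, period, value). Returns (entity, period, value - previous value) per entity."""
    by_entity = defaultdict(list)
    for entity, period, value in sorted(rows):
        by_entity[entity].append((period, value))
    diffs = []
    for entity, series in by_entity.items():
        # Pair each observation with its predecessor; the first one has none and is dropped.
        for (_, prev), (period, value) in zip(series, series[1:]):
            diffs.append((entity, period, value - prev))
    return diffs

rows = [("A", 2020, 10.0), ("A", 2021, 12.0), ("A", 2022, 15.0),
        ("B", 2020, 4.0), ("B", 2021, 3.0), ("B", 2022, 6.0)]
d = first_difference(rows)
print(len(rows), "->", len(d))   # one observation lost per entity
```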

14
Q

Autocorrelation and heteroscedasticity of residuals

A

To detect heteroscedasticity and autocorrelation, you essentially use the same tests as for time-series and cross-sectional data.

The only noteworthy exception is that for random effects models heteroscedasticity is not a big problem, due to how residuals are used in such models, so you do not need to check for it.
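As one illustration of reusing a time-series test, the Durbin-Watson statistic can be computed on panel residuals, provided consecutive differences are taken only within each entity's own series. This is a sketch under the assumption of a balanced panel with residuals grouped by entity in time order; the function name is hypothetical and gretl's own panel diagnostics may be computed differently.

```python
def panel_durbin_watson(residuals_by_entity):
    """Durbin-Watson-type statistic pooled over entities.
    Values near 2 suggest no first-order autocorrelation; values well below 2 suggest
    positive autocorrelation, values well above 2 negative autocorrelation."""
    # Squared changes between consecutive residuals, never crossing from one entity to the next.
    num = sum((b - a) ** 2 for e in residuals_by_entity for a, b in zip(e, e[1:]))
    den = sum(v * v for e in residuals_by_entity for v in e)
    return num / den

persistent = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
alternating = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
print(panel_durbin_watson(persistent), panel_durbin_watson(alternating))
```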

15
Q

How can autocorrelation and heteroscedasticity in panel regressions be fixed?

A

Autocorrelation and heteroscedasticity in panel regressions can be fixed by employing the same approaches that you use for time-series and cross-sectional data, including using appropriate robust standard errors.

With panel regressions you can choose between several robust standard errors, two of which are available in gretl:
1. Arellano – accounts for both heteroscedasticity and autocorrelation when you have a large 𝑛 and a small 𝑇.
2. Panel-corrected standard errors (PCSE) – account for heteroscedasticity and, to a smaller extent, autocorrelation. Hence, you use them when heteroscedasticity is the only issue in your model.
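The idea behind the Arellano standard error can be sketched for the FE within estimator with one regressor: the scores x̃_it·û_it are summed within each entity before squaring, so arbitrary correlation between an entity's residuals over time is allowed. This is the basic sandwich formula without the small-sample corrections that software usually applies, so gretl's numbers will differ slightly; the function name and data are hypothetical.

```python
import math
from collections import defaultdict

def fe_arellano(entity, x, y):
    """Within (FE) slope for one regressor and its Arellano (cluster-by-entity) standard error."""
    groups = defaultdict(list)
    for i, e in enumerate(entity):
        groups[e].append(i)
    xd, yd = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            xd[i], yd[i] = x[i] - mx, y[i] - my
    sxx = sum(v * v for v in xd)
    b = sum(a * c for a, c in zip(xd, yd)) / sxx
    resid = [c - b * a for a, c in zip(xd, yd)]
    # "Meat" of the sandwich: square each entity's summed score, keeping within-entity correlation.
    meat = sum(sum(xd[i] * resid[i] for i in idx) ** 2 for idx in groups.values())
    return b, math.sqrt(meat) / sxx   # sqrt(meat / sxx**2)

entity = ["A", "A", "A", "B", "B", "B"]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [0.0, 1.0, 4.0, 2.0, 2.0, 2.0]
b, se = fe_arellano(entity, x, y)
print(b, se)
```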

16
Q

How can collinearity be checked in panel data regressions?

A

For Pooled OLS you can use the same approach as before: the Variance Inflation Factor (VIF).

However, for fixed and random effects models you need to use the Belsley-Kuh-Welsch collinearity diagnostics instead, which are similar to VIF but somewhat more complicated to interpret.
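For pooled OLS the VIF of regressor j is 1 / (1 − R²_j), where R²_j comes from regressing x_j on the other regressors. With only two regressors R²_j is simply their squared correlation, which keeps this sketch short (function names and data are hypothetical; the Belsley-Kuh-Welsch diagnostics are not reproduced here). A commonly used rule of thumb treats values above 10 as a sign of serious collinearity.

```python
import math

def pearson_r(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_regressors(x1, x2):
    """VIF shared by both regressors when the model has exactly two of them."""
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
print(vif_two_regressors(x1, x2))
```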