Static Panel Data Concepts Flashcards

1
Q

Characteristic of Panel Data

A

Observations on multiple units (N), such as firms or individuals, collected for the same units over multiple time periods (T). This makes it possible to control for time-invariant unobservables, which are part of the error term.
Two types of panels:
1. Wide panels
- large N observed over a short period of time
- focus on robust and consistent estimators
2. Long panels
- small N observed over a long period of time
- focus on efficiency
A panel is balanced when every unit is observed in the same number of time periods T.
Panel data often violate the assumptions of OLS because of:
- heteroskedasticity between groups
- serial correlation or cross-sectional dependence
- individual fixed effects, which make the residuals correlated with the independent variables
The main reason for using panel data is to allow the unobserved effect “a” to be correlated with the independent variables.
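As a sketch, the model behind these points is often written as follows. The symbols “a” and u follow the cards; the indices i (unit) and t (time period) and the coefficients \beta_0, \beta_1 are notation assumed here for illustration.

```latex
y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it},
\qquad i = 1, \dots, N, \quad t = 1, \dots, T
```

In this notation a_i is the time-invariant unobserved effect, u_{it} is the idiosyncratic error, and a_i + u_{it} is the composite error.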

2
Q

Least Squares Dummy Variable (LSDV)

A

Includes a dummy variable for each of the N units in the panel. The dummy for unit i takes the value 1 for every observation on unit i and 0 for observations on all other units.
Exogeneity assumption:
the explanatory variables in each time period are uncorrelated with the idiosyncratic error (the time-varying part of the error).
Difference to OLS:
OLS without the unit dummies can only be used when the unit-specific effects are uncorrelated with the independent variables. Otherwise the OLS estimates are not consistent.
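A minimal sketch of LSDV on a made-up panel, assuming one regressor x and the data-generating rule y = a_i + 2*x. The data, the helper names `solve` and `ols`, and the variable names are all hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical toy panel: (unit, x, y) with y = a_i + 2*x, a_A = 1, a_B = 10.
panel = [("A", 1, 3), ("A", 2, 5), ("A", 3, 7),
         ("B", 4, 18), ("B", 5, 20), ("B", 6, 22)]
units = sorted({u for u, _, _ in panel})

# One dummy column per unit (so no common intercept), followed by x.
X = [[1.0 if u == g else 0.0 for g in units] + [float(x)] for u, x, _ in panel]
y = [float(v) for _, _, v in panel]

coef = ols(X, y)
lsdv_intercepts = dict(zip(units, coef[:-1]))  # estimated a_i per unit
lsdv_slope = coef[-1]                          # estimated effect of x
print(lsdv_intercepts, lsdv_slope)
```

Because the toy data contain no noise, the dummy coefficients recover the unit-specific intercepts and the slope on x recovers the value used to generate the data.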

3
Q

Pooled OLS (POLS)

A
  • POLS refers to the application of OLS to panel data. In POLS, the data are treated as if they were cross-sectional and the time dimension is ignored.
  • POLS only gives reliable estimates if “a” (the unobserved time-constant factor) and u are both uncorrelated with the explanatory variable x.
  • Exogeneity assumption: explanatory variables in each time period are uncorrelated with the idiosyncratic error
  • Heterogeneity bias -> the bias that results if “a” and x are correlated. It is caused by omitting a time-constant variable
    Test for poolability:
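    1. Breusch-Pagan Lagrange multiplier test
  • H0 = variance of “a” is zero (no unit-specific effects, POLS is adequate)
  • HA = variance of “a” is positive (unit-specific effects are present)
  • rejecting H0 speaks against pooling; the test does not show whether “a” and x are correlated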
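The heterogeneity bias can be illustrated with a small made-up panel. This is a sketch under stated assumptions: the data follow y = a_i + 2*x, and the unit with the larger “a” also has the larger x, so “a” and x are correlated by construction. The function name `simple_ols_slope` and all data values are hypothetical.

```python
def simple_ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy panel: (unit, x, y) with y = a_i + 2*x, a_A = 1, a_B = 10.
panel = [("A", 1, 3), ("A", 2, 5), ("A", 3, 7),
         ("B", 4, 18), ("B", 5, 20), ("B", 6, 22)]

# POLS ignores the unit label and stacks all observations.
x = [float(xi) for _, xi, _ in panel]
y = [float(yi) for _, _, yi in panel]

pooled_slope = simple_ols_slope(x, y)
true_slope = 2.0  # the slope used to generate the toy data
heterogeneity_bias = pooled_slope - true_slope
print(f"pooled slope = {pooled_slope:.3f}, bias = {heterogeneity_bias:.3f}")
```

In this toy case the pooled slope is about 4.31 although the data were generated with a slope of 2, because the omitted “a” is picked up by x.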
4
Q

Fixed Effects (FE)

A
  • The FE estimator uses demeaned variables, or alternatively, in LSDV, dummies for the fixed effects. FE only uses the within variation, time-demeaning the observations for each unit. The FE estimator is consistent regardless of whether the fixed effects are correlated with the independent variables.
  • FE leads to the same estimates as LSDV
    Disadvantages:
  • estimates are less efficient than RE when the RE assumptions hold
  • demeaning wipes out all explanatory variables that do not vary within an individual, meaning that one is unable to estimate a slope for these variables
  • RE tries to overcome these disadvantages
    Advantages:
  • more robust to selection bias than RE
  • estimates are unbiased even when “a” is correlated with the independent variables
  • more appropriate if the data exhaust the population
    Poolability test:
  • Chow test: tests whether there are differences among the groups
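The within transformation can be sketched on a made-up panel as follows. The data are assumed to follow y = a_i + 2*x, and z is a hypothetical time-invariant variable included only to show that demeaning wipes it out. The helper `demean_by_unit` is an illustrative name, not part of any library.

```python
from collections import defaultdict


def demean_by_unit(units, values):
    """Subtract each unit's time average from its observations."""
    totals, counts = defaultdict(float), defaultdict(int)
    for u, v in zip(units, values):
        totals[u] += v
        counts[u] += 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]


# Hypothetical toy panel: (unit, x, z, y), z is constant within each unit,
# and y = a_i + 2*x with a_A = 1, a_B = 10.
panel = [("A", 1, 0, 3), ("A", 2, 0, 5), ("A", 3, 0, 7),
         ("B", 4, 1, 18), ("B", 5, 1, 20), ("B", 6, 1, 22)]
units = [r[0] for r in panel]

x_dm = demean_by_unit(units, [float(r[1]) for r in panel])
z_dm = demean_by_unit(units, [float(r[2]) for r in panel])
y_dm = demean_by_unit(units, [float(r[3]) for r in panel])

# Within (FE) slope: OLS of demeaned y on demeaned x; no intercept is needed
# because demeaned variables have mean zero.
fe_slope = sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)
print(fe_slope, z_dm)
```

The demeaned z is zero for every observation, so no slope can be estimated for it, while the slope on x is recovered even though “a” and x are correlated in the toy data.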
5
Q

First Difference (FD)

A
  • The first-differenced equation is a single cross-sectional equation in which each variable is differenced over time.
    Key assumptions:
  • u has to be uncorrelated with x in every time period
  • x can be correlated with “a”, as “a” is removed when taking the first difference
  • the differenced errors have to be homoskedastic
    Disadvantages:
  • differencing can lead to large standard errors
  • if the errors are serially uncorrelated -> the FE estimator is more efficient
    Advantages:
  • if the error term follows a random walk, the FD estimator is more efficient than the FE estimator
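A minimal sketch of first differencing on a made-up panel, assuming y = a_i + 2*x and three periods per unit. All data values and variable names are hypothetical.

```python
from itertools import groupby

# Hypothetical toy panel: (unit, period, x, y) with y = a_i + 2*x,
# a_A = 1, a_B = 10.
panel = [("A", 1, 1, 3), ("A", 2, 2, 5), ("A", 3, 4, 9),
         ("B", 1, 4, 18), ("B", 2, 5, 20), ("B", 3, 7, 24)]

dx, dy = [], []
# Sort by unit, then period, so differences are taken within a unit over time.
for _, rows in groupby(sorted(panel), key=lambda r: r[0]):
    rows = list(rows)
    for prev, curr in zip(rows, rows[1:]):
        dx.append(curr[2] - prev[2])  # change in x; a_i cancels out
        dy.append(curr[3] - prev[3])  # change in y

# OLS of dy on dx without an intercept (this toy model has no time trend).
fd_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(dx, dy, fd_slope)
```

Each unit contributes T - 1 differenced observations, and “a” no longer appears in the differenced equation.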
6
Q

Random Effects (RE)

A
  • RE is used to estimate regressions on panel data by allowing for different intercepts.
  • Differs from FE in that each unit's intercept is treated as a random draw from a bowl of possible intercepts.
  • The intercepts may be interpreted as random and treated as if they were part of the error term
    Advantages:
  • more efficient than FE: the standard errors of RE are smaller than those of FE
  • does not wipe out variables that do not change over time, so slope coefficients can be estimated for those variables
  • more appropriate if the data do not exhaust the population
    Disadvantages:
  • applicable only in special circumstances; in most cases the key identifying assumption does not hold
  • should only be used if you can be sure that the composite error term is uncorrelated with the independent variables
  • if they are correlated, the estimates are inconsistent
  • this can be tested with the Hausman test
  • biased when lagged values of the dependent variable are included as regressors
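The cards do not state how RE is computed. One common textbook formulation, given here as an assumption rather than as the deck's own method, is quasi-demeaning: each variable has a fraction theta of its unit mean subtracted, with theta depending on the variances of u and “a” and on T. The function names and the variance values below are hypothetical.

```python
import math


def re_theta(sigma2_u, sigma2_a, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma2_u / (sigma2_u + T * sigma2_a))."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_a))


def quasi_demean(values, theta):
    """Transform one unit's series: v_t - theta * mean(v)."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# Hypothetical variance components, chosen only for illustration.
theta_no_effect = re_theta(sigma2_u=1.0, sigma2_a=0.0, T=5)  # no unit effect
theta_mid = re_theta(sigma2_u=1.0, sigma2_a=1.0, T=3)        # intermediate case
theta_large = re_theta(sigma2_u=1.0, sigma2_a=1e6, T=5)      # dominant unit effect

print(theta_no_effect, theta_mid, theta_large)
print(quasi_demean([1.0, 2.0, 3.0], theta_mid))
```

Under this formulation theta = 0 leaves the data untouched (POLS), and theta close to 1 approaches full demeaning (FE). RE sits in between, which is why it keeps time-invariant variables.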
7
Q

Hausman Test

A

Tests whether the random effects estimate differs significantly from the fixed effects estimate. The FE estimator serves as the benchmark because it remains consistent even if “a” is correlated with the independent variables.
H0 = Random effects estimator is consistent -> use RE (more efficient)
HA = Random effects estimator is inconsistent -> use FE/FD
Can also be used to test IV (consistent) against OLS (efficient)
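For a single coefficient, the Hausman statistic is commonly written as the squared difference between the two estimates divided by the difference of their variances, and compared with a chi-square distribution with one degree of freedom. The sketch below assumes that form. The function name and the estimates fed into it are hypothetical and not taken from any data set.

```python
def hausman_statistic(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient: (b_fe - b_re)^2 / (var_fe - var_re)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / var_diff


CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom

# Hypothetical FE and RE estimates with their variances.
h = hausman_statistic(b_fe=2.0, b_re=2.5, var_fe=0.04, var_re=0.01)
use_fe = h > CHI2_1DF_5PCT  # rejecting H0 points to FE
print(f"H = {h:.2f}, use FE: {use_fe}")
```

With these made-up numbers the statistic exceeds the critical value, so H0 would be rejected and FE preferred. With several coefficients the statistic uses the vector of differences and the difference of the covariance matrices instead.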

8
Q

Between Estimator

A

The Between Estimator essentially applies OLS to a data set in which each observation is the time average of the data for one unit.
Advantages:
- reduces the bias caused by measurement error, as it averages each variable's observations over time
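The two steps can be sketched on a made-up panel with three units, assuming y = a_i + 2*x. All data values and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy panel: (unit, x, y) with y = a_i + 2*x,
# a_A = 1, a_B = 10, a_C = 4.
panel = [("A", 1, 3), ("A", 2, 5), ("A", 3, 7),
         ("B", 4, 18), ("B", 5, 20), ("B", 6, 22),
         ("C", 7, 18), ("C", 8, 20), ("C", 9, 22)]

# Step 1: collapse each unit to its time averages.
by_unit = defaultdict(list)
for unit, x, y in panel:
    by_unit[unit].append((x, y))
x_bar = [sum(x for x, _ in obs) / len(obs) for obs in by_unit.values()]
y_bar = [sum(y for _, y in obs) / len(obs) for obs in by_unit.values()]

# Step 2: OLS (with intercept) on the N averaged observations.
n = len(x_bar)
mx, my = sum(x_bar) / n, sum(y_bar) / n
between_slope = (sum((a - mx) * (b - my) for a, b in zip(x_bar, y_bar))
                 / sum((a - mx) ** 2 for a in x_bar))
print(x_bar, y_bar, between_slope)
```

The estimator uses only the variation across units, so unlike FE it does not remove “a”. In this toy data the between slope is 2.5 rather than the 2 used to generate the data.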
