Module 7 Count Data Flashcards
Characteristics of count data
- Dependent variable is nonnegative integer
- Often takes only a few small discrete values
- Zeros present
- Right skew
- Heteroskedastic
What are the Count Data Models?
- Poisson
- Negative binomial
- Zero-inflated Poisson (ZIP)
- Zero-inflated negative binomial (ZINB)
Poisson Distribution characteristics
- P(Y = y) = e^(-𝜆) 𝜆^y / y!, where 𝜆 is the mean outcome
- The variance is also 𝜆
- Equidispersion – when the mean of a distribution is equal to the variance
As the mean gets larger, the Poisson distribution approaches the normal distribution
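A minimal sketch of equidispersion, assuming the standard Poisson probability mass function. The function names and the value of lambda are illustrative only. It sums over the support numerically and checks that the mean and variance come out equal.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_mean_var(lam: float, y_max: int = 100) -> tuple[float, float]:
    """Mean and variance from summing over y = 0..y_max (truncated support)."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Illustrative lambda; both values should be (numerically) equal to it.
mean, var = poisson_mean_var(2.5)
print(round(mean, 6), round(var, 6))
```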
Issues with Poisson
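Overdispersion (variance greater than the mean) => use robust standard errors, the vce(robust) option in Stata. This corrects the standard errors; the coefficient estimates are unchanged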
Negative Binomial
NB regression estimates an overdispersion parameter 𝛼
if 𝛼 = 0, use Poisson
if 𝛼 > 0, use NB because variance is greater than mean
When overdispersion is present, NB will likely give more precise estimates (smaller CIs)
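A small sketch of how 𝛼 drives the variance, assuming the common mean-dispersion (NB2) parameterization Var(y) = μ + 𝛼μ². The card above only states that the variance exceeds the mean when 𝛼 > 0, so the exact functional form is an assumption here, and the numbers are made up.

```python
def nb_variance(mu: float, alpha: float) -> float:
    """Variance under the assumed NB2 form: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2

mu = 3.0  # illustrative mean
for alpha in (0.0, 0.5, 1.0):
    # alpha = 0 collapses to the Poisson case (variance == mean)
    print(alpha, nb_variance(mu, alpha))
```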
Poisson and NB interpretations
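Regression coefficients are semielasticities: a one-unit change in x changes the expected count by about 100*β percent. For the exact percent change, exponentiate: 100*(exp(β) - 1)
Margins output is interpreted in levels (units of the count outcome)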
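A quick sketch of the exponentiation step, assuming a log-link count model where the expected count is exp(xβ). The coefficient values are hypothetical. For small coefficients the approximate and exact percent changes are close; for larger ones they diverge.

```python
import math

def percent_change(beta: float) -> float:
    """Exact percent change in the expected count for a one-unit change in x."""
    return 100.0 * (math.exp(beta) - 1.0)

for beta in (0.05, 0.5, -0.5):  # hypothetical coefficients
    approx = 100.0 * beta       # reading the coefficient directly as a percent
    print(beta, round(approx, 2), round(percent_change(beta), 2))
```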
When to use Zero-Inflated Models
- Excess zeros in dependent variable because Poisson and NB underpredict zeros
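Two ways the model predicts a zero
- Logit (inflation) part: the observation is always zero
- Count part (Poisson or NB): the count happens to be zero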
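A sketch of why a zero-inflated Poisson predicts more zeros than a plain Poisson, assuming the usual mixture form P(y = 0) = π + (1 - π)e^(-𝜆), where π is the probability of the always-zero group. The names and values are illustrative.

```python
import math

def poisson_prob_zero(lam: float) -> float:
    """P(y = 0) under a plain Poisson with mean lam."""
    return math.exp(-lam)

def zip_prob_zero(pi: float, lam: float) -> float:
    """P(y = 0) under a ZIP: always-zero group plus zeros from the count part."""
    return pi + (1.0 - pi) * math.exp(-lam)

lam, pi = 2.0, 0.3  # hypothetical values
print(poisson_prob_zero(lam), zip_prob_zero(pi, lam))
```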
Inflation options
- On a constant
- On some or all X vars
Interpretation of ZIP
Still in semielasticities (%)
Why shouldn't we multiply the coefficients of a ZIP inflated on a CONSTANT by the mean to interpret them?
Because of the zero-inflation factor, multiplying by the mean would overstate the positive average marginal effects and understate the negative average marginal effects
How to interpret ZIP CONSTANT margins?
In levels; the margins command incorporates the inflation factor for us
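A rough sketch of how the inflation factor enters a level effect, assuming a constant inflation probability π, a log link, and a continuous regressor, so that E[y] = (1 - π)exp(xβ) and dE[y]/dx = β(1 - π)exp(xβ). This illustrates the idea only and is not a reproduction of Stata's margins output; all values are hypothetical.

```python
import math

def zip_expected_count(pi: float, xb: float) -> float:
    """Expected count including the always-zero group: (1 - pi) * exp(xb)."""
    return (1.0 - pi) * math.exp(xb)

def zip_marginal_effect(pi: float, xb: float, beta: float) -> float:
    """Level effect of a one-unit change in x on the expected count."""
    return beta * zip_expected_count(pi, xb)

beta, xb = 0.5, 1.0  # hypothetical coefficient and linear prediction
print(zip_marginal_effect(0.0, xb, beta))  # ignoring inflation
print(zip_marginal_effect(0.4, xb, beta))  # with 40% always-zero
```

Ignoring the inflation factor gives a larger effect in absolute value, which is the overstatement the cards above warn about.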
How to interpret ZIP inflated on X vars?
- Still in semielasticities
- HOWEVER, interpreting the coefficients is not enough because we need to look at the inflation factors too
- Simply discussing the y semielasticities would overstate the positive effects and understate negative effects.
How to interpret the inflate coefficients?
The inflate coefficients are logit coefficients: semielasticities of the odds of nonuse (the always-zero group), not of the probability itself
Exponentiate to read them as odds ratios
Cannot interpret on their own
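A minimal sketch of reading an inflate coefficient, assuming the inflation equation is a logit. The coefficient and linear index below are hypothetical. Exponentiating gives an odds ratio, and the logistic function converts the linear index into a probability of being in the always-zero group.

```python
import math

def odds_ratio(gamma: float) -> float:
    """Multiplicative change in the odds of always-zero per unit of x."""
    return math.exp(gamma)

def inflate_probability(z: float) -> float:
    """Probability of the always-zero group given the logit index z."""
    return 1.0 / (1.0 + math.exp(-z))

gamma = -0.8  # hypothetical inflate coefficient
print(odds_ratio(gamma))
print(inflate_probability(0.2), inflate_probability(0.2 + gamma))
```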
ZIP X var margins interpretation
In levels!
ZINB inflated on a CONSTANT
Interpretations are still semielastic!
ZIP vs ZINB selection
Look at chi2 p-value
If the chi2 p-value is low (< 0.05), reject H0 (𝛼 = 0) and therefore reject ZIP
low p-value (p < 0.05) => use ZINB
high p-value (p > 0.1) => use ZIP
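A sketch of where such a p-value can come from, assuming the test is a likelihood-ratio test of 𝛼 = 0 with one degree of freedom and a halved (boundary-corrected) chi-squared p-value. The halving and the log-likelihood values are assumptions for illustration, not taken from the cards.

```python
import math

def lr_test_alpha_zero(ll_poisson: float, ll_nb: float) -> tuple[float, float]:
    """Return (LR statistic, p-value) for H0: alpha = 0."""
    lr = max(0.0, 2.0 * (ll_nb - ll_poisson))
    if lr == 0.0:
        return 0.0, 1.0
    # Survival function of chi-squared(1) is erfc(sqrt(x / 2));
    # halved because alpha = 0 sits on the boundary (assumed convention).
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

lr, p = lr_test_alpha_zero(ll_poisson=-1250.0, ll_nb=-1240.0)  # hypothetical
print(lr, p, "use ZINB" if p < 0.05 else "no evidence against ZIP")
```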
ZINB inflated on some X vars
Inflating on all X vars may not converge
Interpretations still in semielasticities
Model selection: Alpha
Models | Chi^2 p-value | Conclusion
Standard (uninflated) | 0.001 | NB outperforms Poisson
Inflated on Constant | 0.035 | NB outperforms Poisson (at 95% confidence level)
Inflated on Birth Control | 0.416 | Poisson outperforms NB
Model selection: Correlations
Higher correlation between predicted and observed counts => better fit
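A small sketch of the comparison, assuming the correlation in question is the Pearson correlation between observed counts and each model's predicted counts. The data below are invented for illustration.

```python
import math

def pearson_corr(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

observed = [0, 0, 1, 3, 5]              # hypothetical counts
pred_a = [0.2, 0.4, 1.1, 2.6, 4.8]      # hypothetical model A predictions
pred_b = [1.5, 1.0, 2.0, 1.8, 2.5]      # hypothetical model B predictions
print(pearson_corr(observed, pred_a), pearson_corr(observed, pred_b))
```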
Information criteria: AIC, BIC
The smaller ICs are preferred
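A sketch of the two criteria using their standard definitions, AIC = 2k - 2lnL and BIC = k ln(n) - 2lnL, where k is the number of estimated parameters and n the number of observations. The log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math

def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: (log-likelihood, number of parameters)
models = {"zip": (-1250.0, 6), "zinb": (-1240.0, 7)}
n_obs = 500
for name, (ll, k) in models.items():
    print(name, aic(ll, k), round(bic(ll, k, n_obs), 2))
```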