Count variable
A variable that takes non-negative, discrete (integer) values: 0, 1, 2, 3, etc.
Types of models that can be used on count data:
1) Poisson regression
2) Negative binomial regression
3) Truncated Poisson/negative binomial regression
4) Zero-inflated Poisson/negative binomial regression
Examples of count variables:
Why not standard regression (OLS)?
Because OLS assumes normally distributed errors, but also:
- count data often have a very low mean (with a mean above roughly 10, OLS would be appropriate)
Poisson distribution
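A minimal sketch of the Poisson probability mass function, P(Y = y) = exp(-lambda) * lambda^y / y!, using only the Python standard library. The rate `lam = 2.5` and the function name `poisson_pmf` are illustrative assumptions, not taken from these notes. The point of the check is that mean and variance both equal lambda.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 2.5               # illustrative rate (assumption)
support = range(0, 60)  # the tail beyond 60 is negligible for lam = 2.5

probs = [poisson_pmf(y, lam) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# Both numbers should be (numerically) equal to lam.
print(round(mean, 6), round(variance, 6))
```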
Poisson regression analysis
Link function
E(Yi | Xi) = exp(XiB), i.e. log(E(Yi | Xi)) = XiB (log link)
Model estimation
Maximum likelihood estimation: the parameters are estimated by searching for the parameter values that give the highest likelihood of observing the data.
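The search for the highest likelihood can be sketched for a Poisson regression with an intercept and one covariate. This is only an illustration under assumptions: the data `xs`, `ys` are made up, the names `log_likelihood` and `fit_poisson` are hypothetical, and Newton-Raphson is one common way to do the search (statistical packages may use a different algorithm).

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Poisson log-likelihood with log link: mu_i = exp(b0 + b1 * x_i)."""
    return sum(
        y * (b0 + b1 * x) - math.exp(b0 + b1 * x) - math.lgamma(y + 1)
        for x, y in zip(xs, ys)
    )

def fit_poisson(xs, ys, iterations=50, tol=1e-10):
    """Newton-Raphson search for the (b0, b1) that maximise the likelihood."""
    # Start at the intercept-only solution: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient (score) of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Negative Hessian (information matrix), a symmetric 2x2 matrix.
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Made-up example data (assumption, not from the notes).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, 3, 4, 8, 12]

b0, b1 = fit_poisson(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

At the maximum the score is zero, so the fitted means sum to the observed total count.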
Interpretation of the parameters
Model fit and selection
1) Likelihood
2) Likelihood ratio test -> Only for nested models
3) AIC, BIC, CAIC -> the models need not be nested, but they must be fitted to the same data.
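The three fit measures above can be computed from a model's log-likelihood. A minimal sketch, assuming the usual definitions (AIC = -2 lnL + 2k, BIC = -2 lnL + k ln n, CAIC = -2 lnL + k (ln n + 1)). The log-likelihood values, parameter counts and sample size below are made up, and the p-value helper only covers the case where the nested models differ by one parameter.

```python
import math

def aic(loglik: float, k: int) -> float:
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, n: int) -> float:
    return -2.0 * loglik + k * math.log(n)

def caic(loglik: float, k: int, n: int) -> float:
    return -2.0 * loglik + k * (math.log(n) + 1.0)

def lr_statistic(loglik_restricted: float, loglik_full: float) -> float:
    """Likelihood ratio statistic; only meaningful for nested models."""
    return 2.0 * (loglik_full - loglik_restricted)

def chi2_pvalue_1df(x: float) -> float:
    """Upper-tail chi-square probability for exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical fitted models on the same n = 200 observations (assumption).
n = 200
ll_small, k_small = -520.4, 2   # restricted model
ll_large, k_large = -515.1, 3   # adds one parameter

lr = lr_statistic(ll_small, ll_large)
p_value = chi2_pvalue_1df(lr)
print("LR:", lr, "p:", p_value)
print("AIC:", aic(ll_small, k_small), aic(ll_large, k_large))
print("BIC:", bic(ll_small, k_small, n), bic(ll_large, k_large, n))
print("CAIC:", caic(ll_small, k_small, n), caic(ll_large, k_large, n))
```

Lower AIC, BIC or CAIC indicates the preferred model.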
Some violations or “challenges”
1) Mean not equal to the variance
2) Zero events cannot be observed
3) More zeros than expected
Underdispersion
Mean > variance
Almost never happens. If it does, still use Poisson regression.
Overdispersion
Mean < variance
Use negative binomial regression.
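A sketch of why the negative binomial handles overdispersion: it has an extra dispersion parameter that lets the variance exceed the mean. The parameterisation assumed here (often called NB2, with variance = mu + alpha * mu^2) is one common choice and is not specified in these notes. The values of `mu` and `alpha` are made up.

```python
import math

def neg_binomial_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability of count y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)

mu, alpha = 2.5, 0.5      # illustrative values (assumption)
support = range(0, 400)   # the tail beyond 400 is negligible here

probs = [neg_binomial_pmf(y, mu, alpha) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# The variance should equal mu + alpha * mu**2, which is larger than the mean.
print(mean, variance, mu + alpha * mu ** 2)
```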
Dispersion test
If the test is significant, the assumption that the mean equals the variance is violated, so the Poisson distribution is not suitable.
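Before running a formal dispersion test, a rough first look is to compare the sample variance with the sample mean. This is only a crude check and not the test itself: a formal test works on the fitted means of the regression model and gives a p-value. The counts below are made up.

```python
import statistics

# Made-up count data (assumption, not from the notes).
counts = [0, 1, 0, 3, 7, 0, 2, 12, 1, 0, 5, 9]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
ratio = variance / mean

# ratio close to 1: consistent with Poisson
# ratio well above 1: sign of overdispersion
# ratio well below 1: sign of underdispersion
print("mean:", mean, "variance:", variance, "ratio:", ratio)
```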
What to do if zero events cannot be observed?
Use a truncated model. For the truncated negative binomial regression, the second intercept in the output represents the extra (dispersion) parameter.
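What truncation does can be sketched for the Poisson case: the probability of zero is removed and the remaining probabilities are rescaled so they sum to one, P(Y = y | Y > 0) = P(Y = y) / (1 - exp(-lambda)). The function names and the rate `lam = 1.2` are illustrative assumptions.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zero_truncated_poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y | Y > 0): zero is impossible, the rest is rescaled."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

lam = 1.2               # illustrative rate (assumption)
support = range(0, 60)

probs = [zero_truncated_poisson_pmf(y, lam) for y in support]
truncated_mean = sum(y * p for y, p in zip(support, probs))

# The truncated mean, lam / (1 - exp(-lam)), is larger than lam.
print(sum(probs), truncated_mean, lam / (1.0 - math.exp(-lam)))
```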
When is observing zeros not possible?
1) Basket size of online orders
2) Relationship length
3) Household size
What to do if there are more zeros than expected?
Investigate whether there is a peak at zero. If so, there are two options:
1) Zero inflated models
2) Hurdle models or zero-altered models.
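The difference between the two options can be sketched with Poisson counts. In a zero-inflated model, zeros come from two sources (an extra "always zero" group plus the ordinary Poisson zeros). In a hurdle model, all zeros come from one process and the positive counts follow a zero-truncated distribution. The names `pi` and `p_zero` and all numeric values are illustrative assumptions.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zero_inflated_poisson_pmf(y: int, lam: float, pi: float) -> float:
    """Mixture: with probability pi always zero, otherwise Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def hurdle_poisson_pmf(y: int, lam: float, p_zero: float) -> float:
    """Zero with probability p_zero, otherwise zero-truncated Poisson(lam)."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

lam, pi, p_zero = 2.0, 0.3, 0.4   # illustrative values (assumption)
support = range(0, 60)

zip_probs = [zero_inflated_poisson_pmf(y, lam, pi) for y in support]
hurdle_probs = [hurdle_poisson_pmf(y, lam, p_zero) for y in support]

# Compare the share of zeros with a plain Poisson with the same lam.
print("Poisson P(0):", poisson_pmf(0, lam))
print("Zero-inflated P(0):", zip_probs[0])
print("Hurdle P(0):", hurdle_probs[0])
```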