Models for Count data Flashcards
Count variable
A discrete variable that takes only non-negative integer values: 0, 1, 2, 3, etc.
Types of models that can be used on count data:
1) Poisson regression
2) Negative binomial regression
3) Truncated Poisson/negative binomial regression
4) Zero-inflated Poisson/negative binomial regression
Examples of count variables:
- # of cars owned
- # of drinks consumed at a festival
- # of products returned
- # of complaints
- # of stocks owned
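Why not standard OLS regression?
Because OLS assumes a normally distributed (continuous) dependent variable, whereas counts are discrete and non-negative. Also:
- the mean is often very low (with a mean above 10, OLS could be appropriate)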
Poisson distribution
- with parameter λ (lambda)
- uses only one parameter, since mean = variance = λ
- if λ > 10, the distribution is approximately normal, so OLS may be used
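A minimal sketch (standard-library Python; the function names are illustrative, not from any package) that evaluates the Poisson probability mass function P(Y = k) = e^(−λ) λ^k / k! and checks numerically that the mean and the variance both equal λ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson distribution with parameter lam (lambda > 0)."""
    # Computed on the log scale so lam**k and k! cannot overflow for large k
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_moments(lam: float, k_max: int = 200) -> tuple[float, float]:
    """Mean and variance obtained by summing over k = 0..k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


for lam in (0.5, 2.0, 12.0):
    mean, var = poisson_moments(lam)
    print(f"lambda={lam}: mean={mean:.3f}, variance={var:.3f}, "
          f"P(Y=0)={poisson_pmf(0, lam):.4f}")
```

For small λ most of the probability sits at 0 and 1, so the distribution is strongly skewed; for λ = 12, P(Y = 0) is essentially zero and the shape is close to symmetric, which is why a normal approximation becomes reasonable.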
Poisson regression analysis
- DV = Count variable
- Goal to explain DV by set of Xi
- Each Yi is randomly drawn from a poisson distribution
- Mean = variance = λi
- The expected count λi is related to Xi via a link function
Link function (log link)
λi = E(Yi | Xi) = exp(Xiβ), i.e. ln(λi) = Xiβ
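A small sketch of the log link, using hypothetical coefficients β = (0.2, 0.5) for an intercept and one predictor. The linear predictor Xiβ can be any real number, but λi = exp(Xiβ) is always positive, as an expected count must be.

```python
import math


def expected_count(x: list[float], beta: list[float]) -> float:
    """Poisson regression mean with a log link: lambda_i = exp(x_i . beta)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x_i'beta
    return math.exp(eta)


beta = [0.2, 0.5]  # hypothetical intercept and slope, illustration only
for x1 in (-4.0, 0.0, 2.0):
    x = [1.0, x1]  # the leading 1.0 multiplies the intercept
    lam = expected_count(x, beta)
    print(f"x1={x1:+.1f}: ln(lambda)={math.log(lam):+.2f}, lambda={lam:.3f}")
```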
Model estimation
Via maximum likelihood estimation: the parameters are estimated by searching for the parameter values that give the highest likelihood of observing the data.
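A minimal sketch of maximum likelihood for a Poisson regression with an intercept and one predictor, in plain Python. It assumes Newton-Raphson as the search method (statistical packages typically use this or the closely related IRLS) and fits simulated data with hypothetical true coefficients (0.3, 0.6); the log-likelihood being maximised is Σ [yi·Xiβ − exp(Xiβ) − ln(yi!)].

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_loglik(beta, X, y):
    """Log-likelihood of a Poisson regression with a log link."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        ll += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return ll


def fit_poisson(X, y, iters=50, tol=1e-10):
    """Maximise the log-likelihood with Newton-Raphson (intercept + one slope)."""
    beta = [math.log(sum(y) / len(y)), 0.0]  # start: intercept-only estimate
    for _ in range(iters):
        g = [0.0, 0.0]                # score vector X'(y - mu)
        H = [[0.0, 0.0], [0.0, 0.0]]  # information matrix X' diag(mu) X
        for xi, yi in zip(X, y):
            mu = math.exp(xi[0] * beta[0] + xi[1] * beta[1])
            for j in range(2):
                g[j] += xi[j] * (yi - mu)
                for k in range(2):
                    H[j][k] += xi[j] * xi[k] * mu
        # Newton step = H^{-1} g, using the closed-form 2x2 inverse
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                (H[0][0] * g[1] - H[1][0] * g[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated data with hypothetical true coefficients (0.3, 0.6)
rng = random.Random(42)
X, y = [], []
for _ in range(2000):
    x1 = rng.uniform(0.0, 2.0)
    X.append([1.0, x1])
    y.append(draw_poisson(math.exp(0.3 + 0.6 * x1), rng))

beta_hat = fit_poisson(X, y)
print("estimates:", [round(b, 3) for b in beta_hat])
print("log-likelihood:", round(poisson_loglik(beta_hat, X, y), 2))
```

Because the Poisson log-likelihood with a log link is concave in β, Newton-Raphson from a reasonable start converges in a handful of iterations; moving any coefficient away from the estimate lowers the log-likelihood.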
Interpretation of the parameters
- If Xi increases by one unit (keeping all other variables constant), the expected count (λ) is multiplied by exp(βi)
- Or, if you use ln(Xi) as the predictor, its parameter becomes an elasticity: a 1% increase in Xi leads to approximately a βi% change in λ (the expected value of the DV)
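A quick numerical check of both interpretations, with hypothetical coefficients b0 = 0.2 and b1 = 0.5 (b1 plays the role of βi above): the ratio of expected counts for a one-unit increase is exp(b1) at any starting value, and with ln(x) in the model a 1% increase in x changes λ by roughly b1 percent.

```python
import math

b0, b1 = 0.2, 0.5  # hypothetical coefficients, illustration only


def lam_linear(x: float) -> float:
    """Expected count when x enters the model directly."""
    return math.exp(b0 + b1 * x)


def lam_log(x: float) -> float:
    """Expected count when ln(x) enters the model instead (requires x > 0)."""
    return math.exp(b0 + b1 * math.log(x))


# A one-unit increase in x multiplies lambda by exp(b1), whatever the starting x
for x in (0.0, 3.0, 10.0):
    ratio = lam_linear(x + 1) / lam_linear(x)
    print(f"x={x}: lambda ratio={ratio:.4f}, exp(b1)={math.exp(b1):.4f}")

# With ln(x) as predictor, a 1% increase in x changes lambda by roughly b1 percent
for x in (2.0, 50.0):
    pct = 100 * (lam_log(1.01 * x) / lam_log(x) - 1)
    print(f"x={x}: 1% more x -> {pct:.3f}% change in lambda (b1={b1})")
```

With b1 = 0.5, exp(0.5) ≈ 1.649, so each extra unit of x raises the expected count by about 65%; under the log specification the percentage effect does not depend on the level of x.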
Model fit and selection
1) (Log-)likelihood
2) Likelihood ratio test -> only for nested models
3) AIC, BIC, CAIC -> models need not be nested, but they must be estimated on the same data
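A minimal sketch of these criteria, computed from log-likelihoods that are hypothetical numbers for illustration only. It assumes the usual definitions AIC = −2LL + 2k, BIC = −2LL + k·ln(n) and CAIC = −2LL + k·(ln(n) + 1), where lower is better. To stay in the standard library, the likelihood ratio test is restricted to models that differ by one parameter, where the chi-square(1) upper tail equals erfc(√(LR/2)); for more degrees of freedom a chi-square survival function (for example `scipy.stats.chi2.sf`) would be needed.

```python
import math


def information_criteria(loglik: float, k: int, n: int) -> dict[str, float]:
    """AIC, BIC and CAIC for a model with k parameters fitted on n observations."""
    return {
        "AIC": -2 * loglik + 2 * k,
        "BIC": -2 * loglik + k * math.log(n),
        "CAIC": -2 * loglik + k * (math.log(n) + 1),
    }


def lr_test_1df(loglik_restricted: float, loglik_full: float) -> tuple[float, float]:
    """LR test for two NESTED models that differ by exactly one parameter."""
    stat = max(2 * (loglik_full - loglik_restricted), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value


# Hypothetical log-likelihoods for two nested models fitted on the same n = 500 rows
ll_restricted, ll_full, n = -812.4, -805.1, 500
stat, p = lr_test_1df(ll_restricted, ll_full)
print(f"LR = {stat:.2f}, p = {p:.5f}")
print("restricted:", information_criteria(ll_restricted, k=2, n=n))
print("full:      ", information_criteria(ll_full, k=3, n=n))
```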
Some violations or “challenges”
1) Mean not equal to the variance
2) Zero events cannot be observed
3) More zeros than expected
Underdispersion
Mean > variance
Almost never happens in practice. If it does, still use Poisson regression.
Overdispersion
Mean < variance
Use negative binomial regression.
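A simulation sketch of why overdispersion points to the negative binomial model. It assumes the common NB2 parameterisation, in which a Poisson count whose rate is itself gamma distributed has mean μ and variance μ + αμ²; the values μ = 3 and α = 0.8 are hypothetical.

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def draw_negbin(mu: float, alpha: float, rng: random.Random) -> int:
    """NB2 draw as a gamma-Poisson mixture: mean mu, variance mu + alpha * mu**2."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape 1/alpha, scale alpha*mu
    return draw_poisson(lam, rng)


def mean_var(values: list[int]) -> tuple[float, float]:
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(1)
mu, alpha = 3.0, 0.8  # hypothetical mean and dispersion parameter
pois = [draw_poisson(mu, rng) for _ in range(20000)]
negbin = [draw_negbin(mu, alpha, rng) for _ in range(20000)]
print("Poisson mean/variance:", mean_var(pois))
print("NB2 mean/variance:", mean_var(negbin), "theory:", (mu, mu + alpha * mu**2))
```

Both samples have a mean near 3, but only the gamma-Poisson mixture shows a variance well above the mean, which is the pattern the negative binomial model is built to capture.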
Dispersion test
If significant, the equidispersion assumption (mean = variance) is violated, so the Poisson distribution is not suitable.
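The card does not say which dispersion test is meant; the sketch below assumes one common choice, a regression-based test in the style of Cameron and Trivedi. It regresses zi = ((yi − μi)² − yi) / μi on μi without an intercept; under equidispersion the slope α is zero, and a significantly positive α indicates overdispersion. To keep it self-contained, μi here are the fitted values of an intercept-only Poisson model (the sample mean); in practice they would come from the fitted Poisson regression. The p-value uses a one-sided normal approximation.

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def dispersion_test(y: list[int], mu: list[float]) -> tuple[float, float, float]:
    """Regression-based overdispersion test; returns (alpha_hat, t_stat, p_value)."""
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sxx = sum(mi * mi for mi in mu)
    alpha = sum(zi * mi for zi, mi in zip(z, mu)) / sxx  # OLS slope, no intercept
    resid = [zi - alpha * mi for zi, mi in zip(z, mu)]
    s2 = sum(r * r for r in resid) / (len(y) - 1)
    t = alpha / math.sqrt(s2 / sxx)
    p_value = 0.5 * math.erfc(t / math.sqrt(2))  # one-sided: P(N(0,1) > t)
    return alpha, t, p_value


rng = random.Random(7)
n = 3000
y_pois = [draw_poisson(2.5, rng) for _ in range(n)]
# Overdispersed counts: Poisson with a gamma-distributed rate (mean 2.5, alpha 0.5)
y_over = [draw_poisson(rng.gammavariate(2.0, 1.25), rng) for _ in range(n)]

for label, y in (("Poisson data", y_pois), ("overdispersed data", y_over)):
    mu_hat = [sum(y) / len(y)] * len(y)  # fitted values of an intercept-only model
    a, t, p = dispersion_test(y, mu_hat)
    print(f"{label}: alpha={a:.3f}, t={t:.2f}, p={p:.4f}")
```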
What to do if zero events cannot be observed?
Use a truncated model. For the truncated negative binomial regression, the second intercept represents the extra parameter (the dispersion parameter).
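A minimal sketch of the zero-truncated Poisson model, kept to an intercept-only case for brevity (the truncated negative binomial adds the dispersion parameter on top of this). It assumes the standard truncated probability P(Y = k | Y > 0) = e^(−λ) λ^k / (k! (1 − e^(−λ))), k ≥ 1, whose maximum likelihood estimate solves mean(y) = λ / (1 − e^(−λ)). The data are simulated with a hypothetical λ = 1.2 and the zeros are then discarded.

```python
import math
import random


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def ztp_pmf(k: int, lam: float) -> float:
    """Zero-truncated Poisson: P(Y = k | Y > 0) for k = 1, 2, ..."""
    if k < 1:
        return 0.0
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_p) / -math.expm1(-lam)  # -expm1(-lam) = 1 - exp(-lam)


def fit_ztp_intercept_only(y: list[int], tol: float = 1e-12) -> float:
    """MLE of lambda from zero-truncated data (requires mean(y) > 1), via bisection."""
    ybar = sum(y) / len(y)
    lo, hi = 1e-9, ybar  # lam / (1 - exp(-lam)) rises from 1 to above ybar here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rng = random.Random(3)
true_lam = 1.2  # hypothetical
full = [draw_poisson(true_lam, rng) for _ in range(5000)]
observed = [v for v in full if v > 0]  # zeros are never recorded
print("naive estimate (mean of observed):", round(sum(observed) / len(observed), 3))
print("zero-truncated Poisson estimate:", round(fit_ztp_intercept_only(observed), 3))
```

Ignoring the truncation and simply averaging the observed counts overstates λ, because every zero has been removed; the truncated likelihood corrects for this.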