Models for Count Data II Flashcards
When is Poisson regression appropriate?
When the number of events (counts) follows a Poisson distribution, conditional on the predictors
What are the ways in which the assumptions for a Poisson regression can be violated?
- Overdispersion (variance > mean)
- Excess zeroes (more zeroes than in a Poisson distribution)
- No zeroes
What is equidispersion?
Assumption for Poisson regression - variance = mean
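A rough descriptive check of equidispersion compares the sample variance of the counts with their mean. The Python sketch below is only illustrative. The function name dispersion_ratio and the toy counts are hypothetical. Because the assumption concerns the variance conditional on the predictors, a marginal ratio like this is at best a first look.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean of a list of counts.

    Values near 1 are consistent with equidispersion; values well above 1
    suggest overdispersion, and values below 1 suggest underdispersion.
    """
    return variance(counts) / mean(counts)

# Hypothetical toy counts, for illustration only
counts = [0, 0, 1, 1, 2, 3, 8, 12]
print(round(dispersion_ratio(counts), 2))
```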
In social science, medicine, and health, in what way do count data violate assumptions for Poisson regression?
Overdispersion (variance > mean)
Models for count data: equidispersion and zeroes as expected
Poisson
Models for count data: equidispersion and excess zeroes
Zero-inflated Poisson
Models for count data: Equidispersion and no zeroes
Zero-truncated Poisson
Models for count data: Overdispersion and zeroes as expected
Negative binomial
Models for count data: Overdispersion and excess zeroes
Zero-inflated negative binomial
Models for count data: Overdispersion and no zeroes
Zero-truncated negative binomial
What about underdispersion?
Underdispersion (variance < mean) can occur in principle, but is rare in practice
What happens if you use the Poisson distribution even though your data are overdispersed? Or use a model that doesn’t consider excess or no zeroes when it should?
Coefficient estimates may be biased and/or misleading (i.e., slope coefficients may not be a good estimate of relationship between predictor(s) and outcome)
What are the implications for SEs when not considering overdispersion?
They may be underestimated. This implies that your p-values would be too small and your CIs too narrow, increasing the risk of Type I error
What choices do you have when outcome is overdispersed?
- Negative binomial regression (or other models accounting for overdispersion)
- Poisson regression with robust SEs
What are robust SEs?
Standard errors adjusted so that they remain valid when the assumptions of Poisson regression (e.g., equidispersion) are violated. Robust SEs are usually larger than those from a standard Poisson regression, so using them is considered a more cautious way of analysing the data
What is the most commonly used overdispersed distribution?
Negative binomial
What are the parameters of a negative binomial distribution?
Mean, µ, and a dispersion parameter α
The mean and variance are related (as opposed to in the normal distribution where they are independent): var(Y) = µ + αµ^2
In Poisson, we just have one parameter (mean) as variance is equal to the mean
What values can the dispersion parameter α take?
Values of 0 or larger (can never be negative)
- if α = 0, we have a Poisson distribution (with equidispersion)
- if α > 0, we have an overdispersed distribution
The larger the α, the larger the variance relative to the mean
Are there other ways of relating the mean to the variance in negative binomial regression?
Yes. Different ways of relating the variance to the mean (e.g., NB1 rather than NB2) give slightly different models and can sometimes slightly improve model fit
In the negative binomial distribution, what does larger dispersion imply?
Larger variance
What is the shape of the negative binomial distribution?
Compared with a Poisson distribution (dispersion α = 0) with the same mean, it has a much longer, heavier right tail, so large counts are far more likely
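To see the heavier tail numerically, the sketch below compares P(Y ≥ 8) under a Poisson and an NB2 distribution with the same mean. It assumes the standard NB2 form with "size" r = 1/α and p = r / (r + µ), which gives mean µ and variance µ + αµ^2. The function names are illustrative, not from any particular package.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu (computed on the log scale)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y, mu, alpha):
    """NB2 probability with mean mu and dispersion alpha > 0.

    Uses the 'size' r = 1/alpha and p = r / (r + mu), which gives
    mean mu and variance mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)

def upper_tail(pmf, threshold, **params):
    """P(Y >= threshold), computed as 1 - P(Y < threshold)."""
    return 1.0 - sum(pmf(y, **params) for y in range(threshold))

mu = 2.0
print("Poisson P(Y >= 8):", upper_tail(poisson_pmf, 8, mu=mu))
print("NB2 (alpha = 1) P(Y >= 8):", upper_tail(negbin_pmf, 8, mu=mu, alpha=1.0))
```

With µ = 2, the Poisson tail probability is roughly 0.001, while the NB2 tail with α = 1 is (2/3)^8, roughly 0.039.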
Overall comparison of properties of Poisson and negative binomial distributions:
Poisson:
- Equidispersed
- One parameter (µ = mean = variance)
- Var(Y) = µ
Negative binomial:
- Overdispersed
- Two parameters (µ = mean; α = dispersion)
- Var(Y) = µ + αµ^2
In the negative binomial distribution, what is this way of specifying the variance called? Var(Y) = µ + αµ^2
NB2-parameterisation
There are other options e.g., the NB1-parameterisation: var(Y) = µ + αµ
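The Poisson, NB1 and NB2 variance functions can be compared directly. This minimal sketch uses a hypothetical α = 0.5. It shows that NB1 inflates the variance by a constant factor (1 + α), whereas under NB2 the extra variance grows with the square of the mean.

```python
def poisson_var(mu):
    """Poisson: variance equals the mean."""
    return mu

def nb1_var(mu, alpha):
    """NB1: variance is a constant multiple of the mean, (1 + alpha) * mu."""
    return mu + alpha * mu

def nb2_var(mu, alpha):
    """NB2: extra variance grows with the square of the mean."""
    return mu + alpha * mu ** 2

alpha = 0.5  # hypothetical dispersion value
for mu in (1.0, 5.0, 20.0):
    print(mu, poisson_var(mu), nb1_var(mu, alpha), nb2_var(mu, alpha))
```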
Negative binomial regression equation:
log(µi) = β0 + β1X1i + β2X2i + … + βkXki
yi ~ NegBin(µi, α), var(yi) = µi + αµi^2
Where ‘i’ represents each observation
- This looks similar to a Poisson regression. Again, we use a log link: the log of the expected count µi (not of the observed outcome) is modelled as a linear function of the predictors. The difference is that we now have an additional parameter in the model, the dispersion, α, which we need to estimate. The dispersion parameter governs the extent of overdispersion
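A sketch of how a fitted negative binomial model turns coefficients into a predicted mean and variance for observation i, assuming the NB2 parameterisation. The coefficient and predictor values are hypothetical, and the function name is illustrative.

```python
import math

def nb_mean_and_variance(betas, x, alpha):
    """Predicted mean and NB2 variance for one observation.

    betas: [b0, b1, ..., bk] on the log scale (intercept first)
    x:     [x1, ..., xk] predictor values for observation i
    alpha: dispersion parameter (alpha = 0 gives the Poisson variance)
    """
    eta = betas[0] + sum(b * xj for b, xj in zip(betas[1:], x))  # linear predictor
    mu = math.exp(eta)  # inverse of the log link
    return mu, mu + alpha * mu ** 2

# Hypothetical coefficients and predictor values, for illustration only
print(nb_mean_and_variance([0.5, 0.2, -0.3], [1.0, 2.0], alpha=0.4))
```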
On what scale are the coefficients from the negative binomial regression?
Log-scale. As with Poisson, they can be exponentiated to get the IRRs and 95% CIs for IRRs
Why exponentiate coefficients from the log scale to IRRs?
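IRRs are easier to interpret because they describe effects on the scale of the count variable: an IRR is the factor by which the expected count (rate) is multiplied for a one-unit difference in a predictor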
If coefficient on the log scale is negative, what will the IRR be?
Less than one (between 0 and 1)
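Exponentiating a log-scale coefficient and the limits of its Wald CI gives the IRR and its CI. In this sketch the coefficient and SE are hypothetical, and z = 1.96 approximates the 97.5th percentile of the standard normal distribution.

```python
import math

def irr_with_ci(b, se, z=1.96):
    """Return (IRR, lower, upper) from a log-scale coefficient and its SE.

    The Wald CI is built on the log scale and then exponentiated.
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient and SE, for illustration only
print(irr_with_ci(-0.2, 0.05))  # negative coefficient -> IRR below 1
```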
In an output from negative binomial regression, what does /lnalpha mean?
The natural log of the dispersion parameter α (estimating ln α keeps α positive). It only needs interpreting if predictors are used to model the dispersion rather than assuming it is constant
How can we get the output on the count scale?
By having Stata exponentiate the coefficients (e.g., with the irr option of poisson or nbreg)
Do Poisson and negative binomial regression estimated on the same data give the same results?
No - neither estimated coefficients nor SEs will be the same
What does zero-truncation often result from?
Zeroes being unobservable or ‘impossible’ e.g.,:
- Number of days in hospital for hospitalised stroke patients
- Number of appointments with a psychotherapist
How do regression models for zero-truncated data work?
Essentially in the same way as ordinary Poisson and negative binomial regression models
- There is zero-truncated Poisson and zero-truncated negative binomial regression. In either of these two models, predicted values will have a minimum value of 1. Otherwise, the interpretation of coefficients is the same as in ordinary Poisson or negative binomial regression
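The zero-truncated Poisson rescales the ordinary Poisson probabilities by P(Y > 0) = 1 - e^(-µ), so its mean µ / (1 - e^(-µ)) is always above 1. A minimal Python sketch, with illustrative function names:

```python
import math

def ztp_pmf(y, mu):
    """Zero-truncated Poisson: P(Y = y | Y > 0), defined for y >= 1."""
    if y < 1:
        return 0.0
    poisson = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return poisson / -math.expm1(-mu)  # divide by P(Y > 0) = 1 - exp(-mu)

def ztp_mean(mu):
    """E(Y | Y > 0) = mu / (1 - exp(-mu)); always greater than 1."""
    return mu / -math.expm1(-mu)

for mu in (0.1, 1.0, 5.0):
    print(mu, round(ztp_mean(mu), 3))
```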
Does absence of zeroes necessarily imply zero-truncation?
No - zeroes may simply not have been observed ‘by chance’, even though zeroes are possible
On what basis is the decision to use a zero-truncated model made?
Knowledge about how the data were collected, rather than based on noticing that there are no zeroes
With excess zeroes, what are the theoretical accounts of where the zeroes come from?
- Zero-inflation: zeroes can come about in two different ways
- Hurdle models: the zeroes and the non-zero counts are caused by separate processes
In what two ways can zeroes come about?
Structural zeroes vs sampling zeroes
Consider a research example of a zero-inflated distribution: “how many joints (cannabis) did you smoke last week?” The answer is zero for:
- Structural zeroes: non-smokers of joints
- Sampling zeroes: cannabis users who happened to not smoke last week
What are hurdle models?
All zeroes are assumed to be structural, and there are no sampling zeroes
The distribution of non-zero counts is zero-truncated, and the zeroes are governed by a totally different process
E.g., number of appointments with a psychotherapist after GP referral:
- Structural zeroes: Some patients never go to see a therapist
- Zero-truncated counts: Those who go to see a therapist have at least one appointment
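A Poisson hurdle model can be sketched as a two-part probability function: one probability π for the zeroes, and a zero-truncated Poisson for the positive counts. The values π = 0.4 and µ = 3 are hypothetical, as are the function names. Unlike a zero-inflated model, the share of zeroes here is exactly π.

```python
import math

def poisson_hurdle_pmf(y, pi, mu):
    """P(Y = y) under a Poisson hurdle model.

    pi: probability of a zero (all zeroes treated as structural)
    mu: mean of the Poisson distribution that is truncated at zero
        to describe the positive counts
    """
    if y == 0:
        return pi
    poisson = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return (1.0 - pi) * poisson / -math.expm1(-mu)

# Hypothetical values: 40% never see a therapist; the rest follow a ZTP with mu = 3
probs = [poisson_hurdle_pmf(y, 0.4, 3.0) for y in range(6)]
print([round(p, 3) for p in probs])
```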
What is a mixture distribution?
One distribution governs the zeroes and another governs the non-zero counts (e.g., zero-truncated Poisson counts, as in a Poisson hurdle model)
Models for excess zeroes:
Equidispersion:
- Zero-inflated Poisson
- Poisson hurdle model
Overdispersion:
- Zero-inflated negative binomial
- Negative binomial hurdle model
How do you know which model for excess zeroes to use?
Often depends on knowledge/theory of how zeroes come about
Sometimes the zero-generating process is unknown; then a pragmatic decision might be made (e.g., based on model fit)
Relative to the mean, what indicates overdispersion?
Many large counts relative to the mean (a long right tail), which inflate the variance relative to the mean
How can a zero-inflated model be expressed mathematically?
P(Yi = 0) = πi + (1 - πi)e^(-μi)
P(Yi = y) = (1 - πi)μi^y e^(-μi) / y!, y ≥ 1
The first equation describes the probability of observing zero events. This probability is the sum of the probability of a structural zero (πi) and the probability of a sampling zero [(1 - πi)e^(-μi)]
The second equation describes the probability of observing 1, 2, 3 or more events
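The two ZIP equations translate directly into code. A minimal sketch, with πi and μi passed in as plain numbers and a hypothetical function name. Unlike the hurdle model, the Poisson part here can also produce (sampling) zeroes.

```python
import math

def zip_pmf(y, pi, mu):
    """Zero-inflated Poisson probability P(Y = y).

    pi: probability of a structural zero
    mu: mean of the Poisson process (which can also produce sampling zeroes)
    """
    poisson = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))  # equals exp(-mu) at y = 0
    if y == 0:
        return pi + (1.0 - pi) * poisson  # structural + sampling zeroes
    return (1.0 - pi) * poisson

# Hypothetical values: 30% structural zeroes, Poisson mean 2
print(round(zip_pmf(0, 0.3, 2.0), 3))
```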
How many parameters in a zero-inflated model?
Two - π and μ, each of which appears in both equations. But we model these parameters separately
The probability π of having a structural zero is modelled via a logistic regression:
logit(πi) = γ0 + γ1X1i + γ2X2i + … - here, the coefficients are labelled ‘γ’ (gamma) to make clear they are not the same coefficients as those in the Poisson part of the model
The mean μ of the counts that are not structural zeroes is modelled via a Poisson regression:
log(μi) = β0 + β1X1i + β2X2i + …
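Each model part has its own linear predictor: the inverse logit gives πi and the exponential gives μi. The sketch below uses hypothetical coefficients and function names, lets each part take different predictors, and also computes the overall expected count (1 - πi)μi, which combines both parts.

```python
import math

def zip_parameters(gammas, betas, z, x):
    """Compute (pi_i, mu_i) for one observation of a ZIP model.

    gammas: [g0, g1, ...] logistic (zero-inflation) part, intercept first
    betas:  [b0, b1, ...] Poisson (count) part, intercept first
    z, x:   predictor values for each part (they need not be the same)
    """
    eta_zero = gammas[0] + sum(g * zj for g, zj in zip(gammas[1:], z))
    eta_count = betas[0] + sum(b * xj for b, xj in zip(betas[1:], x))
    pi = 1.0 / (1.0 + math.exp(-eta_zero))  # inverse logit
    mu = math.exp(eta_count)  # inverse of the log link
    return pi, mu

def zip_mean(pi, mu):
    """Overall expected count E(Y) = (1 - pi) * mu."""
    return (1.0 - pi) * mu

# Hypothetical coefficients: one predictor in the logit part, two in the Poisson part
pi, mu = zip_parameters([-1.0, 0.5], [0.7, 0.1, -0.2], z=[1.0], x=[2.0, 1.0])
print(round(pi, 3), round(mu, 3), round(zip_mean(pi, mu), 3))
```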
In a zero-inflated model, do the predictors need to be in both model parts (logistic + Poisson)?
No - you can choose to have different predictors in each part of the model, or to use some predictors in both model parts, and other predictors only in one of them
In a ZIP model, how are structural zeroes modelled?
Using a logistic regression
In a ZIP model, if a coefficient is positive in the logistic part, what does that indicate?
More zeroes, i.e., higher odds of a structural zero and hence lower counts of the outcome variable (the same direction as a negative coefficient in the Poisson part)
In a ZIP model, how are both model parts related?
They depend on one another: changing something in the zero-inflation part will change the estimates in the Poisson part, and vice versa
How does ZI negative binomial regression work in relation to a ZIP model?
In essentially the same way, except that the counts are assumed to follow a negative binomial distribution
In a ZIP model, how are coefficients interpreted? Consider an example IRR of 0.89 (95% CI 0.80-0.98) for a 10 percentage point difference in the proportion of lower class citizens, with the number of police operations as the outcome
As in an ordinary Poisson regression, but being mindful that our estimates are conditional on how we adjust for zero-inflation (e.g., if we change the predictors in the ZI part, the coefficient estimates in the Poisson part will also change). For example: “Our model estimates that a 10 percentage point difference in the proportion of lower class citizens is associated with fewer police operations by a factor of 0.89, adjusting for zero-inflation, where zeroes are predicted by lower10, vendors, and population.”
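Because the model is multiplicative on the count scale, an IRR converts to a percentage difference as (IRR - 1) x 100, and the IRR for k units of the predictor is IRR^k. Applied to the police-operations example, 0.89 corresponds to about 11% fewer operations per 10 percentage point difference (CI from 20% to 2% fewer). The helper names are illustrative.

```python
def irr_percent_change(irr):
    """Percentage difference in the expected count implied by an IRR."""
    return (irr - 1.0) * 100.0

def irr_for_multiple(irr, k):
    """IRR for k units of the predictor (effects multiply on the count scale)."""
    return irr ** k

# Values from the police-operations example (one unit = 10 percentage points)
for value in (0.89, 0.80, 0.98):
    print(value, round(irr_percent_change(value), 1))
print(round(irr_for_multiple(0.89, 2), 4))  # a 20 percentage point difference
```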
How do you interpret the logistic part in a ZIP? Interpret in relation to police operations
The logistic part predicts zeroes. Thus:
- An OR > 1 indicates that a predictor is associated with more zeroes (fewer police operations)
- An OR < 1 indicates a predictor is associated with fewer zeroes (more police operations)
Interpretation needs to consider how we model the non-zero counts, i.e., the estimates from the logistic part are adjusted for the Poisson part.
How are count regression models estimated?
By maximum likelihood
What models can be compared using LRTs?
Nested models of the same type (e.g., two negative binomial models, or two ZIP models, where one model's predictors are a subset of the other's) can be compared using LRTs
But:
- models without zero-inflation are not nested within zero-inflated models
- although the Poisson model is nested within the negative binomial model (as it is a special case of negative binomial regression), the ordinary LRT cannot be applied
Is a Poisson model nested within a negative binomial regression?
Yes - it is nested within a negative binomial regression with the same predictors, because if the dispersion α = 0, then the two models are identical
But the ordinary LRT would give misleading results (the p-value would be too large and we would reject H0 too rarely)
Why can we not use an ordinary LRT with a Poisson and negative binomial regression model?
α cannot be negative - so the test value 0 is “on the boundary of the parameter space.”
We therefore need to use a special test called the boundary LRT
Hypotheses for the boundary LRT to compare Poisson and negative binomial models:
H0: α = 0
Or: “There is no overdispersion”
Or: “The NegBin model is no better than the Poisson model.”
H1: α > 0
Or: “There is overdispersion”
Or: “The NegBin model is better than the Poisson model”
A small p-value indicates evidence in favour of overdispersion, i.e., evidence against the Poisson model
Where is the boundary LRT displayed in Stata?
In the output of a NegBin model (nbreg), in the line “LR test of alpha=0”, which reports chibar2(01) and its p-value
It is displayed by default (compares negative binomial regression with Poisson regression model with the same predictors)
What statistic does boundary LRT use?
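The same test statistic as the ordinary LRT, but the p-value is calculated differently: the statistic is compared with a 50:50 mixture of χ²(0) and χ²(1) (hence “chibar2(01)”), which for a positive statistic halves the ordinary χ²(1) p-value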
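The boundary p-value can be computed from the two log-likelihoods. This sketch assumes the usual boundary result that, under H0, the LR statistic follows a 50:50 mixture of χ²(0) and χ²(1), so the boundary p-value is half the ordinary χ²(1) p-value. The log-likelihood values and the function name are hypothetical.

```python
import math

def boundary_lrt(ll_poisson, ll_negbin):
    """Boundary LRT of H0: alpha = 0 (Poisson) vs H1: alpha > 0 (NegBin).

    Returns (LR statistic, ordinary chi2(1) p-value, boundary p-value).
    """
    lr = max(2.0 * (ll_negbin - ll_poisson), 0.0)
    p_ordinary = math.erfc(math.sqrt(lr / 2.0))  # P(chi2 with 1 df >= lr)
    p_boundary = 0.5 * p_ordinary if lr > 0 else 1.0  # mixture p-value
    return lr, p_ordinary, p_boundary

# Hypothetical log-likelihoods, for illustration only
print(boundary_lrt(ll_poisson=-250.0, ll_negbin=-248.0))
```

The ordinary p-value is twice the boundary one, which is why the ordinary LRT rejects H0 too rarely here.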
If the outcome variable is overdispersed (ignoring the predictors), does that necessarily mean there is overdispersion in a regression model of the same data?
No - sometimes additional predictors explain away the overdispersion
Why can LRTs not be used to assess whether zero-inflation improves a model?
Models without zero-inflation are not nested in models with zero-inflation
We can use Akaike’s Information Criterion (AIC) instead
What is AIC?
Can be used to compare nested and non-nested models
Is an information criterion, not a statistical test
How is AIC expressed?
AIC = -2 x LL + 2 x k
Where:
- LL: Log likelihood
- k: number of parameters
What does AIC do?
Weighs model fit (represented by the LL) against parsimony (represented by k)
How to interpret AIC:
A smaller AIC indicates a better model
The numeric value of the AIC has no meaning in itself; it is meaningful only when comparing the AICs of different models estimated on the same data (with the same sample size)
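AIC is simple to compute once each model's log-likelihood and number of parameters are known. The log-likelihoods below are hypothetical. They were chosen so that the zero-inflated model fits slightly better (higher LL) but is penalised for its four extra parameters.

```python
def aic(loglik, k):
    """Akaike's Information Criterion: -2 x LL + 2 x k (smaller is better)."""
    return -2.0 * loglik + 2.0 * k

# Hypothetical (log-likelihood, number of parameters) for models fitted to the same data:
# 3 predictors + intercept; NegBin adds alpha; ZINB adds a 4-parameter logit part
models = {
    "Poisson": (-520.0, 4),
    "NegBin": (-480.0, 5),
    "ZINB": (-478.0, 9),
}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
print(scores, "-> smallest AIC:", best)
```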
How many additional parameters does negative binomial regression have compared with Poisson regression with the same predictors?
One (dispersion)
What sentence needs to be included when interpreting zero-inflated models (logistic part)?
“adjusted for all other variables in the model, including both the Poisson and the logit parts”
What are differences in the estimates from ZINB and ZIP model?
Estimates from ZIP and ZINB models are similar, but not identical: the models make different assumptions, so they give different results
SEs are larger in the ZINB model, leading to wider CIs
AIC - notes:
- AIC has general applicability beyond count regression
- Can be used to compare nested or non-nested models
- AIC is one of several information indices: different information indices vary in the way they weight model fit and parsimony
Zero-inflated models - summary:
- Excess zeroes can sometimes be accounted for by explanatory variables
- Poisson and negative binomial models are not nested in their zero-inflated counterparts
- Use AIC (or other information criteria) for comparison