W4: Poisson Regression Flashcards
What if we are interested in a response variable that is a count and its association with more than one binary covariate?
Which test do we perform?
Poisson or negative binomial regression
When is Poisson regression suitable?
When we have a response variable in the form of a count and one or more IVs (covariates), which can be discrete or continuous
Form of the Poisson regression equation
log(M) = a + b1X1 + b2X2 + …
In Poisson regression, what does M in the equation stand for?
The mean count for an individual with covariate values X1, X2…
Once we have values of a, b1, b2, we can use the Poisson regression equation to make predictions by reversing the log transformation
Prediction:
M = exp(a + b1X1 + b2X2 + …)
Assumptions of Poisson Regression (3)
- Observations are independent
- The distribution of counts follows a Poisson distribution
- The mean and variance of the model are the same
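The third assumption comes from the Poisson distribution itself, whose mean and variance are equal. A minimal sketch that checks this numerically is shown here; the function names and the mean of 4.0 are illustrative choices, not part of the flashcards.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(Y = k) for a Poisson distribution with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def poisson_moments(mean: float, max_k: int = 100) -> tuple:
    """Approximate the mean and variance by summing over k = 0..max_k.

    max_k = 100 is an assumption that is ample for small means;
    the tail beyond it is negligible here.
    """
    probs = [poisson_pmf(k, mean) for k in range(max_k + 1)]
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    return mu, var

mu, var = poisson_moments(4.0)  # 4.0 is an arbitrary example mean
print(f"mean = {mu:.4f}, variance = {var:.4f}")
```

Both printed values should agree to several decimal places; real count data often show a larger variance than mean.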
Which assumption of Poisson regression is often violated?
Assumption 3: Mean and variance of the model are the same
Assumption 3 of Poisson regression is often violated
This is when
the variance of the model is larger than the mean; this is known as overdispersion
To assess whether a Poisson regression model is overdispersed, we look at the
Chi-squared test statistic (χ²) divided by the model's DF
If Chi-squared (χ²)/DF = 1 then
Poisson/NB
mean and variance of the model are equal
If Chi-squared (χ²)/DF > 1 then
Poisson/NB
A value much larger than 1 indicates overdispersion
Assessing model fit using (2)
Poisson/NB reg
- AIC value
- Omnibus test
The Omnibus test tests the null hypothesis that:
Poisson/NB
H0: The model is no better than an intercept-only model
In other words, which H0 does the Omnibus test test?
Poisson/NB
Our Poisson regression with our X variables used to predict this count is no better than guessing.
If the Omnibus test is significant then -
Poisson/NB
The model is superior to a null model
Testing significance of predictors using __ test in Poisson regression
Wald
Testing significance of predictors using Wald test in Poisson regression
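We test the null hypothesis that:
The coefficient of the predictor is zero (H0: b = 0), i.e., the predictor has no association with the count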
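A Wald test for one predictor divides the estimated coefficient by its standard error and refers the result to a standard normal distribution. The sketch below is a minimal illustration under that assumption; the coefficient and standard error passed in are made-up numbers, and `wald_test` is a hypothetical helper, not a function from SPSS or R.

```python
import math

def wald_test(coef: float, std_err: float) -> tuple:
    """Return the Wald z statistic and its two-sided p-value."""
    z = coef / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Both inputs are hypothetical values chosen for illustration only.
z, p = wald_test(coef=0.372, std_err=0.05)
print(f"z = {z:.2f}, Wald chi-squared = {z ** 2:.2f}, p = {p:.3g}")
```

Squaring z gives the Wald chi-squared statistic on 1 degree of freedom, which is the form some software reports.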
First thing to write up in Poisson regression:
A Poisson regression was performed to assess whether X1 (e.g., spend on campaign) and X2 (e.g., emotiveness of advertisement) affected/predicted the number of Y (e.g., number of clicks)
Step 2 of Poisson is writing the regression equation (in SPSS) - (2)
The regression equation took the form of:
log (Mean number of clicks) = 3.0414 + 0.372 (Spend) - 0.717 (Emotive)
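Reversing the log transformation in this fitted equation gives a predicted mean number of clicks. The sketch below uses the coefficients from the equation above; the input values for Spend and Emotive are hypothetical, since the flashcards do not state how either covariate is coded.

```python
import math

# Coefficients taken from the fitted equation above.
INTERCEPT = 3.0414
B_SPEND = 0.372
B_EMOTIVE = -0.717

def predicted_mean_clicks(spend: float, emotive: float) -> float:
    """Predicted mean count: exponentiate the linear predictor."""
    log_mean = INTERCEPT + B_SPEND * spend + B_EMOTIVE * emotive
    return math.exp(log_mean)

# Hypothetical covariate values, for illustration only.
print(round(predicted_mean_clicks(spend=2.0, emotive=1.0), 2))
```

Because the model is linear on the log scale, a one-unit increase in Spend multiplies the predicted mean by exp(0.372), holding Emotive fixed.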
Step 3 of Poisson: significance of covariates (in SPSS)
Both Spend and Emotive are significant at the 0.1% level as predictors of Number of Clicks, and therefore we retain both covariates in the model
Step 2 of significance of covariates in Poisson regression: what if one covariate is not significant at any level?
It should be removed and the model refitted
Step 2 and 3 of Poisson regression: writing the equation and significance of covariates in R
Step 3 of Poisson regression: Testing for overdispersion (in SPSS) from goodness of fit table - (2)
- From the goodness of fit table, we see the Chi-squared test statistic divided by its DF is 57.623.
- This is much larger than 1 and therefore we have very strong evidence of overdispersion in the data
Step 3 of Poisson regression: Testing for overdispersion (in R) from goodness of fit table - (2)
Divide 15.295 by 16
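The division can be sketched in a few lines. The helper name `dispersion_ratio` is hypothetical; the two numbers are the ones quoted in the R card above.

```python
def dispersion_ratio(chi_squared: float, df: int) -> float:
    """Test statistic divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi_squared / df

r_ratio = dispersion_ratio(15.295, 16)
print(round(r_ratio, 3))  # 0.956
```

A ratio this close to 1 gives little sign of overdispersion, in contrast to the SPSS example, where the reported ratio of 57.623 is much larger than 1.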
Step 4 of Poisson is the Omnibus test (in SPSS) - (3)
- The Omnibus test is significant at the 0.1% level
- So the model is superior to a null model
- Including X1 (e.g., Stress) and X2 (e.g., Distance) to predict Y (e.g., Sick days) is better than a null model
There is no Omnibus test in
Poisson/NB
R
Step 5 of Poisson Regression is writing the AIC (in SPSS) - (2)
The AIC for this model is 1148.239
It can be used to compare with other models
Step 5 of Poisson Regression is writing the AIC (in R)
If we have overdispersion in the data, then Poisson regression is not suitable
Instead we use:
Negative binomial regression
If we have lots of zero counts then zero-inflated Poisson regression is
suitable
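One rough way to judge whether there are "lots" of zeros is to compare the observed share of zero counts with the share a Poisson distribution with the same mean would produce, which is exp(-mean). The sketch below assumes that simple check and ignores covariates; the data and the name `zero_fractions` are hypothetical.

```python
import math

def zero_fractions(counts: list) -> tuple:
    """Observed share of zeros and the share expected under a Poisson
    distribution with the same overall mean."""
    mean = sum(counts) / len(counts)
    observed = counts.count(0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) for a Poisson distribution
    return observed, expected

# Hypothetical counts, for illustration only.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 5]
observed_zero, expected_zero = zero_fractions(counts)
print(f"observed = {observed_zero:.3f}, expected = {expected_zero:.3f}")
```

An observed share well above the expected one is a hint, not a formal test, that a zero-inflated model may be worth considering.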
Negative binomial regression is the same as Poisson regression in (5)
- The steps of Poisson regression
- Same interpretation of AIC value and overdispersion
- Same in extracting values for equation
- Same regression equation as Poisson regression
- Same assumption of independent observations
In negative binomial regression we only need to compare the
AIC value with that of another model (i.e., the Poisson regression model)
What is AIC?
Negative / Poisson Reg
A measure of goodness of fit of regression models that penalises the number of parameters
What is the preferred model based on AIC?
The one with the lowest AIC value
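AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. The sketch below shows the formula and the "lowest AIC wins" rule; the Poisson value is the one quoted in the SPSS card, while the negative binomial value is hypothetical and included only so there is something to compare against.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

candidates = {
    "Poisson": 1148.239,            # AIC reported in the SPSS example
    "Negative binomial": 1100.0,    # hypothetical value, for illustration
}

# The preferred model is the one with the lowest AIC.
best = min(candidates, key=candidates.get)
print(best)
```

AIC values are only meaningful relative to each other, and only for models fitted to the same data.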
Assumptions of negative binomial regression are: (2)
Observations are independent
The counts follow a negative binomial distribution
The two assumptions of negative binomial regression are checked by
creating a scatterplot of predicted values against residuals
If the negative binomial regression scatterplot looks like this then - (2)
We see a pattern here: the variance decreases (less spread of points in the residuals) as predicted values increase
This provides evidence against our assumptions
Check independence of observations in Poisson regression by
plotting a scatterplot of predicted values against residuals
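The points for such a plot are pairs of predicted value and residual. The flashcards do not say which type of residual is used; the sketch below assumes Pearson residuals for a Poisson model, (observed - predicted) / sqrt(predicted), and uses hypothetical data.

```python
import math

def pearson_residuals(observed: list, predicted: list) -> list:
    """Pearson residuals for a Poisson model, where variance = mean."""
    return [(y - mu) / math.sqrt(mu) for y, mu in zip(observed, predicted)]

# Hypothetical observed counts and model-predicted means.
observed = [2, 5, 9, 4]
predicted = [3.0, 4.0, 9.0, 6.25]

residuals = pearson_residuals(observed, predicted)
for mu, r in zip(predicted, residuals):
    print(f"predicted = {mu:5.2f}  residual = {r:6.3f}")
```

Plotting these pairs with any charting tool gives the scatterplot; a patternless cloud around zero is what the assumptions lead us to expect.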
Checking the independence assumption of Poisson regression in this graph in R, interpret it - (2)
- The assumption of independence might not be met, as there is a big gap in the middle of the plot
- It is not obvious why there is this gap