count data Flashcards
what distribution does count data typically present?
poisson distribution
what are the two types of count data?
- Count data – number of events
- Rate data – number of events reported per year
prior assumptions for poisson regression?
- Mean and variance are the same
- If true the scaling parameter (Residual deviance/degrees of freedom) should be close to 1
what distribution should we use if the scaling parameter is close to 1 vs if it's large?
if close to 1, Poisson distribution
if large, negative binomial distribution
what is a scaling parameter?
Residual deviance/degrees of freedom
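A minimal sketch of how the scaling parameter could be computed by hand, assuming an intercept-only Poisson model where every fitted value is the sample mean. The counts and the function names are hypothetical, for illustration only.

```python
import math

def poisson_deviance(observed, fitted):
    """Residual deviance of a Poisson model: 2 * sum(y*ln(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # The y*ln(y/mu) term is taken as 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

def scaling_parameter(observed, fitted, n_params):
    """Residual deviance divided by residual degrees of freedom (n - p)."""
    df = len(observed) - n_params
    return poisson_deviance(observed, fitted) / df

# Hypothetical counts, fitted with an intercept-only model (mu = sample mean).
counts = [0, 0, 0, 0, 10, 0, 0, 12]
mean_count = sum(counts) / len(counts)
ratio = scaling_parameter(counts, [mean_count] * len(counts), n_params=1)
print(round(ratio, 2))  # well above 1 here, which points towards negative binomial
```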
in count data, what do we use to measure the treatment effect?
Incidence rate ratios
what type of count data is used for Poisson regression?
both count and rate data
interpret: Poisson regression. outcome variable number of AEs reported
IRR = 0.984
the Support group had 0.984 times as many AEs reported as the Active group, i.e. about 1.6% fewer
IRR = 0.984, shows the decreased number of AEs in treatment group
how can we reframe this to show the increased number of AEs in control?
1/0.984 = 1.016, control had 1.016 times as many reported AEs as the treatment group (about 1.6% more)
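The arithmetic behind the reframing, as a small sketch. The variable names are made up for illustration; only the IRR of 0.984 comes from the card.

```python
irr = 0.984  # incidence rate ratio from the card above

percent_change = (irr - 1) * 100  # negative means fewer events
inverse_irr = 1 / irr             # same comparison with the groups swapped

print(f"{percent_change:.1f}% change in AEs")
print(f"inverse IRR = {inverse_irr:.3f}")
```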
what is the equidispersion property in Poisson regression?
Poisson regression has a unique property. The mean of a Poisson distributed variable must be equal to its variance
before we fit the non-negative discrete outcome to the regression model what must we do?
Transform the variable: take the natural log of the outcome. In a Poisson model this is done through the log link, so the log is applied to the expected count rather than to each raw observation.
why transform?
Because non-negative discrete data tends to give us a positive skew. Working on the log scale reduces that skew.
Second reason: for the data to take on a linear form. Before, the count could only take non-negative whole numbers (0, 1, 2, ...), but the log of the expected count can take any value, including decimals and negatives. It just puts it on a linear scale.
when we conduct poisson regression and after having calculated the coefficients, what do we need to do to them?
Transform them back using Euler's number (the exponential function)
Need to do this to interpret them
e^(a + b)
e^(a x b)
e^(a - b)
e^(a + b) = e^a x e^b
e^(a x b) = (e^a)^b OR (e^b)^a
e^(a - b) = e^a / e^b
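A quick numerical check of the three exponent rules, as a sketch. The function name and the test values are arbitrary choices for illustration.

```python
import math

def exponent_rules_hold(a, b):
    """Check the three rules numerically for one pair (a, b)."""
    sum_rule = math.isclose(math.exp(a + b), math.exp(a) * math.exp(b))
    product_rule = (math.isclose(math.exp(a * b), math.exp(a) ** b)
                    and math.isclose(math.exp(a * b), math.exp(b) ** a))
    difference_rule = math.isclose(math.exp(a - b), math.exp(a) / math.exp(b))
    return sum_rule and product_rule and difference_rule

print(exponent_rules_hold(1.5, 0.7))
```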
what is a consequence of doing poisson regression without transforming the outcome first?
Might make a negative prediction
Imagine predicting the number of events in 1 group and making a negative prediction… honey this don’t make sense!
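To see the problem concretely, here is a sketch comparing a plain linear prediction with a log-scale prediction. The coefficients are hypothetical and chosen only so that the linear version goes negative.

```python
import math

b0, b1 = 2.0, -0.5  # hypothetical coefficients, for illustration only

def identity_prediction(x):
    return b0 + b1 * x            # can drop below zero

def log_link_prediction(x):
    return math.exp(b0 + b1 * x)  # always strictly positive

for x in (0, 4, 10):
    print(x, identity_prediction(x), round(log_link_prediction(x), 3))
```

At x = 10 the linear version predicts a negative count, while the exponentiated version stays above zero.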
You have a count outcome. We want to see the relationship between math score and number of awards achieved. Write the regression equation for this
Interpret the intercept and slope?
ln(Y) = B0 + B1*X1
The intercept is the natural log of the expected count when math score was 0.
The slope is the change in the natural log of the expected number of awards for a single unit increase in math score. Exponentiated, e^(B1) is the factor the expected number of awards is multiplied by for each single unit increase.
how do we transform the coefficients back to their original scale?
exponentiate them
e^(ln(Y)) = e^(b0 + b1*x)
Y = e^(b0) * e^(b1*x)
Y = e^(b0) * (e^(b1))^x
what is 0.443 raised to the power of 0?
1, because any non-zero number raised to the power of 0 is 1.
Y = e^(b0) * (e^(b1))^x
Y = e^(0.22) * (e^(0.778))^pass_maths
Interpret the B1 coefficient here using the value?
For an increase of 1 in pass_maths (fail to pass), the expected count is e^(0.778) ≈ 2.18 times the baseline count e^(b0).
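A small sketch of this equation in code, assuming pass_maths is coded 0 for fail and 1 for pass. The function name is hypothetical.

```python
import math

b0, b1 = 0.22, 0.778  # coefficients on the log scale, from the card above

def expected_count(pass_maths):
    """pass_maths is assumed to be 0 (fail) or 1 (pass)."""
    return math.exp(b0) * math.exp(b1) ** pass_maths

ratio = expected_count(1) / expected_count(0)
print(round(math.exp(b1), 2), round(ratio, 2))  # the two numbers match
```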
count outcome (number of awards). single binary predictor (pass/fail maths)
exponentiated intercept = 0.175
exponentiated coefficient = 5.33
Interpret this output
If a subject failed maths, then the expected number of awards they would achieve is 0.175. For those who passed, the expected number of awards is 5.33 [CI 3.172, 0.283] times as high, i.e. 0.175 * 5.33 ≈ 0.93. The difference between the two groups is statistically significant (p < .001).
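The expected count in each group follows directly from the two exponentiated values. A minimal sketch, with variable names chosen for illustration:

```python
exp_intercept = 0.175  # expected awards for students who failed maths
exp_coef = 5.33        # multiplicative effect of passing maths

expected_fail = exp_intercept
expected_pass = exp_intercept * exp_coef
print(expected_fail, round(expected_pass, 3))
```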
count outcome (number of awards). single continuous predictor (math score)
exponentiated intercept = 0.005
exponentiated coefficient =1.090
Interpret this output
If a student had exactly a score of 0 for maths, the expected number of awards is 0.005. For each single unit increase in maths score, the expected number of awards is multiplied by 1.09 [CI 1.07, 1.11], i.e. it is 9% higher.
count outcome (number of awards). single continuous predictor (math score)
exponentiated intercept = 0.005
exponentiated coefficient =1.090
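write the regression equation for the predicted number of awards of someone with a math score of 25
ln(y) = b0 + b1*math score
ln(y) = -5.334 + 0.086*math score
y = e^(b0 + b1*math score)
Y = e^(-5.334) * (e^(0.086))^math score
Y = 0.005 * 1.09^math score
For a math score of 25: Y = e^(-5.334 + 0.086*25) = e^(-3.184) ≈ 0.041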
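A sketch that plugs a math score of 25 into the equation. It also shows that using the rounded values 0.005 and 1.09 gives a slightly different answer from the unrounded log-scale coefficients, so it is probably safer to exponentiate at the end. Function and variable names are hypothetical.

```python
import math

b0, b1 = -5.334, 0.086  # log-scale coefficients from the card above

def expected_awards(math_score):
    return math.exp(b0 + b1 * math_score)

from_log_scale = expected_awards(25)
from_rounded = 0.005 * 1.09 ** 25  # rounded exponentiated values

print(round(from_log_scale, 3), round(from_rounded, 3))
```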
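What if we wanted to know what effect a 5 unit increase in score would have on the number of expected awards?
Original equation: Y = 0.005 * 1.09^maths score
y = e^(b0 + b1*maths score)
Multiplying factor for a 5 unit increase = e^(5*b1) = 1.09^5 ≈ 1.54, so the expected number of awards is about 1.54 times higher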
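The factor for a 5 unit increase does not depend on the starting score. A sketch that checks this for a few arbitrary starting scores, using the log-scale coefficients -5.334 and 0.086 from the earlier card:

```python
import math

b0, b1 = -5.334, 0.086  # log-scale coefficients from the earlier card

def expected_awards(math_score):
    return math.exp(b0 + b1 * math_score)

factor = math.exp(5 * b1)  # same as (e^b1)^5

# The ratio is the same whatever the starting score (starting scores are arbitrary).
ratios = [expected_awards(s + 5) / expected_awards(s) for s in (20, 40, 60)]
print(round(factor, 2), [round(r, 2) for r in ratios])
```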
what do we have to do differently to the analysis if we have a nominal predictor e.g., program type on student performance?
Any time we have categorical predictors in the model, we need to use dummy variables.
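A minimal sketch of dummy coding by hand. The program type labels and the choice of reference category are hypothetical; statistical software usually does this step automatically.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per non-reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical program types, for illustration only.
program = ["general", "academic", "vocational", "academic"]
columns = dummy_code(program, reference="general")
print(columns)
```

A predictor with k categories gets k - 1 dummy columns; the reference category is the one where every dummy is 0.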
Rate outcome. What is an offset term?
When comparing overall counts of COVID across different countries it wouldn’t be fair to use these numbers alone to determine which had more or fewer cases, because that neglects the underlying population. We need to account for this, and the offset term is how we do it.
Like a secret denominator within the data that needs to be accounted for.
This could be something like population or time.
what is equal to the relative risk in the output of rate data? E.g., looking at the rate outcome of COVID for men vs women
The exponentiated slope, e^(B1), can be interpreted as the relative risk of one group having the event vs another. When we calculate the rate (number of events / total pop) for group 1 we are getting the risk for group 1. Do this for group 2 and we get their risk of getting COVID. The ratio of the two risks is e^(B1).
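A sketch of how an offset enters the model, assuming the usual form ln(expected count) = ln(population) + b0 + b1*x, where the offset's coefficient is fixed at 1. The populations and the rate of 0.01 are hypothetical.

```python
import math

def expected_count(population, b0, b1, x):
    # ln(mu) = ln(population) + b0 + b1*x ; the offset has no coefficient to estimate
    return math.exp(math.log(population) + b0 + b1 * x)

# Hypothetical: the same underlying rate (0.01) in a small and a large country.
b0, b1 = math.log(0.01), 0.0
small = expected_count(1_000, b0, b1, x=0)
large = expected_count(1_000_000, b0, b1, x=0)

print(round(small), round(large))          # very different raw counts
print(small / 1_000, large / 1_000_000)    # essentially the same rate
```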
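A sketch with made-up numbers, assuming a Poisson model with one binary predictor (women = 0, men = 1) and ln(population) as the offset. In that simple case the fitted rates equal the observed rates, so the coefficients can be written down directly.

```python
import math

# Hypothetical counts and populations, for illustration only.
cases_men, pop_men = 150, 10_000
cases_women, pop_women = 100, 10_000

risk_men = cases_men / pop_men
risk_women = cases_women / pop_women
relative_risk = risk_men / risk_women

# Log-scale coefficients for the two-group model described above.
b0 = math.log(risk_women)
b1 = math.log(risk_men) - math.log(risk_women)

print(round(relative_risk, 3), round(math.exp(b1), 3))
```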
how can we measure goodness of fit?
- Deviance
- pearson goodness of fit
- AIC
- BIC
- Candidate models can also be compared using automatic selection methods: forward, backward and stepwise selection
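A sketch of how AIC and BIC are built from the Poisson log-likelihood, assuming the standard definitions AIC = 2k - 2ln(L) and BIC = k*ln(n) - 2ln(L). The counts are hypothetical and the model is intercept-only (k = 1). Lower values indicate a better trade-off between fit and complexity.

```python
import math

def poisson_log_likelihood(observed, fitted):
    # ln L = sum(y*ln(mu) - mu - ln(y!)), with ln(y!) computed as lgamma(y + 1)
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(observed, fitted))

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_lik

counts = [0, 1, 2, 1, 3, 0, 2, 1]  # hypothetical data
mu = sum(counts) / len(counts)     # intercept-only fit
ll = poisson_log_likelihood(counts, [mu] * len(counts))
print(round(aic(ll, 1), 2), round(bic(ll, 1, len(counts)), 2))
```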