count data Flashcards by Sarah Ashmori

What distribution does count data typically follow?

Poisson distribution

What are the two types of count data?

Count data – number of events
Rate data – number of events per unit of exposure (e.g., events reported per year)

What are the prior assumptions for Poisson regression?

The mean and variance are the same.
If true, the scaling parameter (residual deviance / degrees of freedom) should be close to 1.

What distribution should we use if the scaling parameter is close to 1 vs. if it is large?

If close to 1, Poisson distribution

If large, negative binomial distribution

What is the scaling parameter?

Residual deviance / degrees of freedom
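
As a rough sketch of how this ratio could be computed by hand, the snippet below uses the simplest possible case: an intercept-only Poisson model, where the fitted mean is just the sample mean. The function name and both data sets are invented for illustration; in a real analysis the residual deviance and degrees of freedom would come from the fitted model output.

```python
import math

def poisson_dispersion(counts):
    """Residual deviance / residual df for an intercept-only Poisson model."""
    n = len(counts)
    mu = sum(counts) / n  # fitted mean of an intercept-only model
    # Unit deviance: 2 * [y*ln(y/mu) - (y - mu)], where y*ln(y/mu) is 0 when y = 0
    deviance = 2 * sum(
        (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu) for y in counts
    )
    return deviance / (n - 1)  # one parameter (the intercept) was estimated

# Hypothetical counts, invented for illustration only
even_counts = [2, 3, 1, 2, 4, 2, 3, 1]
spread_counts = [0, 0, 15, 1, 0, 22, 0, 2]

print(poisson_dispersion(even_counts))    # not large
print(poisson_dispersion(spread_counts))  # far above 1: overdispersion
```

The second data set has a few very large counts among many zeros, which is the kind of pattern where a negative binomial model is usually considered instead.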

In count data, what do we use to measure the treatment effect?

Incidence rate ratios (IRRs)

What type of count data is used for Poisson regression?

Both count and rate data

Interpret: Poisson regression, where the outcome variable is the number of AEs reported

IRR = 0.984

The Support group has 0.984 times as many AEs reported as the Active group, i.e. about 1.6% fewer.

IRR = 0.984 shows the decreased number of AEs in the treatment group.

How can we reframe this to show the increased number of AEs in the control group?

1/0.984 = 1.016, so the control group had 1.016 times as many reported AEs as the treatment group (about 1.6% more).
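
The arithmetic in this card can be checked with a couple of lines. This is only a sketch: the variable names are made up, and the only input taken from the card is the IRR of 0.984.

```python
irr_treatment_vs_control = 0.984  # value from the card

# Percentage change implied by an IRR: (IRR - 1) * 100
pct_change = (irr_treatment_vs_control - 1) * 100

# Swapping the reference group inverts the ratio
irr_control_vs_treatment = 1 / irr_treatment_vs_control

print(round(pct_change, 1))                # -1.6
print(round(irr_control_vs_treatment, 3))  # 1.016
```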

What is the equidispersion property in Poisson regression?

Poisson regression has a distinctive property: the mean of a Poisson-distributed variable must be equal to its variance.

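Before we fit the non-negative discrete outcome to the regression model, what must we do?

Transform the variable: take the natural log of the outcome. In Poisson regression this is done through the log link, so it is the natural log of the expected count that is modelled.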

Why transform?

First reason: non-negative discrete data tends to be positively skewed, and working on the log scale reduces that skew.
Second reason: it lets the data take on a linear form. Counts can only be whole numbers from 0 upwards, but on the log scale the expected count can take any decimal value, positive or negative. It just puts the outcome on a linear scale.

When we conduct Poisson regression, what do we need to do to the coefficients after calculating them?

Transform them back using Euler's number (the exponential function).
We need to do this to interpret them on the original count scale.

Simplify:
e^(a + b)
e^(a × b)
e^(a - b)

e^(a + b) = e^a × e^b

e^(a × b) = (e^a)^b or (e^b)^a

e^(a - b) = e^a / e^b
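
These exponent rules can be spot-checked numerically. The sketch below uses a hypothetical helper, `check_exponent_rules`, and arbitrary values of a and b. A numerical check like this illustrates the rules but does not prove them.

```python
import math

def check_exponent_rules(a, b):
    """Return True if the three rules hold (to floating-point tolerance)."""
    sum_rule = math.isclose(math.exp(a + b), math.exp(a) * math.exp(b))
    product_rule = (
        math.isclose(math.exp(a * b), math.exp(a) ** b)
        and math.isclose(math.exp(a * b), math.exp(b) ** a)
    )
    difference_rule = math.isclose(math.exp(a - b), math.exp(a) / math.exp(b))
    return sum_rule and product_rule and difference_rule

print(check_exponent_rules(1.3, 0.4))  # True
```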

What is a consequence of doing Poisson regression without transforming the outcome first?

It might make a negative prediction.
Imagine predicting the number of events in one group and getting a negative number… honey, this don't make sense!
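
To make the problem concrete, here is a small sketch comparing a prediction on the untransformed (linear) scale with one that is exponentiated back from the log scale. The coefficients are hypothetical and chosen only so that the linear prediction goes below zero.

```python
import math

# Hypothetical coefficients, invented for illustration only
b0, b1 = -1.5, 0.4

def linear_prediction(x):
    return b0 + b1 * x            # untransformed scale: can go below zero

def log_link_prediction(x):
    return math.exp(b0 + b1 * x)  # exponentiated: always positive

print(linear_prediction(1))    # a negative "count"
print(log_link_prediction(1))  # small but positive
```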

You have a count outcome. We want to see the relationship between math score and number of awards achieved. Write the regression equation for this.

Interpret the intercept and slope.

ln(Y) = B0 + B1 × X1

The intercept (B0) is the natural log of the expected count when math score is 0.
The slope (B1) is the change in the natural log of the expected number of awards for a single unit increase in math score.

How do we transform the coefficients back to their original scale?

Exponentiate them:

e^(ln(Y)) = e^(B0 + B1 × X)

Y = e^(B0) × e^(B1 × X)

Y = e^(B0) × (e^(B1))^X

What is 0.443 raised to the power of 0?

1, because any non-zero number raised to the power of 0 is 1.

Y = e^(b0) × (e^(b1))^x

Y = e^(0.22) × (e^(0.778))^pass_maths

Interpret the B1 coefficient here using the value.

For an increase of 1 in the predictor (pass_maths), the expected count is e^(0.778) ≈ 2.18 times the baseline count e^(B0).

Count outcome (number of awards). Single binary predictor (pass/fail maths).

Exponentiated intercept = 0.175
Exponentiated coefficient = 5.33

Interpret this output.

If a subject failed maths, the expected number of awards they would achieve is 0.175. For those who passed, the expected number of awards is 5.33 [CI 3.172, 0.283] times that of those who failed, i.e. 5.33 × 0.175 ≈ 0.93 awards. The difference between the two groups is statistically significant (p < .001).
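
The interpretation above can be turned into a short calculation. This is a sketch only: `expected_awards` is a made-up helper, and the two inputs are the exponentiated intercept and coefficient quoted in the card.

```python
exp_intercept = 0.175   # expected awards when pass_maths = 0 (from the card)
exp_coefficient = 5.33  # incidence rate ratio, pass vs fail (from the card)

def expected_awards(pass_maths):
    # Y = e^(b0) * (e^(b1))^pass_maths, with pass_maths coded 0 or 1
    return exp_intercept * exp_coefficient ** pass_maths

print(expected_awards(0))            # 0.175
print(round(expected_awards(1), 3))  # 0.933
```

For the fail group the coefficient is raised to the power of 0, so it drops out and only the intercept remains.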

Count outcome (number of awards). Single continuous predictor (math score).

Exponentiated intercept = 0.005
Exponentiated coefficient = 1.090

Interpret this output.

If a student had a maths score of exactly 0, the expected number of awards is 0.005. For each single unit increase in maths score, the expected number of awards is multiplied by 1.09 [CI 1.07, 1.11], i.e. it increases by about 9%.

Count outcome (number of awards). Single continuous predictor (math score).

Exponentiated intercept = 0.005
Exponentiated coefficient = 1.090

Write the regression equation for the predicted number of awards of someone with a math score of 25.

ln(Y) = b0 + b1 × math score

ln(Y) = -5.334 + 0.086 × math score

Y = e^(b0 + b1 × math score)

Y = e^(-5.334) × (e^(0.086))^(math score)

Y = 0.005 × 1.09^(math score)

For a math score of 25: Y = 0.005 × 1.09^25
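
As a rough check of this equation, the sketch below plugs a math score of 25 into both the log-scale form and the rounded exponentiated form. The function name is hypothetical, and the coefficients are the ones quoted in the card.

```python
import math

b0, b1 = -5.334, 0.086  # log-scale coefficients from the card

def expected_awards(math_score):
    # Y = e^(b0 + b1 * math_score)
    return math.exp(b0 + b1 * math_score)

from_log_scale = expected_awards(25)
from_rounded_form = 0.005 * 1.09 ** 25  # uses the rounded exponentiated values

print(round(from_log_scale, 3))     # 0.041
print(round(from_rounded_form, 3))  # 0.043
```

The two answers differ slightly only because 0.005 and 1.09 are rounded versions of e^(-5.334) and e^(0.086).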

What if we wanted to know what effect a 5 unit increase in math score would have on the expected number of awards?

Original equation: Y = 0.005 × 1.09^(math score)

Y = e^(b0 + b1 × math score)

Raising the score by 5 multiplies the expected number of awards by (e^(b1))^5 = 1.09^5 ≈ 1.54.
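
A short sketch of the same calculation, assuming the log-scale slope of 0.086 from the earlier card (variable names are made up):

```python
import math

b1 = 0.086  # log-scale slope for math score, from the earlier card

ratio_per_unit = math.exp(b1)         # about 1.09
ratio_per_5_units = math.exp(5 * b1)  # same as ratio_per_unit ** 5

print(round(ratio_per_5_units, 2))  # 1.54
```

Effects multiply rather than add on the count scale, so a 5 unit increase is the one-unit ratio raised to the power of 5, not 5 times the one-unit ratio.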

What do we have to do differently in the analysis if we have a nominal predictor, e.g., the effect of program type on student performance?

Any time we have categorical predictors in the model, we need to use dummy variables.
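
As an illustration of dummy coding, the sketch below builds one 0/1 column per non-reference level. The helper `dummy_code`, the program-type labels and the choice of reference level are all assumptions made for the example; statistical software normally does this step automatically.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per level other than the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical program types, invented for illustration only
program = ["general", "academic", "vocational", "academic"]
dummies = dummy_code(program, reference="general")

print(dummies)
# {'academic': [0, 1, 0, 1], 'vocational': [0, 0, 1, 0]}
```

Each exponentiated dummy coefficient would then be read as an incidence rate ratio relative to the reference level.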

Rate outcome. What is an offset term?

When comparing overall counts of COVID across different countries, it wouldn't be fair to use these numbers alone to determine which had more or fewer cases, because that neglects the underlying population. We need to account for this, and that is what the offset term does. It is like a hidden denominator within the data that needs to be accounted for. This could be something like population or time.
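
The sketch below shows why the denominator matters, using invented case counts and populations for two hypothetical countries. It also shows the usual way an offset enters the model, as ln(exposure) added to the linear predictor with its coefficient fixed at 1. The function `expected_count` is a made-up helper for illustration.

```python
import math

# Hypothetical case counts and populations, invented for illustration only
cases = {"country_a": 5000, "country_b": 2000}
population = {"country_a": 50_000_000, "country_b": 5_000_000}

# Raw counts suggest country_a is worse; rates per 100,000 say the opposite
rate_per_100k = {k: cases[k] * 100_000 / population[k] for k in cases}
print(rate_per_100k)  # {'country_a': 10.0, 'country_b': 40.0}

def expected_count(exposure, b0):
    # ln(expected count) = ln(exposure) + b0, so b0 is the log of the rate
    return math.exp(math.log(exposure) + b0)

print(expected_count(5_000_000, math.log(0.0004)))  # about 2000
```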

What is equal to the relative risk in the output for rate data? E.g., looking at the rate outcome of COVID for men vs women.

The exponentiated slope, e^(B1), can be interpreted as the relative risk of one group having the event vs another. When we calculate the rate (number of events / total population) for group 1, we get the risk for that group. Doing the same for group 2 gives the risk of that group getting COVID, and the ratio of the two risks is the relative risk.
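
A minimal sketch of that ratio, using invented event counts and group sizes. It assumes a model with a single binary predictor and a population offset, in which case the fitted slope equals the log of the ratio of the two observed rates.

```python
import math

# Hypothetical counts and group sizes, invented for illustration only
events_men, pop_men = 300, 10_000
events_women, pop_women = 200, 10_000

risk_men = events_men / pop_men
risk_women = events_women / pop_women
relative_risk = risk_men / risk_women

# Under the stated assumption, the slope is the log of the relative risk
b1 = math.log(relative_risk)

print(round(relative_risk, 2))  # 1.5
print(round(math.exp(b1), 2))   # 1.5
```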

How can we measure goodness of fit?

- Deviance
- Pearson goodness of fit
- AIC
- BIC
- Or we can do this with automatic selection methods: forward, backward and stepwise selection
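
For AIC and BIC, a hand-computed sketch may help show what the numbers are built from. It uses the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), an intercept-only Poisson model, and invented counts. All function names are hypothetical, and fitted models in statistical software report these values directly.

```python
import math

def poisson_log_likelihood(counts, fitted_means):
    # ln P(y | mu) = y*ln(mu) - mu - ln(y!), and lgamma(y + 1) = ln(y!)
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(counts, fitted_means)
    )

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical counts, invented for illustration only
counts = [2, 3, 1, 2, 4, 2, 3, 1]
mean_count = sum(counts) / len(counts)
log_lik = poisson_log_likelihood(counts, [mean_count] * len(counts))

aic_value = aic(log_lik, n_params=1)
bic_value = bic(log_lik, n_params=1, n_obs=len(counts))
print(aic_value, bic_value)
```

Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.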

count data Flashcards

(27 cards)