count data Flashcards

1
Q

What distribution does count data typically follow?

A

The Poisson distribution.
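The Poisson distribution gives the probability of each count k when the expected count is λ: P(Y = k) = λ^k e^(-λ) / k!. Below is a minimal sketch in plain Python (standard library only). The rate λ = 2 is an arbitrary illustrative value and does not come from these cards.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the expected count is lam (> 0)."""
    if k < 0:
        return 0.0
    # Work on the log scale (lgamma(k + 1) = ln(k!)) to avoid overflow for large k.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


lam = 2.0  # hypothetical expected count
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```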

2
Q

What are the two types of count data?

A
  • Count data – number of events
  • Rate data – number of events per unit of time or exposure (e.g., per year)
3
Q

What are the assumptions of Poisson regression?

A
  • The mean and variance are equal
  • If this holds, the scaling parameter (residual deviance / degrees of freedom) should be close to 1
4
Q

What distribution should we use if the scaling parameter is close to 1 vs. if it is large?

A

If it is close to 1: the Poisson distribution.

If it is large (overdispersion): the negative binomial distribution.

5
Q

What is the scaling parameter?

A

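Residual deviance divided by the residual degrees of freedom. A value close to 1 is consistent with the Poisson assumption that the mean equals the variance; a value well above 1 indicates overdispersion.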
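The scaling parameter can be computed by hand once a model has produced fitted means. Below is a minimal sketch that assumes you already have the observed counts and fitted means from a Poisson model. It uses the Poisson deviance, 2 Σ [y ln(y/μ) - (y - μ)], where the y ln(y/μ) term is taken as 0 when y = 0. The data and the 2-parameter model are made up for illustration.

```python
import math


def poisson_deviance(observed, fitted):
    """Residual deviance of a Poisson model: 2 * sum(y*ln(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0  # y*ln(y/mu) -> 0 when y = 0
        total += term - (y - mu)
    return 2.0 * total


def scaling_parameter(observed, fitted, n_params):
    """Residual deviance divided by residual degrees of freedom (n - p)."""
    df = len(observed) - n_params
    return poisson_deviance(observed, fitted) / df


# Hypothetical observed counts and fitted means from some 2-parameter model.
observed = [0, 2, 1, 4, 3, 7, 2, 5]
fitted = [1.1, 1.6, 2.0, 2.7, 3.1, 3.9, 2.4, 4.2]
phi = scaling_parameter(observed, fitted, n_params=2)
# Close to 1 -> Poisson is reasonable; much larger -> consider negative binomial.
print(f"scaling parameter = {phi:.2f}")
```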

6
Q

In count data, what do we use to measure the treatment effect?

A

Incidence rate ratios
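An incidence rate ratio can be computed directly from two groups' event counts and exposure times. Below is a minimal sketch that includes a 95% Wald confidence interval built on the log scale, using SE of ln(IRR) = √(1/a + 1/b), where a and b are the two event counts. The trial counts and person-years are hypothetical.

```python
import math


def incidence_rate_ratio(events_a, time_a, events_b, time_b, z=1.96):
    """IRR of group A relative to group B, with a Wald CI built on the log scale."""
    irr = (events_a / time_a) / (events_b / time_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical trial: 30 events over 1000 person-years (treatment) vs 45 over 1100 (control).
irr, lo, hi = incidence_rate_ratio(30, 1000, 45, 1100)
print(f"IRR = {irr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```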

7
Q

What type of count data can Poisson regression be used for?

A

Both count and rate data (rate data via an offset term).

8
Q

Interpret: Poisson regression with the number of AEs reported as the outcome variable.

IRR = 0.984

A

The Support group's rate of reported AEs is 0.984 times that of the Active group, i.e., about 1.6% lower.

9
Q

IRR = 0.984 shows the lower rate of AEs in the treatment group.

How can we reframe this to show the higher rate of AEs in the control group?

A

1/0.984 = 1.016: the control group's rate of reported AEs is 1.016 times that of the treatment group, i.e., about 1.6% higher.
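The reframing is just arithmetic. Below is a minimal sketch using the IRR of 0.984 from this card. It also expresses both directions as percentage differences in the rate.

```python
irr_treatment_vs_control = 0.984

# Flip the comparison by taking the reciprocal.
irr_control_vs_treatment = 1 / irr_treatment_vs_control

# Express both as percentage differences in the rate of reported AEs.
pct_treatment = (irr_treatment_vs_control - 1) * 100  # negative = lower rate
pct_control = (irr_control_vs_treatment - 1) * 100    # positive = higher rate

print(f"control vs treatment IRR = {irr_control_vs_treatment:.3f}")
print(f"treatment: {pct_treatment:+.1f}%, control: {pct_control:+.1f}%")
```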

10
Q

What is the equidispersion property in Poisson regression?

A

Poisson regression assumes equidispersion: the mean of a Poisson-distributed variable is equal to its variance.
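One quick way to see equidispersion is to simulate Poisson counts and compare the sample mean with the sample variance. Below is a minimal sketch using only the standard library. Because Python's `random` module has no Poisson generator, it uses Knuth's simple multiplication method. The true mean λ = 4, the seed, and the sample size are arbitrary choices.

```python
import math
import random
import statistics


def draw_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method: fine for small lam (it slows down as lam grows)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(42)  # fixed seed so the run is reproducible
lam = 4.0                # hypothetical true mean
sample = [draw_poisson(lam, rng) for _ in range(20_000)]

print("mean:", round(statistics.mean(sample), 2))
print("variance:", round(statistics.variance(sample), 2))  # both should be close to lam
```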

11
Q

Before we fit a non-negative discrete outcome with a regression model, what must we do?

A

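Apply a log transformation: take the natural log. In Poisson regression this is applied to the expected count (the log link) rather than to each raw count, since ln(0) is undefined.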
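In practice, the log is applied to the expected count, so the model is ln(E[Y]) = B0 + B1X. Zero counts are therefore no problem. Below is a minimal sketch of fitting that model by maximum likelihood with Newton-Raphson in plain Python, for a single predictor. The data are made up, and the function name `fit_poisson_1d` is hypothetical. This is an illustration of the technique, not how any particular statistics package implements it.

```python
import math


def fit_poisson_1d(x, y, max_iter=50, tol=1e-10):
    """Fit ln(E[Y]) = b0 + b1*x by maximum likelihood using Newton-Raphson."""
    # Start from the intercept-only fit: b0 = ln(mean of y), slope 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        # Score vector (gradient of the log-likelihood) and Fisher information matrix.
        u0 = u1 = 0.0
        i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)  # expected count on the original scale
            u0 += yi - mu
            u1 += (yi - mu) * xi
            i00 += mu
            i01 += mu * xi
            i11 += mu * xi * xi
        # Newton step: solve the 2x2 system [[i00, i01], [i01, i11]] @ step = [u0, u1].
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: x is a 0/1 group indicator, y is a count of events (zeros allowed).
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 1, 3, 4, 2, 5]

b0_hat, b1_hat = fit_poisson_1d(x, y)
print(f"log scale:   B0 = {b0_hat:.3f}, B1 = {b1_hat:.3f}")
print(f"count scale: e^B0 = {math.exp(b0_hat):.3f}, e^B1 = {math.exp(b1_hat):.3f}")
```

With a single 0/1 predictor, the fit reproduces the group means. Here e^B0 is the mean count in group 0 (1.0), and e^B1 is the ratio of the group means (3.5 / 1.0).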

12
Q

Why transform?

A

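Because non-negative discrete data tend to be positively skewed, they fit an ordinary (normal) linear regression poorly. Poisson regression handles this by modelling the counts with a Poisson distribution and working with the log of the expected count, rather than forcing the outcome to be normal.
Second reason: linearity. A count can only take whole values of 0 or more, but the log of the expected count can take any real value, including decimals and negatives. This puts it on a linear scale that can be modelled as B0 + B1X1.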

13
Q

When we conduct Poisson regression and have calculated the coefficients, what do we need to do to them?

A

Transform them back using Euler's number, e (the exponential function).
We need to do this to interpret them on the original count scale.
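Back-transforming takes one call to the exponential function per coefficient. Below is a minimal sketch using the log-scale coefficients that appear later in these cards for the awards / maths score model (B0 = -5.334, B1 = 0.086). The dictionary keys are just illustrative labels.

```python
import math

# Log-scale coefficients from the awards ~ maths score model.
coefficients = {"intercept": -5.334, "math_score": 0.086}

# Exponentiate each one to get back to the count scale.
exponentiated = {name: math.exp(b) for name, b in coefficients.items()}

for name, value in exponentiated.items():
    print(f"{name}: {value:.3f}")
```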

14
Q

e^(a + b)
e^(a × b)
e^(a - b)

A

e^(a + b) = e^a × e^b

e^(a × b) = (e^a)^b = (e^b)^a

e^(a - b) = e^a / e^b
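These three rules can be checked numerically. Below is a minimal sketch using `math.isclose` to allow for floating-point rounding. The values of a and b are arbitrary.

```python
import math

a, b = 0.7, -1.3  # arbitrary example values

# e^(a + b) = e^a * e^b
assert math.isclose(math.exp(a + b), math.exp(a) * math.exp(b))
# e^(a * b) = (e^a)^b = (e^b)^a
assert math.isclose(math.exp(a * b), math.exp(a) ** b)
assert math.isclose(math.exp(a * b), math.exp(b) ** a)
# e^(a - b) = e^a / e^b
assert math.isclose(math.exp(a - b), math.exp(a) / math.exp(b))
print("all three rules hold for a =", a, "and b =", b)
```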

15
Q

What is a consequence of modelling count data with a linear regression without transforming the outcome first?

A

The model might make a negative prediction.
Imagine predicting the number of events in one group and getting a negative number… honey, this don't make sense!
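Here is why this happens: a straight line on the count scale eventually crosses zero, whereas exponentiating the same linear predictor (as Poisson regression does) always gives a positive expected count. Below is a minimal sketch with made-up coefficients. The same linear predictor is simply read both ways; these are not two separately fitted models.

```python
import math

# Hypothetical coefficients for a count outcome with a negative slope.
b0, b1 = 1.5, -0.4

for x in [0, 2, 4, 6]:
    linear = b0 + b1 * x              # read as a count directly (linear model)
    log_link = math.exp(b0 + b1 * x)  # read as a log expected count (Poisson model)
    print(f"x={x}: linear={linear:.2f}, log link={log_link:.2f}")
```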

16
Q

You have a count outcome and want to see the relationship between math score and the number of awards achieved. Write the regression equation for this.

Interpret the intercept and slope.

A

ln(Y) = B0 + B1X1, where Y is the expected number of awards and X1 is math score

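The intercept, B0, is the natural log of the expected count when math score is 0.
The slope, B1, is the change in the log of the expected count for a one-unit increase in math score. Exponentiated, e^B1 is the factor by which the expected number of awards is multiplied for each one-unit increase in math score.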

17
Q

How do we transform the coefficients back to their original scale?

A

Exponentiate them:

e^(ln(Y)) = e^(B0 + B1X)

= e^B0 × e^(B1X)

= e^B0 × (e^B1)^X

18
Q

What is 0.443 raised to the power of 0?

A

1, because any nonzero number raised to the power of 0 is 1. This is why, when X = 0, (e^B1)^X = 1 and the prediction reduces to e^B0.

19
Q

Y = e^B0 × (e^B1)^X

Y = e^0.22 × (e^0.778)^pass_maths

Interpret the B1 coefficient using its value.

A

For a one-unit increase in pass_maths, the expected count is multiplied by e^0.778 (≈ 2.18). In other words, it is e^0.778 times the baseline expected count of e^0.22 (≈ 1.25).

20
Q

Count outcome (number of awards), single binary predictor (pass/fail maths)

exponentiated intercept = 0.175
exponentiated coefficient = 5.33

Interpret this output.

A

If a student failed maths, the expected number of awards is 0.175. For those who passed, the expected number of awards is 5.33 [CI 3.172, 0.283] times as high, i.e., 0.175 × 5.33 ≈ 0.93. The difference between the two groups is statistically significant (p < .001).
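The expected count for the "passed" group comes from multiplying the two exponentiated values. Below is a minimal sketch that assumes fail is coded 0 and pass is coded 1 (the card implies this but does not state it).

```python
exp_intercept = 0.175  # expected awards for students who failed maths (pass = 0)
exp_coef = 5.33        # multiplicative effect of passing maths

expected_fail = exp_intercept
expected_pass = exp_intercept * exp_coef  # e^(B0 + B1) = e^B0 * e^B1

print(f"failed: {expected_fail:.3f}, passed: {expected_pass:.3f}")
```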

21
Q

Count outcome (number of awards), single continuous predictor (math score)

exponentiated intercept = 0.005
exponentiated coefficient = 1.090

Interpret this output.

A

If a student scored exactly 0 in maths, the expected number of awards is 0.005. For each one-unit increase in maths score, the expected number of awards is multiplied by 1.09 [CI 1.07, 1.11], i.e., it rises by about 9% per point.

22
Q

Count outcome (number of awards), single continuous predictor (math score)

exponentiated intercept = 0.005
exponentiated coefficient = 1.090

Write the regression equation and use it to predict the number of awards for someone with a math score of 25.

A

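ln(Y) = B0 + B1 × math score

ln(Y) = -5.334 + 0.086 × math score

Y = e^(B0 + B1 × math score)

Y = e^(-5.334) × (e^0.086)^(math score)

Y = 0.005 × 1.09^(math score)

For a math score of 25: Y = e^(-5.334 + 0.086 × 25) = e^(-3.184) ≈ 0.041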
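Below is a minimal sketch that plugs a maths score of 25 into both forms of the equation. Rounding the exponentiated coefficients to 0.005 and 1.09 before multiplying shifts the answer slightly, so it is worth keeping the unrounded log-scale values.

```python
import math

b0, b1 = -5.334, 0.086  # log-scale coefficients from the card
math_score = 25

# Exponentiate the whole linear predictor.
y_exact = math.exp(b0 + b1 * math_score)
# Approximately the same, using the rounded exponentiated coefficients.
y_rounded = 0.005 * 1.09 ** math_score

print(f"exact: {y_exact:.3f}, with rounded coefficients: {y_rounded:.3f}")
```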

23
Q

What effect would a 5-unit increase in score have on the expected number of awards?

A

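Original equation: Y = 0.005 × 1.09^(math score)

Y = e^(B0 + B1 × math score), so increasing math score by 5 multiplies Y by e^(5 × B1) = (e^B1)^5

= 1.09^5 ≈ 1.54, i.e., about 54% more expected awards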
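Below is a minimal sketch of the 5-unit calculation. It computes the multiplier both from the log-scale slope and from the rounded exponentiated coefficient.

```python
import math

b1 = 0.086  # log-scale slope for maths score
k = 5       # size of the increase in maths score

factor = math.exp(k * b1)   # e^(5 * B1)
factor_rounded = 1.09 ** k  # (e^B1)^5 with e^B1 rounded to 1.09

print(f"multiplier for +{k} points: {factor:.2f} (rounded: {factor_rounded:.2f})")
```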

24
Q

What do we have to do differently in the analysis if we have a nominal predictor, e.g., the effect of program type on student performance?

A

Any time we have a categorical predictor in the model, we need to use dummy variables: k - 1 dummies for k categories, with the remaining category as the reference.
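Below is a minimal sketch of dummy coding in plain Python. It creates one 0/1 column per non-reference category. The program-type labels and the helper name `dummy_code` are hypothetical.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 dummy columns, dropping the reference level."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    others = [level for level in levels if level != reference]
    return {level: [1 if v == level else 0 for v in values] for level in others}


# Hypothetical program types for five students; "general" is the reference category.
program = ["general", "academic", "vocational", "academic", "general"]
dummies = dummy_code(program, reference="general")
for level, column in dummies.items():
    print(level, column)
```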

25
Q

Rate outcome: what is an offset term?

A

When comparing overall counts of COVID across different countries, it wouldn't be fair to use the raw counts alone to decide which had more or fewer cases, because that neglects the underlying population. We need to account for this, and that is what the offset term does.

It is like a secret denominator within the data that needs to be accounted for.

This could be something like population or time. It enters the model on the log scale, as ln(population) or ln(time), with its coefficient fixed at 1.
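In a Poisson model with an offset, the log of the exposure is added to the linear predictor with its coefficient fixed at 1. The model is therefore ln(expected count) = B0 + B1X + ln(population), which is the same as modelling ln(count / population). Below is a minimal sketch in which the coefficients, countries, and populations are all hypothetical.

```python
import math

# Hypothetical log-scale coefficients for a rate model (not estimated from real data).
b0, b1 = -7.0, 0.3


def expected_count(x, population):
    """Expected count with ln(population) as an offset (coefficient fixed at 1)."""
    return math.exp(b0 + b1 * x + math.log(population))


for country, x, pop in [("A", 0, 5_000_000), ("B", 1, 60_000_000)]:
    count = expected_count(x, pop)
    rate = count / pop  # dividing by the exposure removes the offset
    print(f"{country}: expected cases {count:,.0f}, rate per 100k {rate * 1e5:.1f}")
```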

26
Q

What is equal to the relative risk in the output for rate data? E.g., looking at the rate of COVID for men vs women.

A

The exponentiated slope, e^B1, can be interpreted as the relative risk of one group having the event vs the other. Dividing the number of events by the total population gives the risk for group 1; doing the same for group 2 gives their risk of getting COVID, and e^B1 is the ratio of the two.
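Consider a single binary predictor (here women as the reference group, men as the comparison) with ln(population) as the offset. In this case the Poisson maximum-likelihood estimates have a closed form: e^B0 is the rate in the reference group, and e^B1 is the ratio of the two groups' rates. Below is a minimal sketch with hypothetical case counts and population sizes.

```python
import math

# Hypothetical COVID cases and population sizes for two groups.
cases = {"women": 1200, "men": 1500}
population = {"women": 100_000, "men": 95_000}

rate_women = cases["women"] / population["women"]  # risk for the reference group
rate_men = cases["men"] / population["men"]        # risk for the comparison group

b0 = math.log(rate_women)     # intercept: log rate in the reference group
b1 = math.log(rate_men) - b0  # slope: difference in log rates

relative_risk = math.exp(b1)
print(f"relative risk (men vs women) = {relative_risk:.3f}")
```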

27
Q

How can we measure goodness of fit?

A
  • Deviance
  • Pearson goodness of fit (chi-square)
  • AIC
  • BIC
  • For choosing between candidate models, automatic selection methods can also be used: forward, backward and stepwise selection
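Below is a minimal sketch of three of these measures for a fitted Poisson model, given observed counts and fitted means:

- Pearson chi-square: Σ (y - μ)² / μ.
- AIC: -2 logL + 2k.
- BIC: -2 logL + k ln(n).

When comparing models, lower AIC/BIC is better. The data and the 2-parameter model are hypothetical.

```python
import math


def poisson_fit_stats(observed, fitted, n_params):
    """Pearson chi-square, log-likelihood, AIC and BIC for a fitted Poisson model."""
    n = len(observed)
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    # Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!).
    loglik = sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in zip(observed, fitted))
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(n)
    return {"pearson": pearson, "df": n - n_params, "loglik": loglik, "aic": aic, "bic": bic}


# Hypothetical observed counts and fitted means from a 2-parameter model.
observed = [0, 2, 1, 4, 3, 7, 2, 5]
fitted = [1.1, 1.6, 2.0, 2.7, 3.1, 3.9, 2.4, 4.2]
stats = poisson_fit_stats(observed, fitted, n_params=2)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```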