Models for Count Data I Flashcards

1
Q

What are count variables?

A

Count variables are discrete: they take non-negative integer values (0, 1, 2, …) and represent the number of occurrences of an event

2
Q

Give three examples of count variables:

A

Number of hospital visits, number of deaths by horse kick, and number of appointments with a counsellor

3
Q

What must be considered when measuring count variables over different time periods or populations?

A

Counts should be adjusted using a rate (e.g., number of crimes per 100,000 people)

4
Q

What is the formula for the Poisson probability mass function?

A

P(Y = y) = (μ^y × e^(−μ)) / y!
- Y is a random (count) variable that indicates the number of times a certain event occurs
- μ is the expected (mean) count, i.e. the mean number of times the event occurs
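
A minimal Python sketch of this formula, using only the standard library. The function name poisson_pmf is illustrative and not part of the card.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    if y < 0:
        return 0.0  # counts cannot be negative
    return mu ** y * math.exp(-mu) / math.factorial(y)

# Probabilities of the counts 0 to 4 when the mean is 1
probs = [poisson_pmf(y, 1.0) for y in range(5)]
print([round(p, 4) for p in probs])
```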

5
Q

What is a key property of the Poisson distribution?

A

The mean and variance are equal (equidispersion)
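
Equidispersion can be checked numerically. The sketch below sums over the counts 0 to 100 (an arbitrary cut-off chosen for illustration) to approximate the mean and variance; the function names are illustrative.

```python
import math

def poisson_pmf(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)

def poisson_mean_and_variance(mu, max_count=100):
    """Approximate the mean and variance by summing over counts 0..max_count."""
    counts = range(max_count + 1)
    mean = sum(y * poisson_pmf(y, mu) for y in counts)
    variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in counts)
    return mean, variance

for mu in (1.0, 4.0, 10.0):
    mean, variance = poisson_mean_and_variance(mu)
    print(mu, round(mean, 6), round(variance, 6))
```

For each value of μ the two printed numbers should agree with μ up to rounding.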

6
Q

What happens when the mean μ of a Poisson distribution is large?

A

It approximates a normal distribution

7
Q

What is the general form of a Poisson regression model?

A

log(μi) = β0 + β1X1i + … + βpXpi
- Where μi is the expected count

8
Q

Why do we use a log transformation in a Poisson regression?

A

To ensure predicted counts are always positive

9
Q

What assumptions must be met for Poisson regression?

A
  • The outcome is a count variable (non-negative integers)
  • The variance equals the mean (no overdispersion). This implies heteroscedasticity (unlike the constant variance assumed for normally distributed outcomes): the predicted variance depends on the predicted mean.
  • Observations are independent (e.g., no clustering)
  • The transformed outcome (log(μ)) is linearly related to continuous predictors
  • No multicollinearity
  • Each subject’s count is measured over the same unit of time or space, or the same population size
10
Q

What is overdispersion in Poisson regression?

A

When the variance is larger than the mean, suggesting a need for a different model
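
A rough first look at overdispersion is to compare the sample variance with the sample mean. The counts below are made up for illustration, and a raw ratio like this ignores predictors, so it is only an informal indicator rather than a formal test.

```python
import statistics

# Hypothetical counts, e.g. number of hospital visits for 12 people
counts = [0, 0, 0, 1, 0, 2, 0, 0, 7, 1, 0, 13]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
ratio = variance / mean                 # about 1 under a Poisson distribution

print(f"mean = {mean:.2f}, variance = {variance:.2f}, ratio = {ratio:.2f}")
```

A ratio well above 1, as in this made-up sample, points towards overdispersion.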

11
Q

What can cause overdispersion?

A
  • Excess zeroes (zero-inflation)
  • An important predictor is missing
  • A highly skewed count variable
12
Q

What models can be used to handle overdispersion?

A

Negative binomial regression and zero-inflated Poisson models

13
Q

How do we interpret a coefficient β in Poisson regression?

A

The exponentiated coefficient e^β represents the incidence rate ratio (IRR)

14
Q

What does an IRR indicate?

A
  • IRR = 1: No effect of predictor
  • IRR > 1: Predictor increases the outcome rate
  • IRR < 1: Predictor decreases the outcome rate
15
Q

How do we test for the significance of variables in Poisson regression?

A

Using a likelihood ratio test (LRT)

16
Q

What is an offset in Poisson regression?

A

A term added to account for different observation periods, population sizes, or area sizes
This may involve devising a rate

17
Q

How do we include an offset in Stata?

A

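poisson <outcome> <predictor(s)>, exposure(<exposure_variable>)
- exposure() takes the exposure variable on its original scale and enters its log with the coefficient fixed at 1; offset() expects a variable that is already on the log scale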

18
Q

What is an example of using an offset?

A

Analysing the number of crimes per 1,000 residents rather than total crimes

19
Q

What command is used for a basic Poisson regression in Stata?

A

poisson <outcome> <predictor(s)>

20
Q

How do we check if a Poisson model fits the data well?

A

Compare observed vs. predicted counts using the prcounts command

21
Q

How do we test for overdispersion?

A

Compare a Poisson model with a negative binomial model

22
Q

Among the numeric variables, what two types can be established?

A
  • Continuous: e.g., age, height, blood pressure, etc. They can take the form of fractions
  • Discrete: e.g., number of siblings, number of hospital visits, etc. i.e., things you can actually count
    Some variables are strictly speaking discrete, but in practice can be treated as continuous such as household income
23
Q

What chart do we use to display count variables?

A

Bar chart (not a histogram)
- Each bar represents one count value
- There are spaces between the bars because the values are discrete, not continuous

24
Q

What statistical distribution can we use to model count variables?

A

Poisson distribution

25
Q

What does the Poisson distribution specify?

A

The relationship between the expected count μ and the probability of observing any given count y

26
Q

What is the notation to denote that Y follows a Poisson distribution with mean μ?

A

Y ~ Poisson(μ)

27
Q

Describe the Poisson distribution with μ = 1

A

Most of the counts will be zero or 1, and higher counts than 1 are rarer. Distribution is highly skewed

28
Q

Describe the Poisson distribution with μ = 4

A

The most likely counts are 3 and 4; the distribution is still positively skewed, but less so than with μ = 1
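
As a worked check, plugging μ = 4 into the Poisson probability mass function gives the same probability for counts of 3 and 4 (values rounded to three decimals):

```latex
P(Y = 3) = \frac{4^{3} e^{-4}}{3!} = \frac{64}{6}\, e^{-4} \approx 0.195
\qquad
P(Y = 4) = \frac{4^{4} e^{-4}}{4!} = \frac{256}{24}\, e^{-4} \approx 0.195
```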

29
Q

Describe the Poisson distribution with μ = 10

A

Distribution appears normal and symmetric

30
Q

What does the shape of the distribution depend on?

A

The μ parameter

31
Q

What happens to the variance as μ increases?

A

Because the variance equals μ, the variance increases as the mean increases

32
Q

Properties of the Poisson distribution:

A
  • One parameter, μ, which is equal to the mean and variance (equidispersion)
  • Positive skew - although the shape of the distribution depends on the mean
33
Q

What is the probability density function of the normal distribution?

A

f(y) = (1 / √(2πσ²)) × exp(−(y − μ)² / (2σ²))

34
Q

What are the properties of the normal distribution?

A

Two parameters: the mean μ and the variance σ², which are independent of one another

35
Q

What is the notation that denotes “Y is normally distributed with mean μ and variance σ²”?

A

Y ~ N(μ, σ²)

36
Q

How can we initially check whether the Poisson distribution might be an adequate model for our outcome?

A

Graphical comparison of observed proportions and Poisson predicted probabilities, using the observed mean of Y as an estimate of μ
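
The numbers behind such a comparison can be sketched in Python as follows. The observed counts are made up for illustration, and the output is a table rather than a plot; the two columns could be drawn as side-by-side bars.

```python
import math
from collections import Counter

def poisson_pmf(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)

# Hypothetical observed counts for 20 people
observed = [0, 1, 0, 2, 1, 0, 3, 1, 0, 0, 2, 1, 0, 1, 4, 0, 1, 2, 0, 1]

n = len(observed)
mu_hat = sum(observed) / n        # observed mean, used as the estimate of mu
frequencies = Counter(observed)   # how often each count occurs

print(f"estimated mu = {mu_hat:.2f}")
for y in range(max(observed) + 1):
    observed_prop = frequencies[y] / n
    predicted_prob = poisson_pmf(y, mu_hat)
    print(f"y = {y}: observed {observed_prop:.3f}, Poisson {predicted_prob:.3f}")
```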

37
Q

What kind of numbers do the count and mean have to be?

A

Count has to be a whole number, but the mean doesn’t have to be

38
Q

Why may our observation not follow the Poisson distribution?

A
  • Overdispersion: The variance is larger than the mean. This is a frequent occurrence in practice, e.g., length of hospital stay (in days) typically has a long tail to the right
  • Excess zeroes: Observe more zero counts than expected by the Poisson for a given mean e.g., number of accidents in a workplace (more days have no accidents)
  • No zeroes (zero-truncation): There are no zeros in the data by design, or because we had no chance to observe them e.g., number of appointments of psychotherapy clients with their therapist (you only become a client after attending the first appointment, so we don’t observe clients with zero appointments)
39
Q

How may overdispersion be distributed?

A

With fatter tails on both sides than a Poisson distribution with the same mean

40
Q

How was SARS-CoV-2 an example of overdispersion?

A

The number of people each infected person went on to infect (the quantity whose mean is the R number) was overdispersed relative to a Poisson distribution: most people stayed at home and infected few or no others, while a small number of people infected many others.
The variance would have been much higher than the mean

41
Q

What is the equation for the Poisson regression model?

A

log(μi) = β0 + β1x1i + β2x2i + … + βpxpi
- log(μi) is the log of the mean count for the ith case
- The right-hand side is the same linear predictor as in the regression models seen so far

42
Q

How can we rewrite the Poisson regression equation for the mean rather than the log of the mean?

A

Exponentiate both sides:
μi = exp(β0 + β1x1i + β2x2i + … + βpxpi)
The log transformation ensures that the predicted mean counts, μi, cannot be negative (a count variable cannot take negative values)
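
A small Python sketch of why the exponentiated form cannot give negative predictions. The coefficients are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients: intercept and one slope
beta0, beta1 = 0.5, -0.8

def predicted_mean(x):
    """Predicted mean count: exponentiate the linear predictor."""
    linear_predictor = beta0 + beta1 * x   # can be any real number
    return math.exp(linear_predictor)      # always greater than zero

for x in (-5, 0, 5):
    print(x, round(predicted_mean(x), 4))
```

Even when the linear predictor is strongly negative, the predicted mean stays above zero.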

43
Q

What should you do if you have overdispersion and/or excess zeroes or no zeroes?

A

If the assumptions of Poisson regression aren’t met, it’s generally advisable to use another model.
Sometimes, residual overdispersion or excess zeroes may result from failure to include an important predictor

44
Q

Why is using linear regression not advisable for count outcomes?

A
  • Can result in negative predicted counts, i.e., nonsensical predictions such as -3 appointments with a therapist
  • Count data often violate assumption of homoscedasticity (Poisson regression assumes heteroscedasticity)
  • The log transformation used in Poisson regression often gives better predictions
45
Q

When may linear regression be appropriate for count outcomes?

A

When the mean of the count outcome is large (Poisson distributions with a large mean look similar to a normal distribution with the same mean)

46
Q

What is homoscedasticity in linear regression?

A

The variance of the residuals is the same at every value of x, whatever the predicted mean. For example, the scatter of points at x = 30 is the same as at x = 50

47
Q

What is heteroscedasticity in Poisson regression?

A

The variance of the outcome changes with the predicted mean. E.g., at x = 50 the distribution is much wider, with more scatter between points, than at x = 30

48
Q

As with other models, what should you do to continuous variables before analysis?

A

Centre them

49
Q

On what assumption are hypothesis tests and confidence intervals for coefficients based?

A

The coefficient estimates are (approximately) normally distributed; this is called the “normal approximation”
It is realistic for large samples, and when the Poisson model assumptions are met

50
Q

What is routinely displayed when conducting Poisson regression?

A

A z-test of H0: β = 0
Where:
- z = β^ / SE^
- β^ is the estimated coefficient, and SE^ is its estimated standard error
- The p-value provides information about the strength of the evidence against H0

51
Q

How is the 95% CI calculated?

A

β^ ± 1.96 × SE^

52
Q

How can we obtain a more interpretable indicator of the effect of an IV?

A

By exponentiating the coefficients (back-transforming from the log-scale to the scale of the count variable). The same can be done for 95% CIs for the raw coefficients
The exponentiated coefficients are called incidence rate ratios (IRRs) or rate ratios (RRs)
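
A sketch of the back-transformation in Python. The coefficient and standard error are hypothetical values, not output from a real model.

```python
import math

# Hypothetical estimates from a fitted Poisson regression
beta_hat = 0.10   # estimated coefficient (log scale)
se_hat = 0.04     # estimated standard error

z = beta_hat / se_hat
ci_low = beta_hat - 1.96 * se_hat
ci_high = beta_hat + 1.96 * se_hat

# Back-transform the estimate and both CI limits to the rate-ratio scale
irr = math.exp(beta_hat)
irr_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"z = {z:.2f}")
print(f"IRR = {irr:.3f}, 95% CI {irr_ci[0]:.3f} to {irr_ci[1]:.3f}")
```

The interval is calculated on the log scale and then exponentiated, so it is not symmetric around the IRR.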

53
Q

For continuous predictors, the size of the IRR depends on what?

A

The scale on which I measure the predictor. The scale can make the effect look large or small, or more or less meaningful
E.g., an additional 1,000 people in the population may not make a big difference to the outcome, but a difference of 10,000 people might

54
Q

If the population is in 1000s, and I want to get an IRR (IRR = 1.007) associated with a 10,000 difference in population, what could I do?

A

Either:
- Take the IRR for population in 1,000s and raise it to the 10th power: IRR_10k = IRR_1k^10 = 1.007^10 = 1.072. This means a 10,000 population difference is associated with an expected count of the outcome that is about 7.2% higher.
Or:
- To get IRR_10k via software, recode the population variable so that it measures population in tens of thousands, pop10k = population1000/10, and use pop10k as the predictor
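
Both routes give the same number, which can be checked with a short Python sketch. The variable names are illustrative; the IRR of 1.007 is the value from the card.

```python
import math

irr_per_1k = 1.007                   # IRR per 1,000 people (value from the card)
beta_per_1k = math.log(irr_per_1k)   # the same effect on the log scale

# Route 1: raise the IRR to the 10th power
irr_per_10k_power = irr_per_1k ** 10

# Route 2: a predictor measured in 10,000s has a coefficient 10 times larger
beta_per_10k = 10 * beta_per_1k
irr_per_10k_rescaled = math.exp(beta_per_10k)

print(round(irr_per_10k_power, 3), round(irr_per_10k_rescaled, 3))
```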

55
Q

Does scaling change the predictions?

A

No. Rescaling only changes the scale on which the IRR is expressed

56
Q

What decides the extent to which a variable should be scaled?

A

The context (no particular formula)

57
Q

What is it helpful to do at the start of the analysis?

A

Code variables at the start to have them scaled before analysis

58
Q

How are coefficient estimates for Poisson regression found?

A

Via maximum likelihood estimation
Every model has a likelihood, and in general nested models are compared using LRTs

59
Q

In what ways are methods for model comparison (e.g., LRTs) useful?

A
  • Single test of a hypothesis about multiple predictors
  • Single test of several dummy variables
  • Tests of interaction effects
    When models are nested, LRTs can be used (although in case of testing multiple variables, the test cannot tell which of those variables are redundant in predicting the outcome)
60
Q

When doing an LRT, how should you formulate your H0?

A

With respect to the study’s specific context

61
Q

What are the degrees of freedom in an LRT equal to?

A

The number of additional parameters in the larger model

62
Q

In the output, what would a coefficient of 0.1 mean?

A

Since coefficients are interpreted as relative changes in the rate (not absolute changes, as in linear regression), a coefficient of 0.1 means a 10.5% increase in the rate for each one-unit increase in the predictor (e^0.1 = 1.105)

63
Q

Logistic regression transforms probabilities; what does Poisson regression transform?

A

The expected (mean) count: Poisson regression models the log of the mean count

64
Q

Under H0, the LRT statistic has what kind of distribution?

A

Chi-square
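
A sketch of the LRT calculation in Python, using only the standard library. The log-likelihoods are hypothetical, and chi2_sf is a helper written for this example that only covers 1 or an even number of degrees of freedom; statistical software handles the general case.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or an even df."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch only handles df = 1 or even df")

# Hypothetical log-likelihoods of two nested Poisson models
loglik_smaller = -1250.4   # model with fewer predictors
loglik_larger = -1244.1    # model with two additional predictors
df = 2                     # number of additional parameters

lr_statistic = 2 * (loglik_larger - loglik_smaller)
p_value = chi2_sf(lr_statistic, df)
print(f"LR = {lr_statistic:.2f}, df = {df}, p = {p_value:.4f}")
```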

65
Q

In an LRT, are models still nested even with addition of 2 variables?

A

Yes - by setting these two coefficients to zero, we get the smaller model

66
Q

What’s the issue with directly comparing the yearly number of cyclist deaths per city with other cities?

A

Cities differ in how much cycling takes place, so a comparison of raw counts may be biased. We can adjust the analysis using a suitable indicator of cycling frequency, e.g., the total number of miles cycled in each city per year

67
Q

If we were to consider a model for eye tests/neighbourhood deprivation using an offset of population size per 1,000 people, what would that look like?

A

log(mean no. of eye tests / (population size / 1000)) = β0 + β1X1 + β2X2
The outcome is a rate: number of eye tests per 1,000 people

68
Q

Why is modelling rates using a Poisson distribution problematic and how can we mitigate this? Consider the eye test/neighbourhood deprivation example

A

Rates are not necessarily integers, so they are not modelled well by a Poisson distribution
The solution is to use algebra to move the population size to the right-hand side of the equation:
log(mean eye tests) = β0 + β1X1 + β2X2 + log(pop.size/1000)
The final term, log(pop.size/1000), is the offset
This makes use of the result that log(a/b) = log(a) − log(b)
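
The algebra can be written out step by step. In this sketch the symbols μ (mean number of eye tests) and N (population size) are shorthand introduced here, not notation from the card:

```latex
\log\!\left(\frac{\mu}{N/1000}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2
\;\Longrightarrow\;
\log(\mu) - \log(N/1000) = \beta_0 + \beta_1 X_1 + \beta_2 X_2
\;\Longrightarrow\;
\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \log(N/1000)
```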

69
Q

How does an offset function?

A

Like an additional variable in the equation, but the coefficient is set to 1 and not estimated (way of redefining the outcome to ensure we can use Poisson regression)
Offsets can be readily incorporated into count regression models using standard software
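
A small Python sketch of how the offset enters a prediction. The coefficients, deprivation score and population size are all hypothetical.

```python
import math

# Hypothetical coefficients and data for one neighbourhood
beta0, beta1 = 5.0, -0.1   # intercept and slope for a deprivation score
deprivation = 3.0
population = 25_000

linear_predictor = beta0 + beta1 * deprivation
offset = math.log(population / 1000)   # enters with its coefficient fixed at 1

expected_count = math.exp(linear_predictor + offset)   # expected number of eye tests
rate_per_1000 = math.exp(linear_predictor)             # expected eye tests per 1,000 people

print(round(expected_count, 1), round(rate_per_1000, 1))
```

The expected count is the rate per 1,000 people multiplied by the population in thousands, which is what fixing the offset coefficient at 1 achieves.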

70
Q

Why might we decide to divide the population size by 1,000? Reference to the eye-test/neighbourhood deprivation example

A

We could use log(pop.size) as the offset instead; the slope coefficients, IRRs and SEs would be the same. Dividing by 1,000 gives:
- A more sensible intercept (log predicted number of eye tests in areas with IMD I per 1,000 people, rather than per person)
- More sensible predicted counts (predicted number of eye tests per 1,000 people, rather than per person)

71
Q

Is it necessary to include the offset in the table?

A

No, but the report should state somewhere that an offset is being used

72
Q

What’s the place of log transformation in Poisson regression?

A

Log transformation (log link): Poisson regression relates the logarithm of the mean count to a set of predictors. This implies a curvilinear relationship between numeric predictors and the count outcome

73
Q

On what scale are slope coefficients?

A

Log-scale, but can be exponentiated to give rate ratios

74
Q

How are estimates found?

A

Maximum likelihood, and LRTs can be used to compare nested models