Models for Count Data Flashcards

1
Q

Count variable

A

A discrete variable that takes non-negative integer values (0, 1, 2, 3, etc.), recording how many times something occurs.

2
Q

Types of models that can be used on count data:

A

1) Poisson regression
2) Negative binomial regression
3) Truncated Poisson/negative binomial regression
4) Zero-inflated Poisson/negative binomial regression

3
Q

Examples of count variables:

A
  • # of cars owned
  • # of drinks consumed at festival
  • # of products returned
  • # of complaints
  • # of stocks owned
4
Q

Why not standard OLS regression?

A

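Because it assumes normally distributed errors, which count data usually violate:

- counts often have a very low mean (with a mean above roughly 10, the distribution is close to normal and OLS would be appropriate)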

5
Q

Poisson distribution

A
  • has a single parameter, lambda (λ)
  • needs only one parameter, since mean = variance = λ
  • if λ > 10, the distribution is approximately normal, so OLS can be used
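
A minimal sketch of the mean = variance property, using only the Python standard library. The rate 2.5 and the cut-off at 60 are arbitrary choices for illustration; the function name is hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.5                # illustrative rate, not taken from the cards
support = range(0, 60)   # the probability mass beyond 60 is negligible for lam = 2.5

# Mean and variance computed directly from the probability mass function.
mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)

print(mean, variance)    # both are (numerically) equal to lam
```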
6
Q

Poisson regression analysis

A
  • DV = count variable
  • Goal: explain the DV by a set of predictors Xi
  • Each Yi is randomly drawn from a Poisson distribution
  • Mean = variance = lambda (λ)
  • The expected count is related to Xi via a link function
7
Q

Link function

A

E(Yi) = λi = exp(XiB), i.e. a log link: ln(λi) = XiB
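
A small sketch of why the exponential form is convenient: whatever the value of the linear predictor, the expected count stays positive. The coefficients and the helper name below are hypothetical.

```python
import math

def expected_count(x, beta):
    """Expected count exp(x . beta); x carries a leading 1.0 for the intercept."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return math.exp(linear_predictor)

beta = [0.5, 0.3]                            # hypothetical intercept and slope
lam_a = expected_count([1.0, 2.0], beta)     # linear predictor = 1.1
lam_b = expected_count([1.0, -10.0], beta)   # linear predictor = -2.5, count still positive

print(lam_a, lam_b)
```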

8
Q

Model estimation is done via ...

A

Maximum likelihood estimation: the parameters are estimated by searching for the parameter values that give the highest likelihood of observing the data.
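
To make the idea concrete, here is a rough sketch for the simplest case, an intercept-only Poisson model, where the search can be done over a grid. The data are made up, and real software uses iterative optimisers rather than a grid; this only illustrates "pick the value with the highest likelihood".

```python
import math

def poisson_log_likelihood(lam, counts):
    """Log-likelihood of the counts under a Poisson distribution with rate lam."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

counts = [0, 1, 1, 2, 3, 0, 4, 2, 1, 1]      # made-up data, sample mean = 1.5

# Candidate rates 0.001, 0.002, ..., 10.000; keep the one with the highest log-likelihood.
grid = [i / 1000 for i in range(1, 10001)]
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(lam_hat, sum(counts) / len(counts))    # the grid maximiser coincides with the sample mean
```

For this intercept-only case the maximum likelihood estimate is known to equal the sample mean, which is why the two printed numbers agree.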

9
Q

Interpretation of the parameters

A
  • If Xi increases by one unit (keeping all other variables constant), the expected count lambda (λ) is multiplied by exp(bi)
  • Or, if you use the natural log (LN) of Xi, the parameter becomes an elasticity: a 1% increase in Xi leads to approximately a bi% change in λ (the expected value of the DV)
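
Both readings can be checked numerically. The sketch below uses hypothetical coefficients (0.2 and 0.4) and hypothetical function names; it is an illustration, not output from a fitted model.

```python
import math

b0, b1 = 0.2, 0.4    # hypothetical intercept and slope

def lam_linear(x):
    """Expected count when x enters the model untransformed."""
    return math.exp(b0 + b1 * x)

def lam_logged(x):
    """Expected count when ln(x) enters the model."""
    return math.exp(b0 + b1 * math.log(x))

# One-unit increase in x: the expected count is multiplied by exp(b1).
ratio = lam_linear(3.0) / lam_linear(2.0)

# 1% increase in x with a logged predictor: roughly a b1 percent change.
pct_change = (lam_logged(101.0) / lam_logged(100.0) - 1) * 100

print(ratio, math.exp(b1))
print(pct_change)    # close to 0.4
```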
10
Q

Model fit and selection

A

1) Likelihood
2) Likelihood ratio test -> only for nested models
3) AIC, BIC, CAIC -> the models need not be nested, but they must be fitted to the same data
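
A sketch of how these measures are computed from a model's log-likelihood (LL), its number of parameters (k) and the sample size (n). The log-likelihood values below are hypothetical, and the p-value shortcut only holds when the two nested models differ by exactly one parameter.

```python
import math

def aic(ll, k):
    return -2 * ll + 2 * k

def bic(ll, k, n):
    return -2 * ll + k * math.log(n)

def caic(ll, k, n):
    return -2 * ll + k * (math.log(n) + 1)

n = 300                                    # hypothetical sample size
ll_restricted, k_restricted = -520.0, 2    # hypothetical nested (smaller) model
ll_full, k_full = -515.0, 3                # hypothetical larger model

# Likelihood ratio statistic; chi-square distributed with k_full - k_restricted = 1 df.
lr = 2 * (ll_full - ll_restricted)
p_value = math.erfc(math.sqrt(lr / 2))     # upper tail of chi-square with 1 df

print(lr, p_value)
print(aic(ll_restricted, k_restricted), aic(ll_full, k_full))   # lower is better
print(bic(ll_restricted, k_restricted, n), bic(ll_full, k_full, n))
print(caic(ll_restricted, k_restricted, n), caic(ll_full, k_full, n))
```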

11
Q

Some violations or “challenges”

A

1) Mean not equal to the variance
2) Zero events cannot be observed
3) More zeros than expected

12
Q

Underdispersion

A

Mean > variance

Almost never happens. If it does, still use Poisson regression.

13
Q

Overdispersion

A

Mean < variance

Use negative binomial regression.

14
Q

Dispersion test

A

If the test is significant, the mean = variance assumption is violated, so the Poisson distribution is not suitable.
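
Before running a formal test, a quick descriptive check is to compare the sample variance with the sample mean. The sketch below uses made-up counts and is only a rough diagnostic, not the dispersion test itself.

```python
import statistics

counts = [0, 0, 1, 0, 7, 2, 0, 12, 1, 0, 3, 0]   # made-up data

mean = statistics.mean(counts)
variance = statistics.variance(counts)    # sample variance (n - 1 in the denominator)
dispersion_ratio = variance / mean        # about 1 under a Poisson distribution

print(mean, variance, dispersion_ratio)   # a ratio well above 1 points to overdispersion
```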

15
Q

What to do if zero events cannot be observed?

A

Use a truncated model. In the output of a truncated negative binomial regression, the second intercept represents the extra (dispersion) parameter of the negative binomial distribution.
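
As an illustration of what truncation does, here is a sketch of the zero-truncated Poisson distribution: the ordinary Poisson probabilities for 1, 2, 3, ... are rescaled so that they sum to one without the zero. The rate 1.2 and the function names are assumptions for the example.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y > 0): Poisson probability rescaled by 1 - P(Y = 0)."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

lam = 1.2                  # illustrative rate
support = range(1, 101)    # mass beyond 100 is negligible for lam = 1.2

total = sum(zero_truncated_poisson_pmf(k, lam) for k in support)
truncated_mean = sum(k * zero_truncated_poisson_pmf(k, lam) for k in support)

print(total)               # sums to 1 (numerically)
print(truncated_mean)      # larger than lam, because zeros are excluded
```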

16
Q

When is observing zeros not possible?

A

1) Basket size of online orders
2) Relationship length
3) Household size

17
Q

What to do if there are more zeros than expected?

A

Investigate whether there is a peak at zero. If so, there are two options:

1) Zero inflated models
2) Hurdle models or zero-altered models.
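
A sketch of the first option, the zero-inflated Poisson: with some probability an observation is a "structural" zero, otherwise it is drawn from an ordinary Poisson distribution. The mixing probability 0.3, the rate 2.0 and the function names are assumptions for illustration.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: extra zeros with probability pi_zero, else Poisson(lam)."""
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * math.exp(-lam)
    return (1.0 - pi_zero) * poisson_pmf(k, lam)

lam, pi_zero = 2.0, 0.3    # illustrative values

p_zero_poisson = poisson_pmf(0, lam)
p_zero_zip = zip_pmf(0, lam, pi_zero)
total = sum(zip_pmf(k, lam, pi_zero) for k in range(0, 100))

print(p_zero_poisson, p_zero_zip)   # the zero-inflated model puts more mass on zero
print(total)                        # sums to 1 (numerically)
```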