Regression Models for Nominal, Ordinal and Count Data Flashcards

1
Q

What is an example of binary data and what type of analysis is used to study it?

A

Example: Improved/Not improved

Type of analysis: Logistic regression

2
Q

What is an example of categorical (multinomial) data and what type of analysis can be used to study it?

A

Example: Diagnosis (schizophrenia, affective, other)

Type of analysis: Multinomial logistic regression

3
Q

What is an example of Ordered categorical (ordinal) data and what type of analysis can be used to study it?

A

Example: Likert scale (strongly agree to strongly disagree)

Type of analysis: Ordered logistic regression

4
Q

What is an example of count data and what type of analysis can be used to study it?

A

Example: Number of lesions in the brain

Type of analysis: Poisson or negative binomial regression

5
Q

What regression model is used for an outcome with multiple unordered categories?

A

Multinomial logistic regression

If it is clinically valid, it may be easier to collapse the outcome to a simple binary variable so that standard logistic regression can be used.

As with the chi-squared test or standard logistic regression, small categories may need to be combined.

Important: unless you specify a reference category with the baseoutcome() option, Stata chooses the most frequent category of the dependent variable as the reference category.
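
The role of the reference category can be illustrated with a minimal Python sketch. It is not Stata output: the linear predictor values and the choice of "other" as the base are hypothetical. The base category has its linear predictor fixed at 0, and every other category is modelled relative to it.

```python
import math

def multinomial_probs(linear_predictors, base):
    """Outcome probabilities for a multinomial logit model.

    linear_predictors maps each non-base category to its linear predictor
    (log relative risk versus the base); the base is fixed at 0.
    """
    scores = {base: 0.0, **linear_predictors}
    denom = sum(math.exp(v) for v in scores.values())
    return {cat: math.exp(v) / denom for cat, v in scores.items()}

# Hypothetical linear predictors for one patient, with "other" as the base
probs = multinomial_probs({"schizophrenia": 0.4, "affective": -0.2}, base="other")
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

Changing the base category changes which comparisons the coefficients describe, but not the fitted probabilities.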

6
Q

What Stata command is used to run a multinomial logistic regression?

A

mlogit

7
Q

For regression modelling with an ordinal scale, what may we need to assume?

A

That the scale is interval, i.e. 0-1 is the same distance as 1-2, etc.

This may be controversial but in practice this is often assumed (check this is not ridiculous though!)

8
Q

What does the Stata command ologit do?

A

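Fits an ordered logistic regression model. It makes the proportional odds assumption: each predictor has the same odds ratio whichever cut-point splits the ordered outcome (e.g. categories 1 and above versus 0, or categories 2 and above versus 0 to 1, etc.)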
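
The proportional odds assumption can be checked numerically with a small Python sketch. The coefficient and cut-points are hypothetical, and the model is written as logit P(Y > j) = beta*x - cut_j, which is one common way of parameterising an ordered logit.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_above(cutpoint, beta, x):
    """P(Y falls above the given cut-point) for predictor value x."""
    return logistic(beta * x - cutpoint)

def odds(p):
    return p / (1.0 - p)

beta = 0.7                     # hypothetical log odds ratio per unit of x
cutpoints = [-1.0, 0.5, 2.0]   # hypothetical thresholds between 4 ordered categories

# Odds ratio for a 1-unit increase in x, computed separately at each cut-point
odds_ratios = [
    odds(prob_above(c, beta, x=1)) / odds(prob_above(c, beta, x=0))
    for c in cutpoints
]
print(odds_ratios)  # every entry equals exp(beta), up to rounding
```

The odds themselves differ at each cut-point; it is only the odds ratio for the predictor that is assumed constant.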

9
Q

If the number of ordered categories is large, what could we simply use instead?

A

linear regression as an approximation

10
Q

What is individual data?

A

One record per case with the dependent variable coded as 0/1

11
Q

What is grouped data?

A

Cases aggregated over subgroups; data are counts in each subgroup
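
The relationship between the two layouts can be sketched in Python. The records and the names treatment and improved are hypothetical: individual data hold one 0/1 outcome per case, and grouped data hold the totals per subgroup.

```python
from collections import Counter

# Individual data: one record per case, outcome coded 0/1 (hypothetical values)
individual = [
    {"treatment": "A", "improved": 1},
    {"treatment": "A", "improved": 0},
    {"treatment": "A", "improved": 1},
    {"treatment": "B", "improved": 0},
    {"treatment": "B", "improved": 1},
]

def to_grouped(records, group_key, outcome_key):
    """Aggregate 0/1 records into counts of cases and events per subgroup."""
    totals, events = Counter(), Counter()
    for record in records:
        totals[record[group_key]] += 1
        events[record[group_key]] += record[outcome_key]
    return {g: {"n": totals[g], "events": events[g]} for g in totals}

grouped = to_grouped(individual, "treatment", "improved")
print(grouped)
```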

12
Q

What is Poisson regression often used for?

A

Modeling count data.

13
Q

What does a rate ratio of less than 1 imply?

A

A negative association: the rate of the outcome decreases as the predictor increases

14
Q

What does a rate ratio of greater than 1 imply?

A

A positive association: the rate of the outcome increases as the predictor increases

15
Q

What does a rate ratio of 0.91 mean?

A

That each 1-point increase in the independent variable multiplies the expected rate of the outcome by 0.91, i.e. a 9% decrease in the rate
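
A rate ratio acts multiplicatively on the expected rate, which a few lines of Python can make concrete. The baseline rate of 10 is a hypothetical value chosen only for illustration.

```python
def expected_rate(baseline, rate_ratio, increase):
    """Expected rate after the predictor rises by `increase` points."""
    return baseline * rate_ratio ** increase

rate_ratio = 0.91
baseline_rate = 10.0  # hypothetical expected rate when the predictor is 0

for k in (0, 1, 2, 5):
    print(k, round(expected_rate(baseline_rate, rate_ratio, k), 3))

# Percentage change in the rate per 1-point increase
percent_change = (rate_ratio - 1) * 100
print(f"{percent_change:.0f}% per point")
```

Because the effect compounds, a 2-point increase multiplies the rate by 0.91 squared rather than subtracting 18%.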

16
Q

In the standard Poisson distribution, which describes the probabilities of counts of independent events, the variance equals what?

A

The mean

Often, however, the observed variance is greater than the mean.

Example: a child missing school on one day is assumed to be independent of missing school on another day, but is this reasonable? Perhaps certain children tend to miss more days than others.
The consequence is that the observed variance is higher than expected: this is overdispersion, or extra-Poisson variation.

(The variance might also be lower than expected, which is underdispersion, but this is rare.)
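
A rough first check for overdispersion is to compare the sample variance with the sample mean. The Python sketch below uses hypothetical counts of days absent; it is a crude descriptive check, not a formal goodness-of-fit test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    return variance(counts) / mean(counts)

# Hypothetical days absent for 12 children: a few children miss many days
days_absent = [0, 0, 1, 0, 2, 0, 9, 1, 0, 12, 0, 3]

index = dispersion_index(days_absent)
print(f"mean = {mean(days_absent):.2f}, variance = {variance(days_absent):.2f}")
print(f"dispersion index = {index:.2f}")
```

An index well above 1 points towards overdispersion, and an index well below 1 towards underdispersion.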

17
Q

If a dependent variable is continuous, what regression model should be used?

A

Linear regression

18
Q

What is a rate ratio?

A

In epidemiology, a rate ratio, sometimes called an incidence density ratio or incidence rate ratio, is a relative difference measure used to compare the incidence rates of events occurring at any given point in time.
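
As a worked example of the definition, the Python sketch below computes two incidence rates (events divided by person-time) and takes their ratio. The event counts and person-years are hypothetical.

```python
def incidence_rate(events, person_time):
    """Events per unit of person-time at risk."""
    return events / person_time

def rate_ratio(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Incidence rate in the exposed group divided by that in the unexposed."""
    return (incidence_rate(events_exposed, time_exposed)
            / incidence_rate(events_unexposed, time_unexposed))

# Hypothetical cohort: 30 events in 1000 person-years versus 15 in 1500
ratio = rate_ratio(30, 1000, 15, 1500)
print(f"rate ratio = {ratio:.2f}")
```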

19
Q

What model should be fitted if a goodness-of-fit test for our Poisson regression model shows that the observed variance is higher than expected (overdispersion, or extra-Poisson variation)?

A

We run a negative binomial regression. Here the dependent variable, Y, follows the negative binomial distribution, so it still takes non-negative integer values (0, 1, 2, ...), but its variance is allowed to exceed its mean.

20
Q

Negative binomial regression can be used for underdispersion.

True or false?

A

False

It can only be used for overdispersion.

21
Q

In a negative binomial regression output what is the likelihood ratio test comparing?

A

The negative binomial model with the Poisson model (a test of whether the overdispersion parameter alpha is zero)
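
The arithmetic behind this likelihood ratio test can be sketched in Python. The two log-likelihoods are hypothetical, and the halved chi-squared p-value is an assumption about how the boundary test of alpha = 0 is usually handled (a "chibar2(01)" statistic), so treat this as an illustration rather than a reproduction of Stata's output.

```python
import math

def lr_test_alpha_zero(ll_poisson, ll_negbin):
    """Likelihood ratio statistic and boundary-corrected p-value.

    The statistic is 2 * (ll_negbin - ll_poisson). Because alpha = 0 lies on
    the boundary of the parameter space, the p-value is taken as half the
    upper tail of a chi-squared distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (ll_negbin - ll_poisson), 0.0)
    if stat == 0.0:
        return stat, 1.0
    chi2_1_tail = math.erfc(math.sqrt(stat / 2.0))  # P(chi2 with 1 df > stat)
    return stat, 0.5 * chi2_1_tail

# Hypothetical log-likelihoods from the two fitted models
stat, p_value = lr_test_alpha_zero(ll_poisson=-520.3, ll_negbin=-480.1)
print(f"LR statistic = {stat:.1f}, p = {p_value:.3g}")
```

A small p-value favours the negative binomial model, i.e. it is evidence of overdispersion.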

22
Q

Linear regression, ANOVA, logistic and Poisson regression can be considered to be what?

A

Special cases of a class of models called generalized linear models (GLMs).

23
Q

What are generalised linear models characterised by?

A

“The distribution of the dependent variable y (the ‘family’) and a link function, g(), that connects the expected value of the dependent variable E(y) with the linear predictor, η, derived from the explanatory variables”
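
In standard GLM notation, the definition can be written as a single equation. The coefficients β and explanatory variables x_1 to x_p are symbols assumed here; they do not appear in the card.

```latex
g\bigl(E(y)\bigr) = \eta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
```

For example, linear regression pairs a Gaussian family with the identity link, logistic regression pairs a binomial family with the logit link, and Poisson regression pairs a Poisson family with the log link.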

24
Q

Why should we use a generalised linear model?

A

A more general way of modelling continuous, binary and count outcomes
Logistic and Poisson regression can both be performed using GLMs (along with regressions based on some other distributions, such as the gamma)
Advantages:
* More detailed output
* Easier to compare one model with another because all are under one ‘umbrella’ and use common measures of fit (deviance)
* More control over the specific form of model fitted
The Stata command is glm

25
Q

What are the advantages of using a generalised linear model?

A

Advantages:
* More detailed output
* Easier to compare one model with another because all are under one ‘umbrella’ and use common measures of fit (deviance)
* More control over the specific form of model fitted

26
Q

What is the Stata command for a generalised linear model?

A

glm

27
Q

The glm Stata command is: glm depvar indepvars, family(familyname) link(linkname)

What do the options for the link name include?

A
  • identity – identity function
  • log – natural logarithm
  • logit – logit function
  • probit – probit function (inverse cumulative distribution function of a standard normal)
  • cloglog – complementary log-log function
  • opower # – odds power of order #
  • power # – power of order #
  • ...

For example, a Poisson regression model is a special GLM where the dependent variable is assumed to arise from a Poisson distribution (distribution for counts) and the link function is the log function
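
To make some of these link functions concrete, the Python sketch below writes out five of them together with their inverses. It illustrates the mathematics only and is not Stata code; the dictionary LINKS is a name invented for this example.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used for the probit link

# Each entry is (link g, inverse link): g maps the mean mu to the
# linear predictor eta, and the inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}

mu = 0.3  # an example mean that is valid for every link above
for name, (link, inverse) in LINKS.items():
    eta = link(mu)
    print(f"{name:8s} eta = {eta:+.4f}  back to mu = {inverse(eta):.4f}")
```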

28
Q

Adding eform to the glm command is equivalent to asking for what?

A

The or or irr option, i.e. exponentiated coefficients (odds ratios or incidence rate ratios)

29
Q
  1. Why might non-independent observations occur?
  2. What might these suggest?
  3. What format does the data usually have to be in and what stata command can be used to ensure this?
A
  1. Through any of the following:
      - Cluster randomisation
      - Clustered sampling (not ‘cluster analysis’, which is something different!)
      - Repeated measures
      - Longitudinal data
      - Random effects model
      - Multilevel model
  2. The use of the cluster option with the usual command, e.g. logistic, logit or poisson, or a specific random effects model such as xtlogit or xtpoisson
  3. The data usually have to be in ‘long’ format in Stata: use reshape
30
Q

What are types of non-independent data?

A

Cluster randomised trial
* Clinics are randomised to deliver intervention or not
* Outcome is for patients (binary: “improved” or “not improved”)

Longitudinal data
* Survey of patients at 3 time points
* Dependent variable is binary: agree with statement Y/N

Multilevel study
* Children observed in classrooms in school
* Dependent variable is “days off sick”

31
Q

In Stata, when fitting an nbreg or poisson model, what does the option irr report?

A

Rate ratios (exponentiated coefficients)
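
The link between a coefficient on the log scale and the reported rate ratio can be sketched in Python. The coefficient and standard error are hypothetical, and the 95% interval uses the usual normal approximation computed on the log scale.

```python
import math

def to_rate_ratio(coef, se, z=1.96):
    """Exponentiate a log-scale coefficient and its confidence limits."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))

# Hypothetical Poisson regression coefficient and standard error
rr, lower, upper = to_rate_ratio(coef=-0.0943, se=0.03)
print(f"rate ratio = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A coefficient of 0 corresponds to a rate ratio of 1, i.e. no association.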