Regression Models for Nominal, Ordinal and Count Data Flashcards
What is an example of binary data and what type of analysis is used to study it?
Example: Improved/Not improved
Type of analysis: Logistic regression
What is an example of categorical (multinomial) data and what type of analysis can be used to study it?
Example: Diagnosis (schizophrenia, affective, other)
Type of analysis: Multinomial logistic regression
What is an example of Ordered categorical (ordinal) data and what type of analysis can be used to study it?
Example: Likert scale (from "agree strongly" to "disagree strongly")
Type of analysis: Ordered logistic regression
What is an example of count data and what type of analysis can be used to study it?
Example: Number of lesions in the brain
Type of analysis: Poisson or negative binomial regression
What regression model is used for an outcome with multiple unordered categories?
Multinomial logistic regression (with multiple predictors)
It may be easier to convert the data to a simple binary outcome so that standard logistic regression can be used, if this is clinically valid
As in a chi-squared test or standard logistic regression, you may need to combine small categories
Important: unless you specify a reference category using the base option, baseoutcome(#), Stata will choose the most frequent category of the dependent variable as the reference category.
What Stata command is used to run a multinomial logistic regression?
mlogit
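As a rough illustration of mlogit and its reference category, here is a minimal sketch on simulated data. The variable names (diagnosis, age) and the simulated values are assumptions made purely for illustration; they are not from any real study.

```stata
* Simulated data for illustration only; variable names are hypothetical
clear
set seed 12345
set obs 300
generate age = 20 + floor(40*runiform())
* diagnosis: 1 = schizophrenia, 2 = affective, 3 = other
generate diagnosis = 1 + floor(3*runiform())

* Default: the most frequent outcome is used as the reference category
mlogit diagnosis age

* Explicitly make category 3 ("other") the reference category
mlogit diagnosis age, baseoutcome(3)

* Same model, reported as relative-risk ratios instead of coefficients
mlogit diagnosis age, baseoutcome(3) rrr
```

Because the simulated outcome is unrelated to age, the coefficients should be close to zero; the point is only to show where the reference category is set.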
For regression modelling we may need to assume what?
That the scale is interval, i.e. the distance from 0 to 1 is the same as from 1 to 2, etc.
This may be controversial but in practice this is often assumed (check this is not ridiculous though!)
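What does the Stata command ologit do?
Fits an ordered logistic regression, which makes the proportional odds assumption: the effect of a predictor on the odds of being in a higher category is the same at every cut-point (e.g. the odds ratio for category 2 compared with 1 is the same as for category 1 compared with 0, etc.)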
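A minimal sketch of ologit on simulated data may help here. The variable names (treat, response), the cut-points and the effect size of 0.8 are all assumptions chosen for illustration. The outcome is built from a latent score with a logistic error, which is one way to generate data that satisfy the proportional odds assumption.

```stata
* Simulated data for illustration only; names and values are hypothetical
clear
set seed 2024
set obs 500
generate treat = runiform() < 0.5
* Latent score with a standard logistic error term
generate latent = 0.8*treat + logit(runiform())
* Cut the latent score into 4 ordered categories (1 to 4)
generate response = 1 + (latent > -1) + (latent > 0) + (latent > 1)

* Ordered logistic regression: coefficients on the log-odds scale
ologit response treat

* Same model reported as (proportional) odds ratios
ologit response treat, or
```

A single odds ratio is reported for treat because the model assumes it applies equally at each of the three cut-points.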
If the number of categories is large, we could just use what?
linear regression as an approximation
What is individual data?
One record per case with the dependent variable coded as 0/1
What is grouped data?
Cases aggregated over subgroups; data are counts in each subgroup
What is Poisson regression often used for?
Modeling count data.
What does a rate ratio of less than 1 imply?
The rate of the outcome decreases as the predictor increases (a negative association)
What does a rate ratio of greater than 1 imply?
The rate of the outcome increases as the predictor increases (a positive association)
What does a rate ratio of 0.91 mean?
That an increase of 1 point in the independent variable multiplies the rate of the outcome by 0.91, i.e. the rate decreases by 9%
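The arithmetic behind a rate ratio of 0.91 can be checked with a few display commands. The baseline rate of 10 events per person-year is a made-up number used only to make the multiplication concrete: one extra point gives 10 × 0.91 = 9.1, and two extra points give 10 × 0.91² = 8.281.

```stata
* Hypothetical baseline rate; only the rate ratio 0.91 comes from the card
scalar base_rate  = 10
scalar rate_ratio = 0.91

* Rate after a 1-point and a 2-point increase in the predictor
display "Rate after +1 point:  " base_rate*rate_ratio
display "Rate after +2 points: " base_rate*rate_ratio^2

* Percentage change in the rate per 1-point increase
display "Percent change per point: " 100*(rate_ratio - 1)

* The regression coefficient on the log scale is ln(rate ratio)
display "Log-scale coefficient: " ln(rate_ratio)
```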
In the standard Poisson distribution, which describes the probabilities of independent events, the variance equals what?
The mean
Often, however, the observed variance is greater than the mean.
Example: a child missing school on one day is assumed to be independent of missing school on another day, but is this reasonable? Perhaps certain children tend to miss more days than others.
The consequence is that the observed variance is higher than expected; this is called overdispersion or extra-Poisson variation
(The variance might also be lower = underdispersion but this is rare).
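To see overdispersion in practice, the sketch below simulates days absent for children who differ in their underlying tendency to miss school. The variable names and the gamma-distributed tendency are assumptions for illustration, not part of the original notes.

```stata
* Simulated data for illustration only; names and values are hypothetical
clear
set seed 42
set obs 1000
* Child-specific tendency to miss school (gamma, shape 2, scale 0.5, mean 1)
generate tendency = rgamma(2, 0.5)
* Days absent: Poisson with a child-specific mean, so counts are overdispersed
generate days_absent = rpoisson(3*tendency)

* Under a pure Poisson model the variance would roughly equal the mean
summarize days_absent
display "mean = " r(mean) "   variance = " r(Var)
```

With these settings the variance should come out clearly larger than the mean, which is the pattern described on the card.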
If a dependent variable is continuous what test should be used?
Linear regression
What is a rate ratio?
In epidemiology, a rate ratio, sometimes called an incidence density ratio or incidence rate ratio, is a relative difference measure used to compare the incidence rates of events occurring at any given point in time.
What should be done if a goodness-of-fit test for our Poisson regression model shows that the observed variance is higher than expected (overdispersion or extra-Poisson variation)?
We run a negative binomial regression. Here the dependent variable, Y, follows the negative binomial distribution. As in Poisson regression, Y takes non-negative integer values (0, 1, 2, ...), but the variance is allowed to be greater than the mean.
Negative binomial regression can be used for underdispersion.
True or false
FALSE
It can only be used for overdispersion
In a negative binomial regression output, what is the likelihood ratio test comparing?
The negative binomial model with the Poisson model (it tests whether the overdispersion parameter alpha is zero)
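The sequence of checking a Poisson fit and then moving to a negative binomial model might look like the sketch below. The data are simulated with extra variation built in, and the names (x, y, tendency) are hypothetical.

```stata
* Simulated overdispersed counts; names and values are hypothetical
clear
set seed 42
set obs 1000
generate x = rnormal()
generate tendency = rgamma(2, 0.5)
generate y = rpoisson(exp(0.5 + 0.3*x)*tendency)

* Poisson model followed by its goodness-of-fit tests
poisson y x
estat gof

* Negative binomial model, reported as incidence rate ratios.
* The footer of the output shows the LR test of alpha = 0,
* i.e. negative binomial versus Poisson.
nbreg y x, irr
```

A small p-value for the test of alpha = 0 suggests that the Poisson model is too restrictive for these data.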
Linear regression, ANOVA, logistic and Poisson regression can all be considered to be what?
Special cases of a class of models called generalised linear models (GLMs).
What are generalised linear models characterised by?
“The distribution of the dependent variable y (the ‘family’) and a link function, g(), that connects the expected value of the dependent variable E(y) with the linear predictor, η, derived from the explanatory variables”
Why should we use a generalised linear model?
More general way of modelling continuous, binary and count outcomes
Logistic, Poisson regression can all be performed using glms (along with regressions based on some other models such as gamma)
Advantages:
* more detailed output;
* easier to compare one model with another because all are under one ‘umbrella’, and they use common measures of fit (deviance)
* More control over specific form of model fitted
Stata command is glm
What are the advantages of using a generalised linear model?
Advantages:
* more detailed output;
* easier to compare one model with another because all are under one ‘umbrella’, and they use common measures of fit (deviance)
* More control over specific form of model fitted
What is the Stata command for a generalised linear model?
glm
The glm Stata command is: glm depvar indepvars, family(familyname) link(linkname)
What do the options for the link name include?
- identity – identity function
- log – natural logarithm
- logit – logit function
- probit – probit function (inverse cumulative distribution function of a standard normal)
- cloglog – complementary log-log function
- opower # – odds power of order #
- power # – power of order # ...
For example, a Poisson regression model is a special GLM where the dependent variable is assumed to arise from a Poisson distribution (distribution for counts) and the link function is the log function
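As a sketch of this equivalence, the code below fits the same model with poisson and with glm using a Poisson family and log link. The data are simulated and the names (x, events, b_poisson) are hypothetical.

```stata
* Simulated count data; names and values are hypothetical
clear
set seed 7
set obs 500
generate x = rnormal()
generate events = rpoisson(exp(0.2 + 0.4*x))

* Dedicated Poisson regression command
poisson events x
scalar b_poisson = _b[x]

* Same model expressed as a GLM: Poisson family, log link
glm events x, family(poisson) link(log)

* The two coefficients for x should agree closely
display "poisson: " b_poisson "   glm: " _b[x]
```

The glm output also reports the deviance, which is the common measure of fit mentioned in the advantages.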
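Adding eform to the glm command is equivalent to asking for what?
Exponentiated coefficients, i.e. the odds ratios (or) or incidence rate ratios (irr), depending on the family and link used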
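For a binary outcome, the effect of eform can be sketched as follows. The data are simulated and the names (x, improved) are hypothetical; logistic is used for comparison because it reports odds ratios by default.

```stata
* Simulated binary outcome; names and values are hypothetical
clear
set seed 11
set obs 500
generate x = rnormal()
generate improved = runiform() < invlogit(-0.3 + 0.7*x)

* GLM with binomial family and logit link; eform exponentiates
* the coefficients, so they are displayed as odds ratios
glm improved x, family(binomial) link(logit) eform

* Standard logistic regression reporting odds ratios, for comparison
logistic improved x
```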
- Why might non-independent observations occur?
- What might these suggest?
- What format do the data usually have to be in, and what Stata command can be used to ensure this?
- Cluster randomisation
- Clustered sampling (Not ‘cluster analysis’ this is something different!)
- Repeated measures
- Longitudinal data
- Random effects model
- Multilevel model
- Cluster randomisation
- The use of the cluster option, vce(cluster clustvar), with the usual command, e.g. logistic, logit or poisson, or a specific random effects model such as xtlogit or xtpoisson
- The data usually have to be in ‘long’ format in Stata: use reshape
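The steps above can be sketched on simulated repeated-measures data: reshape from wide to long, then fit either the usual command with clustered standard errors or a random effects model. All names (id, agree1 to agree3, time, u) and values are assumptions for illustration.

```stata
* Simulated survey at 3 time points; names and values are hypothetical
clear
set seed 99
set obs 200
generate id = _n
* Person-level random effect so that repeated answers are correlated
generate u = 1.5*rnormal()
forvalues t = 1/3 {
    generate agree`t' = runiform() < invlogit(u)
}

* Wide to long: one row per person per time point
reshape long agree, i(id) j(time)

* Option 1: usual command with cluster-robust standard errors
logit agree i.time, vce(cluster id)

* Option 2: random effects logistic model
xtset id time
xtlogit agree i.time, re
```

The clustered version keeps the ordinary logit coefficients and only adjusts the standard errors, whereas xtlogit models the between-person variation directly.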
What are types of non-independent data?
Cluster randomised trial
* Clinics are randomised to deliver intervention or not
* Outcome is for patients (binary: “improved” or “not improved”)
Longitudinal data
* Survey of patients at 3 time points
* Dependent variable is binary: agree with statement Y/N
Multilevel study
* Children observed in classrooms in school
* Dependent variable is “days off sick”
In Stata, when fitting an nbreg or poisson model, what does the option irr report?
Incidence rate ratios (exponentiated coefficients)