Ordinal Logistic Regression Flashcards

1
Q

What is ordinal logistic regression (OLR)?

A

An umbrella term for several models for ordinal outcomes
- A regression model used when the dependent variable has three or more ordered categories
- Estimates the relationship between predictors and an ordinal outcome using log odds

2
Q

Examples of ordinal outcomes

A
  • Life satisfaction (low, medium, high)
  • Self-confidence levels (not at all, no more than usual, rather more than usual, much more than usual)
    E.g., moving from 2 to 3 on the life satisfaction scale is not necessarily the same as moving from 4 to 5 on the same scale, even though the numeric difference is the same
3
Q

Difference between nominal and ordinal variables

A
  • Nominal: unordered categories (e.g., diet type: healthy, high-fat, high-sugar)
  • Ordinal: ordered categories (e.g., subjective health: poor, fair, good, very good)
4
Q

Proportional Odds Model (POM)

A

Assumes that ORs are the same across all cut-off points of the ordinal outcome, i.e., the observed ORs are estimates of the same “true” OR.
Expressed as:
logit[P(Y > j)] = cj + β1x1 + β2x2 + … + βkxk
- The estimated coefficients are the slopes (β1, β2, …, βk)
- The OR for a unit increase in x1 is OR = exp(β1) if x1 is continuous. For a binary/dummy predictor, the OR compares a specific group to the reference group.
Also known as the parallel regression assumption
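The proportional odds property above can be checked numerically. A minimal Python sketch with hypothetical intercepts and slope (c1, c2, b1 are assumed values, not from the notes) shows that the OR for a unit increase in x1 equals exp(β1) at every cut-off:

```python
import math

# Hypothetical POM with two cut-offs (J = 3) and one predictor x1
c = {1: 0.40, 2: -1.10}   # assumed intercepts c1, c2
b1 = 0.65                 # common slope under proportional odds

def p_higher(j, x1):
    """P(Y > j) via the inverse logit of cj + b1*x1."""
    lp = c[j] + b1 * x1
    return math.exp(lp) / (1 + math.exp(lp))

def odds(p):
    return p / (1 - p)

# The OR for a unit increase in x1 is exp(b1) at every cut-off
or_cut1 = odds(p_higher(1, 1)) / odds(p_higher(1, 0))
or_cut2 = odds(p_higher(2, 1)) / odds(p_higher(2, 0))
# both equal exp(0.65), regardless of the intercept
```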

5
Q

Stata commands for OLR

A

gologit2 <outcome> <predictor(s)>, pl or - proportional odds model
gologit2 <outcome> <predictor(s)>, or - non-proportional odds model
ologit <outcome> <predictor(s)> - for Brant's test

5
Q

Non-Proportional Odds Model

A
  • Does not assume equal ORs across different dichotomisations
  • Stata models each dichotomisation separately. This gives the same results as if we performed multiple BLRs (one for each possible dichotomisation)
6
Q

Brant’s test for proportional odds assumption:

A

H0: The proportional odds assumption holds
If p > 0.05, we assume proportional odds
Run in Stata using:
ologit <outcome> <predictor(s)>
brant, detail

7
Q

How would you transform a continuous predictor for non-linearity?

A

Centring at the mean: gen <varname> = <predictor> - <mean>
A quadratic term may be added for a potential U-shaped relationship: gen <varname> = centred_predictor^2

8
Q

LRT for model comparison:

A

Compares a simpler model (e.g., linear age effect) with a more complex model (e.g., quadratic age effect)
If p < 0.05, the more complex model fits significantly better; if p > 0.05, prefer the simpler model

9
Q

Partial proportional odds model

A

Relaxes the proportional odds assumption for specific variables while keeping it for others
In Stata:
gologit2 <outcome> <predictor(s)>, or pl(<predictors>)

10
Q

Model selection consideration

A
  • Use Brant’s test for proportional odds assumption
  • Use LRT to compare nested models
  • Consider adding interaction terms or nonlinear transformations
11
Q

How is OLR related to BLR?

A
  • A logit transformation (log odds) is used (on the left-hand side of the equation)
  • The measure of effect size is the OR
12
Q

What possible ways can depression be dichotomised if categorised as ‘none’, ‘moderate’ or ‘severe’?

A
  • Cut-off 1: None / Moderate or severe
  • Cut-off 2: None or moderate / severe
    2 dichotomisations
13
Q

How many ways can depression be dichotomised if there are four categories: ‘none’, ‘mild’, ‘moderate’, ‘severe’?

A

Three ways:
- Mild/moderate/severe vs none
- Moderate/severe vs mild/none
- Severe vs none/mild/moderate

14
Q

In general, what is the number of possible dichotomisations equal to?

A

The number of categories minus one

15
Q

What does OLR do with all the dichotomisations of an outcome?

A

OLR dichotomises the ordinal outcome in all possible ways, and models the log odds of being in a higher outcome category
I.e., in the context of depression, it compares ‘none’ to ‘moderate’/’severe’ (higher categories) or ‘none’/’moderate’ to ‘severe’ (higher categories)
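The J − 1 binary splits can be generated mechanically. A minimal Python sketch with a hypothetical depression sample (codes and values are assumed for illustration):

```python
# Hypothetical ordinal responses coded 0 = none, 1 = moderate, 2 = severe
depression = [0, 2, 1, 1, 0, 2]
categories = [0, 1, 2]

# One binary indicator "Y > j" per cut-off; J categories give J - 1 cut-offs
splits = {j: [int(y > j) for y in depression] for j in categories[:-1]}
# splits[0] compares moderate/severe vs none
# splits[1] compares severe vs none/moderate
```

OLR models the log odds of each of these indicators, either with a common slope (proportional odds) or separate slopes (non-proportional odds).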

16
Q

What is the proportional odds assumption?

A

If we can assume the ORs are the same at all possible cut-offs, we only need to estimate one (common) OR for all cut-offs
E.g., with depression, the assumption holds if the ORs are the same for cut-off 1 (‘moderate’ or ‘severe’ vs. ‘none’) and cut-off 2 (‘severe’ vs. ‘none’ or ‘moderate’)

17
Q

If the ORs are different across dichotomisations, does this necessarily mean they are not proportional?

A

No; the sample ORs may differ due to sampling variability while still estimating the same population OR

18
Q

When using a non-proportional odds model for modelling the log-odds of the outcome depression (three categories: ‘none’, ‘moderate’, ‘severe’), how do we interpret the output?

A

Output is split into two tables, labelled ‘none’ and ‘moderate’
The first table models the odds of being in a category higher than ‘none’ (‘moderate’/’severe’ depression vs. ‘none’)
The second table models the odds of being higher than ‘moderate’ (‘severe’ depression vs. ‘none’/’moderate’)

19
Q

What does Stata display in the output in a proportional odds model?

A

At the top, the constraint (“let the two ORs be the same”)
The OR estimates would be the same - due to the decision to constrain them to be equal

20
Q

Unlike BLR, what does OLR estimate?

A

The odds of being in a higher outcome category (e.g., higher than ‘none’), rather than the odds of a single binary event

21
Q

How would we report ORs if we thought the non-proportional odds model was true?

A

Separately

22
Q

What’s the equation for a proportional odds model?

A

Consider an ordinal outcome, y, with J categories, labelled j = 1, 2, …, J
Let pj = P(y > j) be the probability of being in a category higher than j
The proportional odds model is:
logit(pj) = cj + β1x1 + β2x2 + … + βkxk

23
Q

How does the proportional odds equation differ from that of the BLR?

A

We now have ‘pj’ in the logit transformation.
There is one coefficient associated with each predictor (as in logistic regression). However, in logistic regression we have only one intercept term (β0), whereas in the proportional odds model we have several intercepts, cj, corresponding to all possible cut-offs. The other coefficients are the same under the proportional odds assumption

24
Q

In a proportional odds model, what would the separate equations be if J = 3?

A

logit(p1) = c1 + β1x1 + β2x2 + … + βkxk
logit(p2) = c2 + β1x1 + β2x2 + … + βkxk
The only difference between the right-hand sides of the equations is the intercept (c1 and c2). All slope coefficients β1, β2, etc. are the same in both equations
- The coefficients estimated are the slopes β1, β2, …, βk and the cut-offs c1, c2, …, cJ-1
- As in BLR, the OR for a unit increase in x1 is ORx1 = exp(β1), i.e., you get the OR by exponentiating the β coefficients

25
Q

What would be the equation for proportional odds model estimating the log-odds of depression with three categories: ‘none’, ‘moderate’, ‘severe’?

A

logit[P(Depression > ‘none’)] = c1 + β1 x Female
logit[P(Depression > ‘moderate’)] = c2 + β1 x Female

26
Q

How do you obtain coefficient estimates in OLR in a proportional odds model?

A

gologit2 <outcome> <predictor(s)>, pl
The log-odds coefficients are the same in all parts of the table
The intercepts for each dichotomisation vary
'pl' stands for parallel lines

27
Q

Calculating predicted probabilities in OLR:

A

Just like in BLR, we can use the estimated coefficients to calculate predicted probabilities for sample members with any combination of covariate values
In OLR (proportional and non-proportional odds models) we can calculate the probability of being in any one of the outcome categories
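The category probabilities follow from the cumulative probabilities P(Y > j). A minimal Python sketch with assumed intercepts for a null model (c1, c2 are hypothetical values):

```python
import math

def inv_logit(lp):
    """Reverse logit: convert a log-odds value to a probability."""
    return math.exp(lp) / (1 + math.exp(lp))

# Hypothetical cut-offs for J = 3 (low / medium / high), no predictors
c1, c2 = 0.85, -0.60          # assumed intercepts
p_gt_low = inv_logit(c1)      # P(Y > low)
p_gt_med = inv_logit(c2)      # P(Y > medium)

# Probabilities of each category from the cumulative probabilities
p_low = 1 - p_gt_low
p_medium = p_gt_low - p_gt_med
p_high = p_gt_med
# the three category probabilities sum to 1
```

With predictors, the same calculation applies after adding βx terms to each linear predictor.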

28
Q

What is the difference in the equations for non-proportional odds model compared to proportional odds model e.g., if J = 3?

A

Two different dichotomisations, with different slope coefficients, probabilities and intercepts in each equation
logit(p1) = c1 + β11x1 + β21x2 + … + βk1xk
logit(p2) = c2 + β12x1 + β22x2 + … + βk2xk
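Because each cut-off now has its own slope, each dichotomisation gets its own OR. A minimal sketch with hypothetical slopes for one predictor:

```python
import math

# Hypothetical non-proportional slopes for one predictor (J = 3):
# the slope differs by cut-off j
b1 = {1: 0.80, 2: 0.35}

# Each dichotomisation has its own OR, reported separately
or_by_cutoff = {j: math.exp(b) for j, b in b1.items()}
```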

29
Q

What do we omit in the Stata command to get a non-proportional odds model?

A

‘pl’ option

30
Q

What do we do with the outcome variable before computing any OLR model?

31
Q

What would the equation for an OLR predicting life satisfaction without any predictors look like?

A

Null model:
logit[P(Lifesat > j)] = cj
j = {low, medium}
We just have the intercept cj for dichotomisation j

32
Q

What is the utility of a null model?

A

We’re not usually interested in the null model for its own sake, but it serves as a comparison point for other models

33
Q

What is given in the output for a non-proportional odds model (null model)? E.g., considering life satisfaction with J = 3 (‘low’, ‘medium’, ‘high’)

A

Table 1: Baseline log odds of being in a higher than ‘low’ category
Table 2: Baseline log odds of being in a higher than ‘medium’ category

34
Q

In the null model, what are the estimated intercepts (cut-offs) equal to?

A

The log odds of the observed proportions in the dataset. E.g., with life satisfaction (J = 3), taking the intercept for ‘higher than low’:
P(Lifesat > “Low”) = exp(c1) / (1 + exp(c1))
The result of the reverse logit transformation is the proportion of the sample reporting higher than low life satisfaction
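This can be verified directly: the null-model intercept for a cut-off is the logit of the observed proportion, and reverse-transforming it recovers that proportion. A sketch with a hypothetical life satisfaction sample:

```python
import math

# Hypothetical sample: 0 = low, 1 = medium, 2 = high life satisfaction
lifesat = [0, 1, 2, 1, 2, 2, 0, 1, 2, 2]
prop_gt_low = sum(y > 0 for y in lifesat) / len(lifesat)   # observed proportion

# The null-model intercept for this cut-off is the logit of that proportion
c1 = math.log(prop_gt_low / (1 - prop_gt_low))

# The reverse logit transformation recovers the observed proportion
back = math.exp(c1) / (1 + math.exp(c1))
```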

35
Q

How should you initially judge linearity between continuous predictors and an ordinal outcome?

A

Graphically illustrate the relationship between the continuous predictor and the ordinal outcome, for example by plotting predicted probabilities; curvature indicates non-linearity.
The continuous predictor (e.g., age with a mean of 50) should also be centred (normally around the mean) so the intercepts can be interpreted as the odds for those at the mean (aged 50)

36
Q

In a proportional odds model, how many interpretations are there per independent variable?

A

One interpretation per independent variable
If there are other covariates in the model, we would also be controlling for other independent variables

37
Q

Why is it worth checking the sample sizes before doing an LRT?

A

To ensure missing data do not affect number of observations

38
Q

What is log likelihood?

A

The log of the likelihood: the (log) probability of the observed data under the fitted model. Stata reports it for each model, and the LRT is computed from the log likelihoods of two nested models (e.g., the current model vs. the null model)

39
Q

What are the hypotheses for a LRT in OLR?

A

H0: None of the independent variables in the current model predicts the DV
H1: At least one of the independent variables predicts the DV
Under H0, the LRT follows a chi-squared distribution with df equal to the number of independent variables in the model

40
Q

What is the LRT statistic?

A

LRT = -2 x (LLnull - LLcurrent model)
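The statistic is a one-line calculation from the two log likelihoods. A sketch with assumed values (the log likelihoods are hypothetical, as Stata would report them):

```python
# Hypothetical log likelihoods from two nested models
ll_null = -1200.5
ll_current = -1185.2

lrt = -2 * (ll_null - ll_current)   # LRT statistic
# Under H0, lrt follows a chi-squared distribution with df equal to the
# number of IVs in the current model; compare to the critical value
# (e.g., 3.84 for df = 1 at the 5% level)
```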

41
Q

LRT in OLRs:

A

LRTs in OLR are analogous to those in BLR
By default, Stata displays the LRT comparing the log likelihood (LL) of the estimated model with the LL of the null model

42
Q

How is the LRT statistic calculated?

A

From the log-likelihoods of the null model and the current model

43
Q

Assumptions of OLR:

A
  • The DV is ordinal
  • For numeric/continuous predictors, the relationship between the IV and the log odds of the outcome is linear
  • Proportional odds: The coefficient for every IV is assumed to be the same for any dichotomisation of the DV. This is sometimes called the “parallel regression” assumption. The proportional odds assumption can be relaxed in a non-proportional odds model
44
Q

How does Brant’s test work in testing the proportional odds assumption?

A
  • Dichotomising the DV in all possible ways
  • Fitting a BLR on each dichotomisation
  • Comparing the estimated coefficients from each of these BLRs
    If the ORs are different in each dichotomisation, there may be evidence that the odds are not proportional.
45
Q

What would be included in the output of a Brant’s test if the outcome was life satisfaction? (J = 3: ‘low’, ‘moderate’, ‘high’)?

A

BLR for “Y > 0” (lifesat > ‘low’)
BLR for “Y > 1” (lifesat > ‘moderate’)
BLR coefficients for each dichotomisation
This is the test for each individual IV

46
Q

What is the omnibus test in a Brant’s test?

A

In the first row of the output, Stata displays an omnibus test. This tests the H0 that the odds are proportional for all IVs.
A good strategy is to first look at the omnibus test (“All”). If the result is not significant, we may assume proportional odds. If the result is statistically significant, we look at the individual tests to find out which variable may be problematic

47
Q

What would lead us to conclude no strong evidence against proportional odds assumption in the Brant’s test?

A
  • The coefficients are reasonably similar in the BLRs (indicating that the population ORs may be equal)
  • The Brant test statistics all have large p-values
48
Q

What are some things to be aware of in the Brant’s test?

A
  • With very small samples, the test may lack power and fail to detect important departures from the proportional odds assumption
  • With very large samples, the test may be overly sensitive and detect unimportant departures from the proportional odds assumption
    Therefore, you should always inspect the estimated coefficients in the top part of the output as well as looking at the Brant test p-values themselves. Use your judgement in deciding whether the assumption is reasonable
49
Q

What is a way to account for non-linearity in the model?

A

Adding a quadratic term. This would have its own coefficient in the model (must be included with the centred variable)

50
Q

Why is it important to centre continuous variables before beginning analysis?

A

To make interpretation and statistical analysis easier by reducing the chance of multicollinearity
E.g., if age runs from 20-70 and is then squared, the age and age-squared variables may be highly correlated. Shifting age down to the mean (50) produces negative and positive values, so the centred variable is much less correlated with its squared term
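The effect of centring on the correlation with the squared term can be demonstrated directly. A sketch with hypothetical ages (the data and Pearson helper are illustrative, not from the notes):

```python
import math

ages = list(range(20, 71))            # hypothetical ages 20-70
mean_age = sum(ages) / len(ages)
centred = [a - mean_age for a in ages]

def pearson(x, y):
    """Pearson correlation computed from scratch."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r_raw = pearson(ages, [a ** 2 for a in ages])            # close to 1
r_centred = pearson(centred, [a ** 2 for a in centred])  # near 0 for a symmetric range
```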

51
Q

How can we decide whether or not a quadratic term improves the model?

A

We can use the LRT - one model with the quadratic term and one without
H0: Model 2 does not add predictive power compared to Model 1
H1: Model 2 predicts the DV better than Model 1 (including a squared term for the IV, alongside the linear effect, improves the prediction)

52
Q

What may happen to the predicted probabilities if we add a squared term?

A

The relationships are allowed to be curved

52
Q

Why does the non-proportional odds model equation have the additional subscript j?
As in: logit[P(Lifesat > j)] = cj + β1jx1 + β2jx2 + … + βkjxk

A

To indicate that the coefficients are free to differ between equations

53
Q

What are some disadvantages for the non-proportional odds model?

A
  • Inefficient if proportional odds could safely be assumed for some variables (extra parameters are estimated unnecessarily)
  • More complicated to interpret than a proportional odds model, since there is a larger number of parameters (coefficients, ORs)
54
Q

What’s the equation for a partial proportional odds model predicting life satisfaction (predictors: sex (female); age centred; age centred squared; and number of friends)

A

logit[P(Lifesat > j)] = cj + β1 x Female + β2 x Agecentred + β3 x Agecentred² + β4j x Friends
- The coefficients β1, β2, and β3 are the same across equations (proportional odds assumed for female, age, and age²)
- The subscript j in β4j indicates that the slope coefficient of friends is free to vary across equations (proportional odds is not assumed for friends)

55
Q

What is the code for storing model results for an LRT to compare all three types of models?

A

gologit2 <outcome> <predictor(s)>, or pl
est store propodds

gologit2 <outcome> <predictor(s)>, or pl(<predictors>)
est store partial

gologit2 <outcome> <predictor(s)>, or
est store noprop

lrtest partial propodds
lrtest noprop partial

56
Q

When comparing all three model types using an LRT, which tests are nested in each other?

A

The proportional odds model is nested within the partial proportional odds model, so we can test partial vs. proportional.
The partial proportional odds model is in turn nested within the non-proportional odds model, so we can also test non-proportional vs. partial.

57
Q

What would be the output of comparing all three model types? E.g., if the p-values were p = 0.038 for the partial proportional odds model and p = 0.503 for the non-proportional odds model

A

Stata will give a table with p-values for the proportional odds, partial proportional odds, and non-proportional odds models, comparing the latter two to the first.
- There is evidence that the partial proportional odds model fits the data better than the full proportional odds model (p = 0.038). The partial proportional odds model is best supported by the data.
- There is no strong evidence that the non-proportional odds model improves the fit compared to the partial proportional odds model (p = 0.503).

58
Q

What needs to be balanced when choosing between models?

A
  • Model fit: the model should predict the outcome reasonably well; measured by the log likelihood.
  • Parsimony: a smaller, simpler model is preferred to a larger, more complicated model; measured by the number of parameters (fewer parameters = simpler model)
    We can use LRTs to compare models. In general, it’s advised to choose the simplest plausible model that fits the data reasonably well
59
Q

What factors determine simplicity in OLR?

A
  • Assuming proportional odds leads to a simpler model than non-proportional odds
  • Assuming linearity (in the log odds) is simpler than using non-linear terms (e.g., age2)
    But we should allow for non-proportional odds (e.g., via a partial proportional odds model) and/or non-linearity if we think that this improves the model fit