Logistic Regression Flashcards

1
Q

Why can’t we use linear regression techniques to analyse a dichotomous outcome (ie Y = 0 or 1) as a function of a set of linear predictors X = (Xj)? (3 reasons)

A
  1. The error terms (ε) are not normally (Gaussian) distributed. They can only take on two values:
  • if Y = 1 then ε = 1 − (β0 + ΣβjXj)
  • if Y = 0 then ε = −(β0 + ΣβjXj)
  2. The probability of the outcome occurring (p(Y = 1)) depends on the values of the predictor variables (X). Since the variance of a binomial distribution is a function of the probability (p), the error variance will also vary with the level of X and, consequently, the assumption of homoscedasticity will be violated.
  3. The mean responses should be constrained as: 0 ≤ E(Y) = p ≤ 1, but predictions from a linear model are not constrained to this interval (see the sketch below).
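A minimal numpy sketch of the third problem on simulated data (everything here is illustrative): an ordinary least-squares line fitted to a 0/1 outcome predicts ‘probabilities’ outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a binary outcome whose probability rises with x
x = np.linspace(-4, 4, 200)
p = 1 / (1 + np.exp(-1.5 * x))    # true logistic probabilities
y = rng.binomial(1, p)            # observed 0/1 outcomes

# Ordinary least-squares fit of y on x (polyfit returns slope, then intercept)
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

# The linear fit is not constrained to 0 <= E(Y) <= 1
print(fitted.min(), fitted.max())  # typically below 0 and above 1
```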
2
Q

What is the logit transform?

A

ln[p/(1−p)] is the logit transform.

This value is the log of the odds of the outcome (because odds=p/(1-p)), so a logistic regression model is sometimes referred to as a log odds model.
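A quick worked example: for p = 0.75,

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln\frac{0.75}{0.25} = \ln 3 \approx 1.099
```

so the odds are 3 to 1 and the log odds are about 1.1.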

3
Q

How does the logit transformation lead to the logistic model? (math equation)

A
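Start from the logit transform, set it equal to the linear predictor, and solve for p:

```latex
\ln\frac{p}{1-p} = \beta_0 + \sum_j \beta_j X_j
\quad\Longrightarrow\quad
p = \frac{e^{\beta_0 + \sum_j \beta_j X_j}}{1 + e^{\beta_0 + \sum_j \beta_j X_j}}
= \frac{1}{1 + e^{-(\beta_0 + \sum_j \beta_j X_j)}}
```

The right-hand side is the logistic function, which maps the unbounded linear predictor back to a probability between 0 and 1.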
4
Q

FITTING A LOGISTIC REGRESSION MODEL: Maximum likelihood estimation or how to estimate the regression coefficients.

A

The key feature of maximum likelihood estimation is that it estimates values for the parameters (the βs) which are most likely to have produced the data that have been observed. Rather than starting with the observed data and computing parameter estimates directly (as is done with least squares), ML estimation determines the likelihood (probability) of the observed data for various combinations of parameter values. The set of parameter values that is most likely to have produced the observed data is the maximum likelihood (ML) estimate.
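A minimal Python sketch of this idea on simulated data (all names and values are illustrative): the ML estimate is the coefficient pair that maximises the log-likelihood, found here by minimising its negative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit   # inverse logit: 1 / (1 + exp(-z))

rng = np.random.default_rng(0)

# Simulated data with true coefficients beta0 = -1, beta1 = 2
X = rng.normal(size=500)
y = rng.binomial(1, expit(-1 + 2 * X))

def neg_log_likelihood(beta):
    """Negative Bernoulli log-likelihood of the observed data given beta."""
    p = expit(beta[0] + beta[1] * X)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)  # close to (-1, 2)
```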

5
Q

ASSUMPTIONS IN LOGISTIC REGRESSION (2)

A

Independence: It is assumed that the observations are independent of each other (as in linear regression). If animals are maintained in groups or if multiple measurements are being made on the same individual, this assumption has probably been violated.

For example, if animals are kept in herds, variation between animals in the study population results from the usual variation between animals plus the variation that is due to differences between herds. This often results in ‘over-dispersion’ or ‘extra-binomial variation’ in the data.
Linearity: As with linear regression, any predictor that is measured on a continuous scale is assumed to have a linear (straight-line) relationship with the logit of the outcome.

6
Q

What is the assumption about the distribution of errors in the logistic model? Why?

A

Because the logistic model models the expected probability of disease on the logit scale, while the original data are binary (0/1, no/yes), the logistic model does not have an error term.

There is no assumption about the distribution of errors in the logistic model.

It also means that coefficients in a logistic model represent the effect of a predictor on the logit of the outcome.

7
Q

What is the test used to determine the overall significance of a logistic model?

And formula?

A

Likelihood ratio test (LRT).

Compares the likelihood of the ‘full’ model (ie. with all the predictors included) with the likelihood of the ‘null’ model (ie. a model which contains only the intercept). Consequently, it is analogous to the overall F-test of the model in linear regression.

G² = −2 ln(L0/L) = −2(ln L0 − ln L)

where:

L is the likelihood of the full model

L0 is the likelihood of the null model

G² has an approximate chi-squared (χ²) distribution with k degrees of freedom

k = number of predictors in the full model
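A sketch of the LRT with statsmodels (the DataFrame `df` and the column names `y`, `x1`, `x2` are assumptions):

```python
import statsmodels.api as sm
from scipy.stats import chi2

X = sm.add_constant(df[["x1", "x2"]])   # full model: intercept + k = 2 predictors
full = sm.Logit(df["y"], X).fit(disp=0)

# llf = log-likelihood of the full model; llnull = log-likelihood of the
# intercept-only (null) model, which statsmodels computes automatically
G2 = -2 * (full.llnull - full.llf)
p_value = chi2.sf(G2, df=2)             # k = 2 degrees of freedom

print(G2, p_value)  # matches full.llr and full.llr_pvalue
```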

8
Q

Conditions when computing an LRT statistic (in logistic regression models) (2)

A
  1. _Both models must be fit using exactly the same observations._ If a dataset contains missing values for some predictors in the full model, the affected observations would be omitted from the full model but included when the null model is computed. This must be avoided (see the sketch below).
  2. The models must be nested. The predictors in the simpler model must be a subset of those in the full model. This will not be a problem when the smaller model is the null model, but might be a problem in other situations.
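A simple way to guarantee condition 1 in pandas (a sketch; `df` and the column names are assumptions): restrict the data to rows that are complete for every variable in the full model before fitting either model.

```python
# Keep only rows with no missing values in the outcome or any full-model
# predictor, then fit BOTH the full and the null model on this same subset
cols = ["y", "x1", "x2"]
df_complete = df.dropna(subset=cols)
```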
9
Q

A special case of the likelihood ratio test: Comparing full and saturated models (deviance) in logistic regression

A

A special case of the likelihood ratio test is the comparison of the likelihood of the model under investigation to the likelihood of a fully saturated model (one in which there would be one parameter fit for each data point).

Since a fully saturated model should perfectly predict the data, the likelihood of the observed data should be 1 (ie ln Lsat = 0). This comparison yields a statistic called the deviance, which is analogous to the error sum of squares (SSE) in linear regression. The deviance is a measure of the unexplained variation in the data.

Note: The deviance computed in this manner does not have a χ² distribution.
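In equation form (using ln Lsat = 0 for binary data):

```latex
D = -2\,\ln\frac{L_{\text{model}}}{L_{\text{sat}}}
  = -2\left(\ln L_{\text{model}} - \ln L_{\text{sat}}\right)
  = -2\,\ln L_{\text{model}}
```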

10
Q

INTERPRETATION OF COEFFICIENTS (β1) in logistic regression models (3 + intercept)

A

The coefficients in a logistic regression model represent the amount the logit of the probability of the outcome changes with a unit increase in the predictor. This is hard to interpret so the coefficients are converted into odds ratios.

Dichotomous predictor: change in the log odds of disease when the factor is present. Converted into an OR by exponentiating the coefficient. If the outcome is relatively rare, the OR is a good approximation of the risk ratio (RR).

OR = e^β1

Continuous predictor: change in the log odds of disease for a one-unit change in the predictor. The OR represents the factor by which the odds of disease are increased (or decreased) for each one-unit change in the predictor. For a change from x1 to x2:

OR(x1, x2) = e^(β1(x2 − x1)) = OR^(x2 − x1)

Categorical predictor: converted to indicator (dummy) variables. The coefficient for each indicator variable represents the effect of that level compared to the category (ie the ‘baseline’) not included in the model. The coefficients are interpreted in the same manner as for any other dichotomous predictor.

Interpretation of the intercept: depends on how the data were collected. The intercept represents the logit of the probability of disease if all of the ‘risk factors’ are absent (ie equal to zero). This can be expressed as:

ln(p0/(1−p0)) = β0 [p0 equals the probability of disease in this ‘non-exposed’ group]
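A short statsmodels sketch of the coefficient-to-OR conversion (the DataFrame `df` and the column names are assumptions):

```python
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(df[["x1", "x2"]])
res = sm.Logit(df["y"], X).fit(disp=0)

odds_ratios = np.exp(res.params)   # OR = e^beta for each coefficient
or_ci = np.exp(res.conf_int())     # 95% confidence intervals on the OR scale
print(odds_ratios, or_ci, sep="\n")
```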

11
Q

ASSESSING INTERACTION AND CONFOUNDING

A

Confounding is assessed by adding the potential confounding variable to the model and making a subjective decision as to whether or not the coefficient of the variable of interest has changed ‘substantially’.

Interaction is assessed by _adding the cross-product term (X1 * X2) and determining if the coefficient for the term is statistically significant_. Estimation of ORs in the presence of interaction deserves some attention though. If interaction is present, the OR for the variable of interest has to be determined at a predefined level of the interacting variable because it will vary with the level of the interacting variable.

If the interaction is between two dichotomous predictors, the coefficient for each main effect represents the effect of that variable in observations in which the other variable is absent. The interaction term represents the additional effect of having both factors present, over the sum of the two individual effects.
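A sketch using the statsmodels formula interface (`df` and the column names are assumptions); the term `x1 * x2` expands to both main effects plus the cross-product:

```python
import statsmodels.formula.api as smf

m = smf.logit("y ~ x1 * x2", data=df).fit(disp=0)
print(m.summary())  # inspect the significance of the x1:x2 coefficient
```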

12
Q

Process of building a logistic model (5)

A
  • laying out a tentative causal diagram to guide your thinking
  • unconditional analyses of relationships between predictors and the outcome of interest using a ‘liberal’ P-value
  • evaluation of relationships (correlations) among predictor variables
  • automated model-building processes (used with caution)
    • forward selection
    • backward elimination
    • stepwise selection
    • best subset regression
  • manual model-building guided by a causal diagram (preferred method) including:
    • evaluation of confounding
    • evaluation of interaction.
13
Q

MODEL-BUILDING: options for evaluating the shape of the relationship between the outcome and a continuous predictor. (5)

A
  1. Plotting the residuals from the model, with the predictor included, against the values of the predictor.
  2. Categorising the continuous predictor and:
  • a. inserting the indicator variables into the model, or
  • b. computing and plotting the log odds of the outcome against the category means (see the sketch below).
  3. Adding higher-order terms to the model:
  • a. quadratic and possibly cubic terms, or
  • b. orthogonal polynomials, or
  • c. fractional polynomials.
  4. Generating a smoothed scatterplot of the log odds of the outcome against the predictor.
  5. Creating several linear splines to use instead of the original variable.
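A pandas sketch of option 2b (`df` and the column names are assumptions): bin the continuous predictor, then plot the log odds of the outcome against the category means.

```python
import numpy as np
import pandas as pd

df["x_bin"] = pd.qcut(df["x"], q=5)       # five quantile-based categories

grouped = df.groupby("x_bin", observed=True).agg(
    mean_x=("x", "mean"),                 # category mean of the predictor
    p=("y", "mean"),                      # observed proportion positive per bin
)
# Log odds per category (assumes no bin has p of exactly 0 or 1)
grouped["log_odds"] = np.log(grouped["p"] / (1 - grouped["p"]))

# An approximately straight-line pattern of log_odds vs mean_x supports linearity
print(grouped[["mean_x", "log_odds"]])
```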
14
Q

Particular features for fitting a logistic model

A
  1. Data may be binary (0/1) -also called Bernoulli data-, with one observation per study unit, or binomial -also called grouped data-, with each observation containing the number of positive responses and the number of trials for study units with a certain set of characteristics. A covariate pattern is a unique combination of values of predictor variables.

If the data contain continuous variables, there might be as many covariate patterns as there are data points (ie. each covariate pattern will have only one observation in it), and these data are referred to as binary data. This distinction becomes crucial when computing residuals and evaluating the fit of logistic regression models (see the sketch below).

  2. Process of evaluating the shape of the relationship between a continuous predictor variable and the outcome of interest. The assumption is that the relationship between the continuous predictor and the log odds of the outcome (not the outcome itself) is linear.
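A pandas sketch (`df` and the column names are assumptions) collapsing binary (Bernoulli) data into grouped/binomial form, one row per covariate pattern:

```python
grouped = (
    df.groupby(["x1", "x2"])          # one group per covariate pattern
      .agg(events=("y", "sum"),       # number of positive responses
           trials=("y", "size"))      # number of trials (observations)
      .reset_index()
)
```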
15
Q

Components of a generalized linear model (GLM) (ALL = 4)

A

Link function (cornerstone of GLMs): the idea that linear modelling of predictors should be allowed to take place on a different scale from the scale of the observations. The link function makes that transition between the observation’s mean and the linear modelling.

Distribution of the outcome Y: binomial (including binary), Poisson, negative binomial, Gaussian (normal), inverse Gaussian, and gamma.

N.B. each distribution has a ‘natural’ link function associated with it: the canonical link.

Set of explanatory variables (in a design matrix X), linked to the mean of the ith observation, µi = E(Yi), by the equation:

link(µi) = β0 + β1X1i + … + βkXki

Assumption of independence between the outcomes.
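These components map directly onto a GLM fit in statsmodels (a sketch; the DataFrame `df` and the column names are assumptions):

```python
import statsmodels.api as sm

# df with a binary outcome "y" and predictors "x1", "x2" is assumed
X = sm.add_constant(df[["x1", "x2"]])   # design matrix with intercept

# Outcome distribution: binomial; link: logit (the canonical link, used by default)
model = sm.GLM(df["y"], X, family=sm.families.Binomial())
res = model.fit()
print(res.summary())
```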

16
Q

Selected distributions of outcomes and links used in fitting models in a GLM framework (canonical links for normal, binomial, Poisson, negative binomial)

A

| Distribution of Y | Canonical link | Selected non-canonical links |
| --- | --- | --- |
| Normal | Identity | Log |
| Binary/binomial | Logit | Probit, complementary log-log |
| Poisson | Log | Identity |
| Negative binomial | Negative binomial | Log, identity |

17
Q

Major differences between linear models and a GLM with a non-identity link (2)

A

The most obvious difference is that in a GLM with a non-identity link, all parameters are obtained on a transformed scale. In order to give meaningful interpretations:

  1. Predicted values need to be back-transformed to the original scale, using the inverse link function.
  2. Coefficients need to be converted to a more meaningful quantity. This is model-specific: eg for the logistic model, exponentiating the coefficients produces odds ratios.
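A small sketch of both steps for the logit link (the coefficient values are illustrative):

```python
import numpy as np
from scipy.special import expit   # inverse logit: expit(z) = 1 / (1 + exp(-z))

beta0, beta1 = -2.0, 0.8          # illustrative estimates on the logit scale
x = 1.5

p_hat = expit(beta0 + beta1 * x)  # 1. back-transform a prediction to a probability
odds_ratio = np.exp(beta1)        # 2. convert the coefficient to an odds ratio
print(p_hat, odds_ratio)
```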
18
Q

Estimation methods for GLMs

A

Maximum Likelihood (ML)

If no true likelihood function exists (eg when modelling over-dispersed data), estimation is based on a so-called quasi-likelihood function.