Logistic Regression Flashcards

1
Q

Logistic Regression

A
  • Previous lectures examined predicting variance in a continuous, normally distributed dependent variable, i.e. linear regression.
  • This is very common, but it is not the only type of outcome we are interested in; some things cannot be measured that way, particularly in the medical world:
    • Alive vs. dead
    • Addicted vs. non-addicted
    • Relapse vs. non-relapse
      This use of absolute categories is extremely common in clinical psychology
2
Q

DSM-V criteria for depression

A
  • The DSM-V, through its diagnostic criteria, therefore allows the classification of individuals as having or not having a disorder (i.e. a medical model: disease present vs. disease absent)
  • Contrast this with using continuous outcomes for questionnaires
  • Beck Depression Inventory (BDI): 21-item questionnaire scored from 0-63; higher scores are indicative of greater depression symptomology
  • Hospital Anxiety and Depression Scale (HADS): 7 items relating to depression, scored from 0-21
  • The above represent two ways of measuring the same thing:
    • Two categories
    • Continuous scores
3
Q

continuous outcomes advantages

A
  • Inferences can be made with fewer data points
  • Higher sensitivity
  • More variety in analysis options
  • Information on the variability of a construct within a population
  • Gives a better understanding of the variable in question
  • Nonsensical distinctions avoided
4
Q

continuous outcomes- disadvantages

A
  • Imagine we ran a linear regression in which we predicted scores on the Beck Depression Inventory and found that increased alcohol use was associated with increased depression scores
  • It would be tempting to say that alcohol consumption is associated with depression!
  • In fact, the correct wording is alcohol consumption is associated with increased scores on the BDI.
  • We do not have any evidence at all that alcohol use is associated with clinically relevant depression symptoms
  • We could find these results in a sample where not one single person has depression, and everyone is really happy!
    The variance predicted therefore would have no clinical relevance
5
Q

categorical outcomes

A
  • When we use diagnostic criteria to give formal diagnoses we can talk about interventions and/or predictors as having a clinically relevant impact
  • E.G. We want to evaluate the impact of CBT on depression
  • If we measure depression on a continuous scale and find CBT significantly reduces scores on the BDI, we can state that CBT reduces symptoms of depression
  • If we measure depression as a clinical diagnosis and find a significant impact of CBT on diagnoses, we can state that CBT significantly reduces depression
    i.e. CBT has a clinically relevant impact on depression, which would suggest that it is an effective treatment (with some caveats!)
6
Q

categorical outcomes- criterion reference

A
  • Some questionnaires actually have cut offs:
    • Hazardous drinking: scores on the AUDIT questionnaire; using the cut-off designated by the AUDIT, scores above 8 = hazardous drinking.
    • Beck Depression Inventory (BDI): 9+ is depressed
    • Ecological validity of the questionnaire? Reliability?
    • Useless in certain groups (non clinical samples rarely score above the cut-off on the BDI).
      As effective as a true diagnosis?
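A criterion-referenced split like the AUDIT cut-off can be sketched in a few lines of Python (the scores here are invented for illustration):

```python
# Hypothetical AUDIT scores for five participants
audit_scores = [3, 12, 8, 15, 6]

# Criterion-referenced split: scores above the AUDIT cut-off of 8
# are classified as hazardous drinking
hazardous = [score > 8 for score in audit_scores]

print(hazardous)  # [False, True, False, True, False]
```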
7
Q

categorical outcomes- normative reference

A
  • Compare to the norm of your sample.
    • Done using median splits.
    • Number of units of alcohol drunk per week. Participants above the median = heavy drinkers, below the median = light drinkers.
    • Easy to do but arbitrary.
    • Totally sample dependent (take a new sample and the median may well be very different)
    • Can do tertile splits (top third vs bottom third)
      …..quartile splits and so on
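A normative (median) split can be sketched as follows; the drinking data are made up, and note how the cut-off is derived from the sample itself:

```python
import statistics

# Hypothetical weekly alcohol units for eight participants
units = [2, 35, 10, 18, 4, 25, 7, 40]

# Normative split: the cut-off is the sample median, so a new
# sample would likely give a different cut-off
cut_off = statistics.median(units)  # 14.0 for this sample

# Above the median = "heavy" drinkers, otherwise "light"
groups = ["heavy" if u > cut_off else "light" for u in units]
# ['light', 'heavy', 'light', 'heavy', 'light', 'heavy', 'light', 'heavy']
```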
8
Q

logistic regression

A
  • Imagine we’re interested in whether a group of people addicted to heroin relapse or not…
  • We could predict a continuous variable, e.g. frequency of drug use, but to make absolute conclusions about categories such as relapse we need to predict relapse itself.
  • This is what logistic regression does: it predicts membership of a group
    It is called “binary” logistic regression because the outcome is dichotomous, e.g. relapse = 1, non-relapse = 0
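As a sketch of what the model does under the hood: the logistic (sigmoid) function maps a linear predictor onto a probability of group membership. The intercept and slope below are invented for illustration:

```python
import math

# Hypothetical coefficients: intercept b0 and slope b1
b0, b1 = -2.0, 0.5

def predicted_probability(x):
    z = b0 + b1 * x                  # linear predictor, as in linear regression
    return 1 / (1 + math.exp(-z))    # squashed into the range (0, 1)

# Predicted probability of relapse (coded 1) at two predictor values
p_low = predicted_probability(1)     # ≈ 0.18
p_high = predicted_probability(8)    # ≈ 0.88
```

Whatever the value of the linear predictor, the output always stays between 0 and 1, which is why it can be read as a probability of being in the group coded 1.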
9
Q

logistic regression- predicting probabilities

A
  • Because we are not predicting an association between IV(s) and a continuous dependent variable, the calculations and statistics differ for logistic regression
  • Linear regression tests how close the predicted line is to the actual data (for each data point)
  • For logistic regression, the calculations produce a log-likelihood, i.e. how likely the model is to place each person in the correct group
  • Log-likelihood (LL): based on each participant's observed value for the outcome (0/1) and their predicted value (ranging from 0 = certainly will not happen to 1 = certainly will happen); the discrepancies between observed and predicted values are summed across all participants. Its counterpart in linear regression is the sum of squared errors (how far each observation is from the prediction).
  • Logistic regression compares results to a baseline model
    • The baseline prediction is done by simply predicting that participants are more likely to fall into the largest of the two groups.
    • E.g. We have two groups; Relapse = 292 participants, abstinent = 123 participants, an educated guess would be that a randomly selected participant will relapse
    • We can then test whether our model with IVs is better than the baseline model at predicting group membership
  • The improvement is tested as: 2(LLnew − LLold)
  • FYI, the difference is multiplied by two so that it follows a chi-square distribution and can be tested for significance.
  • In linear regression we compare observed to predicted outcome using the R2 stat
  • We could report the log likelihood but there are variations of the R2 that have been designed for logistic regression
  • PSEUDO R2 …..WHY?
  • McFadden’s R2: a measure of how well the model fits the data compared to the null model; it compares the likelihood of the full model to that of the null model. A higher McFadden’s R2 = a better fit.
  • Cragg-Uhler R2: similar, but less conservative and less widely used.
  • People report either; McFadden’s may be best and is more common.
  • The model fit indices discussed above can be viewed just like the model fit indices for linear regression (there are some reporting differences though)
  • We also want to know the number of participants your model correctly classifies.
    Expressed as a %.
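The quantities above can be illustrated with a toy example; the outcomes and the fitted model's predicted probabilities below are invented, and natural logs are used throughout:

```python
import math

# Hypothetical outcomes: 1 = relapse, 0 = abstinent
y = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

# Baseline model: predict the overall relapse rate for everyone
p_base = sum(y) / len(y)

def log_likelihood(outcomes, probs):
    # Sum of log-probabilities assigned to each observed outcome
    return sum(math.log(p) if yi == 1 else math.log(1 - p)
               for yi, p in zip(outcomes, probs))

ll_old = log_likelihood(y, [p_base] * len(y))

# Hypothetical per-participant probabilities from a model with IVs
p_model = [0.9, 0.8, 0.7, 0.2, 0.8, 0.6, 0.9, 0.8, 0.1, 0.9]
ll_new = log_likelihood(y, p_model)

# Likelihood-ratio statistic, tested against a chi-square distribution
chi_square = 2 * (ll_new - ll_old)

# McFadden's pseudo R^2: 1 - LL_model / LL_null
mcfadden_r2 = 1 - ll_new / ll_old

# Percentage classified correctly at a 0.5 cut-off
correct = sum((p >= 0.5) == (yi == 1) for yi, p in zip(y, p_model))
pct_correct = 100 * correct / len(y)  # 90.0 for these numbers
```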
10
Q

logistic regression- consider individual predictors

A
  • As with linear regression we need to consider our individual predictors, i.e. the association between each variable and the DV
  • The Logistic regression produces a range of stats for this.
    • The regression coefficient (b) and its SE and p value
    • This gives you the direction of an association and the variability in this association
    • A positive coefficient means high scores are associated with the group labelled 1; a negative coefficient means high scores are associated with the group labelled 0
  • We have some unique stats in logistic regression
    • Wald statistic
    • Exp(B): this is an odds ratio
      It is called Exp(B) because it is the exponentiated regression coefficient
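A sketch of how the Wald statistic and Exp(B) follow from the coefficient; b and its standard error are invented values:

```python
import math

# Hypothetical coefficient and standard error for one predictor
b, se = 0.69, 0.23

# Wald statistic: (b / SE)^2, used to test the predictor's significance
wald = (b / se) ** 2      # ≈ 9.0

# Exp(B): the exponentiated coefficient, interpreted as an odds ratio
exp_b = math.exp(b)       # ≈ 1.99: the odds roughly double per unit of the IV
```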
11
Q

Exp(B)/Odds Ratio

A
  • This is an odds ratio (OR): it indicates the change in odds resulting from a one-unit change in the IV
  • OR = (odds after a unit change in the IV) / (original odds)
  • OR of 1 = no change in likelihood of event
  • OR of .5 = 50% decrease in likelihood of event
  • OR of 1.5 = 50% increase in likelihood of event
  • OR of 4.7 = 370% increase
  • Cannot be negative.
  • This is because 1 = no change; therefore below 1 = a decrease in odds
  • OR range from 0 to infinity
  • OR less than 1 need to be treated with caution as there is less numerical space for them to operate in!
  • If you get an odds ratio that is very small and hard to interpret there is a simple solution……..guesses?
  • Confidence intervals “95% CI” can (and should) be reported after the odds ratio
  • The 95% confidence interval reflects how confident we can be in our odds ratio: it expresses the range in which our estimate will fall, such that 95% of samples from this population would give an estimate within this range
  • Low variation = “tighter” CI = more accurate estimate
    If the CI overlaps 1, the effect will not be significant, as the range of predicted values includes 1 (1 = no change)
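Turning a coefficient and its standard error into an odds ratio with a 95% CI can be sketched as follows (the numbers are invented):

```python
import math

# Hypothetical coefficient and standard error from a logistic regression
b, se = 0.405, 0.150

odds_ratio = math.exp(b)              # ≈ 1.50: roughly a 50% increase in odds

# 95% CI: exponentiate the limits of b ± 1.96 * SE
ci_lower = math.exp(b - 1.96 * se)    # ≈ 1.12
ci_upper = math.exp(b + 1.96 * se)    # ≈ 2.01

# Significant at p < .05 only if the interval excludes 1 (no change)
significant = not (ci_lower <= 1 <= ci_upper)
```

Here the whole interval sits above 1, so the increase in odds would be reported as significant; had the lower limit dipped below 1, it would not.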
12
Q

assumptions

A
  • DV is categorical with two levels only (hence binary 0/1)
  • Neither of the DV “events” should be rare
    • E.g. 2 people getting a first, 548 not getting a first
    • This causes a problem called “separation” where you get “perfect” predictors
  • IVs continuous (ratio/interval) or categorical.
    No multicollinearity (can assess with VIF)
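With just two IVs, the VIF can be computed directly from their correlation as VIF = 1 / (1 − r²). A minimal sketch with made-up predictor scores:

```python
# Hypothetical scores on two predictors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 5, 4, 5, 7]

def pearson_r(a, b):
    # Pearson correlation from deviations about the means
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)  # with more than two IVs, r^2 becomes the R^2
                        # of regressing one IV on all the others

# Rule of thumb: VIF above about 5-10 suggests multicollinearity
```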