Week 7-Logistic Regression Flashcards
What are 2 methods used to measure the same thing?
- Two categories (either this or that)
- Continuous scores (looking to see if predictors can predict variance in the scores)
What are linear models?
Models that test whether or not predictors can predict variance in a continuous outcome (e.g., ANOVA, t-tests etc.). This lecture is NOT about linear models: it covers outcomes that are either this or that with no grey area, like dead or alive, depressed or not etc.
What are continuous outcomes?
-Analysed with linear regression-based methods
-(can also be used in ANOVA, correlation etc.)
What are categorical outcomes?
-Analysed with logistic regression-based methods
-(can also be analysed with Chi-squared, sign test)
What are the 6 advantages of continuous outcomes?
- Inferences can be made with fewer data points (increased statistical power: a 30% discrepancy in power compared to categorical outcomes)
- Higher sensitivity (you get a range of scores, showing who scores low, high and in between, NOT a binary outcome of either this or that)
- More variety in analysis options
- Information on the variability of a construct within a population
- Give a better understanding of the variable in question.
- Nonsensical distinctions avoided (e.g., from an analysis perspective, with a cut-off of 5 symptoms, people with 4 depressive symptoms (no depression) and people with 5 (bare level of depression) are treated as completely different, BUT people with 5 symptoms and people with 9 are treated as the same despite major differences in severity)
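The last advantage above can be illustrated with a minimal sketch. The cut-off of 5 symptoms and the function name `categorise` are assumptions made for illustration only, not part of any real diagnostic tool.

```python
# Hypothetical cut-off: 5 or more symptoms counts as "depressed".
CUTOFF = 5

def categorise(symptoms: int) -> str:
    """Collapse a symptom count into a binary label."""
    return "depressed" if symptoms >= CUTOFF else "not depressed"

# 4 vs 5 symptoms land in different groups; 5 vs 9 land in the same group.
for symptoms in (4, 5, 9):
    print(symptoms, "symptoms ->", categorise(symptoms))
```

Keeping the raw symptom count as a continuous score avoids this: the gap between 5 and 9 is preserved instead of being thrown away.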
What is the disadvantage of continuous outcomes?
Associations between things such as alcohol use and depression scores can be read as clinically significant/relevant when they are in fact not!
This is because there is no proof that the association relates to clinically relevant depression symptoms. What should be said is that alcohol consumption is associated with increased scores on the BDI
What is an advantage of categorical outcomes?
When we use diagnostic criteria to give formal diagnoses we can talk about interventions and/or predictors as having a clinically relevant impact
Categorical outcomes: What are criterion references?
■ Some questionnaires have cut-offs to group people:
– Hazardous drinking scores on the questionnaire (AUDIT). Use the cut-off designated by the AUDIT, scores above 8 = hazardous drinking.
– Beck Depression Inventory (BDI): scores of 9+ = depressed.
Problems:
■ These questionnaires have not been validated in every single sample of people, so ecological validity and reliability can be low in some groups (e.g., students will typically score high on the AUDIT, and the cut-off has not been validated in that sample)
■ Useless in certain groups (non-clinical samples rarely score above the cut-off on the BDI as it tends to be clinically depressed people who score high).
■ Is a cut-off score as effective as a true diagnosis?
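A rough sketch of the "useless in certain groups" problem, assuming the BDI cut-off of 9+ from the notes above. The scores are made-up numbers standing in for a non-clinical sample.

```python
BDI_CUTOFF = 9  # from the notes: 9+ = depressed

# Hypothetical BDI scores from a non-clinical sample (made-up numbers).
scores = [2, 5, 1, 7, 3, 8, 4, 6]

labels = ["depressed" if s >= BDI_CUTOFF else "not depressed" for s in scores]
n_depressed = labels.count("depressed")

# If nobody reaches the cut-off, there is no second group to predict.
print(n_depressed, "of", len(scores), "participants classed as depressed")
```

With one group empty (or nearly empty), there is nothing for a categorical analysis to compare.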
Categorical outcomes: What are normative references?
■ Compare to the norm of your sample.
■ Done using Median splits.
■ Number of units of alcohol drunk per week. Participants above the median = heavy drinkers, below the median = light drinkers.
Problems:
■ Easy to do but arbitrary (random)
■ Totally sample dependent (take a new sample and the median may well be very different)
■ Can also do tertile splits (top third vs bottom third, getting rid of the middle third, which reduces the power even more), quartile splits and so on
■ Lacks sensitivity
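The sample dependence of a median split can be sketched as follows. Both samples are hypothetical, the helper `median_split` is an illustrative name, and placing people exactly at the median in the "light" group is an assumption.

```python
from statistics import median

def median_split(units_per_week):
    """Return the sample median and a heavy/light label for each participant."""
    cut = median(units_per_week)
    # Assumption: anyone exactly at the median is labelled "light".
    labels = ["heavy" if u > cut else "light" for u in units_per_week]
    return cut, labels

# Two hypothetical samples (made-up units of alcohol per week).
sample_a = [2, 4, 6, 10, 14, 20, 30]
sample_b = [10, 14, 20, 30, 35, 40, 50]

cut_a, labels_a = median_split(sample_a)
cut_b, labels_b = median_split(sample_b)

# The same person (20 units per week) appears in both samples.
label_in_a = labels_a[sample_a.index(20)]
label_in_b = labels_b[sample_b.index(20)]
print("median A:", cut_a, "-> 20 units is", label_in_a)
print("median B:", cut_b, "-> 20 units is", label_in_b)
```

The same drinker is "heavy" in one sample and "light" in the other, purely because the median moved.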
What does categorical data allow us to have?
Allows us to make decisions about clinical outcomes, or about whatever we decide is a relevant effect (it doesn’t have to be a clinical diagnosis)
For example, we may decide that losing 5kg is a clinically significant outcome in a weight loss trial, giving us two groups, successful weight loss (5kg or more) vs. unsuccessful (<5kg)
What do we use logistic regression for?
■ We use logistic regression to explore what variables are associated with an outcome
■ This gives us model fit statistics (similar to a linear regression i.e., how well our model fits our data)
■ Regression coefficients for each predictor (similar to a linear regression i.e., to see if there is an association with the DV)
■ Odds ratios: these describe how much the odds of being in one group rather than the other change for a one-unit change in an IV (i.e., how likely someone is to be in one group compared to the other group and, if we increase the IV, how much this changes)
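A minimal sketch of how an odds ratio relates to a regression coefficient: in logistic regression the odds ratio is the exponential of the coefficient. The coefficient value and the baseline odds below are hypothetical numbers chosen for illustration.

```python
import math

b = 0.25  # hypothetical regression coefficient for an IV (e.g., years addicted)

odds_ratio = math.exp(b)                 # odds multiply by this per one-unit increase in the IV
percent_change = (odds_ratio - 1) * 100  # the same thing expressed as a % change in the odds

def odds_to_probability(odds):
    """Convert odds (e.g., 1.0 means 1:1) into a probability between 0 and 1."""
    return odds / (1 + odds)

baseline_odds = 1.0                      # assumed starting odds of 1 (a 50% chance)
new_odds = baseline_odds * odds_ratio    # odds after a one-unit increase in the IV

print(f"odds ratio = {odds_ratio:.2f}")
print(f"% change in odds = {percent_change:.0f}%")
print(f"probability: {odds_to_probability(baseline_odds):.2f} -> {odds_to_probability(new_odds):.2f}")
```

An odds ratio above 1 means the odds of being in the group coded 1 go up as the IV increases, below 1 means they go down, and exactly 1 means no change.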
What does logistic regression do?
It predicts membership of a group (i.e., what group does an individual belong to?)
■ It is called “binary” logistic regression as that refers to a dichotomous outcome e.g. Relapse = 1, non-relapse = 0
Why can’t we just fit a straight line in a regression?
-A straight line gives predictions that make no sense for a binary outcome: people addicted for 12 years, for example, might be predicted as 0.5 of a relapse (when people can only either relapse or not relapse!)
-Another example: the line could put someone addicted for 20 years above 100% relapse (which again makes no sense!)
-Values cannot exist in the box (the space between 0 and 1)! You can’t be half a relapse
-A regression line, however, would give predicted values within the box
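The problem with a straight line can be sketched with a small least-squares fit. The data below are made up for illustration, so the exact predicted values will differ from the lecture's example.

```python
# Hypothetical data: years addicted and whether the person relapsed (1) or not (0).
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
relapse = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Ordinary least-squares straight line: relapse = intercept + slope * years.
mean_x = sum(years) / len(years)
mean_y = sum(relapse) / len(relapse)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, relapse)) / sum(
    (x - mean_x) ** 2 for x in years
)
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted 'relapse' value from the straight line."""
    return intercept + slope * x

for x in (1, 5.5, 20):
    print(f"{x} years -> predicted relapse = {predict(x):.2f}")
```

With these numbers the line predicts a value below 0 at 1 year, a value between 0 and 1 at 5.5 years, and a value above 1 at 20 years, none of which is a possible outcome.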
What does logistic regression do if it is unwise to plot a straight line?
Rather than fitting a line of best fit, it fits an S-shaped curve
-It looks to see how well the curve correctly categorises/predicts people as having relapsed or not relapsed. It then tries another curve, and so on, each time checking how well it predicts, and keeps the one that does best (i.e., maximum likelihood: the curve under which the observed pattern of relapsed vs not relapsed is most likely)
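The "try a curve, check it, try a better one" idea can be sketched with a hand-written gradient ascent on the log-likelihood. This is an illustrative assumption about the fitting routine: statistical packages typically use a faster iterative algorithm, and the data, step size and number of steps here are all made up.

```python
import math

# Hypothetical data: years addicted and relapse (1) vs non-relapse (0).
years = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
relapse = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

mean_years = sum(years) / len(years)
centred = [x - mean_years for x in years]  # centring keeps the updates stable

def sigmoid(z):
    """The S-shaped curve: always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1):
    """How likely the observed 0/1 outcomes are under the curve (higher = better)."""
    total = 0.0
    for x, y in zip(centred, relapse):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

b0, b1 = 0.0, 0.0        # start from a flat curve (everyone at 0.5)
start_ll = log_likelihood(b0, b1)
learning_rate = 0.01     # arbitrary small step size

for _ in range(5000):
    # Gradient of the log-likelihood: (observed - predicted), summed over people.
    errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(centred, relapse)]
    b0 += learning_rate * sum(errors)
    b1 += learning_rate * sum(e * x for e, x in zip(errors, centred))

def predict_prob(years_addicted):
    """Predicted probability of relapse for a given number of years."""
    return sigmoid(b0 + b1 * (years_addicted - mean_years))

print("log-likelihood: start", round(start_ll, 2), "fitted", round(log_likelihood(b0, b1), 2))
for x in (1, 5.5, 20):
    print(x, "years -> P(relapse) =", round(predict_prob(x), 3))
```

Unlike a straight line, the predicted values stay between 0 and 1 however many years are entered.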
What do linear regressions test?
They test how close the fitted line is to the actual data (for each data point).
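As a small sketch of "how close the line is to each data point", the distances (residuals) can be computed and squared. The data and the line `predicted = 2 * x` are hypothetical.

```python
# Hypothetical continuous data and a hypothetical fitted line: predicted = 2 * x.
x_values = [1, 2, 3, 4]
observed = [2.5, 3.5, 6.5, 7.5]

predicted = [2 * x for x in x_values]

# Residual = observed score minus the score the line predicts, for each data point.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

# Squaring stops positive and negative residuals cancelling out; smaller = closer fit.
ss_residual = sum(r ** 2 for r in residuals)

print("residuals:", residuals)
print("sum of squared residuals:", ss_residual)
```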