Module 3 - Logistic Regression Flashcards
What type of IVs and DVs can we have in logistic regression?
IVs can be categorical or continuous
DVs must be categorical
Can logistic regression imply any causation?
No
How do we calculate the odds ratio?
Odds of event occurring in one group divided by the odds of the event occurring in the other group
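A quick worked sketch of that calculation, using made-up counts for two groups (all numbers are hypothetical):

```python
# Hypothetical 2x2 table: event counts in a treatment and a control group.
treated_event, treated_no_event = 30, 70
control_event, control_no_event = 10, 90

odds_treated = treated_event / treated_no_event    # 30/70 ≈ 0.43
odds_control = control_event / control_no_event    # 10/90 ≈ 0.11

odds_ratio = odds_treated / odds_control
print(odds_ratio)   # ≈ 3.86: the odds of the event are about 3.9 times higher in the treated group
```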
What is complete separation, and what example does Field (2013) give?
Complete separation occurs when our predictor variable(s) perfectly predict the outcome, e.g. Field's (2013) cats-and-burglars example. It is problematic because there are no in-between cases from which to estimate probabilities, which is the whole point of logistic regression.
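A minimal sketch of the problem, loosely based on the cats-and-burglars idea with invented weights: once a predictor separates the outcome perfectly, there is no in-between region for the model to turn into probabilities, and the coefficient estimates try to run off to infinity.

```python
import numpy as np

# Invented data: light intruders are all cats (0), heavy intruders are all burglars (1).
weight = np.array([1, 2, 3, 40, 50, 60])
burglar = np.array([0, 0, 0, 1, 1, 1])

# Any cut-off between 3 and 40 classifies every case perfectly,
# so there are no intermediate cases from which to estimate probabilities.
print(all((weight > 20).astype(int) == burglar))   # True -> complete separation
```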
What is overdispersion?
Overdispersion is when observed variance is higher than expected. It is caused by a violation of the assumption of independence and causes standard errors to become too small
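One rough way to screen for it is the dispersion parameter: the chi-square (or deviance) statistic divided by its degrees of freedom, where values much greater than 1 point to overdispersion. A sketch with hypothetical numbers:

```python
# Hypothetical model output: deviance and residual degrees of freedom.
deviance = 145.2
residual_df = 96          # n minus the number of estimated parameters

dispersion = deviance / residual_df
print(dispersion)         # ≈ 1.51; values approaching 2 or more suggest overdispersion
```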
What do we use the log-likelihood statistic for?
To determine how well our model fits the data
How do we calculate the deviance statistic?
Deviance = −2 × log-likelihood
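A minimal sketch of that calculation, with hypothetical outcomes and predicted probabilities:

```python
import numpy as np

# Hypothetical observed outcomes and model-predicted probabilities.
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.8, 0.3, 0.6, 0.9, 0.2])

# Log-likelihood of the model, then the deviance as -2 times it.
log_likelihood = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
deviance = -2 * log_likelihood
print(deviance)   # ≈ 2.84; smaller deviance = better fit
```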
What is parsimony? How do we achieve it in logistic regression?
Parsimony is the preference for simpler explanations of phenomena over more complicated ones. We achieve parsimony by including all variables in the model and then progressively removing those that aren't relevant or contributing a meaningful effect.
Which assumption is broken by logistic regression? How do we overcome it?
The assumption of linearity is broken by logistic regression. We overcome it by using the log (or ‘logit’) of the data. Thus, the assumption of linearity in logistic regression assumes that there is a linear relationship between any continuous predictors and the logit of the outcome variable.
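A short sketch of what "linear in the logit" means, using hypothetical coefficients for one continuous predictor:

```python
import numpy as np

# Hypothetical intercept and slope for a single continuous predictor.
b0, b1 = -2.0, 0.5
x = np.array([0.0, 2.0, 4.0, 6.0])

logit = b0 + b1 * x              # linear on the logit (log-odds) scale
p = 1 / (1 + np.exp(-logit))     # probabilities follow an S-shaped curve

print(np.round(logit, 2))        # [-2. -1.  0.  1.]   evenly spaced: linear in the logit
print(np.round(p, 3))            # [0.119 0.269 0.5 0.731]   not evenly spaced: nonlinear in p
```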
The process of cleaning data and checking for assumptions is the same in logistic regression, except for five additional steps. What are they?
- Evaluate model parsimony
- Check for linearity between continuous predictors and the logit of the outcome variable
- Assess the data using deviance statistics
- Check for complete separation
- Check for overdispersion
What type of regression do we do if we want to check the model for parsimony? Do we check for interactions?
Hierarchical - put variables 1 and 2 in the first block, and add variable 3 in the second block. Only check for interactions if we have theory to back them up.
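A minimal sketch of that blockwise approach using simulated data and statsmodels (the variables and coefficients are invented): block 1 enters the first two predictors, block 2 adds the third, and the change in deviance is tested with a chi-square.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data: x1 and x2 affect the outcome, x3 does not.
rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
p = 1 / (1 + np.exp(-(0.5 + 0.8 * x1 - 0.6 * x2)))
y = rng.binomial(1, p)

# Block 1: x1 and x2.  Block 2: add x3.
block1 = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
block2 = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit(disp=0)

# Change in deviance = 2 * (difference in log-likelihoods), tested against chi-square.
chi_sq = 2 * (block2.llf - block1.llf)
p_value = stats.chi2.sf(chi_sq, df=1)
print(round(chi_sq, 2), round(p_value, 3))
# A non-significant change suggests keeping the simpler (more parsimonious) model.
```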
What does the Wald statistic test? How do we calculate it?
The Wald statistic is similar to the t-test in linear regression. It tests the null hypothesis that b = 0. It is biased when b is large, because a large b inflates the standard error, which makes the Wald statistic underestimated.
Wald = b / SE(b)
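A worked sketch with hypothetical values of b and its standard error (some software reports the squared, chi-square form of the same statistic):

```python
# Hypothetical coefficient and standard error.
b = 1.2
se_b = 0.4

wald_z = b / se_b          # z form of the Wald statistic
wald_chi2 = wald_z ** 2    # chi-square form reported by some packages
print(wald_z, wald_chi2)   # 3.0, 9.0
```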
In terms of the relationship between a predictor and outcome variable, what happens when the odds ratio is less than 1 vs. when it is greater than 1?
When the odds ratio is > 1: as the predictor increases, the odds of the outcome occurring increase
When the odds ratio is < 1: as the predictor increases, the odds of the outcome occurring decrease
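In a fitted model the odds ratio for a predictor is exp(b), so the sign of b determines which side of 1 it falls on. A sketch with hypothetical coefficients:

```python
import numpy as np

# Hypothetical coefficients for two different predictors.
b_positive = 0.7
b_negative = -0.7

print(np.exp(b_positive))   # ≈ 2.01 > 1: odds of the outcome roughly double per unit increase
print(np.exp(b_negative))   # ≈ 0.50 < 1: odds of the outcome roughly halve per unit increase
```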
Explain graphically why we can’t just use multiple linear regression.
Multiple linear regression fits a straight line of best fit, but a categorical (0/1) outcome doesn't suit a straight line: predicted values can fall outside 0 and 1. Logistic regression instead fits a sigmoidal (S-shaped) curve bounded between 0 and 1, which is much better suited to predicting probabilities.
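A numeric sketch of the same point, using an invented linear equation: the straight line produces "probabilities" below 0 and above 1, while the logistic curve stays between 0 and 1.

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])

linear_pred = 0.5 + 0.1 * x                            # straight line: unbounded
logistic_pred = 1 / (1 + np.exp(-(0.5 + 0.1 * x)))     # sigmoid: bounded in (0, 1)

print(linear_pred)                  # [-0.5  0.5  1.5]  -> impossible "probabilities"
print(np.round(logistic_pred, 2))   # [0.38 0.62 0.82]  -> valid probabilities
```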
What is the basic example of a research question suited to logistic regression?
“Can one level of the outcome variable (coded 0) be discriminated from the other level (coded 1) using a set of predictors?”