week 8 - binary logistic regression models Flashcards
1
Q
what is the binary outcome variable
A
- A binary outcome variable is one that can take only two values (e.g. happy/unhappy)
- Linear regression can work very well when you have a continuous outcome, but not when the outcome is binary
2
Q
what are the two possible outcomes
A
- Participants can have either outcome A or outcome B, e.g. be either happy or unhappy
3
Q
how do we overcome the violation
A
- When the outcome is binary (two possible outcomes), the assumption of linearity is ALWAYS violated
- We can apply a transform to the data to express the non-linear relationship in a linear way
- Binary logistic regression does this by expressing the linear regression equation in logarithmic terms
- This overcomes the violation of the linearity assumption (see the sketch below)
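As a sketch of that transform (generic notation, not taken from the cards): the model is linear on the log-odds (logit) scale,

$$\log\!\left(\frac{p}{1-p}\right) = b_0 + b_1 x,$$

where $p$ is the probability of the outcome (e.g. being happy) and $b_0$, $b_1$ are the regression coefficients.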
4
Q
what is binary logistic regression
A
- Logistic regression is a generalized linear model – flexible generalisation of linear regression
- Predicting an outcome that has only two possible outcomes
- Which of two outcomes is an individual likely to have (e.g. happy/not happy, pass/fail)?
- Predictors can be continuous, categorical, or a combination
5
Q
what is the odds ratio
A
- An odds ratio is one of the most important outcomes of logistic regression
- Odds ratio = change in odds resulting from a unit change in the predictor (see the sketch below)
- Measure of association between a predictor and an outcome
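As a sketch (generic notation, not from the cards): for a predictor $x$ with coefficient $b_1$, the odds ratio compares the odds of the outcome when $x$ increases by one unit,

$$\text{OR} = \frac{\text{odds}(Y = 1 \mid x + 1)}{\text{odds}(Y = 1 \mid x)} = e^{b_1}.$$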
6
Q
how do we interpret the odds ratio
A
- If the odds ratio is below 1, a unit increase in the predictor is associated with lower odds of the outcome
- If the odds ratio is above 1, a unit increase in the predictor is associated with higher odds of the outcome
7
Q
what does the odds ratio number mean
A
- Individuals who have a hamster have 4.69x higher odds of being happy relative to individuals who do not have a hamster
- You must use the word ‘odds’ when referring to odds ratios
8
Q
what is the independence of errors
A
- Cases of data should not be related
- For instance, each case should represent data from a different person
- We can’t really test for this - we should just know this is true based on the methodology
9
Q
what is failure to converge
A
- When you run a binary logistic regression model, R starts by estimating the parameters with a best guess
- It then attempts to estimate the parameters more accurately
- It stops when on each new attempt, the parameters are very similar (it “converges”)
- Sometimes it doesn’t converge: if this happens, ignore the output, because it is not accurate
10
Q
how do we prepare our data
A
- The binary outcome should be stored as a numeric value with outcomes coded as 0 and 1
- Categorical predictor should be a factor
- Run the binary logistic regression model (a code sketch is given below)
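A minimal R sketch of these steps, assuming a hypothetical data frame `dat` with a 0/1 outcome `happy` and a categorical predictor `hamster` (the names are illustrative, not from the course materials):

```r
# Outcome stored as numeric 0/1
dat$happy <- as.numeric(dat$happy)        # 0 = not happy, 1 = happy

# Categorical predictor stored as a factor
dat$hamster <- factor(dat$hamster)        # e.g. levels "no", "yes"

# Run the binary logistic regression model
model <- glm(happy ~ hamster, data = dat, family = binomial)
summary(model)
```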
11
Q
how do we evaluate the model
A
- To assess the fit of our model, we can compare our specified model to a model containing only the intercept (no predictors)
- We do this by looking at a measure called the “deviance” (see the sketch below)
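A hedged R sketch of this comparison, reusing the hypothetical `model` and `dat` from the previous card: fit an intercept-only model and compare the two deviances with a likelihood-ratio (chi-square) test.

```r
# Intercept-only (null) model: no predictors
null_model <- glm(happy ~ 1, data = dat, family = binomial)

# Compare deviances; a significant chi-square suggests the specified model
# fits better than the intercept-only model
anova(null_model, model, test = "Chisq")
```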
12
Q
does the binary logistic regression have R2
A
- R2 in linear regression = the proportion of variance explained by the model
- In logistic regression, this doesn’t exist
13
Q
what is the intercept
A
- The intercept is the model’s estimate of the log odds of the outcome when all predictors are at zero (or at their reference category)
14
Q
how do we evaluate the individual predictors
A
- To convert back from the log scale, we exponentiate our log odds (“Estimate”).
- This gives us our odds ratio
- We also want a confidence interval around the odds ratio
- The 95% confidence interval tells us the range that is likely to contain the true odds ratio in the population (see the sketch below)
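A minimal R sketch, again using the hypothetical `model` object from the earlier cards:

```r
# Exponentiate the log odds ("Estimate" column) to get odds ratios
exp(coef(model))

# 95% confidence intervals for the odds ratios
exp(confint(model))   # profile-likelihood intervals for a glm fit
```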
15
Q
what are predicted probabilities
A
- We can also obtain probabilities from our model (see the sketch below). For instance:
- If an individual has a hamster, what’s the probability they will be happy?
- If an individual does not have a hamster, what’s the probability they will be happy?
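A sketch in R, still assuming the hypothetical `model` and a `hamster` factor with levels "no" and "yes":

```r
# Predicted probability of being happy for each hamster status
new_data <- data.frame(hamster = c("no", "yes"))
predict(model, newdata = new_data, type = "response")
```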