Binary outcome Flashcards
What is the purpose of logistic regression?
- to classify samples
- obese vs not obese
- true vs false
What’s the difference between a simple vs complicated model in logistic regression?
- Simple model: predicts the binary outcome using a single PV (weight predicts obese vs not obese)
- Complicated model: uses more than one PV (weight + genotype + age predict obese vs not obese)
For logistic regression, does the PV also need to be binary? What about linear regression?
- No, it can use continuous and discrete PVs to predict a binary outcome.
- The same goes for linear regression; the only real difference is that the outcome is continuous, not binary
- WHICH ONE IS USED DEPENDS ONLY ON THE OUTCOME VARIABLE
How do we know if each variable is usefully contributing to the model?
- If the variable's coefficient (its effect on the prediction) is significantly different from 0, then it is useful to the model
- Use the Wald test
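A rough sketch of the Wald test idea: divide the coefficient by its standard error to get a z statistic, then convert it to a two-sided p-value using the standard normal distribution. The coefficient and standard error below are made-up numbers, and `wald_test` is a hypothetical helper name, not a library function.

```python
import math

def wald_test(coef, std_err):
    """Return (z, two-sided p-value) for the null hypothesis coef == 0."""
    z = coef / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical values, not taken from any real model
z, p = wald_test(coef=0.8, std_err=0.25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the coefficient differs from 0, so the variable is contributing to the model.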
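In linear regression we have the concept of the residual. Why does logistic regression not have this?
- The observed outcomes are only 0 or 1, and on the log-odds scale they sit at positive and negative infinity, so there is no finite distance from each point to the line to use as a residual. The fit uses maximum likelihood instead, see below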
What does logistic regression use to calculate the fit of a model?
- Maximum likelihood (curve)
How do we find the curve with the maximum likelihood?
- First pick a probability curve that estimates the outcome for different values of weight.
- Then use this curve to get the likelihood of observing an obese vs non-obese mouse at each value
- Then multiply all those likelihoods together = the likelihood of the data GIVEN this curve
- Do this for lots of different curves; each gives you a total likelihood
- The curve with the maximum likelihood is selected
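The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the weights and outcomes are invented, the function names are hypothetical, and a coarse grid of candidate curves stands in for the iterative optimisation (usually on the log-likelihood) that real software uses.

```python
import math

def logistic(x, c, b):
    """Probability that the outcome is 1 at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(c + b * x)))

def likelihood(xs, ys, c, b):
    """Multiply the likelihood of every observation under the curve (c, b)."""
    total = 1.0
    for x, y in zip(xs, ys):
        p = logistic(x, c, b)
        # Likelihood is p for an observed 1 and (1 - p) for an observed 0
        total *= p if y == 1 else (1.0 - p)
    return total

# Hypothetical data: mouse weights and obese (1) / not obese (0)
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
obese = [0, 0, 1, 0, 1, 1]

# Try lots of candidate curves (intercept c, slope b) on a coarse grid
candidates = [(c / 2, b / 2) for c in range(-20, 21) for b in range(-10, 11)]

# Keep the curve whose total likelihood is largest
best_c, best_b = max(candidates, key=lambda cb: likelihood(weights, obese, *cb))
print(best_c, best_b, likelihood(weights, obese, best_c, best_b))
```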
Why is it inappropriate to use linear regression when you have a binary outcome?
Because the model will not only predict the outcomes 0 and 1 but also values between 0 and 1 (e.g., 0.6), and it can even predict values below 0 or above 1.
This will produce large residuals, which is bad because the residuals are what is used to do the fitting. Large residuals will bias the result
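To see the problem concretely, here is a minimal sketch that fits an ordinary least-squares line to a binary outcome. The data are invented and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: weights and obese (1) / not obese (0)
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
obese = [0, 0, 1, 0, 1, 1]

intercept, slope = fit_line(weights, obese)

# Predictions are not restricted to 0 and 1, or even to the 0-1 range
for w in [0.0, 4.0, 8.0]:
    print(w, intercept + slope * w)
```

With these made-up numbers the line gives a negative prediction at weight 0 and a prediction above 1 at weight 8, neither of which is a valid probability.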
What is the equation of the logistic curve (S-shaped; sigmoidal)?
p = 1 / (1 + e^(-(c + bx)))
Odds?
Can use the prediction from the logistic regression equation to compute the odds
Probability of the event happening divided by the probability of the event not happening: odds = p / (1 - p)
This is the same thing as Euler's number raised to the power of the systematic component
(e^(c + bX))
Log odds or logit
Simply the natural log of the odds: ln(odds) = c + bX
The log odds transform the equation into a linear one. Taking the log gets rid of Euler's number.
Log odds vary between negative infinity and infinity as the probability moves from 0 to 1. Log odds are linearly related to the independent variable.
Imagine we have the logit but want the odds. How do we calculate the odds?
odds = e^(logit)
Imagine we have the odds and want the probability of there being a case or not. How do we calculate this?
Prediction = odds / (1 + odds)
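The conversions between probability, odds and logit can be checked with a short sketch. The function names are hypothetical and the starting probability of 0.8 is just an example value.

```python
import math

def prob_to_odds(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1.0 - p)

def odds_to_logit(odds):
    """Logit = natural log of the odds."""
    return math.log(odds)

def logit_to_odds(logit):
    """Odds = e raised to the power of the logit."""
    return math.exp(logit)

def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)

p = 0.8                                  # example probability
odds = prob_to_odds(p)                   # about 4: four cases for every non-case
logit = odds_to_logit(odds)              # about 1.386
p_back = odds_to_prob(logit_to_odds(logit))

print(odds, logit, p_back)
```

Going from the logit back to the probability this way gives the same value as plugging the logit into the logistic curve.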
What does the logit tell us?
The linear impact of a PV on the DV
When a score goes from 55% to 56% in the PV, the logit increases by x
The same amount of increase applies if we look at the difference between a score of 64% and 65%
If we are looking at the odds, is the change from 55% to 56% equal to the change from 64% to 65%?
No, because there isn't a linear relationship
If we are looking at the probability, is the change from 55% to 56% equal to the change from 64% to 65%?
No, because there isn't a linear relationship
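A quick numerical check of these three cards, using a made-up intercept and slope (c = -5.0 and b = 0.1 are assumptions chosen only for illustration):

```python
import math

c, b = -5.0, 0.1  # hypothetical intercept and slope

def logit(x):
    return c + b * x

def odds(x):
    return math.exp(logit(x))

def prob(x):
    return odds(x) / (1.0 + odds(x))

# Compare a one-point step at 55 -> 56 with the same step at 64 -> 65
for low, high in [(55, 56), (64, 65)]:
    print(
        f"{low}->{high}: "
        f"logit change {logit(high) - logit(low):.4f}, "
        f"odds change {odds(high) - odds(low):.4f}, "
        f"probability change {prob(high) - prob(low):.4f}"
    )
```

The logit changes by the same amount (b) for both steps, while the changes in odds and in probability differ between the two steps.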
Odds ratio
Calculated by dividing the odds at point B by the odds at point A.
Odds at 55% attendance / odds at 54% attendance. The odds ratio between successive values remains constant.
Gives an indication of treatment effect. Tells us the relative change in odds as you increase the IV by 1 unit
Example: 13 minutes adherence = odds of 0.2551
Odds ratio = 1.2190
Therefore 14 minutes of adherence = 0.2551 * 1.2190 = 0.3110
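Then you can apply this to different contexts. Imagine you fit a logistic regression to a sample and get an odds ratio of 0.1 for getting vs not getting a disease. Can conclude that for every extra minute people adhered to treatment, their odds of getting the disease were multiplied by 0.1, i.e. decreased by 90%. (A 10% decrease in odds would correspond to an odds ratio of 0.9.) Note this is a change in odds, not in risk.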
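The adherence example can be reproduced with a short sketch. The numbers 0.2551 and 1.2190 come from the example; the helper `percent_change_in_odds` and the extension to 15 minutes are illustrative assumptions.

```python
import math

odds_13 = 0.2551      # odds at 13 minutes of adherence (from the example)
odds_ratio = 1.2190   # odds ratio for a 1-minute increase (from the example)

# One extra minute multiplies the odds by the odds ratio
odds_14 = odds_13 * odds_ratio
print(f"{odds_14:.4f}")  # 0.3110

# Because the odds ratio is constant, two extra minutes multiply by it twice
odds_15 = odds_13 * odds_ratio ** 2

def percent_change_in_odds(ratio):
    """Hypothetical helper: percent change in odds per 1-unit increase in the IV."""
    return (ratio - 1.0) * 100.0

print(f"{percent_change_in_odds(odds_ratio):.1f}%")

# The odds ratio is e raised to the slope b, so b is its natural log
b = math.log(odds_ratio)
```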
LEC: risk
Number of people with the event / total number of people in the population