Binary outcome Flashcards
what is the purpose of logistic regression
- to classify samples
- obese vs not obese
- true vs false
What’s the difference between a simple vs complicated model in logistic regression?
- Simple model: predicts the binary outcome using a single predictor variable (PV), e.g., weight predicts obese vs not obese
- Complicated model: uses more than one PV, e.g., weight + genotype + age predict obese vs not obese
for logistic regression does the PV also need to be binary? What about linear regression?
- No, the PVs can be continuous or discrete data used to predict the binary outcome.
- The same goes for linear regression; the only real difference is that the outcome is continuous, not binary
- WHICH MODEL IS USED DEPENDS ONLY ON THE OUTCOME VARIABLE
how do we know if each variable is usefully contributing to the model?
- if the variable’s coefficient is significantly different from 0, then it is usefully contributing to the model
- Test this with the Wald test
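A minimal sketch of the Wald test, assuming made-up values for a fitted coefficient and its standard error:

```python
from scipy import stats

# Made-up values: a fitted coefficient (a log odds ratio) and its standard error
beta = 0.85
se = 0.30

# Wald z-statistic: how many standard errors is the coefficient from 0?
z = beta / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.4f}")
```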
In linear regression - we have the concept of the residual, why does logistic regression not have this?
- Because the observed outcomes can only be 0 or 1; on the log-odds scale these map to ±infinity, so distances from the fitted curve (residuals) can't be measured. The fit is assessed with maximum likelihood instead, see below
what does logistic regression use to calculate the fit of a model
- Maximum likelihood (curve)
how do we find the line with the maximum likelihood
- First pick a probability curve that estimates the probability of the outcome for different values of weight.
- Then use this curve to calculate the likelihood of observing each obese or non-obese mouse at its weight
- Then multiply all those likelihoods together = the likelihood of the data GIVEN this curve
- Do this for lots of different curves; each gives you a total likelihood
- The curve with the maximum likelihood is selected
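A toy sketch of that search, assuming made-up mouse data; real software maximises the log-likelihood with an optimiser rather than a grid search:

```python
import numpy as np

# Made-up data: mouse weights and obesity status (1 = obese, 0 = not obese)
weights = np.array([18.0, 22.0, 25.0, 28.0, 31.0, 35.0])
obese = np.array([0, 0, 1, 0, 1, 1])

def total_likelihood(c, b):
    """Likelihood of the observed data GIVEN the curve defined by c and b."""
    p = 1 / (1 + np.exp(-(c + b * weights)))    # predicted P(obese) per mouse
    per_mouse = np.where(obese == 1, p, 1 - p)  # likelihood of each observation
    return per_mouse.prod()                     # multiply them all together

# Try lots of candidate curves; keep the one with the maximum likelihood
candidates = [(c, b) for c in np.linspace(-30, 0, 61)
                     for b in np.linspace(0, 1.5, 61)]
best = max(candidates, key=lambda cb: total_likelihood(*cb))
print("best (c, b):", best)
```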
Why is it inappropriate to use linear regression when you have a binary outcome?
because the model will predict not only 0 and 1 but also values in between (e.g., 0.6) and even values outside the 0–1 range.
This produces large residuals, which is bad because the residuals are what's used to do the fitting; large residuals will bias the result
what is the equation of the logistic curve (S-shaped; sigmoidal)
p = 1 / (1 + e^(−(c + bX)))
odds?
can use the prediction from the logistic regression equation to compute the odds
odds = probability of the event happening divided by the probability of the event not happening = p / (1 − p)
This is the same thing as Euler's number raised to the power of the systematic component:
odds = e^(c + bX)
Log odds or logit
simply the natural log of the odds: logit = ln(odds) = c + bX
Taking the log odds transforms the equation into a linear one (it gets rid of Euler's number).
Log odds vary between negative infinity and infinity as the probability moves from 0 to 1. Log odds are linearly related to the independent variable.
Imagine we have the logit but want the odds. How do we calculate the odds?
odds = e^(logit)
imagine we have the odds and want the probability outcome of there being a case or not. How do we calculate this?
Prediction = odds / (1 + odds)
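A quick sketch of these conversions, using a made-up logit value:

```python
import numpy as np

logit = 0.8               # made-up log odds (c + bX)
odds = np.exp(logit)      # odds = e^(logit)
prob = odds / (1 + odds)  # probability = odds / (1 + odds)

# And back again: logit = ln(p / (1 - p))
assert np.isclose(np.log(prob / (1 - prob)), logit)
print(f"odds = {odds:.3f}, probability = {prob:.3f}")
```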
what does the logit tell us
the linear impact of a PV on the DV
moving from a score of 55% to 56% in the PV increases the logit by some constant x
the increase is the same amount if we were looking at the difference between a score of 64% and 65%
if we are looking at odds of a value of 55%- 56% is the amount it changes equal to a difference between 64% and 65%?
No, because the relationship between the PV and the odds isn't linear
if we are looking at the probability of a value of 55%- 56% is the amount it changes equal to a difference between 64% and 65%?
no, because the relationship between the PV and the probability isn't linear
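A numeric sketch of all three answers, assuming a made-up intercept and slope: equal 1-unit steps in the PV shift the logit by a constant amount, but not the odds or the probability; what stays constant for the odds is the ratio of successive odds (the odds ratio, below):

```python
import numpy as np

c, b = -10.0, 0.2  # made-up intercept and slope

def logit(x): return c + b * x
def odds(x): return np.exp(logit(x))
def prob(x): return odds(x) / (1 + odds(x))

print(logit(56) - logit(55), logit(65) - logit(64))  # equal (both 0.2)
print(odds(56) - odds(55), odds(65) - odds(64))      # not equal
print(prob(56) - prob(55), prob(65) - prob(64))      # not equal
print(odds(56) / odds(55), odds(65) / odds(64))      # equal (both e^0.2)
```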
Odds ratio
calculated by dividing the odds at point B by the odds at point A.
e.g., odds at 55% attendance / odds at 54% attendance. The ratio between successive odds remains constant.
Gives an indication of the treatment effect. Tells us the relative increase in the odds as you increase the IV by 1 unit
Example: 13 minutes adherence = odds of 0.2551
Odds ratio = 1.2190
Therefore 14 minutes of adherence = 0.2551 * 1.2190 = 0.3110
Then you can apply this to different contexts. Imagine you fit a logistic regression to a sample and get an odds ratio of 0.1 for getting vs not getting a disease. You can conclude that for every minute n adhered to treatment, their odds of getting the disease were multiplied by 0.1 (a 90% decrease).
LEC: risk
number of people with the event / total population
Relative risk
risk in group of interest (n with event/ total n in group A)
/
risk in reference group (n with event/total n in group B)
Risk difference
risk in group of interest - risk in reference group
odds
number of people who have the event / number of people who don't have the event
Odds ratio
odds in group of interest / odds in reference group
(n with event/n without event; TREATMENT GROUP) / (n with event/n without event; reference group)
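A sketch computing all of these measures from a made-up 2×2 table:

```python
# Made-up 2x2 table:                events   non-events
#   treatment group (interest):       20         80
#   control group (reference):        10         90
a, b = 20, 80
c, d = 10, 90

risk_treat = a / (a + b)        # risk in group of interest
risk_ref = c / (c + d)          # risk in reference group
rr = risk_treat / risk_ref      # relative risk
rd = risk_treat - risk_ref      # risk difference
odds_ratio = (a / b) / (c / d)  # odds ratio
print(f"RR = {rr:.2f}, RD = {rd:.2f}, OR = {odds_ratio:.2f}")
# RR = 2.00, RD = 0.10, OR = 2.25
```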
interpret an RR or odds ratio of:
1
>1
<1
- 1: no association between exposure and outcome
- >1: risk/odds of the outcome is greater in the exposed group
- <1: risk/odds of the outcome is smaller in the exposed group
what is the relationship between the RR and OR if the event is rare vs frequent
If the outcome is rare, the two values will be similar. If the outcome is frequent, they will not be similar
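A quick numeric illustration with made-up group risks:

```python
def rr_and_or(risk_a, risk_b):
    """Relative risk and odds ratio for two made-up group risks."""
    rr = risk_a / risk_b
    odds_ratio = (risk_a / (1 - risk_a)) / (risk_b / (1 - risk_b))
    return rr, odds_ratio

print(rr_and_or(0.02, 0.01))  # rare outcome:     RR = 2.0, OR ~ 2.02 (similar)
print(rr_and_or(0.80, 0.40))  # frequent outcome: RR = 2.0, OR = 6.0 (not similar)
```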
with binary outcomes, what tests can we use to test for differences in the events/non-events between the intervention and control?
Chi-squared test
if we have a small numbers what correction do we apply to the chi-squared test?
Yates's correction for continuity
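A sketch with scipy on a made-up 2×2 table; for 2×2 tables, `chi2_contingency` applies Yates's continuity correction when `correction=True`:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 5],   # made-up intervention row: events, non-events
                  [4, 13]])  # made-up control row: events, non-events

chi2, p, dof, expected = chi2_contingency(table, correction=True)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
```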
how does fisher’s exact test work?
Enumerates every possible contingency table whose cells give the same row and column totals as the observed table
Then determines the probability of observing each table if the null were true (i.e., by chance)
Then the sum of the probabilities of the tables that are as extreme as or more extreme than the observed table = the p value.
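A sketch with scipy, using a made-up table with small counts:

```python
from scipy.stats import fisher_exact

table = [[2, 8],   # made-up intervention row: events, non-events
         [7, 3]]   # made-up control row: events, non-events

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"OR = {odds_ratio:.3f}, p = {p:.4f}")
```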
with binary outcomes, what do we use to test for differences in the events/non-events between the intervention and control if numbers are small (fewer than 5 events in any cell)?
Fisher’s exact test
what is typically used to test the difference in adverse events between the intervention and control group?
fisher’s exact test
In Stata, what output from a chi-squared test tells us about the difference between groups in getting a case or not?
- Risk difference
- Relative risk/risk ratio
- Odds ratio
- Chi squared result
- P value
what are the assumptions that need to be met prior to conducting logistic regression?
- No assumption that the variables in the model are normally distributed
- Outcomes are independent – whether or not person 1 is a case has no effect on whether person 2 is
What things represent the treatment effect in logistic regression?
- Odds ratio
- Log odds ratio
Interpret the odds ratio of 4.03
The odds of having the event are 4.03 times larger in the treatment group
why might we get a different odds ratio when using chi-squared vs logistic regression?
because the chi-squared test does not adjust for baseline covariates while logistic regression does
if the odds ratio from the chi-squared test and from logistic regression is the same, what does that mean?
the variables adjusted for had no effect on the outcome
what is the coefficient of the model in logistic regression?
The log(odds ratio)
the model coefficient represents the change in the log-odds of the outcome variable associated with a one-unit change in the predictor variable, holding all other predictor variables constant.
The log-odds is the natural logarithm of the odds, which is the probability of an event occurring divided by the probability of the event not occurring.
The log-odds can take on any value from negative infinity to positive infinity, with positive values indicating higher odds of the event occurring and negative values indicating lower odds.
So, when the coefficient of the model is the log-odds ratio, it tells us how the odds of the outcome variable change with a one-unit increase in the predictor variable. A positive coefficient means that the odds of the outcome variable increase as the predictor variable increases, while a negative coefficient means that the odds of the outcome variable decrease as the predictor variable increases.
How do we get the odds ratio and the CI in logistic regression?
take the exponent of the model coefficient, and of the limits of its 95% CI, to get the odds ratio and its CI.
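A sketch with statsmodels on simulated data; the variable names (`treatment`, `baseline_weight`, `event`) are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: binary outcome, treatment indicator, baseline covariate
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, 200),
    "baseline_weight": rng.normal(90, 10, 200),
})
true_logits = -1 + 0.5 * df["treatment"] + 0.02 * (df["baseline_weight"] - 90)
df["event"] = (rng.random(200) < 1 / (1 + np.exp(-true_logits))).astype(int)

X = sm.add_constant(df[["treatment", "baseline_weight"]])
fit = sm.Logit(df["event"], X).fit(disp=0)

# Coefficients are log(odds ratios): exponentiate them and their CI limits
print(np.exp(fit.params))      # odds ratios
print(np.exp(fit.conf_int()))  # 95% CIs for the odds ratios
```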
what is the coefficient in log-binomial regression?
log(risk ratio)
how do we get the relative risk and its CI in log-binomial regression?
take the exponent of the coefficient and the exponent of the limits of the 95% CI
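A sketch of the same idea with statsmodels: a log-binomial model is a GLM with a binomial family and a log link (note these models can fail to converge on some data). The data and names are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: binary outcome and a treatment indicator
rng = np.random.default_rng(0)
df = pd.DataFrame({"treatment": rng.integers(0, 2, 200)})
df["event"] = (rng.random(200) < 0.15 + 0.10 * df["treatment"]).astype(int)

X = sm.add_constant(df[["treatment"]])
fit = sm.GLM(df["event"], X,
             family=sm.families.Binomial(link=sm.families.links.Log())).fit()

# Coefficients are log(risk ratios): exponentiate them and their CI limits
print(np.exp(fit.params))      # risk ratios
print(np.exp(fit.conf_int()))  # 95% CIs for the risk ratios
```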
interpret: chi-squared test,
Relative risk/Risk ratio [95% CI]: 1.40 [1.20 to 1.64]
The risk of losing ≥5% of initial weight by 12 months is 40% higher in the intervention (support) group than in the advice group.
interpret: Chi squared test,
Odds ratio [95% CI]: 1.59 [1.29 to 1.96]
The odds of losing at least 5% of initial weight by 12 months is higher in the intervention (support) group than in the advice group by a factor of 1.6.
interpret: logistic regression
Odds Ratio [95% CI]: 1.60 [1.29 to 1.97]; P value <0.001
A significant adjusted odds ratio in favour of the support arm was found, indicating that participants had an increased odds of 1.596 (or 1.6) of losing at least 5% of initial weight in the Support group compared to the Advice group
interpret: log-binomial regression
Risk Ratio [95% CI]: 1.41 [1.21 to 1.64]; P value <0.001
A significant adjusted risk ratio in favour of the support arm was found, indicating that participants had a 41% increase in the risk of losing at least 5% of initial weight in the Support group compared to the Advice group, after adjusting for gender and baseline weight.
binary outcome, what are the unadjusted tests and adjusted tests we use?
unadjusted:
> chi-squared test
> Fisher's exact test
adjusted:
> logistic regression
> log-binomial regression
binary outcome, what values are used to determine treatment effect
odds ratio (logistic regression)
relative risk/risk ratio (log-binomial regression)