15. Binary Logistic Model and Logistic Regression Flashcards
What is logistic regression?
It is a statistical analysis method to predict the binary outcome. It predicts a dependent variable by analysing the relationship between one or more independent variables.
What is a binary outcome variable?
When a response (y) is binary coded (e.g. yes or no)
Predictors can be continuous or categorical
What is binary logistic regression?
Binomial Logistic Regression is the statistical fitting of an s-curve logistic or logit function to a dataset in order to calculate the probability of the occurrence of a specific event, or Value to Predict, based on the values of a set of independent variables.
Why do we not use linear regression with a binary outcome variable?
Distributions of a residual would be bimodal
Variation of residuals would not be constant
Relation of X and Y is not linear
Probabilities wouldn’t be contained between 0 & 1
How does logistic regression solve the issues that occur in linear regression of a binary outcome?
The goal of logistic regression is to output values between 0 and 1, which can be interpreted as the probabilities of each example belonging to a particular class.
What function in r would be used to demonstrate binary logistic regression?
glm(y ~ x1 + x2, data = data, family = binomial)
What does the logistic regression model predict?
Predicts the probability that y = 1 (as y can either = 0 or 1)
What is the logistic regression model equation?
P(yi) = 1/ 1 + e - (B0 + B1xi1)
E = Exponential
B0 = Intercept
B1 = Capturing effect of x1 on outcome y
Why are odds and log odds important in logistic regression?
Log odds convert the Logistic Regression which is a probability-based model to a Likelihood–based model so it allows the estimation of model coefficients
Log Odds are equivalent to b0 + b1 + …..
What are odds and what is the odds equation?
Odds of event occurring = Ratio of the probability of event occurring to the probability of event not occurring
P(Y=1)/1-P(Y=1) can range from 0 to infinity
What are log odds and its equation?
Log odds are the natural logarithm of the odds
Logits correlate to an odd and a probability (e.g. -2.21 correlates to a 0.1 chance)
Every probability can be easily converted to log odds, by finding the odds ratio and taking the logarithm.
How can non-linear data be converted to make it linear?
- Convert to probability
- Get odds
- Take log odds
How are logistic regression coefficients estimated?
Logistic regression models are estimated using maximum likelihood estimation (MLE)
MLE finds the logistic regression coefficients that maximise the likelihood of the observed data having occurred
Larger log-likelihood values indicate poorer fitting models
How does the method of MLE differ the the method of least squares used in linear regression?
While least squares estimation minimises the SSE to find the coefficients for the line of best, MLE minimises the log-likelihood
What is the test used to evaluate our overall model in logistic regression?
Likelihood ratio test/ Chi-squared difference test
Compare our model to a baseline model with no predictors (null model)
Assess the improvement in fit
What is the baseline used in a likelihood/chi-squared test?
Baseline model = Predicted values for DV are based on most frequent DV value (0 or 1)
- It is the best guess of DV value in absence of informative predictors
- Analogous to using the mean DV value as the baseline in linear regression
What is used to compare to the baseline model to full model in an overall model test of logistic regression?
Deviance as there will be no variance (as either 0 or 1)
Deviance = -2LL (-2* log likelihood)
How do you test overall model of logistic regression through likelihood/chi-squared test?
Compare the difference in deviance between baseline and full model to chi-squared distributed with df = k (no. predictors)
If there is a significant p value = Model improves baseline
How do you interpret logistic model coefficients?
Beta coefficients are the change in log odds of y for every unit increase in x (holding other IVs constant)
When the coefficient is an exponentiated coefficient, how is it interpreted?
The odds of y when x is equal to 0 (as it’s exponentiated)
What is an odds ratio?
Logs odds aren’t always easily interpretable when DV changes so beta coefficients are converted into odds ratio which are obtained by exponentiating beta coefficients
How do you interpret odds ratio?
Represents the change in odds with a unit increase in x
Odds ratio = 1 (no effect)
Odds ratio < 1 (negative effect)
Odds ratio > 1 (positive effect)
How should deviance of residuals appear in glm() output?
Should be symmetric and roughly centered around 0
What is meant by number of fisher scoring in glm() output?
Number of guesses it took to get the distribution (More guesses, the harder to fit)
How do we look at the statistical significance of predictors?
Use Z-test (although this is prone to type II errors so supplemented by model selection procedure)
beta/ SE of beta
Z-test and associated p-value are provided in the summary output for glm()
How do we calculate confidence intervals in logistic regression?
Compute confidence intervals for coefficients and associated odds ratio
If confidence interval include 1 then there is no effect
So if it doesn’t include 1 then it is significant
How do we compare the fit of two nested logistic regression models?
Likelihood ratio test - provides alternative to z-test
AIC or BIC also test on nested models
How do we compare the fit of two non-nested logistic regression models?
AIC or BIC