15. Binary Logistic Model and Logistic Regression Flashcards by Crystal Z

What is logistic regression?

It is a statistical analysis method to predict the binary outcome. It predicts a dependent variable by analysing the relationship between one or more independent variables.

How well did you know this?

Not at all

Perfectly

What is a binary outcome variable?

When a response (y) is binary coded (e.g. yes or no)

Predictors can be continuous or categorical

How well did you know this?

Not at all

Perfectly

What is binary logistic regression?

Binomial Logistic Regression is the statistical fitting of an s-curve logistic or logit function to a dataset in order to calculate the probability of the occurrence of a specific event, or Value to Predict, based on the values of a set of independent variables.

How well did you know this?

Not at all

Perfectly

Why do we not use linear regression with a binary outcome variable?

Distributions of a residual would be bimodal

Variation of residuals would not be constant

Relation of X and Y is not linear

Probabilities wouldn’t be contained between 0 & 1

How well did you know this?

Not at all

Perfectly

How does logistic regression solve the issues that occur in linear regression of a binary outcome?

The goal of logistic regression is to output values between 0 and 1, which can be interpreted as the probabilities of each example belonging to a particular class.

How well did you know this?

Not at all

Perfectly

What function in r would be used to demonstrate binary logistic regression?

glm(y ~ x1 + x2, data = data, family = binomial)

How well did you know this?

Not at all

Perfectly

What does the logistic regression model predict?

Predicts the probability that y = 1 (as y can either = 0 or 1)

How well did you know this?

Not at all

Perfectly

What is the logistic regression model equation?

P(yi) = 1/ 1 + e - (B0 + B1xi1)

E = Exponential
B0 = Intercept
B1 = Capturing effect of x1 on outcome y

How well did you know this?

Not at all

Perfectly

Why are odds and log odds important in logistic regression?

Log odds convert the Logistic Regression which is a probability-based model to a Likelihood–based model so it allows the estimation of model coefficients

Log Odds are equivalent to b0 + b1 + …..

How well did you know this?

Not at all

Perfectly

What are odds and what is the odds equation?

Odds of event occurring = Ratio of the probability of event occurring to the probability of event not occurring

P(Y=1)/1-P(Y=1) can range from 0 to infinity

How well did you know this?

Not at all

Perfectly

What are log odds and its equation?

Log odds are the natural logarithm of the odds

Logits correlate to an odd and a probability (e.g. -2.21 correlates to a 0.1 chance)

Every probability can be easily converted to log odds, by finding the odds ratio and taking the logarithm.

How well did you know this?

Not at all

Perfectly

How can non-linear data be converted to make it linear?

Convert to probability
Get odds
Take log odds

How well did you know this?

Not at all

Perfectly

How are logistic regression coefficients estimated?

Logistic regression models are estimated using maximum likelihood estimation (MLE)

MLE finds the logistic regression coefficients that maximise the likelihood of the observed data having occurred

Larger log-likelihood values indicate poorer fitting models

How well did you know this?

Not at all

Perfectly

How does the method of MLE differ the the method of least squares used in linear regression?

While least squares estimation minimises the SSE to find the coefficients for the line of best, MLE minimises the log-likelihood

How well did you know this?

Not at all

Perfectly

What is the test used to evaluate our overall model in logistic regression?

Likelihood ratio test/ Chi-squared difference test

Compare our model to a baseline model with no predictors (null model)
Assess the improvement in fit

How well did you know this?

Not at all

Perfectly

What is the baseline used in a likelihood/chi-squared test?

Study These Flashcards

Baseline model = Predicted values for DV are based on most frequent DV value (0 or 1)

It is the best guess of DV value in absence of informative predictors
Analogous to using the mean DV value as the baseline in linear regression

What is used to compare to the baseline model to full model in an overall model test of logistic regression?

Study These Flashcards

Deviance as there will be no variance (as either 0 or 1)

Deviance = -2LL (-2* log likelihood)

How do you test overall model of logistic regression through likelihood/chi-squared test?

Study These Flashcards

Compare the difference in deviance between baseline and full model to chi-squared distributed with df = k (no. predictors)

If there is a significant p value = Model improves baseline

How do you interpret logistic model coefficients?

Study These Flashcards

Beta coefficients are the change in log odds of y for every unit increase in x (holding other IVs constant)

When the coefficient is an exponentiated coefficient, how is it interpreted?

Study These Flashcards

The odds of y when x is equal to 0 (as it’s exponentiated)

What is an odds ratio?

Study These Flashcards

Logs odds aren’t always easily interpretable when DV changes so beta coefficients are converted into odds ratio which are obtained by exponentiating beta coefficients

How do you interpret odds ratio?

Study These Flashcards

Represents the change in odds with a unit increase in x

Odds ratio = 1 (no effect)
Odds ratio < 1 (negative effect)
Odds ratio > 1 (positive effect)

How should deviance of residuals appear in glm() output?

Study These Flashcards

Should be symmetric and roughly centered around 0

What is meant by number of fisher scoring in glm() output?

Study These Flashcards

Number of guesses it took to get the distribution (More guesses, the harder to fit)

How do we look at the statistical significance of predictors?

Use Z-test (although this is prone to type II errors so supplemented by model selection procedure) beta/ SE of beta Z-test and associated p-value are provided in the summary output for glm()

How do we calculate confidence intervals in logistic regression?

Compute confidence intervals for coefficients and associated odds ratio If confidence interval include 1 then there is no effect So if it doesn't include 1 then it is significant

How do we compare the fit of two nested logistic regression models?

Likelihood ratio test - provides alternative to z-test AIC or BIC also test on nested models

How do we compare the fit of two non-nested logistic regression models?

AIC or BIC

15. Binary Logistic Model and Logistic Regression Flashcards

(28 cards)