15. Binary Logistic Model and Logistic Regression Flashcards

1
Q

What is logistic regression?

A

It is a statistical method for predicting a binary outcome. It models the probability of the dependent variable from one or more independent variables.

2
Q

What is a binary outcome variable?

A

A response (y) that takes only two values, coded 0 or 1 (e.g. no or yes)

Predictors can be continuous or categorical

3
Q

What is binary logistic regression?

A

Binary (binomial) logistic regression fits an S-shaped logistic curve to a dataset in order to estimate the probability that a specific event (the value to predict) occurs, based on the values of a set of independent variables.

4
Q

Why do we not use linear regression with a binary outcome variable?

A

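The residuals would not be normally distributed (for any given x they can take only two values, so their distribution is bimodal)

The variance of the residuals would not be constant

The relationship between X and Y is not linear

Predicted values would not be constrained to lie between 0 and 1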
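
To make the last point concrete, here is a minimal sketch in R. The built-in mtcars data and the model am ~ wt are assumptions chosen for illustration; they are not part of the cards.

```r
# Illustrative only: mtcars and the am ~ wt model are assumptions.
lin_fit <- lm(am ~ wt, data = mtcars)                      # linear regression on a 0/1 outcome
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)  # logistic regression

lin_range <- range(fitted(lin_fit))  # straight-line predictions
log_range <- range(fitted(log_fit))  # predicted probabilities

print(lin_range)  # the lower end falls below 0 for this data
print(log_range)  # stays strictly between 0 and 1
```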

5
Q

How does logistic regression solve the issues that occur in linear regression of a binary outcome?

A

Logistic regression passes the linear combination of predictors through the logistic function, so its output always lies between 0 and 1 and can be interpreted as the probability of each observation belonging to a particular class.

6
Q

What function in R would be used to fit a binary logistic regression?

A

glm(y ~ x1 + x2, data = data, family = binomial)
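
A runnable version of this call might look as follows. The dataset (mtcars) and the variables am, wt and hp are stand-ins assumed for illustration, replacing the generic y, x1, x2 and data.

```r
# Assumption: mtcars stands in for `data`; am (0/1) is the outcome,
# wt and hp are the two predictors.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

summary(fit)  # coefficients on the log-odds scale, z-tests, deviances, AIC
coef(fit)     # just the estimated coefficients
```

family = binomial uses the logit link by default, which is what makes this a logistic regression.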

7
Q

What does the logistic regression model predict?

A

It predicts the probability that y = 1 (y can only be 0 or 1)
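
As a sketch of how such probabilities can be obtained in R, assuming the built-in mtcars data and two hypothetical new weights:

```r
# Assumptions: mtcars data, am ~ wt model, and the new wt values are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
new_cars <- data.frame(wt = c(2.5, 3.5))  # hypothetical new observations

p_hat    <- predict(fit, newdata = new_cars, type = "response")  # P(am = 1)
log_odds <- predict(fit, newdata = new_cars, type = "link")      # linear predictor

print(p_hat)
```

type = "response" returns probabilities; type = "link" returns the same predictions on the log-odds scale.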

8
Q

What is the logistic regression model equation?

A

P(yi = 1) = 1 / (1 + e^(-(B0 + B1*xi1)))

e = base of the natural logarithm (exponential function)
B0 = intercept
B1 = effect of x1 on the outcome y (on the log-odds scale)
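
The equation can be checked numerically with a small sketch. The coefficient values and x values below are hypothetical, chosen only to show the shape of the function.

```r
# Hypothetical coefficients and inputs, for illustration only.
logistic <- function(x, b0, b1) 1 / (1 + exp(-(b0 + b1 * x)))

b0 <- -1
b1 <- 0.5
x  <- c(-10, 0, 2, 10)

p <- logistic(x, b0, b1)
print(p)  # p is exactly 0.5 where b0 + b1 * x = 0, i.e. at x = 2

# plogis() is R's built-in logistic function and gives the same result.
print(plogis(b0 + b1 * x))
```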

9
Q

Why are odds and log odds important in logistic regression?

A

Probabilities are bounded between 0 and 1, but log odds range from minus infinity to plus infinity, so on the log-odds scale the model is a linear function of the predictors and its coefficients can be estimated and interpreted

Log odds = B0 + B1x1 + B2x2 + ...

10
Q

What are odds and what is the odds equation?

A

Odds of an event = ratio of the probability of the event occurring to the probability of it not occurring

Odds = P(Y=1) / (1 - P(Y=1)), which can range from 0 to infinity

11
Q

What are log odds and what is their equation?

A

Log odds (logits) are the natural logarithm of the odds: log odds = ln(P(Y=1) / (1 - P(Y=1)))

Each logit corresponds to one value of the odds and one probability (e.g. -2.20 corresponds to a probability of 0.1)

Every probability can be converted to log odds by finding the odds and taking the natural logarithm.
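
A short sketch of the conversion in R, using a few example probabilities chosen for illustration:

```r
# Example probabilities are illustrative.
p <- c(0.1, 0.5, 0.9)

odds     <- p / (1 - p)  # probability -> odds
log_odds <- log(odds)    # odds -> log odds (natural log)

print(round(log_odds, 2))  # p = 0.1 gives roughly -2.20

# Built-in equivalents: qlogis() maps p to log odds, plogis() maps back.
print(qlogis(p))
print(plogis(log_odds))
```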

12
Q

How can the non-linear relationship between x and a binary outcome be converted to make it linear?

A
  1. Convert to probability
  2. Get odds
  3. Take log odds
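
The three steps can be sketched as follows. The coefficients b0 and b1 are hypothetical; the point is only that the probabilities are curved in x while the log odds form a straight line.

```r
# Hypothetical coefficients, for illustration only.
b0 <- -3
b1 <- 0.8
x  <- 0:8

p        <- plogis(b0 + b1 * x)  # step 1: S-shaped probabilities
odds     <- p / (1 - p)          # step 2: odds
log_odds <- log(odds)            # step 3: log odds

print(diff(p))         # unequal steps: curved in x
print(diff(log_odds))  # constant steps of size b1: linear in x
```
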
13
Q

How are logistic regression coefficients estimated?

A

Logistic regression models are estimated using maximum likelihood estimation (MLE)

MLE finds the logistic regression coefficients that maximise the likelihood of the observed data having occurred

Larger (less negative) log-likelihood values indicate better fitting models; larger deviance (-2LL) values indicate poorer fitting models
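
A minimal sketch of the idea, assuming the built-in mtcars data and the model am ~ wt: the log-likelihood is computed by hand at the fitted coefficients and again at a shifted (hypothetical) set of coefficients, which should fit worse.

```r
# Assumptions: mtcars data, am ~ wt model, and the perturbation are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
y <- mtcars$am
X <- cbind(1, mtcars$wt)  # design matrix: intercept column and wt

# Bernoulli log-likelihood for a given coefficient vector
log_lik <- function(beta) {
  p <- plogis(drop(X %*% beta))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

ll_mle     <- log_lik(coef(fit))              # at the ML estimates
ll_shifted <- log_lik(coef(fit) + c(0.5, 0))  # intercept moved away from the MLE

print(c(ll_mle, ll_shifted))  # the MLE has the larger (less negative) value
print(logLik(fit))            # matches ll_mle
```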

14
Q

How does the method of MLE differ from the method of least squares used in linear regression?

A

While least squares estimation minimises the SSE to find the coefficients for the line of best fit, MLE maximises the log-likelihood (equivalently, it minimises -2 × log-likelihood)

15
Q

What is the test used to evaluate our overall model in logistic regression?

A

Likelihood ratio test/ Chi-squared difference test

Compare our model to a baseline model with no predictors (null model)
Assess the improvement in fit

16
Q

What is the baseline used in a likelihood ratio/chi-squared test?

A

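Baseline (null) model = intercept-only model with no predictors. Its predicted probability for every case is the overall proportion of 1s, so the predicted DV value is the most frequent DV value (0 or 1)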

  • It is the best guess of DV value in absence of informative predictors
  • Analogous to using the mean DV value as the baseline in linear regression
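
A sketch of the baseline model in R, assuming the built-in mtcars data with am as the binary DV:

```r
# Assumption: mtcars$am is used as an illustrative binary DV.
null_fit <- glm(am ~ 1, data = mtcars, family = binomial)  # intercept only

p_null <- unname(plogis(coef(null_fit)))  # predicted probability for every case
print(p_null)
print(mean(mtcars$am))  # the observed proportion of 1s, the same value

# Predicted category under the baseline: 1 if p_null > 0.5, otherwise 0
print(as.integer(p_null > 0.5))
```
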
17
Q

What is used to compare the baseline model to the full model in an overall model test of logistic regression?

A

Deviance, because a binary outcome (0 or 1) has no residual variance to compare as in linear regression

Deviance = -2LL (-2* log likelihood)
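
For a 0/1 outcome this identity can be checked directly in R. The sketch below assumes the built-in mtcars data and the model am ~ wt.

```r
# Assumptions: mtcars data and the am ~ wt model are illustrative.
fit      <- glm(am ~ wt, data = mtcars, family = binomial)
null_fit <- glm(am ~ 1,  data = mtcars, family = binomial)

dev_manual <- -2 * as.numeric(logLik(fit))
print(dev_manual)
print(deviance(fit))  # residual deviance reported by glm(): same value

# The null deviance stored in the full model is -2LL of the baseline model.
print(fit$null.deviance)
print(-2 * as.numeric(logLik(null_fit)))
```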

18
Q

How do you test the overall model in logistic regression using the likelihood ratio/chi-squared test?

A

Compare the difference in deviance between the baseline and full model to a chi-squared distribution with df = k (number of predictors)

A significant p-value means the model fits better than the baseline
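
A sketch of this test in R, done by hand and then with anova(). The mtcars data and the predictors wt and hp are assumptions for illustration.

```r
# Assumptions: mtcars data and the am ~ wt + hp model are illustrative.
null_fit <- glm(am ~ 1,       data = mtcars, family = binomial)
full_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

lr_stat <- deviance(null_fit) - deviance(full_fit)      # drop in deviance
k       <- length(coef(full_fit)) - length(coef(null_fit))  # number of predictors
p_value <- pchisq(lr_stat, df = k, lower.tail = FALSE)

print(c(lr_stat = lr_stat, df = k, p = p_value))

# The same comparison using the built-in likelihood ratio test
lr_table <- anova(null_fit, full_fit, test = "Chisq")
print(lr_table)
```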

19
Q

How do you interpret logistic model coefficients?

A

Beta coefficients are the change in log odds of y for every unit increase in x (holding other IVs constant)

20
Q

When the coefficient is an exponentiated coefficient, how is it interpreted?

A

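The exponentiated intercept is the odds of y = 1 when x is equal to 0. An exponentiated slope is an odds ratio: the factor by which the odds are multiplied for each one-unit increase in x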

21
Q

What is an odds ratio?

A

An odds ratio is the ratio of the odds of y = 1 at x + 1 to the odds at x. Log odds are not easily interpretable, so beta coefficients are converted into odds ratios, which are obtained by exponentiating the beta coefficients

22
Q

How do you interpret odds ratio?

A

Represents the multiplicative change in odds with a one-unit increase in x

Odds ratio = 1 (no effect)
Odds ratio < 1 (negative effect: odds decrease)
Odds ratio > 1 (positive effect: odds increase)
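
The interpretation above can be sketched in R. The mtcars data, the model am ~ wt, and the comparison of wt = 3 with wt = 4 are assumptions for illustration.

```r
# Assumptions: mtcars data, am ~ wt model, and the chosen wt values are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

or_wt <- unname(exp(coef(fit)["wt"]))  # odds ratio for a one-unit increase in wt

# Odds at wt = 3 and wt = 4, computed from predicted probabilities
p    <- predict(fit, newdata = data.frame(wt = c(3, 4)), type = "response")
odds <- unname(p / (1 - p))

print(or_wt)
print(odds[2] / odds[1])  # ratio of the two odds: same value as or_wt

# Exponentiated intercept: the odds of am = 1 when wt = 0
odds_at_zero <- unname(exp(coef(fit)["(Intercept)"]))
print(odds_at_zero)
```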

23
Q

How should the deviance residuals appear in glm() output?

A

Should be symmetric and roughly centered around 0
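
The deviance residuals can also be extracted and inspected directly. This sketch assumes the built-in mtcars data and the model am ~ wt.

```r
# Assumptions: mtcars data and the am ~ wt model are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

dev_res <- residuals(fit, type = "deviance")
print(summary(dev_res))  # check the median is near 0 and the quartiles are balanced

# The squared deviance residuals sum to the residual deviance.
print(sum(dev_res^2))
print(deviance(fit))
```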

24
Q

What is meant by the number of Fisher scoring iterations in glm() output?

A

Number of iterations the fitting algorithm took to converge on the maximum likelihood estimates (the more iterations, the harder the model was to fit)
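
The same information is stored on the fitted object. A short sketch, assuming the built-in mtcars data and the model am ~ wt:

```r
# Assumptions: mtcars data and the am ~ wt model are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

print(fit$iter)       # number of Fisher scoring iterations used
print(fit$converged)  # TRUE if the algorithm converged
```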

25
Q

How do we look at the statistical significance of predictors?

A

Use a z-test (Wald test), although this is prone to type II errors, so it is supplemented by model selection procedures

z = beta / SE of beta

Z-test and associated p-value are provided in the summary output for glm()
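
A sketch reproducing the z-statistics and p-values from the summary table by hand. The mtcars data and the model am ~ wt are assumptions for illustration.

```r
# Assumptions: mtcars data and the am ~ wt model are illustrative.
fit   <- glm(am ~ wt, data = mtcars, family = binomial)
coefs <- coef(summary(fit))  # matrix: Estimate, Std. Error, z value, Pr(>|z|)

z_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]  # beta / SE of beta
p_manual <- 2 * pnorm(-abs(z_manual))                    # two-sided p-value

print(cbind(z_manual, p_manual))
print(coefs[, c("z value", "Pr(>|z|)")])  # the values reported by summary()
```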

26
Q

How do we calculate confidence intervals in logistic regression?

A

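Compute confidence intervals for the coefficients and exponentiate them to get confidence intervals for the associated odds ratios

If the confidence interval for an odds ratio includes 1 (equivalently, the interval for the coefficient includes 0), the effect is not significant

If it does not include 1, the effect is significant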
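
A sketch of one way to do this in R, assuming the built-in mtcars data and the model am ~ wt. It uses Wald intervals from confint.default(); confint() on a glm object gives profile-likelihood intervals instead, which can differ slightly.

```r
# Assumptions: mtcars data and the am ~ wt model are illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

ci_logodds <- confint.default(fit)  # Wald 95% CIs on the log-odds scale
ci_or      <- exp(ci_logodds)       # 95% CIs for the odds ratios

print(cbind(OR = exp(coef(fit)), ci_or))
# An interval for wt that lies entirely below 1 indicates a significant negative effect.
```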

27
Q

How do we compare the fit of two nested logistic regression models?

A

Likelihood ratio test - provides alternative to z-test
AIC or BIC can also be used to compare nested models (lower values indicate better fit)

28
Q

How do we compare the fit of two non-nested logistic regression models?

A

AIC or BIC (the model with the lower value is preferred)
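
A sketch of such a comparison in R. The mtcars data and the two single-predictor models are assumptions for illustration; neither model is nested in the other.

```r
# Assumptions: mtcars data and both models are illustrative.
m_wt <- glm(am ~ wt, data = mtcars, family = binomial)
m_hp <- glm(am ~ hp, data = mtcars, family = binomial)  # not nested in m_wt

print(AIC(m_wt, m_hp))  # lower AIC = preferred model
print(BIC(m_wt, m_hp))  # BIC penalises extra parameters more heavily

# For a 0/1 outcome, AIC = deviance + 2 * (number of coefficients)
aic_manual <- deviance(m_wt) + 2 * length(coef(m_wt))
print(aic_manual)
```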