Logistic Regression Flashcards

1
Q

What is logistic regression?

A

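Logistic regression (LR) is a classification method for binary outcomes. It models the probability that the outcome variable is "true" (y = 1):
P(y = 1) = 1 / (1 + e^-(B0 + B1x1 + B2x2 + ... + Bkxk))

Odds = P(y = 1) / P(y = 0) = e^(B0 + B1x1 + B2x2 + ... + Bkxk)

log(Odds) = B0 + B1x1 + B2x2 + ... + Bkxk
log(Odds) is called the logit. The bigger the logit, the bigger P(y = 1).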
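
As a quick numeric check of these three formulas, here is a minimal R sketch. The coefficient values and the single input x1 are hypothetical, chosen only for illustration.

```r
# Hypothetical coefficients and input (not from any fitted model).
b0 <- -1.5
b1 <- 0.8
x1 <- 2

logit <- b0 + b1 * x1           # log(Odds) = B0 + B1x1
odds  <- exp(logit)             # Odds = e^(B0 + B1x1)
p     <- 1 / (1 + exp(-logit))  # P(y = 1)

# The forms agree: p = odds / (1 + odds), and plogis() is R's built-in logistic function.
all.equal(p, odds / (1 + odds))
all.equal(p, plogis(logit))
```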

2
Q

How do you calculate the baseline model for a binary classification problem?

A

Predict the most frequently occurring outcome for every observation. The baseline accuracy is that outcome's share of the data.
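
A minimal R sketch of the baseline calculation, assuming a made-up outcome vector y (1 = poor care, 0 = good care):

```r
# Hypothetical outcome vector, for illustration only.
y <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0)

counts <- table(y)                                   # frequency of each class
baseline_class    <- names(counts)[which.max(counts)] # most frequent class
baseline_accuracy <- max(counts) / length(y)          # accuracy of always predicting it

baseline_class
baseline_accuracy
```

Any model worth keeping should beat this accuracy on the same data.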

3
Q

How do you create a logistic regression model in R?

A

# poorCare is the outcome; "." uses every other column of Train as a predictor.
QualityLog <- glm(poorCare ~ ., family = binomial, data = Train)

4
Q

What is AIC?

A

It's a measure of model quality that penalizes the number of parameters, similar in spirit to adjusted R-squared. It can only be compared between models fit on the same dataset. A lower AIC is better.
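
To illustrate, the sketch below compares two models fit on the same simulated data and recomputes AIC by hand as 2k - 2 * log-likelihood. The data frame sim and its columns are assumptions made up for this example.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                               # noise: unrelated to y by construction
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))
sim <- data.frame(y, x1, x2)

small <- glm(y ~ x1,      family = binomial, data = sim)
large <- glm(y ~ x1 + x2, family = binomial, data = sim)

AIC(small, large)  # same dataset, so the values are comparable; lower is better

# AIC = 2k - 2 * log-likelihood, with k the number of estimated coefficients.
k <- length(coef(small))
manual_aic <- 2 * k - 2 * as.numeric(logLik(small))
all.equal(manual_aic, AIC(small))
```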

5
Q

How do you create predictions in logistic regression?

A

predictTrain <- predict(QualityLog, type = "response")  # type = "response" gives probabilities
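
A self-contained sketch of fitting and predicting, reusing the card's names (Train, poorCare, QualityLog, predictTrain). The simulated data, the predictor columns x1 and x2, and the newPatients data frame are assumptions for illustration.

```r
set.seed(42)
# Simulated training data; x1 and x2 are hypothetical predictors.
Train <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
Train$poorCare <- rbinom(100, 1, plogis(-1 + 1.5 * Train$x1))

QualityLog <- glm(poorCare ~ ., family = binomial, data = Train)

predictTrain <- predict(QualityLog, type = "response")  # probabilities
logitTrain   <- predict(QualityLog, type = "link")      # default scale: log-odds

# New observations need a data frame with the same predictor columns.
newPatients <- data.frame(x1 = c(-1, 0, 2), x2 = c(0, 0, 0))
predict(QualityLog, newdata = newPatients, type = "response")
```

Without type = "response", predict() returns log-odds, which is an easy mistake to make when applying a threshold.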

6
Q

Threshold value (t)

A
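The model predicts 1 when P(y = 1) > t and 0 otherwise. The threshold is often selected based on which errors are "better" (less costly).
High threshold (close to 1): rarely predicts positive, so it makes few false positive errors but misses more actual positive cases (high specificity, low sensitivity).
Low threshold (close to 0): predicts positive often, so it catches most actual positive cases but classifies more negative cases as positive (high sensitivity, low specificity).
Use 0.5 as the threshold if you have no preference between the two kinds of error.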
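
The trade-off can be seen by counting errors at a few thresholds. This is a sketch on made-up probabilities and outcomes; count_errors is a hypothetical helper.

```r
# Hypothetical predicted probabilities and true outcomes.
prob   <- c(0.05, 0.20, 0.35, 0.45, 0.55, 0.60, 0.75, 0.90)
actual <- c(0,    0,    1,    0,    0,    1,    1,    1)

# Count false positives and false negatives at one threshold.
count_errors <- function(threshold) {
  predicted <- as.integer(prob > threshold)
  c(threshold = threshold,
    FP = sum(predicted == 1 & actual == 0),
    FN = sum(predicted == 0 & actual == 1))
}

errors <- t(sapply(c(0.1, 0.5, 0.85), count_errors))
errors  # FP falls and FN rises as the threshold increases
```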
7
Q

Confusion matrix for selecting threshold

A
              Predicted = 0           Predicted = 1
Actual = 0    True Negatives (TN)     False Positives (FP)
Actual = 1    False Negatives (FN)    True Positives (TP)
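
In R, table() produces this layout directly, with actual values in rows and predictions in columns. The sketch below uses made-up data and a 0.5 threshold; the explicit factor levels are a precaution so the matrix stays 2 x 2 even if one class is never predicted.

```r
# Hypothetical true outcomes and predicted probabilities.
actual <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
prob   <- c(0.8, 0.3, 0.6, 0.4, 0.9, 0.1, 0.2, 0.7, 0.55, 0.45)

predicted <- factor(as.integer(prob > 0.5), levels = c(0, 1))
confusion <- table(Actual = factor(actual, levels = c(0, 1)), Predicted = predicted)
confusion

TN <- confusion["0", "0"]
FP <- confusion["0", "1"]
FN <- confusion["1", "0"]
TP <- confusion["1", "1"]
```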
8
Q

Sensitivity and Specificity

A
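Sensitivity = TP / (TP + FN), the share of actual positives predicted as positive (true positive rate)
Specificity = TN / (TN + FP), the share of actual negatives predicted as negative (true negative rate)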
9
Q

How do you select a good threshold value?

A

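Use an ROC curve, which plots the true positive rate against the false positive rate at every threshold (in R, you can use the ROCR package). Pick the threshold with the best trade-off for your problem.

table(Train$poorCare, predictTrain > 0.5)  # confusion matrix at threshold 0.5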
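
The points of an ROC curve can also be computed by hand. The sketch below uses base R rather than ROCR, on made-up data, and simply sweeps the threshold from 0 to 1.

```r
# Hypothetical true outcomes and predicted probabilities.
actual <- c(0, 0, 1, 0, 0, 1, 1, 1)
prob   <- c(0.05, 0.20, 0.35, 0.45, 0.55, 0.60, 0.75, 0.90)

thresholds <- seq(0, 1, by = 0.1)
roc <- data.frame(
  threshold = thresholds,
  # Share of actual positives predicted positive (sensitivity).
  tpr = sapply(thresholds, function(th) mean(prob[actual == 1] > th)),
  # Share of actual negatives predicted positive (1 - specificity).
  fpr = sapply(thresholds, function(th) mean(prob[actual == 0] > th))
)
roc

# plot(roc$fpr, roc$tpr, type = "b")  # uncomment to draw the curve
```

Each row is one candidate threshold, so you can read off the sensitivity you gain and the false positive rate you pay for it.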

10
Q

Area Under the (ROC) Curve

A

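Given one randomly chosen positive case and one randomly chosen negative case, the AUC is the proportion of the time the model correctly guesses which is which (gives the positive case the higher probability). A perfect model scores 1; random guessing scores 0.5.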
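
This pairwise definition translates directly into code. The sketch below compares every positive with every negative on made-up data and counts ties as half, which is a common convention rather than something stated on the card.

```r
# Hypothetical true outcomes and predicted probabilities.
actual <- c(0, 0, 1, 0, 0, 1, 1, 1)
prob   <- c(0.05, 0.20, 0.35, 0.45, 0.55, 0.60, 0.75, 0.90)

pos <- prob[actual == 1]
neg <- prob[actual == 0]

# One entry per (positive, negative) pair: 1 if ranked correctly, 0.5 on a tie, else 0.
pairs <- outer(pos, neg, FUN = function(p, n) (p > n) + 0.5 * (p == n))
auc <- mean(pairs)
auc
```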

11
Q

Outcome measures of Logistic regression.

A
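N = TN + FP + FN + TP (total number of observations)
Overall accuracy = (TN + TP) / N
Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP)
Overall error rate = (FP + FN) / N
False negative error rate = FN / (TP + FN) = 1 - Sensitivity
False positive error rate = FP / (TN + FP) = 1 - Specificity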
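
A worked example of these measures in R, using made-up confusion-matrix counts:

```r
# Hypothetical confusion-matrix counts.
TN <- 70; FP <- 10; FN <- 5; TP <- 15
N  <- TN + FP + FN + TP

accuracy    <- (TN + TP) / N
error_rate  <- (FP + FN) / N
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
fn_rate     <- FN / (TP + FN)
fp_rate     <- FP / (TN + FP)

round(c(accuracy = accuracy, error_rate = error_rate,
        sensitivity = sensitivity, specificity = specificity,
        fn_rate = fn_rate, fp_rate = fp_rate), 3)
```

Accuracy and error rate sum to 1, as do each rate and its complement, which is a handy sanity check.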
12
Q

How do you interpret coefficients in logistic regression?

A

1) If a variable has coefficient c, the log odds (the logit) increase by c for a unit increase in that variable, holding the other variables constant.
2) If a variable has coefficient c, the odds are multiplied by e^c for a unit increase in that variable, holding the other variables constant.
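
Both statements can be checked numerically on a fitted model. The sketch below uses simulated data with a single hypothetical predictor x; odds_at is a helper defined here for illustration.

```r
set.seed(7)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(0.3 + 0.9 * x))
fit <- glm(y ~ x, family = binomial)

c_x <- unname(coef(fit)["x"])  # estimated coefficient c for x

# Predicted odds P(y = 1) / P(y = 0) at a given value of x.
odds_at <- function(value) {
  p <- predict(fit, newdata = data.frame(x = value), type = "response")
  unname(p / (1 - p))
}

odds_at(1) / odds_at(0)            # odds ratio for a unit increase in x
exp(c_x)                           # the same number: e^c
log(odds_at(3)) - log(odds_at(2))  # log odds increase by c
```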
