Logistic Regression Flashcards
What is logistic regression?
Logistic regression (LR) is a classification method for binary outcomes. It models the probability that the outcome variable is "true" (y = 1).
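$$P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}}$$
$$\text{Odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}$$
$$\log(\text{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$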
The larger the logit, log(Odds), the larger P(y = 1).
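A small numeric check of the three formulas can make the relationship concrete. This is only a sketch: the coefficient values and the observation below are made up for illustration and do not come from any dataset in these cards.

```r
# Hypothetical coefficients and one observation (illustrative values only)
beta0 <- -1.5
beta1 <- 0.8
x1    <- 2

logit <- beta0 + beta1 * x1      # linear predictor, i.e. log(Odds)
odds  <- exp(logit)              # Odds = e^logit
p     <- 1 / (1 + exp(-logit))   # P(y = 1)

# plogis() is base R's logistic function and should agree with p
all.equal(p, plogis(logit))

# The odds can also be recovered from the probability
all.equal(odds, p / (1 - p))
```

Raising x1 raises the logit (because beta1 is positive here), which in turn raises both the odds and P(y = 1).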
How do you compute the baseline model for a binary classification problem?
Predict the most frequent outcome for every observation. The baseline accuracy is the share of observations that have that outcome.
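As a rough illustration, the baseline can be read off a frequency table. The outcome vector below is hypothetical (1 = poor care, 0 = good care).

```r
# Hypothetical outcome vector: 1 = poor care, 0 = good care
outcome <- c(0, 0, 1, 0, 1, 0, 0, 1, 0, 0)

counts <- table(outcome)                             # frequency of each class
baseline_class    <- names(counts)[which.max(counts)] # most frequent class
baseline_accuracy <- max(counts) / length(outcome)    # accuracy of always predicting it

baseline_class
baseline_accuracy
```

Any model worth keeping should beat this accuracy on the same data.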
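How do you create a logistic regression model in R?
QualityLog <- glm(poorCare ~ ., family = binomial, data = Train)  # poorCare is the 0/1 outcome; "." uses every other column of Train as a predictor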
What is AIC?
AIC (Akaike Information Criterion) is a measure of model quality that, like adjusted R², penalizes the number of variables. It can only be compared between models fit on the same dataset. Lower AIC is better.
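A minimal sketch of comparing AIC between two models, assuming synthetic data generated on the spot. The column names visits and noise, and the coefficients used to simulate the outcome, are invented for this example.

```r
set.seed(1)  # reproducible synthetic data (hypothetical, not a real dataset)
n <- 200
Train <- data.frame(visits = rnorm(n), noise = rnorm(n))
Train$poorCare <- rbinom(n, 1, plogis(-0.5 + 1.2 * Train$visits))

# Two candidate models fit on the SAME data
m1 <- glm(poorCare ~ visits,         family = binomial, data = Train)
m2 <- glm(poorCare ~ visits + noise, family = binomial, data = Train)

# AIC() extracts the criterion; summary(m1) reports the same value
AIC(m1)
AIC(m2)

# For a 0/1 outcome, AIC = residual deviance + 2 * (number of coefficients)
all.equal(AIC(m1), deviance(m1) + 2 * length(coef(m1)))
```

The model with the lower AIC is preferred, but the comparison is only meaningful because both models were fit to the same rows.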
How do you generate predictions from a logistic regression model?
predictTrain <- predict(QualityLog, type = "response")  # type = "response" returns probabilities
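The sketch below shows what type = "response" changes, using a model fit to synthetic data. The predictor name x and the simulated coefficients are assumptions for illustration only.

```r
set.seed(2)  # reproducible synthetic data (hypothetical)
n <- 100
Train <- data.frame(x = rnorm(n))
Train$poorCare <- rbinom(n, 1, plogis(0.3 + 1.0 * Train$x))
QualityLog <- glm(poorCare ~ x, family = binomial, data = Train)

predictTrain <- predict(QualityLog, type = "response")  # probabilities in (0, 1)
predictLink  <- predict(QualityLog)                     # default type = "link": log-odds

# Probabilities are the logistic transform of the log-odds
all.equal(predictTrain, plogis(predictLink))

# Predictions for new observations go through the newdata argument
predict(QualityLog, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
```

Without type = "response", predict() returns values on the logit scale, which cannot be compared to a probability threshold directly.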
Threshold value (t)
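The threshold t turns a predicted probability into a class: predict 1 if P(y = 1) > t, otherwise predict 0. It is often selected based on which errors are "better". High threshold (close to 1): predicts positive rarely, so there are few false positives but some positive cases are missed (high specificity, low sensitivity). Low threshold (close to 0): predicts positive often, so most positive cases are caught but some negative cases are classified as positive (high sensitivity, low specificity). Use t = 0.5 if you have no preference.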
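The trade-off can be seen by counting errors at a few thresholds. The outcomes and predicted probabilities below are hypothetical, and count_errors is a helper written for this example.

```r
# Hypothetical actual outcomes and predicted probabilities
actual <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
prob   <- c(0.05, 0.15, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95)

# Count false positives and false negatives at threshold t
count_errors <- function(t) {
  predicted <- as.integer(prob > t)
  c(threshold = t,
    FP = sum(predicted == 1 & actual == 0),
    FN = sum(predicted == 0 & actual == 1))
}

# One row per threshold
errors <- t(sapply(c(0.2, 0.5, 0.8), count_errors))
errors
```

On this toy data, moving the threshold from 0.2 to 0.8 removes the false positives and introduces false negatives.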
Confusion matrix for selecting threshold
              Predicted = 0           Predicted = 1
Actual = 0    True Negatives (TN)     False Positives (FP)
Actual = 1    False Negatives (FN)    True Positives (TP)
Sensitivity and Specificity
Sensitivity (true positive rate) = TP / (TP + FN)
Specificity (true negative rate) = TN / (TN + FP)
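How do you select a good threshold value?
Use the ROC curve, which plots the true positive rate against the false positive rate across all thresholds (in R you can use the ROCR package). To inspect the confusion matrix at a single threshold, for example 0.5:
table(Train$poorCare, predictTrain > 0.5)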
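To show what an ROC curve contains, the points can be computed by hand in base R. This is a sketch on hypothetical data and is not how the ROCR package is implemented; ROCR produces the same kind of curve with less code.

```r
# Hypothetical actual outcomes and predicted probabilities
actual <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
prob   <- c(0.05, 0.15, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95)

thresholds <- seq(0, 1, by = 0.1)

# One ROC point per threshold: share of positives / negatives predicted as 1
roc <- data.frame(
  threshold = thresholds,
  tpr = sapply(thresholds, function(t) mean(prob[actual == 1] > t)),  # sensitivity
  fpr = sapply(thresholds, function(t) mean(prob[actual == 0] > t))   # 1 - specificity
)
roc

# Uncomment to draw the curve:
# plot(roc$fpr, roc$tpr, type = "b",
#      xlab = "False positive rate", ylab = "True positive rate")
```

A threshold of 0 sits at the top-right corner (everything predicted positive) and a threshold of 1 at the bottom-left (nothing predicted positive). A good threshold is a point on the curve whose mix of the two rates suits the problem.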
Area Under the (ROC) Curve
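Given one randomly chosen positive case and one randomly chosen negative case, the AUC is the proportion of the time the model correctly guesses which is which (it gives the positive case the higher predicted probability). A perfect model has AUC = 1; random guessing gives 0.5.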
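The pairwise reading of AUC can be computed directly. This is a sketch on hypothetical data, and counting ties as half a correct guess is a common convention assumed here.

```r
# Hypothetical actual outcomes and predicted probabilities
actual <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
prob   <- c(0.05, 0.15, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95)

pos <- prob[actual == 1]   # predicted probabilities of the positive cases
neg <- prob[actual == 0]   # predicted probabilities of the negative cases

# Compare every positive with every negative:
# 1 if the positive is ranked higher, 0.5 for a tie, 0 otherwise
wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")

auc <- mean(wins)
auc
```

Here 22 of the 25 positive/negative pairs are ranked correctly, so the AUC is 0.88.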
Outcome measures of logistic regression
Overall accuracy = (TN + TP) / N
Sensitivity and Specificity
Overall error rate = (FP + FN) / N
False negative error rate = FN / (TP + FN)
False positive error rate = FP / (TN + FP)
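All of these measures follow from the four cells of the confusion matrix. The sketch below builds the matrix with table() on hypothetical data and a threshold of 0.5, then applies the formulas.

```r
# Hypothetical actual outcomes and predicted probabilities
actual <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
prob   <- c(0.05, 0.15, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95)

cm <- table(actual, prob > 0.5)   # rows: actual 0/1, columns: FALSE/TRUE
TN <- cm["0", "FALSE"]
FP <- cm["0", "TRUE"]
FN <- cm["1", "FALSE"]
TP <- cm["1", "TRUE"]
N  <- sum(cm)

accuracy    <- (TN + TP) / N
error_rate  <- (FP + FN) / N
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
fn_rate     <- FN / (TP + FN)   # equals 1 - sensitivity
fp_rate     <- FP / (TN + FP)   # equals 1 - specificity

c(accuracy = accuracy, error_rate = error_rate,
  sensitivity = sensitivity, specificity = specificity,
  fn_rate = fn_rate, fp_rate = fp_rate)
```

Accuracy and error rate always sum to 1, as do sensitivity and the false negative rate, and specificity and the false positive rate.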
How to interpret coefficients in logistic regression?
1) If we have a coefficient c for a variable, then that means the log odds (or Logit) are increased by c for a unit increase in the variable.
2) If we have a coefficient c for a variable, then that means the odds are multiplied by e^c for a unit increase in the variable.
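Both readings can be verified numerically. The intercept and coefficient below are hypothetical, and odds_at is a helper defined for this example.

```r
# Hypothetical intercept and coefficient c for a single variable
beta0  <- -2
c_coef <- 0.7

# Odds as a function of the variable's value x
odds_at <- function(x) exp(beta0 + c_coef * x)

# 1) A unit increase in x adds c to the log odds
log(odds_at(3)) - log(odds_at(2))

# 2) A unit increase in x multiplies the odds by e^c
odds_at(3) / odds_at(2)
exp(c_coef)
```

For a fitted model, exp(coef(model)) gives these multiplicative factors for every coefficient at once, each read with the other variables held fixed.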