Logistic Regression Flashcards
Disadvantage of a linear model for a binary response?
Predicted probabilities can fall below 0 or above 1.
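A minimal R sketch with a small made-up data set (x and y are illustrative only). A straight-line fit to a 0/1 response gives fitted values outside [0, 1], while a logistic fit with glm() keeps them inside.

```r
# Made-up illustrative data: binary response y, predictor x
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)

# Linear probability model: ordinary least squares on the 0/1 response
lin_fit <- lm(y ~ x)
range(fitted(lin_fit))  # lowest fitted value is below 0, highest is above 1

# Logistic regression: fitted probabilities stay strictly between 0 and 1
logit_fit <- glm(y ~ x, family = binomial)
range(fitted(logit_fit))
```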
What does logit(p) equal?
ln(p/(1 − p)) = β0 + β1·x (β1 is the change in the log-odds when x increases by one unit)
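A quick numerical check of the logit relationship, using hypothetical coefficients β0 = −2 and β1 = 0.5 chosen only for illustration. plogis() is base R's inverse logit, p = e^η / (1 + e^η).

```r
b0 <- -2    # hypothetical intercept
b1 <- 0.5   # hypothetical slope
x <- c(0, 1, 2, 3)

eta <- b0 + b1 * x   # linear predictor = log-odds
p <- plogis(eta)     # inverse logit: exp(eta) / (1 + exp(eta))

log_odds <- log(p / (1 - p))  # logit(p) recovers the linear predictor
diff(log_odds)                # each one-unit step in x adds b1 = 0.5 to the log-odds
```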
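Intercept on the odds scale?
e^β0 (the odds that Y = 1 when x = 0)
Slope on the odds scale?
e^β1 (the odds ratio: the odds that Y = 1 are multiplied by e^β1 for each one-unit increase in x)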
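In R, exponentiating the coefficients of a fitted glm() gives these odds-scale quantities. A sketch on made-up data (illustrative only), with a direct check of the odds-ratio reading:

```r
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)  # made-up binary response
fit <- glm(y ~ x, family = binomial)

exp(coef(fit))  # (Intercept): odds of y = 1 at x = 0; x: odds ratio per unit of x

# Odds at x = 3 and x = 4 from predicted probabilities
p <- predict(fit, newdata = data.frame(x = c(3, 4)), type = "response")
odds <- p / (1 - p)
odds[2] / odds[1]  # matches exp(coef(fit)["x"])
```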
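Can the estimated β1 be interpreted as the change in the probability that Y = 1 associated with a one-unit change in X?
No. β1 is the change in the log-odds; p(X) is not linear in X, so how much p(X) changes for a one-unit increase in X depends on the current value of X.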
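A sketch with hypothetical coefficients β0 = −2 and β1 = 0.5: the log-odds rise by β1 at every step, but the change in probability depends on the starting value of x.

```r
b0 <- -2   # hypothetical intercept
b1 <- 0.5  # hypothetical slope
p <- function(x) plogis(b0 + b1 * x)  # P(Y = 1 | x)

p(1) - p(0)  # about 0.063: change in probability from x = 0 to x = 1
p(5) - p(4)  # about 0.122: same one-unit step, larger change near p = 0.5

qlogis(p(5)) - qlogis(p(4))  # change in log-odds is always b1 = 0.5
```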
Sensitivity?
TP/P. Emphasize it when false negatives are more costly than false positives. Raise sensitivity by classifying more observations as 'Yes' (lower the probability threshold): fewer FN but more FP, so specificity drops.
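A sketch of the threshold trade-off using made-up predicted probabilities and true labels: lowering the cut-off from 0.5 to 0.25 labels more observations 'Yes', which raises sensitivity and lowers specificity.

```r
prob <- c(0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35)  # made-up P(Yes) values
y    <- c(1, 1, 1, 1, 0, 0, 0, 0)                  # true classes (1 = Yes)

sens_spec <- function(threshold) {
  yhat <- as.integer(prob >= threshold)  # classify as Yes at or above the cut-off
  c(sensitivity = sum(yhat == 1 & y == 1) / sum(y == 1),
    specificity = sum(yhat == 0 & y == 0) / sum(y == 0))
}

sens_spec(0.5)   # sensitivity 0.50, specificity 0.75
sens_spec(0.25)  # sensitivity 1.00, specificity 0.50
```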
True positive rate?
TP/P (= sensitivity = 1 − Type II error rate)
False positive rate?
FP/N (= 1 − specificity = Type I error rate)
Positive predictive value?
TP/P̂ (precision; P̂ = number of predicted positives)
Negative predictive value?
TN/N̂ (N̂ = number of predicted negatives)
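The rates above, computed from made-up true and predicted classes. P and N count actual positives and negatives; P̂ and N̂ count predicted positives and negatives.

```r
y    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)  # made-up true classes
yhat <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)  # made-up predicted classes

TP <- sum(yhat == 1 & y == 1)  # 3
FN <- sum(yhat == 0 & y == 1)  # 1
FP <- sum(yhat == 1 & y == 0)  # 1
TN <- sum(yhat == 0 & y == 0)  # 5

table(predicted = yhat, actual = y)  # confusion matrix

rates <- c(tpr = TP / (TP + FN),   # sensitivity, TP / P     = 0.75
           fpr = FP / (FP + TN),   # 1 - specificity, FP / N = 1/6
           ppv = TP / (TP + FP),   # precision, TP / P-hat   = 0.75
           npv = TN / (TN + FN))   # TN / N-hat              = 5/6
rates
```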
What does the ROC (receiver operating characteristic) curve trace out?
The true positive rate (y-axis) against the false positive rate (x-axis) as the probability threshold varies from 0 to 1.
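A base-R sketch of how the ROC curve is traced: sweep the threshold from 0 to 1 and record the (FPR, TPR) pair at each step. The probabilities and labels are made up for illustration.

```r
prob <- c(0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35)  # made-up P(Yes)
y    <- c(1, 1, 1, 1, 0, 0, 0, 0)                  # true classes

thresholds <- seq(0, 1, by = 0.05)
roc <- data.frame(
  threshold = thresholds,
  # share of actual negatives classified Yes at each threshold
  fpr = sapply(thresholds, function(t) sum(prob >= t & y == 0) / sum(y == 0)),
  # share of actual positives classified Yes at each threshold
  tpr = sapply(thresholds, function(t) sum(prob >= t & y == 1) / sum(y == 1))
)

head(roc)  # threshold 0: every case is Yes, so fpr = tpr = 1
tail(roc)  # threshold near 1: no case is Yes, so fpr = tpr = 0
```

Plotting `roc$tpr` against `roc$fpr` draws the curve.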
AUC is the area under the ROC curve. What does it measure?
The overall performance of a classifier across all thresholds (maximum AUC = 1); the larger the AUC, the better the classifier.
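AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive case gets a higher predicted probability than a randomly chosen negative case, with ties counted as one half. A sketch on made-up values:

```r
prob <- c(0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35)  # made-up P(Yes)
y    <- c(1, 1, 1, 1, 0, 0, 0, 0)                  # true classes

auc <- function(prob, y) {
  d <- outer(prob[y == 1], prob[y == 0], "-")  # every positive-negative pair
  mean((d > 0) + 0.5 * (d == 0))               # share of pairs ranked correctly
}

auc(prob, y)                               # 13 of 16 pairs ordered correctly: 0.8125
auc(c(0.9, 0.8, 0.2, 0.1), c(1, 1, 0, 0))  # perfect separation: 1
```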
What is the chance line?
The 45-degree diagonal traced by a classifier that guesses at random (TPR = FPR at every threshold); its AUC = 0.5. A useful classifier should lie above this line.
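A sketch of the chance line in action: scores drawn at random, independently of the true class, give an AUC close to 0.5. The simulated data are purely illustrative; the rank (Mann-Whitney) formula used here is equivalent to counting correctly ordered positive-negative pairs.

```r
set.seed(1)
n <- 4000
y <- rbinom(n, 1, 0.5)  # simulated true classes
score <- runif(n)       # random guess: score unrelated to y

# Rank-based AUC; average ranks from rank() count ties as one half
auc_rank <- function(score, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  r <- rank(score)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(score, y)  # close to 0.5
```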
For cross-validation with a classifier, what is used instead of the MSE?
The number of misclassified observations (often reported as the misclassification rate).
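A sketch of 5-fold cross-validation for a logistic regression in base R, scoring each held-out fold by its number of misclassified observations. The simulated data, k = 5, and the 0.5 cut-off are assumptions chosen for illustration.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 2 * x))  # simulated binary response
dat <- data.frame(x = x, y = y)

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment

errors <- sapply(1:k, function(f) {
  train <- dat[folds != f, ]
  test  <- dat[folds == f, ]
  fit <- glm(y ~ x, data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")
  sum((p > 0.5) != test$y)  # misclassified observations in this fold
})

cv_error_rate <- sum(errors) / n  # overall misclassification rate
cv_error_rate
```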
Recoding the factor response as 0/1 and fitting a linear regression (some fitted probabilities come out negative, so this approach is best avoided):
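library(ISLR)  # assumes the ISLR package, which provides the Default data set
Default$default_yes = ifelse(Default$default == "Yes", 1, 0)  # recode the factor as 0/1
lm_fit = lm(default_yes ~ balance, data = Default)
summary(lm_fit)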