logistic regression Flashcards
DV in logistic regression is what type of variable
binary
baseline model predicts what
the most common outcome, not necessarily the value we are after
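A minimal sketch of the baseline idea, assuming a made-up list of 0/1 outcomes. The helper name `baseline_prediction` and the data are hypothetical, for illustration only.

```python
from collections import Counter

def baseline_prediction(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Hypothetical outcomes: 1 = event of interest, 0 = no event
y = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
label, accuracy = baseline_prediction(y)
print(label, accuracy)
```

Any fitted model has to beat this accuracy on the same data to be worth using.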
a categorical variable is one that takes
a small number of possible outcomes
values of logistic vs linear regression
logistic = between 0 and 1
linear = between negative infinity and infinity
if linear regression y = 0 logistic will =
0.5
higher than 0 in linear regression = in logistic regression
greater than 0.5
in the form of a prediction if linear regression y > 0 logistic =
1
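The cards above can be checked with a small sketch of the logistic (sigmoid) function, which maps the linear score to a probability. The helper names and the example scores are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Map a linear score z in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    """Predict 1 when the probability exceeds the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0

# A score of 0 gives probability 0.5; positive scores give more than 0.5
for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_class(z))
```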
logistic is nonlinear; how to get back to a linear form
use odds: the log of the odds (logit) is linear in the predictors
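A short sketch of why the odds help, assuming a one-predictor model with made-up coefficients `b0` and `b1`: taking the log of the odds of the predicted probability recovers the linear score.

```python
import math

def sigmoid(z):
    """Logistic function: linear score to probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for a one-predictor model: z = b0 + b1 * x
b0, b1 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    z = b0 + b1 * x
    p = sigmoid(z)
    # log_odds(p) matches z up to floating-point rounding
    print(x, round(p, 3), round(log_odds(p), 3))
```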
although the y value is 0 or 1, the outcome variable output will be
a probability between 0 and 1; the output is a probability, not 0 or 1 but between them
when we build a logistic regression model y will take a value of 0 or 1 but the outcome variable will be
a continuous score/probability between 0 and 1
output is
a probability, not 0 or 1 but between them
create model using what set and evaluate on what set
training, testing
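A minimal sketch of a train/test split, assuming the observations sit in a plain list. `split_train_test` is a hypothetical helper written for this card, not a library function, and the 25% test fraction is an arbitrary choice.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))  # stand-in for 20 observations
train, test = split_train_test(rows)
print(len(train), len(test))
```

The model is fitted on `train` only; `test` is held back so the evaluation uses data the model has not seen.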
selecting a threshold of 0.5 predicts
the most likely outcome
ROC helps us do what, and what are its axes
pick a threshold; TP rate (sensitivity) on the y axis, FP rate (1 - specificity) on the x axis
ROC captures
all thresholds simultaneously
high thresholds have
high specificity and low sensitivity
low thresholds have
low specificity and high sensitivity
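The trade-off can be seen in a small sketch, assuming made-up labels and predicted probabilities. `sensitivity_specificity` is a hypothetical helper that predicts 1 whenever the probability is at or above the threshold.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when predicting 1 if prob >= threshold."""
    tp = fn = tn = fp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]

for t in (0.2, 0.5, 0.8):
    sens, spec = sensitivity_specificity(y_true, y_prob, t)
    print(t, round(sens, 3), round(spec, 3))
```

Each threshold gives one point on the ROC curve at (1 - specificity, sensitivity); sweeping the threshold from 1 down to 0 traces the whole curve.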
2 main ways to evaluate the model
AUC and accuracy
what are some issues with accuracy, sensitivity, and specificity
depend on the threshold
AUC is the area under the ROC curve; what is the interpretation
given a random positive and a random negative case, the proportion of the time the model correctly guesses which is which
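A sketch of that interpretation, assuming made-up labels and probabilities: compare every positive case with every negative case and count how often the positive gets the higher score. `auc_by_pairs` is a hypothetical helper; counting ties as half is a common convention, assumed here.

```python
def auc_by_pairs(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))

# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
print(round(auc_by_pairs(y_true, y_prob), 3))
```

No threshold appears anywhere in the calculation, which is why AUC does not depend on the threshold the way accuracy does.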
AUC is less affected by what than accuracy
sample balance
maximum of AUC
1 = perfect prediction: sensitivity reaches 1 while the false positive rate stays at 0
minimum AUC
0.5 -> just guessing (the practical minimum; a value below 0.5 means the predictions are ranked backwards)
when deciding which model is best use
accuracy, sensitivity and specificity; calculate gains and losses from the model relative to the baseline model
the box plot is predicting
average scores for patients, not probability