Week 6 DSE Flashcards
What do we do before we create a machine learning model?
Visualise data
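How do we use box plots to tell which variables are important?
Look at the median of the variable in each class of the response
The medians must be far apart (ideally with little overlap between the boxes)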
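A rough sketch of this check in base R, on simulated data. The data frame `toy` and its columns are made up for illustration: `balance` is built to differ between classes, `noise` is not.

```r
# Simulated data: `toy`, `balance` and `noise` are hypothetical names.
set.seed(1)
n <- 200
toy <- data.frame(
  default = factor(rep(c("No", "Yes"), each = n / 2)),
  balance = c(rnorm(n / 2, mean = 800, sd = 300),    # lower for "No"
              rnorm(n / 2, mean = 1700, sd = 300)),  # higher for "Yes"
  noise   = rnorm(n, mean = 50, sd = 10)             # same in both classes
)

# One box plot per candidate predictor, split by the response.
par(mfrow = c(1, 2))
boxplot(balance ~ default, data = toy, main = "balance")  # medians far apart
boxplot(noise ~ default, data = toy, main = "noise")      # medians close together

# The same comparison numerically: median of each variable per class.
med_balance <- tapply(toy$balance, toy$default, median)
med_noise   <- tapply(toy$noise, toy$default, median)
```

Here `balance` would be kept as a likely useful predictor, while `noise` probably adds little.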
What must we do whenever we have a categorical variable?
Convert it to numeric (e.g. a 0/1 dummy variable)
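A small sketch of the conversion in base R. The vector `student` is a made-up example. Note that lm() and glm() build these dummy columns automatically when the variable is stored as a factor.

```r
# Hypothetical categorical variable with two levels.
student <- factor(c("No", "Yes", "Yes", "No"))

# Manual 0/1 dummy: 1 if "Yes", 0 if "No".
student_num <- as.integer(student == "Yes")

# model.matrix() shows the dummy column R would create inside lm()/glm().
mm <- model.matrix(~ student)
```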
What is the range of the output of the logistic model?
A probability between 0 and 1
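To see why the output stays between 0 and 1: the linear part b0 + b1*x can be any real number, and the logistic function squeezes it into (0, 1). A minimal sketch using base R's plogis(); the coefficient values are made up.

```r
# Made-up coefficients, for illustration only.
b0 <- -10
b1 <- 0.005
balance <- c(0, 1000, 2000, 3000)

# Linear predictor (log-odds): can be any real number.
log_odds <- b0 + b1 * balance

# Logistic function: p = exp(log_odds) / (1 + exp(log_odds)).
# plogis() computes exactly this.
p <- plogis(log_odds)
```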
More often than not we care about the ______ and ______ of the slope, b1.
sign
relative magnitude
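When do we use t-values and when z-values, and why? (Linear/multiple regression vs logistic)
z-values: logistic regression (glm with family = binomial). You CAN'T use the least squares method. You use maximum likelihood, and the estimated coefficients are approximately normally distributed in large samples
t-values: linear and multiple linear regression (lm), because you use the least squares method and the error variance is estimated from the data, so the statistic follows a t distribution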
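A quick way to see this in R is to look at the column names of the coefficient tables. The sketch below uses simulated data; all variable names are hypothetical.

```r
# Simulated data, for illustration only.
set.seed(2)
x <- rnorm(100)
y_num <- 1 + 2 * x + rnorm(100)                         # numeric response
y_bin <- rbinom(100, size = 1, prob = plogis(0.5 * x))  # 0/1 response

fit_lm  <- lm(y_num ~ x)                      # fitted by least squares
fit_glm <- glm(y_bin ~ x, family = binomial)  # fitted by maximum likelihood

# The coefficient tables report different test statistics.
lm_cols  <- colnames(coef(summary(fit_lm)))   # includes "t value"
glm_cols <- colnames(coef(summary(fit_glm)))  # includes "z value"
```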
What does glm stand for?
generalised LINEAR model
What must you do in R to specify a logistic model?
glm(default ~ balance, data = Default, family = binomial)
NEED TO SPECIFY family = binomial (if omitted, glm defaults to family = gaussian, which is ordinary linear regression)
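A sketch of what the family argument changes, using simulated data instead of the Default data set (the names `x`, `y` and the coefficients are made up).

```r
# Simulated 0/1 response, for illustration only.
set.seed(3)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 1.5 * x))

fit_logit   <- glm(y ~ x, family = binomial)  # logistic regression
fit_default <- glm(y ~ x)                     # family omitted

# Check which family each fit actually used.
fam_logit   <- family(fit_logit)$family
fam_default <- family(fit_default)$family

# type = "response" returns probabilities; the default is the log-odds scale.
p_hat <- predict(fit_logit, type = "response")
```

Forgetting the family does not raise an error, so it is worth checking `family(fit)` when results look odd.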
How do I represent scaling a certain independent variable in R?
Use I(): the arithmetic inside is evaluated as-is, instead of being read as formula operators
glm(default ~ balance + I(income/1000) + student, data = Default, family = binomial)
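A sketch of the effect of I() on simulated data (the `income` values and coefficients are made up). Dividing the predictor by 1000 should multiply its slope by 1000 and leave the fitted probabilities unchanged. Without I(), the `/` would be read as a formula operator rather than as division.

```r
# Simulated data, for illustration only.
set.seed(4)
n <- 500
income <- runif(n, min = 10000, max = 70000)
y <- rbinom(n, size = 1, prob = plogis(-2 + 0.00004 * income))

fit_raw    <- glm(y ~ income, family = binomial)            # income in dollars
fit_scaled <- glm(y ~ I(income / 1000), family = binomial)  # income in thousands

# Second coefficient is the slope in each model.
b_raw    <- unname(coef(fit_raw)[2])
b_scaled <- unname(coef(fit_scaled)[2])
```

The scaled slope is easier to read: it is the change in log-odds per extra 1000 of income.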
What is sensitivity?
measures a classifier’s ability to identify positive status
P(tested POSITIVE | actually positive) = TP / (TP + FN)
how good we are at identifying positive cases out of all that are actually positive
What is specificity?
measures a classifier’s ability to identify negative status
P(tested NEGATIVE | actually negative) = TN / (TN + FP)
how good we are at identifying negative cases out of all that are actually negative
Same as the true negative rate
What is the false positive rate (FPR)?
Fraction of cases that are ACTUALLY NEGATIVE, wrongly classified as POSITIVE (= 1 - specificity)
What is the true positive rate (TPR)?
Fraction of cases that are ACTUALLY POSITIVE, that are correctly classified as POSITIVE
Same as sensitivity
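The rates in the cards above can be computed by hand from the four confusion-matrix counts. A minimal sketch with made-up labels (1 = positive, 0 = negative):

```r
# Hypothetical true labels and predicted labels.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

TP <- sum(predicted == 1 & actual == 1)  # true positives
FN <- sum(predicted == 0 & actual == 1)  # false negatives
FP <- sum(predicted == 1 & actual == 0)  # false positives
TN <- sum(predicted == 0 & actual == 0)  # true negatives

sensitivity <- TP / (TP + FN)  # true positive rate
specificity <- TN / (TN + FP)  # true negative rate
fpr         <- FP / (FP + TN)  # false positive rate = 1 - specificity

# Same counts as a table (rows = predicted, columns = actual).
conf_mat <- table(predicted = predicted, actual = actual)
```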
What happens as the decision threshold increases?
FPR decreases. TPR decreases (fewer cases get classified as positive)
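A small sketch of this effect. The predicted probabilities and labels are made up, and `rates_at` is a hypothetical helper, not a library function.

```r
# Hypothetical predicted probabilities and true labels (1 = positive).
p_hat  <- c(0.95, 0.80, 0.65, 0.40, 0.70, 0.45, 0.30, 0.20, 0.10, 0.05)
actual <- c(1,    1,    1,    1,    0,    0,    0,    0,    0,    0)

# Classify as positive when p_hat >= threshold, then compute TPR and FPR.
rates_at <- function(threshold) {
  predicted <- as.integer(p_hat >= threshold)
  c(threshold = threshold,
    TPR = sum(predicted == 1 & actual == 1) / sum(actual == 1),
    FPR = sum(predicted == 1 & actual == 0) / sum(actual == 0))
}

# One row per threshold, in increasing order.
thresholds <- c(0.25, 0.50, 0.75)
rates <- t(sapply(thresholds, rates_at))
```

Reading `rates` from top to bottom, both the TPR and FPR columns go down as the threshold goes up.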
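What do we use to choose the optimal decision threshold?
Draw the ROC curve: plot TPR against FPR at different thresholds (one point per threshold)
A good threshold sits near the top-left corner (high TPR, low FPR). The area under the curve (AUC) summarises the classifier over all thresholds; a bigger AUC means a better model.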
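A sketch of an ROC curve and its AUC in base R, with made-up probabilities and labels. Picking the threshold that maximises TPR - FPR (Youden's J) is one common heuristic, used here as an assumption rather than as the course's prescribed rule.

```r
# Hypothetical predicted probabilities and true labels (1 = positive).
p_hat  <- c(0.95, 0.80, 0.65, 0.40, 0.70, 0.45, 0.30, 0.20, 0.10, 0.05)
actual <- c(1,    1,    1,    1,    0,    0,    0,    0,    0,    0)

# Sweep thresholds from high to low so the curve runs from (0,0) to (1,1).
thresholds <- c(Inf, sort(unique(p_hat), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(p_hat[actual == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(p_hat[actual == 0] >= t))

# ROC curve: TPR against FPR, one point per threshold.
plot(fpr, tpr, type = "l", xlab = "FPR", ylab = "TPR (sensitivity)",
     main = "ROC curve")
abline(0, 1, lty = 2)  # diagonal = random guessing

# Area under the curve by the trapezoid rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Assumed heuristic: threshold with the largest TPR - FPR.
best_threshold <- thresholds[which.max(tpr - fpr)]
```

When comparing two models, the same steps can be repeated for each set of predicted probabilities and the AUC values compared.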