03 Classification Flashcards
What does random_state do?
It seeds the random number generator, so the same value produces the same (reproducible) result on every run.
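A minimal sketch of the idea, using Python's standard random module rather than a scikit-learn estimator (the helper name shuffled_indices is hypothetical): the seed plays the role of random_state, and the same seed gives the same shuffle every time.

```python
import random

def shuffled_indices(n, seed):
    """Return the indices 0..n-1 shuffled by a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; `seed` acts like random_state
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)  # same seed, same order
```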
Does shuffling training data improve model performance?
It can. Shuffle the training set, because some models perform poorly when the instances are ordered (for example, many similar instances in a row).
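What is k-fold cross-validation?
It divides the dataset into k parts (folds). The model is trained on k-1 folds and validated on the remaining fold.
This is repeated k times, with a different fold held out each time, so every fold is used for validation exactly once. The k scores are then averaged.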
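A minimal sketch of the split in plain Python (the function name k_fold_indices is hypothetical, and the data is assumed to have been shuffled beforehand): each part is held out for validation exactly once while the other k-1 parts are used for training.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

In practice a model would be fitted on each train split and scored on the matching validation split, and the k scores averaged.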
What are true positives and true negatives?
True positive- the actual value is 1 and the model also predicted 1.
True negative- the actual value is 0 and the model also predicted 0.
What are false positives and false negatives?
False positive- the actual value is 0 but the model predicted 1.
False negative- the actual value is 1 but the model predicted 0.
Confusion matrix format
            Predicted 1   Predicted 0
Actual 1    TP            FN
Actual 0    FP            TN
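A small sketch that counts the four cells from two label lists and prints them in the layout of this card (the function name confusion_counts and the sample labels are made up for illustration):

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels where 1 is the positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

# Illustrative labels only, not real data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print([[tp, fn], [fp, tn]])  # → [[3, 1], [1, 3]]
```

Note that scikit-learn's confusion_matrix, with its default label order, places the negative class first, so its output reads [[TN, FP], [FN, TP]].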
Define accuracy
The fraction (often given as a percentage) of predictions the model got right.
Formula- (TP+TN)/(TP+TN+FP+FN)
When to use accuracy?
It is most useful when the classes are balanced and misleading when they are imbalanced, because a model that always predicts the majority class can still score a high accuracy.
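A short illustration with made-up numbers: on a dataset with 95 negatives and 5 positives, a model that always predicts 0 reaches 95% accuracy while finding none of the positives.

```python
# Made-up imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
accuracy = correct / len(y_true)

# How many of the actual positives were found?
found_positives = sum(1 for actual, predicted in zip(y_true, y_pred)
                      if actual == 1 and predicted == 1)

print(accuracy)         # → 0.95
print(found_positives)  # → 0
```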
Define precision
Among all the positive PREDICTIONS, how many are actually positive?
Formula- TP/(TP+FP)
Define recall
AKA sensitivity or true positive rate
Among all the ACTUAL positives, how many did the model correctly predict as positive?
Formula- TP/(TP+FN)
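The two formulas can be sketched side by side as below. The counts are made up, and returning 0.0 when a denominator is zero is a convention chosen for this example, not part of the definitions.

```python
def precision(tp, fp):
    """TP / (TP + FP): share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty denominator (assumed convention)

def recall(tp, fn):
    """TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 on empty denominator (assumed convention)

# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
print(p, r)  # → 0.8 0.5
```

Both metrics share the same numerator (TP); they differ only in whether the mistakes counted in the denominator are false positives or false negatives.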
When to use precision?
When our objective is to minimize false positives.
e.g. - suppose the model is used to catch criminals. We can accept letting some criminals go, but we cannot arrest an innocent person, so we need to reduce false positives.
When to use recall?
When our objective is to minimize false negatives.
e.g. - suppose the model selects passengers for intensive screening at airport check-in. It is acceptable to take an innocent person aside, since we can check them and let them go, but we cannot let a criminal through, so we need to reduce false negatives.
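What is the relation between recall and precision?
There is a trade-off: raising the decision threshold generally increases precision and lowers recall, and lowering it does the opposite. (They are not inversely proportional in the strict mathematical sense.)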
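A small sketch of how the two metrics move in opposite directions as the decision threshold changes. The scores and labels are made up, and precision_recall_at is a hypothetical helper. On this data precision rises and recall falls as the threshold goes up; on real data precision usually rises but is not guaranteed to do so at every step.

```python
def precision_recall_at(threshold, scores, labels):
    """Predict 1 when score >= threshold, then return (precision, recall)."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up model scores and true labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

thresholds = (0.2, 0.5, 0.8)
results = [precision_recall_at(t, scores, labels) for t in thresholds]
for t, (p, r) in zip(thresholds, results):
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```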
When to use F1-Score?
When false positives and false negatives are both costly and we cannot trade one off against the other.
e.g. - suppose the model predicts which employees should be promoted. We do not want to block the promotion of a deserving employee (false negative), and we also do not want to promote someone who is not ready (false positive), so we need to keep both false positives and false negatives low.
Define F1-Score
It is the harmonic mean of precision and recall. We use the harmonic mean because it is pulled toward the smaller value: if either precision or recall is low, the F1-score drops sharply.
Formula- 2*(Precision*Recall)/(Precision+Recall)
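A quick comparison with made-up values shows why the harmonic mean is the stricter choice: with perfect precision but very low recall, the arithmetic mean still looks respectable while the F1-score stays close to the weaker metric. The function name f1_score here is a local helper written for this example, not the scikit-learn function of the same name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up values: perfect precision, very poor recall.
precision, recall = 1.0, 0.1

arithmetic_mean = (precision + recall) / 2
f1 = f1_score(precision, recall)

print(f"arithmetic mean = {arithmetic_mean:.3f}")  # → arithmetic mean = 0.550
print(f"F1-score        = {f1:.3f}")               # → F1-score        = 0.182
```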