03 Classification Flashcards
What does random_state do?
It fixes the seed of the random number generator, so the results are reproducible from run to run.
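A minimal sketch of the idea in plain Python, assuming the standard-library random module stands in for an estimator's internal random number generator. shuffled_indices is a hypothetical helper for illustration, not a scikit-learn function.

```python
import random

def shuffled_indices(n, seed):
    """Return the indices 0..n-1 in a shuffled order determined by the seed."""
    rng = random.Random(seed)  # private generator, seeded like random_state
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

# Same seed -> same order on every call; this is what makes a run reproducible.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)  # True
```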
Does shuffling training data improve model performance?
It can. Shuffle the training set, because some models perform poorly when the instances are ordered (for example, many similar instances in a row).
What is k-fold cross validation?
It divides the dataset into k parts (folds); the model is trained on k-1 parts and validated on the remaining part.
The held-out part changes on each iteration, so every part is used for validation exactly once.
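The fold rotation can be sketched in plain Python. This is only an illustration of the splitting logic under the assumption of unshuffled, contiguous folds; k_fold_indices is a hypothetical helper, not scikit-learn's KFold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each index lands in the validation part exactly once across the k iterations, and in the training part the other k-1 times.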
What are true positives and true negatives?
True positive: the actual value is 1 and the model also predicted 1.
True negative: the actual value is 0 and the model also predicted 0.
What are false positives and false negatives?
False positive: the actual value is 0 but the model predicted 1.
False negative: the actual value is 1 but the model predicted 0.
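Confusion matrix format (rows = actual class, columns = predicted class, positive class first)
TP  FN
FP  TN
Note: scikit-learn's confusion_matrix() sorts labels in ascending order, so with 0/1 labels its first row is TN FP and its second row is FN TP.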
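A small sketch of how the four cells are counted, assuming 0/1 labels. confusion_counts and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Laid out as on the flashcard: actual positives on the first row.
matrix = [[tp, fn],
          [fp, tn]]
print(matrix)  # [[3, 1], [1, 3]]
```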
Define accuracy
The percentage of correct predictions made by the model.
Formula- (TP+TN)/(TP+TN+FP+FN)
When to use accuracy?
It works best when the classes are balanced and is misleading when the classes are imbalanced.
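Why accuracy misleads under class imbalance can be seen with a made-up dataset of 10 positives and 90 negatives and a "model" that always predicts 0. The data and the accuracy helper are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1] * 10 + [0] * 90   # imbalanced: only 10% positives
y_pred = [0] * 100             # always predicts the majority class

acc = accuracy(y_true, y_pred)
print(acc)  # 0.9, even though not a single positive was found
```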
Define precision
Among all the positive PREDICTIONS, how many are actually positive.
Formula- TP/(TP+FP)
Define recall
AKA Sensitivity
Among all the ACTUAL positives, how many the model correctly predicted as positive.
Formula- TP/(TP+FN)
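A worked example of both formulas, with made-up counts (30 TP, 10 FP, 20 FN). Returning 0.0 when a denominator is zero is an assumption of this sketch, not a universal convention.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from the confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

precision, recall = precision_recall(tp=30, fp=10, fn=20)
print(precision)  # 0.75 -> 30 of the 40 positive predictions were right
print(recall)     # 0.6  -> 30 of the 50 actual positives were found
```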
When to use precision?
When our objective is to minimize false positives.
e.g. - a model that identifies criminals: it is acceptable to miss some criminals, but an innocent person must not be flagged, so we need to reduce false positives.
When to use recall?
When our objective is to reduce false negatives.
e.g. - a model that selects passengers for intensive screening at airport check-in: it is acceptable to take an innocent person aside, since we can check them and let them go, but we cannot let a criminal through, so we need to reduce false negatives.
What is the relation between recall and precision?
There is a trade-off: increasing one generally decreases the other (it is not a strict inverse proportion).
When to use F1-Score?
When we cannot favor one of false positives and false negatives over the other.
e.g. - a model that predicts employee promotions: we do not want to block the promotion of a deserving employee, and we also do not want to promote someone who is not ready, so we need to keep both false positives and false negatives low.
Define F1-Score
It is the harmonic mean of precision and recall. We choose the harmonic mean because it gives more weight to low values: if either precision or recall is low, the score drops sharply.
Formula- 2*(precision*recall)/(precision+recall)
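To see why the harmonic mean is used, compare it with the plain average on made-up precision/recall values. harmonic_f1 is a hypothetical helper name for this sketch.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = harmonic_f1(0.8, 0.8)      # both scores are good
lopsided = harmonic_f1(0.98, 0.10)    # great precision, poor recall
arithmetic = (0.98 + 0.10) / 2        # plain average of the same pair

print(round(balanced, 3))    # 0.8
print(round(lopsided, 3))    # 0.181
print(round(arithmetic, 3))  # 0.54
```

The plain average hides the poor recall, while the harmonic mean stays close to the lower of the two values.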
Function used to get a cross-validated score
cross_val_score(sgdclassifier, x_train, y_train, cv=3, scoring="accuracy")
Change scoring to "precision", "recall" or "f1" to get those scores instead.
Impact of the threshold on precision and recall
When we increase the threshold, precision generally increases and recall decreases.
When we decrease the threshold, recall increases and precision generally decreases.
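The threshold effect can be sketched with made-up decision scores, standing in for what a classifier's decision_function() might return. The scores, labels and precision_recall_at helper are all assumptions for illustration.

```python
def precision_recall_at(scores, y_true, threshold):
    """Predict 1 when score >= threshold, then return (precision, recall)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention of this sketch: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

scores = [-3.0, -1.5, -0.5, 0.2, 0.8, 1.5, 2.5, 4.0]
y_true = [0, 0, 1, 0, 1, 1, 1, 1]

for threshold in (-1.0, 0.5, 2.0):
    precision, recall = precision_recall_at(scores, y_true, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

On this data recall falls from 1.0 to 0.8 to 0.4 as the threshold rises, while precision climbs from about 0.83 to 1.0.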
How can we view the decision scores?
Using decision_function().
We cannot set the threshold directly, but we can get the score of each instance and apply any threshold to it ourselves.
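ROC curve and AUC
The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at every threshold.
AUC is the area under this curve: 1.0 for a perfect classifier, 0.5 for a purely random one.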
Define True Positive Rate (TPR)
Recall AKA Sensitivity
Define False Positive Rate (FPR)
The fraction of negative instances that were wrongly identified as positive.
Formula- FP/(FP+TN) = 1-TNR (True Negative Rate, i.e. the fraction of negative instances correctly identified as negative, AKA Specificity)
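A rough sketch of how the ROC points and the area under them could be computed by hand, assuming both classes are present in y_true. roc_points and auc are hypothetical helpers, not scikit-learn's roc_curve or roc_auc_score, and the scores are made up.

```python
def roc_points(scores, y_true):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [0.1, 0.4, 0.35, 0.8]
y_true = [0, 0, 1, 1]
points = roc_points(scores, y_true)
print(points)
print(auc(points))  # 0.75
```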
Which function returns the class probabilities of each instance?
predict_proba()
What is multiclass classification?
It distinguishes between more than two classes.
Which models can handle multiple classes natively?
Random Forest & naive Bayes classifiers
Name a few strictly binary models
SVM & linear classifiers
How to use a binary classification model for multiclass classification?
- One versus all
- One versus one
Define the one versus all strategy
To classify the digits 0-9 we build 10 binary classifiers, one per digit, and pick the class whose classifier outputs the highest score.
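The selection step of one versus all can be sketched as follows. The scores are invented stand-ins for the outputs of 10 binary classifiers ("is it a 0?" ... "is it a 9?"), and one_vs_all_predict is a hypothetical helper.

```python
def one_vs_all_predict(scores_per_class):
    """Pick the class whose binary classifier gave the highest score."""
    return max(scores_per_class, key=scores_per_class.get)

# Hypothetical decision scores for one image, one score per digit classifier.
scores = {0: -4.2, 1: -6.1, 2: -1.3, 3: 0.4, 4: -7.0,
          5: 2.9, 6: -3.8, 7: -5.5, 8: -0.2, 9: -2.6}

predicted_digit = one_vs_all_predict(scores)
print(predicted_digit)  # 5
```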
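Which of the one versus all and one versus one strategies is preferred?
For most binary classification algorithms, one versus all is preferred.
Scikit-learn applies this strategy by default in most cases when a binary classifier is trained on a multiclass target (SVC is an exception and uses one versus one).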
Explain the one versus one strategy
We build one binary classifier for each pair of classes. For N classes that is N*(N-1)/2 classifiers, so 45 for the digits 0-9.
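A sketch of the pairing, assuming the final class is chosen by majority vote over the pairwise classifiers. one_vs_one_predict and the pair winners are made up for illustration.

```python
from itertools import combinations

# One classifier per unordered pair of classes.
classes = list(range(10))
pairs = list(combinations(classes, 2))
print(len(pairs))  # 45

def one_vs_one_predict(pair_winners):
    """pair_winners maps each (a, b) pair to the class that pair's classifier chose."""
    votes = {}
    for winner in pair_winners.values():
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)  # class with the most pairwise wins

# Hypothetical outcome for a 3-class problem: class 1 wins two of the three pairs.
winners = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
print(one_vs_one_predict(winners))  # 1
```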
A good model to start with
A Stochastic Gradient Descent (SGD) classifier
Especially with large datasets, because it processes training instances one at a time.
What is multilabel classification?
A model that outputs several binary labels for each instance.
e.g. - suppose I need a model to recognise me, jb and pj.
If one instance is a picture of me and jb, the model should output 1,1,0.
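The target vector from the example above can be built like this. to_multilabel is a hypothetical helper, and the names are the ones used on the flashcard.

```python
def to_multilabel(people_in_photo, known_people):
    """Return one 0/1 label per known person, in the order of known_people."""
    return [1 if person in people_in_photo else 0 for person in known_people]

known_people = ["me", "jb", "pj"]
labels = to_multilabel({"me", "jb"}, known_people)
print(labels)  # [1, 1, 0]
```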