03 Classification Flashcards by charita rallabhandi

What does random_state do?

It fixes the seed of the random number generator, so the results are reproducible from run to run.
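A minimal sketch, assuming scikit-learn's train_test_split and a made-up toy dataset: calling it twice with the same random_state produces the identical shuffle and split.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data, for illustration only
X = list(range(10))
y = [i % 2 for i in X]

# Same random_state -> same shuffle -> identical split on every call/run
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=42)

print(X_te1 == X_te2)  # True
```

Dropping random_state (or changing its value) would generally give a different split each time.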

Does shuffling the training data improve model performance?

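It can. Shuffle the training set, because some learning algorithms (e.g., those trained with stochastic gradient descent) perform poorly when instances are ordered, for example when all instances of one class come first.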
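A plain-Python sketch on hypothetical data that is sorted by class: shuffle the indices once, then reorder X and y with the same permutation so every instance keeps its label.

```python
import random

# Hypothetical training set ordered by class: all 0s first, then all 1s
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]

rng = random.Random(42)          # seeded so the shuffle is reproducible
idx = list(range(len(y)))
rng.shuffle(idx)                 # one random permutation of the indices

# Apply the SAME permutation to features and labels so pairs stay aligned
X_shuffled = [X[i] for i in idx]
y_shuffled = [y[i] for i in idx]
print(y_shuffled)
```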

What is k-fold cross-validation?

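It splits the training set into k parts (folds). The model is trained on k-1 folds and validated on the remaining fold, and this is repeated k times so that each fold serves as the validation fold exactly once. The k scores are then averaged.
The data may be shuffled once before splitting, but the folds are not reshuffled on each iteration.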
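A minimal plain-Python sketch of how the folds are formed (no shuffling); the helper name k_fold_indices is hypothetical. scikit-learn's KFold produces folds of the same sizes, with the first n_samples % k folds one item larger.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is used for validation exactly once."""
    # The first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        # Everything outside the validation fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

splits = list(k_fold_indices(10, 3))
for train_idx, val_idx in splits:
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

In practice you train on train_idx, score on val_idx, and average the k scores.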

What are true positives and true negatives?

True positive: the actual value is 1 and the model also predicted 1.
True negative: the actual value is 0 and the model also predicted 0.

What are false positives and false negatives?

False positive: the actual value is 0 but the model predicted 1.
False negative: the actual value is 1 but the model predicted 0.
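A plain-Python sketch that counts all four outcomes for binary labels (1 = positive, 0 = negative); the function name and the labels are made up for illustration.

```python
def count_outcomes(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual 1, predicted 1
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual 0, predicted 0
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual 0, predicted 1
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual 1, predicted 0
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(count_outcomes(y_true, y_pred))  # (3, 3, 1, 1)
```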

Confusion matrix format

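Rows are actual classes, columns are predicted classes (positive class listed first):
T.P F.N
F.P T.N
Note: scikit-learn's confusion_matrix() sorts the labels, so for labels 0 and 1 it returns [[TN, FP], [FN, TP]].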
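A quick check of scikit-learn's layout with made-up labels (4 actual positives, 6 actual negatives), then the same counts rearranged into the TP FN / FP TN layout of this card.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)   # rows = actual, columns = predicted, labels sorted (0, 1)
print(cm)
# [[4 2]
#  [1 3]]

tn, fp, fn, tp = cm.ravel()
card_layout = [[tp, fn], [fp, tn]]      # positive class first, as on this card
```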

Define accuracy

Percentage of correct predictions made by the model.
Formula- (TP+TN)/(TP+TN+FP+FN)
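The formula as a one-line function; the counts below are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 3 TP, 4 TN, 2 FP, 1 FN -> 7 correct out of 10
print(accuracy(tp=3, tn=4, fp=2, fn=1))  # 0.7
```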

When to use accuracy

It is best used when the classes are balanced and is misleading when there is class imbalance, because a model that always predicts the majority class can still score high accuracy.
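A small illustration with hypothetical 90/10 labels: a "model" that always predicts the majority class reaches 90% accuracy while finding none of the positives.

```python
# Hypothetical imbalanced labels: 90 negatives, 10 positives
y_true = [0] * 90 + [1] * 10

# A useless "model" that always predicts the majority class
y_pred = [0] * len(y_true)

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(acc, recall)  # 0.9 0.0
```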

Define precision

Among all the positive PREDICTIONS how many are actually positive.
Formula- TP/(TP+FP)

Define recall

AKA Sensitivity
Among all the ACTUAL positives, how many did the model correctly predict as positive?
Formula- TP/(TP+FN)
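Precision and recall side by side, using hypothetical counts; scikit-learn also provides precision_score and recall_score that compute these from label arrays.

```python
def precision(tp, fp):
    """Of all positive PREDICTIONS, the fraction that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all ACTUAL positives, the fraction the model found (sensitivity)."""
    return tp / (tp + fn)

# Hypothetical counts: 3 TP, 2 FP, 1 FN
print(precision(tp=3, fp=2))  # 0.6
print(recall(tp=3, fn=1))     # 0.75
```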

When to use Precision

When our objective is to minimize false positives.
e.g. - if we want our model to catch criminals, we can accept letting some criminals go, but we must not catch an innocent person, so we need to reduce false positives.

When to use recall

When our objective is to minimize false negatives.
e.g. - suppose we want a model for strict security screening at airport check-in. Here it is OK to pull an innocent person aside, since we can check them and let them go, but we cannot let a criminal through, so we need to reduce false negatives.

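What is the relation between recall and precision?

They trade off against each other: changing the decision threshold usually increases one and decreases the other (the precision/recall trade-off). They are not strictly inversely proportional.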

When to use F1-Score

When we cannot trade off false positives against false negatives, i.e. both kinds of error are costly.
e.g. - we want a model to predict the promotion of an employee. We don't want to block the promotion of a deserving employee, and we also don't want to promote someone who is not good, so we need to reduce both false positives and false negatives.

Define F1-Score

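It is the harmonic mean of precision and recall. We use the harmonic mean because it gives much more weight to low values: if either precision or recall is low, the F1-score drops sharply.
Formula- 2 * (precision * recall)/(precision + recall) = TP/(TP + (FN+FP)/2)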
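A sketch comparing the harmonic mean (F1) with the plain arithmetic mean for a model with perfect precision but low recall; the values are hypothetical.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

p, r = 1.0, 0.25
print((p + r) / 2)  # arithmetic mean: 0.625, looks decent
print(f1(p, r))     # harmonic mean (F1): 0.4, pulled down by the low recall
```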

Function used to get any cross-validated score

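cross_val_score(sgdclassifier, x_train, y_train, cv=3, scoring="accuracy")
Here sgdclassifier is an SGDClassifier instance; the scoring parameter selects the metric (e.g. "precision", "recall", "f1").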
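A runnable version of the call above, assuming a synthetic dataset from make_classification stands in for x_train / y_train.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary dataset standing in for the real x_train / y_train
x_train, y_train = make_classification(n_samples=300, n_features=10, random_state=42)

sgdclassifier = SGDClassifier(random_state=42)
scores = cross_val_score(sgdclassifier, x_train, y_train, cv=3, scoring="accuracy")
print(scores, scores.mean())   # one accuracy per fold, then their average

# Same call, different metric: only `scoring` changes
f1_scores = cross_val_score(sgdclassifier, x_train, y_train, cv=3, scoring="f1")
```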

Impact of threshold on precision and recall

When we increase the threshold, precision generally increases and recall decreases.
When we decrease the threshold, recall increases and precision generally decreases.
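A plain-Python sketch with hypothetical decision scores: sweeping the threshold upward raises precision and lowers recall. The helper name and the "precision = 1.0 when nothing is predicted positive" rule are conventions chosen here for illustration.

```python
def precision_recall_at(scores, y_true, threshold):
    """Predict positive when score >= threshold, then compute precision and recall."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention chosen here
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical decision scores and true labels
scores = [-3, -2, -1, 0, 1, 2, 3, 4]
y_true = [0, 0, 1, 0, 1, 0, 1, 1]

for threshold in (-2, 0, 2):
    print(threshold, precision_recall_at(scores, y_true, threshold))
# precision: 4/7 -> 0.6 -> 2/3 (up), recall: 1.0 -> 0.75 -> 0.5 (down)
```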

How can we view decision scores?

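Using decision_function(), which returns a score for each instance.
Scikit-learn does not let us set the threshold used by predict() directly, but we can view the scores and apply any threshold ourselves.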
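A sketch assuming a binary SGDClassifier trained on synthetic data: predict() corresponds to a threshold of 0 on the decision scores, and a custom threshold is applied by comparing the scores yourself. The threshold value 2.0 is a hypothetical choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
sgd_clf = SGDClassifier(random_state=0).fit(X, y)

scores = sgd_clf.decision_function(X[:5])   # one score per instance
default_pred = sgd_clf.predict(X[:5])       # same as thresholding the scores at 0

# Apply our own (hypothetical) threshold: higher -> fewer positives, usually higher precision
threshold = 2.0
custom_pred = (scores > threshold).astype(int)
print(scores, default_pred, custom_pred)
```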

ROC - AUC Curve

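The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) over all thresholds. AUC is the area under this curve: 1 for a perfect classifier, 0.5 for a purely random one.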
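A plain-Python sketch on hypothetical scores: sweep the threshold from high to low, record (FPR, TPR) at each step, and integrate with the trapezoidal rule. scikit-learn's roc_curve and roc_auc_score do the same job on real data.

```python
def roc_points(scores, y_true):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and t == 1 for s, t in zip(scores, y_true))
        fp = sum(s >= threshold and t == 0 for s, t in zip(scores, y_true))
        points.append((fp / neg, tp / pos))   # (FPR, TPR)
    return points

def auc(points):
    """Area under the curve with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [-3, -2, -1, 0, 1, 2, 3, 4]
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
points = roc_points(scores, y_true)
print(auc(points))  # 0.8125
```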

Define True Positive Rate (TPR)

Recall AKA Sensitivity
Formula- TP/(TP+FN)

Define False Positive Rate (FPR)

The fraction of negative instances that are incorrectly classified as positive.
Formula- FP/(FP+TN) = 1 - TNR (True Negative Rate, i.e. the fraction of negative instances correctly classified as negative, AKA Specificity)

Which function gives the probability of each class for each instance?

predict_proba()
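A sketch assuming a RandomForestClassifier on the bundled iris dataset: predict_proba() returns one row per instance and one column per class, with columns ordered as in classes_.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

proba = forest.predict_proba(X[:3])
print(proba.shape)       # (3, 3): 3 instances x 3 classes
print(forest.classes_)   # [0 1 2]: the column order of proba
print(proba.sum(axis=1)) # each row sums to 1
```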

What is multiclass classification?

It distinguishes between more than two classes.

Which models can handle multiple classes natively?

Random Forest & naive Bayes classifiers

Name few strictly binary models

SVM & Linear Classifiers

How to use a binary classification model for multiclass classification?

1. One versus all (OvA, also called one versus the rest)
2. One versus one (OvO)

Define One versus all strategy

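To classify the digits 0-9, we train 10 binary classifiers, one per digit (a 0-detector, a 1-detector, and so on). To classify an image, we get the decision score from each classifier and pick the class whose classifier outputs the highest score.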
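A sketch assuming scikit-learn's OneVsRestClassifier wrapped around an SGDClassifier on the bundled digits dataset: it trains one binary classifier per digit and exposes one decision score per class.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
ova = OneVsRestClassifier(SGDClassifier(random_state=42)).fit(X, y)

print(len(ova.estimators_))              # 10: one binary classifier per digit
scores = ova.decision_function(X[:1])    # shape (1, 10): one score per class
print(scores.argmax(axis=1), ova.predict(X[:1]))  # highest score wins
```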

Which among the one versus all and one versus one strategies is preferred?

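For most binary classification algorithms, one versus all is preferred. Scikit-learn detects when a binary classifier is used for a multiclass task and automatically runs one versus all, except for SVM classifiers, for which it uses one versus one (OvO suits algorithms that scale poorly with training-set size, since each classifier trains only on the data for its two classes).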

Explain One versus One strategy

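In this we train one binary classifier for each pair of classes; for N classes that is N × (N-1)/2 classifiers (45 for the digits 0-9). To classify an image, we run it through all the classifiers and pick the class that wins the most duels.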
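A sketch assuming scikit-learn's OneVsOneClassifier around an SGDClassifier on the digits dataset, confirming the N × (N-1)/2 count of pairwise classifiers.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier

X, y = load_digits(return_X_y=True)
ovo = OneVsOneClassifier(SGDClassifier(random_state=42)).fit(X, y)

n_classes = 10
print(len(ovo.estimators_), n_classes * (n_classes - 1) // 2)  # 45 45
print(ovo.predict(X[:3]))   # class with the most pairwise "wins"
```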

Good model to start with

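A Stochastic Gradient Descent (SGD) classifier (SGDClassifier), especially with large datasets, since it handles training instances independently, one at a time.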
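A sketch of why SGD scales: SGDClassifier.partial_fit can learn from one mini-batch at a time, so the full dataset never has to be in memory at once. The digits dataset and the batch size of 200 are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

X, y = load_digits(return_X_y=True)
classes = np.unique(y)   # partial_fit must be told every class up front

sgd_clf = SGDClassifier(random_state=42)

# Stream the data in mini-batches, as you might with a dataset too large for memory
batch_size = 200
for start in range(0, len(X), batch_size):
    sgd_clf.partial_fit(X[start:start + batch_size], y[start:start + batch_size], classes=classes)

print(sgd_clf.score(X, y))   # training accuracy after one pass
```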

What is multilabel classification?

A classification that outputs multiple binary labels for each instance.
e.g. - suppose I need a model to recognize me, JB and PJ. If one picture shows me and JB, the model should output [1, 1, 0]; that is a multilabel classification model.
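A toy sketch of the me / JB / PJ example, assuming KNeighborsClassifier (which accepts a 2-D label array for multilabel targets) and made-up 2-D "image features".

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D "image features"; each label row is [me, JB, PJ]
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9], [2.0, 0.0], [2.1, 0.1]])
Y = np.array([
    [1, 1, 0],   # me + JB
    [1, 1, 0],
    [0, 1, 1],   # JB + PJ
    [0, 1, 1],
    [1, 0, 0],   # me alone
    [1, 0, 0],
])

knn = KNeighborsClassifier(n_neighbors=1).fit(X, Y)   # 2-D Y -> one output per label
print(knn.predict([[0.0, 0.05]]))  # [[1 1 0]]: me and JB, not PJ
```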
