Machine Learning Flashcards

1
Q

How do we split data for smaller and bigger datasets?

A

For small datasets a 60/20/20 train/dev/test split is common, but for very large datasets (e.g., millions of examples) something like 98/1/1 can suffice, since 1% is still plenty of examples for evaluation.
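A minimal sketch of such a split in plain Python (the function name and seed handling are my own; assumes the whole dataset fits in memory):

```python
import random

def train_dev_test_split(data, dev_frac, test_frac, seed=0):
    """Shuffle and slice a dataset into train/dev/test partitions."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = data[:n_test]
    dev = data[n_test:n_test + n_dev]
    train = data[n_test + n_dev:]
    return train, dev, test

# Small dataset: 60/20/20
train, dev, test = train_dev_test_split(range(100), dev_frac=0.2, test_frac=0.2)
```

For a huge dataset the same call with `dev_frac=0.01, test_frac=0.01` gives the 98/1/1 split.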

2
Q

How can we carry out error analysis?

A

Suppose we have a cat classifier that reaches 90% accuracy (10% error) on the dev set, which is worse than we expected. Examining some of the misclassified examples, we notice that it misclassifies some dogs as cats, and we have a proposal for making the algorithm do better on dogs. Should we start a project focused on the dog problem? An error analysis procedure tells us whether that is worth the effort:
1- Get about 100 misclassified dev set examples
2- Examine them manually (count how many of these misclassified examples are dogs)
3- Say 5% of them are dogs, i.e. 5 dog pictures out of 100 misclassified examples. Then even if we fix the dog problem completely, error only drops from 10% to 9.5%, so it is probably not worth it
4- We can evaluate several ideas in parallel, e.g. dogs misclassified as cats, great cats being misclassified, or improving performance on blurry images
We can build a table and see which category has the biggest share of the misclassified pool
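The tally behind such a table can be sketched as a simple counter over hand-assigned categories (the category names and counts here are made up for illustration):

```python
from collections import Counter

# Hypothetical manual labels for 100 misclassified dev-set examples.
categories = ["dog"] * 5 + ["great cat"] * 43 + ["blurry"] * 50 + ["other"] * 2
tally = Counter(categories)

total = sum(tally.values())
for category, count in tally.most_common():
    # Share of the error pool = ceiling on the error reduction from fixing it.
    print(f"{category}: {count}/{total} ({100 * count / total:.0f}% of errors)")
```

Each category's share of the pool is an upper bound on how much overall error could drop by fixing that category alone.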

3
Q

What should we do if some labels in the train/dev/test data are incorrect?

A

Training set: deep learning algorithms are quite robust to random label errors in the training set, but less robust to systematic errors. Example: if all white dogs are labeled as cats, the error is systematic.
Dev/test set: add a column to the error analysis table for incorrectly labeled examples and see what fraction of the errors they account for.

4
Q

What things should we keep in mind when correcting the incorrect dev/test set examples?

A

Apply the same correction process to both the dev and test sets, to make sure they continue to come from the same distribution.

Consider examining examples your algorithm got right as well as ones it got wrong. This is harder to do, so it is not always done.

Correcting labels only in the dev/test sets can leave them with a slightly different distribution than the training set. That is acceptable; the really important thing is that the dev and test sets come from the same distribution.

5
Q

What evaluation metric combines precision and recall? How is it calculated

A

F1 score: the harmonic mean of precision and recall,
F1 = 2 / (1/precision + 1/recall) = 2 · precision · recall / (precision + recall)

6
Q

What are the formulas for precision and recall?

A

Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
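A minimal sketch computing these (and F1) from raw confusion-matrix counts; the helper name is my own:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
# precision = 0.8, recall = 0.5, F1 = 2*0.8*0.5/1.3 ≈ 0.615
```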

7
Q

What is Bayes error rate?

A

Bayes error rate is the lowest possible error rate achievable by any classifier on a given problem. No matter how many years you work on the problem, you cannot surpass the Bayes error rate.

8
Q

Why is progress in accuracy slower after passing human-level performance?

A

Progress is fast until human-level performance is reached, because human-level performance is often not far from the Bayes optimal error, so little headroom remains afterwards.
Also, before reaching human-level performance there are tools for improving performance (e.g. getting labels from humans, manual error analysis) that are harder to apply after passing human level.

9
Q

“Human-level performance for many tasks is not that far from the Bayes optimal error”, True/False?

A

True

10
Q

Logistic regression can suffer from complete separation: if a feature perfectly separates the two classes, the model can no longer be trained. Why is that?

A

Because the weight for that feature would not converge: the likelihood keeps increasing as the weight grows, so the optimal weight is infinite. This is a bit unfortunate, because such a feature would be really useful. But then you do not need machine learning, since a simple rule already separates the two classes.
The problem of complete separation can be solved by penalizing the weights (regularization) or by defining a prior probability distribution over the weights.
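A toy illustration of this on 1-D data with no bias term (plain-Python gradient ascent; the function name and hyperparameters are my own): without a penalty the weight keeps drifting upward, while an L2 penalty pins it at a finite value.

```python
import math

def fit_logistic_1d(xs, ys, l2=0.0, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x) by gradient ascent on the
    (optionally L2-penalized) log-likelihood."""
    w = 0.0
    for _ in range(steps):
        grad = sum((y - 1 / (1 + math.exp(-w * x))) * x
                   for x, y in zip(xs, ys)) - l2 * w
        w += lr * grad
    return w

# Perfectly separated data: every negative x is class 0, every positive is 1.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w_unpenalized = fit_logistic_1d(xs, ys)        # keeps growing with more steps
w_penalized = fit_logistic_1d(xs, ys, l2=1.0)  # settles at a finite value
```

Running the unpenalized fit with more steps yields an even larger weight; the penalized fit stays put.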

11
Q

Which of the following models can produce a non-linear decision boundary for a classification task on 2-dimensional data? (multiple correct answers)
1. Logistic Regression
2. K-Nearest Neighbor
3. Support Vector Machine
4. K-means Classifier

A

All four (1, 2, 3, 4): logistic regression with non-linear feature transforms (e.g. polynomial features), K-nearest neighbor inherently, SVM with a non-linear kernel, and a K-means-based (nearest-centroid) classifier with multiple centroids per class.

12
Q

For the SVM algorithm, the linear kernel works fine if your dataset is linearly separable; however, if your dataset isn’t linearly separable, a linear kernel isn’t going to cut it, and we should use ____ or ____ instead.

A

RBF (radial basis function) kernel, polynomial kernel
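A minimal sketch of the two kernels as plain functions (the gamma, coef0, and degree defaults are illustrative, chosen to match common library conventions):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
k_poly = poly_kernel([1.0, 0.0], [0.0, 1.0])  # orthogonal points, dot = 0
```

The RBF kernel is a similarity that decays with distance; the polynomial kernel implicitly adds polynomial feature interactions, which is what lets the SVM draw a curved boundary.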

13
Q

L2 loss penalizes big differences between y and ŷ more than L1 does. True/False

A

True: squaring the residual magnifies large errors, while the L1 loss grows only linearly.
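A quick numeric check (the residual values are arbitrary): two cases with the same total absolute error, one spread over small residuals and one concentrated in a single big residual.

```python
def l1(y, y_hat):
    """Sum of absolute residuals."""
    return sum(abs(a - b) for a, b in zip(y, y_hat))

def l2(y, y_hat):
    """Sum of squared residuals."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

y = [0.0, 0.0]
small = [0.5, 0.5]   # two small residuals
big = [1.0, 0.0]     # one big residual, same L1 total

# L1 treats both cases the same; L2 punishes the single big residual more.
```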
