Machine Learning Flashcards

1
Q

How do we split data for smaller and bigger datasets?

A

For small datasets a 60/20/20 train/dev/test split is common, but for very large datasets (e.g., millions of examples) something like 98/1/1 can suffice, since 1% is still plenty of examples for evaluation.
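A minimal sketch of such a split in plain Python (the function name and seed handling are my own; assumes the whole dataset fits in memory):

```python
import random

def train_dev_test_split(data, dev_frac, test_frac, seed=0):
    """Shuffle and slice a dataset into train/dev/test partitions."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = data[:n_test]
    dev = data[n_test:n_test + n_dev]
    train = data[n_test + n_dev:]
    return train, dev, test

# Small dataset: 60/20/20
train, dev, test = train_dev_test_split(range(100), dev_frac=0.2, test_frac=0.2)
```

For a huge dataset the same call with `dev_frac=0.01, test_frac=0.01` gives the 98/1/1 split.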

2
Q

How can we carry out error analysis?

A

Suppose we have a cat classifier that reaches 90% accuracy (10% error) on the dev set, which is worse than we expected. Examining some of the misclassified examples, we notice that it misclassifies some dogs as cats, and we have a proposal for making the algorithm do better on dogs. Should we start a project focused on the dog problem? An error analysis procedure tells us whether that is worth the effort:
1- Get about 100 misclassified dev set examples
2- Examine them manually (count how many of these misclassified examples are dogs)
3- Say 5% of them are dogs, i.e. 5 dog pictures out of 100 misclassified examples. Then even if we fix the dog problem completely, error only drops from 10% to 9.5%, so it is probably not worth it
4- We can evaluate several ideas in parallel, e.g. dogs misclassified as cats, great cats being misclassified, or improving performance on blurry images
We can build a table and see which category has the biggest share of the misclassified pool
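The tally behind such a table can be sketched as a simple counter over hand-assigned categories (the category names and counts here are made up for illustration):

```python
from collections import Counter

# Hypothetical manual labels for 100 misclassified dev-set examples.
categories = ["dog"] * 5 + ["great cat"] * 43 + ["blurry"] * 50 + ["other"] * 2
tally = Counter(categories)

total = sum(tally.values())
for category, count in tally.most_common():
    # Share of the error pool = ceiling on the error reduction from fixing it.
    print(f"{category}: {count}/{total} ({100 * count / total:.0f}% of errors)")
```

Each category's share of the pool is an upper bound on how much overall error could drop by fixing that category alone.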

3
Q

What should we do if some labels in the train/dev/test data are incorrect?

A

Training set: deep learning algorithms are quite robust to random label errors in the training set, but less robust to systematic errors. Example: if all white dogs are labeled as cats, the error is systematic.
Dev/test set: add a column to the error analysis table for incorrectly labeled examples and see what fraction of the errors they account for.

4
Q

What things should we keep in mind when correcting the incorrect dev/test set examples?

A

Apply the same correction process to both the dev and test sets, to make sure they continue to come from the same distribution.

Consider examining examples your algorithm got right as well as ones it got wrong. This is harder to do, so it is not always done.

Correcting labels only in the dev/test sets can leave them with a slightly different distribution than the training set. That is acceptable; the really important thing is that the dev and test sets come from the same distribution.

5
Q

What evaluation metric combines precision and recall? How is it calculated

A

F1 score: the harmonic mean of precision and recall,
F1 = 2 / (1/precision + 1/recall) = 2 · precision · recall / (precision + recall)

6
Q

What are the formulas for precision and recall?

A

Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
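A minimal sketch computing these (and F1) from raw confusion-matrix counts; the helper name is my own:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
# precision = 0.8, recall = 0.5, F1 = 2*0.8*0.5/1.3 ≈ 0.615
```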

7
Q

What is Bayes error rate?

A

Bayes error rate is the lowest possible error rate achievable by any classifier on a given problem. No matter how many years you work on the problem, you cannot surpass the Bayes error rate.

8
Q

Why is progress in accuracy slower after passing human-level performance?

A

Progress is fast until human-level performance is reached, because human-level performance is often not far from the Bayes optimal error, so little headroom remains afterwards.
Also, before reaching human-level performance there are tools for improving performance (e.g. getting labels from humans, manual error analysis) that are harder to apply after passing human level.

9
Q

“Human-level performance for many tasks is not that far from the Bayes optimal error”, True/False?

A

True

10
Q

Logistic regression can suffer from complete separation: if a feature perfectly separates the two classes, the model can no longer be trained. Why is that?

A

Because the weight for that feature would not converge: the likelihood keeps increasing as the weight grows, so the optimal weight is infinite. This is a bit unfortunate, because such a feature would be really useful. But then you do not need machine learning, since a simple rule already separates the two classes.
The problem of complete separation can be solved by penalizing the weights (regularization) or by defining a prior probability distribution over the weights.
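A toy illustration of this on 1-D data with no bias term (plain-Python gradient ascent; the function name and hyperparameters are my own): without a penalty the weight keeps drifting upward, while an L2 penalty pins it at a finite value.

```python
import math

def fit_logistic_1d(xs, ys, l2=0.0, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x) by gradient ascent on the
    (optionally L2-penalized) log-likelihood."""
    w = 0.0
    for _ in range(steps):
        grad = sum((y - 1 / (1 + math.exp(-w * x))) * x
                   for x, y in zip(xs, ys)) - l2 * w
        w += lr * grad
    return w

# Perfectly separated data: every negative x is class 0, every positive is 1.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w_unpenalized = fit_logistic_1d(xs, ys)        # keeps growing with more steps
w_penalized = fit_logistic_1d(xs, ys, l2=1.0)  # settles at a finite value
```

Running the unpenalized fit with more steps yields an even larger weight; the penalized fit stays put.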

11
Q

Which of the following models can produce a non-linear decision boundary for a classification task on 2-dimensional data? (multiple correct answers)
1. Logistic Regression
2. K-Nearest Neighbor
3. Support Vector Machine
4. K-means Classifier

A

All four (1, 2, 3, 4): logistic regression with non-linear feature transforms (e.g. polynomial features), K-nearest neighbor inherently, SVM with a non-linear kernel, and a K-means-based (nearest-centroid) classifier with multiple centroids per class.

12
Q

For the SVM algorithm, the linear kernel works fine if your dataset is linearly separable; however, if your dataset isn’t linearly separable, a linear kernel isn’t going to cut it, and we should use ____ or ____ instead.

A

RBF (radial basis function) kernel, polynomial kernel
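A minimal sketch of the two kernels as plain functions (the gamma, coef0, and degree defaults are illustrative, chosen to match common library conventions):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
k_poly = poly_kernel([1.0, 0.0], [0.0, 1.0])  # orthogonal points, dot = 0
```

The RBF kernel is a similarity that decays with distance; the polynomial kernel implicitly adds polynomial feature interactions, which is what lets the SVM draw a curved boundary.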

13
Q

L2 loss penalizes big differences between y and ŷ more than L1 does. True/False

A

True: squaring the residual magnifies large errors, while the L1 loss grows only linearly.
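A quick numeric check (the residual values are arbitrary): two cases with the same total absolute error, one spread over small residuals and one concentrated in a single big residual.

```python
def l1(y, y_hat):
    """Sum of absolute residuals."""
    return sum(abs(a - b) for a, b in zip(y, y_hat))

def l2(y, y_hat):
    """Sum of squared residuals."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

y = [0.0, 0.0]
small = [0.5, 0.5]   # two small residuals
big = [1.0, 0.0]     # one big residual, same L1 total

# L1 treats both cases the same; L2 punishes the single big residual more.
```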
