Lecture 2 - Machine Learning Project Flashcards

1
Q

What is NumPy Vectorisation?

A

Replacing explicit Python loops with NumPy array operations, which apply to whole arrays at once.
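
A minimal sketch of the idea, assuming a small made-up array (the names `values`, `squares_loop`, and `squares_vec` are illustrative only). Both versions compute the same result; the second one has no explicit loop.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: square each element one at a time in Python.
squares_loop = np.empty_like(values)
for i in range(len(values)):
    squares_loop[i] = values[i] ** 2

# Vectorised version: a single array expression, no explicit loop.
squares_vec = values ** 2

print(np.array_equal(squares_loop, squares_vec))  # True
```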

2
Q

What is an End-to-End Machine Learning Project?

A

1 Understand the problem and check assumptions.
2 Visualise and explore the data (also to support step 1).
3 Prepare the data for an ML algorithm (works in conjunction with step 2).
4 Select a model, train and validate it (can include fine-tuning).
5 Present your solution.
6 Launch, monitor, and keep checking assumptions.
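
The steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: the data is synthetic, the model (linear regression fitted with `np.linalg.lstsq`) is an arbitrary choice, and step 6 (launch and monitoring) is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: a toy problem (predict y from x) and a quick look at the data.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=200)
print("mean x:", x.mean(), "mean y:", y.mean())

# Step 3: prepare the data (add a bias column, split into train/validation).
X = np.column_stack([np.ones_like(x), x])
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Step 4: select a model (linear regression), train it, and validate it.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
val_mse = np.mean((X_val @ weights - y_val) ** 2)

# Step 5: present the result.
print("intercept, slope:", weights, "validation MSE:", val_mse)
```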

3
Q

End-to-End Machine Learning Project Diagram

A

REFER TO THE SLIDES - understand problem specification

4
Q

What are common options to select as performance measures?

A

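Mean Squared Error (MSE) and Mean Absolute Error (MAE). MSE averages the squared errors, so it penalises large errors more heavily; MAE averages the absolute errors.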
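
A small worked example, assuming made-up target and prediction arrays (`y_true` and `y_pred` are illustrative names):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_pred - y_true          # -1, 0, 2, 1

mse = np.mean(errors ** 2)        # (1 + 0 + 4 + 1) / 4 = 1.5
mae = np.mean(np.abs(errors))     # (1 + 0 + 2 + 1) / 4 = 1.0

print(mse, mae)
```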

5
Q

Classification Example (Using MNIST dataset)

A

REFER TO SLIDES

6
Q

What is the formula for accuracy?

A

Number of correct predictions / Total number of predictions
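
As a quick check of the formula, here is a sketch with made-up binary labels (the arrays are illustrative, not from the lecture):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Correct predictions divided by total predictions: 6 / 8.
accuracy = np.mean(y_true == y_pred)

print(accuracy)  # 0.75
```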

7
Q

What is a confusion matrix?

A

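A table that counts classification outcomes by true label against predicted label. For binary classification it is a 2 by 2 matrix holding the counts of true negatives, false positives, false negatives, and true positives.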
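
A minimal sketch of building the 2 by 2 matrix by hand, assuming made-up labels and the common layout of rows for true labels and columns for predicted labels (that layout is a convention chosen here, not something the card specifies):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])

# matrix[t, p] counts instances with true label t and predicted label p.
matrix = np.zeros((2, 2), dtype=int)
np.add.at(matrix, (y_true, y_pred), 1)

# Layout: [[TN, FP],
#          [FN, TP]]
tn, fp = matrix[0]
fn, tp = matrix[1]
print(matrix.tolist())  # [[2, 2], [1, 3]]
```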

8
Q

What is precision and what is its formula?

A

The proportion of positive predictions that are actually positive.
Formula: True positives / (True positives + False positives)

Where:
A true positive is an instance correctly predicted as positive
A false positive is an instance predicted as positive that is actually negative
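
A short sketch of the formula on made-up labels, where 1 is the positive class (array names are illustrative):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, is positive: 3
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, is negative: 2

precision = tp / (tp + fp)  # 3 / 5

print(precision)  # 0.6
```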

9
Q

What is recall and what is its formula?

A

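The proportion of actual positive instances that are correctly predicted as positive.
Formula: True positives / (True positives + False negatives)

Where:
A true positive is an instance correctly predicted as positive
A false negative is an instance predicted as negative that is actually positive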
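
A short sketch of the formula on made-up labels, where 1 is the positive class (array names are illustrative):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, is positive: 3
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, is positive: 1

recall = tp / (tp + fn)  # 3 / 4

print(recall)  # 0.75
```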

10
Q

What are some trade-offs between precision and recall?

A

In some scenarios, false positives can be costly, so precision is more important.
- Predicting that it is safe to change lanes while driving, when it is not.
In some scenarios, false negatives can be costly, so recall is more important.
- Predicting that a patient does not have cancer when they do.
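NOTE: you can adjust the decision threshold to favour one over the other. Raising the threshold typically increases precision and lowers recall; lowering it typically does the opposite.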
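
The effect of the threshold can be sketched with made-up classifier scores. The helper `precision_recall` is a hypothetical name, and the sketch assumes at least one instance is predicted positive (otherwise the precision denominator would be zero).

```python
import numpy as np

scores = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])  # made-up model scores
y_true = np.array([0, 0, 1, 0, 1, 1])

def precision_recall(y_true, scores, threshold):
    """Predict positive when score >= threshold, then compute both metrics."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

low = precision_recall(y_true, scores, 0.35)   # precision 0.75, recall 1.0
high = precision_recall(y_true, scores, 0.65)  # precision 1.0, recall 2/3

print(low, high)
```

On this data the lower threshold catches every positive at the cost of one false positive, while the higher threshold removes the false positive but misses one positive.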

11
Q

What is the F1 score?

A

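A single metric that combines precision and recall by taking their harmonic mean, so it is high only when both are high.
Formula: F1 = 2 / ((1/precision) + (1/recall)) = 2 × precision × recall / (precision + recall)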
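
A worked example with made-up precision and recall values. It also shows the algebraically equivalent product form and, for contrast, the plain arithmetic mean:

```python
precision = 0.5
recall = 1.0

# Harmonic mean, as in the formula above: 2 / (2 + 1) = 0.666...
f1 = 2 / ((1 / precision) + (1 / recall))

# Equivalent form: 2 * P * R / (P + R).
f1_alt = 2 * precision * recall / (precision + recall)

# The arithmetic mean is higher (0.75); the harmonic mean is pulled
# towards the smaller of the two values.
arithmetic_mean = (precision + recall) / 2

print(f1, f1_alt, arithmetic_mean)
```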

12
Q

What is a Receiver Operating Characteristic (ROC) curve?

A

A curve that plots the true positive rate (TPR, i.e. recall) against the false positive rate (FPR) for varying threshold settings.
Rates involved:
TPR (also known as sensitivity and recall) = proportion of positive instances that are correctly classified as positives
Formula: TP / (TP+FN)
TNR (also known as specificity) = proportion of negative instances that are correctly classified as negatives
Formula: TN / (FP+TN)
FPR = proportion of negative instances that are incorrectly classified as positives
Formula: FP / (FP+TN) = 1 − specificity
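
A minimal sketch of how the points of a ROC curve can be computed from the formulas above, assuming made-up scores and a hand-picked list of thresholds (`roc_point` is a hypothetical helper, not a library function):

```python
import numpy as np

scores = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])  # made-up model scores
y_true = np.array([0, 0, 1, 0, 1, 1])

def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when predicting positive for score >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fn = np.sum(~y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    tn = np.sum(~y_pred & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Each threshold gives one point; plotting TPR against FPR traces the curve.
thresholds = [0.0, 0.35, 0.65, 1.0]
points = [roc_point(y_true, scores, t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

The lowest threshold predicts everything as positive (FPR = 1, TPR = 1) and the highest predicts nothing as positive (FPR = 0, TPR = 0); the thresholds in between give the interior of the curve.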

13
Q

What is multiclass Classification?

A

Multiclass classifiers are for discriminating between multiple classes (N > 2).

NOTE: Some algorithms (such as Softmax Regression, Random Forest classifiers, or naive Bayes classifiers) are capable of handling multiple classes directly.
Others (such as Support Vector Machine classifiers) are strictly binary classifiers, so they need a strategy such as one-vs-rest or one-vs-one to handle multiple classes.
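
To illustrate what handling multiple classes directly can look like, here is a sketch of the softmax step used by Softmax Regression: one score per class is turned into a probability per class, and the most probable class is predicted. The scores are made up; a real model would compute them from the input features.

```python
import numpy as np

def softmax(scores):
    """Convert one score per class into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

class_scores = np.array([2.0, 1.0, 0.1])  # made-up scores for N = 3 classes
probs = softmax(class_scores)
predicted_class = int(np.argmax(probs))

print(probs, predicted_class)
```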
