1. Framing ML Problems Flashcards

1
Q

What are the key factors for translating business use cases into machine learning problems?

A

First, identify the impact, success criteria, and available data for a use case. Then, match these with a machine learning approach (an algorithm and a metric).

2
Q

What is the equation for recall?

A

Recall = True Positive (TP) / (True Positive (TP) + False Negative (FN))
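A minimal sketch of the recall formula in plain Python, assuming binary labels where 1 marks the positive class. The function name and the choice to return 0.0 when there are no actual positives are illustrative assumptions, not a library API.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives the model caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives the model missed
    # Assumption: define recall as 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(recall(y_true, y_pred))  # 0.75 (3 of the 4 actual positives were found)
```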

3
Q

What is the equation for precision?

A

Precision = True Positives (TP) / (True Positives (TP) + False Positives (FP))
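A minimal sketch of the precision formula in plain Python, assuming binary labels where 1 marks the positive class. The function name and the choice to return 0.0 when the model predicts no positives are illustrative assumptions, not a library API.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct positive predictions
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives wrongly flagged as positive
    # Assumption: define precision as 0.0 when the model predicts no positives.
    return tp / (tp + fp) if (tp + fp) else 0.0


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0]
print(precision(y_true, y_pred))  # 0.75 (3 of the 4 positive predictions were correct)
```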

4
Q

What are the two main types of machine learning?

A

Supervised and unsupervised.
A hybrid of the two is called semi-supervised learning.

5
Q

What are the common ML problem types?

A

Tabular:
1. Supervised: Regression, Classification
2. Unsupervised: K-means clustering, PCA
Series:
1. Supervised: Forecasting
Image:
1. Supervised: Image classification, Image segmentation, Object detection
Video:
1. Supervised: Video classification, Video object tracking, Video action recognition
Text:
1. Supervised: Sentiment analysis, Entity extraction, Translation
2. Unsupervised: Topic modelling
Mixed:
1. Supervised/Unsupervised: Collaborative filtering / recommendations

6
Q

What is semi-supervised learning?

A

Learning from a training set in which some examples are labeled and others are not.
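One common semi-supervised technique is self-training (pseudo-labeling): fit a model on the labeled examples, assign labels to the unlabeled examples the model is confident about, and refit. The sketch below is a toy version in plain Python, assuming 1-D features, at least two classes, a nearest-centroid classifier, and a hypothetical confidence margin; real systems would typically use a stronger model and a probability threshold.

```python
def centroids(labeled):
    """Mean feature value per class for (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled, unlabeled, margin=1.0, max_rounds=5):
    """Toy self-training loop with a nearest-centroid classifier.

    Assumes at least two classes. A point is pseudo-labeled only when its
    nearest class centroid is at least `margin` closer than the second-nearest
    one (a hypothetical confidence rule).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centers = centroids(labeled)  # "refit" on everything labeled so far
        confident, remaining = [], []
        for x in pool:
            dists = sorted((abs(x - c), y) for y, c in centers.items())
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))  # accept the pseudo-label
            else:
                remaining.append(x)  # too ambiguous; keep it unlabeled
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


labeled = [(1.0, 0), (1.5, 0), (8.0, 1), (8.5, 1)]  # hypothetical labeled data
unlabeled = [1.2, 2.0, 7.5, 9.0, 5.0]               # hypothetical unlabeled data
final, leftover = self_train(labeled, unlabeled)
print(final[4:])  # the newly pseudo-labeled points
print(leftover)   # [5.0] stays unlabeled: it sits between the two classes
```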

7
Q

What are precision, recall, and F1 used for?

A

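Precision: Use when false positives are costly; optimizing it lowers false positives.
Recall: Use when false negatives are costly; optimizing it lowers false negatives.
F1: The harmonic mean of precision and recall; use it to keep false positives and false negatives low together.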
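Because F1 is the harmonic mean of precision and recall, it stays low unless both are reasonably high. A minimal sketch in plain Python; the function name is illustrative, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


print(f1(0.75, 0.5))  # 0.6
print(f1(1.0, 0.1))   # about 0.18: high precision cannot hide very low recall
```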

8
Q

What is AUC ROC?

A

Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a performance metric for classification models that summarizes performance across all classification thresholds. It measures how well a model distinguishes between the positive and negative classes and is best suited to balanced datasets.
AUC = 1: Perfect separation of positive and negative classes
AUC = 0.5: No better than a random guess
It is threshold-invariant (it does not depend on any single threshold), scale-invariant (it depends only on how predictions are ranked, not on their absolute values), and robust to outliers.
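AUC-ROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that pairwise definition in plain Python rather than tracing the curve; it is simple but O(P·N) in the number of positives and negatives, and it assumes both classes are present. Names and data are illustrative. Squaring the scores leaves their ranking, and therefore the AUC, unchanged, which illustrates scale invariance.

```python
def roc_auc(y_true, scores):
    """AUC-ROC via pairwise comparison; assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))                    # 0.75
print(roc_auc(y_true, [s ** 2 for s in scores]))  # 0.75: same ranking, same AUC
```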

9
Q

What is AUC PR?

A

The Area Under the Precision-Recall Curve (AUC-PR) is a performance metric for binary classification models that is well suited to imbalanced datasets.
AUC-PR = 1: Perfect separation of positive and negative classes
A random classifier scores roughly the fraction of positive examples, not 0.5.
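A common way to estimate AUC-PR is average precision: rank examples by score and, at each positive, add the precision at that cut-off weighted by the recall step. A minimal plain-Python sketch, assuming binary labels and no tied scores (ties need extra handling); the function name, the 0.0 convention for no positives, and the data are illustrative.

```python
def average_precision(y_true, scores):
    """Step-wise AUC-PR estimate: sum of precision@k at each positive / #positives.

    Assumes no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0  # assumed convention: undefined without positives
    tp, ap = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / k  # precision at cut-off k; each positive adds 1/total_pos recall
    return ap / total_pos


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(average_precision(y_true, scores))  # about 0.833, i.e. (1 + 2/3) / 2
```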

10
Q

What are the metrics for regression?

A

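MAE (mean absolute error): Average absolute difference between the actual and predicted values.
RMSE (root mean squared error): Squares errors before averaging, so it penalizes large errors more heavily.
RMSLE (root mean squared logarithmic error): Compares log-transformed values, so it measures relative error and penalizes under-predictions more than over-predictions of the same size.
MAPE (mean absolute percentage error): Average absolute difference between actual and predicted values, expressed as a percentage of the actual values.
R^2: Square of the correlation coefficient between the labels and predicted values. A higher value indicates a better fit.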
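Minimal plain-Python sketches of these metrics, using hypothetical data. Following the card, R^2 is computed as the squared Pearson correlation between labels and predictions; this can differ from the coefficient of determination (1 - SSE/SST) reported by libraries such as scikit-learn's `r2_score`. The function names are illustrative; RMSLE assumes all values are greater than -1, and MAPE is undefined when an actual value is 0.

```python
import math


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    # Squaring before averaging makes large errors dominate.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def rmsle(y, p):
    # Errors on the log1p scale: relative error, harsher on under-prediction.
    return math.sqrt(
        sum((math.log1p(b) - math.log1p(a)) ** 2 for a, b in zip(y, p)) / len(y)
    )


def mape(y, p):
    # Error as a percentage of the actual value; undefined if any actual value is 0.
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)


def r2_squared_corr(y, p):
    # Square of the Pearson correlation between labels and predictions.
    my, mp = sum(y) / len(y), sum(p) / len(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in p)
    return cov ** 2 / (var_y * var_p)


actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]
print(mae(actual, predicted))              # 15.0
print(rmse(actual, predicted))             # about 15.81
print(mape(actual, predicted))             # about 6.67 (%)
print(r2_squared_corr(actual, predicted))  # about 0.98
# Same absolute miss of 50, but the under-prediction costs more under RMSLE:
print(rmsle([100], [50]) > rmsle([100], [150]))  # True
```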

11
Q

What do you need to consider when it comes to responsible AI practices?

A

General best practices: Include diverse perspectives.
Fairness: Consider academic, legal, and cultural definitions of fairness. Use statistical methods to test ML models for bias.
Interpretability: Model explanations quantify the contribution of each input feature to a prediction.
Privacy: Minimize leakage of sensitive data.
Security: Protect the system at every stage, from data collection through training and deployment.
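One simple statistical bias check is to compare the rate of positive predictions across groups (often called demographic parity). The sketch below is a minimal plain-Python illustration with hypothetical group labels and predictions; it is only one of many fairness metrics, and which one is appropriate depends on the use case.

```python
from collections import defaultdict


def positive_rate_by_group(groups, y_pred):
    """Fraction of positive predictions (1s) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for g, p in zip(groups, y_pred):
        totals[g] += 1
        positives[g] += p
    return {g: positives[g] / totals[g] for g in totals}


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]  # hypothetical sensitive attribute
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]                  # hypothetical model predictions
rates = positive_rate_by_group(groups, y_pred)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'a': 0.75, 'b': 0.25}
print(gap)    # 0.5: a large gap is a signal to investigate, not proof of bias
```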
