1. Framing ML Problems Flashcards
What are the key factors for translating business use cases into ML problems?
First, identify the impact, success criteria, and available data for the use case. Then match it with a machine learning approach (an algorithm and a metric).
What is the equation for recall?
Recall = True Positive (TP) / (True Positive (TP) + False Negative (FN))
What is the equation for precision?
Precision = True Positives (TP) / (True Positives (TP) + False Positives (FP))
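The two formulas above can be sketched in Python. This assumes the confusion-matrix counts are already known. The function names `precision` and `recall` are illustrative and do not come from any library. Returning 0.0 when the denominator is zero is one common convention, not the only one.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    if tp + fp == 0:
        return 0.0  # no positive predictions; 0.0 by convention in this sketch
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model found."""
    if tp + fn == 0:
        return 0.0  # no actual positives; 0.0 by convention in this sketch
    return tp / (tp + fn)


# Example counts: 8 true positives, 2 false positives, 4 false negatives
print(precision(tp=8, fp=2))  # → 0.8
print(recall(tp=8, fn=4))     # → 0.6666666666666666
```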
What are the two main types of machine learning?
Supervised and unsupervised.
A hybrid of the two is called semi-supervised learning.
What are the common ML problem types?
Tabular:
1. Supervised: Regression, Classification
2. Unsupervised: K-means clustering, PCA
Series:
1. Supervised: Forecasting
Image:
1. Supervised: Image classification, Image segmentation, Object detection
Video:
1. Supervised: Video classification, Video object tracking, Video action recognition
Text:
1. Supervised: Sentiment analysis, Entity extraction, Translation
2. Unsupervised: Topic modelling
Mixed:
1. Supervised/Unsupervised: Collaborative filtering / recommendations
What is semi-supervised learning?
Learning from a training set in which some examples are labeled and others are not.
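What are precision, recall, and F1 used for?
Precision: Optimize it when false positives are costly (it reduces false positives).
Recall: Optimize it when false negatives are costly (it reduces false negatives).
F1: The harmonic mean of precision and recall; use it to reduce false positives and false negatives together.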
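The F1 score is the harmonic mean of precision and recall. Here is a minimal Python sketch, where the function name `f1_score` is illustrative. The harmonic mean stays low unless both inputs are high, so a model cannot reach a good F1 by trading away one metric for the other.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both metrics are zero; 0.0 by convention in this sketch
    return 2 * precision * recall / (precision + recall)


# High precision cannot hide poor recall: the arithmetic mean would be 0.5,
# but the harmonic mean is much lower.
print(round(f1_score(0.9, 0.1), 2))  # → 0.18
```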
What is AUC ROC?
Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a performance metric for classification models that summarizes performance across all classification thresholds. It measures how well a model distinguishes between the positive and negative classes, and it is most informative on balanced datasets.
1: Perfect separation of positive and negative classes
0.5: Random guess
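It is threshold-invariant and scale-invariant: it depends only on how the model ranks examples, not on the absolute predicted scores, so it is also insensitive to extreme score values.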
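AUC-ROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example. The Python sketch below computes it directly from that definition and counts ties as half a win. The function name `roc_auc` is illustrative. The pairwise loop is O(positives × negatives), so it only suits small examples. Libraries such as scikit-learn provide `roc_auc_score` for real workloads.

```python
from itertools import product


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC-ROC as P(score of random positive > score of random negative),
    with tied scores counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # → 0.75
# Rescaling the scores preserves their ranking, so the AUC is unchanged
print(roc_auc(labels, [s * 100 for s in scores]))  # → 0.75
```

The second call illustrates scale invariance. Only the order of the scores matters, and multiplying every score by 100 does not change that order.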
What is AUC PR?
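The Area Under the Precision-Recall Curve (AUC-PR) is a performance metric for binary classification that is preferred for imbalanced datasets, because it focuses on how well the model handles the positive (usually minority) class.
1: Perfect separation of positive and negative classes
Fraction of positive examples in the dataset: Random guess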
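A common way to estimate AUC-PR is average precision: rank examples by score, and then average the precision measured at the position of each positive example. The Python sketch below assumes there are no tied scores, since ties would need extra handling. The function name `average_precision` is illustrative. scikit-learn exposes a comparable metric as `average_precision_score`.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Step-wise estimate of AUC-PR: the mean of the precision values
    measured at the rank of each positive example (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision of the top-k predictions
    return total / n_pos


print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 3))  # → 0.833
```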
What are the metrics for regression?
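MAE: Mean absolute error; the average absolute difference between actual and predicted values.
RMSE: Root mean squared error. Squaring the errors penalizes large errors more heavily than MAE does.
RMSLE: Root mean squared logarithmic error. It measures relative error and penalizes under-predictions more than over-predictions of the same size.
MAPE: Mean absolute percentage error; the average absolute difference between actual and predicted values, relative to the actual value. It is undefined when an actual value is zero.
R^2: Square of the correlation coefficient between the labels and predicted values. A higher value indicates a better fit.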
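The Python sketch below computes the five regression metrics. The function name `regression_metrics` is hypothetical. R^2 follows the definition on this card, which is the squared Pearson correlation between labels and predictions. Other tools often compute R^2 as 1 - SS_res / SS_tot instead, and the two values can differ. RMSLE assumes non-negative values, and MAPE raises an error if an actual value is 0.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # RMSLE compares log(1 + x), so it measures relative rather than absolute error
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n
    )
    # MAPE as a percentage; raises ZeroDivisionError if any actual value is 0
    mape = 100 * sum(abs(e / a) for a, e in zip(actual, errors)) / n
    # R^2 as the squared Pearson correlation between labels and predictions
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    denom = var_a * var_p
    r2 = cov ** 2 / denom if denom > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "RMSLE": rmsle, "MAPE": mape, "R2": r2}


# Same absolute error (50) below and above the true value of 100
actual = [100.0, 200.0]
under = regression_metrics(actual, [50.0, 200.0])
over = regression_metrics(actual, [150.0, 200.0])
print(under["MAE"], over["MAE"])       # → 25.0 25.0
print(under["RMSLE"] > over["RMSLE"])  # → True
```

The usage example shows that MAE treats the two predictions the same way. RMSLE scores the under-prediction as worse.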
What do you need to consider when it comes to responsible AI practices?
General best practices: Include diverse perspectives.
Fairness: Consider academic, legal, and cultural perspectives. Use statistical methods to test ML models for bias.
Interpretability: Model explanations quantify the contribution of each input feature to a prediction.
Privacy: Minimize leakage of sensitive data.
Security: Protect the system at every stage, from data collection through training and deployment.
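One simple statistical test for bias compares how often the model predicts the positive class in each group. This is often called demographic parity. The Python sketch below is only one such test among many, and the function names and group labels are hypothetical. A small gap does not by itself show that a model is fair.

```python
from collections import defaultdict


def positive_rate_by_group(predictions: list[int], groups: list[str]) -> dict[str, float]:
    """Share of positive (1) predictions within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positive predictions, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_gap(predictions: list[int], groups: list[str]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rate_by_group(preds, groups))  # → {'a': 0.75, 'b': 0.25}
print(demographic_parity_gap(preds, groups))  # → 0.5
```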