04_supervised concepts Flashcards
What is the goal for supervised problems?
find a function f (the task)
that relates input data (x)
to output data (y)
via model parameters (θ)
such that f(x;θ) = y
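A minimal sketch of this idea in Python, assuming a simple linear model (all names and data are illustrative):
```python
import numpy as np

# illustrative model: f(x; theta) = theta[0] + theta[1] * x
def f(x, theta):
    return theta[0] + theta[1] * x

# seen data generated by y = 1 + 2x; the parameters theta are learned from it
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = np.polyfit(x, y, deg=1)[::-1]  # least-squares fit -> (intercept, slope)
print(f(x, theta))  # close to y
```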
What is a hyperparameter?
a model parameter that the model does not learn (given by programmer)
What is the traditional (rule-based) approach to a supervised learning setup?
a person looks at the seen data (x and y) and writes the function by hand
the hand-written function is then tested on unseen x and unseen y
What is a machine-learning approach to a supervised learning setup?
the model finds the rules (function f) by itself
What tasks can ML learn?
- (multi-class) Classification
- binary classification
- multi-label classification
- regression
- object detection
- semantic segmentation
- instance segmentation
- synthesis
What is multi-class classification?
mapping input features to discrete classes of a single label
(one label with multiple classes)
eg:
label: color
classes: red, green, blue
What is binary classification?
mapping input features to a binary label
eg:
label: status
two classes: on, off
What is multi-label classification?
mapping input features to discrete classes of multiple labels (with multiple classes)
eg:
labels: color, sort, quality
with multiple classes per label,
eg red/green/blue for color or good/medium/bad for quality
What is regression?
mapping input features to a continuous variable
eg x = time, y = value
What is object detection?
approximately localize objects in image data with bounding boxes
eg: boxes in picture for cat and dog
What types of segmentation are there?
- semantic segmentation
- instance segmentation
What types of classification are there?
- multi-class classification
- binary classification
- multi-label classification
What is semantic segmentation?
assign class label to each pixel of an image based on what it is showing
eg what is part of cat? what is part of dog?
What is instance segmentation?
assign class label to each pixel of an image based on what it is showing
AND discriminate different instances of the class
eg: for the class animal, the model identifies two separate instances of that class
What is synthesis?
generate new data points based on a learned distribution
eg StyleGAN2 (creates faces) or Style Transfer
What is iid?
independent and identically distributed data
–> individual samples in both data sets are produced by the same data generation process
What is assumed when running an ML model on previously unseen data?
that unseen (new) data and the already seen (training) data are iid
–> the individual samples in both data sets are produced by the same data generation process
What is the lesson in iid?
for small sample sizes, data sets that are iid may still differ significantly!
the larger the sample, the more similar the data sets appear
What will the distributions in real data sets look like?
What are the implications on the performance of the model?
they will look different despite being iid, because of their limited size
successful training on one data set does not imply good performance on unseen data!
–> therefore, the model has to generalize well by preventing overfitting
What is a decision boundary?
the decision boundary separates the different classes
as learned by the trained model
What is overfitting?
the model memorizes the structure of the training/seen data
–> as a result, it generalizes badly on the overall data distribution
What can we do against overfitting?
regularization methods
When does a model generalize well?
when the decision boundary leads to equally good performance on both data sets
How do we measure, if the model generalizes well?
We have to define a performance metric
What does a performance metric do?
provide a quantitative assessment of how well our model performs
On what basis does the performance metric assess the model performance?
- ML implementation (model type and loss function)
- dataset
- task to be learned
What does the Performance metric need to be?
tailored to the model
and the task
and the dataset
How can we identify overfitting?
By comparing the performance on the seen/training data and some previously unseen/test data
What is usually done to the existing data in supervised learning?
the data is randomly split into three parts:
1) Training data set
2) Validation data set (hyperparameter-tuning)
3) Test data set (evaluate model performance)
What are typical ratios for splitting the dataset?
Train: 0.7
Validate: 0.15
Test: 0.15
What is a stratified split?
If we have more than one class, each class is split into train/validate/test on its own according to the set ratios
–> preserves the class fractions in split data sets
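A minimal sketch of a 0.7/0.15/0.15 stratified split, assuming scikit-learn (the data is random and illustrative):
```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 2)             # illustrative features
y = np.random.randint(0, 2, size=100)  # illustrative binary labels

# take 70% for training, then split the rest 50/50 into validation and test;
# stratify preserves the class fractions in every split
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)
```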
How can the data be validated if the data set available is not big enough to perform a meaningful split?
“recycle” data by using cross-validation to get a better estimate
How is the k-fold cross-validation carried out?
1) split the shuffled data set into k parts (eg k = 3)
2) train with k−1 parts and test with the remaining part
3) repeat, reassigning the test and train parts
–> k independent runs
4) results in k performance metrics
–> report: avg + std / best-of-k
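A minimal sketch of k-fold cross-validation with k = 3, assuming scikit-learn and illustrative random data:
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X = np.random.rand(60, 2)              # illustrative features
y = np.random.randint(0, 2, size=60)   # illustrative binary labels

# 3 independent train/test runs -> 3 performance metrics
scores = cross_val_score(LogisticRegression(), X, y, cv=3)
print(scores.mean(), scores.std())     # report: avg + std
```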
Does k-fold cross-validation improve the model's performance?
No
because it uses independent runs that do not build on each other
it just gives a more reliable estimate of the model's performance
What are methods to force the model to generalize? (6)
- limiting model capacity
- introducing uncertainty
- dropout (only NN)
- introducing noise
- early stopping (only NN; see the sketch after this list)
- Bagging / Ensembling
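A minimal sketch of the early-stopping logic, where a fixed list of per-epoch validation losses stands in for real NN training (illustrative only):
```python
# validation loss per epoch: improves first, then the model starts to overfit
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]

best_val, best_epoch, patience, wait = float("inf"), 0, 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val, best_epoch, wait = val_loss, epoch, 0  # new best: reset counter
    else:
        wait += 1
        if wait >= patience:  # validation stopped improving -> stop training
            break
print(f"stopped at epoch {epoch}, keep the model from epoch {best_epoch}")
```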
What is the usual general supervised learning pipeline?
1) feature engineering
2) data scaling
3) data splitting
4) define hyperparameters
5) train model on training data for fixed hyperparameters
6) evaluate model on validation data
7) repeat 4 to 6 until the performance on the validation data is maximised
8) evaluate trained model on test data and report the test data performance
–> performance metrics should be similar between test and validation data before showing the model the test data
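A minimal end-to-end sketch of steps 2) to 8), assuming scikit-learn; the data, the model type (SVC) and the hyperparameter grid are illustrative:
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X = np.random.rand(200, 2)              # illustrative features
y = np.random.randint(0, 2, size=200)   # illustrative binary labels

# 2) data scaling (in practice, fit the scaler on the training data only)
X = StandardScaler().fit_transform(X)

# 3) data splitting (0.7 / 0.15 / 0.15)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# 4)-7) train for fixed hyperparameters, evaluate on validation data, repeat
best_score, best_C = -1.0, None
for C in [0.1, 1.0, 10.0]:
    model = SVC(C=C).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

# 8) evaluate the final model on the test data (only once) and report
final = SVC(C=best_C).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))
```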
How do we measure the performance of our model?
Benchmarking
What is benchmarking?
refers to the process of quantitatively assessing your ML model’s performance
What is a metric?
measure for performance, depends on the task and the data set
What are the two most important regression task metrics?
What is the intuition behind the calculations?
- MAE (mean absolute error):
MAE = (1/N) Σ_i |ŷ_i − y_i|
- RMSE (root mean square error):
RMSE = √[(1/N) Σ_i (ŷ_i − y_i)²]
(ŷ_i: model prediction, y_i: ground truth)
Intuition: by how much the model prediction deviates from the ground truth on average
What is the difference between MAE and RMSE for small datasets?
RMSE is more sensitive to outliers
whether this is beneficial or not depends on the model and the problem
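A minimal numpy sketch of both metrics; the values are illustrative and the last point is an outlier to show the difference:
```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.5, 2.0, 10.0])   # last prediction is an outlier

mae = np.mean(np.abs(y_pred - y_true))            # MAE = 1.0
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))   # RMSE ≈ 1.54 > MAE
print(mae, rmse)   # RMSE is pulled up more strongly by the outlier
```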
What are (binary) classification metrics?
- Accuracy
- Precision
- Recall
What is the Accuracy Metric and how is it calculated?
What is the overall fraction of correct predictions?
Accuracy = (TP + TN) / (TN + TP + FP + FN)
95% of all our predictions (dog and not-dog) are correct
What is the Precision Metric and how is it calculated?
What fraction of our positive predictions is truly positive? (quantifies “correctness”)
Precision = TP / (TP + FP)
95% of the dogs we predicted are actual dogs
What is the Recall Metric and how is it calculated?
What fraction of actual positives has been identified? (quantifies “completeness”)
Recall = TP / (TP + FN)
we correctly found 95% of the dogs that are in the image
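A minimal numpy sketch computing all three metrics from TP/TN/FP/FN (the labels are illustrative):
```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # illustrative ground truth
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # illustrative predictions

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of correct predictions
precision = tp / (tp + fp)                   # "correctness" of positive predictions
recall = tp / (tp + fn)                      # "completeness" of actual positives
print(accuracy, precision, recall)
```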
What is the issue with Accuracy/Precision/Recall as Classification Metrics?
Class imbalance!
eg “Will this asteroid impact Earth?”
almost all asteroids miss, so a model that always predicts “no impact” gets near-perfect accuracy
here, Recall is the most important metric because we don’t want to miss any impacting asteroids
What is the confusion matrix?
a common way to visualize the performance of a classification model
provides information on systematic confusion learned by the classifier
in a row-normalized confusion matrix, all elements in one row sum up to unity
What is a sign for a well-trained classifier regarding the confusion matrix?
the diagonal values should be as high as possible while off-diagonal elements should be as low as possible
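A minimal sketch of a row-normalized confusion matrix, assuming scikit-learn (the labels are illustrative):
```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]   # illustrative ground truth
y_pred = [0, 1, 1, 1, 2, 0]   # illustrative predictions

cm = confusion_matrix(y_true, y_pred)
cm_norm = cm / cm.sum(axis=1, keepdims=True)   # each row sums to unity
print(cm_norm)   # high diagonal + low off-diagonal = well-trained classifier
```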
Is accuracy a good metric for object detection?
No
accuracy would be very high because it looks at all individual pixels (most of which are background); it gives no indication of whether the relevant pixels were identified correctly
What is a good metric for object detection and image segmentation?
Intersection over union metric
What is IoU and how is it calculated?
Intersection over union metric
IoU = intersection / union = (A ∩ B) / (A ∪ B)
A is the predicted shape, B is the ground truth
Where is the IoU undefined?
where there is no ground truth and no prediction (the union becomes empty, giving 0/0)
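A minimal sketch of IoU for two axis-aligned bounding boxes given as (x1, y1, x2, y2); the boxes are illustrative:
```python
def iou(a, b):
    # overlap of the two boxes (the intersection rectangle)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else float("nan")  # undefined if union is empty

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1 / 7 ≈ 0.14
```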
How to report metrics?
- individual metric
- best-of-n (eg with cross-validation)
- averaging results (average metric over n model runs + standard deviation)
best choice depends on the specific problem and use case