04_supervised concepts Flashcards
What is the goal for supervised problems?
find function (task)
that relates input data (x)
to output data (y)
with parameters (θ)
such that f(x;θ) = y
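Sketch (illustrative, not from the original cards): one way to find θ for a toy problem. The linear form f(x;θ) = θ0 + θ1·x, the data values, and the helper name `fit` are assumptions made only for this example.

```python
# Toy supervised problem: find theta so that f(x; theta) matches y on seen data.
# Assumption: a linear model f(x; theta) = theta0 + theta1 * x.

def f(x, theta):
    """Model: maps input x to a prediction using theta."""
    theta0, theta1 = theta
    return theta0 + theta1 * x

def fit(xs, ys):
    """Closed-form least-squares estimate of (theta0, theta1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    theta1 = cov_xy / var_x
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# Seen data, generated by y = 1 + 2x (made-up values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta = fit(xs, ys)
print(theta)          # fitted theta
print(f(5.0, theta))  # prediction for an unseen x
```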
What is a hyperparameter?
a setting of the model or of the training procedure that the model does not learn from data (it is chosen by the programmer), eg the learning rate
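Sketch (illustrative): the difference between values set by the programmer and values found by training, shown with plain gradient descent. The model y_hat = w·x, the data, and the constants `LEARNING_RATE` and `N_STEPS` are assumptions for this example.

```python
# Hyperparameters: chosen by the programmer, never updated by training.
LEARNING_RATE = 0.05   # step size of gradient descent (assumed value)
N_STEPS = 200          # number of update steps (assumed value)

# Toy data following y = 3x (made-up values)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0  # learned value: updated by training
for _ in range(N_STEPS):
    # Gradient of the mean squared error of the model y_hat = w * x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= LEARNING_RATE * grad

print(w)  # close to 3
```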
What is the traditional (rule-based) approach to a supervised learning setup?
a person who has seen x and y writes the function (the rules) by hand
the hand-written function is then tested on unseen x and unseen y
What is a machine-learning approach to a supervised learning setup?
the model finds the rules (function f) by itself
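Sketch (illustrative): the two approaches side by side on a tiny made-up data set. The hand-picked cut-off of 40 and the threshold search in `learn_threshold` are assumptions for this example, not a prescribed method.

```python
# Seen data: x = sensor reading, y = 1 if the device is "on", else 0 (made-up values).
train_x = [12, 18, 25, 31, 44, 52, 60, 75]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

def rule_based(x):
    """Traditional approach: a person looked at the data and wrote the rule."""
    return 1 if x > 40 else 0

def learn_threshold(xs, ys):
    """ML approach: search for the threshold with the fewest errors on seen data."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(xs):
        errors = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(train_x, train_y)

def learned(x):
    """Rule found from the data instead of written by hand."""
    return 1 if x > threshold else 0

# Both functions are then checked on unseen data.
test_x = [10, 35, 48, 90]
print([rule_based(x) for x in test_x])
print([learned(x) for x in test_x])
```

Both rules fit the seen data perfectly, yet they can disagree on unseen inputs that fall between the two cut-offs.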
What tasks can ML learn?
- multi-class classification
- binary classification
- multi-label classification
- regression
- object detection
- semantic segmentation
- instance segmentation
- synthesis
What is multi-class classification?
mapping input features to discrete classes of a single label
(for one label multiple classes)
eg:
label: color
classes: red, green, blue
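Sketch (illustrative): the color example as a nearest-centroid classifier on RGB triples. The classifier choice and all data values are assumptions for this example.

```python
# One label ("color") with three classes; each sample is an (R, G, B) triple.
train = {
    "red":   [(250, 10, 20), (200, 30, 30), (230, 60, 40)],
    "green": [(20, 240, 30), (40, 200, 60), (10, 180, 20)],
    "blue":  [(10, 20, 250), (30, 50, 210), (60, 40, 230)],
}

def centroid(points):
    """Component-wise mean of a list of equally long tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

centroids = {cls: centroid(pts) for cls, pts in train.items()}

def classify(x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: dist2(centroids[cls]))

print(classify((220, 40, 35)))
print(classify((25, 35, 200)))
```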
What is binary classification?
mapping input features to a binary label
eg:
label: status
two classes: on, off
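Sketch (illustrative): a binary classifier for the status example. The logistic (sigmoid) score, the 0.5 cut-off, and the hand-picked `WEIGHT` and `BIAS` are assumptions; in a real setup these two values would be learned from data.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values standing in for a trained model
WEIGHT, BIAS = 1.5, -3.0

def predict_status(x):
    """Map one input feature x to one of the two classes of the label 'status'."""
    p_on = sigmoid(WEIGHT * x + BIAS)
    return "on" if p_on >= 0.5 else "off"

print(predict_status(1.0))
print(predict_status(3.0))
```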
What is multi-label classification?
mapping input features to discrete classes of multiple labels (with multiple classes)
eg:
labels: color, sort, quality
with multiple classes per label,
eg red/green/blue for color or good/medium/bad for quality
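Sketch (illustrative): how targets and a per-label score could be represented when every sample carries several labels. The class names for "sort", the sample values, and the helper names are made up for this example.

```python
# Each sample has several labels; each label has its own set of classes.
LABELS = {
    "color":   ["red", "green", "blue"],
    "sort":    ["apple", "pear"],          # hypothetical classes
    "quality": ["good", "medium", "bad"],
}

def is_valid_target(target):
    """Check that a target assigns exactly one known class to every label."""
    return (target.keys() == LABELS.keys()
            and all(target[name] in classes for name, classes in LABELS.items()))

def per_label_accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, computed separately per label."""
    n = len(y_true)
    return {name: sum(t[name] == p[name] for t, p in zip(y_true, y_pred)) / n
            for name in LABELS}

y_true = [
    {"color": "red", "sort": "apple", "quality": "good"},
    {"color": "green", "sort": "pear", "quality": "bad"},
]
y_pred = [
    {"color": "red", "sort": "apple", "quality": "medium"},
    {"color": "green", "sort": "apple", "quality": "bad"},
]

print(all(is_valid_target(t) for t in y_true + y_pred))
print(per_label_accuracy(y_true, y_pred))
```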
What is regression?
mapping input features to a continuous variable
eg x = time, y = value
What is object detection?
approximately localize (and classify) objects in image data with bounding boxes
eg: boxes in picture for cat and dog
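Sketch (illustrative): why a bounding box is only an approximate localization. The tiny binary image and the helper `bounding_box` are assumptions for this example; a real detector predicts the box directly from the image.

```python
# 1 marks pixels that belong to the object, 0 is background (made-up image).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of all nonzero pixels, inclusive."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

r0, c0, r1, c1 = bounding_box(mask)
box_area = (r1 - r0 + 1) * (c1 - c0 + 1)   # pixels covered by the box
object_area = sum(map(sum, mask))          # pixels that are really the object
print((r0, c0, r1, c1), box_area, object_area)
```

The box covers more pixels than the object itself, which is the gap that segmentation closes.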
What types of segmentation are there?
- semantic segmentation
- instance segmentation
What types of classification are there?
- multi-class classification
- binary classification
- multi-label classification
What is semantic segmentation?
assign a class label to each pixel of an image based on what it is showing
eg what is part of cat? what is part of dog?
What is instance segmentation?
assign a class label to each pixel of an image based on what it is showing
AND discriminate different instances of the class
eg class animal: the model identifies two separate instances of class animal
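Sketch (illustrative): the difference between a semantic mask and an instance mask. Splitting a class mask into 4-connected regions is an assumption that only works when instances do not touch; real instance segmentation models predict the instances directly. The image values are made up.

```python
from collections import deque

# Semantic segmentation output: one class per pixel (0 = background, 1 = "animal").
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

def label_instances(semantic, cls=1):
    """Give every 4-connected region of class `cls` its own instance id (1, 2, ...)."""
    h, w = len(semantic), len(semantic[0])
    instances = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if semantic[r][c] != cls or instances[r][c]:
                continue
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and semantic[ny][nx] == cls and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id

instances, n_instances = label_instances(semantic)
print(n_instances)
for row in instances:
    print(row)
```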
What is synthesis?
generate new data points based on a learned distribution
eg StyleGAN2 (creates faces) or Style Transfer
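Sketch (illustrative): synthesis in its simplest form. Modelling the data as a one-dimensional Gaussian is an assumption for this example; models such as StyleGAN2 learn far richer distributions, but the two steps (learn the distribution, then sample from it) are the same. The data values are made up.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same samples

# Seen data: made-up measurements
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0]

# Learn the distribution: here only the mean and standard deviation of a Gaussian.
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# Synthesis: draw new data points from the learned distribution.
new_points = [random.gauss(mu, sigma) for _ in range(5)]
print(mu, sigma)
print(new_points)
```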
What is iid?
independent and identically distributed data
–> every sample is produced by the same data generation process, independently of the other samples
What is assumed when running an ML model on previously unseen data?
that unseen (new) data and the already seen (training) data are iid
–> the individual samples in both data sets are produced by the same data generation process
What is the lesson in iid?
for small sample sizes, data sets that are iid may still differ significantly!
the larger the samples, the more similar the data sets appear
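Sketch (illustrative): measuring how much two iid data sets differ at different sizes. The standard normal distribution, the sample sizes, the number of trials, and the helper `mean_gap` are assumptions for this example.

```python
import random

random.seed(42)  # fixed seed for reproducibility

def mean_gap(n, trials=200):
    """Average |mean(A) - mean(B)| for two iid samples A and B of size n."""
    total = 0.0
    for _ in range(trials):
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(sum(a) / n - sum(b) / n)
    return total / trials

small = mean_gap(5)      # two tiny data sets from the same process
large = mean_gap(2000)   # two large data sets from the same process
print(small, large)
```

The gap between the two sample means shrinks roughly with 1/sqrt(n), so the small data sets disagree far more than the large ones.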
What will the distributions in real data sets look like?
What are the implications for the performance of the model?
they will look different despite being iid, because each data set has limited size
successful training on one data set does not imply good performance on unseen data!
–> therefore, the model has to generalize well by preventing overfitting
What is a decision boundary?
the decision boundary separates the different classes
as learned by the trained model
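Sketch (illustrative): a decision boundary in one dimension. The nearest-centroid model and the data values are assumptions for this example; for this model the boundary is the midpoint between the two class means.

```python
# Seen data for two classes along one feature (made-up values).
class_a = [1.0, 1.5, 2.0, 2.5]
class_b = [5.0, 5.5, 6.0, 6.5]

# "Training": the model stores one mean per class.
mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)

# Boundary implied by the trained model: the point equally far from both means.
boundary = (mean_a + mean_b) / 2

def predict(x):
    """Assign x to the class with the closer mean."""
    return "a" if abs(x - mean_a) < abs(x - mean_b) else "b"

print(boundary)
print(predict(boundary - 0.1), predict(boundary + 0.1))
```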
What is overfitting?
the model memorizes the training (seen) data, including its noise, instead of learning the underlying structure
–> as a result, it generalizes badly on the overall data distribution
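Sketch (illustrative): a model that memorizes its training data. The 1-nearest-neighbour "memorizer", the 20% label noise, and the data generator `make_data` are assumptions for this example.

```python
import random

random.seed(1)  # fixed seed for reproducibility

def make_data(n, noise=0.2):
    """x is uniform in [0, 1]; the true class is x > 0.5, but each label
    is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

train = make_data(200)  # seen data
test = make_data(200)   # unseen data from the same process (iid)

def memorizer(x):
    """1-nearest-neighbour: return the label of the closest training point,
    so every noisy training label is reproduced exactly."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    """Fraction of samples in `data` that `model` labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

train_acc = accuracy(memorizer, train)
test_acc = accuracy(memorizer, test)
print(train_acc, test_acc)
```

Perfect accuracy on the seen data together with a clearly lower accuracy on unseen data is the typical signature of overfitting.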