Safety and Reliability Flashcards
Erroneous Behaviour of a Classifier
Given a trained classifier f : R^n -> R^k (mapping n features to k class scores) and a target function h : R^n -> R^k, an erroneous behaviour of the classifier f is demonstrated by a legitimate input x in R^n such that
arg max_j f_j(x) != arg max_j h_j(x)
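A minimal sketch of this condition (not part of the flashcards): f and h are treated as functions returning k-dimensional score vectors, and their arg max is compared; the toy f and h below are hypothetical stand-ins.
```python
import numpy as np

def is_erroneous(f, h, x):
    """Return True if arg max_j f_j(x) != arg max_j h_j(x)."""
    return int(np.argmax(f(x))) != int(np.argmax(h(x)))

# Toy example with k = 3 classes: f and h are hypothetical score functions.
f = lambda x: np.array([0.2, 0.7, 0.1])    # classifier predicts class 1
h = lambda x: np.array([0.9, 0.05, 0.05])  # target function says class 0
print(is_erroneous(f, h, np.zeros(4)))     # True -> erroneous behaviour
```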
Loss function
L(y, f(x)): the loss between the prediction f(x) and the ground truth y.
Empirical Loss
The average loss over a finite set of samples (e.g., the training or test set).
Expected Loss
The expectation of the loss over the underlying data distribution, i.e., the loss the model is expected to incur on unseen data (estimated in practice on a held-out set).
Generalisation Loss
The difference between the expected loss and the empirical loss (the generalisation gap). A large generalisation loss is a sign of overfitting.
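A minimal sketch tying the last four cards together, assuming a cross-entropy loss, a scikit-learn-style model with predict_proba, and a held-out set used as a finite-sample proxy for the expected loss; all names and the toy dataset are illustrative.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def cross_entropy(y_true, probs, eps=1e-12):
    # Loss L(y, f(x)) for one example: negative log-probability of the true class.
    return -np.log(probs[y_true] + eps)

def empirical_loss(model, X, y):
    # Average loss over a finite set of samples.
    probs = model.predict_proba(X)
    return float(np.mean([cross_entropy(yi, pi) for yi, pi in zip(y, probs)]))

# The held-out loss approximates the expected loss; its difference to the
# training loss estimates the generalisation loss.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
gap = empirical_loss(model, X_te, y_te) - empirical_loss(model, X_tr, y_tr)
print(f"generalisation gap ~ {gap:.4f}")   # a large gap suggests overfitting
```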
Overfitting
A machine learning model is overfitted if it performs well on the training data but poorly on unseen test data.
Adversarial Examples
Inputs obtained by slightly perturbing legitimate inputs so that the classifier's prediction changes; they represent erroneous behaviours with safety implications.
Measurements of adversarial examples
- Magnitude of perturbation -> ||x-x’||
- Probability gap before and after the perturbation -> |f_y(x) - f_y(x’)|
(Here x is the original example, x’ is the perturbed example, and f_y is the probability the classifier assigns to class y.)
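A minimal sketch of both measurements, assuming f returns a probability vector and y is the class of interest; the L2 norm stands in for ||x - x’|| here, though other norms (e.g., L_inf) are also common.
```python
import numpy as np

def perturbation_magnitude(x, x_adv, ord=2):
    # ||x - x'|| under the chosen norm (L2 by default).
    return np.linalg.norm(x - x_adv, ord=ord)

def probability_gap(f, x, x_adv, y):
    # |f_y(x) - f_y(x')|: change in the probability assigned to class y.
    return abs(f(x)[y] - f(x_adv)[y])

# Toy example with a softmax "classifier" over three classes.
f = lambda v: np.exp(v) / np.exp(v).sum()
x = np.array([0.1, 0.4, 0.5])
x_adv = x + np.array([0.02, -0.01, 0.0])
print(perturbation_magnitude(x, x_adv), probability_gap(f, x, x_adv, y=1))
```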
Data Poisoning
The injection of malicious data into a training process so that the trained model behaves in a way it should not.
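A minimal sketch of one simple poisoning strategy (label flipping), given only as an illustration rather than as a specific attack from these cards: a fraction of the training labels is altered before training.
```python
import numpy as np

def flip_labels(y, fraction=0.1, num_classes=2, seed=0):
    # Return a poisoned copy of y in which `fraction` of the labels are
    # changed to a different, randomly chosen class.
    rng = np.random.default_rng(seed)
    y_poisoned = np.array(y).copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    shift = rng.integers(1, num_classes, size=len(idx))  # never zero, so the class always changes
    y_poisoned[idx] = (y_poisoned[idx] + shift) % num_classes
    return y_poisoned

# Training on flip_labels(y_train) instead of y_train degrades or skews the model.
```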
Model Stealing
Given a model f, a model-stealing agent reconstructs a surrogate model f’ (e.g., by querying model f).
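A minimal sketch under the assumption that the attacker only has query access to the victim's predict function (all names and the toy victim are illustrative): query inputs are labelled with f's outputs and a surrogate f’ is fitted on those pairs.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def steal_model(f_predict, n_queries=1000, n_features=10, seed=0):
    # Fit a surrogate f' on (query, f(query)) pairs obtained from the victim f.
    rng = np.random.default_rng(seed)
    X_query = rng.normal(size=(n_queries, n_features))  # attacker-chosen queries
    y_query = f_predict(X_query)                        # victim's answers
    return DecisionTreeClassifier().fit(X_query, y_query)

# Toy victim model f; the surrogate approximates its decision boundary.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
victim = LogisticRegression(max_iter=1000).fit(X, y)
surrogate = steal_model(victim.predict)
```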
Membership Inference
Infers whether a given sample was part of a model's training data, typically by training shadow models that imitate the target model and observing differences in the model's behaviour and outputs on members versus non-members.
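A heavily simplified sketch of the shadow-model idea (one shadow model, the model's output probabilities as attack features); the dataset and names are illustrative only.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Shadow model: trained on data the attacker controls, so membership is known.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
shadow = LogisticRegression(max_iter=1000).fit(X_in, y_in)

# Attack model: learns to separate "member" from "non-member" samples using
# the shadow model's output probabilities as features.
features = np.vstack([shadow.predict_proba(X_in), shadow.predict_proba(X_out)])
membership = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])
attack = LogisticRegression().fit(features, membership)

# Against the real target model, the same features computed from its outputs
# would be fed to `attack` to guess whether a sample was in its training set.
```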