Reliable AI Flashcards
1
Q
data shift (concept shift/drift, fracture point)
A
- the training and test distributions differ
- the model's predictions can no longer be trusted
- causes:
- sample selection bias
- non-stationary (spatial or temporal) shifts in the data
- formula: Ptest(y, x) ≠ Ptrain(y, x)
2
Q
covariate shift
A
- change in the distribution of the covariates (independent variables)
- causes:
- temporal or spatial changes
- data sparsity
- biased feature selection
- class shift
- formula: Ptest(y|x) = Ptrain(y|x) and Ptrain(x) ≠ Ptest(x)
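A covariate shift in a single feature can be checked by comparing the empirical distributions of x in the training and test sets. A minimal sketch using the two-sample Kolmogorov-Smirnov statistic (function name is illustrative):

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for x in sorted(set(a) | set(b)):
        # Empirical CDF value of each sample at x.
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Identical samples -> statistic 0; fully disjoint samples -> statistic 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))    # 0.0
print(ks_statistic([1, 2, 3], [10, 20, 30])) # 1.0
```

A large statistic on a feature suggests Ptrain(x) ≠ Ptest(x) for that feature.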
3
Q
prior probability shift
A
- change in the distribution of the class variable
- causes:
- class imbalance
- formula: Ptest(x|y) = Ptrain(x|y) and Ptrain(y) ≠ Ptest(y)
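Prior probability shift can be surfaced by comparing empirical class frequencies directly. A minimal sketch (the total variation distance is one illustrative choice of comparison):

```python
from collections import Counter

def class_priors(labels):
    """Empirical class prior P(y) from a list of labels."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}

train_y = ["spam"] * 10 + ["ham"] * 90
test_y = ["spam"] * 40 + ["ham"] * 60

# Total variation distance between the two priors.
p, q = class_priors(train_y), class_priors(test_y)
classes = set(p) | set(q)
tvd = 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)
print(round(tvd, 3))  # 0.3 -> the label distribution has shifted noticeably
```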
4
Q
concept shift
A
- change in the relationship between the input and output distributions
- e.g. the mapping from inputs to outputs differs before and after a financial crisis
- formula:
- Ptrain(y|x) ≠ Ptest(y|x) and Ptrain(x) = Ptest(x), for X→Y problems
- Ptrain(x|y) ≠ Ptest(x|y) and Ptrain(y) = Ptest(y), for Y→X problems
5
Q
internal covariate shift
A
- shift in the distribution of layer activations during training
- batch normalization should hasten learning (it also regularizes the network by adding noise)
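The normalization step that batch norm applies per feature can be sketched as follows (a simplified forward pass; the full layer also applies a learned scale gamma and shift beta, omitted here):

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of a batch to zero mean and
    unit variance -- the core of a batch-normalization forward pass.
    The learned scale (gamma) and shift (beta) are omitted."""
    n_features = len(batch[0])
    normalized = [row[:] for row in batch]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        std = math.sqrt(var + eps)  # eps avoids division by zero
        for i, row in enumerate(normalized):
            row[j] = (batch[i][j] - mean) / std
    return normalized

out = batch_norm([[1.0, 100.0], [3.0, 300.0]])
# Each column now has mean ~0 and unit scale, regardless of its
# original range -- this is what keeps activation distributions stable.
```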
6
Q
statistical similarity
A
- non-parametric (univariate): KL divergence, Jensen-Shannon divergence, population stability index, Wasserstein distance
- multivariate: for high-dimensional and unstructured datasets
- isolation forest
- k-d trees
- variational autoencoder
- normalizing flow
7
Q
statistical distance
A
- compare histograms of the training data over time
- population stability index: measures how much a variable's binned distribution has drifted between two samples
- Kolmogorov-Smirnov statistic: maximum distance between two empirical CDFs
- Kullback-Leibler divergence: expected log-ratio between two distributions
- problems:
- not good for high-dimensional features
- not good for sparse features
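The population stability index from the list above can be computed from binned histograms of the same feature at two points in time. A minimal sketch (the bin fractions and eps guard are illustrative choices):

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population stability index between two binned distributions.
    Inputs are per-bin fractions that each sum to 1; eps guards
    against empty bins inside the log."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Identical histograms -> PSI of 0 (no shift).
print(psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]))  # 0.0
# Probability mass moving between bins raises the PSI.
print(round(psi([0.25, 0.25, 0.25, 0.25], [0.10, 0.20, 0.30, 0.40]), 3))
```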
8
Q
novelty detection
A
- fit a model of the training distribution and flag points that fall outside it
- e.g. a one-class support vector machine
- good: handles complex feature interactions
- bad: cannot tell you explicitly what has changed
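As a minimal stand-in for the one-class SVM idea (fit a model of the training distribution, flag low-density points), here is a univariate Gaussian novelty detector; this is a simplified substitute, not the SVM itself, and the three-standard-deviation threshold is an illustrative choice:

```python
import statistics

def fit_gaussian(train_values):
    """Model the training distribution as a single Gaussian."""
    return statistics.mean(train_values), statistics.pstdev(train_values)

def is_novel(x, mean, std, n_sigmas=3.0):
    """Flag points farther than n_sigmas standard deviations
    from the training mean as out-of-distribution."""
    return abs(x - mean) > n_sigmas * std

mean, std = fit_gaussian([9.8, 10.1, 10.0, 9.9, 10.2])
print(is_novel(10.05, mean, std))  # False: inside the training distribution
print(is_novel(25.0, mean, std))   # True: far outside it
```

Note the card's caveat holds here too: the detector says a point is novel, but not which aspect of the data changed.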
9
Q
discriminative distance
A
- less common method
- train a classifier to detect whether a point comes from the source (training) domain or the target (test) domain
- use the classifier's training error as a proxy for the distance between the distributions (high error means the distributions are close)
- pros:
- may be the only feasible option for some deep learning settings (e.g. NLP)
- good for sparse data
- good for high dimensions
- problems:
- can only be done offline
- more complicated than other methods
10
Q
how to handle dataset shift
A
- remove features: set a boundary of acceptable shift per feature; if a feature shifts too much, remove it and retrain
- remove a feature only if it is not important
- reweight importance: upweight training instances that are more similar to the test instances
- adversarial search: use an adversarial model that is robust to feature deletion
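The reweighting strategy above is often implemented as importance weighting: weight each training point by an estimate of the density ratio Ptest(x)/Ptrain(x). A minimal sketch estimating that ratio from histograms (the bin edges are an illustrative choice):

```python
def importance_weights(train_x, test_x, edges):
    """Weight each training point by an estimate of Ptest(x)/Ptrain(x),
    computed from histogram bin fractions."""
    def bin_index(x):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                return i
        return len(edges) - 2  # clamp values at/above the last edge

    n_bins = len(edges) - 1
    train_counts = [0] * n_bins
    test_counts = [0] * n_bins
    for x in train_x:
        train_counts[bin_index(x)] += 1
    for x in test_x:
        test_counts[bin_index(x)] += 1

    weights = []
    for x in train_x:
        i = bin_index(x)
        p_train = train_counts[i] / len(train_x)
        p_test = test_counts[i] / len(test_x)
        weights.append(p_test / p_train if p_train > 0 else 0.0)
    return weights

# Training data is mostly small values, test data mostly large ones,
# so the large training value gets upweighted (~3x) and the small
# ones downweighted (~1/3x).
w = importance_weights([1, 1, 1, 9], [9, 9, 9, 1], edges=[0, 5, 10])
print(w)
```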