Reliable AI Flashcards

1
Q

dataset shift (a.k.a. concept shift/drift, fracture point)

A
  • the training and the test distributions are different
  • we lose confidence in the model's ability to predict on new data
  • causes:
    1. sample selection bias (sketch below)
    2. non-stationary environments (spatial or temporal shifts in the data)
  • formula: Ptest(y, x) ≠ Ptrain(y, x)
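A minimal sketch (assuming numpy and scipy are available; all data is synthetic) of how sample selection bias produces dataset shift: training examples are kept with a probability that depends on x, so both the feature and the label distributions drift away from an unbiased test sample.

```python
# Illustrative sketch: sample selection bias causing Ptrain(y, x) != Ptest(y, x).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# "True" population: x ~ N(0, 1), y depends on x.
x = rng.normal(0.0, 1.0, size=20_000)
y = (x + rng.normal(0.0, 0.5, size=x.size) > 0).astype(int)

# Biased training sample: points with large x are kept more often.
keep = rng.random(x.size) < 1.0 / (1.0 + np.exp(-2.0 * x))
x_train, y_train = x[keep], y[keep]

# Unbiased test sample drawn from the same population.
x_test = rng.normal(0.0, 1.0, size=5_000)
y_test = (x_test + rng.normal(0.0, 0.5, size=x_test.size) > 0).astype(int)

# Both marginals now differ, so the joint distribution has shifted.
print("KS p-value on x:", ks_2samp(x_train, x_test).pvalue)    # effectively 0
print("P(y=1) train vs test:", y_train.mean(), y_test.mean())  # train well above 0.5
```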
2
Q

covariate shift

A
  • change in the distribution of the covariates (independent variables), while the input-to-output relationship stays the same (sketch below)
  • causes:
    • temporal, spatial changes
    • data sparsity
    • biased feature selection
    • class shift
  • formula: Ptest(y|x) = Ptrain(y|x) and Ptrain(x)≠Ptest(x)
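A minimal sketch of covariate shift under the formula above (assuming numpy, scipy, and scikit-learn; all data is synthetic): the marginal of x changes between train and test while y is generated from x by the same rule in both domains.

```python
# Illustrative sketch: Ptrain(x) != Ptest(x), P(y|x) unchanged.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def label(x):
    # Same conditional y|x in both domains.
    return 2.0 * x + rng.normal(0.0, 0.3, size=x.shape)

x_train = rng.normal(0.0, 1.0, size=(5_000, 1))   # Ptrain(x)
x_test = rng.normal(1.5, 2.0, size=(5_000, 1))    # Ptest(x): shifted and wider
y_train, y_test = label(x_train[:, 0]), label(x_test[:, 0])

# The covariate distributions clearly differ...
print("KS p-value on x:", ks_2samp(x_train[:, 0], x_test[:, 0]).pvalue)

# ...but a correctly specified model of y|x fitted on train still scores well
# on the shifted test set, because the relationship itself did not change.
model = LinearRegression().fit(x_train, y_train)
print("R^2 on shifted test set:", model.score(x_test, y_test))
```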
3
Q

prior probability shift

A
  • change in the distribution of the class variable y (the prior) while the class-conditional distributions stay the same (sketch below)
  • causes:
    1. class imbalance
  • formula: Ptest(x|y) = Ptrain(x|y) and Ptrain(y)≠Ptest(y)
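A minimal sketch of prior probability shift under the formula above (assuming numpy; synthetic data): the class-conditionals P(x|y) are identical in both domains, only the class prior changes.

```python
# Illustrative sketch: Ptrain(y) != Ptest(y), P(x|y) unchanged.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, p_positive):
    # Same class-conditionals in both domains: x|y=0 ~ N(-1, 1), x|y=1 ~ N(+1, 1).
    y = rng.random(n) < p_positive
    x = rng.normal(np.where(y, 1.0, -1.0), 1.0)
    return x, y.astype(int)

x_train, y_train = sample(10_000, p_positive=0.5)   # balanced at training time
x_test, y_test = sample(10_000, p_positive=0.1)     # rare positives at test time

print("P(y=1) train:", y_train.mean(), " test:", y_test.mean())
# A classifier calibrated to the training prior will over-predict the positive
# class on the test data unless its outputs are re-calibrated to the new prior.
```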
4
Q

concept shift

A
  • a change in the relationship between the inputs and the outputs, rather than in the input distribution itself (sketch below)
  • e.g. a model fitted to economic data before a financial crisis no longer describes the input-output relationship after it
  • formula:
    • Ptrain(y|x)≠Ptest(y|x) and Ptrain(x)=Ptest(x), for X→Y
    • Ptrain(x|y)≠Ptest(x|y) and Ptrain(y)=Ptest(y), for Y→X
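A minimal sketch of concept shift for the X→Y case (assuming numpy and scikit-learn; synthetic data): the marginal P(x) stays fixed, but the rule generating y from x changes, so a model fitted on the old concept fails on the new one.

```python
# Illustrative sketch: P(x) unchanged, P(y|x) changes between train and test.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

x_train = rng.normal(0.0, 1.0, size=(5_000, 1))   # same marginal P(x)
x_test = rng.normal(0.0, 1.0, size=(5_000, 1))

y_train = 2.0 * x_train[:, 0] + rng.normal(0.0, 0.3, size=5_000)   # old concept
y_test = -2.0 * x_test[:, 0] + rng.normal(0.0, 0.3, size=5_000)    # new concept: sign flips

model = LinearRegression().fit(x_train, y_train)
print("R^2 on train:", model.score(x_train, y_train))   # high
print("R^2 on test:", model.score(x_test, y_test))      # collapses (large negative R^2)
# Monitoring P(x) alone would miss this; labels are needed to detect concept shift.
```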
5
Q

internal covariate shift

A
  • shift in the distribution of a layer's activations during training, as the parameters of the earlier layers change
  • batch normalization should hasten learning by normalizing each layer's inputs per mini-batch (it also regularizes the input by adding noise); sketch below
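A minimal sketch (assuming PyTorch is installed) of where batch normalization sits in a network: each BatchNorm1d layer standardizes the activations of the preceding linear layer over the mini-batch.

```python
# Illustrative sketch: a small MLP with batch normalization between layers.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalizes the 64 activations over the batch dimension
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(32, 20)   # a mini-batch of 32 examples with 20 features
model.train()             # in train mode BatchNorm uses the batch statistics
out = model(x)
print(out.shape)          # torch.Size([32, 2])
# In eval mode (model.eval()), BatchNorm switches to the running mean/variance
# accumulated during training.
```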
6
Q

statistical similarity

A
  • non-parametric: KL divergence, Jensen-Shannon divergence, population stability index, Wasserstein distance (sketch below)
  • multivariate: for high-dimensional and unstructured datasets
    • isolation forest
    • KD trees
    • variational autoencoder
    • normalizing flows
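A minimal sketch (assuming numpy and scipy; synthetic data) of the non-parametric measures listed above, computed for one feature from histograms built on shared bins.

```python
# Illustrative sketch: statistical similarity between a train and a test sample.
import numpy as np
from scipy.stats import entropy, wasserstein_distance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=10_000)
test = rng.normal(0.4, 1.2, size=10_000)

# Histogram both samples on the same bins; a tiny epsilon avoids log(0).
bins = np.histogram_bin_edges(np.concatenate([train, test]), bins=20)
p = np.histogram(train, bins=bins)[0] / train.size + 1e-12
q = np.histogram(test, bins=bins)[0] / test.size + 1e-12

print("KL(train || test):", entropy(p, q))
print("Jensen-Shannon distance:", jensenshannon(p, q))
print("Wasserstein distance:", wasserstein_distance(train, test))
print("PSI:", np.sum((p - q) * np.log(p / q)))
```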
7
Q

statistical distance

A
  • compare histograms of a feature over time against its training-data histogram (sketch below)
    • population stability index: sum over the bins of (p_i - q_i) · ln(p_i / q_i)
    • Kolmogorov-Smirnov statistic: maximum difference between the two empirical CDFs
    • Kullback-Leibler divergence: expected log-ratio of the two distributions
  • problems:
    • not good for high-dimensional data
    • not good for sparse features
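A minimal sketch (assuming numpy and scipy; synthetic data and a hand-rolled psi helper) of monitoring one feature over time with the KS statistic and PSI, with bins fixed from the training data's quantiles.

```python
# Illustrative sketch: monitoring a feature's drift month by month.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=50_000)

def psi(reference, current, n_bins=10):
    # Bin edges from the reference (training) quantiles; epsilon avoids log(0).
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    p = np.histogram(reference, bins=edges)[0] / reference.size + 1e-6
    q = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0] / current.size + 1e-6
    return np.sum((p - q) * np.log(p / q))

# Simulated production batches: the feature slowly drifts upward.
for month, mean in enumerate([0.0, 0.1, 0.5, 1.0]):
    batch = rng.normal(mean, 1.0, size=5_000)
    ks = ks_2samp(train, batch).statistic
    print(f"month {month}: KS={ks:.3f}  PSI={psi(train, batch):.3f}")
# A rule of thumb sometimes used in practice: PSI < 0.1 stable, 0.1-0.25 moderate
# shift, > 0.25 significant shift.
```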
8
Q

novelty detection

A
  • fit a model of the training data distribution and flag new points it considers unlikely (sketch below)
    • e.g. a one-class support vector machine
  • good: captures complex feature interactions
  • bad: cannot tell you explicitly what has changed
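A minimal sketch (assuming numpy and scikit-learn; synthetic data and illustrative hyperparameters) of novelty detection with a one-class SVM: fit on the training data, then watch the fraction of new points flagged as novel.

```python
# Illustrative sketch: novelty detection with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(2_000, 5))           # source distribution
X_test_same = rng.normal(0.0, 1.0, size=(1_000, 5))        # no shift
X_test_shifted = rng.normal(1.5, 1.0, size=(1_000, 5))     # shifted distribution

detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# predict() returns +1 for inliers and -1 for novelties.
for name, X in [("same", X_test_same), ("shifted", X_test_shifted)]:
    novelty_rate = np.mean(detector.predict(X) == -1)
    print(f"{name}: fraction flagged as novel = {novelty_rate:.2f}")
# A novelty rate far above nu (the expected false-alarm rate) signals shift,
# but it does not say which features moved.
```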
9
Q

discriminative distance

A
  • less common method
  • train a classifier to predict whether a point comes from the source (training) domain or the target (test) domain (sketch below)
  • use the classifier's error as a proxy for the distance between the distributions (high error means the domains are hard to tell apart, i.e. the distributions are close)
  • pros:
    • may be the only feasible solution for some deep learning settings (e.g. NLP)
    • good for sparse data
    • good for high dimensions
  • problems:
    • can only be done offline
    • more complicated than other methods
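A minimal sketch (assuming numpy and scikit-learn; synthetic data) of discriminative distance: label training points as one domain and test points as the other, then read the cross-validated accuracy of a domain classifier as a proxy for how far apart the distributions are.

```python
# Illustrative sketch: a "domain classifier" as a distance proxy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(3_000, 10))
X_test = rng.normal(0.3, 1.0, size=(3_000, 10))    # mildly shifted

# Label source domain 0 and target domain 1, then try to tell them apart.
X = np.vstack([X_train, X_test])
d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

acc = cross_val_score(RandomForestClassifier(n_estimators=100), X, d,
                      cv=5, scoring="accuracy").mean()
print(f"domain-classifier accuracy: {acc:.2f}")
# Accuracy near 0.5 (high error): the distributions are close.
# Accuracy near 1.0 (low error): a clear, learnable difference, i.e. shift.
```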
10
Q

how to handle dataset shift

A
  1. remove features: set a boundary of acceptable shift per feature; if a feature shifts too much, remove it and retrain
    1. only remove it if it is not important to the model
  2. reweight importance: upweight training instances that look most like the test data (importance weighting; sketch below)
  3. adversarial search: use an adversarial model that is robust to feature deletion
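A minimal sketch (assuming numpy and scikit-learn; synthetic data) of the importance-reweighting idea from point 2: a domain classifier's predicted odds approximate the density ratio Ptest(x)/Ptrain(x), which are then used as sample weights when refitting the task model.

```python
# Illustrative sketch: importance reweighting for covariate shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Covariate shift: same concept y = (x0 > 0), different input distributions.
X_train = rng.normal(0.0, 1.0, size=(5_000, 3))
X_test = rng.normal(0.8, 1.0, size=(5_000, 3))
y_train = (X_train[:, 0] > 0).astype(int)

# 1. Train a domain classifier: 0 = train domain, 1 = test domain.
X_dom = np.vstack([X_train, X_test])
d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
domain_clf = LogisticRegression(max_iter=1_000).fit(X_dom, d)

# 2. Convert its probabilities on the training data into importance weights.
p_test = domain_clf.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)            # odds ~ Ptest(x) / Ptrain(x) for equal-sized domains
weights *= len(weights) / weights.sum()      # normalize to mean 1

# 3. Refit the task model with those sample weights.
task_clf = LogisticRegression(max_iter=1_000).fit(X_train, y_train, sample_weight=weights)
print(f"weight range: {weights.min():.2f} to {weights.max():.2f}")
```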