ML-08 - Anomaly detection Flashcards
ML-08 - Anomaly detection
Define anomaly detection.
The process of identifying rare points/observations that deviate significantly from the bulk of the (normal) data.
ML-08 - Anomaly detection
Is anomaly detection supervised or unsupervised?
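Typically unsupervised or semi-supervised: the model is fit on (mostly) normal data, and any labeled anomalies are used only to tune or evaluate it.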
ML-08 - Anomaly detection
What is the algorithm for fitting a Gaussian-based anomaly detection model?
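- Fit parameters mu and sigma to the training data.
- Compute the density p(x) of a new point under the fitted Gaussian.
- Find the threshold value epsilon and flag x as an anomaly if p(x) < epsilon.
(If labeled data is available, determine epsilon from that, e.g. on a validation set.)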
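A minimal sketch of the steps above, assuming independent per-feature Gaussians and a threshold chosen by maximizing F1 on a small labeled validation set. The function names, the synthetic data, and the F1 criterion are illustrative assumptions, not something the card specifies.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from (mostly normal) training data."""
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    """p(x) as a product of independent per-feature Gaussian densities (an assumption)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

def select_epsilon(p_val, y_val):
    """Scan candidate thresholds and keep the one with the best F1 (y_val: 1 = anomaly)."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), 1000):
        pred = p_val < eps
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue  # F1 is undefined or zero without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))
mu, var = fit_gaussian(X_train)

X_val = np.vstack([
    rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(50, 2)),  # normal points
    np.array([[12.0, 10.0], [5.0, -5.0], [0.0, 25.0]]),           # hand-made anomalies
])
y_val = np.array([0] * 50 + [1] * 3)
epsilon, f1 = select_epsilon(density(X_val, mu, var), y_val)

# Flag new points whose density falls below the threshold.
X_new = np.array([[5.1, 9.8], [15.0, 30.0]])
is_anomaly = density(X_new, mu, var) < epsilon
```

With this setup the first new point sits near the training mean and the second lies far outside it, so only the second should be flagged.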
ML-08 - Anomaly detection
What is “Gaussianization of features”?
A transformation applied to the features (e.g. a log or power transform) to make their distribution look more like a normal distribution.
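As a small illustration, assuming a right-skewed feature and a log transform (both chosen here for the example, not taken from the card), skewness can be compared before and after the transformation:

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

# Hypothetical right-skewed feature (log-normal), for illustration only.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

x_log = np.log(x)    # log transform; needs strictly positive values
x_sqrt = np.sqrt(x)  # a milder alternative; needs non-negative values

skew_raw = skewness(x)
skew_log = skewness(x_log)
skew_sqrt = skewness(x_sqrt)
```

A skewness close to 0 suggests a roughly symmetric, more Gaussian-looking feature. In practice a histogram of the transformed feature is a useful visual check.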
ML-08 - Anomaly detection
Why would we use Gaussianization of features?
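Because the choice of features has a huge effect on the anomaly detection algorithm: the Gaussian model assumes roughly normally distributed features, so strongly skewed features give poor density estimates.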
ML-08 - Anomaly detection
What can you do if the model assigns a high p(x) (i.e. the point looks normal) to both normal and anomalous data? (3)
(See image)
- Add new features
- Transform/combine existing features
- Choose features that take unusually large or small values for anomalies
(See image)
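A hedged sketch of the "combine existing features" idea: two hypothetical features (`cpu` and `traffic`, names invented for this example) each look unremarkable for a suspicious point, while their ratio stands out clearly.

```python
import numpy as np

def z_score(value, sample):
    """How many standard deviations `value` lies from the sample mean."""
    return abs(value - sample.mean()) / sample.std()

# Hypothetical normal behaviour: CPU load tracks network traffic closely.
rng = np.random.default_rng(0)
traffic = rng.uniform(10, 100, size=1000)
cpu = traffic * rng.normal(1.0, 0.05, size=1000)
ratio = cpu / traffic  # combined feature

# Suspicious point: high CPU with low traffic; each value alone is within range.
cpu_new, traffic_new = 90.0, 15.0
z_cpu = z_score(cpu_new, cpu)
z_traffic = z_score(traffic_new, traffic)
z_ratio = z_score(cpu_new / traffic_new, ratio)
```

Here `z_cpu` and `z_traffic` stay small while `z_ratio` is very large, so a density model that includes the ratio feature would assign this point a much lower p(x).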
ML-08 - Anomaly detection
Why might you use anomaly detection with/instead of supervised learning?
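- Supervised learning needs enough examples of every class; anomalies are usually too rare (highly imbalanced data).
- Anomalies can be very different from each other, and future ones may look unlike any seen so far -> difficult to learn as a class.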