20-anomaly detection Flashcards
What is an outlier / anomaly?
A pattern in the data that does not conform to the normal/standard/expected behaviour
What are applications of anomaly detection?
Fraud detection
Ecosystem disturbance
Medicine and public health
Aviation safety
What are the types of anomalies?
Point/global anomalies
Contextual/conditional anomalies
Collective anomalies
What is a point / global anomaly?
An individual data instance is anomalous with respect to the rest of the data
What are the two types of attributes used to define contextual anomalies?
Contextual attributes
Behavioural attributes
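A minimal sketch of how the two attribute types work together, assuming hypothetical readings where the month is the contextual attribute and the temperature is the behavioural attribute. Each reading is compared only with the other readings from the same month (a leave-one-out z-score, chosen here purely for illustration), so 30.0 is flagged in January but is normal in July.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_anomalies(records, threshold=3.0):
    """Flag (context, value) records whose behavioural value is far from the
    other values that share the same contextual attribute."""
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    flagged = []
    for context, value in records:
        # Leave-one-out: compare against the rest of the same context so an
        # extreme value does not inflate its own reference statistics.
        others = list(groups[context])
        others.remove(value)
        if len(others) < 2:
            continue  # too little context to judge this record
        mu, sigma = mean(others), pstdev(others)
        if sigma > 0:
            score = abs(value - mu) / sigma
        else:
            score = 0.0 if value == mu else float("inf")
        if score > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical readings: (month = contextual, temperature in deg C = behavioural)
readings = [
    ("Jan", 2.0), ("Jan", 3.0), ("Jan", 1.0), ("Jan", 2.5), ("Jan", 30.0),
    ("Jul", 29.0), ("Jul", 31.0), ("Jul", 30.0), ("Jul", 28.0),
]
print(contextual_anomalies(readings))  # [('Jan', 30.0)]
```

Globally, 30.0 is an ordinary value here (July has several readings near it); it only becomes anomalous once the contextual attribute is taken into account.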
What are collective anomalies?
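A collection of related data instances is anomalous with respect to the whole data set, even though the individual instances may not be anomalous on their own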
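One common illustration of a collective anomaly is a sensor that flat-lines: each repeated reading is an ordinary value, but a long unbroken run of it is not. The sketch below assumes that a constant run of at least `min_len` samples (a hypothetical parameter) is the collective pattern of interest; it detects only this one pattern.

```python
def flat_runs(series, min_len=4):
    """Return (start, end) index pairs (end exclusive) of runs where the value
    stays constant for at least min_len consecutive samples."""
    runs = []
    start = 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(series) or series[i] != series[start]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical signal: the value 4 is normal on its own, but five in a row is not.
signal = [3, 5, 2, 6, 4, 4, 4, 4, 4, 1, 5, 3]
print(flat_runs(signal))  # [(4, 9)]
```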
What is the difference between anomaly and noise?
Noise is random error and not interesting. Anomalies are interesting and
What is supervised anomaly detection?
Labels are available for both normal data and anomalies. A classifier is trained to distinguish between normal data and anomalies.
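Any classifier can fill this role. As a minimal sketch, the example below uses a 1-nearest-neighbour classifier (a swapped-in choice, not one named in the card) on hypothetical labelled 2-D points: a query gets the label of the closest labelled example.

```python
from math import dist


def nn_classify(train, query):
    """Label a query point with the label of its nearest labelled neighbour.
    train is a list of (point, label) pairs, label in {"normal", "anomaly"}."""
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label


# Hypothetical labelled training data
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((1.1, 1.3), "normal"), ((5.0, 5.0), "anomaly"), ((5.5, 4.8), "anomaly"),
]
print(nn_classify(train, (1.0, 1.2)))  # normal
print(nn_classify(train, (5.2, 5.1)))  # anomaly
```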
What is semi-supervised anomaly detection?
Labels are available only for normal data. A model of the normal objects is built, and objects that do not fit the model are reported as outliers.
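A minimal sketch, assuming a per-feature Gaussian is an adequate model of the normal class: each feature's mean and standard deviation are fitted on labelled-normal rows only, and a new object is reported as an outlier if any feature lies more than three standard deviations from the fitted mean. The data and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


def fit_normal_profile(normal_data):
    """Fit a per-feature (mean, standard deviation) to labelled-normal rows."""
    return [(mean(col), stdev(col)) for col in zip(*normal_data)]


def is_outlier(profile, x, threshold=3.0):
    """Report x as an outlier if any feature is more than `threshold`
    standard deviations away from the normal profile."""
    for v, (mu, sigma) in zip(x, profile):
        if sigma == 0:
            if v != mu:  # feature was constant in training: any change is unseen
                return True
        elif abs(v - mu) / sigma > threshold:
            return True
    return False


# Hypothetical labelled-normal training rows: (feature_1, feature_2)
normal_rows = [(10.0, 200.0), (11.0, 210.0), (9.0, 190.0), (10.5, 205.0), (9.5, 195.0)]
profile = fit_normal_profile(normal_rows)

print(is_outlier(profile, (10.2, 203.0)))  # False
print(is_outlier(profile, (10.0, 260.0)))  # True
```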
What are the challenges with semi-supervised anomaly detection?
Requires labelled data for the normal class. Possibly a high false alarm rate, since normal behaviour not covered by the training data is reported as an outlier
What is unsupervised point anomaly detection?
No labels are needed. Approaches include proximity-based, density-based, clustering-based and statistical anomaly detection
What is statistical anomaly detection?
Anomalies are objects that are fit poorly by a statistical model of the data.
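As one concrete statistical model, the sketch below uses the modified z-score built from the median and the median absolute deviation (MAD), a robust alternative to the mean and standard deviation swapped in here for illustration. It is unsupervised: the model is fitted to all of the unlabelled data, and 3.5 is the cut-off commonly suggested for this score. The data are hypothetical.

```python
from statistics import median


def modified_z_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. The median and MAD are robust to a
    few extreme values, so the outliers do not hide themselves.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 45]
print(modified_z_outliers(data))  # [45]
```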
What is proximity-based anomaly detection?
An object is an anomaly if its nearest neighbours are far away. This requires computing the distance between every pair of data points
What are the ways to detect anomalies through proximity?
- Data points for which there are fewer than p neighboring points within a distance D
- The top n data points whose distance to the kth nearest neighbor is greatest
- The top n data points whose average distance to the k nearest neighbors is greatest
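The three rules above can be sketched with brute-force distances between every pair of points (O(n^2) work). The function names, the Euclidean metric and the toy points are assumptions for illustration; each function returns point indices.

```python
from math import dist


def neighbour_distances(points):
    """For each point, the sorted Euclidean distances to all other points."""
    return [
        sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def fewer_than_p_within_d(points, p, d):
    """Rule 1: points with fewer than p neighbours within distance d."""
    nd = neighbour_distances(points)
    return [i for i, ds in enumerate(nd) if sum(x <= d for x in ds) < p]


def top_n_by_kth_nn(points, k, n):
    """Rule 2: the n points with the greatest distance to their kth nearest
    neighbour (requires 1 <= k < len(points))."""
    nd = neighbour_distances(points)
    return sorted(range(len(points)), key=lambda i: nd[i][k - 1], reverse=True)[:n]


def top_n_by_avg_knn(points, k, n):
    """Rule 3: the n points with the greatest average distance to their k
    nearest neighbours (requires 1 <= k < len(points))."""
    nd = neighbour_distances(points)
    return sorted(range(len(points)), key=lambda i: sum(nd[i][:k]) / k, reverse=True)[:n]


# Hypothetical 2-D data: a tight group plus one distant point (index 5)
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
print(fewer_than_p_within_d(points, p=2, d=1.5))  # [5]
print(top_n_by_kth_nn(points, k=2, n=1))          # [5]
print(top_n_by_avg_knn(points, k=2, n=1))         # [5]
```

Rules 2 and 3 rank points instead of applying a fixed cut-off, so they always return n candidates; rule 1 needs p and D chosen in advance.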
What are the pros of proximity based anomaly detection?
Proximity is easier to determine than a statistical distribution
Gives a quantitative measure of the degree to which an object is an outlier