week 3 Flashcards
anomalies
point anomaly - a single data point that is anomalous with respect to the rest of the data
contextual (sequential) anomaly - anomalous only within its context (e.g. its position in a sequence); use sliding windows, learn a model for predicting the next value, compute the expected next value, evaluate the residual against the real next value, apply a decision threshold to the residual to decide if it is an anomaly
collective anomaly - individual points are not anomalous by themselves, but the group is; treat sets of points as the data items and compare these sets with each other (kind of like a sliding window)
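The contextual-anomaly recipe above (sliding window, predict, residual, threshold) as a minimal sketch. The "model" here is just the median of the previous window, which is an assumption made for illustration; any next-value predictor would slot in. The function name, window size and threshold are hypothetical.

```python
from statistics import median

def contextual_anomalies(series, window=3, threshold=3.0):
    """Return indices whose value is far from the sliding-window forecast."""
    flagged = []
    for t in range(window, len(series)):
        # Assumed toy predictor: median of the previous `window` values.
        expected = median(series[t - window:t])
        # Residual between the real next value and the expected one.
        residual = abs(series[t] - expected)
        # Decision threshold on the residual.
        if residual > threshold:
            flagged.append(t)
    return flagged

series = [10, 11, 10, 11, 10, 25, 11, 10, 11]
print(contextual_anomalies(series))  # [5]
```

The median is used instead of the mean so that one spike does not drag the forecast for the next few windows and cause follow-on false alarms.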
classification
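One-class SVM (OSVM) - trained on normal data only; minimize the positive space (the region classified as normal) so that it just encloses the training data, which maximizes the negative space (everything outside is anomalous)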
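Isolation Forest - repeat N times (one tree per repetition): pick a random feature f, split at a random value, continue until all leaves contain singletons; path length to a point's leaf = its isolation score; average isolation score over all trees => anomaly score (goal is to isolate anomalies: they need few splits, so a short average path length means anomalous)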
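A rough sketch of the Isolation Forest idea in plain Python. Instead of building whole trees it follows only the branch that contains the query point, which gives the same path length. It skips the subsampling and score normalization used in real implementations, and all function names and parameters are hypothetical.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Number of random splits before `point` sits alone in its leaf."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Features on which the remaining points still differ.
    splittable = [f for f in range(len(point))
                  if min(r[f] for r in data) < max(r[f] for r in data)]
    if not splittable:  # only duplicates left, cannot split further
        return depth
    f = rng.choice(splittable)          # pick feature f at random
    lo = min(r[f] for r in data)
    hi = max(r[f] for r in data)
    split = rng.uniform(lo, hi)         # split at a random value
    # Follow only the branch that contains `point`.
    branch = [r for r in data if (r[f] < split) == (point[f] < split)]
    return isolation_depth(point, branch, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over `n_trees` random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees

data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),
        (1.2, 1.0), (0.8, 0.9), (8.0, 8.0)]
outlier_len = average_path_length((8.0, 8.0), data)
inlier_len = average_path_length((1.0, 1.0), data)
print(outlier_len, inlier_len)  # the outlier's average path should be shorter
```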
Nearest-neighbour (NN) based
distance-based = a point is anomalous if distant from other points
density-based = a point is anomalous if in low density region
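LOF(q) (local outlier factor) = (average local reachability density of q’s k nearest neighbours) / (local reachability density of q)
if LOF(q) < 1 => q has higher density than its neighbours => inlier
if LOF(q) > 1 => q has lower density than its neighbours => outlier (LOF(q) close to 1 => similar density to its neighbours)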
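The LOF ratio above, written out as a small sketch. Assumptions: Euclidean distance, no duplicate points (they would give a zero reachability sum), and ties at the k-th neighbour are cut off rather than all included. `lof_scores` is a hypothetical helper, not a library function.

```python
from math import dist

def lof_scores(data, k=3):
    """Local outlier factor of every point in `data` (list of tuples)."""
    n = len(data)
    neighbours, k_distance = [], []
    for i in range(n):
        # k nearest neighbours of point i, excluding i itself.
        by_dist = sorted((dist(data[i], data[j]), j)
                         for j in range(n) if j != i)[:k]
        neighbours.append([j for _, j in by_dist])
        k_distance.append(by_dist[-1][0])  # distance to the k-th neighbour

    def lrd(i):
        # Local reachability density = 1 / mean reachability distance.
        reach = [max(k_distance[j], dist(data[i], data[j]))
                 for j in neighbours[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF = average lrd of the neighbours / lrd of the point itself.
    return [sum(lrds[j] for j in neighbours[i]) / len(neighbours[i]) / lrds[i]
            for i in range(n)]

data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),
        (1.2, 1.0), (0.8, 0.9), (8.0, 8.0)]
scores = lof_scores(data, k=3)
print([round(s, 2) for s in scores])  # last point should be far above 1
```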
Clustering based
normal data records belong to large and dense clusters
anomalies do not belong to any cluster or form very small clusters
local anomalies are distant from all other points in the same cluster
Spectral techniques
PCA - outliers show large variability along the smallest principal components (datapoints that vary in the dimensions PCA leaves unexplained are anomalous)
Autoencoder - encode data into a low dimension, decode it back to the high dimension, measure the difference between original and reconstructed data (large reconstruction error => anomaly)
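A toy sketch of the PCA idea for 2-D data only, using the closed-form eigen-decomposition of the 2x2 covariance matrix so no libraries are needed. The score is the squared projection onto the smallest principal component; the function name and the example data are made up for illustration.

```python
from math import sqrt, hypot

def minor_component_scores(points):
    """Squared projection of each 2-D point onto the smallest principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # Smallest eigenvalue of a symmetric 2x2 matrix.
    lam = (a + c) / 2 - sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; axis-aligned when the covariance term is zero.
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [((x - mx) * vx + (y - my) * vy) ** 2 for x, y in points]

# Five points on the line y = x plus one point off the line.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (1, 3)]
scores = minor_component_scores(points)
print(scores.index(max(scores)))  # 5
```

The autoencoder card follows the same pattern: replace the projection with the reconstruction error of the decoded point.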
Sequential (discrete) processes
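Markov processes - the next state depends only on the current state (the last action), not on the earlier history
state transition diagrams - if n states are possible, an n x n matrix describes the probability of going from one state to another (remember Laplace smoothing, so that transitions never seen in training do not get probability 0)
n-grams - instead of the future depending only on the last action, it can depend on the last 2/3/4/… actions => process the data as sequences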
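A small sketch tying the last three cards together: n-gram counts with Laplace (add-alpha) smoothing, where n = 2 gives the Markov transition matrix. The names `fit_ngram`, `states` and `alpha` are hypothetical; a transition with a very low probability under the fitted model would be the candidate anomaly.

```python
from collections import Counter, defaultdict

def fit_ngram(sequence, states, n=2, alpha=1.0):
    """Return prob(context, nxt) = P(nxt | previous n-1 symbols), smoothed."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(sequence)):
        context = tuple(sequence[i - n + 1:i])  # the previous n-1 symbols
        counts[context][sequence[i]] += 1

    def prob(context, nxt):
        seen = counts.get(tuple(context), Counter())
        # Laplace smoothing: add alpha to every count, also the unseen ones.
        return (seen[nxt] + alpha) / (sum(seen.values()) + alpha * len(states))

    return prob

states = ["a", "b", "c"]
prob = fit_ngram(list("abababab"), states, n=2)

# n = 2: the n x n Markov transition matrix, one row per current state.
matrix = [[prob((s,), t) for t in states] for s in states]
for s, row in zip(states, matrix):
    print(s, [round(p, 3) for p in row])
```

With n = 3 the context passed to `prob` is a pair of symbols, e.g. `prob(("a", "b"), "a")`.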