Week 3 Flashcards

1
Q

anomalies

A

point anomaly - a single data point that is anomalous with respect to the rest of the data
contextual (sequential) anomaly - anomalous only within a context; use sliding windows: learn a model for predicting the next value, compute the expected next value, evaluate the residual against the real next value, use a decision threshold to decide if it is an anomaly (see the sketch below)
collective anomaly - individual points are not anomalous by themselves; treat sets of points as datasets and compare these sets (similar to a sliding window)

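A minimal sketch of the contextual-anomaly recipe above, assuming a sliding-window least-squares predictor; the window size, threshold, and toy series are illustrative, not from the card:

```python
import numpy as np

def contextual_anomalies(series, window=10, threshold=3.0):
    """Flag points whose prediction residual exceeds a threshold.

    Fits one least-squares predictor of the next value from the
    previous `window` values, then flags points whose residual is
    more than `threshold` standard deviations from the mean residual.
    """
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # learn the predictor
    residuals = y - X @ coef                      # expected vs. real next value
    z = (residuals - residuals.mean()) / residuals.std()
    return np.where(np.abs(z) > threshold)[0] + window  # anomalous indices

# usage: a sine wave with one injected spike
t = np.sin(np.linspace(0, 20, 200))
t[120] += 5.0
print(contextual_anomalies(t))  # reports index 120 (and nearby windows)
```
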
2
Q

classification

A

OC-SVM (one-class SVM) - maximize the negative space, minimize the positive space
Isolation Forest - repeat N times: pick a feature f, split at a random value, continue until every leaf contains a singleton; the path length to a leaf is the isolation score, and the average isolation score over all trees gives the anomaly score (the goal is to isolate anomalies: they are isolated after few splits, so shorter paths mean more anomalous) - see the sketch below

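A minimal sketch of both classifiers, assuming scikit-learn; the toy data and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # normal points
X_out = rng.uniform(-6, 6, size=(10, 2))  # scattered outliers
data = np.vstack([X, X_out])

# one-class SVM: learns a tight boundary around the normal region
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(data)

# Isolation Forest: anomalies are isolated in few random splits,
# so a short average path length => high anomaly score
iso = IsolationForest(n_estimators=100, random_state=0).fit(data)

print((ocsvm.predict(data) == -1).sum())  # predicted outliers (-1 = anomaly)
print((iso.predict(data) == -1).sum())
```
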
3
Q

NN based

A

distance-based = a point is anomalous if it is far from other points
density-based = a point is anomalous if it lies in a low-density region
LOF(q) = ratio of the average local reachability density of q’s k-NN to the local reachability density of q (see the sketch below)
if LOF(q) < 1 => q is in a higher-density region than its neighbours => inlier
if LOF(q) > 1 => q is in a lower-density region than its neighbours => outlier

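A minimal LOF sketch, assuming scikit-learn’s LocalOutlierFactor (it exposes the negated LOF, so values below -1 correspond to LOF(q) > 1); n_neighbors and the toy data are illustrative:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, size=(100, 2)),  # dense cluster
    rng.normal(5, 2.0, size=(100, 2)),  # sparser cluster
    [[10.0, 10.0]],                     # isolated point
])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_  # recover LOF(q) itself

print(scores[-1])  # isolated point: LOF >> 1 => outlier
print(labels[-1])  # -1
```
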
4
Q

Clustering based

A

normal data records belong to large and dense clusters
anomalies do not belong to any cluster or form very small clusters
local anomalies are distant from all other points in the same cluster (see the sketch below)

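A minimal clustering-based sketch, assuming DBSCAN as the clusterer; eps, min_samples, and the toy data are illustrative. Points labelled -1 join no cluster and are treated as anomalies:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.3, size=(100, 2)),  # large dense cluster
    rng.normal(4, 0.3, size=(100, 2)),  # another dense cluster
    [[2.0, 8.0], [8.0, 8.0]],           # stray points in no cluster
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
anomalies = X[labels == -1]             # noise points = anomalies
print(len(anomalies))                   # includes the two stray points
```
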
5
Q

Spectral techniques

A

PCA - outliers have high variability in the smallest principal components (data points that vary in unexplained dimensions are anomalous); equivalently, reconstruct each point from the top components and use the reconstruction error as the anomaly score (see the sketch below)
Autoencoder - encode the data into a low dimension, decode it back to the original dimension, use the difference between the original and the reconstructed data as the anomaly score

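A minimal PCA sketch, assuming scikit-learn; the number of components and the toy data are illustrative. The reconstruction error is exactly the variability left in the discarded smallest components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# normal data lies near a 1-D line inside 3-D space
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + rng.normal(0, 0.05, size=(200, 3))
X[0] += np.array([0.0, 0.0, 3.0])        # vary in an "unexplained" dimension

pca = PCA(n_components=1).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))
errors = ((X - X_rec) ** 2).sum(axis=1)  # energy in the smallest components

print(np.argmax(errors))                 # 0: the injected anomaly
```
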
6
Q

Sequential (discrete) processes

A

Markov processes - the next state depends only on the current state (the last action)
state transition diagrams - if n states are possible, an n×n matrix describes the probability of going from one state to another (remember Laplace smoothing) - see the sketch below
n-grams - instead of the future depending only on the last action, it can depend on the last 2/3/4/… actions => process the data in sequences

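A minimal sketch of a Laplace-smoothed transition matrix used for anomaly scoring; the state alphabet, toy sequences, and scoring function are illustrative assumptions:

```python
import numpy as np

states = ["open", "read", "write", "close"]
idx = {s: i for i, s in enumerate(states)}
n = len(states)

def fit_transitions(sequences, alpha=1.0):
    """Count transitions and apply Laplace (add-alpha) smoothing."""
    counts = np.full((n, n), alpha)  # smoothing: no zero probabilities
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-stochastic

def log_likelihood(seq, P):
    """Average log-probability of the observed transitions."""
    lp = [np.log(P[idx[a], idx[b]]) for a, b in zip(seq, seq[1:])]
    return np.mean(lp)

normal = [["open", "read", "read", "close"]] * 50
P = fit_transitions(normal)

print(log_likelihood(["open", "read", "close"], P))   # high: familiar pattern
print(log_likelihood(["close", "write", "open"], P))  # low: anomalous order
```
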