Anomaly Detection Flashcards
DTW never results in a distance strictly greater than the Euclidean distance (for equal-length sequences)
True
DTW cannot be applied to sequences of different lengths
False
DTW can only be applied to univariate (one-dimensional, single-feature) sequences
False
DTW normalization is useful
True
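The four DTW cards above can be checked with a small experiment. This is a minimal sketch, not a reference implementation: it assumes a squared local cost with a square root taken at the end (so the result is comparable to the Euclidean distance), no warping-window constraint, and z-normalization as the normalization meant by the card. The names `dtw`, `euclidean`, `znorm` and `local_cost` are made up for this illustration.

```python
import math
from statistics import mean, pstdev


def dtw(a, b, local_cost=lambda x, y: (x - y) ** 2):
    """DTW distance via dynamic programming (squared local cost, root at the end)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cheapest accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = local_cost(a[i - 1], b[j - 1]) + step
    return math.sqrt(acc[n][m])


def euclidean(a, b):
    """Euclidean distance between two equal-length 1-D sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def znorm(seq):
    """Z-normalize a 1-D sequence (assumes it is not constant)."""
    mu, sigma = mean(seq), pstdev(seq)
    return [(x - mu) / sigma for x in seq]


a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0]  # same shape as a, shifted by one step

# Equal lengths: the diagonal alignment is one valid warping path,
# so DTW can never exceed the Euclidean distance.
same_length = (dtw(a, b), euclidean(a, b))

# Different lengths (3 and 4) are fine for DTW.
diff_length = dtw([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])

# Multivariate sequences: swap in a local cost that compares feature vectors.
multivariate = dtw(
    [(0, 0), (1, 1)],
    [(0, 0), (1, 1), (1, 1)],
    local_cost=lambda p, q: math.dist(p, q) ** 2,
)

# Scale sensitivity: c is a scaled and offset copy of a.
c = [10 * x + 5 for x in a]
raw = dtw(a, c)
normalized = dtw(znorm(a), znorm(c))
```

Here `same_length` is `(1.0, 2.0)`, so DTW is smaller than Euclidean for the shifted pair. `raw` is large even though `a` and `c` have the same shape, while `normalized` is zero up to rounding error, which is why normalizing before DTW is useful.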
What does a low local reachability density (lrd) mean?
A large average reachability distance to the point's nearest neighbours, i.e. the point lies in a sparse region
LOF(q) < 1 means what?
Inlier: q lies in a region of higher density than its neighbours
LOF(q) > 1 means what?
Outlier: q lies in a region of lower density than its neighbours
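To make the lrd and LOF cards concrete, here is a minimal sketch of the usual LOF definitions: k-distance, reachability distance, local reachability density, and the LOF ratio. It assumes Euclidean distance and no duplicate points (duplicates would give a zero reachability sum). The function name `lof_scores` and the toy data are made up for this illustration.

```python
import math


def lof_scores(points, k):
    """Return the LOF score of every point (brute force, no duplicates assumed)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # k-neighbourhood: every point within the k-distance (ties included)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    def lrd(i):
        # reach-dist(i, o) = max(k-distance(o), d(i, o))
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        # lrd is the inverse of the average reachability distance
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF = average lrd of the neighbours divided by the point's own lrd
    return [
        sum(lrds[o] for o in neighbours[i]) / (len(neighbours[i]) * lrds[i])
        for i in range(n)
    ]


# Four points on a unit square plus one far-away point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

The four square corners all have the same density as their neighbours, so their LOF is 1. The point at (5, 5) has a much lower lrd than its neighbours, so its LOF is well above 1.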
Advantages of nearest-neighbour (NN) methods?
- Can be used in an unsupervised setting
- No assumptions about the data distribution
- Intuitively appealing, since they rely on distances
Disadvantages of NN?
- Computationally expensive at test time
- Requires a distance measure, so all the disadvantages of distances apply
Advantages of PCA?
- Useful for modeling feature interaction
- Computationally efficient
Disadvantages of PCA?
- Assumes that normal and anomalous points are distinguishable in the reduced space
- Context is not taken into account
- PCA is sensitive to outliers
What are the three types of anomalies?
- Point (point x is strange)
- Contextual (point x is strange given set S)
- Collective (set S is strange)
Outliers have no effect on PCA?
False
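A small 2-D experiment shows how strongly a single outlier can affect PCA. This is a sketch under simplifying assumptions: it handles 2-D data only and uses the closed-form angle of the leading eigenvector of a symmetric 2x2 covariance matrix. The function name `first_pc_angle` and the toy data are made up for this illustration.

```python
import math


def first_pc_angle(points):
    """Angle in degrees of the first principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the covariance matrix
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for the direction of largest variance of a 2x2 covariance
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


# Eleven points lying exactly on the x-axis
clean = [(float(x), 0.0) for x in range(-5, 6)]
# The same data with one extreme point far above the line
with_outlier = clean + [(0.0, 40.0)]

angle_clean = first_pc_angle(clean)
angle_outlier = first_pc_angle(with_outlier)
```

Without the outlier the first component points along the x-axis (angle about 0 degrees). With the single outlier added, the direction of largest variance flips to the y-axis (angle about 90 degrees in absolute value), because the variance that PCA maximizes is dominated by that one point.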
PCA assumes the relationship between variables is linear?
True
LOF uses reachability distance instead of actual distance to lower the effect of outliers?
False (it is used to smooth out statistical fluctuations in the distances between nearby points)
LOF does not require a distance metric to work properly and return sensible results?
True
If p and q have the same distances to their nearest neighbours, is it possible that LOF returns p as an anomaly and q as normal?
True
The main use of LOF is to find collective anomalies?
False (it is mainly used to find point anomalies)
What is NN not suitable for?
Datasets that have modes with varying density
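The varying-density problem can be seen with a plain k-NN distance score. This is a minimal sketch, assuming the anomaly score of a point is the distance to its k-th nearest neighbour and that one global threshold is applied to all scores. The function name `knn_scores` and the 1-D toy data are made up for this illustration.

```python
import math


def knn_scores(points, k):
    """Anomaly score of each point = distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Brute force: compare against every other point
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


dense = [(0.0,), (0.1,), (0.2,), (0.3,)]       # tight mode, spacing 0.1
local_outlier = [(0.8,)]                       # clearly off the dense mode
sparse = [(10.0,), (12.0,), (14.0,), (16.0,)]  # loose mode, spacing 2.0

points = dense + local_outlier + sparse
scores = knn_scores(points, k=1)
outlier_score = scores[len(dense)]
```

The point at 0.8 scores about 0.5, which is higher than any point in the dense mode but lower than every point in the sparse mode (each scores 2.0). Any global threshold low enough to flag it would also flag the whole sparse mode, which is the weakness this card refers to. The brute-force loop also shows why scoring is expensive at test time: each query is compared against all stored points.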
PCA assumptions?
- The relationships between variables/features are linear
- Principal components are orthogonal (and therefore linearly independent)
- The direction with the largest variance is the most informative
Is DTW scale invariant?
No