Anomaly Detection + Distances Flashcards
What is DTW and how does it work?
Dynamic Time Warping aligns two time series non-linearly in time (stretching or compressing one relative to the other) and returns the cost of the best-matching alignment.
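A minimal sketch of the standard DTW recurrence (the function name and toy sequences are illustrative, not from the cards):

```python
# DTW sketch: D[i][j] = |a_i - b_j| + min(stretch a, stretch b, align both)
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # align both
    return D[n][m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: the sequences align perfectly
```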
How can you create an Isolation Forest?
Repeat N times (one random tree per repetition):
* Randomly pick a feature (dimension) f
* Split on f at a value chosen uniformly at random between [min, max]
* Continue recursively until every leaf contains a single point
* The path length needed to isolate a point is its isolation score
Average this path length over all trees to get the anomaly score: anomalous points are isolated quickly, so a shorter average path means more anomalous.
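A hedged example using scikit-learn's IsolationForest (assuming scikit-learn is available; the data and parameters are only illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 2))           # normal points
X = np.vstack([X, [[8.0, 8.0]]])              # one obvious outlier

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = forest.decision_function(X)          # lower score = shorter average path = more anomalous
print(scores.argmin())                        # 200: the injected outlier
```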
What are the two main ways to approach anomaly detection?
- Distance-based: a point is anomalous when it is far from other points
- Density-based: a point is anomalous when it is in a low density region
How to compute local reachability density (lrd)?
lrd(p) = k / (sum of reachability distances from p to its k nearest neighbours)
How to compute local outlier factor (lof)?
lof(p) = (1/k) * (sum of lrd of the neighbours) / lrd(p); values much larger than 1 indicate an outlier.
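A small usage sketch with scikit-learn's LocalOutlierFactor (assuming scikit-learn; it reports the negated LOF, so the most negative value is the strongest outlier):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])   # the last point is isolated
lof = LocalOutlierFactor(n_neighbors=2)
lof.fit_predict(X)                                   # -1 marks outliers, 1 marks inliers
print(lof.negative_outlier_factor_)                  # the isolated point has the most negative value
```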
What is a residual?
The difference between the expected and real value
What is PCA?
Principal Component Analysis reduces the dimension of data by finding the principal vectors that capture most of the data’s patterns
Why is PCA useful for anomaly detection?
The principal components it finds should explain most of the variance, so normal points lie close to the subspace they span.
Outliers show variability along the smallest components (which should be nearly constant), i.e. datapoints that vary in the unexplained dimensions are anomalous.
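An illustrative sketch of PCA-based anomaly scoring via reconstruction error (scikit-learn assumed; the synthetic data is only for demonstration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1)) @ np.array([[1.0, 2.0, 3.0]])   # data lies on a 1-D line in 3-D
X = np.vstack([X, [[0.0, 5.0, -5.0]]])                        # one point off the principal subspace

pca = PCA(n_components=1).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))
residual = np.linalg.norm(X - X_hat, axis=1)                  # reconstruction error per point
print(residual.argmax())                                      # 300: the off-subspace point
```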
How does dimensionality reduction find outliers in general?
Abnormal data maps to an abnormal low-dimensional code, so it reconstructs poorly; the reconstruction error gives a clearer view of the anomalies.
What is edit distance?
The minimum number of changes (insertions, deletions, substitutions) required to turn one series into the other.
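A minimal Levenshtein-style sketch of edit distance (the helper name is illustrative):

```python
# D[i][j] = edits needed to turn s[:i] into t[:j]
def edit_distance(s, t):
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                               # delete everything
    for j in range(m + 1):
        D[0][j] = j                               # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution / match
    return D[n][m]

print(edit_distance("kitten", "sitting"))  # 3
```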
What are the properties of a distance metric?
Non-negativity: d(a,b) >= 0
Identity: d(a,b) = 0 iff a == b
Symmetry: d(a,b) = d(b,a)
Triangle inequality: d(a,c) <= d(a,b) + d(b,c)
How does K-means work?
Randomly choose k initial cluster centers (one per desired cluster).
Assign each observation to the cluster with the closest center, then recompute each center as the centroid (mean) of its assigned observations.
Iterate until the cluster assignments stop changing.
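A bare-bones Lloyd's-algorithm sketch of k-means with NumPy (illustrative only; it assumes every cluster keeps at least one point):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assign each point to its closest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):                # assignments stopped changing
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centers = kmeans(X, k=2)
```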
How does hierarchical clustering work?
Treat each observation as a cluster.
Find the closest two clusters and merge. Repeat until all points belong to one cluster.
Create dendrogram -> remove links from top until you have as many clusters as you want.
What are the types of cluster distance?
Single linkage: distance between closest points from the different clusters
Complete linkage: distance between the furthest points in the clusters
Average linkage: average distance between all pairs of points
Ward distance: the increase in total within-cluster sum of squares caused by merging the two clusters, i.e. the SS of the merged cluster minus the sum of the two clusters' separate within-cluster SS
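A short sketch with SciPy's hierarchical clustering routines, showing where the linkage method plugs in (SciPy assumed; the data is a toy example):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
Z = linkage(X, method="ward")                      # also: "single", "complete", "average"
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into 3 clusters
print(labels)
```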
Does hierarchical clustering need the distance to be a metric?
Single, complete, and average linkage only require symmetry and non-negativity.
Ward and centroid-based linkage require Euclidean distance, since they minimise squared error.
How does DBSCAN work?
Classify points as Core, Border, or Noise, based on a radius ε and a minimum number of neighbours t.
Core: the number of points within distance ε is >= t
Border: the number of points within distance ε is < t, but there is a core point within ε
Noise: neither core nor border
DBSCAN only requires the distance to be symmetric and non-negative.
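A small usage sketch with scikit-learn's DBSCAN (assumed available); the toy points are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [9, 9]], dtype=float)
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)   # cluster ids; -1 marks noise (here the point at [9, 9])
```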
How does graph-based (spectral) clustering work?
Construct a neighbourhood graph G by connecting points whose distance is below a threshold (or each point to its nearest neighbours), and set the edge weights from the distances. Find communities in G using graph mining and return the found communities as clusters.
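A hedged sketch using scikit-learn's SpectralClustering on a k-NN graph (scikit-learn assumed; the two-moons data is a standard toy example):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)   # two interleaving half-circles
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
labels = sc.fit_predict(X)   # communities in the k-NN graph become the clusters
```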
When is a clustering considered good in terms of inter and intra - cluster distances?
When the intra-cluster distances are small relative to the inter-cluster distances, i.e. the intra/inter ratio is small.
What is CURE?
Clustering Using REpresentatives is useful for non-centroid clustering. Since it is very expensive, we select a few points to represent each cluster (prototyping).
What does BFR do?
- aims to reduce the time and space required by k-means (for data too large to fit in memory)
- points can be assigned to three sets:
> discard set - assigned to a pre-existing cluster
> compressed set - mini-clusters
> retained set - outliers, points that do not belong to any cluster
What do we require for a cluster in BFR?
- N - the number of data points in the cluster
- SUM - a vector containing the sums of all feature values
- SUMSQ - a vector with sums of squared feature values
With these we can keep track of the centroid of the cluster, and compute a threshold for cluster membership.
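A tiny sketch of the (N, SUM, SUMSQ) summary and what it recovers; the points are an assumed toy cluster:

```python
import numpy as np

points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # toy cluster

N = len(points)
SUM = points.sum(axis=0)
SUMSQ = (points ** 2).sum(axis=0)

centroid = SUM / N
variance = SUMSQ / N - (SUM / N) ** 2   # per-dimension variance from the summary alone
print(centroid, variance)               # enough to set a membership threshold per dimension
```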
What are the 3 types of anomalies?
Point: a single datapoint is anomalous w.r.t. the rest of the data.
Contextual: detectable only depending on context. Using sliding windows and the assumption that the next datapoint should be close to the previous one, large leaps are considered anomalous (see the sketch after this card).
Collective: the individual instances within the collective anomaly are not anomalous by themselves; they require a relationship among data instances (sequential, spatial, graph). Also detected with sliding windows.
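An illustrative sliding-window sketch for contextual anomalies; the window size, threshold, and helper name are assumptions, not part of the cards:

```python
import numpy as np

# Flag points that leap far from the mean of the preceding window.
def contextual_anomalies(x, window=5, threshold=3.0):
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        w = x[i - window:i]
        if abs(x[i] - w.mean()) > threshold * (w.std() + 1e-9):
            flags[i] = True
    return flags

series = [1, 1.1, 0.9, 1.0, 1.05, 0.95, 9.0, 1.0, 1.1]
print(contextual_anomalies(series).nonzero()[0])   # flags the jump to 9.0
```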
What is hamming distance?
The number of positions at which two equal-length bit strings differ.
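A one-line sketch of Hamming distance over equal-length bit strings (helper name illustrative):

```python
def hamming(a, b):
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))   # count differing positions

print(hamming("10110", "10011"))  # 2
```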
What do silhouette values of 1 and -1 indicate?
A value near 1 means the point is well matched to its own cluster (good clustering); a value near -1 means it fits a neighbouring cluster better (mixed/poor clustering).
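A usage sketch computing the silhouette with scikit-learn (assumed available); well-separated blobs give a score near 1:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))   # close to 1: well-separated clusters
```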