Lecture 8 - Unsupervised Learning I Flashcards
What do patterns tell us?
Patterns describe or summarize the data set or parts of it.
What is cluster analysis?
Identifying groups of “similar” data objects.
What are association rules?
Finding associations between attributes or typical combinations of attribute values, e.g. the rule: if demand = high and supply = low, then price = high.
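A small sketch of checking how well the example rule holds on data. It assumes the usual support and confidence measures (not defined on these cards), and the records below are hypothetical toy data:

```python
# Hypothetical toy records: each record maps attribute -> value.
records = [
    {"demand": "high", "supply": "low", "price": "high"},
    {"demand": "high", "supply": "low", "price": "high"},
    {"demand": "high", "supply": "low", "price": "medium"},
    {"demand": "low", "supply": "high", "price": "low"},
]

def matches(record, condition):
    """True if the record has every attribute=value pair in condition."""
    return all(record.get(attr) == value for attr, value in condition.items())

def support_and_confidence(records, antecedent, consequent):
    """Support: fraction of all records matching antecedent AND consequent.
    Confidence: that count divided by the number of records matching the antecedent."""
    n_ante = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(1 for r in records if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Rule: if demand = high and supply = low, then price = high
support, confidence = support_and_confidence(
    records, {"demand": "high", "supply": "low"}, {"price": "high"}
)
print(support, confidence)
```

Here 3 records satisfy the "if" part and 2 of them also have price = high, so support is 2/4 and confidence is 2/3.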
What is deviation analysis?
Finding groups that deviate from the rest of the data, e.g. finding that men under 30 differ from the data set as a whole.
What is hierarchical clustering?
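Hierarchical clustering builds a hierarchy of clusters step by step, either by merging smaller clusters (agglomerative) or by splitting larger ones (divisive).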
What is agglomerative hierarchical clustering?
A bottom-up strategy: first consider each data object as a separate cluster, then step by step merge the two closest clusters until all data objects are in one cluster.
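A naive sketch of the bottom-up procedure. As an assumption, it measures cluster closeness with single linkage (distance of the closest pair, one point from each cluster) and stops once `k` clusters remain; the function names and points are hypothetical:

```python
import math

def single_link(c1, c2):
    """Single linkage: distance of the closest pair, one point from each cluster."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, k):
    """Start with every object as its own cluster, then repeatedly merge
    the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # each data object is a separate cluster
    while len(clusters) > k:
        # search all cluster pairs for the smallest linkage distance
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so position i is unaffected
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerative(points, 3))
```

Recomputing all pairwise linkages in every step keeps the sketch short but is slow; practical implementations update a distance matrix instead.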
What is divisive hierarchical clustering?
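A top-down strategy: start with the whole data set as one cluster, then repeatedly split clusters into smaller ones. Seldom used, because the very first step alone would have to consider 2^(n-1) - 1 possible ways to split n data objects into two non-empty groups.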
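The count of possible first splits can be checked by brute force for small n. This sketch enumerates every split of n labelled objects into two non-empty, unordered groups and compares the result with 2^(n-1) - 1:

```python
from itertools import combinations

def count_bipartitions(n):
    """Count the ways to split n labelled objects into two non-empty,
    unordered groups (the candidates for a divisive first step)."""
    objects = range(n)
    seen = set()
    for size in range(1, n):
        for group in combinations(objects, size):
            rest = tuple(o for o in objects if o not in group)
            seen.add(frozenset([group, rest]))  # {A, B} and {B, A} are the same split
    return len(seen)

for n in range(2, 8):
    print(n, count_bipartitions(n), 2 ** (n - 1) - 1)
```

The number doubles with every additional object, which is why exhaustive divisive clustering is impractical.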
What is an isotropic distance?
The distance grows equally fast in all directions (e.g. the Euclidean distance).
What is a non-isotropic distance?
The distance is weighted differently in different directions, so it grows faster in some directions than in others.
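A minimal comparison of the two cases, assuming a weighted Euclidean distance as the non-isotropic example (the weights are hypothetical). A unit step along x and a unit step along y have the same Euclidean length, but not the same weighted length:

```python
import math

def euclidean(p, q):
    """Isotropic: every coordinate direction counts the same."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def weighted_euclidean(p, q, w):
    """Non-isotropic: squared difference in direction i is scaled by weight w[i]."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(p, q, w)))

origin = (0.0, 0.0)
w = (1.0, 4.0)  # hypothetical weights: the y direction counts four times as much
for step in [(1.0, 0.0), (0.0, 1.0)]:  # one unit along x, then one unit along y
    print(step, euclidean(origin, step), weighted_euclidean(origin, step, w))
```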
What do b, n and x stand for in the dissimilarity measures for binary attributes?
b = number of attributes that hold in both records, n = number that hold in neither record, x = number that hold in only one of the two records.
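A sketch that counts b, n and x for two binary records. As an assumption, it then plugs them into two common dissimilarity measures, simple matching x / (b + n + x) and Jaccard x / (b + x); the records are hypothetical:

```python
def binary_counts(r1, r2):
    """b: holds in both, n: holds in neither, x: holds in exactly one."""
    b = sum(1 for a, c in zip(r1, r2) if a and c)
    n = sum(1 for a, c in zip(r1, r2) if not a and not c)
    x = sum(1 for a, c in zip(r1, r2) if a != c)
    return b, n, x

r1 = [1, 1, 0, 0, 1, 0]  # hypothetical binary records
r2 = [1, 0, 0, 1, 1, 0]
b, n, x = binary_counts(r1, r2)

simple_matching = x / (b + n + x)            # mismatches over all attributes
jaccard = x / (b + x) if b + x else 0.0      # ignores attributes that hold in neither
print(b, n, x, simple_matching, jaccard)
```

The two measures differ only in whether the n "holds in neither" attributes count as agreement, which matters when most attributes are rarely true.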
What is single linkage?
The dissimilarity between two clusters is the dissimilarity of their two most similar data objects, one from each cluster (i.e. the closest pair).
What is complete linkage?
The dissimilarity between two clusters is the dissimilarity of their two most dissimilar data objects, one from each cluster (i.e. the pair that is farthest apart).
What is average linkage?
The average dissimilarity over all pairs of data objects with one object from each of the two clusters.
What is centroid linkage?
The distance between the centroids (mean value vectors) of the two clusters.
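The four linkage criteria can be compared on the same pair of clusters. This sketch assumes Euclidean distance between data objects; the clusters `A` and `B` are hypothetical:

```python
import math

def pair_distances(c1, c2):
    """Euclidean distances of all pairs with one point from each cluster."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))        # closest pair

def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))        # farthest pair

def average_linkage(c1, c2):
    d = pair_distances(c1, c2)
    return sum(d) / len(d)                     # mean over all pairs

def centroid(c):
    """Mean value vector of a cluster."""
    return tuple(sum(coords) / len(c) for coords in zip(*c))

def centroid_linkage(c1, c2):
    return math.dist(centroid(c1), centroid(c2))

A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
for f in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(f.__name__, f(A, B))
```

For these clusters single linkage gives 3, complete linkage gives sqrt(29), and centroid linkage gives the distance between (0, 1) and (4, 0), i.e. sqrt(17).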
What are dendrograms?
The cluster merging process arranges the data objects in a binary tree. The data objects are drawn at the bottom, and each merge is drawn as a connection between the two merged clusters at the height of the distance at which they were merged. Cutting the dendrogram at a specific height yields the clusters that exist at that distance.
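A sketch of what a dendrogram records and what cutting it means, assuming one-dimensional data and single linkage (both hypothetical choices). `merge_history` stores every join with its height; `cut` replays only the joins at or below the cut height:

```python
def single(c1, c2):
    """Single linkage for 1-D points: smallest gap between the two clusters."""
    return min(abs(p - q) for p in c1 for q in c2)

def merge_history(points):
    """Merge clusters until one is left, recording (cluster_a, cluster_b, height)."""
    clusters = [(p,) for p in points]
    history = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: single(clusters[ab[0]], clusters[ab[1]]))
        history.append((clusters[i], clusters[j], single(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

def cut(points, history, threshold):
    """Clusters obtained by cutting the dendrogram at height `threshold`."""
    clusters = [(p,) for p in points]
    for a, b, height in history:
        if height > threshold:
            break  # single-linkage heights never decrease, so later joins are higher
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

points = [1.0, 2.0, 6.0, 7.0, 15.0]
history = merge_history(points)
print([h for _, _, h in history])
print(cut(points, history, 2.0))
```

Cutting lower yields more, smaller clusters; cutting above the last join yields a single cluster.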