Clustering Flashcards
What topics are under dissimilarities?
Continuous dissims.
Binary dissims.
3 Rules
Dissim matrices
What are two types of continuous dissims?
Euclidean
Manhattan
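A minimal Python sketch of the two continuous dissimilarities, assuming each observation is a tuple of numbers. The function names and the sample points are illustrative only.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

p, q = (0.0, 0.0), (3.0, 4.0)  # made-up points
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```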
What are two types of binary dissims?
Simple
Jaccard
What is the Simple binary dissimilarity equation?
d(x,y)=(b+c)/(a+b+c+d), where a = attributes both have, b and c = attributes only one has, d = attributes neither has
What is the Jaccard binary dissimilarity equation?
d(x,y)=(b+c)/(a+b+c)
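A small Python sketch of the binary case, assuming the usual 2x2 counts: a = both 1, b and c = the two kinds of mismatch, d = both 0. Dissimilarity here means the proportion of mismatching attributes; Jaccard ignores attributes that are 0 in both. The helper names and the vectors are hypothetical.

```python
def binary_counts(x, y):
    """Return the counts (a, b, c, d) for two equal-length 0/1 vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def simple_dissim(x, y):
    """Mismatches over all attributes."""
    a, b, c, d = binary_counts(x, y)
    return (b + c) / (a + b + c + d)

def jaccard_dissim(x, y):
    """Mismatches over attributes present in at least one observation.

    Undefined (division by zero) when both vectors are all zeros.
    """
    a, b, c, _ = binary_counts(x, y)
    return (b + c) / (a + b + c)

x = [1, 1, 0, 0, 1]  # made-up vectors
y = [1, 0, 0, 1, 1]
print(binary_counts(x, y))   # (2, 1, 1, 1)
print(simple_dissim(x, y))   # 0.4
print(jaccard_dissim(x, y))  # 0.5
```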
What are the three rules for dissimilarities?
d(x,y)>=0, if x=y then d=0
d(x,y)=d(y,x)
d(x,y)<=d(x,z)+d(z,y)
What are the key features of dissimilarity matrices?
0’s along the diagonal
Usually factors are standardized
Usually factors are scaled
Symmetric
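A rough Python sketch of building a dissimilarity matrix and checking its features. It assumes Euclidean distance on columns standardized to mean 0 and sample standard deviation 1; the data values and function names are made up for illustration.

```python
import math
from statistics import mean, stdev

def standardize(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def dissim_matrix(rows):
    """Full n x n matrix of Euclidean distances between observations."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def satisfies_rules(D, tol=1e-9):
    """Check non-negativity with a zero diagonal, symmetry, and the triangle inequality."""
    n = range(len(D))
    nonneg = all(D[i][j] >= 0 for i in n for j in n) and all(D[i][i] == 0 for i in n)
    symmetric = all(abs(D[i][j] - D[j][i]) <= tol for i in n for j in n)
    # d(x,y) <= d(x,z) + d(z,y), with a small tolerance for rounding
    triangle = all(D[i][j] <= D[i][k] + D[k][j] + tol for i in n for j in n for k in n)
    return nonneg and symmetric and triangle

data = [[170.0, 65.0], [180.0, 80.0], [160.0, 62.0]]  # made-up observations
D = dissim_matrix(standardize(data))
print(satisfies_rules(D))  # True
```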
What topics are important to hierarchical clustering?
5 steps
Linkage
Chaining
Number of groups
What are the five steps in hierarchical clustering?
- Each obs. into its own group
- Pair nearest obs.
- One less group now!
- Pair next nearest obs.
- Repeat until no individual obs. are left
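The steps above can be sketched in Python as a naive agglomerative loop. This is an illustration under assumptions, not a reference implementation: it uses Euclidean distance, takes the rule for measuring distance between groups as a parameter (min by default), and keeps merging until everything sits in one group so that the full merge history is recorded. The points are made up.

```python
import math

def agglomerate(points, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    `linkage` reduces the pairwise distances between two groups to one
    number (min or max, for example).
    """
    groups = [[i] for i in range(len(points))]  # each obs. in its own group
    history = []
    while len(groups) > 1:
        # find the nearest pair of groups
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                dist = linkage(math.dist(points[p], points[q])
                               for p in groups[i] for q in groups[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merged = groups[i] + groups[j]
        history.append((sorted(merged), dist))
        # one fewer group: drop the two merged groups, add their union
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]  # made-up points
for members, height in agglomerate(points):
    print(members, height)
# [0, 1] 1.0
# [2, 3] 1.5
# [0, 1, 2, 3] 5.0
```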
What types of linkage are there?
Single
Complete
Average
What is the formula for single linkage?
d(A,B) = min{ d(x,y) : x ∈ A, y ∈ B }
What is the formula for complete linkage?
d(A,B) = max{ d(x,y) : x ∈ A, y ∈ B }
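A short Python sketch of the linkage rules, assuming Euclidean distance between observations. The minimum-distance rule is usually called single linkage and the maximum-distance rule complete linkage. Average linkage has no formula on these cards; the version here is the standard mean of all pairwise distances. Groups A and B are made up.

```python
import math
from itertools import product

def pairwise(A, B):
    """All distances d(x, y) with x in A and y in B."""
    return [math.dist(x, y) for x, y in product(A, B)]

def single_linkage(A, B):
    """Distance between the closest pair across the two groups."""
    return min(pairwise(A, B))

def complete_linkage(A, B):
    """Distance between the farthest pair across the two groups."""
    return max(pairwise(A, B))

def average_linkage(A, B):
    """Mean of all pairwise distances across the two groups."""
    d = pairwise(A, B)
    return sum(d) / len(d)

A = [(0.0, 0.0), (1.0, 0.0)]  # made-up groups
B = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # 6.0
print(average_linkage(A, B))   # 4.5
```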
What is important to know about the number of groups in hierarchical clustering?
Cut the dendrogram with a horizontal line at a chosen height on the y-axis. The branches below the line are the groups, so the height of the cut sets the number of groups.
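One way to picture the cut in Python, as a sketch under assumptions: keep merging the nearest groups and stop as soon as the nearest pair is farther apart than the chosen height. For linkages whose merge heights never decrease (min, max and average behave this way) this gives the same groups as cutting the finished dendrogram at that height. The function name, the `height` parameter and the points are hypothetical.

```python
import math

def cut_at_height(points, height, linkage=min):
    """Merge groups until the nearest pair is farther apart than `height`."""
    groups = [[i] for i in range(len(points))]
    while len(groups) > 1:
        # nearest pair of groups under the chosen linkage
        dist, i, j = min(
            (linkage(math.dist(points[p], points[q])
                     for p in groups[i] for q in groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if dist > height:
            break  # every remaining merge would sit above the cut line
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return sorted(sorted(g) for g in groups)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]  # made-up points
print(cut_at_height(points, height=0.5))  # [[0], [1], [2], [3]]
print(cut_at_height(points, height=2.0))  # [[0, 1], [2, 3]]
print(cut_at_height(points, height=6.0))  # [[0, 1, 2, 3]]
```

A lower cut gives more, smaller groups; a higher cut gives fewer, larger ones.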
What is chaining?
When observations are appended to the same group one at a time, one after another. This results in a poor model.
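A small Python illustration of chaining, assuming one-dimensional made-up data with slowly growing gaps. It records the sizes of the two groups joined at each merge. With the minimum-distance linkage every merge adds a single observation to the same growing group; with the maximum-distance linkage the merges are more balanced. The function name is hypothetical.

```python
def merge_sizes(values, linkage):
    """Sizes of the two groups joined at each step of agglomerative clustering."""
    groups = [[v] for v in values]
    sizes = []
    while len(groups) > 1:
        # nearest pair of groups under the chosen linkage
        dist, i, j = min(
            (linkage(abs(p - q) for p in groups[i] for q in groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        sizes.append(tuple(sorted((len(groups[i]), len(groups[j])))))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return sizes

values = [0.0, 1.0, 2.1, 3.3, 4.6]  # made-up data, gaps of about 1.0, 1.1, 1.2, 1.3
print(merge_sizes(values, min))  # [(1, 1), (1, 2), (1, 3), (1, 4)]  chaining
print(merge_sizes(values, max))  # [(1, 1), (1, 1), (1, 2), (2, 3)]
```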
What topics are under partitioning methods?
6 steps
K-means
Starting position
Number of groups