05 SNA Graph Clustering Flashcards
State the central paradigm for formulating a quality measure for clustering (target
function for optimization)! (in words)
Objective function A (g) -> R that formalizes the clustering paradigm in a special
way
Coverage is defined as:
State a main disadvantage of using f(C) = γ(C) in a quality measure for clustering
- Only accumulated intra cluster density is measured
- Monotonic behaviour of coverage -> coverage is not a good sole quality index
Explain the pragmatics of the following two conductance-based quality measures! (1-2
sentences each)
- First measure: If the measure is small, at least one of the clusters (more precisely: the induced subgraph) contains at least one bottleneck -> this cluster is too coarse -> use minimum conductance cut to cut this cluster in “halves”
- Second measure: If the measure is small, at least one of the clusters (more precisely: the induced subgraph) has many connections to outside -> the clustering is too fine -> merge clusters
Explain the pragmatics of the quality measure Performance!
- Count „correctly classified pairs of nodes”.
- Correctly classified: s r and connected by edge (f counts the number of
edges within clusters) OR not same cluster and not connected by edge (g counts the number of non-existent edges between clusters) - Calculating maximum of f+g is NP-hard (would be optimal clustering)
In case a density measure π on graphs is available, a quality function can be constructed
by pessimistically setting
What are analogous expressions for average case and best case (optimistic view)?
In case a density measure π on graphs is available, a quality function can be constructed
by pessimistically setting
Suggest a simple density measure π on graphs!
Informally explain the difference between Linkage-based (Agglomerative) and Splittingbased (Divisive) hierarchical clustering! (1-2 sentences)
Linkage (Agglomeration): Iteratively coarsens a given clustering by merging two
clusters until 1-clustering is reached (“bottum up”)
Splitting (Division): Iteratively refines a given clustering by splitting one cluster
until singleton clustering is reached(“top down”)
Explain the difference between Complete Linkage and Single Linkage by stating the local
cost functions for both cases and giving 1 sentence of explanation!
Single Linkage defines the local merging cost by looking at the distance between
the closest elements in between the respective clusters, complete link defines the
local merging cost by looking at the distance between the farthest elements in
between clusters
What is the benefit of a cut-function when using Splitting-based (Divisive) hierarchical
clustering? (1-2 sentences)
Cut function avoids having to test all possible split
Newman ± Girvan method: Explain Modularity! (2-3 sentences)
Tre is the fraction of edges within communities, e is the fraction of edges between
communities. The ultimate goal is to have many edges within communities, but
few between them. Modularity tells us how well we achieved that goal