06 Data Mining / Metric Clustering Flashcards
Explain briefly the following three characterizations for clustering-approaches / methods:
- Exclusive vs. non-exclusive
- Crisp vs. fuzzy
- Hierarchical vs. non-hierarchical
- Exclusive: non-overlapping clusters
- Non-exclusive: overlapping clusters
- Crisp: conventional characteristic function α_k: X -> {0, 1} for each cluster C_k (an element either belongs to a cluster or not)
- Fuzzy: fuzzy membership function α_k: X -> [0, 1] for each cluster C_k (an element can belong to multiple clusters with different weights)
- Hierarchical: imposes a tree structure (dendrogram) on the clusters C_k
- Non-hierarchical: imposes no tree structure on the clusters C_k
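A minimal numeric sketch of the crisp vs. fuzzy distinction (the membership values are invented for illustration):

```python
import numpy as np

# Three data points, two clusters.
# Crisp memberships: each row assigns a point to exactly one cluster
# (entries are in {0, 1}).
crisp = np.array([[1, 0],
                  [1, 0],
                  [0, 1]])

# Fuzzy memberships: entries are in [0, 1]; each row sums to 1, so a
# point can belong to several clusters with different weights.
fuzzy = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])

assert np.allclose(crisp.sum(axis=1), 1)
assert np.allclose(fuzzy.sum(axis=1), 1)
```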
The local cost functions for Single Linkage (min) and Complete Linkage (max) are defined as

d_SL(C_i, C_j) = min_{u ∈ C_i, v ∈ C_j} d(u, v) and d_CL(C_i, C_j) = max_{u ∈ C_i, v ∈ C_j} d(u, v),

where d(u, v) denotes the length of the shortest path between nodes u and v. State the analogous local cost functions if the nodes correspond to pattern vectors x_u and x_v in ℝ^n!
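The card records no answer; a plausible one, assuming the Euclidean norm as the metric on ℝ^n (any metric would do):

```latex
d_{SL}(C_i, C_j) = \min_{x_u \in C_i,\, x_v \in C_j} \lVert x_u - x_v \rVert,
\qquad
d_{CL}(C_i, C_j) = \max_{x_u \in C_i,\, x_v \in C_j} \lVert x_u - x_v \rVert
```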
State and explain two advantages of DBSCAN (compared to K-Means)! (1 sentence each)
- DBSCAN does not need to know the number of clusters K in advance, while K-means does; this makes DBSCAN easier to apply when K is unknown.
- DBSCAN finds arbitrarily shaped clusters, whereas K-means favors spherical clusters (improved clustering quality).
- DBSCAN has a notion of noise (outliers), K-means doesn't (improved clustering quality).
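A quick sketch of both advantages on a toy data set (assuming scikit-learn is available; `eps` and `min_samples` are illustrative values, not tuned):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaved half-moons: non-spherical clusters with some noise.
X, _ = make_moons(n_samples=300, noise=0.08, random_state=0)

# K-means must be told K=2 and still cuts the moons incorrectly,
# since it can only produce convex, roughly spherical clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN needs no K, follows the moon shapes, and labels outliers as -1.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("K-means labels:", set(km_labels))  # {0, 1}
print("DBSCAN labels:", set(db_labels))   # e.g. {0, 1}, plus -1 for noise
```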
Fuzzy C-Means:
a) What is the range of values of the entries of the membership matrix r_nk, compared to K-Means?
b) Fuzziness parameter m in the objective function: informally, what happens in the limits m -> 1 and m -> ∞? (No full mathematical derivation necessary!)
c) Explain the pragmatics behind the (following) additional conditions.
c) Explain the pragmatics behind the (following) additional conditions
- a) The range of values is the interval [0, 1] instead of the set {0, 1} (K-means is crisp, Fuzzy C-Means is fuzzy).
- b) m models the degree of fuzziness:
  m -> 1: K-Means (crisp case)
  m -> ∞: r_nk -> 1/K (where K is the number of clusters) (totally fuzzy)
- c) The first condition (the memberships of each element sum to 1) makes the memberships of an element interpretable as percentages, and therefore as probabilities. The second condition ensures that every cluster receives at least some membership mass, i.e. no cluster is empty or useless.
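For reference, the usual Fuzzy C-Means objective and the two conditions the card presumably refers to (standard formulation with N points, K clusters, and centers μ_k; an assumption, since the slide itself is not reproduced here):

```latex
J_m = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}^{\,m} \, \lVert x_n - \mu_k \rVert^2,
\qquad
\sum_{k=1}^{K} r_{nk} = 1 \;\; \forall n,
\qquad
\sum_{n=1}^{N} r_{nk} > 0 \;\; \forall k
```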
What are advantages of Gaussian Mixture Models compared to Fuzzy C-Means?
Informally explain the nature and geometrical interpretation of the GMM quantities μ_k
and Σ_k!
- GMM does not favor spherical clusters the way Fuzzy C-Means does -> better clustering quality.
- μ_k: mean vector of component k; geometrically, the center of the k-th Gaussian.
- Σ_k: covariance matrix of component k; geometrically, it determines the size, shape, and orientation of the Gaussian's ellipsoidal density contours.
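Both quantities appear in the standard GMM density (with mixing coefficients π_k):

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\pi_k \ge 0, \;\; \sum_{k=1}^{K} \pi_k = 1
```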
What is the paradigm of the "Maximum Likelihood concept"?
- It's a method of estimating the parameters of a probability distribution by maximizing a likelihood function.
- In other words: find the parameters that best explain the observed data.
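In symbols (standard notation, not reproduced from the slides):

```latex
\hat{\theta}_{ML} = \arg\max_{\theta} \; p(X \mid \theta)
```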
What does "iid" mean? What are the consequences here? Can you point them out in an expression on these slides?
- iid: "independent and identically distributed" -> the joint likelihood of the data set factorizes into a product over the individual samples; this product form is the place to point it out in the likelihood expression (see below).
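The factorization that the iid assumption buys us (standard form):

```latex
p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)
```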
Why do we take the logarithm of the likelihood? How do we choose the base of the
logarithm? Why?
- The logarithm is strictly monotonic, so the log-likelihood attains its maximum at the same position as the likelihood (set the first derivative to 0). Working with the log is much easier because it turns the product into a sum.
- The base is e, because the (Gaussian) likelihood contains exponential functions, which the natural logarithm cancels.
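Applied to the iid product above; the Gaussian case (dimension D) shows why base e is the natural choice:

```latex
\ln p(X \mid \theta) = \sum_{n=1}^{N} \ln p(x_n \mid \theta),
\qquad
\ln \mathcal{N}(x \mid \mu, \Sigma)
= -\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)
  - \tfrac{1}{2}\ln\lvert\Sigma\rvert
  - \tfrac{D}{2}\ln(2\pi)
```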
Case ONE Gaussian: Are the resulting equations directly solvable?
- Yes: the equation for μ_ML does not involve Σ, so μ_ML is simply the sample mean, and substituting it yields Σ_ML in closed form. (It is the mixture case where the equations depend on each other through the responsibilities and form an "infinite loop".)
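The standard closed-form result for a single Gaussian:

```latex
\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{ML})(x_n - \mu_{ML})^{\top}
```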
What is a 1 of K representation?
It is a binary vector in which exactly one component equals 1 and all other components equal 0 (there are K different such vectors). For example, with K = 4, (0, 0, 1, 0) encodes the third state.
Can you explain the formula for the responsibilities from an intuitive point of view?
- How much is cluster (mixture component) k "responsible" for element x?
- Equivalently: the posterior probability that element x belongs to cluster k.
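The formula in question is presumably the standard GMM responsibility, which is just Bayes' rule applied to the component indicator z_nk:

```latex
\gamma(z_{nk}) = p(z_{nk} = 1 \mid x_n)
= \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
```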
Why don’t we have the full joint probability distribution p(X, Z | θ)?
- The latent variables Z (the cluster assignments) are unobserved: we only see X, so the complete data set {X, Z} is not available and p(X, Z | θ) cannot be evaluated directly.
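What remains is the incomplete-data likelihood, obtained by marginalizing the joint over all possible assignments Z (standard identity):

```latex
p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta)
```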