Multiple-Choice: Clustering Flashcards
Q1 (Clustering). What is the main difference between K-means and GMM? (One choice) 1) One is supervised 2) One allows partial (soft) membership, the other doesn’t 3) One can’t handle high-dimensional data 4) One doesn’t need initialization
Correct item: 2. Explanation: GMM produces soft assignments (posterior probabilities over components); K-means assigns each point to exactly one cluster.
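To see the difference concretely, here is a minimal sketch using scikit-learn (assumed available); the toy data is invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Toy two-cluster data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

# K-means: each point gets exactly one cluster label (hard assignment).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])                    # one integer label per point, e.g. [0 0 0 0 0]

# GMM: each point gets a probability for every component (soft assignment).
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict_proba(X[:5]).round(3))  # each row sums to 1
```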
Q2 (Clustering). GMM generative viewpoint: X has N i.i.d. samples, each of dimension D>1. Which is correct? (One choice) 1) p(X|π)=ΣₙΣₖ πₖp(xₙ) 2) p(X|π,µ,Σ)=∏ₙ N(xₙ|µ,Σ) 3) p(X|π,µ,Σ)=∏ₙ Σₖ πₖN(xₙ|µₖ,σ²) 4) p(X|π,µ,Σ)=∏ₙ Σₖ πₖN(xₙ|µₖ,Σₖ)
Correct item: 4. Explanation: The likelihood factorizes over the i.i.d. samples, and each sample’s density is a mixture of K multivariate Gaussians, each with its own mean µₖ and full covariance Σₖ.
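For reference, item 4 written out in standard notation:

```latex
p(X \mid \pi, \mu, \Sigma)
  = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)
```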
Q3 (Clustering). Number of free parameters in a GMM with K components and full covariances, data dimension D? (One choice) 1) K·D + (K-1) + K·(D·(D+1)/2) 2) K·D + (K-1) 3) K·D + (K-1) + K·D³ 4) 3·K·D
Correct item: 1. Explanation: Each of the K components has a mean (D parameters) and a symmetric covariance (D(D+1)/2 parameters), plus K-1 free mixing weights (they sum to 1), giving K·D + K·D(D+1)/2 + (K-1).
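A small helper makes the count easy to check; this is plain arithmetic, with the function name invented here:

```python
def gmm_param_count(K: int, D: int) -> int:
    """Free parameters of a GMM with K components and full covariances."""
    means = K * D                  # one D-dimensional mean per component
    covs = K * D * (D + 1) // 2    # symmetric DxD covariance: D(D+1)/2 entries each
    weights = K - 1                # mixing weights sum to 1, so one is determined
    return means + covs + weights

print(gmm_param_count(2, 2))  # 11 -- matches c_k in Q7 below
```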
Q4 (Clustering). Which statements are true? (Two correct) 1) The optimal cost for K=5 is ≥ the optimal cost for K=7 2) A more complex model is always better for clustering 3) EM for GMM always finds the global maximum 4) Sum-of-norms (SON) clustering with λ→∞ yields a single cluster (K=1)
Correct items: 1 and 4. Explanation: With optimal assignments, allowing more clusters can never increase the minimal cost, so the cost for K=5 is at least that for K=7; and SON clustering merges everything into a single cluster as λ→∞.
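Item 1 can be sanity-checked empirically; a sketch assuming scikit-learn, with made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))  # made-up data, purely illustrative

# The optimal within-cluster cost is non-increasing in K; Lloyd's algorithm
# is a heuristic, but with several restarts (n_init) it tracks this closely.
for k in (5, 7):
    cost = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(cost, 1))
```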
Q5 (Clustering). Which statements are true for K-means? (Two correct) 1) It needs >10 points per cluster 2) The SON relaxation still needs K to be chosen 3) The K-means solution is in practice found only by an iterative method 4) K-means++ is an effective initialization method
Correct items: 3 and 4. Explanation: The K-means objective is NP-hard to minimize exactly, so in practice it is solved iteratively (Lloyd’s algorithm), and K-means++ provides a better-than-random initialization.
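A minimal sketch of K-means++ seeding (item 4) in plain NumPy; the function name is invented here. scikit-learn’s KMeans uses this scheme by default (init='k-means++'):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """K-means++ seeding sketch: spread initial centroids via D^2 sampling."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]        # first center: uniform at random
    for _ in range(K - 1):
        # Squared distance of every point to its nearest already-chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        # Pick the next center with probability proportional to d^2,
        # so far-away points (likely new clusters) are favored.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```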
Q6 (Clustering). GMM with 2 components in 1D: p(x)=Σₖ πₖN(x|µₖ,σₖ²), with standard deviations σₖ. π₁=0.6,π₂=0.4,µ₁=4,µ₂=7,σ₁=4,σ₂=3. Likelihood of dataset X={6,7,3}? (One choice) 1)0.02 2)0.0001 3)0.006 4)0.0008
Correct item: 4. Explanation: Evaluating the mixture density per point gives p(6)≈0.103, p(7)≈0.098, p(3)≈0.080; the product is ≈0.0008.
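The calculation, as a sketch assuming SciPy is available:

```python
from scipy.stats import norm

pi, mu, sigma = [0.6, 0.4], [4, 7], [4, 3]   # sigma = standard deviations

likelihood = 1.0
for x in (6, 7, 3):
    likelihood *= sum(p * norm.pdf(x, loc=m, scale=s)
                      for p, m, s in zip(pi, mu, sigma))
print(round(likelihood, 4))  # 0.0008
```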
Q7 (Clustering). A GMM with data likelihood p(X|θ)=0.01, N=200 samples, 2 components. Using BIC = -ln p(X|θ) + (1/2)·cₖ·ln(N) with cₖ=11 free parameters, what is the BIC? (One choice) 1)33.7 2)20.2 3)17.9 4)8.4
Correct item: 1. Explanation: BIC = -ln(0.01) + 0.5·11·ln(200) = 4.61 + 29.14 ≈ 33.7.
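The arithmetic, using the BIC convention stated in the question (some texts instead use -2·ln p(X|θ) + cₖ·ln N):

```python
import math

log_lik = math.log(0.01)                 # ln p(X|theta)
c_k, N = 11, 200
bic = -log_lik + 0.5 * c_k * math.log(N)
print(round(bic, 1))                     # 33.7
```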
Q8 (Clustering). GMM in 1D with 2 components (σₖ are standard deviations): π₁=0.6,π₂=0.4,µ₁=4,µ₂=7,σ₁=2,σ₂=3. Posterior responsibilities for x=5? (One choice) 1) γ(z₁)=0.81,γ(z₂)=0.10 2) γ(z₁)=0.81,γ(z₂)=0.19 3) γ(z₁)=0.71,γ(z₂)=0.29 4) γ(z₁)=0.42,γ(z₂)=0.58
Correct item: 3. Explanation: γ(z₁) = 0.6·N(5|4,2²) / [0.6·N(5|4,2²) + 0.4·N(5|7,3²)] ≈ 0.71, so γ(z₂) ≈ 0.29.
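The responsibility computation, again as a sketch assuming SciPy:

```python
from scipy.stats import norm

pi, mu, sigma = [0.6, 0.4], [4, 7], [2, 3]   # sigma = standard deviations
x = 5

weighted = [p * norm.pdf(x, loc=m, scale=s) for p, m, s in zip(pi, mu, sigma)]
gamma = [w / sum(weighted) for w in weighted]
print([round(g, 2) for g in gamma])          # [0.71, 0.29]
```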
Q9 (Clustering). K-means step with x1=(1,1),x2=(2,3),x3=(1,2). µ1=(1,1),µ2=(3,3). Perform one iteration. Updated centroids? (One choice) 1)µ1=(1,1.5),µ2=(2,3) 2)µ1=(1,1),µ2=(1.5,2.5) 3)µ1=(1,2),µ2=(2,3) 4)µ1=(1,1.5),µ2=(1.5,2.5)
Correct item: 1. Explanation: (1,1) stays with µ1 (distance 0) and (1,2) joins it (distance 1 vs √5); (2,3) is closest to µ2 (distance 1 vs √5). The new means are (1,1.5) and (2,3).
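One full assignment-and-update iteration in NumPy, reproducing the answer:

```python
import numpy as np

X = np.array([[1, 1], [2, 3], [1, 2]], dtype=float)
mu = np.array([[1, 1], [3, 3]], dtype=float)

# Assignment step: label each point with its nearest centroid.
d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
labels = d2.argmin(axis=1)    # [0, 1, 0]

# Update step: each centroid moves to the mean of its assigned points.
mu_new = np.array([X[labels == k].mean(axis=0) for k in range(2)])
print(mu_new)                 # [[1.  1.5] [2.  3. ]]
```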
Q10 (Clustering). K-means within-cluster costs: W(2)=200, W(4)=150, W(6)=100, W(8)=90. Best number of clusters by the elbow method? (One choice) 1)2 2)4 3)6 4)8
Correct item: 3. Explanation: The cost drops by 50 per step up to K=6, then only by 10 from K=6 to K=8; the curve flattens (the “elbow”) at K=6.
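The successive drops, computed directly:

```python
W = {2: 200, 4: 150, 6: 100, 8: 90}
ks = sorted(W)
for a, b in zip(ks, ks[1:]):
    print(f"K={a}->{b}: drop = {W[a] - W[b]}")
# K=2->4: 50, K=4->6: 50, K=6->8: 10  -> the curve flattens after K=6
```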