Unsupervised Learning Flashcards

Question 1

Q

What is the optimal number of clusters in K-means clustering?

Answer

A

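There is no single correct value of K. Choosing it is partly subjective, and no one method determines the optimal number of clusters. Heuristics such as the elbow method or silhouette analysis can guide the choice, but the final decision usually also rests on judgment about the data.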
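One widely used heuristic is the elbow method. Run K-means for a range of K values, record the total within-cluster sum of squares (WSS) for each, and look for the K beyond which WSS stops dropping sharply. The sketch below is a minimal pure-Python illustration under stated assumptions: a plain Lloyd's-algorithm K-means with random initial centroids and a fixed seed, made-up toy data, and hypothetical helper names (`kmeans`, `within_ss`) that are not any library's API. It informs the choice of K rather than settling it.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Minimal Lloyd's algorithm (hypothetical helper). Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged (to a local optimum)
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_ss(points, labels, centroids):
    """Total within-cluster sum of squares (the quantity K-means minimizes)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Made-up toy data with three visible groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]

wss = {}
for k in range(1, 6):
    labels, cents = kmeans(data, k)
    wss[k] = within_ss(data, labels, cents)
    print(k, round(wss[k], 2))  # look for the K where the drop levels off
```

WSS tends to shrink as K grows, so the task is to judge where the curve bends. With three visible groups a bend near K = 3 is what one would hope to see, though a single seed can land in a poor local optimum.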

Question 2

Q

What is the number of distinct principal components for any given dataset?

Answer

A

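min(n - 1, p), where n is the number of observations and p is the number of features (variables). The n - 1 arises because PCA centers each variable to mean zero, so the centered rows sum to the zero vector and span at most n - 1 dimensions.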
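The bound can be checked numerically. The number of principal components with nonzero variance equals the rank of the column-centered data matrix, and centering forces the n rows to sum to the zero vector, which caps the rank at n - 1 as well as at p. This sketch assumes exact `fractions.Fraction` arithmetic so the rank is not blurred by floating-point round-off; `center_columns` and `rank` are hypothetical helpers, and the matrices are made up.

```python
from fractions import Fraction

def center_columns(X):
    """Subtract each column's mean, as PCA does before finding components."""
    n = len(X)
    means = [sum(Fraction(v) for v in col) / n for col in zip(*X)]
    return [[Fraction(v) - m for v, m in zip(row, means)] for row in X]

def rank(M):
    """Matrix rank by Gaussian elimination with exact arithmetic."""
    M = [row[:] for row in M]
    r = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# n = 3 observations, p = 5 features: at most min(2, 5) = 2 components.
wide = [[1, 4, 2, 7, 3],
        [2, 0, 5, 1, 8],
        [6, 3, 1, 2, 2]]
# n = 6 observations, p = 2 features: at most min(5, 2) = 2 components.
tall = [[1, 2], [3, 1], [4, 7], [2, 2], [5, 3], [0, 6]]

for X in (wide, tall):
    n, p = len(X), len(X[0])
    print(n, p, rank(center_columns(X)), min(n - 1, p))
```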

Question 3

Q

T/F: If K is held constant, K-means clustering will always produce the same cluster assignments.

Answer

A

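False. K-means starts from a random initial assignment (of observations to clusters, or of initial centroids) and converges only to a local optimum, so different runs with the same K can produce different cluster assignments.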
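Because each run depends on its random start, a common practice is to run K-means several times from different random initial assignments and keep the run with the smallest total within-cluster sum of squares. A minimal sketch, assuming a plain K-means that starts by randomly assigning each observation to one of K clusters; `kmeans_random_start` and `centroids_of` are hypothetical names and the data are made up.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroids_of(points, labels):
    """Mean of each non-empty cluster, keyed by cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(col) / len(g) for col in zip(*g)] for lab, g in groups.items()}

def kmeans_random_start(points, k, seed, max_iter=100):
    """K-means started by randomly assigning each observation to one of k clusters.
    Returns (labels, total within-cluster sum of squares)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        cents = centroids_of(points, labels)
        # Reassign each observation to its closest centroid.
        new_labels = [min(cents, key=lambda j: sq_dist(p, cents[j])) for p in points]
        if new_labels == labels:
            break  # no change: a local optimum has been reached
        labels = new_labels
    cents = centroids_of(points, labels)
    wss = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
    return labels, wss

# Made-up 2-D data without a perfectly clean separation.
data = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5),
        (4.5, 5), (3.5, 4.5), (8, 8), (9, 7.5), (2, 1.5)]

runs = [kmeans_random_start(data, 3, seed) for seed in range(10)]
for seed, (labels, wss) in enumerate(runs):
    print(f"seed={seed} WSS={wss:.2f} labels={labels}")

# Keep the run with the smallest within-cluster variation.
best_labels, best_wss = min(runs, key=lambda run: run[1])
print("best WSS:", round(best_wss, 2))
```

Even when two runs find the same partition, the cluster numbers may be permuted, so compare which observations are grouped together rather than the raw labels.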

Question 4

Q

T/F: Given a linkage and a dissimilarity measure, hierarchical clustering will always produce the same cluster assignments for a specific number of clusters.

Answer

A

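True. Hierarchical clustering involves no random initialization: given the dissimilarity measure and linkage, the sequence of merges is determined by the data (apart from how exact ties are broken), so cutting the dendrogram at the same number of clusters always yields the same assignments.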
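A minimal agglomerative sketch makes the point concrete. It assumes Euclidean dissimilarity and complete linkage (another linkage function can be passed as `linkage`), uses made-up data, and the names `agglomerate` and `complete_linkage` are hypothetical. Nothing in it draws random numbers, and exact ties are broken by a fixed rule, so repeated runs return identical clusters.

```python
import math
from itertools import combinations

def complete_linkage(A, B):
    """Cluster dissimilarity: the largest distance between a point in A and a point in B."""
    return max(math.dist(a, b) for a in A for b in B)

def agglomerate(points, n_clusters, linkage=complete_linkage):
    """Bottom-up clustering: start with singletons and repeatedly merge the
    two least dissimilar clusters until n_clusters remain. No randomness is used."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        def pair_dissimilarity(ij):
            i, j = ij
            return linkage([points[a] for a in clusters[i]],
                           [points[b] for b in clusters[j]])
        # min() returns the first minimal pair, so ties are broken the same way every run.
        i, j = min(combinations(range(len(clusters)), 2), key=pair_dissimilarity)
        merged = clusters.pop(j)  # j > i, so popping j leaves index i unchanged
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)

# Made-up 2-D data: three tight pairs of points.
data = [(0.0, 0.0), (0.5, 0.2), (4.0, 4.0), (4.3, 3.9), (9.0, 0.0), (8.7, 0.4)]
first = agglomerate(data, 3)
second = agglomerate(data, 3)
print(first)            # [[0, 1], [2, 3], [4, 5]]
print(first == second)  # True
```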

Question 5

Q

T/F: Given identical data sets, cutting a dendrogram to obtain five clusters produces the same cluster assignments as K-means clustering with K=5.

Answer

A

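False. The two methods work differently (hierarchical clustering repeatedly merges the closest clusters into a nested tree, while K-means searches for a partition into exactly K clusters that minimizes within-cluster variation), so they need not yield the same clusters.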

Question 6

Q

T/F: n observations can be clustered on the basis of the p features to identify subgroups among the observations AND p features can be clustered on the basis of the n observations to identify subgroups among the features.

Answer

A

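True. Both are valid. Clustering observations treats the n rows of the n × p data matrix as the objects; clustering features treats the p columns as the objects (equivalently, it clusters the rows of the transposed matrix).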
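Clustering features only changes which axis supplies the objects: transposing the n × p matrix turns each feature into a row (its profile across the n observations), and any method that clusters rows can then be applied. A minimal sketch, assuming Euclidean distance, a made-up 4 × 3 matrix, and hypothetical helpers `transpose` and `pairwise_distances`:

```python
import math

# Rows are n = 4 observations, columns are p = 3 features (made-up values).
X = [[1.0, 10.0, 1.1],
     [2.0, 20.0, 2.1],
     [3.0, 30.0, 2.9],
     [4.0, 40.0, 4.2]]

def transpose(M):
    """Swap rows and columns, so each feature becomes a row."""
    return [list(col) for col in zip(*M)]

def pairwise_distances(rows):
    """Euclidean distance between every pair of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

obs_dist = pairwise_distances(X)              # 4 x 4: dissimilarities between observations
feat_dist = pairwise_distances(transpose(X))  # 3 x 3: dissimilarities between features
print(feat_dist)
```

Here feature 0 and feature 2 are close in Euclidean terms, while feature 1 is far from both because of its scale; any clustering routine that accepts a list of rows can be run on `transpose(X)` unchanged.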

Question 7

Q

T/F: Euclidean distance focuses on the magnitude of observation profiles rather than their shape.

Answer

A

True. Euclidean distance focuses on the magnitude of observation profiles, while correlation-based distance focuses on their shape.
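A small numerical contrast, assuming correlation-based distance is defined as 1 minus the Pearson correlation (one common choice). Profile `b` has the same shape as `a` at ten times the magnitude, while `c` has a similar magnitude but the reverse shape. The helpers `pearson` and `correlation_distance` are written out here for illustration rather than taken from a library.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(p * q for p, q in zip(dx, dy))
    return num / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

def correlation_distance(x, y):
    """1 - correlation: 0 for identical shape, 2 for exactly opposite shape."""
    return 1 - pearson(x, y)

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]  # same shape as a, much larger magnitude
c = [4, 3, 2, 1]      # similar magnitude to a, opposite shape

print(math.dist(a, b), correlation_distance(a, b))  # far apart, yet perfectly correlated
print(math.dist(a, c), correlation_distance(a, c))  # close together, yet anti-correlated
```

Euclidean distance calls `a` and `c` the closer pair, while correlation-based distance calls `a` and `b` identical in shape (distance 0) and `a` and `c` maximally dissimilar (distance 2), which is also why a large correlation-based distance signals dissimilarity.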

Question 8

Q

T/F: Two observations are said to be similar if they have a large correlation-based distance.

Answer

A

False. A large correlation-based distance means the observations' feature profiles are weakly or negatively correlated, so they are dissimilar: the larger the distance, the more dissimilar the observations.

Question 9

Q

T/F: Standardizing the variables does not affect the result of hierarchical or K-means clustering.

Answer

A

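False. Standardizing the variables (for example, scaling each to standard deviation one) can greatly change the results, because without it the variables measured on larger scales dominate the dissimilarity measure.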
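A made-up example shows how scale can drive the result. The two hypothetical features are income in dollars and age in years; without scaling, Euclidean distance is dominated by income. After standardizing each column to mean 0 and standard deviation 1, the closest pair of observations, which agglomerative clustering would merge first, changes. `standardize` and `closest_pair` are hypothetical helpers.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Made-up observations: (income in dollars, age in years).
X = [(50000, 25), (50500, 60), (53000, 26), (60000, 40)]

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

def closest_pair(rows):
    """Indices of the two rows with the smallest Euclidean distance."""
    return min(combinations(range(len(rows)), 2),
               key=lambda ij: math.dist(rows[ij[0]], rows[ij[1]]))

print(closest_pair(X))               # (0, 1): income differences dominate
print(closest_pair(standardize(X)))  # (0, 2): age now counts as much as income
```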
