Unsupervised Learning Flashcards
What is the optimal number of clusters in K-means clustering?
The choice of K is subjective; no single method determines the optimal number of clusters.
What is the number of distinct principal components for any given dataset?
min(n - 1, p), where n = number of observations and p = number of non-intercept explanatory variables.
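A minimal sketch of why the bound is min(n - 1, p): after the columns are centered, the rows of the data matrix sum to zero, so its rank (the number of principal components with nonzero variance) cannot exceed n - 1 or p. The two small matrices and the helper names `center_columns` and `matrix_rank` are made up for illustration; exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def center_columns(X):
    """Subtract each column's mean, as PCA does before finding components."""
    n, p = len(X), len(X[0])
    means = [Fraction(sum(row[j] for row in X), n) for j in range(p)]
    return [[Fraction(row[j]) - means[j] for j in range(p)] for row in X]

def matrix_rank(M):
    """Rank via Gaussian elimination on exact fractions."""
    M = [row[:] for row in M]
    rank = 0
    for col in range(len(M[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((i for i in range(rank, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, len(M)):
            factor = M[i][col] / M[rank][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank

# Hypothetical data: n = 3 observations, p = 5 features (n - 1 < p).
wide = [[1, 2, 3, 4, 5], [2, 1, 0, 3, 1], [0, 5, 2, 1, 4]]
# Hypothetical data: n = 4 observations, p = 2 features (p < n - 1).
tall = [[1, 2], [2, 1], [3, 5], [4, 4]]

rank_wide = matrix_rank(center_columns(wide))
rank_tall = matrix_rank(center_columns(tall))
print(rank_wide, min(len(wide) - 1, len(wide[0])))
print(rank_tall, min(len(tall) - 1, len(tall[0])))
```

For data with redundant (linearly dependent) features the rank can be smaller, so min(n - 1, p) is best read as the general-case count.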
T/F: If K is held constant, K-means clustering will always produce the same cluster assignments.
False. K-means depends on the random initial assignment of clusters, so different starts can converge to different local optima.
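A small sketch of this dependence, assuming the usual iterate-until-stable version of K-means that starts from an initial labeling. The random start is replaced by two hand-picked initial labelings so the outcome is reproducible; the four points and the function names `kmeans` and `within_cluster_ss` are illustrative only, and the sketch assumes no cluster ever becomes empty.

```python
def kmeans(points, labels, k, max_iter=100):
    """K-means starting from an initial labeling; returns the final labels."""
    labels = list(labels)
    for _ in range(max_iter):
        # Step 1: compute each cluster's centroid (assumes no empty cluster).
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Step 2: reassign every point to its closest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def within_cluster_ss(points, labels, k):
    """Total squared distance from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(x) / len(members) for x in zip(*members)]
        total += sum((a - b) ** 2 for p in members for a, b in zip(p, centroid))
    return total

# Four corners of a 3-by-1 rectangle (made-up data).
points = [(0, 0), (0, 1), (3, 0), (3, 1)]

left_right = kmeans(points, [0, 0, 1, 1], k=2)  # start: left vs right
top_bottom = kmeans(points, [0, 1, 0, 1], k=2)  # start: bottom vs top

print(left_right, within_cluster_ss(points, left_right, 2))
print(top_bottom, within_cluster_ss(points, top_bottom, 2))
```

Both starts are already stable, yet they give different clusterings with the same K, and the second has a much larger within-cluster sum of squares. This is why K-means is commonly run from several random starts and the best result kept.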
T/F: Given a linkage and a dissimilarity measure, hierarchical clustering will always produce the same cluster assignments for a specific number of clusters.
True. Hierarchical clustering is deterministic, not requiring a random initial assignment.
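As a rough illustration, here is a bare-bones agglomerative clustering with complete linkage and absolute difference as the dissimilarity, on made-up one-dimensional data. The function name `complete_linkage` is hypothetical. Nothing in it is random: repeated runs on the same data give the same clusters, with tied distances resolved by a fixed rule (lowest cluster indices first).

```python
def complete_linkage(points, k):
    """Merge clusters bottom-up until k remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Complete linkage: largest pairwise dissimilarity between two clusters.
        return max(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = [
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # min() on tuples breaks distance ties by index, so runs are repeatable.
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

points = [0, 1, 5, 6, 20]  # made-up 1-D observations

two = complete_linkage(points, 2)
three = complete_linkage(points, 3)
print(two)
print(three)
```

Stopping the merging at k clusters here plays the role of cutting the dendrogram at the height that yields k clusters.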
T/F: Given identical data sets, cutting a dendrogram to obtain five clusters produces the same cluster assignments as K-means clustering with K=5.
False. The two methods differ in their approaches and hence may not yield the same clusters.
T/F: n observations can be clustered on the basis of the p features to identify subgroups among the observations AND p features can be clustered on the basis of the n observations to identify subgroups among the features.
True. Both are viable methods of clustering.
T/F: Euclidean distance focuses on the magnitude of observation profiles rather than their shape.
True. Euclidean distance focuses on the magnitude of observation profiles, while correlation-based distance focuses on their shape.
T/F: Two observations are said to be similar if they have a large correlation-based distance.
False. A large correlation-based distance means the two observations are dissimilar; the larger the distance, the more dissimilar they are.
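The two distance cards above can be checked on a toy example. The sketch assumes correlation-based distance is defined as 1 minus the Pearson correlation (a common convention, not stated in the cards), and the three profiles are invented for illustration.

```python
import math

def euclidean(x, y):
    """Ordinary Euclidean distance between two observation profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson correlation (assumed definition of correlation-based distance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

a = [1, 2, 3, 4]      # baseline profile
b = [10, 20, 30, 40]  # same shape as a, much larger magnitude
c = [4, 3, 2, 1]      # similar magnitude to a, opposite shape

print(euclidean(a, b), euclidean(a, c))
print(correlation_distance(a, b), correlation_distance(a, c))
```

By Euclidean distance, a is far closer to c than to b (similar magnitude). By correlation-based distance, a is essentially at distance 0 from b (same shape) and at the maximum distance of 2 from c.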
T/F: Standardizing the variables does not affect the result of hierarchical or K-means clustering.
False. Standardizing the variables can substantially change the result, because it changes the distances between observations on which both methods rely.
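A quick sketch of the effect on distances, using four invented observations with one feature on a large scale and one on a small scale. The helper names `standardize` and `nearest` are hypothetical; standardizing here means subtracting each feature's mean and dividing by its (population) standard deviation.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Scale every feature to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(col) for col in cols]
    sds = [pstdev(col) for col in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def nearest(rows, i):
    """Index of the observation closest to row i in Euclidean distance."""
    return min(
        (j for j in range(len(rows)) if j != i),
        key=lambda j: math.dist(rows[i], rows[j]),
    )

# Made-up data: feature 1 is in the thousands, feature 2 in single digits.
raw = [[1000, 1], [1010, 9], [1100, 1.5], [2000, 5]]
scaled = standardize(raw)

print(nearest(raw, 0))     # neighbor of observation 0 on the raw scale
print(nearest(scaled, 0))  # neighbor of observation 0 after standardizing
```

On the raw scale the first feature dominates, so observation 0 is closest to observation 1. After standardizing, the second feature carries comparable weight and observation 2 becomes the closest, which would change which observations get grouped together.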