Unsupervised Learning Flashcards

1
Q

What is the optimal number of clusters in K-means clustering?

A

There is no single correct answer. Choosing K is subjective, and no one method determines the optimal number of clusters.

2
Q

What is the number of distinct principal components for any given dataset?

A

At most min(n - 1, p), where n is the number of observations and p is the number of variables (not counting an intercept).
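
One way to check this bound is sketched below. It assumes that the number of principal components with nonzero variance equals the rank of the column-centered data matrix. The helper `centered_rank` and both toy datasets are hypothetical, made up for illustration; exact fractions are used so that rounding cannot blur the rank.

```python
from fractions import Fraction

def centered_rank(rows):
    """Rank of the column-centered data matrix, via exact Gaussian elimination."""
    n, p = len(rows), len(rows[0])
    # Center each column by subtracting its mean.
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    m = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    rank = 0
    for col in range(p):
        if rank == n:
            break
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((i for i in range(rank, n) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from the rows below the pivot.
        for i in range(rank + 1, n):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

wide = [[1, 2, 0, 4, 1], [3, 1, 5, 0, 2], [2, 7, 1, 1, 0]]  # n = 3, p = 5
tall = [[1, 2], [2, 1], [3, 5], [4, 0]]                       # n = 4, p = 2

rank_wide = centered_rank(wide)
rank_tall = centered_rank(tall)
print(rank_wide, min(len(wide) - 1, len(wide[0])))  # 2 2
print(rank_tall, min(len(tall) - 1, len(tall[0])))  # 2 2
```

In the wide dataset the limit comes from n - 1 (centering removes one degree of freedom); in the tall dataset it comes from p.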

3
Q

T/F: If K is held constant, K-means clustering will always produce the same cluster assignments.

A

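False. K-means starts from a random initial assignment of observations to clusters and only finds a local optimum, so different runs can produce different cluster assignments.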
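
The dependence on the starting assignment can be seen in a minimal sketch. The functions `kmeans` and `wcss` and the four-point dataset are hypothetical illustrations, not a library API. The initial labels are passed in explicitly, instead of being drawn at random, so that the two runs are reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(members) for xs in zip(*members))

def kmeans(points, labels, k, max_iter=100):
    """Iterate centroid and reassignment steps from the given initial labels.
    Assumption: no cluster ever becomes empty (true for the data below)."""
    labels = list(labels)
    for _ in range(max_iter):
        centers = [centroid([p for p, l in zip(points, labels) if l == c])
                   for c in range(k)]
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels

def wcss(points, labels, k):
    """Total within-cluster sum of squared distances to the centroids."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        center = centroid(members)
        total += sum(sq_dist(p, center) for p in members)
    return total

# Corners of a wide rectangle: the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

final_a = kmeans(points, [0, 0, 1, 1], k=2)  # start: left vs. right
final_b = kmeans(points, [0, 1, 0, 1], k=2)  # start: bottom vs. top

print(final_a, wcss(points, final_a, 2))  # [0, 0, 1, 1] 1.0
print(final_b, wcss(points, final_b, 2))  # [0, 1, 0, 1] 16.0
```

Both runs stop at a stable assignment, but the second one is a much worse local optimum. This is why K-means is usually run from several random starts and the run with the lowest within-cluster variation is kept.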

4
Q

T/F: Given a linkage and a dissimilarity measure, hierarchical clustering will always produce the same cluster assignments for a specific number of clusters.

A

True. Hierarchical clustering is deterministic, not requiring a random initial assignment.
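
A minimal sketch of agglomerative clustering makes the determinism visible: nothing in the procedure is random. It assumes complete linkage with Euclidean distance, and it breaks ties between equally close pairs by position in the list. The function `agglomerate` and the five points are hypothetical, chosen for illustration.

```python
import math

def agglomerate(points, n_clusters):
    """Bottom-up clustering with complete linkage and Euclidean distance.
    Returns clusters as sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: the largest pairwise distance between the two clusters.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Smallest linkage wins; ties fall back to the lowest indices.
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

points = [(0, 0), (0, 1), (4, 0), (4, 1), (10, 0)]

three = agglomerate(points, 3)
two = agglomerate(points, 2)
print(three)  # [[0, 1], [2, 3], [4]]
print(two)    # [[0, 1, 2, 3], [4]]
```

Calling `agglomerate` again with the same points and the same number of clusters returns the same result every time, in contrast to K-means with random starts.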

5
Q

T/F: Given identical data sets, cutting a dendrogram to obtain five clusters produces the same cluster assignments as K-means clustering with K=5.

A

False. The two methods differ in their approaches and hence may not yield the same clusters.

6
Q

T/F: n observations can be clustered on the basis of the p features to identify subgroups among the observations AND p features can be clustered on the basis of the n observations to identify subgroups among the features.

A

True. Both are viable methods of clustering.

7
Q

T/F: Euclidean distance focuses on the magnitude of observation profiles rather than their shape.

A

True. Euclidean distance focuses on the magnitude of observation profiles, while correlation-based distance focuses on their shape.
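
The contrast can be illustrated with three made-up profiles. This sketch assumes correlation-based distance is defined as 1 minus the Pearson correlation, which is one common choice; the function names and the data are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 minus the Pearson correlation (assumed definition, see above)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

a = [1, 2, 3, 4]      # baseline profile
b = [10, 20, 30, 40]  # same shape as a, ten times the magnitude
c = [4, 3, 2, 1]      # similar magnitude to a, opposite shape

print(euclidean(a, b) > euclidean(a, c))                        # True
print(correlation_distance(a, b) < correlation_distance(a, c))  # True
```

By Euclidean distance, `a` is closer to `c` because their values are of similar size. By correlation-based distance, `a` is closer to `b` because the two profiles rise and fall together.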

8
Q

T/F: Two observations are said to be similar if they have a large correlation-based distance.

A

False. A large correlation-based distance means that two observations are dissimilar: the larger the distance, the more dissimilar they are.

9
Q

T/F: Standardizing the variables does not affect the result of hierarchical or K-means clustering.

A

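False. Standardizing the variables can greatly affect the result of clustering, because both methods rely on distances, and a variable measured on a large scale dominates those distances unless the variables are standardized.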
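
A small sketch shows how standardizing can change which observations count as close, which is what drives both clustering methods. The four observations and the helper names are hypothetical; standardization here means subtracting the column mean and dividing by the population standard deviation.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - mu) / sd for v, mu, sd in zip(r, mus, sds))
            for r in rows]

def nearest_to_first(rows):
    """Index of the observation closest (Euclidean) to the first one."""
    return min(range(1, len(rows)), key=lambda i: math.dist(rows[0], rows[i]))

# First variable is on a large scale, second on a small scale.
raw = [(0, 0), (50, 0), (0, 1), (100, 1)]

nearest_raw = nearest_to_first(raw)
nearest_scaled = nearest_to_first(standardize(raw))
print(nearest_raw)     # 2
print(nearest_scaled)  # 1
```

On the raw data the first observation is closest to observation 2, because the large-scale variable dominates the distance. After standardizing, it is closest to observation 1, so a clustering built on these distances can group the observations differently.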
