5: PCA and Cluster Analysis Flashcards

1
Q

What is the advantage of a dendrogram?

A

A single dendrogram shows the clusterings obtained for every possible number of clusters, from 1 to n, so the number of clusters does not have to be fixed in advance.
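
A minimal sketch of why one hierarchical run covers every cluster count: the merge history that a dendrogram draws already contains a clustering for each level from n down to 1. The function name `agglomerate`, the use of single linkage, and the toy one-dimensional data are assumptions made for illustration only.

```python
from math import dist

def agglomerate(points):
    """Return {number_of_clusters: clustering} for every level from n down to 1."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per observation
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
        levels[len(clusters)] = [sorted(c) for c in clusters]
    return levels

# Hypothetical one-dimensional observations.
points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
levels = agglomerate(points)
for k in sorted(levels):
    print(k, levels[k])
```

Reading `levels[k]` corresponds to cutting the dendrogram at the height that leaves k branches.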

2
Q

Which two clustering methods are the most common?

A

Hierarchical and K-means clustering.

3
Q

What is the idea of K-means clustering?

A

The idea is to partition the observations into a pre-defined number of clusters (k) by assigning each data point to the nearest centroid. The algorithm aims to minimize the sum of squared distances between data points and their respective centroids.
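
The assign-then-update idea can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the function name `kmeans`, the toy data, and the choice to start from the first k observations (made here only so the run is reproducible; random starts are more usual) are all assumptions.

```python
def kmeans(points, k, iters=100):
    """Lloyd-style K-means on a list of equal-length tuples (illustrative sketch)."""
    centroids = list(points[:k])  # assumption: start from the first k observations
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because the result can depend on the starting centroids, practical implementations usually rerun the algorithm from several starts and keep the solution with the smallest sum of squared distances.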

4
Q

What are two major disadvantages of K-means clustering?

A

1) The algorithm will force all observations into a cluster, regardless of how “far” an observation is from the other observations, 2) we need to pre-specify the number of clusters.

5
Q

What is the difference between agglomerative and divisive clustering?

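A

Agglomerative (bottom-up) clustering starts with each observation in its own cluster and repeatedly merges the two closest clusters. Divisive (top-down) clustering starts with all observations in a single cluster and repeatedly splits it.
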
6
Q

Can agglomerative and divisive clustering be used for both hierarchical and K-means?

A

No, divisive and agglomerative clustering are two distinct approaches to hierarchical clustering. K-means does not fit into either category.

7
Q

Which two properties need to be fulfilled in K-means?

A

1) Each observation must belong to at least one of the K clusters, 2) the clusters must be non-overlapping, i.e. no observation belongs to more than one cluster.

8
Q

What is the objective in K-means?

A

To partition the observations into K clusters such that the total within-cluster variation, summed over all K clusters, is as small as possible.
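
Written out, one common form of this objective looks as follows. It assumes squared Euclidean distance as the measure of within-cluster variation; the symbols (C_k for the k-th cluster, x_ij for feature j of observation i, p for the number of features, and x̄_kj for the mean of feature j in cluster k) are notation introduced here, not taken from the card.

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k),
\qquad
W(C_k) = \frac{1}{|C_k|} \sum_{i,i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2
       = 2 \sum_{i \in C_k} \sum_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2
```

The second equality shows that minimizing the pairwise within-cluster variation is the same as minimizing the sum of squared distances to the cluster centroids.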

9
Q

Is cluster analysis a supervised or unsupervised learning method?

A

Unsupervised (it has no class label or quantitative response variable).

10
Q

What is common for all linkage methods (single, complete, average)?

A

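At every step, the two clusters with the smallest inter-cluster dissimilarity are merged. The linkage methods differ only in how that dissimilarity is measured: the smallest (single), largest (complete), or mean (average) pairwise distance between the observations of the two clusters.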
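
As a rough illustration, the three linkage rules can be computed from the same set of pairwise distances between two clusters. The function name `linkage_distance` and the toy clusters are assumptions; Euclidean distance is assumed as the underlying dissimilarity.

```python
from itertools import product
from math import dist

def linkage_distance(cluster_a, cluster_b, method):
    """Dissimilarity between two clusters under a given linkage rule."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair of observations
        return min(pairwise)
    if method == "complete":    # farthest pair of observations
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical clusters in two dimensions.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]
single = linkage_distance(a, b, "single")
complete = linkage_distance(a, b, "complete")
average = linkage_distance(a, b, "average")
print(single, average, complete)
```

Whichever rule is used, the merge step itself is the same: the pair of clusters with the smallest linkage distance is joined next.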

11
Q

Why is it important to standardize the variables before performing cluster analysis?

A

Standardization prevents variables with larger scales from dominating how clusters are defined. It allows all variables to be considered by the algorithm with equal importance.
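
A small sketch of what standardizing means in practice: each variable is rescaled to mean 0 and standard deviation 1 before distances are computed. The function name `standardize` and the income/age data are hypothetical, and the sample standard deviation is used here as one reasonable choice.

```python
from statistics import mean, stdev

def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

# Hypothetical data: income in dollars, age in years.
rows = [(30000.0, 25.0), (32000.0, 60.0), (90000.0, 27.0), (88000.0, 58.0)]
scaled = standardize(rows)
for row in scaled:
    print(row)
```

On the raw data, distances are driven almost entirely by income because its values are thousands of times larger than age; after rescaling, both variables contribute on a comparable scale.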
