Wronged Questions: Unsupervised learning Flashcards

1
Q

If two observation profiles are close together along the vertical axis, they have a ___________.

A

Small Euclidean distance

2
Q

If two observation profiles have a similar shape, they have a _____________.

A

Small correlation-based distance
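
A minimal sketch of the two dissimilarity measures in the previous two cards, assuming NumPy is available; the profiles x and y are made-up observations measured on the same five variables.

```python
import numpy as np

# Two hypothetical observation profiles measured on the same five variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # same shape as x, larger magnitude

# Euclidean distance: small when the profiles are close in absolute level.
euclidean = np.sqrt(np.sum((x - y) ** 2))

# Correlation-based distance: small when the profiles have a similar shape,
# regardless of magnitude. One common definition is 1 minus the correlation.
corr_distance = 1 - np.corrcoef(x, y)[0, 1]

print(euclidean)      # fairly large: the levels differ
print(corr_distance)  # close to 0: the shapes are identical
```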

3
Q

________ can happen when centroid linkage is used.

A

Inversion

4
Q

T/F: Hierarchical clustering may not assign extreme outliers to any cluster.

A

False

5
Q

T/F: The resulting dendrogram can be used to obtain different numbers of clusters.

A

True. Depending on the height at which we cut the dendrogram, we get the cluster assignments for different numbers of clusters.
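
A small illustration of this card, assuming SciPy: the dendrogram is built once, and cutting it at different heights yields different numbers of clusters. The data are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))        # 20 simulated observations, 2 features

# Build the dendrogram once (complete linkage, Euclidean dissimilarity).
Z = linkage(X, method="complete")

# Cutting the same dendrogram at different heights gives different
# numbers of clusters without rerunning the algorithm.
labels_2 = fcluster(Z, t=2, criterion="maxclust")  # cut high: 2 clusters
labels_4 = fcluster(Z, t=4, criterion="maxclust")  # cut lower: 4 clusters

print(len(set(labels_2)), len(set(labels_4)))      # 2 4
```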

6
Q

T/F: Hierarchical clustering is not robust to small changes in the data.

A

True. Small changes in the data can result in different cluster assignments.

7
Q

The loadings for the first PC are displayed on the ___ axis of a biplot.

A

Top

8
Q

The loadings for the second PC are displayed on the _____ axis of a biplot.

A

Right

9
Q

T/F: Using more principal components in a PCR model generally leads to a decrease in the model’s variance.

A

False. It leads to an increase in model variance.

10
Q

T/F: The incorporation of additional principal components tends to increase the model’s squared bias.

A

False. It leads to a decrease in squared bias

11
Q

T/F: PCR becomes identical to ordinary least squares regression when all principal components are employed.

A

True
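
A quick check of this card, assuming scikit-learn and NumPy: when every principal component is used, PCR reproduces the ordinary least squares fit. The data are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                       # 3 simulated predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

# PCR using all 3 principal components vs. plain OLS.
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
ols = LinearRegression().fit(X, y)

# The fitted values agree (up to floating-point error).
print(np.allclose(pcr.predict(X), ols.predict(X)))  # True
```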

12
Q

Name two unsupervised learning methods.

A

Cluster analysis, PCA

13
Q

T/F: If a dataset has 3 independent continuous predictors, the maximum number of principal components that can be extracted from it is three.

A

True. With 3 predictors (and more than 3 observations), the maximum number of principal components that can be extracted is three.

14
Q

T/F: The loadings are constrained so that their sum of squares is equal to zero, since otherwise setting these elements to be arbitrarily large in absolute value could result in an arbitrarily large variance.

A

False. The loadings are constrained so that their sum of squares is equal to one, since otherwise setting these elements to be arbitrarily large in absolute value could result in an arbitrarily large variance.

15
Q

T/F: The principal component loading vectors represent the directions in feature space along which the data vary the most, while the principal component scores are the projections of the data along these directions.

A

True.
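
A short check of the two preceding cards with simulated data, assuming NumPy and scikit-learn: each loading vector has a sum of squares equal to one, and the scores are the projections of the centered data onto the loading vectors.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))            # 50 simulated observations, 4 features

pca = PCA().fit(X)
loadings = pca.components_              # each row is one loading vector

# Each loading vector is constrained to have a sum of squares equal to one.
print(np.allclose(np.sum(loadings ** 2, axis=1), 1.0))    # True

# The scores are the projections of the centered data onto the loadings.
scores = (X - X.mean(axis=0)) @ loadings.T
print(np.allclose(scores, pca.transform(X)))              # True
```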

16
Q

T/F: Principal components provide low-dimensional linear surfaces that are closest to the observations.

A

True

17
Q

T/F: The first principal component loading vector is the line in p- dimensional space that is closest to the n observations.

A

True

18
Q

T/F: Together the first M principal component score vectors and the first M principal component loading vectors provide the best M-dimensional approximation to the i-th observation.

A

True

19
Q

T/F: A maximum of max(n-1, p) distinct principal components can be created from a dataset with n observations and p features.

A

False. At most min(n-1, p) distinct principal components can be created from a dataset with n observations and p features.
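
A small numerical illustration of the corrected statement, assuming NumPy: with n = 5 observations and p = 10 features, only min(n - 1, p) = 4 principal components carry any variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 5, 10
X = rng.normal(size=(n, p))             # more features than observations

# The variances of the principal components are proportional to the squared
# singular values of the centered data matrix.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)

nonzero = np.sum(singular_values > 1e-10)
print(nonzero, min(n - 1, p))           # 4 4
```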

20
Q

T/F: For K-means clustering, running the algorithm once is guaranteed to find clusters with the global minimum of the total within-cluster variation.

A

False. The K-means algorithm converges to a local optimum, so a single run is not guaranteed to reach the global minimum of the total within-cluster variation; the algorithm should be run multiple times from different initial assignments.
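
A sketch of why a single run is not enough, assuming scikit-learn: single K-means runs (n_init=1) started from different random initializations can converge to different local optima with different total within-cluster variation (inertia). The data are simulated.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulated data with several overlapping groups.
X, _ = make_blobs(n_samples=300, centers=6, cluster_std=2.5, random_state=0)

# Each single run may stop at a different local optimum of the
# within-cluster variation.
inertias = [
    KMeans(n_clusters=6, n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(10)
]
print(inertias)  # typically not all equal: different local optima

# The usual remedy is to run the algorithm several times and keep the
# best (lowest-inertia) solution, which scikit-learn does via n_init.
best = KMeans(n_clusters=6, n_init=20, random_state=0).fit(X)
print(best.inertia_)
```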

21
Q

T/F: For K-means clustering, the clusters do not have to be nested upon changing the desired number of clusters.

A

True. Unlike hierarchical clustering, which must produce nested clusters as the number of clusters changes, K-means cluster assignments need not be nested.

22
Q

T/F: For K-means clustering, there are fewer decisions to make when clustering a dataset than for hierarchical clustering.

A

True. K-means clustering requires pre-specifying the number of clusters, whereas hierarchical clustering requires choosing a measure of dissimilarity, choosing a linkage, and deciding on the number of clusters (i.e. at what height to cut the dendrogram).

23
Q

T/F: Average linkage clustering can lead to the formation of extended, trailing clusters.

A

False. Single linkage clustering can lead to the formation of extended, trailing clusters in which single observations are fused one-at-a-time.

24
Q

T/F: The number of possible ways to reorder a dendrogram is 2^n, with n representing the total number of leaves.

A

False. The number of possible ways to reorder a dendrogram is 2^(n-1), with n representing the total number of leaves. This is because at each of the n-1 points where fusions occur, the positions of the two fused branches could be swapped without affecting the meaning of the dendrogram.

25
Q

T/F: For hierarchical clustering, it is generally necessary to execute the algorithm multiple times depending on the final number of clusters chosen.

A

False. It is only necessary to execute the algorithm a single time, regardless of how many clusters are ultimately decided to use. One single dendrogram can be used to obtain any number of clusters.

26
Q

T/F: Agglomerative clustering is the least common type of hierarchical clustering.

A

False. Bottom-up or agglomerative clustering is the most common type of hierarchical clustering.

27
Q

T/F: In a dendrogram, we cannot draw conclusions about the similarity of two observations based on their proximity along the horizontal axis.

A

True. We cannot draw conclusions about the similarity of two observations based on their proximity along the horizontal axis.

Rather, we draw conclusions about the similarity of two observations based on the location on the vertical axis where branches containing those two observations first are fused.

28
Q

T/F: K-means clustering aims to minimize the average distance within clusters.

A

False. K-means clustering aims to minimize the average squared distance within clusters.

29
Q

T/F: K-means clustering does not require pre-specification of the number of clusters.

A

False. K-means clustering requires pre-specification of the number of clusters

30
Q

T/F: K-means clustering looks to find heterogeneous subgroups among the observations.

A

False. Clustering looks to find homogeneous subgroups among the observations.

31
Q

T/F: A good clustering is one for which the within-cluster variation is as small as possible.

A

True. The idea behind K-means clustering is that a good clustering is one for which the within-cluster variation is as small as possible.
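
For reference, one common way to write the within-cluster variation that K-means seeks to minimize uses squared Euclidean distances between all pairs of observations in each cluster:

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k),
\qquad
W(C_k) = \frac{1}{|C_k|} \sum_{i,\, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2
```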

32
Q

T/F: K-means algorithm finds a global optimum.

A

False. K-means algorithm finds a local optimum rather than a global optimum.

33
Q

T/F: Principal components have a variance inflation factor of greater than 1.

A

False. Principal components are uncorrelated with one another. This means that they have a correlation of 0. With zero correlation, the variance inflation factor would be 1.

34
Q

T/F: The principal components represent the original variables with the greatest model coefficients.

A

False. PCA is an unsupervised learning tool. The result of PCA has no relationship to the model coefficients of a parametric learning method, such as multiple linear regression.

35
Q

T/F: When performing PCR, it is recommended to standardize each predictor prior to generating the principal components.

A

True

36
Q

T/F: In the absence of standardization, high-variance variables tend to play a larger role in the principal components obtained.

A

True
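
A brief demonstration of the preceding two cards, assuming NumPy and scikit-learn: without standardization, a high-variance variable dominates the first principal component; after standardizing, both variables contribute on a comparable scale. The variables are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 200
# Two simulated predictors on very different scales, moderately correlated.
income = rng.normal(50_000, 10_000, size=n)         # large variance
age = 0.0005 * income + rng.normal(0, 3, size=n)    # small variance
X = np.column_stack([income, age])

# Without standardization, the first loading vector is dominated by income.
raw_loadings = PCA(n_components=1).fit(X).components_[0]

# After standardization, both variables contribute on a comparable scale.
Xs = StandardScaler().fit_transform(X)
std_loadings = PCA(n_components=1).fit(Xs).components_[0]

print(np.abs(raw_loadings))   # first entry near 1, second near 0
print(np.abs(std_loadings))   # entries of comparable size
```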

37
Q

T/F: PCR guarantees that the directions that best explain the predictors will also be the best directions to use for predicting the target.

A

False. Principal components are constructed in an unsupervised manner, so the principal components that best explain the predictors are not guaranteed to also be the best at explaining the target variable.

38
Q

T/F: Choosing a linkage is necessary for hierarchical clustering.

A

True

39
Q

T/F: Rerunning the algorithm could produce different cluster results for hierarchical clustering.

A

False. Rerunning the algorithm could produce different cluster results for k-means clustering.

40
Q

T/F: The choice to standardize variables has a significant impact on the cluster results for hierarchical clustering but not k-means.

A

False. The choice to standardize variables has a significant impact on the cluster results for both methods, as for example, it changes the Euclidean distances between observations.

41
Q

T/F: Hierarchical clustering is robust.

A

False. Hierarchical clustering is not robust. Changing the dataset slightly could result in extremely different clusters.

42
Q

T/F: Hierarchical clustering cannot be depicted using a dendrogram.

A

False. One benefit to hierarchical clustering is the ability to visualize the clusters using a dendrogram.

43
Q

T/F: Hierarchical clustering is universally better than K-means clustering.

A

False. K-means clustering and hierarchical clustering each have their own objectives; one is not always better than the other.

44
Q

T/F: Hierarchical clustering is commonly a bottom-up or agglomerative approach.

A

True. It is more popular to use the agglomerative approach, meaning we start with n clusters and fuse them one at a time until there is one cluster.

45
Q

T/F: Hierarchical clustering requires determining the number of desired clusters before running the algorithm.

A

False. The number of clusters does not need to be pre-specified. In fact, we can choose any number of clusters after running the algorithm once.

46
Q

T/F: If K-means clustering with K clusters and hierarchical clustering cut to yield the same number of clusters are applied to the same data, both methods result in the same cluster assignments.

A

False. Even if these two methods result in the same number of clusters, it is highly unlikely that they produce the same cluster assignments.

47
Q

T/F: The resulting cluster assignments for k-means clustering will be the same regardless of the initial cluster assignment.

A

False. In K-means clustering, the results depend on the initial cluster assignment.

48
Q

T/F: At each iteration of the hierarchical clustering algorithm, the number of clusters is one greater than in the previous iteration.

A

False. At each iteration of hierarchical clustering, the number of clusters decreases by one.

49
Q

T/F: The K-means clustering algorithm requires that the observations be standardized to have mean zero and standard deviation one.

A

False. The decision to standardize the variables depends heavily on the problem at hand.

50
Q

T/F: K-means clustering’s effectiveness is unaffected by the choice of initial centroids.

A

False. The choice of initial centroids in K-means clustering can significantly affect the final clustering outcome due to the algorithm’s susceptibility to local optima.

51
Q

T/F: Hierarchical clustering algorithm’s results can vary depending on the linkage criteria used.

A

True. Different linkage methods, such as single, complete, or average linkage, use various approaches to measure the dissimilarity between sets of observations, leading to different cluster formations.

52
Q

T/F: In K-means clustering, observations are reassigned to the nearest centroid during each iteration.

A

True.

53
Q

T/F: K-means clustering generally struggles with categorical data.

A

True. K-means relies on calculating the mean for centroids during the clustering process, and calculating means is not meaningful or directly possible with categorical data.

54
Q

T/F: Only hierarchical clustering provides a visual representation of cluster formation over iterations through a dendrogram.

A

True. Only hierarchical clustering provides a visual representation of cluster formation over iterations through a dendrogram.

55
Q

T/F: If two different people are given the same data and perform one iteration of the algorithm (for k-means), their results at that point will be the same.

A

False. This is because K-means clustering begins with a random assignment of the data points into K clusters. This means the two people will get different results on the first iteration.

55
Q

T/F: The K-means clustering algorithm is less sensitive to the presence of outliers than the hierarchical clustering algorithm.

A

False. Both K-means clustering and hierarchical clustering force every observation into a cluster. This means that the clusters found may be heavily distorted by outliers that do not belong to any cluster.

56
Q

T/F: In practice, single linkage is generally preferred over average linkage.

A

False. Complete and average linkage are generally preferred because their dendrograms are more balanced.

57
Q

T/F: PCA finds a low-dimensional representation of a data set that contains as much of the variation as possible.

A

True

58
Q

T/F: PCA seeks a small number of dimensions that are as interesting as possible, where the concept of interesting is measured by the amount that the observations vary along each dimension.

A

True

59
Q

T/F: Each of the dimensions found by PCA is a linear combination of the p features.

A

True

60
Q

T/F: In K-means clustering, K is chosen such that the total within-cluster variation is minimized.

A

False. K is not chosen by minimizing the total within-cluster variation; K is specified first, and the algorithm then minimizes the total within-cluster variation for that choice of K.

61
Q

T/F: The determination of K is subjective and there does not exist one method to determine the optimal number of clusters.

A

True

62
Q

T/F: Variables must always be standardized before hierarchical clustering is performed.

A

False.

63
Q

T/F: Clustering on only a subset of the data can produce extremely different results for K-means clustering but not hierarchical clustering.

A

False.


64
Q

T/F: Absolute correlation should not be used when performing hierarchical clustering on datasets with two features.

A

True. Absolute correlation can only be used on datasets with 3 or more features; with only two features, the absolute correlation between any two observations is always 1, so it cannot distinguish them.
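
A tiny numerical check of this card, assuming NumPy: with only two features, any two observation profiles have a correlation of +1 or -1 (two points always determine a line), so absolute correlation cannot distinguish observations.

```python
import numpy as np

# Three made-up observations measured on just two features.
a = np.array([1.0, 5.0])
b = np.array([10.0, 2.0])
c = np.array([-3.0, 7.0])

# The correlation between any pair of two-feature profiles is +1 or -1,
# so the absolute correlation is always 1.
for u, v in [(a, b), (a, c), (b, c)]:
    print(abs(np.corrcoef(u, v)[0, 1]))   # 1.0 each time
```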

65
Q

T/F: Euclidean distance focuses on the magnitude of observation profiles rather than their shape.

A

True. Euclidean distance focuses on the magnitude of the observation profiles, while correlation-based distance focuses on their shape.

66
Q

T/F: Two observations are said to be similar if they have a large correlation-based distance.

A

False.

67
Q

T/F: Scaling variables has a significant impact on PCA outcomes.

A

True

68
Q

T/F: k-means clustering is greedy.

A

True

69
Q

T/F: The PCA process involves linear transformations for dimensionality reduction.

A

True

70
Q

T/F: PCA can accurately approximate the original data when the number of principal components exceeds the number of original variables.

A

False. If the number of principal components equals the number of original variables, data approximation is exact.

71
Q

T/F: Cutting a dendrogram at a lower height will not decrease the number of clusters.

A

True. Cutting a dendrogram at a lower height will increase (or at least not decrease) the number of clusters.

72
Q

T/F: For a given number of clusters, hierarchical clustering can sometimes yield less accurate results than K-means clustering.

A

True. The nesting restriction means that hierarchical clustering can yield less accurate cluster assignments.

73
Q

T/F: The goal is to use the fewest PCs necessary to gain a comprehensive understanding of the data.

A

True

74
Q

T/F: In supervised learning contexts, the number of PCs cannot be objectively optimized.

A

False. The use of PCs in supervised learning allows for the number of components to be objectively optimized through cross-validation.
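
A sketch of how the number of components can be tuned objectively in a supervised setting, assuming scikit-learn: cross-validation over n_components in a PCR pipeline. The data and grid are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 8))                        # 8 simulated predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=150)     # only a few matter

pcr = Pipeline([("pca", PCA()), ("reg", LinearRegression())])

# Cross-validation chooses the number of principal components objectively.
search = GridSearchCV(pcr, {"pca__n_components": list(range(1, 9))}, cv=5)
search.fit(X, y)
print(search.best_params_)   # e.g. {'pca__n_components': ...}
```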

75
Q

T/F: The scree plot offers a clear, objective criterion for selecting the exact number of PCs necessary.

A

False.