Wronged Questions: Unsupervised learning Flashcards

1
Q

If two observation profiles are close together along the vertical axis, they have a ___________.

A

Small Euclidean distance

2
Q

If two observation profiles have a similar shape, they have a _____________.

A

Small correlation-based distance
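
A minimal sketch of the two dissimilarity measures in the previous two cards, assuming NumPy is available; the profiles x and y are made-up observations measured on the same five variables.

```python
import numpy as np

# Two hypothetical observation profiles measured on the same five variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # same shape as x, larger magnitude

# Euclidean distance: small when the profiles are close in absolute level.
euclidean = np.sqrt(np.sum((x - y) ** 2))

# Correlation-based distance: small when the profiles have a similar shape,
# regardless of magnitude. One common definition is 1 minus the correlation.
corr_distance = 1 - np.corrcoef(x, y)[0, 1]

print(euclidean)      # fairly large: the levels differ
print(corr_distance)  # close to 0: the shapes are identical
```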

3
Q

________ can happen when centroid linkage is used.

A

Inversion

4
Q

T/F: Hierarchical clustering may not assign extreme outliers to any cluster.

A

False

5
Q

T/F: The resulting dendrogram can be used to obtain different numbers of clusters.

A

True. Depending on the height at which we cut the dendrogram, we get the cluster assignments for different numbers of clusters.
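
A small illustration of this card, assuming SciPy: the dendrogram is built once, and cutting it at different heights yields different numbers of clusters. The data are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))        # 20 simulated observations, 2 features

# Build the dendrogram once (complete linkage, Euclidean dissimilarity).
Z = linkage(X, method="complete")

# Cutting the same dendrogram at different heights gives different
# numbers of clusters without rerunning the algorithm.
labels_2 = fcluster(Z, t=2, criterion="maxclust")  # cut high: 2 clusters
labels_4 = fcluster(Z, t=4, criterion="maxclust")  # cut lower: 4 clusters

print(len(set(labels_2)), len(set(labels_4)))      # 2 4
```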

6
Q

T/F: Hierarchical clustering is not robust to small changes in the data.

A

True. Small changes in the data can result in different cluster assignments.

7
Q

The loadings for the first PC are displayed on the ___ axis of a biplot.

A

Top

8
Q

The loadings for the second PC are displayed on the _____ axis of a biplot.

A

Right

9
Q

T/F: Using more principal components in a PCR model generally leads to a decrease in the model’s variance.

A

False. It leads to an increase in model variance.

10
Q

T/F: The incorporation of additional principal components tends to increase the model’s squared bias.

A

False. It leads to a decrease in squared bias

11
Q

T/F: PCR becomes identical to ordinary least squares regression when all principal components are employed.

A

True
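
A quick check of this card, assuming scikit-learn and NumPy: when every principal component is used, PCR reproduces the ordinary least squares fit. The data are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                       # 3 simulated predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

# PCR using all 3 principal components vs. plain OLS.
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
ols = LinearRegression().fit(X, y)

# The fitted values agree (up to floating-point error).
print(np.allclose(pcr.predict(X), ols.predict(X)))  # True
```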

12
Q

Name two unsupervised learning methods.

A

Cluster analysis, PCA

13
Q

T/F: If a dataset has 3 independent continuous predictors, the maximum number of principal components that can be extracted from it is three.

A

True. With 3 predictors (and more than 3 observations), the maximum number of principal components that can be extracted is three.

14
Q

T/F: The loadings are constrained so that their sum of squares is equal to zero, since otherwise setting these elements to be arbitrarily large in absolute value could result in an arbitrarily large variance.

A

False. The loadings are constrained so that their sum of squares is equal to one, since otherwise setting these elements to be arbitrarily large in absolute value could result in an arbitrarily large variance.

15
Q

T/F: The principal component loading vectors represent the directions in feature space along which the data vary the most, while the principal component scores are the projections of the data along these directions.

A

True.
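
A short check of the two preceding cards with simulated data, assuming NumPy and scikit-learn: each loading vector has a sum of squares equal to one, and the scores are the projections of the centered data onto the loading vectors.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))            # 50 simulated observations, 4 features

pca = PCA().fit(X)
loadings = pca.components_              # each row is one loading vector

# Each loading vector is constrained to have a sum of squares equal to one.
print(np.allclose(np.sum(loadings ** 2, axis=1), 1.0))    # True

# The scores are the projections of the centered data onto the loadings.
scores = (X - X.mean(axis=0)) @ loadings.T
print(np.allclose(scores, pca.transform(X)))              # True
```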

16
Q

T/F: Principal components provide low-dimensional linear surfaces that are closest to the observations.

A

True

17
Q

T/F: The first principal component loading vector is the line in p- dimensional space that is closest to the n observations.

A

True

18
Q

T/F: Together the first M principal component score vectors and the first M principal component loading vectors provide the best M-dimensional approximation to the i-th observation.

A

True

19
Q

T/F: A maximum of max(n-1, p) distinct principal components can be created from a dataset with n observations and p features.

A

False. At most min(n-1, p) distinct principal components can be created from a dataset with n observations and p features.
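
A small numerical illustration of the corrected statement, assuming NumPy: with n = 5 observations and p = 10 features, only min(n - 1, p) = 4 principal components carry any variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 5, 10
X = rng.normal(size=(n, p))             # more features than observations

# The variances of the principal components are proportional to the squared
# singular values of the centered data matrix.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)

nonzero = np.sum(singular_values > 1e-10)
print(nonzero, min(n - 1, p))           # 4 4
```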

20
Q

T/F: For K-means clustering, running the algorithm once is guaranteed to find clusters with the global minimum of the total within-cluster variation.

A

False. The K-means algorithm converges to a local optimum, so a single run is not guaranteed to reach the global minimum of the total within-cluster variation; the algorithm should be run multiple times from different initial assignments.
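
A sketch of why a single run is not enough, assuming scikit-learn: single K-means runs (n_init=1) started from different random initializations can converge to different local optima with different total within-cluster variation (inertia). The data are simulated.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulated data with several overlapping groups.
X, _ = make_blobs(n_samples=300, centers=6, cluster_std=2.5, random_state=0)

# Each single run may stop at a different local optimum of the
# within-cluster variation.
inertias = [
    KMeans(n_clusters=6, n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(10)
]
print(inertias)  # typically not all equal: different local optima

# The usual remedy is to run the algorithm several times and keep the
# best (lowest-inertia) solution, which scikit-learn does via n_init.
best = KMeans(n_clusters=6, n_init=20, random_state=0).fit(X)
print(best.inertia_)
```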

21
Q

T/F: For K-means clustering, the clusters do not have to be nested upon changing the desired number of clusters.

A

True. Unlike hierarchical clustering, which must produce nested clusters as the number of clusters changes, K-means cluster assignments need not be nested.

22
Q

T/F: For K-means clustering, there are fewer decisions to make when clustering a dataset than for hierarchical clustering.

A

True. K-means clustering requires pre-specifying the number of clusters, whereas hierarchical clustering requires choosing a measure of dissimilarity, choosing a linkage, and deciding on the number of clusters (i.e. at what height to cut the dendrogram).

23
Q

T/F: Average linkage clustering can lead to the formation of extended, trailing clusters.

A

False. Single linkage clustering can lead to the formation of extended, trailing clusters in which single observations are fused one-at-a-time.

24
Q

T/F: The number of possible ways to reorder a dendrogram is 2^n, with n representing the total number of leaves.

A

False. The number of possible ways to reorder a dendrogram is 2^(n-1), with n representing the total number of leaves. This is because at each of the n-1 points where fusions occur, the positions of the two fused branches could be swapped without affecting the meaning of the dendrogram.

25
Q

T/F: For hierarchical clustering, it is generally necessary to execute the algorithm multiple times depending on the final number of clusters chosen.

A

False. It is only necessary to execute the algorithm a single time, regardless of how many clusters are ultimately decided to use. One single dendrogram can be used to obtain any number of clusters.

26
Q

T/F: Agglomerative clustering is the least common type of hierarchical clustering.

A

False. Bottom-up or agglomerative clustering is the most common type of hierarchical clustering.

27
Q

T/F: In a dendrogram, we cannot draw conclusions about the similarity of two observations based on their proximity along the horizontal axis.

A

True. We cannot draw conclusions about the similarity of two observations based on their proximity along the horizontal axis.

Rather, we draw conclusions about the similarity of two observations based on the location on the vertical axis where branches containing those two observations first are fused.

28
Q

T/F: K-means clustering aims to minimize the average distance within clusters.

A

False. K-means clustering aims to minimize the average squared distance within clusters.

29
Q

T/F: K-means clustering does not require pre-specification of the number of clusters.

A

False. K-means clustering requires pre-specification of the number of clusters

30
Q

T/F: K-means clustering looks to find heterogeneous subgroups among the observations.

A

False. Clustering looks to find homogeneous subgroups among the observations.

31
Q

T/F: A good clustering is one for which the within-cluster variation is as small as possible.

A

True. The idea behind K-means clustering is that a good clustering is one for which the within-cluster variation is as small as possible.
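
For reference, one common way to write the within-cluster variation that K-means seeks to minimize uses squared Euclidean distances between all pairs of observations in each cluster:

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k),
\qquad
W(C_k) = \frac{1}{|C_k|} \sum_{i,\, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2
```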

32
Q

T/F: K-means algorithm finds a global optimum.

A

False. K-means algorithm finds a local optimum rather than a global optimum.

33
Q

T/F: Principal components have a variance inflation factor of greater than 1.

A

False. Principal components are uncorrelated with one another. This means that they have a correlation of 0. With zero correlation, the variance inflation factor would be 1.

34
Q

T/F: The principal components represent the original variables with the greatest model coefficients.

A

False. PCA is an unsupervised learning tool. The result of PCA has no relationship to the model coefficients of a parametric learning method, such as multiple linear regression.

35
Q

T/F: When performing PCR, it is recommended to standardize each predictor prior to generating the principal components.

A

True

36
Q

T/F: In the absence of standardization, high-variance variables tend to play a larger role in the principal components obtained.

A

True
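
A brief demonstration of the preceding two cards, assuming NumPy and scikit-learn: without standardization, a high-variance variable dominates the first principal component; after standardizing, both variables contribute on a comparable scale. The variables are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 200
# Two simulated predictors on very different scales, moderately correlated.
income = rng.normal(50_000, 10_000, size=n)         # large variance
age = 0.0005 * income + rng.normal(0, 3, size=n)    # small variance
X = np.column_stack([income, age])

# Without standardization, the first loading vector is dominated by income.
raw_loadings = PCA(n_components=1).fit(X).components_[0]

# After standardization, both variables contribute on a comparable scale.
Xs = StandardScaler().fit_transform(X)
std_loadings = PCA(n_components=1).fit(Xs).components_[0]

print(np.abs(raw_loadings))   # first entry near 1, second near 0
print(np.abs(std_loadings))   # entries of comparable size
```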

37
Q

T/F: PCR guarantees that the directions that best explain the predictors will also be the best directions to use for predicting the target.

A

False. Principal components are constructed in an unsupervised manner, so the principal components that best explain the predictors are not guaranteed to also be the best at explaining the target variable.

38
Q

T/F: Choosing a linkage is necessary for hierarchical clustering.

A

True

39
Q

T/F: Rerunning the algorithm could produce different cluster results for hierarchical clustering.

A

False. Rerunning the algorithm could produce different cluster results for k-means clustering.

40
Q

T/F: The choice to standardize variables has a significant impact on the cluster results for hierarchical clustering but not k-means.

A

False. The choice to standardize variables has a significant impact on the cluster results for both methods, as for example, it changes the Euclidean distances between observations.

41
Q

T/F: Hierarchical clustering is robust.

A

False. Hierarchical clustering is not robust. Changing the dataset slightly could result in extremely different clusters.

42
Q

T/F: Hierarchical clustering cannot be depicted using a dendrogram.

A

False. One benefit to hierarchical clustering is the ability to visualize the clusters using a dendrogram.

43
Q

T/F: Hierarchical clustering is universally better than K-means clustering.

A

False. K-means clustering and hierarchical clustering each have their own objectives; one is not always better than the other.

44
Q

T/F: Hierarchical clustering is commonly a bottom-up or agglomerative approach.

A

True. It is more popular to use the agglomerative approach, meaning we start with n clusters and fuse them one at a time until there is one cluster.

45
Q

T/F: Hierarchical clustering requires determining the number of desired clusters before running the algorithm.

A

False. The number of clusters does not need to be pre-specified. In fact, we can choose any number of clusters after running the algorithm once.

46
Q

T/F: If K-means clustering with K clusters and hierarchical clustering cut to yield the same number of clusters are applied to the same data, both methods result in the same cluster assignments.

A

False. Even if these two methods result in the same number of clusters, it is highly unlikely that they produce the same cluster assignments.

47
Q

T/F: The resulting cluster assignments for k-means clustering will be the same regardless of the initial cluster assignment.

A

False. In K-means clustering, the results depend on the initial cluster assignment.

48
Q

T/F: At each iteration of the hierarchical clustering algorithm, the number of clusters is one greater than in the previous iteration.

A

False. At each iteration of hierarchical clustering, the number of clusters decreases by one.

49
Q

T/F: The K-means clustering algorithm requires that the observations be standardized to have mean zero and standard deviation one.

A

False. The decision to standardize the variables depends heavily on the problem at hand.

50
Q

T/F: K-means clustering’s effectiveness is unaffected by the choice of initial centroids.

A

False. The choice of initial centroids in K-means clustering can significantly affect the final clustering outcome due to the algorithm’s susceptibility to local optima.

51
Q

T/F: Hierarchical clustering algorithm’s results can vary depending on the linkage criteria used.

A

True. Different linkage methods, such as single, complete, or average linkage, use various approaches to measure the dissimilarity between sets of observations, leading to different cluster formations.

52
Q

T/F: In K-means clustering, observations are reassigned to the nearest centroid during each iteration.

A

True.

53
Q

T/F: K-means clustering generally struggles with categorical data.

A

True. K-means relies on calculating the mean for centroids during the clustering process, and calculating means is not meaningful or directly possible with categorical data.

54
Q

T/F: Only hierarchical clustering provides a visual representation of cluster formation over iterations through a dendrogram.

A

True. Only hierarchical clustering provides a visual representation of cluster formation over iterations through a dendrogram.

55
Q

T/F: If two different people are given the same data and perform one iteration of the algorithm (for k-means), their results at that point will be the same.

A

False. This is because K-means clustering begins with a random assignment of the data points into K clusters. This means the two people will get different results on the first iteration.

55
Q

T/F: The K-means clustering algorithm is less sensitive to the presence of outliers than the hierarchical clustering algorithm.

A

False. Both K-means clustering and hierarchical clustering force every observation into a cluster. This means that the clusters found may be heavily distorted by outliers that do not belong to any cluster.

56
Q

T/F: In practice, single linkage is generally preferred over average linkage.

A

False. Complete and average linkage are generally preferred because their dendrograms are more balanced.

57
Q

T/F: PCA finds a low-dimensional representation of a data set that contains as much of the variation as possible.

A

True

58
Q

T/F: PCA seeks a small number of dimensions that are as interesting as possible, where the concept of interesting is measured by the amount that the observations vary along each dimension.

A

True

59
Q

T/F: Each of the dimensions found by PCA is a linear combination of the p features.

A

True

60
Q

T/F: In K-means clustering, K is chosen such that the total within-cluster variation is minimized.

A

False. K is not chosen by minimizing the total within-cluster variation; K is specified first, and the algorithm then minimizes the total within-cluster variation for that choice of K.

61
Q

T/F: The determination of K is subjective and there does not exist one method to determine the optimal number of clusters.

A

True

62
Q

T/F: Variables must always be standardized before hierarchical clustering is performed.

A

False.

63
Q

T/F: Clustering on only a subset of the data can produce extremely different results for K-means clustering but not hierarchical clustering.

A

False.


64
Q

T/F: Absolute correlation should not be used when performing hierarchical clustering on datasets with two features.

A

True. Absolute correlation can only be used on datasets with 3 or more features; with only two features, the absolute correlation between any two observations is always 1, so it cannot distinguish them.
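
A tiny numerical check of this card, assuming NumPy: with only two features, any two observation profiles have a correlation of +1 or -1 (two points always determine a line), so absolute correlation cannot distinguish observations.

```python
import numpy as np

# Three made-up observations measured on just two features.
a = np.array([1.0, 5.0])
b = np.array([10.0, 2.0])
c = np.array([-3.0, 7.0])

# The correlation between any pair of two-feature profiles is +1 or -1,
# so the absolute correlation is always 1.
for u, v in [(a, b), (a, c), (b, c)]:
    print(abs(np.corrcoef(u, v)[0, 1]))   # 1.0 each time
```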

65
Q

T/F: Euclidean distance focuses on the magnitude of observation profiles rather than their shape.

A

True. Euclidean distance focuses on the magnitude of the observation profiles, while correlation-based distance focuses on their shape.

66
Q

T/F: Two observations are said to be similar if they have a large correlation-based distance.

A

False.

67
Q

T/F: Scaling variables has a significant impact on PCA outcomes.

A

True

68
Q

T/F: k-means clustering is greedy.

A

True

69
Q

T/F: The PCA process involves linear transformations for dimensionality reduction.

A

True

70
Q

T/F: PCA can accurately approximate the original data when the number of principal components exceeds the number of original variables.

A

False. If the number of principal components equals the number of original variables, data approximation is exact.

71
Q

T/F: Cutting a dendrogram at a lower height will not decrease the number of clusters.

A

True. Cutting a dendrogram at a lower height will increase (or at least not decrease) the number of clusters.

72
Q

T/F: For a given number of clusters, hierarchical clustering can sometimes yield less accurate results than K-means clustering.

A

True. The nesting restriction means that hierarchical clustering can yield less accurate cluster assignments.

73
Q

T/F: The goal is to use the fewest PCs necessary to gain a comprehensive understanding of the data.

A

True

74
Q

T/F: In supervised learning contexts, the number of PCs cannot be objectively optimized.

A

False. The use of PCs in supervised learning allows for the number of components to be objectively optimized through cross-validation.
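
A sketch of how the number of components can be tuned objectively in a supervised setting, assuming scikit-learn: cross-validation over n_components in a PCR pipeline. The data and grid are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 8))                        # 8 simulated predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=150)     # only a few matter

pcr = Pipeline([("pca", PCA()), ("reg", LinearRegression())])

# Cross-validation chooses the number of principal components objectively.
search = GridSearchCV(pcr, {"pca__n_components": list(range(1, 9))}, cv=5)
search.fit(X, y)
print(search.best_params_)   # e.g. {'pca__n_components': ...}
```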

75
Q

T/F: The scree plot offers a clear, objective criterion for selecting the exact number of PCs necessary.

A

False.