Unsupervised Learning Flashcards
What are the key differences between supervised and unsupervised learning?
Supervised learning uses labeled data to predict outputs.
Unsupervised learning finds patterns or clusters in unlabeled data.
Explain how K-means clustering works, including both main steps of the algorithm.
First, choose K initial cluster centers (means). Then repeat these two steps until the assignments stop changing:
1. Assign Points: Assign data points to the nearest cluster center.
2. Update Centers: Recalculate cluster centers based on the mean of assigned points.
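The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the slides' implementation; the function name and toy data are made up for the example.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means on 2-D points: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # choose K initial means from the data
    for _ in range(iters):
        # Step 1 (Assign): each point goes to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0])**2 + (p[1] - centers[c][1])**2)
            clusters[i].append(p)
        # Step 2 (Update): each center becomes the mean of its assigned points.
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments are stable
            break
        centers = new_centers
    return centers, clusters

# Two well-separated blobs; K-means should find one center per blob.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, k=2)
```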
What are the major advantages and disadvantages of K-means clustering mentioned in the slides?
Advantages: Easy to implement, fairly fast, and adaptable: a run with poor initial settings can simply be repeated with new ones.
Disadvantages: Heavily dependent on initial seed, doesn’t use meta-information, and struggles with non-spherical clusters.
Describe how DBSCAN differs from K-means clustering and why it might perform better on certain types of data.
DBSCAN identifies clusters based on density, handles noise, and works well with non-spherical clusters, unlike K-means, which assumes spherical clusters.
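The density-based idea can be sketched in plain Python. This is a bare-bones illustration of DBSCAN's core/noise logic, not the slides' code; `eps`, `min_pts`, and the toy data are illustrative.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: grow clusters outward from dense 'core'
    points; points reachable from no core point are labeled noise (-1)."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0])**2 + (points[i][1] - q[1])**2 <= eps**2]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:       # not dense enough to be a core point
            labels[i] = NOISE
            continue
        labels[i] = cluster            # start a new cluster and expand it
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:     # border point: reachable but not core
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:   # j is also a core point: keep growing
                queue.extend(more)
        cluster += 1
    return labels

# Two dense blobs plus one isolated point: the outlier becomes noise,
# where K-means would have been forced to assign it to a cluster.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```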
Explain what a ‘latent representation’ is.
A latent representation is a compressed version of data in a lower-dimensional space, capturing its key features.
What types of data can be used with K-means clustering according to the slides?
K-means works with any type of data over which a numerical distance metric can be defined.
Why might choosing initial means that are far apart be beneficial in K-means clustering?
Far-apart initial means reduce the chance of poor convergence by starting with distinct clusters.
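One way to make this concrete is greedy farthest-point seeding: pick one point, then repeatedly pick the point farthest from all seeds chosen so far (the randomized version of this idea is what k-means++ uses). A small sketch, with illustrative names and data:

```python
def far_apart_seeds(points, k):
    """Greedy farthest-point seeding: each new seed is the point whose
    nearest existing seed is as far away as possible."""
    seeds = [points[0]]
    while len(seeds) < k:
        def dist_to_seeds(p):
            # Squared distance from p to its closest already-chosen seed.
            return min((p[0] - s[0])**2 + (p[1] - s[1])**2 for s in seeds)
        seeds.append(max(points, key=dist_to_seeds))
    return seeds

# The second seed lands in the far blob, so each cluster starts distinct.
pts = [(0, 0), (1, 0), (10, 10), (10, 9)]
seeds = far_apart_seeds(pts, k=2)
```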
How does the ‘bottleneck’ in an autoencoder contribute to dimensionality reduction?
The bottleneck forces data to compress into fewer dimensions, keeping only the most important features.
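For intuition, a *linear* autoencoder with a 2-unit bottleneck can at best learn the optimal 2-D projection of the data, which SVD gives directly (the PCA solution). The NumPy sketch below fakes the bottleneck this way; a real autoencoder is a trained neural network with nonlinearities, and all names here are illustrative.

```python
import numpy as np

# Illustrative data: 3-D points that actually lie on a 2-D plane (z = x + y),
# so a 2-dimensional bottleneck loses nothing.
rng = np.random.default_rng(0)
xy = rng.normal(size=(100, 2))
X = np.column_stack([xy, xy.sum(axis=1)])        # shape (100, 3), rank 2

mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
encode = lambda A: (A - mean) @ Vt[:2].T         # 3 dims -> 2-dim latent code
decode = lambda Z: Z @ Vt[:2] + mean             # 2-dim code -> 3-dim output

Z = encode(X)                   # the latent representation, shape (100, 2)
X_hat = decode(Z)
err = np.abs(X_hat - X).max()   # ~0: the key features survived the bottleneck
```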
What are the different metrics that can be minimized in K-means clustering besides distance to center?
Maximum distance to a centroid
Sum, over all clusters, of the average distance to the centroid
Sum of variance over clusters
Total distance between all points and their centroids
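Each of these objectives is easy to compute for a given clustering. A short sketch over a hypothetical two-cluster assignment (data and names are made up for illustration):

```python
import math

def dist(p, c):
    return math.hypot(p[0] - c[0], p[1] - c[1])

# Hypothetical clustering: (points, centroid) pairs; every point is
# exactly distance 1 from its centroid to keep the arithmetic obvious.
clusters = [
    ([(0, 0), (0, 2)], (0, 1)),
    ([(10, 10), (10, 12)], (10, 11)),
]

# The four alternative objectives from the card, for this clustering:
max_dist   = max(dist(p, c) for pts, c in clusters for p in pts)
sum_avg    = sum(sum(dist(p, c) for p in pts) / len(pts) for pts, c in clusters)
sum_var    = sum(sum(dist(p, c)**2 for p in pts) / len(pts) for pts, c in clusters)
total_dist = sum(dist(p, c) for pts, c in clusters for p in pts)
```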
How does DBSCAN handle noise in the data compared to K-means clustering?
DBSCAN marks low-density points as noise, while K-means forces all points into clusters.
What are the main limitations of K-means clustering when dealing with non-spherical data?
K-means assumes clusters are spherical and struggles with irregular shapes or varying densities.
Explain why running multiple iterations with different starting means is recommended for K-means clustering.
Different starting means help avoid poor convergence and improve the chance of finding optimal clusters.
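A restart loop can be sketched by running K-means from several seeds and keeping the run with the lowest inertia (sum of squared distances to the nearest center). A tiny 1-D version, with illustrative names and data; scikit-learn's `KMeans` does the same thing via its `n_init` parameter.

```python
import random

def kmeans_1d(xs, k, seed, iters=50):
    """Tiny 1-D K-means; returns (inertia, centers) so runs can be compared."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: (x - centers[i])**2)].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    inertia = sum(min((x - c)**2 for c in centers) for x in xs)
    return inertia, sorted(centers)

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
# Restart from many different seeds and keep the best (lowest-inertia) run.
best_inertia, best_centers = min(kmeans_1d(xs, k=3, seed=s) for s in range(20))
```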