Unsupervised Learning Flashcards

1
Q

What are the key differences between supervised and unsupervised learning?

A

Supervised learning uses labeled data to predict outputs.
Unsupervised learning finds patterns or clusters in unlabeled data.

2
Q

Explain how K-means clustering works, including both main steps of the algorithm.

A

First, choose K initial cluster centers (the means). Then repeat these two steps until the assignments stop changing:
1. Assign points: assign each data point to the nearest cluster center.
2. Update centers: recalculate each cluster center as the mean of the points assigned to it.
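
The two steps map directly onto code. A minimal NumPy sketch (not the slides' implementation; it assumes Euclidean distance and that no cluster ever ends up empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means; assumes Euclidean distance and no empty clusters."""
    rng = np.random.default_rng(seed)
    # Choose K initial cluster centers by sampling K distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1 (assign): each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (update): each center moves to the mean of its points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # assignments have stabilized
            break
        centers = new_centers
    return centers, labels

# Toy usage: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
centers, labels = kmeans(X, k=2)
```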

3
Q

What are the major advantages and disadvantages of K-means clustering mentioned in the slides?

A

Advantages: easy to implement, fairly fast, and able to recover from somewhat poor initial settings.
Disadvantages: heavily dependent on the initial seeds, makes no use of meta-information about the data, and struggles with non-spherical clusters.

4
Q

Describe how DBSCAN differs from K-means clustering and why it might perform better on certain types of data.

A

DBSCAN identifies clusters as dense regions of points rather than by distance to a fixed number of centers, explicitly labels low-density points as noise, and handles non-spherical clusters well; K-means assumes roughly spherical clusters and assigns every point to one of them.
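
A quick scikit-learn comparison on data DBSCAN handles well; the eps and min_samples values are illustrative choices, not values from the slides:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaved half-moons: density-connected but far from spherical.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# DBSCAN follows the dense regions and recovers both moons; K-means
# splits the data with a straight boundary between its two centroids.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```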

5
Q

Explain what a ‘latent representation’ is.

A

A latent representation is a compressed version of data in a lower-dimensional space, capturing its key features.
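
As a concrete illustration, PCA is one simple way to obtain such a representation (the slides may have autoencoders in mind instead; see card 8):

```python
import numpy as np
from sklearn.decomposition import PCA

# Compress 10-dimensional points to the 2 directions of greatest
# variance; the rows of Z are the latent representation of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

Z = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", Z.shape)   # (100, 10) -> (100, 2)
```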

6
Q

What types of data can be used with K-means clustering according to the slides?

A

K-means works with any type of data over which a numerical distance metric can be defined.
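
A small sketch of what that requirement means in practice: the assignment step only needs some numeric distance between points and centers (the values and the choice of Manhattan distance here are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Manhattan ("cityblock") distance standing in for the usual Euclidean.
X = np.array([[1.0, 2.0], [8.0, 9.0], [1.5, 1.8]])
centers = np.array([[1.0, 2.0], [8.0, 8.0]])

assignments = cdist(X, centers, metric="cityblock").argmin(axis=1)
print(assignments)   # [0 1 0]
```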

7
Q

Why might choosing initial means that are far apart be beneficial in K-means clustering?

A

Initial means that are far apart are more likely to fall in different true clusters, which reduces the chance of converging to a poor local optimum.
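
scikit-learn's default k-means++ initialization formalizes exactly this heuristic (whether the slides mention it by name is an assumption here):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

# init="k-means++" samples each new seed with probability proportional
# to its squared distance from the seeds already chosen, so the initial
# means end up spread far apart.
km = KMeans(n_clusters=3, init="k-means++", n_init=1, random_state=0).fit(X)
print(km.cluster_centers_)
```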

8
Q

How does the ‘bottleneck’ in an autoencoder contribute to dimensionality reduction?

A

The bottleneck forces data to compress into fewer dimensions, keeping only the most important features.
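
A minimal PyTorch sketch with made-up layer sizes (784-dimensional inputs, 32-dimensional bottleneck):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """784 inputs are squeezed through 32 units, so the network must
    discard all but the most important features to reconstruct x."""
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),        # the bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)        # latent representation (see card 5)
        return self.decoder(z)     # reconstruction of the input

x = torch.randn(4, 784)            # a batch of fake inputs
z = Autoencoder().encoder(x)
print(z.shape)                     # torch.Size([4, 32])
```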

9
Q

What are the different metrics that can be minimized in K-means clustering besides distance to center?

A

Maximum distance to a centroid
Sum of average distance to the centroids over all clusters
Sum of variance over clusters
Total distance between all points and their centroids
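
A NumPy sketch computing each objective on made-up data; the "sum of variance" line takes the variance of point-to-centroid distances, which is one possible reading of the slides:

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

d = np.linalg.norm(X - centroids[labels], axis=1)  # point-to-centroid distances
print("maximum distance to a centroid:", d.max())
print("sum of average distances over clusters:",
      sum(d[labels == k].mean() for k in range(2)))
print("sum of variances over clusters:",
      sum(d[labels == k].var() for k in range(2)))
print("total distance between points and centroids:", d.sum())
```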

10
Q

How does DBSCAN handle noise in the data compared to K-means clustering?

A

DBSCAN marks low-density points as noise, while K-means forces all points into clusters.
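
Illustrative only (parameter values are made up): DBSCAN reserves the label -1 for points in low-density regions, a label K-means simply does not have:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
X = np.vstack([X, [[50.0, 50.0]]])        # add one far-away outlier

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
print("points labelled as noise:", int((labels == -1).sum()))
```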

11
Q

What are the main limitations of K-means clustering when dealing with non-spherical data?

A

K-means assumes clusters are spherical and struggles with irregular shapes or varying densities.

12
Q

Explain why running multiple iterations with different starting means is recommended for K-means clustering.

A

K-means only converges to a local optimum that depends on the starting means; running it several times with different initializations and keeping the best result improves the chance of finding a near-optimal clustering.
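
scikit-learn automates this recommendation: n_init controls how many random initializations are tried, and the run with the lowest inertia (total squared distance to the nearest center) is kept:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0.0, 4.0, 8.0)])

# Ten different starting means; only the best run is returned.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)
```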
