ML-07 - Unsupervised learning Flashcards

1
Q

ML-07 - Unsupervised learning

What’s the requirement for unsupervised learning?

A

Your data needs some inherent structure.

2
Q

What are the two major types of unsupervised learning?

A
  • Clustering
  • Anomaly detection
3
Q

Name the techniques mentioned in the lecture slides. (5)

A
  • K-Means Clustering
  • Mean-Shift Clustering
  • DBSCAN
  • EM using GMM
  • Agglomerative Hierarchical Clustering
4
Q

What is DBSCAN short for?

A

Density-Based Spatial Clustering of Applications with Noise

5
Q

What is EM short for?

A

Expectation maximization

6
Q

What is GMM short for?

A

Gaussian Mixture Models

7
Q

What is “EM using GMM” short for?

A

Expectation Maximization Clustering using Gaussian Mixture Models

8
Q

How do you perform k-means clustering?

A

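1) Place k centroids at random positions in your data.
2) Assign each data point to its nearest centroid.
3) Move each centroid to the average position of the data points assigned to it.
4) Repeat steps 2 and 3 until convergence, i.e. until the assignments stop changing.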
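
The steps above can be sketched in plain Python. This is a minimal illustration, not the lecture's implementation: the function names `sq_dist` and `kmeans`, the `max_iter` and `seed` parameters, and the choice to start from k randomly picked data points are all assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: start from k distinct data points chosen at random
    # (one common way to place the initial centroids; an assumption here).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the start is random, a different `seed` can number the clusters differently, which is the consistency issue that comes with random initialization.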

9
Q

How do you select the number of clusters to use in k-means?

A
  • Eyeball data (if few dimensions)
  • Elbow method
  • Analyze within-cluster variance
10
Q

Describe the elbow method.

A

1) Plot cost vs. n_clusters
2) Look for the “elbow”, i.e. the point where the decrease in cost slows down.

(See image)
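
A rough sketch of how the cost values for such a plot could be produced. Everything named here is hypothetical: `kmeans_cost_1d` is a toy 1-D k-means written for this example, the cost is taken to be the within-cluster sum of squared distances, and the centroids start at evenly spread data points rather than at random so the output is reproducible.

```python
def kmeans_cost_1d(values, k, n_iter=50):
    """Run a tiny 1-D k-means and return the within-cluster sum of squares."""
    data = sorted(values)
    # Deterministic start (an assumption for reproducibility): spread the
    # initial centroids evenly over the sorted data instead of at random.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        # Assign every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    # Cost: squared distance from each value to its nearest centroid.
    return sum(min((v - c) ** 2 for c in centroids) for v in data)

# Toy data with three well-separated groups.
values = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
costs = {k: kmeans_cost_1d(values, k) for k in range(1, 7)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

For this toy data the cost should fall steeply up to three clusters and change little afterwards, so the elbow sits at n_clusters = 3.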

11
Q

When does the elbow method work the best?

A

When you have separated clusters.

(See image)

12
Q

When does the elbow method not work that well?

A

If you have non-separated clusters with uncertain boundaries. There might not be an elbow in the plot.

13
Q

What are the pros of using k-means?

A

Fast computation

14
Q

What are the drawbacks of k-means?

A
  • Challenging to identify the groups/clusters.
  • Random initialization means the results can lack consistency.
15
Q

What two classes of methods are used to evaluate clustering performance?

A
  • Supervised
  • Unsupervised
16
Q

What is supervised clustering evaluation?

A

Compare the predicted cluster assignments against the actual (known) labels.
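
One possible way to make this comparison is the Rand index, shown here as a sketch. The lecture may use a different measure, and the function name `rand_index` is an assumption. Cluster ids are arbitrary, so the labels are compared pair by pair instead of value by value: a pair of samples counts as agreement when both labelings put it in the same group, or both put it in different groups.

```python
from itertools import combinations

def rand_index(true_labels, predicted_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same grouping with swapped cluster ids still counts as a perfect match.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```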

17
Q

What is the silhouette coefficient?

A

An unsupervised clustering evaluation technique.

18
Q

Is the silhouette coefficient a supervised or unsupervised method?

A

Unsupervised.

19
Q

How does the silhouette coefficient work?

A

It scores the clustering model’s performance by checking that:
- Inter-cluster distances (between different clusters) are maximized.
- Intra-cluster distances (within the same cluster) are minimized.

20
Q

What is the formula for the silhouette coefficient?

A

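s = (b - a) / max(a, b)

where a is the mean distance from a sample to the other samples in its own cluster, and b is the mean distance from that sample to the samples in the nearest other cluster.

(See image)

The mean over all samples is then used as the overall score.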
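
A minimal sketch of the computation in plain Python, assuming the standard per-sample definition s = (b - a) / max(a, b), with a the mean distance to the other samples in the same cluster and b the mean distance to the samples of the nearest other cluster. The function name `mean_silhouette` is hypothetical, Euclidean distance is an assumption, and the sketch expects at least two clusters.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other samples in the same cluster.
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            # Convention assumed here: a single-sample cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the samples of any other cluster.
        b = min(
            sum(d) / len(d)
            for d in (
                [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c]
                for c in clusters if c != labels[i]
            )
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # sensible grouping
print(mean_silhouette(points, [0, 1, 0, 1]))  # poor grouping
```

The first call groups nearby points together and should score close to 1. The second splits each natural group and should come out negative.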

21
Q

How do you evaluate the results of the silhouette coefficient?

A
  • -1 is bad (the worst score).
  • 1 is good (the best score).
  • 0 indicates overlapping clusters.