lecture 11: K-means clustering Flashcards

1
Q

recap: k-means clustering is what type of learning?

A

unsupervised learning, which uses unlabelled data

2
Q

what is the goal of an unsupervised learning algorithm?

A

to take feature vector x as input and transform it into another vector or value that can be used to solve a practical problem

3
Q

the absence of labels on the data means

A

the absence of a solid reference point to judge the quality of the model

4
Q

what are the main approaches for unsupervised learning

A

clustering, density estimation, component analysis, neural networks

5
Q

what does the k in k-means clustering represent

A

it is the number of clusters we want to identify within the data, and also the number of distinct data points that are randomly selected at the start to act as the initial cluster centres, called centroids

6
Q

what is the next step after randomly selecting the k initial data points or centroids?

A

for each other point in the data set, measure the distance (or some other dissimilarity measure) between that point and each of the centroids, then assign each data point (example) to the cluster with the closest centroid
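
a minimal sketch of this assignment step, assuming Euclidean distance and points stored as tuples; the function name `assign_to_nearest` and the toy data are illustrative, not from the lecture:

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    assignments = []
    for p in points:
        # Euclidean distance from this point to every centroid (an assumed metric)
        distances = [math.dist(p, c) for c in centroids]
        assignments.append(distances.index(min(distances)))
    return assignments

# toy data: four 2-D examples, with k = 2 centroids picked by hand
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.0), (9.0, 9.5)]
print(assign_to_nearest(points, centroids))
```

each entry of the returned list is a cluster index, so the first two points land in cluster 0 and the last two in cluster 1.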

7
Q

what happens in each iteration after the initial assignment to the clusters?

A

for each centroid, we calculate the average feature vector of all the examples assigned to its cluster, then that average vector becomes the new centroid.
we recompute the distances for each example to the new centroids and modify the assignments if necessary
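
the centroid update could look roughly like this; `update_centroids` is a hypothetical helper name, and the sketch assumes every cluster has at least one example assigned to it:

```python
def update_centroids(points, assignments, k):
    """Return the average feature vector of the examples assigned to each cluster."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assignments):
        counts[a] += 1
        for d in range(dims):
            sums[a][d] += p[d]
    # assumption: no cluster is empty; an empty cluster would need separate handling
    return [tuple(s / counts[j] for s in sums[j]) for j in range(k)]

# toy data: the first two examples are in cluster 0, the last two in cluster 1
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
assignments = [0, 0, 1, 1]
new_centroids = update_centroids(points, assignments, k=2)
print(new_centroids)
```

the new centroids are generally not data points themselves, only the means of the points currently in each cluster.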

8
Q

how do we decide that the clusters are final

A

when the assignments no longer change after the centroids have been recomputed
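
one way to sketch the whole loop with this stopping rule, assuming Euclidean distance and initial centroids drawn at random from the data; the `max_iters` cap and the handling of empty clusters are added safeguards, not part of the lecture:

```python
import math
import random

def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    assignments = None
    for _ in range(max_iters):
        # assign every example to the cluster with the closest centroid
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # no assignment changed, so the clusters are final
        assignments = new_assignments
        # recompute each centroid as the mean of its assigned examples
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

which cluster gets index 0 depends on the random initialisation, so only the grouping of the points is meaningful, not the label values.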

9
Q

how do we decide the value of k in the first place

A

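use an elbow plot: plot the within-cluster variance against k and pick the k at the "elbow", i.e. the point after which increasing k no longer gives a dramatic reduction in variance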
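
a small sketch of the quantity that would go on the y-axis of such a plot, assuming it is measured as the within-cluster sum of squared Euclidean distances; `within_cluster_ss` is a hypothetical name and the clusterings below are written by hand for illustration rather than produced by k-means:

```python
def within_cluster_ss(clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for members in clusters:
        mean = [sum(x) / len(members) for x in zip(*members)]
        total += sum((v - m) ** 2 for p in members for v, m in zip(p, mean))
    return total

# the same four points grouped into 1, 2 and 4 clusters (hand-made groupings)
k1 = [[(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]]
k2 = [[(1.0, 1.0), (1.5, 2.0)], [(8.0, 8.0), (9.0, 9.5)]]
k4 = [[p] for p in k1[0]]
for k, clusters in [(1, k1), (2, k2), (4, k4)]:
    print(k, within_cluster_ss(clusters))
```

on this toy data the value drops from about 107.4 at k = 1 to 2.25 at k = 2, and going on to k = 4 gains very little, so the elbow would be at k = 2.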

10
Q

what are the 2 methods for the initialisation step of a k-means clustering algorithm?

A

random partition - randomly assign each point to a cluster, then compute the mean of each cluster and use those means as the first centroids
Forgy method - randomly choose k points from the data set to be the initial centroids
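
both initialisation methods might be sketched as follows; the function names are illustrative, and the fallback for an empty cluster in the random partition is an assumption, since the card does not say what to do in that case:

```python
import random

def forgy_init(points, k, rng):
    """Forgy: pick k distinct data points as the initial centroids."""
    return rng.sample(points, k)

def random_partition_init(points, k, rng):
    """Random partition: give each point a random cluster, then take the cluster means."""
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for j in range(k):
        members = [p for p, a in zip(points, labels) if a == j]
        if not members:
            # assumption: if no point was given this label, fall back to a random data point
            members = [rng.choice(points)]
        centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return centroids

rng = random.Random(0)
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
forgy = forgy_init(points, 2, rng)
partition = random_partition_init(points, 2, rng)
print(forgy)
print(partition)
```

Forgy centroids are always actual data points, whereas random partition centroids are means of random groups of points.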

11
Q

what is the difference between hard and soft clustering

A

hard - each data point can only belong to one cluster

soft (fuzzy) - each data point can belong to more than one cluster, typically with a degree of membership in each
