lecture 11: K-means clustering Flashcards

Question 1

Q

recap: k-means clustering is an example of what type of learning?

Answer

A

unsupervised learning, which uses unlabelled data

Question 2

Q

what is the goal of an unsupervised learning algorithm?

Answer

A

to take a feature vector x as input and transform it into another vector or a value that can be used to solve a practical problem

Question 3

Q

what does the absence of labels on the data mean?

Answer

A

there is no solid reference point against which to judge the quality of the model

Question 4

Q

what are the main approaches to unsupervised learning?

Answer

A

clustering, density estimation, component analysis, neural networks

Question 5

Q

what does the k in k-means clustering represent?

Answer

A

the number of clusters we want to identify in the data; it is also the number of distinct data points selected at random as the initial cluster centres, called centroids

Question 6

Q

what is the next step after randomly selecting the k initial data points or centroids?

Answer

A

for every other point in the data set, measure the distance (or some other metric) between that point and each of the centroids, then assign the point to the cluster with the closest centroid
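The assignment step can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and points stored as tuples; the function name `assign_to_nearest` and the sample data are made up for the example.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    assignments = []
    for p in points:
        # Euclidean distance from this point to every centroid
        distances = [math.dist(p, c) for c in centroids]
        # The cluster label is the position of the smallest distance
        assignments.append(distances.index(min(distances)))
    return assignments

# Hypothetical data: two points near (1, 1) and two near (9, 9.5)
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.0), (9.0, 9.5)]
labels = assign_to_nearest(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

Any other distance function could be swapped in for `math.dist` if a different metric is wanted.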

Question 7

Q

what happens in the next iteration, after the initial assignment to clusters?

Answer

A

for each centroid, we calculate the average feature vector of all the examples assigned to its cluster; that average vector becomes the new centroid.
we then recompute the distance from each example to the new centroids and update the assignments where necessary
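The centroid update can be sketched as below. This is an illustrative version only: the name `update_centroids` is hypothetical, and keeping the old centroid when a cluster ends up empty is an assumption of this sketch, not something the lecture specifies.

```python
def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid
            new_centroids.append(old)
            continue
        # Average each coordinate across the cluster's members
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
assignments = [0, 0, 1, 1]
centroids = [(1.0, 1.0), (8.0, 8.0)]
new_centroids = update_centroids(points, assignments, centroids)
print(new_centroids)  # [(1.5, 2.0), (9.0, 9.0)]
```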

Question 8

Q

how do we decide that the clusters are final?

Answer

A

when the assignments no longer change after the centroids have been recomputed
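Putting the steps together, the stopping rule becomes the exit condition of a loop. The sketch below assumes Euclidean distance and takes the initial centroids as an argument; `k_means` and `max_iters` are names chosen for this example, and the iteration cap is an added safety net rather than part of the rule on the card.

```python
import math

def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update until the assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every point
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stopping rule: nothing moved to a different cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
# Deliberately poor start: both initial centroids sit in the same group
final_centroids, final_assignments = k_means(points, [(1, 1), (2, 1)])
print(final_assignments)  # [0, 0, 0, 1, 1, 1]
```

Even from this poor starting point the assignments settle after a few passes, at which point the loop exits.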

Question 9

Q

how do we decide the value of k in the first place?

Answer

A

use an elbow plot: plot the within-cluster variance against k and pick the k at the "elbow", the point after which increasing k no longer produces a dramatic reduction in variance
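The numbers behind an elbow plot can be produced with a short script. This sketch makes several assumptions that are not from the lecture: it measures variation as the within-cluster sum of squared distances (`wcss`), it uses evenly spaced data points as initial centroids so the run is reproducible, and it prints the values instead of plotting them.

```python
import math

def k_means(points, k, max_iters=100):
    # Assumption: evenly spaced data points as initial centroids (deterministic)
    centroids = [tuple(points[i * len(points) // k]) for i in range(k)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

def wcss(points, centroids, assignments):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))

points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
scores = {}
for k in range(1, 5):
    centroids, assignments = k_means(points, k)
    scores[k] = wcss(points, centroids, assignments)
    print(k, round(scores[k], 2))
```

On this toy data the score falls steeply from k=1 to k=2 and only slightly afterwards, so the elbow sits at k=2.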

Question 10

Q

what are the 2 methods for the initialisation step of a k-means clustering algorithm?

Answer

A

random partition - randomly assign each point to a cluster, then compute the mean of each cluster and use those means as the initial centroids
Forgy method - randomly choose k points from the data set to be the initial centroids
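The two initialisation methods can be sketched side by side. The function names are hypothetical, a seeded `random.Random` is passed in for reproducibility, and the fallback for a cluster that receives no points is an assumption of this sketch.

```python
import random

def forgy_init(points, k, rng):
    """Forgy: pick k distinct data points as the initial centroids."""
    return rng.sample(points, k)

def random_partition_init(points, k, rng):
    """Random partition: random labels first, then the mean of each group."""
    assignments = [rng.randrange(k) for _ in points]
    centroids = []
    for j in range(k):
        members = [p for p, a in zip(points, assignments) if a == j]
        if not members:
            # Assumption: a cluster that got no points falls back to a random point
            members = [rng.choice(points)]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
rng = random.Random(0)
forgy = forgy_init(points, 2, rng)
partition = random_partition_init(points, 2, rng)
print(forgy)
print(partition)
```

Forgy centroids are always actual data points, whereas random-partition centroids are averages and tend to start near the middle of the data.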

Question 11

Q

what is the difference between hard and soft clustering?

Answer

A

hard - each data point belongs to exactly one cluster

soft (fuzzy) - each data point can belong to more than one cluster, usually with a degree of membership in each
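The contrast can be shown with one point and two centroids. The soft version below uses the membership formula from fuzzy c-means with fuzzifier `m`, which is one common choice and an assumption here, since the card does not name a specific soft method.

```python
import math

def hard_assignment(point, centroids):
    """Hard clustering: a single cluster index per point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def soft_memberships(point, centroids, m=2.0):
    """Soft clustering: one membership weight per cluster, summing to 1."""
    dists = [math.dist(point, c) for c in centroids]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point sits exactly on a centroid: full membership there
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    # Fuzzy c-means style weights: closer centroids get larger membership
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
hard = hard_assignment(point, centroids)
soft = soft_memberships(point, centroids)
print(hard)  # 0
print([round(u, 2) for u in soft])  # [0.9, 0.1]
```

The hard result keeps only the winning cluster, while the soft result records that the point still has a small degree of membership in the second cluster.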

(11 cards)