Clustering Flashcards
What is clustering?
Clustering is grouping data into categories based on similarity, i.e. grouping similar objects together.
Which types of clustering have we worked with?
K-means clustering and K-modes
What does unsupervised clustering remind us of?
Classification in supervised learning.
What is K-means clustering?
K-means clustering works on parametric data (interval and ratio scales). Notice the "means" in the name: the cluster centers are computed as means.
How does K-means clustering work?
First you choose K, the number of clusters. Then you initialize K random cluster centroids, assign each data point to its nearest centroid by Euclidean distance, and recompute each centroid as the mean of its assigned points. This repeats until convergence, i.e. until the assignments stop changing.
Note: "means" because the centroids are the means of their clusters.
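A minimal sketch of that loop in Python/NumPy; the data, K, the iteration cap, and the function name are illustrative assumptions, not part of the card:

```python
# Minimal K-means sketch: random centroids, assign each point to the
# nearest centroid by Euclidean distance, recompute centroids as the
# mean of their assigned points, repeat until nothing changes.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # distance from every point to every centroid: shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # nearest-centroid assignment
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```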
What is K-modes clustering?
It is like K-means, but it works on non-parametric (categorical) data.
Where do we use K-modes clustering?
We use it in data mining when we want to cluster non-parametric data such as gender.
With K-means clustering we might not have labels. Can we use K-means clustering to create labels?
Yes, to some extent, but the resulting labels are not precise; confirming them would require supervised review.
What is the difference between how K-means and K-modes work?
K-means works with means, while K-modes works with modes.
With modes, we compare features directly: if two values of a feature are the same, the difference is 0, and if they differ, it is 1. The distance between two objects is then the number of mismatching features, as sketched below.
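A small sketch of that matching (0/1 mismatch) dissimilarity on categorical rows; the records and feature names are made-up example data:

```python
# Matching dissimilarity used by K-modes: a feature contributes 0 if it
# matches and 1 if it does not; the distance is the sum of mismatches.
def matching_distance(a, b):
    return sum(0 if x == y else 1 for x, y in zip(a, b))

# Illustrative categorical records (assumed example data).
r1 = ("female", "student", "urban")
r2 = ("female", "employed", "urban")
print(matching_distance(r1, r2))  # -> 1, since only one feature differs
```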
Can you use clustering on surveys?
Yes. If you have many responses on a Likert scale, you can cluster the answers and use PCA to visualize the clusters.
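A hedged sketch of that workflow with scikit-learn; the fake Likert matrix, the choice of 3 clusters, and 2 PCA components are illustrative assumptions:

```python
# Cluster Likert-scale survey responses, then project them to 2-D with
# PCA so the clusters can be plotted and inspected.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(200, 10))   # fake 1-5 Likert answers

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(responses)
coords = PCA(n_components=2).fit_transform(responses)

plt.scatter(coords[:, 0], coords[:, 1], c=labels)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```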
What is the danger of using clustering on surveys?
You might not notice that an outlier cluster is caused by bad data.
How can you find the number of clusters you should use?
The elbow method.
What is paradoxical with the clustering problem?
If you only minimize the error, the "optimal" solution is x clusters for x responses, since each cluster then describes one data point exactly (1:1), which tells you nothing about the structure of the data.
What is the elbow method?
The elbow method plots the sum of (squared) errors against the number of clusters and finds the point where the curve becomes less steep.
In short: the point where one extra cluster starts reducing the error by less.
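A short sketch of the elbow plot with scikit-learn; the example data and the range of k values tried are illustrative assumptions:

```python
# Elbow method: plot the within-cluster sum of squared errors (inertia)
# against k and look for the point where the curve flattens out.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                      # assumed example data

ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), sse, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster SSE (inertia)")
plt.show()
```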