Clustering Flashcards

1
Q

What is clustering?

A

Clustering is grouping data points into clusters based on their similarity, so that similar objects end up in the same group.

2
Q

Which types of clustering have we worked with?

A

K-means clustering and K-modes

3
Q

What does unsupervised clustering remind us of?

A

Classification in supervised learning: both assign data points to groups, but clustering has no labels to learn from.

4
Q

What is K-means clustering?

A

K-means clustering is clustering that works on numeric data (interval and ratio scale), because each cluster is represented by the mean of its points (notice the "means").

5
Q

How does K-means clustering work?

A

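First you choose the number of clusters, K, and initialize K random cluster centroids. Then you calculate the Euclidean distance from every data point to each centroid and assign each point to its nearest centroid. Each centroid is then moved to the mean of the points assigned to it. These two steps repeat until convergence, that is, until the assignments stop changing.

Note: "means" because each centroid is the mean of its cluster.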
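
The steps in this answer can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function name kmeans, the seed and max_iter parameters, and the sample points are all made up for the example, and the initial centroids are simply K data points picked at random.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Made-up data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful.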

6
Q

What is K-modes clustering?

A

It is like K-means, but it works on categorical (nominal) data and uses modes instead of means.

7
Q

Where do we use K-modes clustering?

A

We use it in data mining when we want to cluster categorical data, for example gender.

8
Q

With K-means clustering we might not have labels. Can we use K-means clustering to create labels?

A

Yes, the cluster assignments can be used as labels. They are sometimes reasonably good but not precise, because confirming them would take supervised review.

9
Q

What is the difference between how K-means and K-modes work?

A

K-means works with means and K-modes works with modes.

K-modes compares observations feature by feature. If two values of a feature are the same, the difference is set to 0, and if they are different, it is set to 1. The distance is the sum of these differences.
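
The 0/1 comparison described here can be written as a short Python sketch. The function names matching_distance and cluster_mode and the survey-style records are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def matching_distance(a, b):
    """Count the features on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Most frequent value per feature: the K-modes analogue of a mean."""
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*records)
    )


# Hypothetical records: (gender, region, plan)
a = ("female", "north", "basic")
b = ("female", "south", "basic")
print(matching_distance(a, b))  # 1: only the region differs

records = [a, b, ("male", "north", "basic")]
print(cluster_mode(records))  # ('female', 'north', 'basic')
```

In K-modes the mode plays the role the centroid plays in K-means: points are assigned to the nearest mode under this distance, and the modes are then recomputed.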

10
Q

Can you use clustering on surveys?

A

Yes. If you have many responses on a Likert scale, you can cluster the answers and use PCA to visualize the clusters.

11
Q

What is the danger of using clustering on surveys?

A

You might not notice that an outlier cluster is caused by bad data.

12
Q

How can you find the number of clusters you should use?

A

The elbow method.

13
Q

What is paradoxical with the clustering problem?

A

The solution with the lowest error is to have x clusters for x responses, because the clusters would then describe the data exactly 1:1. Such a clustering has zero error but tells us nothing about the data.
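
A tiny Python check makes the paradox concrete. The helper name sse and the four made-up responses are assumptions for illustration only.

```python
import math


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


# Made-up responses to two Likert-style questions.
responses = [(1, 5), (2, 4), (4, 4), (5, 1)]

# One cluster per response: every point is its own centroid.
print(sse(responses, centroids=responses))  # 0.0

# A single cluster centred on the overall mean leaves some error.
overall_mean = tuple(sum(dim) / len(responses) for dim in zip(*responses))
print(sse(responses, centroids=[overall_mean]) > 0)  # True
```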

14
Q

What is the elbow method?

A

The elbow method plots the sum of errors against the number of clusters and finds the point where the curve becomes less steep.

In short, it is the point where one extra cluster starts reducing the error much less.
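
The idea can be sketched in Python by computing the sum of squared errors (SSE) for several values of K and looking at how much each extra cluster helps. Everything here is an assumption made for illustration: the helper names kmeans_sse and best_sse, the use of several random restarts to avoid a poor start, and the made-up data with three obvious groups.

```python
import math
import random


def kmeans_sse(points, k, seed=0, max_iter=100):
    """Run a basic K-means and return its sum of squared errors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


def best_sse(points, k, restarts=20):
    """Lowest SSE over several random starts (assumed safeguard)."""
    return min(kmeans_sse(points, k, seed=s) for s in range(restarts))


# Made-up data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

sse = {k: best_sse(points, k) for k in range(1, 7)}
for k in range(2, 7):
    print(f"K={k}: SSE={sse[k]:.2f}, drop from K={k - 1}: {sse[k - 1] - sse[k]:.2f}")
```

With data like this, the drop in SSE is large up to K=3 and small afterwards, so the elbow sits at three clusters.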
