E6 Flashcards

1
Q

What is clustering?

A

• Finding groups in data.
• Organizing data into groups such that there is:
(1) high similarity within each group,
(2) low similarity across the groups.

2
Q

Is clustering the same as classification?

A

No.

  • In classification, class labels can be found directly in the data (e.g., blood type); in clustering there are no predefined labels.
  • Different goals: clustering is used to “understand” the data better (explore) and to organize the information we have.
3
Q

Distance measures

A

• Euclidean distance
-> straight-line (physical) distance between two data points

• Manhattan distance
-> taxicab distance: sum of the absolute differences along each dimension

• Jaccard distance
-> treats two objects as sets of characteristics and measures how little the sets overlap (e.g., shared words in text mining)

• Cosine distance
-> one minus the cosine of the angle between two vectors (often used in text mining and recommender systems)

• Edit distance
-> Levenshtein metric: minimum number of single-character edits to turn one string into another (e.g., autocorrect for spelling mistakes)
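A minimal sketch of how each of these distances could be computed, assuming Python; the helper names and example data are illustrative, not from the card:

import math

def euclidean(p, q):
    # straight-line distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # taxicab distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(p, q))

def jaccard(s, t):
    # objects as sets of characteristics; 1 minus the overlap ratio
    s, t = set(s), set(t)
    return 1 - len(s & t) / len(s | t)

def cosine(p, q):
    # 1 minus the cosine of the angle between the two vectors
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1 - dot / norms

def edit(a, b):
    # Levenshtein distance via dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(euclidean((0, 0), (3, 4)))                               # 5.0
print(manhattan((0, 0), (3, 4)))                               # 7
print(jaccard("the cat sat".split(), "the dog sat".split()))   # 0.5
print(edit("kitten", "sitting"))                               # 3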

4
Q

K-means clustering - how to?

A
  1. Select proximity measure and specify the number of clusters (k).
  2. Initiate the process by selecting centroids.
  3. Assign the data points to the “nearest” centroid to form a cluster.
  4. Calculate the new centroid.
  5. Iterate over steps 3 and 4 until the stopping criteria are fulfilled.
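A minimal sketch of the five steps above on toy 2-D points, assuming Python; the data, the choice of k, and the iteration cap are illustrative assumptions:

import random

def kmeans(points, k, iters=100):
    # Steps 1-2: Euclidean proximity, k clusters, random initial centroids
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Step 3: assign each point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        # Step 5: stop when the centroids no longer move
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (1.5, 2), (3, 4), (8, 8), (9, 10), (8.5, 9)]
centroids, clusters = kmeans(pts, k=2)
print(centroids)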
5
Q

Strengths k-means

A
  • Simple
  • Efficient

6
Q

Weaknesses k-means

A
  • The value of k – how do you determine it? (see the sketch below)
  • Converges to a locally optimal solution (depends on the initial centroids).
  • Assumes globular/spherical clusters.
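One common way to probe the first weakness (the elbow heuristic, not stated on the card) is to run k-means for several values of k and watch where the within-cluster sum of squares stops dropping sharply; the scikit-learn usage and the toy data below are assumptions:

from sklearn.cluster import KMeans

X = [[1, 1], [1.5, 2], [3, 4], [8, 8], [9, 10], [8.5, 9]]
for k in range(1, 5):
    # multiple n_init restarts also help with the local-optimum weakness
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))   # inertia flattens out after the "elbow"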
7
Q

Hierarchical clustering

A

Creates a collection of nested ways to group the points (a hierarchy), rather than a single partition.

8
Q

Output of hierarchical clustering

A

Dendrograms
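A minimal sketch of building the hierarchy and drawing its dendrogram, assuming SciPy and Matplotlib are available; the points and the linkage method are illustrative:

from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = [[1, 1], [1.5, 2], [3, 4], [8, 8], [9, 10], [8.5, 9]]
Z = linkage(X, method="average")   # merge history: which clusters joined, and at what distance
dendrogram(Z)                      # the dendrogram shows the whole hierarchy of groupings
plt.show()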

9
Q

Strengths hierarchical clustering

A
  • Clusters can be of any size and shape.
  • Does not require prespecifying the number of clusters.

10
Q

Weaknesses hierarchical clustering

A
  • Still need to decide where to cut the dendrogram (see the sketch below).
  • Computationally inefficient (scales poorly to large datasets).
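A minimal sketch of one way to make the "where to cut" decision concrete, assuming SciPy; the cut height of 4.0 and the data are illustrative:

from scipy.cluster.hierarchy import linkage, fcluster

X = [[1, 1], [1.5, 2], [3, 4], [8, 8], [9, 10], [8.5, 9]]
Z = linkage(X, method="average")
labels = fcluster(Z, t=4.0, criterion="distance")   # cut the tree at height 4.0
print(labels)   # e.g. [1 1 1 2 2 2] -> two flat clusters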
