Hierarchical clustering Flashcards

1
Q

Hierarchical clustering algorithms operate in a _______ fashion. Why?

A

Hierarchical clustering algorithms typically operate in a greedy fashion, making locally optimal choices at each step (merging the closest clusters or splitting the largest cluster) without reconsidering previous steps.

2
Q

Hierarchical clustering is __________

A

divide and conquer clustering

3
Q

Another name for agglomerative clustering

A

Bottom-up approach

4
Q

Another name for divisive clustering

A

Top-down approach

5
Q

What can hierarchical clustering be used for, and what can it not be used for?

A

Hierarchical clustering can be used for outlier detection but not for finding missing values (NA) or detecting fake values.

6
Q

Hierarchical clustering is primarily used for ______ because ________

A

Hierarchical clustering is primarily used for exploration because it helps in understanding the natural grouping within data which can be very useful in exploratory data analysis.

7
Q

Hierarchical clustering is visualized with a _________

A

Dendrogram

8
Q

In hierarchical clustering do we need to specify the number of clusters?

A

No. There is no need to specify the number of clusters in advance in hierarchical clustering.

9
Q

How does hierarchical clustering provide flexibility?

A

It allows you to choose the number of clusters by cutting the dendrogram at different levels, providing flexibility to explore the data at different granularities.
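Cutting at different levels can be sketched in a few lines. This is only an illustration under assumptions not in the original card: 1-D points, single linkage, and the hypothetical helper names `single_linkage_merges` and `cut`. A cut at a given height simply replays the merges whose distance is at or below that height.

```python
from itertools import combinations


def single_linkage_merges(points):
    """Naive agglomerative clustering of 1-D points with single linkage.

    Returns the merge history as (distance, cluster_a, cluster_b) tuples.
    """
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        # Greedy step: find the pair of clusters whose closest points are nearest.
        dist, a, b = min(
            ((min(abs(x - y) for x in ca for y in cb), ca, cb)
             for ca, cb in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((dist, a, b))
    return merges


def cut(points, merges, height):
    """Clusters obtained by cutting the dendrogram at `height`."""
    clusters = [(p,) for p in points]
    for dist, a, b in merges:
        if dist > height:
            break  # single-linkage merge distances never decrease
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return sorted(clusters)


points = [1.0, 2.0, 6.0, 7.0, 20.0]
merges = single_linkage_merges(points)
print(cut(points, merges, 2.0))  # [(1.0, 2.0), (6.0, 7.0), (20.0,)]
print(cut(points, merges, 5.0))  # [(1.0, 2.0, 6.0, 7.0), (20.0,)]
```

The same merge history serves every cut height, which is why the number of clusters does not have to be fixed before the hierarchy is built.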

10
Q

Is hierarchical clustering deterministic or not?

A

Hierarchical clustering is deterministic because it follows a fixed sequence of merging or splitting clusters based on defined criteria such as distance.

11
Q

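Linkage (definition and types)

A

Linkage is the rule used to measure the distance between two clusters, which decides which clusters get linked.
Common linkage techniques include single linkage and complete linkage; average linkage and Ward’s method are also used.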

12
Q

Single linkage
* Another name
* Keyword
* Definition
* Formula

A
  • Another name: Nearest neighbour method
  • Keyword: shortest distance
  • Definition: This linkage technique focuses on the shortest distance between a data point in one cluster and a data point in the other.
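  • Formula: not given on the original card. The standard definition, with notation assumed here (clusters A and B, point-to-point distance d), is:

```latex
d_{\text{single}}(A, B) = \min_{a \in A,\; b \in B} d(a, b)
```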
13
Q

Complete linkage
* Another name
* Keyword
* Definition
* Formula

A
  • Another name: Farthest neighbour method
  • Keyword: longest distance
  • Definition: This linkage technique focuses on the longest distance between a data point in one cluster and a data point in the other.
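  • Formula: not given on the original card. The standard definition, with notation assumed here (clusters A and B, point-to-point distance d), is:

```latex
d_{\text{complete}}(A, B) = \max_{a \in A,\; b \in B} d(a, b)
```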
14
Q

Agglomerative clustering keyword

A

Merging approach

15
Q

Which linkage does agglomerative clustering use?

A

It can use any linkage, for example single linkage or complete linkage.

16
Q

Which linkage does divisive clustering use?

A

Divisive clustering uses only complete linkage

17
Q

Point to remember in agglomerative clustering problems

A
18
Q

Average linkage technique

A
19
Q

Divisive clustering keyword

A

Splitting approach

20
Q

How do you solve a divisive clustering problem?

A

We create a minimum spanning tree (MST) from the dissimilarity matrix
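A minimal sketch of that idea, under assumptions the card does not state: the MST is built with Prim's algorithm, and the split is made by removing the heaviest MST edge. The function names `prim_mst` and `split_on_longest_edge` and the 4x4 matrix are hypothetical, made up for illustration.

```python
def prim_mst(dist):
    """Prim's algorithm on a symmetric dissimilarity matrix.

    Returns the MST as a list of (weight, i, j) edges.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the tree to a node outside it.
        w, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((w, i, j))
    return edges


def split_on_longest_edge(n, edges):
    """One divisive step: drop the heaviest MST edge, return the two groups."""
    kept = sorted(edges)[:-1]
    adj = {k: set() for k in range(n)}
    for _, i, j in kept:
        adj[i].add(j)
        adj[j].add(i)
    # Collect every node still reachable from node 0.
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return sorted(seen), sorted(set(range(n)) - seen)


# Illustrative dissimilarity matrix: points 0 and 1 are close, 2 and 3 are close.
dist = [
    [0, 1, 6, 7],
    [1, 0, 5, 6],
    [6, 5, 0, 2],
    [7, 6, 2, 0],
]
edges = prim_mst(dist)
print(len(edges))                       # 3
print(split_on_longest_edge(4, edges))  # ([0, 1], [2, 3])
```

Repeating the split inside each resulting group continues the top-down hierarchy.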

21
Q

Minimum spanning tree characteristics (4)

A

It is a connected tree.
There are no loops (no closed circuits) in the tree.
Every data point (node) is included in the tree.
If ‘n’ nodes are present in the tree, then (n-1) edges are present in the tree.

22
Q

If there are n nodes in an MST, how many edges does it have?

A

If ‘n’ nodes are present in the tree, then (n-1) edges are present in the tree.

23
Q

Point to remember in divisive clustering problems

A
24
Q

Explain the number of levels in the hierarchy in both agglomerative clustering and divisive clustering

A

Agglomerative clustering: If there are n observations, then there will be n−1 levels in the hierarchy, since n−1 merges are required to combine n observations into a single cluster.

Divisive clustering: The number of levels depends on the way splits occur. Splitting one cluster in two at every step until each observation is alone also takes n−1 splits; stopping early or splitting into more than two parts gives fewer levels.

25
Q

Ward’s method

A

Merging approach.
In this technique, we merge the pair of clusters that gives the smallest increase in within-cluster variance. This is repeated until all data points are in a single cluster or until we reach the desired number of clusters.

The similarity of two clusters is based on the increase in squared error when the two clusters are merged.
(Similar to group average if the distance between points is the squared distance.)
Less susceptible to noise and outliers.
Biased towards globular clusters.
Hierarchical analogue of k-means
(can be used to initialise k-means)
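The "increase in squared error" can be made concrete with a small sketch. It assumes 1-D data and uses hypothetical helper names (`sse`, `ward_cost`, `best_ward_merge`); it is an illustration of the criterion, not a full Ward implementation.

```python
from itertools import combinations


def sse(cluster):
    """Sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def ward_cost(a, b):
    """Increase in total squared error caused by merging clusters a and b."""
    return sse(a + b) - sse(a) - sse(b)


def best_ward_merge(clusters):
    """Indices of the pair of clusters whose merge increases the SSE the least."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
    )


clusters = [[1.0, 2.0], [4.0], [10.0, 11.0]]
print(best_ward_merge(clusters))  # (0, 1): merging [1, 2] with [4] is cheapest
```

For two clusters with sizes |A| and |B| and means m_A and m_B, the same cost equals |A||B| / (|A| + |B|) times the squared distance between the means, which is why Ward’s method favours compact, similarly sized clusters.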

26
Q

In divisive clustering, explain how clusters are split in terms of variance

A
27
Q

Which is more sensitive to outliers (single linkage / multiple linkage)?

A

Single linkage is more sensitive to outliers

28
Q

Single linkage characteristics

A
29
Q

Another name for multiple linkage

A

Complete linkage or maximum linkage

30
Q

Multiple linkage characteristics

A
31
Q

Which one is more expensive (divisive / agglomerative)?

A

Divisive clustering is computationally more expensive than agglomerative clustering because it requires considering all possible splits at each step.

32
Q

Which one is used more (divisive / agglomerative)?

A

Agglomerative is more commonly used whereas divisive is less commonly used due to its complexity.

33
Q

If we solve a divisive clustering problem using k-means, what should the value of k be?

A

k=2
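With k=2, each divisive step bisects one cluster. The sketch below is an illustration under stated assumptions, none of which come from the card: 1-D data, centroids initialised at the minimum and maximum for determinism, and the cluster with the largest squared error chosen for the next split. The names `two_means` and `bisect_until` are hypothetical.

```python
def sse(cluster):
    """Sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def two_means(points, iters=20):
    """k-means with k=2 on 1-D data (centroids start at min and max)."""
    c0, c1 = min(points), max(points)
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if abs(p - c0) <= abs(p - c1)]
        right = [p for p in points if abs(p - c0) > abs(p - c1)]
        if not right:
            break  # all points identical, nothing to split
        new = (sum(left) / len(left), sum(right) / len(right))
        if new == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new
    return left, right


def bisect_until(points, n_clusters):
    """Top-down clustering: repeatedly split one cluster in two with k=2."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        # Assumption: split the cluster with the largest squared error next.
        target = max(clusters, key=sse)
        left, right = two_means(target)
        if not right:
            break
        clusters.remove(target)
        clusters += [left, right]
    return sorted(clusters)


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 30.0]
print(bisect_until(data, 3))  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [30.0]]
```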

34
Q

Which is more efficient (k-means / hierarchical clustering)?

A

k-means