Lecture 8 - Unsupervised Learning I Flashcards

1
Q

What do patterns tell us?

A

Patterns describe or summarize the data set or parts of it

2
Q

What is cluster analysis?

A

Identifying groups of “similar” data objects

3
Q

What are association rules?

A

Finding associations between attributes or typical attribute combinations, e.g. if demand = high and supply = low, then price = high

4
Q

What is deviation analysis?

A

Finding groups that deviate from the rest of the data, e.g. men under 30 differing from the data set as a whole.

5
Q

What is hierarchical clustering?

A

Hierarchical clustering builds clusters step by step

6
Q

What is agglomerative hierarchical clustering?

A

A bottom-up strategy: first consider each data object as a separate cluster, then step by step merge the closest clusters together
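
A minimal sketch of this bottom-up merging for numeric data, assuming single or complete linkage (defined on later cards); the function name and example data are only illustrative:

```python
import numpy as np

def agglomerative(points, k, linkage="single"):
    """Naive bottom-up clustering: start with one cluster per point,
    then repeatedly merge the two closest clusters until k remain."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster

    def cluster_distance(a, b):
        # all pairwise distances between the members of the two clusters
        d = np.linalg.norm(points[a][:, None, :] - points[b][None, :, :], axis=-1)
        return d.min() if linkage == "single" else d.max()  # single vs. complete linkage

    while len(clusters) > k:
        # find and merge the pair of clusters with the smallest distance
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# agglomerative([[0, 0], [0, 1], [5, 5], [5, 6]], k=2)  ->  [[0, 1], [2, 3]]
```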

7
Q

What is divisive hierarchical clustering?

A

Starting with the whole data set as one cluster and then splitting it into smaller ones. Seldom used, because already the first split offers on the order of 2^(n-1) possible ways to divide the data into two clusters

8
Q

What is an isotropic distance?

A

The distance grows equally fast in all directions (like the Euclidean distance)

9
Q

What is a non-isotropic distance?

A

The distance weights different directions differently
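
A small illustration of the difference, with made-up weights: the plain Euclidean distance is isotropic, while a weighted Euclidean distance stretches some directions more than others:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Isotropic: every direction contributes equally.
euclidean = np.sqrt(np.sum((a - b) ** 2))                # 5.0

# Non-isotropic: each direction gets its own weight (illustrative weights).
w = np.array([1.0, 4.0])
weighted_euclidean = np.sqrt(np.sum(w * (a - b) ** 2))   # sqrt(9 + 64) ≈ 8.54
```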

10
Q

What are the b, n and x in the dissimilarity measures?

A

b = number of (binary) attributes that hold in both records, n = number that hold in neither record, x = number that hold in exactly one of the two records
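
The card does not quote the lecture's exact formulas, so as an assumed illustration, here are two common binary dissimilarity measures expressed with these counts (simple matching and a Jaccard/Tanimoto-style measure):

```python
def binary_counts(r1, r2):
    """For two records of binary attributes, count:
    b = attributes that hold in both, n = attributes that hold in neither,
    x = attributes that hold in exactly one of the two records."""
    b = sum(1 for u, v in zip(r1, r2) if u and v)
    n = sum(1 for u, v in zip(r1, r2) if not u and not v)
    x = sum(1 for u, v in zip(r1, r2) if bool(u) != bool(v))
    return b, n, x

def simple_matching_dissimilarity(r1, r2):
    b, n, x = binary_counts(r1, r2)
    return x / (b + n + x)        # fraction of attributes on which the records disagree

def jaccard_dissimilarity(r1, r2):
    b, n, x = binary_counts(r1, r2)
    return x / (b + x)            # ignores attributes that hold in neither record

# simple_matching_dissimilarity([1, 1, 0, 0], [1, 0, 0, 1])  -> 0.5
# jaccard_dissimilarity([1, 1, 0, 0], [1, 0, 0, 1])          -> 0.666...
```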

11
Q

What is single linkage?

A

The dissimilarity of the two most similar data objects of the two clusters (so the two closest points determine the cluster distance)

12
Q

What is complete linkage?

A

The dissimilarity of the two most dissimilar data objects of the two clusters (so the two points that are furthest apart determine the cluster distance)

13
Q

What is average linkage?

A

The average dissimilarity over all pairs of points, one from each of the two clusters

14
Q

What is centroid linkage?

A

Distance between two centroids (mean value vectors)
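
A small sketch that computes all four linkages from the previous cards for two clusters of numeric points, using the Euclidean distance (data and function name are illustrative):

```python
import numpy as np

def linkages(cluster_a, cluster_b):
    """Single, complete, average and centroid linkage between two clusters
    (one data point per row), using the Euclidean distance."""
    a = np.asarray(cluster_a, dtype=float)
    b = np.asarray(cluster_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # all pairwise distances
    return {
        "single":   d.min(),    # closest pair of points
        "complete": d.max(),    # furthest pair of points
        "average":  d.mean(),   # average over all pairs
        "centroid": np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)),  # distance of the mean vectors
    }

# linkages([[0, 0], [0, 1]], [[3, 0], [4, 0]])
```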

15
Q

What are dendrograms?

A

The cluster merging process arranges the data points in a binary tree: draw the data tuples at the bottom and, for each merge, draw a connection between the merged clusters at a height corresponding to their distance. Cutting the dendrogram at a specific distance then yields the clusters for that distance.

16
Q

What are some approaches to choosing the clusters?

A

Simplest approach: specify a minimum desired distance between clusters and stop merging as soon as the closest remaining clusters are farther apart than that.

Visual approach: merge clusters until all data points are combined into one cluster, draw the dendrogram and find a good cut level (the cut does not have to be horizontal).

More sophisticated approaches: analyse the sequence of merge distances and find a step where the distance increase is much larger than in the previous steps (see the sketch after this list).
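
One possible realisation of these ideas with SciPy's hierarchical clustering routines (the data, linkage method and cut distance are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# toy data: two well-separated groups
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)

Z = linkage(X, method="single")            # agglomerative merging (single linkage)

# simplest approach: cut at a fixed distance -> one cluster label per data point
labels = fcluster(Z, t=2.0, criterion="distance")

# visual approach: draw the dendrogram and pick a cut level by eye
# dendrogram(Z)                            # requires matplotlib for plotting

# more sophisticated approach: look for an unusually large jump in the merge distances
merge_distances = Z[:, 2]
jumps = np.diff(merge_distances)
```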

17
Q

What is k-Means clustering?

A

The k-means algorithm partitions the data points into exactly k clusters; k must be chosen in advance

The objective is to minimize the total intra-cluster variance

18
Q

How does k-means work?

A

Initialize the cluster centers by randomly selecting k data points, assign each data point to the closest center, update each center to the mean of its assigned points, and repeat until the assignments converge (see the sketch below)
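
A minimal numpy sketch of exactly these steps (the function name, seed handling and stopping rule are only illustrative):

```python
import numpy as np

def k_means(X, k, seed=None, max_iter=100):
    """Minimal k-means: random initial centers taken from the data, then
    alternate assignment and center update until the assignment stops changing."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # k random data points
    assignment = None
    for _ in range(max_iter):
        # assign each data point to the closest cluster center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        new_assignment = distances.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break                                             # converged
        assignment = new_assignment
        # move every center to the mean of the points assigned to it
        for j in range(k):
            if np.any(assignment == j):
                centers[j] = X[assignment == j].mean(axis=0)
    return centers, assignment

# centers, labels = k_means([[0, 0], [0, 1], [5, 5], [5, 6]], k=2, seed=0)
```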

19
Q

What’s the problem with k-means?

A

The result is fairly sensitive to the initial positions of the cluster centers, so a bad initialisation may lead to a poor clustering

20
Q

What is the silhouette coefficient?

A

The silhouette value measures how similar an object is to its own cluster compared to the other clusters. It ranges from -1 to +1; a high value indicates that the object is well matched to its own cluster.
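
Sketched from the usual definition s(i) = (b - a) / max(a, b), where a is the mean distance of object i to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster (the lecture may write this slightly differently):

```python
import numpy as np

def silhouette_value(i, X, labels):
    """Silhouette of object i: (b - a) / max(a, b)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X - X[i], axis=1)                  # distances of i to all objects
    own = (labels == labels[i]) & (np.arange(len(X)) != i)
    a = dist[own].mean()                                      # mean distance within own cluster
    b = min(dist[labels == c].mean()                          # nearest other cluster
            for c in set(labels.tolist()) if c != labels[i])
    return (b - a) / max(a, b)

# silhouette_value(0, [[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])  ->  ~0.87
```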

21
Q

What is density-based clustering (DBSCAN)?

A

With numerical data it is possible to use density-based clustering such as DBSCAN (see the sketch below):

  1. Find a data point where the density is high, i.e. within distance x there are at least y other points
  2. All points within distance x of that neighbourhood are considered to belong to the same cluster
  3. Keep expanding the cluster as long as the newly reached points again have at least y points within distance x
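
A minimal sketch of this procedure, where eps plays the role of the distance x and min_pts the role of the required number of points within that distance (counting the point itself); names and data are illustrative, and label -1 marks points that end up in no dense region:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Density-based clustering in the spirit of DBSCAN."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue                          # already assigned, or not a dense point
        labels[i] = cluster                   # start a new cluster at a dense point
        frontier = list(neighbours[i])
        while frontier:                       # expand the cluster
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:     # only dense points keep expanding
                    frontier.extend(neighbours[j])
        cluster += 1
    return labels

# dbscan([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [9, 9]], eps=1.5, min_pts=3)
# -> array([ 0,  0,  0,  1,  1,  1, -1])
```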