Clustering Flashcards

Question 1

Q

What topics are under dissimilarities?

Answer

A

Continuous dissims.
Binary dissims.
3 Rules
Dissim. matrices

Question 2

Q

What are two types of continuous dissims?

Answer

A

Euclidean

Manhattan
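
As a minimal sketch of the two continuous dissimilarities named on this card, the functions below compute each for a pair of numeric vectors. The function names and the example points are illustrative only.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```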

Question 3

Q

What are two types of binary dissims?

Answer

A

Simple

Jaccard

Question 4

Q

What is the Simple binary dissimilarity equation?

Answer

A

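d(x,y)=(b+c)/(a+b+c+d)
where a = both 1, d = both 0, and b, c = mismatches; (a+d)/(a+b+c+d) is the corresponding similarity.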

Question 5

Q

What is the Jaccard binary dissimilarity equation?

Answer

A

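d(x,y)=(b+c)/(a+b+c)
where a = both 1 and b, c = mismatches (joint 0s are ignored); a/(a+b+c) is the corresponding similarity.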
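
A small sketch of both binary measures may help. It assumes the usual 2x2 notation (a = both 1, b = 1 in x only, c = 1 in y only, d = both 0) and returns the similarity together with the dissimilarity, taken as 1 minus the similarity. The helper names and the example vectors are made up for illustration.

```python
def binary_counts(x, y):
    """Count the four cells of the 2x2 table for two 0/1 vectors."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both present
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # present in x only
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # present in y only
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both absent
    return a, b, c, d

def simple_matching(x, y):
    """Return (similarity, dissimilarity); joint absences count as matches."""
    a, b, c, d = binary_counts(x, y)
    similarity = (a + d) / (a + b + c + d)
    return similarity, 1 - similarity

def jaccard(x, y):
    """Return (similarity, dissimilarity); joint absences are ignored.

    Assumes at least one of the vectors contains a 1, so a + b + c > 0.
    """
    a, b, c, _ = binary_counts(x, y)
    similarity = a / (a + b + c)
    return similarity, 1 - similarity

x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 1, 1, 0, 0, 0]
print(simple_matching(x, y))  # (0.75, 0.25)
print(jaccard(x, y))          # (0.5, 0.5)
```

The two measures differ here only because of the four joint absences, which simple matching counts as agreement and Jaccard leaves out.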

Question 6

Q

What are the three rules for dissimilarities?

Answer

A

d(x,y)>=0, and if x=y then d(x,y)=0
d(x,y)=d(y,x)
d(x,y)<=d(x,z)+d(z,y)
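
The three rules can be checked by brute force on a small set of points. The sketch below does this for non-negativity (with zero self-distance), symmetry, and the triangle inequality d(x,y) <= d(x,z) + d(z,y). The checker name and test points are hypothetical; squared Euclidean distance is included only as an example of a measure that fails the third rule.

```python
from itertools import product

def check_dissimilarity_rules(d, points):
    """Return (non_negative, symmetric, triangle) for d on the given points."""
    non_negative = (
        all(d(x, y) >= 0 for x, y in product(points, repeat=2))
        and all(d(x, x) == 0 for x in points)
    )
    symmetric = all(d(x, y) == d(y, x) for x, y in product(points, repeat=2))
    triangle = all(
        d(x, y) <= d(x, z) + d(z, y) for x, y, z in product(points, repeat=3)
    )
    return non_negative, symmetric, triangle

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

pts = [(0, 0), (1, 0), (2, 0), (1, 3)]
print(check_dissimilarity_rules(manhattan, pts))          # (True, True, True)
print(check_dissimilarity_rules(squared_euclidean, pts))  # (True, True, False)
```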

Question 7

Q

What are the key features of dissimilarity matrices?

Answer

A

0’s along the diagonal
Usually factors are standardized
Usually factors are scaled
Symmetric
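
To see these features together, the sketch below standardizes each column and then builds a Euclidean dissimilarity matrix. The data (height in cm, weight in kg) are invented, and the choice of Euclidean distance and sample standard deviation is an assumption made for the example.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and (sample) standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def dissimilarity_matrix(rows):
    """Pairwise Euclidean distances between all rows."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical observations: (height in cm, weight in kg)
data = [[170.0, 60.0], [180.0, 80.0], [160.0, 55.0]]
D = dissimilarity_matrix(standardize(data))

for row in D:
    print([round(v, 3) for v in row])
```

Without the standardizing step, the variable with the larger numeric spread would dominate the distances.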

Question 8

Q

What topics are important to hierarchical clustering?

Answer

A

5 steps
Linkage
Chaining
Number of groups

Question 9

Q

What are the five steps in hierarchical clustering?

Answer

A

Each obs. into its own group
Pair nearest obs.
One less group now!
Pair next nearest obs.
Repeat until all obs. have been merged into one group
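
The steps above can be sketched as a naive loop. This version assumes the distance between two groups is the distance between their nearest members, and it keeps merging until one group remains, recording each merge and its height. The function name and the one-dimensional example data are illustrative.

```python
def agglomerate(points, dist):
    """Naive agglomerative clustering; returns a list of (members, height)."""
    groups = [[i] for i in range(len(points))]  # each obs. starts in its own group
    history = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Assumption: group distance = distance of the nearest pair of members
                d = min(dist(points[a], points[b])
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = groups[i] + groups[j]          # pair the nearest groups
        history.append((sorted(merged), d))     # one less group now
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return history

points = [0.0, 1.0, 5.0, 6.5]
history = agglomerate(points, lambda a, b: abs(a - b))
for members, height in history:
    print(members, height)
# [0, 1] 1.0
# [2, 3] 1.5
# [0, 1, 2, 3] 4.0
```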

Question 10

Q

What types of linkage are there?

Answer

A

Single
Complete
Average

Question 11

Q

What is the formula for single linkage?

Answer

A

d(A,B) = min d(x,y) over x∈A, y∈B

Question 12

Q

What is the formula for complete linkage?

Answer

A

d(A,B) = max d(x,y) over x∈A, y∈B
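
As a quick sketch, the min formula (commonly called single linkage) and the max formula (complete linkage) differ only in which pairwise distance they keep. The function names and the two example groups are illustrative.

```python
def single_linkage(A, B, d):
    """Distance between groups = smallest distance over all cross-group pairs."""
    return min(d(x, y) for x in A for y in B)

def complete_linkage(A, B, d):
    """Distance between groups = largest distance over all cross-group pairs."""
    return max(d(x, y) for x in A for y in B)

def abs_diff(x, y):
    return abs(x - y)

A = [1.0, 2.0]
B = [4.0, 7.0]
print(single_linkage(A, B, abs_diff))    # 2.0  (between 2.0 and 4.0)
print(complete_linkage(A, B, abs_diff))  # 6.0  (between 1.0 and 7.0)
```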

Question 13

Q

What is important to know about the number of groups in hierarchical clustering?

Answer

A

Cut the dendrogram with a horizontal line at a chosen height on the y-axis; each branch the line crosses becomes one group, made up of the obs. below it.
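
Cutting can be sketched without drawing anything: apply only the merges whose height is at or below the cut. The merge list here is a hypothetical history in the form (members of the merged group, height), listed in increasing height order, and the function name is made up for the example.

```python
def cut_dendrogram(n_obs, merges, height):
    """Return one group label per obs. after cutting at `height`.

    Assumes `merges` is ordered by increasing height and each entry lists
    every member of the newly merged group.
    """
    labels = list(range(n_obs))       # before any merge, each obs. is alone
    for members, h in merges:
        if h <= height:               # merges above the cut line are ignored
            target = min(members)
            for m in members:
                labels[m] = target
    return labels

# Hypothetical merge history for four observations
merges = [([0, 1], 1.0), ([2, 3], 1.5), ([0, 1, 2, 3], 4.0)]

print(cut_dendrogram(4, merges, 0.5))  # [0, 1, 2, 3] -> 4 groups
print(cut_dendrogram(4, merges, 2.0))  # [0, 0, 2, 2] -> 2 groups
print(cut_dendrogram(4, merges, 5.0))  # [0, 0, 0, 0] -> 1 group
```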

Question 14

Q

What is chaining?

Answer

A

When observations are appended to the same group one at a time, one after another. It results in a poor clustering.
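
A small illustration, assuming nearest-member (min) linkage, which is the setting where chaining is usually seen: with points spaced at steadily growing gaps, every merge adds a single observation to the one growing group. The function name and data are invented for the example.

```python
def merge_sizes(points):
    """Run nearest-member agglomeration on 1-D points.

    Returns the sizes of the two groups joined at each merge.
    """
    groups = [[i] for i in range(len(points))]
    sizes = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(abs(points[a] - points[b])
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        sizes.append((len(groups[i]), len(groups[j])))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return sizes

points = [0.0, 1.0, 2.5, 4.5, 7.0]  # gaps of 1, 1.5, 2, 2.5
sizes = merge_sizes(points)
print(sizes)  # [(1, 1), (1, 2), (1, 3), (1, 4)]
```

Every merge after the first joins a lone observation to the existing group, so cutting the tree never yields balanced groups.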

Question 15

Q

What topics are under partitioning methods?

Answer

A

6 steps
K-means
Starting position
Number of groups

Question 16

Q

What are the 6 steps to partitioning methods?

Answer

A

Data is partitioned into groups
Centroids of groups are calculated
Each obs. distance from centroid is calculated
Move obs to new group if necessary
If no moves are made, stop, local min found
Repeat from 2 otherwise
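
The six steps can be sketched as a short loop. This version assumes squared Euclidean distance to the centroid and a caller-supplied starting partition, and it raises an error if a group becomes empty rather than handling that case. All names and the example data are illustrative.

```python
def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def partition(points, labels, k, max_iter=100):
    labels = list(labels)                      # step 1: initial partition
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for g in range(k):                     # step 2: centroid of each group
            members = [p for p, l in zip(points, labels) if l == g]
            if not members:
                raise ValueError("a group became empty (not handled in this sketch)")
            centroids.append(centroid(members))
        # steps 3-4: distance to every centroid, move each obs. to the nearest
        new_labels = [min(range(k), key=lambda g: sq_dist(p, centroids[g]))
                      for p in points]
        if new_labels == labels:               # step 5: no moves, local minimum
            break
        labels = new_labels                    # step 6: repeat from step 2
    return labels, centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = partition(points, [0, 1, 0, 1], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```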

Question 17

Q

What is important to know about k-means?

Answer

A

It is iterative
It is computationally efficient (+)
It can converge to a local minimum (-)

Question 18

Q

What is important to remember about starting positions?

Answer

A

Random point
Random area
Or result from hierarchical cluster

Question 19

Q

How is the number of clusters in the data determined?

Answer

A

Trialling range of clusters (e.g., 1-10)
Plotting sum of squares
Looking for elbow in graph
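
A sketch of the quantity being plotted: the total within-group sum of squares for each number of clusters. The groupings for each k are written by hand here purely for illustration; in practice they would come from running the clustering at each k. The data and names are invented.

```python
def within_ss(values, labels):
    """Total within-group sum of squared deviations from each group mean."""
    total = 0.0
    for g in set(labels):
        members = [v for v, l in zip(values, labels) if l == g]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total

values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]

# Hypothetical groupings for k = 1..4 (hand-written, not fitted)
groupings = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
    4: [0, 0, 1, 2, 2, 3],
}

wss = {k: within_ss(values, labels) for k, labels in groupings.items()}
for k, v in wss.items():
    print(k, v)
# 1 154.0
# 2 4.0
# 3 2.5
# 4 1.0
```

The sharp drop from k = 1 to k = 2, followed by only small improvements, is the elbow, which suggests two groups for this toy data.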

Question 20

Q

What three topics are under cluster validation?

Answer

A

5 steps
Rand index
Issues with clustering

Question 21

Q

What are the issues with clustering?

Answer

A

Sensitive to the start point
Only works for continuous data
Clusters are assumed to be roughly spherical

Question 22

Q

What things are important to note about the Rand index?

Answer

A

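How to calculate: rand(S1,S2)=A/(A+D), where A = number of pairs of obs. the two clusterings agree on and D = number of pairs they disagree on

The adjusted Rand index corrects for agreement expected by chance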
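
A minimal sketch of the unadjusted index, reading A as the number of pairs of observations that the two clusterings treat the same way (together in both, or apart in both) and D as the number of pairs they treat differently. The function name and label vectors are illustrative.

```python
from itertools import combinations

def rand_index(s1, s2):
    """Share of obs. pairs on which two clusterings agree: A / (A + D)."""
    agree = disagree = 0
    for i, j in combinations(range(len(s1)), 2):
        together_1 = s1[i] == s1[j]
        together_2 = s2[i] == s2[j]
        if together_1 == together_2:
            agree += 1
        else:
            disagree += 1
    return agree / (agree + disagree)

s1 = [0, 0, 1, 1]
s2 = [1, 1, 0, 0]  # same grouping, labels swapped
s3 = [0, 0, 0, 1]  # a different grouping

print(rand_index(s1, s2))  # 1.0
print(rand_index(s1, s3))  # 0.5
```

Because only pairs are compared, the index does not care which label number each group happens to receive.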

Question 23

Q

What are the 5 steps in cluster validation?

Answer

A

Split data into train/test
Run clustering on training data
Assign test data to the training clusters: S1
Cluster the test data on its own: S2
Cross tabulate S1 and S2
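
The last three steps can be sketched as follows, assuming test observations are assigned to the nearest training centroid. The centroids, test points, and S2 labels are all hypothetical stand-ins for the output of earlier steps.

```python
from collections import Counter

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda g: sum((a - b) ** 2
                                 for a, b in zip(point, centroids[g])))

def cross_tabulate(s1, s2):
    """Count how many obs. fall in each (S1 label, S2 label) cell."""
    return Counter(zip(s1, s2))

# Hypothetical centroids from clustering the training data
train_centroids = [(0.0, 0.5), (10.0, 10.5)]
test_points = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0), (11.0, 10.0)]

s1 = [nearest_centroid(p, train_centroids) for p in test_points]
s2 = [1, 0, 1, 0]  # hypothetical labels from clustering the test data on its own

table = cross_tabulate(s1, s2)
print(s1)  # [0, 1, 0, 1]
for cell in sorted(table):
    print(cell, table[cell])
# (0, 1) 2
# (1, 0) 2
```

Here every observation lands in one of two cells, so the two solutions agree up to a relabelling of the groups.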
