Cluster Analysis Flashcards

Question 1

Q

When to look at cluster patterns

Answer

A

A PT would like to group patients according to their attributes in order to treat them better

A PT would like to classify patients based on their individual health records in order to develop appropriate, specific management strategies

Question 2

Q

Hierarchical clustering

Answer

A

a set of nested clusters organized as a hierarchical tree

produces a set of nested clusters: each pair of individuals or clusters is progressively nested in a larger cluster until only one cluster remains

Question 3

Q

Non-Hierarchical clustering

Answer

A

groups individuals into clusters so that each object is in exactly one cluster

divides a data set of ‘n’ individuals into ‘m’ clusters

K-means clustering is the most commonly used type

Question 4

Q

Hierarchical Clustering:

Bottom-up (agglomerative)

Answer

A

starts with each piece of data as its own cluster and then merges them step by step to form larger groups

Question 5

Q

Hierarchical Clustering:
Top-down (divisive)

Answer

A

starts with all data in one group and then partitions it step by step using a flat clustering algorithm

Question 6

Q

Procedure for agglomerative clustering

Answer

A

1. assign each item to its own cluster
2. find the closest pair of clusters and merge them into a single cluster
3. compute the distances (similarities) between the new cluster and each of the old clusters
4. repeat steps 2 and 3 until all items are merged into a single cluster containing the whole sample
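
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and single linkage (closest pair of members), neither of which the card specifies, and the names `single_linkage` and `agglomerate` are made up for the example.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of members (assumed linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge the two closest clusters until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # Every item starts in its own cluster.
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # Recompute the cluster distances and pick the closest pair (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union (pop j first, since j > i).
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for a, b, d in agglomerate(points):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The merge history is the information a dendrogram draws: each merge becomes a join at the height of its distance.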

Question 7

Q

Limitations of Hierarchical Clustering

Answer

A

it is necessary to specify both the distance metric and the linkage criterion, without any strong theoretical basis for the choice

selecting the number of clusters from the dendrogram can be misleading

Question 8

Q

K-Means Clustering

Answer

A

data are classified into K clusters

each individual data point is assigned to the cluster with the nearest mean

Question 9

Q

K-Means Clustering:

Procedure

Answer

A

1. select K points as the initial centroids
2. assign each point to its nearest centroid
3. recompute the centroid of each group
4. repeat steps 2 and 3 until the solution is stable (the centers no longer move)
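
As a rough sketch, the four steps above map onto the Python below. It assumes Euclidean distance and takes the initial centroids as an argument instead of choosing them at random; the function names and the `max_iter` safety limit are illustrative additions, not part of the card.

```python
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Basic K-means; `centroids` holds the K initial centers (step 1)."""
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            groups[nearest].append(p)
        # Step 3: recompute each centroid; an empty group keeps its old center.
        new_centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
        # Step 4: stop once the centers no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, groups = k_means(points, [points[0], points[3]])
print(centroids)
print(groups)
```

With these starting centers the two visually separate groups of three points end up in separate clusters; a different choice of initial centroids can lead to a different result.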

Question 10

Q

K-Means Clustering:

Limitations

Answer

A

the researcher must choose the number of clusters

a larger K always gives shorter distances from the data points to their centroids

when every data point is its own centroid, the distance is 0, but the clustering is useless
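
A small numeric illustration of this limitation, under simplifying assumptions: the data are one-dimensional (so the best clusters are contiguous runs of the sorted values and can be found by brute force), and fit is measured as the total squared distance to the cluster means. The helper names `sse` and `best_sse` are hypothetical.

```python
from itertools import combinations


def sse(group):
    """Sum of squared distances from each value to the group mean."""
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)


def best_sse(values, k):
    """Smallest total within-cluster SSE over all splits of sorted 1-D data into k runs."""
    values = sorted(values)
    n = len(values)
    best = float("inf")
    # Try every way of placing k - 1 cut points between the sorted values.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
for k in range(1, len(data) + 1):
    print(k, round(best_sse(data, k), 2))
```

The total never increases as K grows and reaches 0 when K equals the number of data points, so a smaller distance alone cannot be used to pick K.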

Question 11

Q

Two Step Clustering

Answer

A

runs pre-clustering first and then hierarchical methods

can handle categorical AND continuous variables
automatic selection of the number of clusters
ability to analyze large data sets efficiently

Question 12

Q

Two Step Clustering:

Procedure

Answer

A

1. a sequential approach is used to pre-cluster the cases by condensing the variables
2. the pre-clusters are statistically merged into the desired number of clusters

Question 13

Q

Cluster Quality Validation Index:

Silhouette coefficient

Answer

A

measures how well an individual data point is clustered and estimates the average distance between clusters
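
The card does not give the formula; the usual definition is s(i) = (b - a) / max(a, b), where a is the mean distance from point i to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. A minimal sketch, assuming Euclidean distance and at least two clusters (the function name `silhouette` is illustrative):

```python
from math import dist


def silhouette(i, points, labels):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for the point at index i."""
    own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # common convention for a point that is alone in its cluster
    # a: mean distance to the other members of the same cluster.
    a = sum(dist(points[i], q) for q in own) / len(own)
    # b: smallest mean distance to the members of any other cluster.
    b = min(
        sum(dist(points[i], q) for q, lab in zip(points, labels) if lab == other)
        / labels.count(other)
        for other in set(labels)
        if other != labels[i]
    )
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette(0, points, [0, 0, 1, 1])  # point sits well inside its cluster
bad = silhouette(3, points, [0, 0, 1, 0])   # point 11.0 is grouped with 0.0 and 1.0
print(round(good, 3), round(bad, 3))
```

Averaging s(i) over all points gives a single score for the whole clustering.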

Question 14

Q

Cluster Quality Validation Index:

Silhouette plot

Answer

A

displays a measure of how close each point in one cluster is to points in the neighboring cluster

Question 15

Q

Interpretation with Silhouette coefficient:

an individual data point with a large silhouette coefficient, close to 1

Answer

A

very well clustered

Question 16

Q

Interpretation with Silhouette coefficient:

an individual data point with a small silhouette coefficient, around 0

Answer

A

lies between two clusters

Question 17

Q

Interpretation with Silhouette coefficient:

an individual data point with a negative silhouette coefficient

Answer

A

probably placed in the wrong cluster

Question 18

Q

Silhouette coefficient value

0.5-1.0

Answer

A

Good

Question 19

Q

Silhouette coefficient value

0.2-0.5

Answer

A

Fair

Question 20

Q

Silhouette coefficient value

-1.0 to 0.2

Answer

A

Poor
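
The three bands from the last cards can be written as a small lookup. The cards leave the boundary values 0.2 and 0.5 ambiguous; this sketch assumes each boundary belongs to the better band, and the function name `rate_silhouette` is hypothetical.

```python
def rate_silhouette(value):
    """Map a silhouette coefficient to the Good / Fair / Poor bands."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("a silhouette coefficient lies between -1 and 1")
    if value >= 0.5:   # assumption: 0.5 itself counts as Good
        return "Good"
    if value >= 0.2:   # assumption: 0.2 itself counts as Fair
        return "Fair"
    return "Poor"


for v in (0.8, 0.35, -0.1):
    print(v, rate_silhouette(v))
```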

(20 cards)