More Advanced Methods; Cluster Analysis Flashcards

Question 1

Q

Cluster Analysis is an __________ data analysis tool for organizing observed data into meaningful clusters, based on combinations of variables.

Answer

A

exploratory

Question 2

Q

When would you look at grouping (cluster) patterns?

Answer

A

A PT practitioner would like to group patients according to their attributes in order to treat them better with a personalized care plan.
A PT practitioner would like to classify patients based on their individual health records in order to develop management strategies appropriate to each patient.

Question 3

Q

What are the 2 types of clustering?

Answer

A

1) Hierarchical Clustering

2) Non-hierarchical Clustering

Question 4

Q

Hierarchical Clustering:

A set of nested clusters organized using a hierarchical _____.
The hierarchical methods produce a set of nested clusters: either each pair of individuals or clusters is progressively nested in a larger cluster until only ____ cluster remains, or all individuals start in one group that is partitioned step by step.

Answer

A

tree
one

Question 5

Q

Non-hierarchical Clustering:

A division of individuals into non-overlapping subsets (clusters) such that each individual is in exactly __ cluster.
The non-hierarchical methods divide a dataset of n individuals into m clusters.
________ clustering is the most commonly used non-hierarchical technique.

Answer

A

one
K-means

Question 6

Q

What are the 3 types of clustering techniques?

Answer

A

Hierarchical Clustering
K-means Clustering
Two-Step Clustering

Question 7

Q

__________ clustering is a clustering algorithm where the clustering is mapped into a hierarchy, basing its grouping on the inter-cluster similarities or dissimilarities.

Answer

A

Hierarchical

Question 8

Q

What are the two “types” of Hierarchical Clustering?

Answer

A

Bottom-Up (agglomerative)
Top-Down (divisive)

Question 9

Q

What is bottom-up clustering?

Answer

A

Starts with each data point as its own cluster and then merges the closest clusters step by step to form larger groups.

Question 10

Q

What is top-down clustering?

Answer

A

Starts with all data points in one group and then partitions the data step by step using a flat clustering algorithm.

Question 11

Q

What are the steps for bottom-up (agglomerative) clustering?

Answer

A

1) Assign each item to its own cluster.
2) Find the closest (most similar) pair of clusters and merge them into a single cluster, so there is now one cluster fewer.
3) Compute distances (similarities) between the new cluster and each of the old clusters.
4) Repeat steps 2 and 3 until all items are merged into a single cluster containing the whole sample.
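
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the card does not name a distance metric or linkage rule, so Euclidean distance and single linkage (closest pair of members) are assumed here, and the function names and sample points are made up for the example.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Distance between clusters a and b: the closest pair of members (assumed linkage)."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    # Step 1: every item starts in its own cluster (clusters hold point indices).
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        # Step 3 is implicit: distances to the new cluster are recomputed on each pass.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        history.append(merged)
    # Step 4: the loop ends when a single cluster holds the whole sample.
    return history


# Hypothetical sample: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(points)
for step, merged in enumerate(history, start=1):
    print(step, merged)
```

The merge history is the information a dendrogram draws: each entry is one join in the tree. Recomputing every distance on each pass keeps the sketch short but is slow for large samples.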

Question 12

Q

What are the limitations of Hierarchical Clustering?

Answer

A

Arbitrary decisions: both the distance metric and the linkage criterion must be specified, without any strong theoretical basis for either choice.
Data types: works well with continuous data, but is less straightforward for categorical or mixed data.
Misinterpretation of the dendrogram: selecting the number of clusters from the dendrogram may be misleading.

Question 13

Q

_______ clustering is a clustering algorithm where data is classified into K clusters. This is the most widely used clustering method. Each data point is assigned to the cluster with the nearest mean.

Answer

A

K-means

Question 14

Q

What are the steps for K-means Clustering?

Answer

A

1) Select K points as the initial centroids.
2) Assign each point to its nearest centroid.
3) Recompute the centroid of each group.
4) Repeat steps 2 and 3 until the solution stabilizes (the centroids no longer move).
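
A minimal Python sketch of these steps, assuming Euclidean distance and initial centroids picked by hand (the card does not say how they should be chosen). The function names and the sample points are hypothetical.

```python
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            groups[nearest].append(p)
        # Step 3: recompute each centroid (an empty group keeps its old centroid).
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        # Step 4: stop once the centroids are stable.
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


# Hypothetical sample with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Step 1: K = 2 initial centroids, here simply two of the data points.
centroids, groups = k_means(points, [points[0], points[3]])
print(centroids)
print(groups)
```

Different starting centroids can lead to different final clusters, so in practice the procedure is often run from several starting points.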

Question 15

Q

What are the limitations of K-means Clustering?

Answer

A

The researcher must choose the number of clusters.
The more clusters (the larger K), the shorter the distance from each point to its centroid.
As an extreme scenario: when every data point is its own centroid, the distance is zero.
Distance alone therefore cannot answer the question: what is the optimal K?
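
The effect can be checked on a tiny example. The sketch below is an illustration rather than K-means itself: for one-dimensional data it tries every way of cutting the sorted values into K consecutive runs and reports the smallest total squared distance to the group means. The data values and function names are hypothetical.

```python
from itertools import combinations


def sse(group):
    """Sum of squared distances from each value to the group mean."""
    centre = sum(group) / len(group)
    return sum((x - centre) ** 2 for x in group)


def best_sse(values, k):
    """Smallest total within-cluster SSE over all cuts of sorted 1-D data into k runs."""
    values = sorted(values)
    n = len(values)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


# Hypothetical sample with two obvious groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
curve = [best_sse(data, k) for k in range(1, len(data) + 1)]
for k, value in enumerate(curve, start=1):
    print(k, value)
```

The total never increases as K grows and reaches zero when K equals the number of points, so the smallest distance always favours the largest K. One common heuristic, not mentioned on the card, is to look for the K after which the improvement becomes small.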

Question 16

Q

__________ clustering is a hybrid approach where we run pre-clustering first and then run hierarchical methods.

Answer

A

Two-Step

Question 17

Q

What are the features that differentiate Two-Step Clustering from traditional clustering techniques?

Answer

A

The ability to create clusters based on both categorical and continuous variables.
Automatic selection of the number of clusters.
The ability to analyze large data sets efficiently.

Question 18

Q

What are the steps for Two-Step Clustering?

Answer

A

1) A sequential approach is used to pre-cluster the cases by condensing the variables (pre-clustering).
2) The pre-clusters are statistically merged into the desired number of clusters (clustering).

Question 19

Q

What are the advantages of Two-Step Clustering?

Answer

A

It can take both continuous and categorical data.
There is no need to enter the number of clusters a priori because it uses indexes of fit (AIC or BIC) to compare the cluster solutions and determine which number of clusters is best.
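
To make the index-of-fit comparison concrete, here is a small Python sketch using the general textbook definitions of AIC and BIC. The exact likelihood used by a particular two-step implementation may differ in detail, and the log-likelihoods, parameter counts, and sample size below are made-up numbers for illustration only.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_cases):
    """Bayesian information criterion: lower is better, with a stronger size penalty."""
    return -2.0 * log_likelihood + n_params * log(n_cases)


# Hypothetical candidate solutions: K -> (log-likelihood, number of free parameters).
candidates = {2: (-520.0, 8), 3: (-480.0, 12), 4: (-476.0, 16)}
n_cases = 200  # hypothetical sample size

scores = {k: bic(ll, p, n_cases) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores)
print("lowest BIC at K =", best_k)
```

Both indexes reward a better fit but penalize extra parameters, which is why adding clusters does not automatically win the comparison.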

Question 20

Q

Cluster Quality Validation Index:

______________ measures how well an individual data point is clustered, and it estimates the average distance between clusters.
______________ displays a measure of how close each point in one cluster is to points in the neighboring cluster.
Data with a large silhouette coefficient value of almost 1 means what?
Data with a small silhouette coefficient value of almost 0 means what?
Data with a negative silhouette coefficient means what?

Answer

A

silhouette coefficient
silhouette plot
very well clustered
lies between 2 clusters
probably placed in wrong cluster
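
The three cases can be reproduced with the standard definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. The Python sketch below assumes Euclidean distance and at least two clusters; the points and labels are hypothetical.

```python
from math import dist


def silhouette(i, points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for the point at index i."""
    own = labels[i]
    # Collect distances from point i to every other point, grouped by cluster label.
    by_cluster = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            by_cluster.setdefault(lab, []).append(dist(points[i], p))
    if own not in by_cluster:
        return 0.0  # convention: a point alone in its cluster scores 0
    a = sum(by_cluster[own]) / len(by_cluster[own])  # mean distance within own cluster
    b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)  # nearest other cluster
    return (b - a) / max(a, b)


# Hypothetical sample: two pairs far apart, plus one point exactly halfway between them.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 0.5)]
labels = [0, 0, 1, 1, 0]

well_placed = silhouette(0, points, labels)   # clearly positive
in_between = silhouette(4, points, labels)    # zero: equally far from both clusters
# Same points, but the second point is deliberately given the wrong label.
misplaced = silhouette(1, points, [0, 1, 1, 1, 0])  # negative
print(well_placed, in_between, misplaced)
```

A silhouette plot shows these per-point values grouped by cluster, so poorly placed points stand out as short or negative bars.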

Question 21

Q

Silhouette Values:

0.5 to 1.0 = _____
0.2 to 0.5 = _____
-1.0 to 0.2 = ______

Answer

A

Good
Fair
Poor

Question 22

Q

Summary for More Advanced Methods; Cluster Analysis:

The application of ___________ involves grouping similar cases into homogeneous groups (called clusters) when the grouping is not previously known.
With ________ clustering, the clustering is mapped into a hierarchy, basing its grouping on the inter-cluster similarities or dissimilarities.
With _______ clustering, data is classified into K clusters, assigning each data point to the cluster with the nearest mean.
With ________ clustering, a sequential approach is first used to pre-cluster the cases, and then the pre-clusters are statistically merged into the desired number of clusters.
Two-Step clustering may be a better choice than hierarchical or K-means clustering because it can work with _________ data and is not bound to an arbitrary choice of the number of clusters.

Answer

A

cluster analysis
hierarchical
K-means
Two-step
categorical

(22 cards)