More Advanced Methods; Cluster Analysis Flashcards

1
Q

Cluster Analysis is an __________ data analysis tool for organizing observed data into meaningful clusters, based on combinations of variables.

A

exploratory

2
Q

When should you look for grouping (cluster) patterns?

A
  • A PT practitioner would like to group patients according to their attributes in order to treat them better with a personalized care plan.
  • A PT practitioner would like to classify patients based on their individual health records in order to develop management strategies that are appropriate to each patient.
3
Q

What are the 2 types of clustering?

A
  1. Hierarchical Clustering
  2. Non-hierarchical Clustering

4
Q

Hierarchical Clustering:

  • A set of nested clusters organized using a hierarchical _____.
  • The hierarchical methods produce a set of nested clusters in which each pair of individuals or clusters is progressively nested in a larger cluster until only ____ cluster remains, or all individuals in one group are partitioned step by step.
A
  • tree
  • one

5
Q

Non-hierarchical Clustering:

  • Divides a group of individuals into non-overlapping subsets (clusters) such that each individual is in exactly __ cluster.
  • The non-hierarchical methods divide a dataset of n individuals into m clusters.
  • ________ clustering is the most commonly used non-hierarchical technique.
A
  • one
  • K-means

6
Q

What are the 3 types of clustering techniques?

A
  • Hierarchical Clustering
  • K-Means Clustering
  • Two-Step Clustering
7
Q

__________ clustering is a clustering algorithm in which the clustering is mapped into a hierarchy, with the grouping based on inter-cluster similarities or dissimilarities.

A

Hierarchical

8
Q

What are the two “types” of Hierarchical Clustering?

A
  • Bottom-Up (agglomerative)
  • Top-Down (divisive)

9
Q

What is bottom-up clustering?

A

Starts with each individual data point on its own and then merges it with others, step by step, to form larger groups.

10
Q

What is top-down clustering?

A

Starts with all data in one group and then partitions the data step by step using a flat clustering algorithm.

11
Q

What are the steps for bottom-up (agglomerative) clustering?

A
  1. Assign each item to its own cluster.
  2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so there is now one cluster fewer.
  3. Compute distances (similarities) between the new cluster and each of the old clusters.
  4. Repeat steps 2 and 3 until all items are merged into a single cluster containing the entire original sample.
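
The four steps above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the cards: it assumes one-dimensional data, absolute difference as the distance, and single linkage (distance between clusters = smallest pairwise distance). The function name `agglomerate` and the sample values are made up for the example.

```python
def agglomerate(points):
    """Bottom-up clustering of 1-D points; returns the merge history."""
    # Step 1: every item starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance), in merge order

    def single_linkage(a, b):
        # Assumed linkage: smallest distance between any two members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Step 3: distances to the new cluster are recomputed on the next pass.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        # Step 4: the loop repeats until a single cluster remains.
    return merges


history = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for a, b, d in history:
    print(a, "+", b, "merged at distance", d)
```

The merge distances recorded in `history` are the heights at which a dendrogram would join the branches, which is why choosing a different distance metric or linkage rule can change the tree.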
12
Q

What are the limitations of Hierarchical Clustering?

A
  • Arbitrary decisions: both the distance metric and the linkage criterion must be specified, without any strong theoretical basis for such decisions.
  • Data types: works well mainly with continuous data.
  • Misinterpretation of the dendrogram: selecting the number of clusters from the dendrogram may be misleading.
13
Q

_______ clustering is a clustering algorithm in which data are classified into K clusters. This is the most widely used clustering method. Each individual data point is assigned to the cluster with the nearest mean.

A

K-means

14
Q

What are the steps for K-Means Clustering?

A
  1. Select K points as the initial centroids.
  2. Assign each point to its nearest centroid.
  3. Recompute the centroid of each group.
  4. Repeat steps 2 and 3 until the best solution emerges (the centroids are stable).
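
As a rough sketch of these four steps, the Python below runs the loop on one-dimensional data. It assumes absolute difference as the distance and, to keep the run deterministic, simply takes the first K points as the initial centroids; real implementations usually choose the starting centroids more carefully. The name `k_means` and the sample data are illustrative only.

```python
def k_means(points, k, max_iter=100):
    """Basic K-means on 1-D points; returns (centroids, groups)."""
    # Step 1: select K initial centroids (assumption: the first K points).
    centroids = list(points[:k])
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            groups[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            sum(g) / len(g) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        # Step 4: stop once the centroids are stable.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


centroids, groups = k_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2)
print(centroids)  # [2.0, 11.0]
print(groups)     # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```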
15
Q

What are the limitations of K-Means Clustering?

A
  • The researcher must choose the number of clusters.
  • The larger K (the number of clusters), the shorter the distance from each point to its centroid.
  • As an extreme scenario: when every data point is its own centroid, the distance is zero.
  • This leaves the question open: what is the optimal K?
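
A small numeric illustration of this limitation, assuming one-dimensional data, squared distance, and a bare-bones K-means that starts from the first K points (all choices made for the example, not taken from the cards). The helper name `within_cluster_ss` is hypothetical.

```python
def within_cluster_ss(points, k, max_iter=100):
    """Run a basic K-means and return the total squared distance to centroids."""
    centroids = list(points[:k])  # assumption: first K points as initial centroids
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            groups[nearest].append(p)
        updated = [
            sum(g) / len(g) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum((p - centroids[c]) ** 2 for c, g in enumerate(groups) for p in g)


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
totals = {k: within_cluster_ss(data, k) for k in range(1, len(data) + 1)}
for k, total in totals.items():
    print(k, total)
```

For this sample the total keeps shrinking as K grows and reaches zero at K = 6, where every point is its own centroid. The total distance alone therefore cannot identify the best K.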
16
Q

__________ clustering is a hybrid approach in which pre-clustering is run first, followed by hierarchical methods.

A

Two-Step

17
Q

What are the features that differentiate Two-Step Clustering from traditional clustering techniques?

A
  • The ability to create clusters based on both categorical and continuous variables.
  • Automatic selection of the number of clusters.
  • The ability to analyze large data sets efficiently.
18
Q

What are the steps for Two-Step Clustering?

A
  1. A sequential approach is used to pre-cluster the cases by condensing the variables (pre-clustering).
  2. The pre-clusters are statistically merged into the desired number of clusters (clustering).
19
Q

What are the advantages of Two-Step Clustering?

A
  • It can take both continuous and categorical data.
  • There is no need to enter the number of clusters a priori, because it uses indexes of fit (AIC or BIC) to compare the cluster solutions and determine which number of clusters is best.
20
Q

Cluster Quality Validation Index:

  • ______________ measures how well an individual data point is clustered; it estimates the average distance between clusters.
  • ______________ displays a measure of how close each point in one cluster is to points in the neighboring cluster.
  • A data point with a large silhouette coefficient (close to 1) means what?
  • A data point with a small silhouette coefficient (close to 0) means what?
  • A data point with a negative silhouette coefficient means what?
A
  • silhouette coefficient
  • silhouette plot
  • very well clustered
  • lies between 2 clusters
  • probably placed in wrong cluster
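
To see where these three readings come from, here is a sketch of the silhouette coefficient for a single point. It assumes the standard definition s = (b - a) / max(a, b), where a is the point's mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; the cards do not spell the formula out. One-dimensional data and the function names are illustrative.

```python
def mean_distance(x, group):
    # Average absolute distance from x to every member of group.
    return sum(abs(x - y) for y in group) / len(group)


def silhouette(clusters, ci, pi):
    """Silhouette coefficient of point pi in cluster ci (1-D data)."""
    x = clusters[ci][pi]
    own = clusters[ci][:pi] + clusters[ci][pi + 1:]
    if not own:
        return 0.0  # common convention for a single-member cluster
    a = mean_distance(x, own)  # cohesion: distance to own cluster
    b = min(mean_distance(x, c)  # separation: nearest other cluster
            for k, c in enumerate(clusters) if k != ci)
    if max(a, b) == 0:
        return 0.0  # all points identical; nothing to separate
    return (b - a) / max(a, b)


# A point sitting in the middle of a tight, well-separated cluster.
well_placed = silhouette([[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]], 0, 1)
# The point 9.5 grouped with 1 and 2 although it sits next to 10, 11, 12.
misplaced = silhouette([[1.0, 2.0, 9.5], [10.0, 11.0, 12.0]], 0, 2)
print(well_placed, misplaced)
```

The first value is close to 1 (very well clustered) and the second is negative (probably placed in the wrong cluster).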
21
Q

Silhouette Values:

  • 0.5 to 1.0 = _____
  • 0.2 to 0.5 = _____
  • -1.0 to 0.2 = ______
A
  • Good
  • Fair
  • Poor
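
These bands can be written as a small lookup. The card does not say which band the boundary values 0.5 and 0.2 belong to, so this sketch assumes they fall in the higher band; the function name `rate_silhouette` is made up for the example.

```python
def rate_silhouette(value):
    """Map a silhouette value to the Good / Fair / Poor bands."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("silhouette values lie between -1 and 1")
    if value >= 0.5:  # assumption: 0.5 itself counts as Good
        return "Good"
    if value >= 0.2:  # assumption: 0.2 itself counts as Fair
        return "Fair"
    return "Poor"


print(rate_silhouette(0.7), rate_silhouette(0.3), rate_silhouette(-0.4))
```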
22
Q

Summary for More Advanced Methods; Cluster Analysis:

  • The application of ___________ involves grouping similar cases into homogeneous groups (called clusters) when the grouping is not previously known.
  • With ________ clustering, the clustering is mapped into a hierarchy, with the grouping based on inter-cluster similarities or dissimilarities.
  • With _______ clustering, data are classified into K clusters, with each individual data point assigned to the cluster with the nearest mean.
  • With ________ clustering, a sequential approach is first used to pre-cluster the cases, and the pre-clusters are then statistically merged into the desired number of clusters.
  • Two-step clustering may be a better choice than hierarchical or K-means clustering because it can work with _________ data and is not bound to an arbitrary choice of the number of clusters.
A
  • cluster analysis
  • hierarchical
  • K-means
  • Two-step
  • categorical