More Advanced Methods; Cluster Analysis Flashcards
Cluster Analysis is an __________ data analysis tool for organizing observed data into meaningful clusters, based on combinations of variables.
exploratory
When to look at grouping (cluster) patterns?
- A PT practitioner would like to group patients according to their attributes in order to better treat them with a personalized care plan.
- A PT practitioner would like to classify patients based on their individual health records in order to develop specific management strategies that are appropriate to the patients.
What are the 2 types of clusters?
1. Hierarchical Clustering
2. Non-hierarchical Clustering
Hierarchical Clustering:
- A set of nested clusters organized using a hierarchical _____.
- The hierarchical methods produce a set of nested clusters in which each pair of individuals or clusters is progressively nested in a larger cluster until only ____ cluster remains, or all individuals in one group are partitioned step by step.
- tree
- one
Non-hierarchical Clustering:
- A division of individuals into non-overlapping subsets (clusters) such that each object is in exactly __ cluster.
- The non-hierarchical methods divide a dataset of n individuals into m clusters.
- ________ clustering is the most commonly used non-hierarchical technique.
- one
- K-means
What are the 3 types of clustering techniques?
- Hierarchical Clustering
- K-Means Clustering
- Two-Step Clustering
__________ clustering is a clustering algorithm where the clustering is mapped into a hierarchy, basing its grouping on the inter-cluster similarities or dissimilarities.
Hierarchical
What are the two “types” of Hierarchical Clustering?
- Bottom-Up (agglomerative)
- Top-Down (divisive)
What is bottom-up clustering?
Starts with each data point as its own cluster and then merges clusters step by step to form larger groups.
What is top-down clustering?
Starts with all data in one group and then partitions the data step by step using a flat clustering algorithm.
What are the steps for bottom-up (agglomerative) clustering?
1. Assign each item to its own cluster.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so there is now one cluster less.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of the original sample size.
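The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), and the function name `agglomerative`, the `target_clusters` parameter, and the sample points are all hypothetical. For simplicity it recomputes every cluster-to-cluster distance on each pass rather than updating only the distances to the new cluster.

```python
import math

def agglomerative(points, target_clusters=1):
    """Bottom-up clustering with single linkage (an assumed choice)."""
    # Step 1: every item starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, in order

    def single_linkage(a, b):
        # Cluster distance = distance between the two closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target_clusters:
        # Step 2: find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge the pair; step 3 (distances to the new cluster) is
        # recomputed naively on the next pass through the loop.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Step 4 is the loop itself: it repeats until target_clusters remain.
    return clusters, merges

# Hypothetical sample data: two points near (1, 1), three near (5, 5).
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.0), (6.0, 5.5)]
two_clusters, _ = agglomerative(points, target_clusters=2)
one_cluster, history = agglomerative(points)
```

Stopping at `target_clusters=2` corresponds to cutting the tree one merge before the end; running to a single cluster records the full merge history that a dendrogram would display.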
What are the limitations of Hierarchical Clustering?
- Arbitrary decisions: it is necessary to specify both the distance metric and the linkage criterion without any strong theoretical basis for such decisions.
- Data types: works well with continuous data, but is less suited to other data types.
- Misinterpretation of dendrogram: selecting the number of clusters from the dendrogram may be misleading.
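To make the "arbitrary decisions" point concrete, the sketch below measures the distance between the same two clusters under three common linkage criteria. The Euclidean metric, the function names, and the sample clusters are assumptions chosen for illustration.

```python
import math
from statistics import mean

def pairwise(a, b):
    # All Euclidean distances between a member of a and a member of b.
    return [math.dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    return min(pairwise(a, b))   # closest pair of members

def complete_linkage(a, b):
    return max(pairwise(a, b))   # farthest pair of members

def average_linkage(a, b):
    return mean(pairwise(a, b))  # mean over all pairs

# Hypothetical clusters, used only to compare the three criteria.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

single = single_linkage(cluster_a, cluster_b)
complete = complete_linkage(cluster_a, cluster_b)
average = average_linkage(cluster_a, cluster_b)
```

The three criteria give three different distances for the same pair of clusters, so the order in which clusters merge, and therefore the resulting tree, can change with the choice.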
_______ clustering is a clustering algorithm where data is classified into K clusters. This is the most widely used clustering method. Each data point is assigned to the cluster with the nearest mean.
K-means
What are the steps for K-Means Clustering?
1. Select K points as the initial centroids.
2. Assign each point to its nearest centroid.
3. Recompute the centroid of each group.
4. Repeat steps 2 and 3 until the best solution emerges (the centers are stable).
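The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: distance is Euclidean, the initial centroids are passed in by the caller so the run is repeatable (real implementations often choose them at random), and the names `k_means`, `initial_centroids`, and `max_iter` are hypothetical.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    # Step 1: the K initial centroids are supplied by the caller.
    centroids = list(initial_centroids)
    groups = [[] for _ in centroids]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its group.
        new_centroids = []
        for group, old in zip(groups, centroids):
            if group:
                new_centroids.append(
                    tuple(sum(coord) / len(group) for coord in zip(*group)))
            else:
                new_centroids.append(old)  # empty group: leave centroid as is
        # Step 4: stop once the centers no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

# Hypothetical sample data with two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, groups = k_means(points, initial_centroids=[points[0], points[3]])
```

With K = 2 and these starting centroids, the first three points end up in one group and the last three in the other, and each final centroid is the mean of its group.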
What are the limitations of K-Means Clustering?
- The researcher chooses the number of clusters.
- The more clusters there are (the larger K is), the shorter the distance from each point to its centroid.
- In the extreme case, when every data point is its own centroid, the distance is zero.
- What is the optimal K?
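The relationship between K and distance can be checked on a toy example. The sketch below runs a compact one-dimensional K-means for every K from 1 to n and records the total squared distance from each value to its centroid. The data, the function name `total_squared_distance`, and the evenly spaced starting centroids are assumptions made only to keep the example short and repeatable.

```python
def total_squared_distance(values, k, max_iter=100):
    """Run a simple 1-D K-means and return the summed squared distances."""
    data = sorted(values)
    # Assumed deterministic start: k values spread evenly over the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Recompute each centroid as the mean of its group.
        updated = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break  # centers are stable
        centroids = updated
    return sum((v - c) ** 2 for g, c in zip(groups, centroids) for v in g)

# Hypothetical data: two groups of three values each.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
distances = [total_squared_distance(values, k)
             for k in range(1, len(values) + 1)]
```

On this data the total never increases as K grows, and it reaches zero when K equals the number of values. A smaller distance alone therefore cannot identify the optimal K.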