Cluster Analysis Flashcards
What is Cluster Analysis?
A multivariate statistical technique that groups observations on the basis of some of the features or variables that describe them
observations in a dataset can be divided into different groups
examples: clustering by geographic proximity, by language, or for market segmentation
What is the goal of Cluster Analysis?
To maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters
When is clustering most often used?
Often used as a preliminary step in many types of analysis
it is a useful technique for exploring and identifying patterns in the data
Data scientists often turn to it when they do not know where to start or what to expect
What is a key distinguishing trait of supervised learning?
We are dealing with labeled data
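What is the Euclidean distance?
The straight-line distance between two points: the square root of the sum of the squared differences of their coordinates
d(A, B) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2)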
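A minimal plain-Python sketch of the Euclidean distance (square root of the sum of squared coordinate differences). The helper name `euclidean_distance` is hypothetical; the standard library's `math.dist` (Python 3.8+) returns the same value.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(math.dist((0, 0), (3, 4)))           # 5.0
```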
What is a Centroid?
the mean position of a group of points
aka - center of mass
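A minimal sketch of computing a centroid, assuming equally weighted points given as tuples (the `centroid` helper is hypothetical). With equal weights, the coordinate-wise mean is also the center of mass.

```python
def centroid(points):
    """Mean position of a group of points: the average of each coordinate."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    # zip(*points) groups the x values together, the y values together, etc.
    return tuple(sum(coord) / n for coord in zip(*points))


# Four corners of a 2x2 square: the centroid is its center.
print(centroid([(0, 0), (2, 0), (0, 2), (2, 2)]))  # (1.0, 1.0)
```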
What does K in K-means clustering stand for?
The number of clusters
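What is a common way of selecting the number of clusters?
The elbow method: plot the within-cluster sum of squares (WCSS) against K and pick the K at the "elbow", where adding more clusters stops reducing WCSS by much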
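A minimal plain-Python sketch of the elbow method, assuming Lloyd's k-means with a deterministic farthest-first initialization. That initialization is swapped in for the randomized k-means++ that libraries such as scikit-learn use by default, only so the output is reproducible. The helper names and the 12-point dataset are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Centroid (coordinate-wise mean) of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centroids, labels, wcss)."""
    # Farthest-first initialization: start from the first point, then keep
    # adding the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        centroids = new_centroids

    wcss = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, wcss


# Hypothetical data: three tight, well-separated groups of four points each.
data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (1, 8), (1, 9), (2, 8), (2, 9)]

for k in range(1, 5):
    print(k, round(kmeans(data, k)[2], 2))
# 1 267.33, 2 104.0, 3 6.0, 4 5.33: the large drops stop after K = 3 (the elbow)
```

Plotting these values against K would show the bend at K = 3, matching the three groups built into the data.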
What is clustering about?
- Minimizing the distance between points in a cluster
- Maximizing the distance between clusters
What does WCSS stand for?
Within-cluster sum of squares
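for a fixed K, a lower WCSS means more compact clusters, but minimizing WCSS alone does not give the perfect clustering solution: it shrinks as K grows and reaches 0 when every observation is its own cluster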
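A small sketch of the WCSS computation, using a hypothetical `wcss` helper that takes clusters as lists of points. It also shows why WCSS alone cannot choose K: putting every point in its own cluster drives it to 0.

```python
def wcss(clusters):
    """Within-cluster sum of squares: for every cluster, add up the squared
    Euclidean distances from each of its points to the cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        n = len(cluster)
        centre = [sum(coord) / n for coord in zip(*cluster)]
        total += sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in cluster)
    return total


points = [(1, 1), (1, 2), (8, 8), (8, 9)]
print(wcss([points]))                  # one cluster: 99.0
print(wcss([points[:2], points[2:]]))  # the two natural groups: 1.0
print(wcss([[p] for p in points]))     # every point alone: 0.0
```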
What are pros of K-Means Clustering?
- Simple to understand
- Fast to cluster
- Widely available
- Easy to implement
What are some cons of K-means Clustering?
- We need to pick K
- Sensitive to initialization
- Sensitive to outliers
- Produces spherical solutions
- Requires standardization: features measured on larger scales dominate the distance
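A sketch of z-score standardization before clustering, assuming hypothetical customer data with age (years) and income (dollars). The `standardize` helper is illustrative and uses the population standard deviation.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std dev."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Guard against constant columns (std dev 0) by leaving them unscaled.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


# Hypothetical customers: (age in years, annual income in dollars).
customers = [(25, 50_000), (60, 51_000), (26, 80_000), (58, 79_000)]
scaled = standardize(customers)

# Raw data: the 35-year age gap barely registers next to income differences.
print(math.dist(customers[0], customers[1]))  # about 1000.6
print(math.dist(customers[0], customers[2]))  # about 30000.0
# Standardized data: both features contribute, so the two distances are similar.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```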
What are the 3 Types of Analysis?
- Exploratory
- Confirmatory
- Explanatory
What are characteristics of Exploratory Analysis?
- Getting acquainted with the data
- Search for patterns
- Plan - determining what methods may be useful to investigate further
e.g. Data Visualization, Descriptive Stats (pandas df.describe()), Clustering
What are characteristics of Confirmatory and Explanatory Analysis?
- Explain a phenomenon
- Confirm a hypothesis
- Validate previous research
using hypothesis testing and regression analysis