Data Analysis IV: Cluster Analysis (Week 11) Flashcards
What is the aim of cluster analysis?
To form clusters/segments
What is market segmentation?
Involves viewing a heterogeneous mkt as a number of smaller homogeneous mkts,
in response to differences b/w customers, and acting upon these diffs. b/w subgroups
What are the steps involved for segmentation?
- Determine segmentation BASIS
- Determine segmentation METHOD
- CREATE segments
- DESCRIBE segments
What are the steps involved for targeting?
- SELECT one or more segments
What are the steps involved for positioning?
- Develop STRATEGY & TACTICS for selected segments
What are the types of segmentation basis?
General (consumer-based) vs. product specific
Observable (objective) vs. unobservable (subjective)
What are general & observable variables to form a segmentation basis?
Cultural, geographic, demographic & socio-economic variables
What are product-specific & observable variables to form a segmentation basis?
User status, user frequency, store loyalty & patronage, situations
What are general & unobservable variables to form a segmentation basis?
Psychographics, values, personality & lifestyle
What are product-specific & unobservable variables to form a segmentation basis?
Psychographics, benefits, perceptions, elasticities, attributes, preferences, intentions
What are the criteria for effective segmentation?
Segments should be:
- Identifiable
- Substantial
- Accessible
- Stable
- Responsive
- Actionable
Why is a substantial segment necessary for effective segmentation?
Sizable segment: To maintain profitability
Very costly to create multiple mktg msgs for diff. small segments
Why is an accessible segment necessary for effective segmentation?
Able to reach individual segments separately (and target specific groups)
Why is a stable segment necessary for effective segmentation?
Stable over time, not switching from one to another
Why is a responsive segment necessary for effective segmentation?
Responsive - Diff. segments give diff. responses
Why is an actionable segment necessary for effective segmentation?
Able to distinguish between diff. segments, diff. platforms
How do we form clusters? (i.e. What are the segmentation methods?)
- A-priori
- Post-hoc
What is the a-priori segmentation method?
Segments determined by researchers
What is the post-hoc segmentation method?
Based on analyses.
Hard clustering - e.g. cluster analysis
Soft clustering - e.g. latent class analysis
What is hard clustering?
Person CANNOT belong to >1 segment, can only belong to 1 cluster
What is soft clustering?
Ppl have a certain PROBABILITY to be part of segment
How do we want observations to be clustered?
We want observations WITHIN a cluster to be CLOSE TOGETHER
And CLUSTERS to be FAR from each other
Why is it impossible to try all options for clustering?
- Observations are closer together
- Have more than 2 variables
- Have many observations
- How many clusters?
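A rough illustration of why trying every option is infeasible: the number of ways to split n observations into exactly k non-empty clusters is a Stirling number of the second kind, which can be computed with a standard recurrence. The function name `count_partitions` is made up for this sketch, and it assumes hard clustering with k fixed in advance.

```python
def count_partitions(n, k):
    """Ways to split n observations into exactly k non-empty clusters
    (Stirling number of the second kind)."""
    # table[i][j] = ways to split i observations into j non-empty clusters
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            # observation i either starts a new cluster on its own,
            # or joins one of the j clusters that already exist
            table[i][j] = table[i - 1][j - 1] + j * table[i - 1][j]
    return table[n][k]

print(count_partitions(15, 3))   # 2375101 ways for only 15 observations
print(count_partitions(100, 4))  # far too many to enumerate
```

Even with 15 observations and 3 clusters there are over two million possible groupings, and that is before considering other values of k.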
What are the types of algorithms for cluster analysis?
- Hierarchical
- Iterative (e.g. k-means)
What is the hierarchical algorithm for cluster analysis?
Start with: all subjects separately (agglomerating) OR all in one cluster (divisive)
Combine/separate until reaching the end
What is the iterative algorithm for cluster analysis?
Start with a solution
Move subjects b/w clusters until convergence
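A minimal sketch of the iterative idea, using k-means as the example. The function name `k_means`, the starting centres, and the sample points are all hypothetical, and Euclidean distance is assumed.

```python
import math

def k_means(points, start_centres, max_iter=100):
    """Minimal k-means sketch. points and start_centres are lists of tuples."""
    centres = list(start_centres)  # copy so the caller's list is not modified
    assignment = None
    for _ in range(max_iter):
        # Assign every subject to its closest cluster centre
        new_assignment = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no subject moved: convergence
        assignment = new_assignment
        # Recompute each centre as the mean of its current members
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centres

# Made-up data: two visibly separate groups, deliberately poor starting centres
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
assignment, centres = k_means(points, start_centres=[(1, 1), (1.5, 2)])
print(assignment)  # cluster index per subject
```

In this run the second subject starts in cluster 1 and is moved to cluster 0 on the next pass, after which nothing changes and the loop stops.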
What are the types of hierarchical algorithms?
- Agglomerating (e.g. Nearest Neighbour, Ward’s)
- Divisive
What is the agglomerating hierarchical algorithm?
Start: Each subject separately
Process: Join subjects/clusters together
End: All subjects in 1 cluster
What is the divisive hierarchical algorithm?
Start: All subjects in 1 cluster
Process: Split clusters
End: Each subject separately
What are the steps involved in agglomerating hierarchical algorithm?
E.g. 15 observations
Start: 15 clusters
- Calculate distances b/w points
- Combine points w/ smallest distance (homogeneous within clusters, heterogeneous b/w clusters)
- Recalculate distances b/w the 14 remaining clusters (13 single points and 1 merged cluster)
- Form next cluster w/ smallest distance
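The steps above can be sketched as a naive loop. This is an illustration only: it assumes nearest neighbour (single linkage) as the cluster distance, the name `agglomerate` is made up, and the five sample points are invented.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Naive agglomerative clustering; returns the merge history.

    Assumption: distance between clusters = nearest neighbour (single linkage).
    """
    def cluster_distance(a, b):
        # smallest distance between any member of a and any member of b
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    clusters = [[i] for i in range(len(points))]  # start: each point on its own
    history = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest distance and merge it
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        history.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return history

points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (20, 20)]
for a, b, d in agglomerate(points):
    print(a, "+", b, "at distance", round(d, 2))
```

With n points the loop always performs n - 1 merges. Here point 4 is far from everything else, so it is the last to be joined.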
What are the methods to calculate the distance b/w a subject and a cluster, or between 2 clusters?
- Nearest Neighbour (Single Linkage)
- Centroid Method
- Furthest Neighbour (Complete Linkage)
- Ward’s Method
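A sketch of how the four methods measure the distance between two clusters. Function names are made up; Euclidean distance is assumed, and Ward's criterion is expressed here as the increase in within-cluster sum of squares caused by a merge.

```python
import math

def centroid(cluster):
    # mean of each variable across the cluster's members
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def single_linkage(a, b):
    # nearest neighbour: closest pair of members
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    # furthest neighbour: most distant pair of members
    return max(math.dist(p, q) for p in a for q in b)

def centroid_distance(a, b):
    # distance between the two cluster means
    return math.dist(centroid(a), centroid(b))

def within_ss(cluster):
    # sum of squared deviations of members from their cluster mean
    c = centroid(cluster)
    return sum((x - m) ** 2 for p in cluster for x, m in zip(p, c))

def ward_increase(a, b):
    # how much within-cluster variation grows if a and b are merged
    return within_ss(a + b) - within_ss(a) - within_ss(b)

a = [(0, 0), (0, 2)]
b = [(4, 0), (4, 2)]
print(single_linkage(a, b), complete_linkage(a, b),
      centroid_distance(a, b), ward_increase(a, b))
```

At each agglomeration step, the pair of clusters with the smallest value under the chosen method is merged.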
What are the steps involved for Nearest Neighbour (Single Linkage)?
Choose subject for which the distance to NEAREST subject is SHORTEST
What are the advantages of using Nearest Neighbour (Single Linkage) method?
Tendency to create chain-like clusters
Suitable for detecting outliers
What are the steps involved for the Centroid Method?
Choose subject for which the distance to the MEAN of the cluster is SHORTEST
What is the advantage of using the Centroid Method?
Little influence of outliers
What are the steps involved for Furthest Neighbour (Complete Linkage) method?
Choose subject for which the distance to the FURTHEST subject in the cluster is SHORTEST
What is the disadvantage of using the Furthest Neighbour (Complete Linkage) method?
Very sensitive to outliers
What are the steps involved for Ward’s Method?
Choose subject that MINIMISES WITHIN-CLUSTER VARIANCE
What is the advantage of using Ward’s Method?
Creates clusters of SIMILAR SIZE that are relatively COMPACT
How do we determine whether there is an improvement if we move a subject to the other cluster?
Need to tell the algorithm:
- No. of clusters
- Exact composition - who’s in what cluster
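One way to check for an improvement, once the number of clusters and their composition are fixed, is to compare a fit measure before and after the move. The sketch below assumes total within-cluster sum of squares as that measure (the notes do not name one); the function name and the data are invented.

```python
def within_cluster_ss(clusters):
    """Total squared deviation of every subject from its own cluster mean."""
    total = 0.0
    for members in clusters:
        centre = [sum(x) / len(members) for x in zip(*members)]
        total += sum((x - c) ** 2 for p in members for x, c in zip(p, centre))
    return total

# Subject (8, 8) starts in the first cluster, then is moved to the second
before = [[(1, 1), (2, 1), (8, 8)], [(9, 9), (9, 8)]]
after = [[(1, 1), (2, 1)], [(8, 8), (9, 9), (9, 8)]]

print(within_cluster_ss(before), within_cluster_ss(after))
# A lower value after the move means the clusters became more compact
```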
What is the 3-step approach that combines hierarchical and iterative algorithms?
1. Nearest Neighbour - To detect OUTLIERS
2. Ward’s Method - To decide on NO. OF CLUSTERS and obtain initial solution for Step 3
3. K-Means - To obtain FINAL cluster solution
What is the output obtained from the Nearest Neighbour method? How do we identify outliers?
Output: DENDROGRAM
Identify outliers:
- Dendrogram: Indicates agglomeration order. Last subjects to be added may be outliers
- Agglomeration schedule
How do we decide on the number of clusters when executing Ward’s Method?
- Manageable no. of clusters?
- Size of clusters?
- Interpretation of clusters
- Large “horiz. jump” in dendrogram
- Large jump in coefficients (agglomeration schedule)
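The "large jump in coefficients" rule can be sketched as follows. The function name `suggest_n_clusters` and the coefficient values are invented for illustration; the result is only a suggestion and should still be weighed against cluster size, manageability, and interpretation.

```python
def suggest_n_clusters(coefficients):
    """coefficients[i] is the agglomeration coefficient at merge step i + 1.

    With n subjects there are n - 1 merge steps.
    """
    n_subjects = len(coefficients) + 1
    # increase in the coefficient from each step to the next
    jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # the biggest jump happens going INTO step biggest + 2, so stop after
    # step biggest + 1; each step reduces the cluster count by one
    return n_subjects - (biggest + 1)

# Hypothetical schedule for 8 subjects (7 merge steps)
schedule = [1.0, 1.2, 1.5, 2.1, 2.6, 9.8, 15.0]
print(suggest_n_clusters(schedule))  # 3
```

Here the coefficient leaps from 2.6 to 9.8 between steps 5 and 6, so merging stops after step 5, leaving 8 - 5 = 3 clusters.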
What is the output obtained from Ward’s Method?
Cluster membership in data columns
Frequency tables for e.g. 3- & 4-cluster solutions
Cluster means
What is the output obtained from K-Means Clustering?
Final cluster membership in data columns
Freq. table of final cluster size
What are the 2 operational issues for cluster analysis?
- Distance measure
- Standardisation
What is the operational issue regarding distance measure?
- For continuous variables: Euclidean distance
- For binary: E.g. Simple matching coefficient
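A small sketch of both measures. Function names and sample values are made up. Note that the simple matching coefficient is a similarity (1 = identical), so 1 minus the coefficient is used when a distance is needed.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two subjects on continuous variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simple_matching(a, b):
    """Share of binary variables on which two subjects agree (0 to 1)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

print(euclidean((1, 2), (4, 6)))                    # 5.0
print(simple_matching((1, 0, 1, 1), (1, 1, 1, 0)))  # 0.5 (agree on 2 of 4)
```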
What is the operational issue regarding standardisation?
- Per variable: Use if variables are measured on diff. scales
- Per subject: Use if subjects have very diff. means
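A sketch of the two kinds of standardisation, assuming z-scores (subtract the mean, divide by the standard deviation) and the population standard deviation. Both choices are assumptions, as are the function names and the data. The code would fail on a variable or subject with no variation, since the standard deviation would be zero.

```python
from statistics import mean, pstdev

def z_scores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def standardise_per_variable(data):
    """Standardise each column: puts variables on a common scale."""
    columns = [z_scores(col) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]

def standardise_per_subject(data):
    """Standardise each row: removes differences in subjects' overall level."""
    return [z_scores(row) for row in data]

# Rows = subjects. Columns = age (years) and income: very different scales
age_income = [[25, 30000], [35, 50000], [45, 70000]]
print(standardise_per_variable(age_income))

# Two subjects with the same rating pattern but very different means
ratings = [[1, 2, 3], [5, 6, 7]]
print(standardise_per_subject(ratings))
```

Without per-variable standardisation, income would dominate a Euclidean distance simply because its numbers are larger. After per-subject standardisation, the two raters above look identical, because only their pattern of answers remains.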
What are the considerations to decide which clusters to target?
- Fit with cluster positioning?
- Cluster size?
- Cluster profitability
What are the considerations for positioning?
Diff. strategy for diff. target groups?
- One vs. multiple brand(s)
- Diff. sales pitch