Module 2B Notes Flashcards
What is the goal of clustering?
To segment observations into similar groups based on observed variables.
What type of machine learning is clustering classified as?
Unsupervised machine learning.
What is a common application of cluster analysis in marketing?
Market segmentation.
What is a challenge when clustering observations based on continuous variables?
Determining how many clusters and where the boundaries are.
What is bottom-up hierarchical clustering?
A method that starts with each observation in its own cluster and merges similar clusters.
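The merging process can be sketched in a few lines. This is a minimal illustration, not the course's or any tool's implementation. It assumes Euclidean distance, uses single linkage to compare clusters, and stops at a hypothetical target number of clusters `k`. The points are made up.

```python
import math

def agglomerative(points, k):
    """Bottom-up clustering: start with every observation in its own cluster,
    then repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # clusters hold point indices
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

points = [(1, 1), (1.5, 1), (5, 5), (5.5, 5.2), (9, 1)]
print(agglomerative(points, 3))  # [[0, 1], [2, 3], [4]]
```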
What is K-means clustering?
A method that assigns each observation to one of k clusters, placing it in the cluster whose centroid (mean) is nearest.
What is the most common method to measure dissimilarity between observations with continuous numeric variables?
Euclidean Distance.
What does Euclidean Distance measure?
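The straight-line distance between two observations, computed as the square root of the sum of squared differences across the variables; it becomes smaller as observations become more similar.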
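Here is a minimal sketch of the formula, assuming each observation is a sequence of numeric values of equal length.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two observations with numeric variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(euclidean_distance([1, 2], [4, 6]))  # 5.0
```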
What is the effect of scale on Euclidean Distance?
It is highly influenced by the scale on which variables are measured.
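A small illustration of the scale effect, using hypothetical customers described by age (years) and annual income. With income in dollars, the income gap dominates the distance. After rescaling income to thousands, the nearest neighbor changes.

```python
import math

# Hypothetical customers: (age in years, annual income in dollars)
a = (25, 50_000)
b = (60, 51_000)  # very different age, similar income
c = (26, 58_000)  # similar age, different income

# Income in dollars: b appears closer to a than c does
print(math.dist(a, b) < math.dist(a, c))  # True

def to_thousands(obs):
    """Re-express income in thousands of dollars; age is unchanged."""
    age, income = obs
    return (age, income / 1000)

# Income in thousands: now c appears closer to a
print(math.dist(to_thousands(a), to_thousands(b))
      < math.dist(to_thousands(a), to_thousands(c)))  # False
```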
What is a Z-score?
A measure of how many standard deviations a raw value lies above or below the mean, computed as (value - mean) / standard deviation.
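Here is a minimal sketch of standardizing one variable. Converting each variable to z-scores before computing distances removes the scale effect described above. This sketch uses the sample standard deviation (`statistics.stdev`). Some texts and tools use the population version (`statistics.pstdev`) instead.

```python
import statistics

def z_scores(values):
    """Standardize raw values: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / sd for x in values]

print(z_scores([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```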
What is Manhattan distance?
A dissimilarity measure equal to the sum of the absolute differences across variables; it is more robust to outliers than Euclidean distance.
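Here is a minimal sketch comparing the two measures on the same pair of observations. Euclidean distance squares each difference, so one extreme difference is magnified. Manhattan distance adds differences as they are.

```python
import math

def manhattan_distance(u, v):
    """Sum of absolute differences across variables."""
    return sum(abs(a - b) for a, b in zip(u, v))

u, v = (0, 0), (3, 4)
print(manhattan_distance(u, v))  # 7
print(math.dist(u, v))           # 5.0 (Euclidean, for comparison)
```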
What is the matching coefficient?
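A similarity measure for binary categorical variables: the proportion of variables on which two observations match, where both observations having a 0 entry counts as a match (a sign of similarity), just as both having a 1 does.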
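Here is a minimal sketch, assuming observations are coded as equal-length lists of 0/1 values. The two example observations are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of binary variables on which two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

x = [1, 0, 0, 0, 1]
y = [1, 0, 0, 0, 0]
print(matching_coefficient(x, y))  # 0.8 (three of the four matches are 0-0)
```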
What limitation does the matching coefficient have?
‘0’ values can have many meanings and do not necessarily indicate similarity.
What is Jaccard’s coefficient?
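A similarity measure for binary categorical variables that does not count matching zero entries: it divides the number of variables on which both observations are 1 by the number of variables on which at least one of them is 1.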
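Here is a minimal sketch using the same hypothetical 0/1 observations as a matching-coefficient example would. Because the 0-0 matches are ignored, similarity drops from 0.8 to 0.5. The coefficient is undefined when both observations are all zeros, and this sketch raises an error in that case (a design choice, not a standard).

```python
def jaccard_coefficient(u, v):
    """Similarity of binary observations that ignores 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    at_least_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if at_least_one == 0:
        raise ValueError("undefined when both observations are all zeros")
    return both_one / at_least_one

x = [1, 0, 0, 0, 1]
y = [1, 0, 0, 0, 0]
print(jaccard_coefficient(x, y))  # 0.5
```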
How does hierarchical clustering determine the similarity of two clusters?
By considering the similarity between the observations composing each cluster.
What are some methods for measuring the similarity between clusters in hierarchical clustering?
- Single linkage
- Complete linkage
- Group average linkage
- Median linkage
- Centroid linkage
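The following is a minimal sketch of four of the listed linkage methods, assuming Euclidean distance between observations and two made-up clusters. Median linkage is omitted here. It updates a cluster's representative point differently across merges, which does not fit into a single stateless function like these.

```python
import math
from itertools import product

def single_linkage(A, B):
    """Distance between the closest pair of observations."""
    return min(math.dist(a, b) for a, b in product(A, B))

def complete_linkage(A, B):
    """Distance between the farthest pair of observations."""
    return max(math.dist(a, b) for a, b in product(A, B))

def group_average_linkage(A, B):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))

def centroid(C):
    """Mean of each variable over the cluster's observations."""
    return tuple(sum(coords) / len(C) for coords in zip(*C))

def centroid_linkage(A, B):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(A), centroid(B))

A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
print(round(single_linkage(A, B), 3))         # 3.0
print(round(complete_linkage(A, B), 3))       # 5.385
print(round(group_average_linkage(A, B), 3))  # 4.248
print(round(centroid_linkage(A, B), 3))       # 4.123
```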
What are the steps in K-means clustering?
- Initialization
- Assignment Step
- Update Step
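The steps above can be sketched as follows. This is a minimal illustration, not the method any particular tool uses. It assumes Euclidean distance and initializes the centroids by sampling k observations at random; real tools often use smarter seeding. The loop alternates assignment and update until the centroids stop moving or an iteration cap is reached.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Initialization: pick k observations as the starting centroids
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, 2)
print(sorted(sorted(c) for c in clusters))
```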
What is an association rule?
An if-then statement that conveys the likelihood of certain items being purchased together.
What is the antecedent in an association rule?
The collection of items corresponding to the if portion of the rule.
What is the consequent in an association rule?
The item set corresponding to the then portion of the rule.
What is a key factor in evaluating an association rule?
How actionable it is and how well it explains the relationship between item sets.
What is an example of an association rule from Walmart’s data mining?
‘If a customer purchases a Barbie doll, then the customer also purchases a candy bar.’
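The following is a minimal sketch of one way to quantify the "likelihood" such a rule conveys. It measures the share of transactions containing the antecedent that also contain the consequent, a quantity commonly called confidence. The baskets are hypothetical and not Walmart data. Returning 0.0 when the antecedent never appears is an arbitrary choice made for this sketch.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of transactions."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0  # rule never triggered (sketch's convention)
    both = [t for t in with_antecedent if consequent <= set(t)]
    return len(both) / len(with_antecedent)

# Hypothetical shopping baskets
baskets = [
    {"barbie doll", "candy bar"},
    {"barbie doll", "candy bar", "soda"},
    {"barbie doll", "batteries"},
    {"candy bar", "soda"},
]
print(round(rule_confidence(baskets, {"barbie doll"}, {"candy bar"}), 3))  # 0.667
```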
What are some algorithms used to generate association rules?
- FP-growth algorithm
- Apriori algorithm
- Eclat algorithm
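The following is a simplified sketch of the frequent-itemset stage of the Apriori algorithm. It is not KNIME's implementation and omits the step that turns frequent item sets into rules. It searches level by level and keeps an item set only if its support (share of transactions containing it) meets a hypothetical `min_support` threshold. It also prunes any candidate that has an infrequent subset. The baskets are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent item set: support} using a level-wise search."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        size += 1
        # Candidate generation: join frequent sets into sets one item larger
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Pruning: every subset of a frequent set must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [
    {"barbie doll", "candy bar"},
    {"barbie doll", "candy bar", "soda"},
    {"barbie doll", "batteries"},
    {"candy bar", "soda"},
]
for itemset, sup in apriori(baskets, 0.5).items():
    print(sorted(itemset), sup)
```

Eclat and FP-growth would find the same frequent item sets, using different data structures to do so more efficiently on large data sets.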
What do the algorithms for generating association rules differ in?
Their efficiency in large data sets.
Does KNIME disclose the algorithm used by the program?
No, KNIME does not disclose the algorithm.