Chapter 9: Unsupervised Machine Learning Flashcards
What is a popular algorithm for identifying clusters?
k-means
What is k-means?
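A clustering algorithm that groups observations into k clusters by assigning each observation to the nearest cluster center. The k in k-means represents the number of clusters, or groupings.
Note that k-means works only for numerical data; that is, all attributes being considered must have numerical values.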
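Worked example: the assign-then-update loop behind k-means can be sketched in a few lines. This is a minimal illustration, not a production implementation. The function name kmeans, the sample points, and the choice to seed the centroids with the first k points are all assumptions made for this card.

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal k-means sketch; points is a list of numeric tuples."""
    # Assumption: seed the centroids with the first k points.
    # Real tools usually pick starting centroids more carefully.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical numerical data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print([len(c) for c in clusters])  # cluster sizes
```

Every attribute here is a number, which is why distances and means can be computed at all.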
Is k-means acceptable to use for master data?
No. It is used only for transactional data.
What can be used for clustering categorical data?
k-modes
What is the optimum number of clusters?
Somewhere between 1 and N, where N is the number of records being clustered.
What is cluster size?
the number of members within a cluster
What is cluster density?
A relative comparison: one cluster contains more customers than another cluster.
What is cluster distance?
Indicates how dissimilar customers in one cluster are from customers in another cluster.
What is association analysis, or "affinity analysis"?
A type of unsupervised descriptive data model. It is used to find the hidden connections between sets of items that frequently occur together.
“{PIZZA-BY-THE-SLICE} → {SOFT DRINK 20 OZ}” is an example of what?
An association rule, the output of association analysis.
Businesses can use these association rules to do what?
To promote and recommend items that often occur together.
What is an example of an {antecedent(s)} → {consequent(s)} rule?
{Knee Pads} → {Off Road Helmet}
{Deluxe Touring Bike Black, Elbow Pads} → {Off Road Helmet}
What is support in association rules?
Support for a rule is the fraction or percentage of transactions that contain all of the items within the rule.
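Worked example: support can be computed by counting the transactions that contain every item in the rule. This is a minimal sketch. The support function and the four baskets are made up for illustration.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"pizza", "soft drink"},
    {"pizza", "soft drink", "chips"},
    {"pizza"},
    {"chips"},
]

def support(transactions, antecedent, consequent):
    """Fraction of transactions containing all items in the rule."""
    rule_items = set(antecedent) | set(consequent)
    # A transaction counts only if it contains every item in the rule.
    matching = sum(1 for t in transactions if rule_items <= t)
    return matching / len(transactions)

rule_support = support(transactions, {"pizza"}, {"soft drink"})
print(rule_support)  # 2 of the 4 baskets contain both items
```

Here the rule {pizza} → {soft drink} has a support of 0.5, or 50%.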
Excessive numbers of rules represent an ______ to effective analysis
obstacle
How is confidence described?
Confidence is the probability that the consequent items appear in transactions that contain the antecedent items. It is a conditional probability.
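Worked example: as a conditional probability, confidence divides the number of transactions containing both antecedent and consequent by the number containing the antecedent. This is a minimal sketch. The confidence function and the four baskets are made up for illustration.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"pizza", "soft drink"},
    {"pizza", "soft drink", "chips"},
    {"pizza"},
    {"chips"},
]

def confidence(transactions, antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_both = sum(1 for t in transactions if both <= t)
    if with_antecedent == 0:
        return 0.0  # assumption: treat a rule with no matching antecedent as 0
    return with_both / with_antecedent

rule_confidence = confidence(transactions, {"pizza"}, {"soft drink"})
print(rule_confidence)  # 2 of the 3 pizza baskets also contain a soft drink
```

Confidence is directional: {pizza} → {soft drink} gives 2/3 here, while {soft drink} → {pizza} gives 2/2.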