Chapter 9: Unsupervised Machine Learning Flashcards
What is a popular algorithm for identifying clusters?
k-means
What is k-means?
A clustering algorithm that partitions observations into k groups. The k in k-means represents the number of clusters, or groupings.
Note that k-means works only for numerical data; that is, all attributes being considered have numerical values.
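The idea behind k-means can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the sample points, the hand-picked starting centroids, and the function name k_means are all assumptions made for the example (real tools usually pick starting centroids randomly).

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, iterations=10):
    """Repeat two steps: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster's members (keep the old one if empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical numerical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])  # k = 2
print([len(c) for c in clusters])  # cluster sizes
```

Because the centroid is an arithmetic mean, every attribute has to be numerical, which is why the note above restricts k-means to numerical data.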
Is k-means acceptable to use for master data?
No. Use it only for transactional data.
What can be used for clustering categorical data?
k-modes
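As a rough sketch of why k-modes suits categorical data: it swaps the mean for the mode (most frequent value) and swaps numeric distance for a count of mismatched attributes. The records and the helper names mismatches and mode_of below are hypothetical, chosen only for illustration.

```python
from collections import Counter

def mismatches(a, b):
    """Dissimilarity between two categorical records: number of differing attributes."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Cluster center for categorical data: the most common value in each column."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# Hypothetical records: (color, bike type)
records = [("red", "road"), ("red", "mountain"), ("blue", "road")]
center = mode_of(records)
print(center)
print(mismatches(("blue", "mountain"), center))
```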
What is the optimum number of clusters?
Somewhere between 1 and N, where N is the number of observations.
What is cluster size?
the number of members within a cluster
What is cluster density?
A comparison of how many customers are in one cluster versus another.
What is cluster distance?
Indicates how dissimilar customers in one cluster are from customers in another cluster.
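Cluster size and cluster distance can be computed directly once the clusters are known. The sketch below assumes two hypothetical clusters of 2-D points and measures distance between cluster centroids, which is one common choice among several.

```python
from math import dist

# Hypothetical clusters of customers described by two numerical attributes.
clusters = {
    "A": [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)],
    "B": [(8.0, 8.0), (9.0, 8.0)],
}

def centroid(members):
    """Mean position of a cluster's members."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))

# Cluster size: the number of members within each cluster.
sizes = {name: len(members) for name, members in clusters.items()}

# Cluster distance (assumed here to be centroid-to-centroid distance).
centers = {name: centroid(members) for name, members in clusters.items()}
distance_ab = dist(centers["A"], centers["B"])

print(sizes)
print(round(distance_ab, 2))
```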
What is association analysis, or “affinity analysis”?
A type of unsupervised descriptive data model. It is used to find the hidden connections between sets of items that frequently occur together.
“{PIZZA-BY-THE-SLICE} → {SOFT DRINK 20 OZ}” is an example of what?
An association rule, as produced by association analysis.
Businesses can use these association rules to do what?
To promote and recommend items that often occur together.
What is an example of an {antecedent(s)} → {consequent(s)} rule?
{Knee Pads} → {Off Road Helmet}
{Deluxe Touring Bike Black, Elbow Pads} → {Off Road Helmet}
What is support in association rules?
Support for a rule is the fraction or percentage of all transactions that contain every item in the rule (antecedents and consequents together).
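A small worked example of support, using a made-up list of five transactions (the data and the function name support are assumptions for illustration):

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"Knee Pads", "Off Road Helmet"},
    {"Knee Pads", "Off Road Helmet", "Elbow Pads"},
    {"Knee Pads"},
    {"Off Road Helmet"},
    {"Elbow Pads"},
]

def support(items):
    """Fraction of transactions that contain all of the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

# Rule {Knee Pads} -> {Off Road Helmet}: count transactions containing both items.
rule_support = support({"Knee Pads", "Off Road Helmet"})
print(rule_support)  # 2 of 5 transactions -> 0.4
```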
Excessive numbers of rules represent an ______ to effective analysis
obstacle
How is confidence described?
Confidence is the probability that the consequent items appear in transactions that contain the antecedent items. It is a conditional probability.
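Confidence can be illustrated as a ratio of counts: of the transactions that contain the antecedent, how many also contain the consequent? The transactions and the function name confidence below are hypothetical.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"Knee Pads", "Off Road Helmet"},
    {"Knee Pads", "Off Road Helmet", "Elbow Pads"},
    {"Knee Pads"},
    {"Off Road Helmet"},
    {"Elbow Pads"},
]

def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated from transaction counts."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# Knee Pads appear in 3 transactions; 2 of those also contain the helmet.
conf = confidence({"Knee Pads"}, {"Off Road Helmet"})
print(round(conf, 2))  # 0.67
```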
What measure compares a rule's confidence with how often the consequent occurs regardless of the antecedent?
Lift
What is lift?
Lift is the measure of how accurately a rule depicts affinity or association compared to the random (coincidental) co-occurrence of the items.
Lift = _______________________?
Lift = confidence of a rule / support of the consequent
Lift values greater than _______ imply that the antecedent and consequent are associated (correlated) with each other.
1
Lift values less than 1 imply what?
The two items are negatively correlated.
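The lift formula (confidence of the rule divided by support of the consequent) can be checked on a small made-up data set. The transactions and helper names below are assumptions for illustration only.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"Knee Pads", "Off Road Helmet"},
    {"Knee Pads", "Off Road Helmet", "Elbow Pads"},
    {"Knee Pads"},
    {"Off Road Helmet"},
    {"Elbow Pads"},
]

def support(items):
    """Fraction of transactions that contain all of the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent):
    """Lift = confidence of the rule / support of the consequent."""
    rule_confidence = support(antecedent | consequent) / support(antecedent)
    return rule_confidence / support(consequent)

value = lift({"Knee Pads"}, {"Off Road Helmet"})
print(round(value, 2))  # 1.11, greater than 1, so the items are positively associated
```

A result below 1 from the same function would indicate that the two items appear together less often than chance would suggest.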
What does the Apriori algorithm do?
It makes association analysis straightforward and, by default, trims out infrequent rules (those that fall below a minimum support threshold).
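The trimming step can be sketched as a level-wise search: keep only itemsets that meet a minimum support, and build larger candidates only from the survivors. This is a simplified illustration of the Apriori idea, not a full implementation; the data, the 0.4 threshold, and the function name frequent_itemsets are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result, k = {}, 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Build k-item candidates from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: drop a candidate if any (k-1)-subset is infrequent.
        survivors = set(current)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if support(c) >= min_support]
    return result

# Hypothetical transactions.
transactions = [
    {"Knee Pads", "Off Road Helmet"},
    {"Knee Pads", "Off Road Helmet", "Elbow Pads"},
    {"Knee Pads"},
    {"Off Road Helmet"},
    {"Elbow Pads"},
]
result = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in result.items():
    print(sorted(itemset), s)
```

Rules would then be generated only from the itemsets that survive, which keeps the number of rules manageable.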