Chapter 9: Unsupervised Machine Learning Flashcards

Question 1

Q

What is a popular algorithm for identifying clusters?

Question 2

Q

What is k-means?

Answer

A

K-means is a clustering algorithm that partitions observations into k clusters, where k is the number of clusters (groupings) you choose.
Note that k-means works only for numerical data; that is, every attribute being considered must have a numerical value.
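
As a rough illustration of how k-means groups numerical data, here is a minimal pure-Python sketch of the usual assign-then-update loop. The function name `kmeans`, the `customers` data, and the fixed iteration count are all assumptions made for this example, not part of the chapter.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 79)]
centroids, labels = kmeans(customers, k=2)
```

Every attribute here is a number, which is what lets the sketch compute distances and means; categorical attributes would need a different approach.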

Question 3

Q

Is k-means acceptable to use for master data?

Answer

A

No. K-means should be used only for transactional data.

Question 4

Q

What can be used for clustering categorical data?

Question 5

Q

What is the optimum number of clusters?

Answer

A

The optimum lies somewhere between 1 and N, where N is the number of observations being clustered.

Question 6

Q

What is cluster size?

Answer

A

The number of members within a cluster.

Question 7

Q

What is cluster density?

Answer

A

How closely packed the customers in a cluster are; a denser cluster has more customers concentrated together than another cluster.

Question 8

Q

What is cluster distance?

Answer

A

Indicates how dissimilar customers in one cluster are from customers in another cluster.
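
The cluster measures in the last few cards can be computed from any set of labeled points. The sketch below is only one possible reading: the `points` and `labels` data are hypothetical, the "spread" figure is an assumed proxy for density, and centroid-to-centroid distance is an assumed proxy for cluster distance.

```python
import math
from collections import Counter

# Hypothetical output of a clustering run: 2-D points and their cluster labels.
points = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85)]
labels = [0, 0, 0, 1, 1]

def centroid(members):
    """Mean of each dimension across a cluster's members."""
    return tuple(sum(dim) / len(members) for dim in zip(*members))

# Cluster size: number of members per cluster.
sizes = Counter(labels)

# Group the points by label, then compute one centroid per cluster.
clusters = {}
for p, lab in zip(points, labels):
    clusters.setdefault(lab, []).append(p)
centroids = {lab: centroid(members) for lab, members in clusters.items()}

# Assumed density proxy: mean distance from members to their centroid
# (a smaller value means the members are more tightly packed).
spread = {
    lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
    for lab, members in clusters.items()
}

# Assumed distance measure: Euclidean distance between the two centroids.
cluster_distance = math.dist(centroids[0], centroids[1])
```

In a well-separated clustering, `cluster_distance` is large compared with the spread inside each cluster.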

Question 9

Q

What is association analysis, or “affinity analysis”?

Answer

A

A type of unsupervised descriptive data model. It is used to find the hidden connections between sets of items that frequently occur together.

Question 10

Q

“{PIZZA-BY-THE-SLICE} → {SOFT DRINK 20 OZ}” is an example of what?

Answer

A

An association rule, as produced by association analysis.

Question 11

Q

Businesses can use these association rules to do what?

Answer

A

To promote and recommend items that often occur together.

Question 12

Q

What is an example of an {antecedent(s)} → {consequent(s)} rule?

Answer

A

{Knee Pads} → {Off Road Helmet}

{Deluxe Touring Bike Black, Elbow Pads} → {Off Road Helmet}

Question 13

Q

What is the support of an association rule?

Answer

A

Support for a rule is the fraction or percentage of transactions that contain all of the items within the rule.
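
A small sketch of the support calculation, assuming each transaction is stored as a Python set of item names. The `transactions` data and the `support` helper are made up for illustration.

```python
# Hypothetical transactions; each is the set of items in one basket.
transactions = [
    {"pizza", "soft drink"},
    {"pizza", "soft drink", "chips"},
    {"pizza"},
    {"chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Support for the rule {pizza} -> {soft drink}: both items must be present.
rule_support = support({"pizza", "soft drink"}, transactions)
```

Two of the four baskets contain both items, so `rule_support` is 0.5.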

Question 14

Q

Excessive numbers of rules represent an ______ to effective analysis

Question 15

Q

How is confidence described?

Answer

A

Confidence is the probability that the consequent items appear in transactions that contain the antecedent items. It is a conditional probability.

Question 16

Q

What is used to measure the absence of the antecedent?

Answer

A

Lift

Question 17

Q

What is lift?

Answer

A

Lift measures how much more often the items in a rule occur together than would be expected from random (coincidental) co-occurrence.

Question 18

Q

Lift = _______________________?

Answer

A

Lift = confidence of a rule / support of the consequent
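
To see the formula at work, here is a minimal sketch that computes confidence and lift for one rule. The `transactions` data and the helper names `support`, `confidence`, and `lift` are assumptions for this example only.

```python
# Hypothetical transactions; each is the set of items in one basket.
transactions = [
    {"pizza", "soft drink"},
    {"pizza", "soft drink", "chips"},
    {"pizza"},
    {"chips"},
    {"soft drink", "chips"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Conditional probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

rule_confidence = confidence({"pizza"}, {"soft drink"})
rule_lift = lift({"pizza"}, {"soft drink"})
```

With this data the confidence of {pizza} → {soft drink} is 2/3 and the support of {soft drink} is 3/5, so the lift is about 1.11, slightly above the value expected if the two items were unrelated.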

Question 19

Q

Lift values greater than _______ imply that the antecedent and consequent are associated (correlated) with each other.

Answer

A

1

Question 20

Q

Lift values less than 1 imply what?

Answer

A

The two items are negatively correlated.

Question 21

Q

What does the Apriori algorithm do?

Answer

A

It makes association analysis tractable by pruning infrequent itemsets (those below a minimum support threshold), so rules built from them are never generated.
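
The pruning idea can be sketched in a few lines of pure Python. This is a simplified illustration of the frequent-itemset stage only, not a production implementation; the function name `apriori`, the `min_support` parameter, and the sample transactions are assumptions for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions using item names from the earlier cards.
transactions = [
    {"knee pads", "off road helmet"},
    {"knee pads", "off road helmet", "elbow pads"},
    {"knee pads", "elbow pads"},
    {"off road helmet"},
]
frequent = apriori(transactions, min_support=0.5)
```

Because {off road helmet, elbow pads} falls below the threshold, the three-item set that contains it is discarded without its support ever being counted.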

(21 cards)