Quiz 3 Flashcards

1
Q
  1. How does the A-priori algorithm work? In particular, how does A-priori generate candidate itemsets and prune non-frequent itemsets? You can explain with an example or with the algorithm.
A

In the market-basket (shelf-management) setting we look at the items customers purchase together and want the itemsets that appear in sufficiently many baskets.

A-Priori steps (see the sketch below):

Pass 1:
-Count the occurrences of each individual item.
-Record the items whose count meets the support threshold; these are the frequent items (L1).

Pass 2:
-Generate candidate pairs (C2) only from the frequent items and count their occurrences.
-Prune the non-frequent pairs, i.e. those below the support threshold; the survivors are the frequent pairs (L2).

The same pattern repeats for larger itemsets: the frequent k-itemsets (Lk) generate the candidates C(k+1), which are counted and pruned to give L(k+1).
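
A minimal Python sketch of the two passes described above, assuming a small hypothetical set of baskets and a support threshold of 2 (both invented for illustration, not taken from the card):

```python
from collections import Counter
from itertools import combinations

# Hypothetical example baskets and support threshold (for illustration only).
baskets = [
    {"milk", "bread", "diapers"},
    {"milk", "diapers"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]
support = 2

# Pass 1: count individual items and keep the frequent ones (L1).
item_counts = Counter(item for basket in baskets for item in basket)
L1 = {item for item, count in item_counts.items() if count >= support}

# Pass 2: candidate pairs (C2) are built only from frequent items,
# then counted and pruned to get the frequent pairs (L2).
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket & L1), 2):
        pair_counts[pair] += 1
L2 = {pair for pair, count in pair_counts.items() if count >= support}

print(L1)  # frequent items
print(L2)  # frequent pairs
```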

2
Q
  1. How does the Park-Chen-Yu algorithm improve the A-priori algorithm? Explain your answer. [Hint: Explain how PCY discards non-frequent itemsets.]
A

PCY improves A-Priori by using the memory that sits idle during Pass 1 to hash pairs into buckets; pairs that hash only to infrequent buckets can be discarded as candidates, so far fewer pair counts must be kept in Pass 2 (see the sketch below).

Pass 1:
-Count the individual items.
-Create a hash table of buckets; hash every pair in every basket to a bucket and increment that bucket's count.

Between the passes:
-Convert the bucket counts to a bit-vector: 1 if the bucket count meets the support threshold (a frequent bucket), 0 otherwise.

Pass 2:
-Count only the pairs whose items are both frequent and that hash to a frequent bucket; all other pairs are discarded as candidates.

Later passes:
-Generate larger candidate itemsets (e.g. triples) from the frequent pairs and prune the non-frequent ones, as in A-Priori.
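
A minimal Python sketch of PCY along the same lines; the baskets, support threshold, number of buckets, and hash function are hypothetical choices for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical example baskets and support threshold (for illustration only).
baskets = [
    {"milk", "bread", "diapers"},
    {"milk", "diapers"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]
support = 2
num_buckets = 11  # arbitrary small value, just for illustration

def bucket(pair):
    # Hash a pair of items to one of num_buckets buckets.
    return hash(pair) % num_buckets

# Pass 1: count items AND hash every pair in every basket to a bucket.
item_counts = Counter()
bucket_counts = [0] * num_buckets
for basket in baskets:
    item_counts.update(basket)
    for pair in combinations(sorted(basket), 2):
        bucket_counts[bucket(pair)] += 1

frequent_items = {i for i, c in item_counts.items() if c >= support}
# Between passes: summarize bucket counts as a bit-vector
# (here a list of bools) marking the frequent buckets.
frequent_bucket = [c >= support for c in bucket_counts]

# Pass 2: count a pair only if both items are frequent AND it hashes
# to a frequent bucket; everything else is discarded as a candidate.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket & frequent_items), 2):
        if frequent_bucket[bucket(pair)]:
            pair_counts[pair] += 1

L2 = {pair for pair, count in pair_counts.items() if count >= support}
print(L2)  # frequent pairs
```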

3
Q
  1. Consider a transactional database where milk and diapers are bought together 25 times. If milk alone (the only item in a basket/receipt) is bought at least 5 times, what is the least number of transactions in the database?
    A. 15
    B. 25
    C. 30
    D. 35
A

C. 30

The 25 baskets containing both milk and diapers are disjoint from the at least 5 baskets containing milk alone, so the database must contain at least 25 + 5 = 30 transactions.

4
Q
  1. What is the main goal of the market-basket model for frequent itemset mining in supermarket shelf management?
    a. Maximize the number of items on the shelf
    b. Identify items that are bought together by sufficiently many customers
    c. Minimize the sales data collected with barcode scanners
    d. Reduce the variety of products in the supermarket
A

b. Identify items that are bought together by sufficiently many customers

5
Q
  1. What is the key concept exploited by the Apriori algorithm to efficiently mine frequent itemsets?
    A. Association rule generation
    B. Monotonicity
    C. Support count pruning
    D. Non-binary data handling
A

B. Monotonicity: if an itemset is frequent, then every subset of it is also frequent, so candidate itemsets need only be generated from smaller itemsets already known to be frequent.

6
Q
  1. Which data structure is used for storing itemset counts in the PCY algorithm?
    A. Hash table
    B. Priority Queues
    C. Stack
    D. Double array (Matrix)
A

A. Hash table

7
Q
  1. What is the main memory requirement for the A-Priori algorithm?
    A. Proportional to the number of frequent items
    B. Proportional to the square of frequent items
    C. Proportional to the size of the dataset
    D. Proportional to the number of baskets
A

B. Proportional to the square of frequent items: the memory bottleneck is Pass 2, which must hold a count for every candidate pair of frequent items (roughly m^2/2 counts for m frequent items).

8
Q
  1. True | False The Park-Chen-Yu (PCY) algorithm can eliminate potential candidate pairs even if the candidate pair consists of two frequent items.
A

True. Even if both items of a pair are individually frequent, PCY eliminates the pair if it hashes to a bucket whose total count is below the support threshold.

9
Q
  1. True | False If item i does not appear in s baskets, then no pair including i can appear in s baskets.
A

True

10
Q
  1. True | False The A-priori algorithm is more efficient at mining frequent itemsets from large datasets.
A

True

11
Q
  1. True | False An item can be part of a frequent itemset even if it is individually not a frequent item.
A

False. By monotonicity, every item of a frequent itemset must itself be a frequent item.

12
Q
  1. True | False The PCY algorithm exploits idle memory on the first pass to reduce memory required in the second pass.
A

True

13
Q
  1. True | False Pairs that hash to a bucket with total count less than support threshold can be eliminated as candidates in PCY algorithm.
A

True
