Quiz 3 Flashcards

1
Q
  1. How does the A-priori algorithm work? In particular, how does A-priori generate candidate itemsets and prune non-frequent itemsets? You can explain with an example or with the algorithm.
A

In the market-basket (shelf-management) setting we look at the items customers purchase together and want the itemsets that appear in sufficiently many baskets.

A-Priori steps (see the sketch below):

Pass 1:
-Count the occurrences of each individual item.
-Record the items whose count meets the support threshold; these are the frequent items (L1).

Pass 2:
-Generate candidate pairs (C2) only from the frequent items and count their occurrences.
-Prune the non-frequent pairs, i.e. those below the support threshold; the survivors are the frequent pairs (L2).

The same pattern repeats for larger itemsets: the frequent k-itemsets (Lk) generate the candidates C(k+1), which are counted and pruned to give L(k+1).
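
A minimal Python sketch of the two passes described above, assuming a small hypothetical set of baskets and a support threshold of 2 (both invented for illustration, not taken from the card):

```python
from collections import Counter
from itertools import combinations

# Hypothetical example baskets and support threshold (for illustration only).
baskets = [
    {"milk", "bread", "diapers"},
    {"milk", "diapers"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]
support = 2

# Pass 1: count individual items and keep the frequent ones (L1).
item_counts = Counter(item for basket in baskets for item in basket)
L1 = {item for item, count in item_counts.items() if count >= support}

# Pass 2: candidate pairs (C2) are built only from frequent items,
# then counted and pruned to get the frequent pairs (L2).
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket & L1), 2):
        pair_counts[pair] += 1
L2 = {pair for pair, count in pair_counts.items() if count >= support}

print(L1)  # frequent items
print(L2)  # frequent pairs
```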

2
Q
  1. How does the Park-Chen-Yu algorithm improve the A-priori algorithm? Explain your answer. [Hint: Explain how PCY discards non-frequent itemsets.]
A

PCY improves A-Priori by using the memory that sits idle during Pass 1 to hash pairs into buckets; pairs that hash only to infrequent buckets can be discarded as candidates, so far fewer pair counts must be kept in Pass 2 (see the sketch below).

Pass 1:
-Count the individual items.
-Create a hash table of buckets; hash every pair in every basket to a bucket and increment that bucket's count.

Between the passes:
-Convert the bucket counts to a bit-vector: 1 if the bucket count meets the support threshold (a frequent bucket), 0 otherwise.

Pass 2:
-Count only the pairs whose items are both frequent and that hash to a frequent bucket; all other pairs are discarded as candidates.

Later passes:
-Generate larger candidate itemsets (e.g. triples) from the frequent pairs and prune the non-frequent ones, as in A-Priori.
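
A minimal Python sketch of PCY along the same lines; the baskets, support threshold, number of buckets, and hash function are hypothetical choices for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical example baskets and support threshold (for illustration only).
baskets = [
    {"milk", "bread", "diapers"},
    {"milk", "diapers"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]
support = 2
num_buckets = 11  # arbitrary small value, just for illustration

def bucket(pair):
    # Hash a pair of items to one of num_buckets buckets.
    return hash(pair) % num_buckets

# Pass 1: count items AND hash every pair in every basket to a bucket.
item_counts = Counter()
bucket_counts = [0] * num_buckets
for basket in baskets:
    item_counts.update(basket)
    for pair in combinations(sorted(basket), 2):
        bucket_counts[bucket(pair)] += 1

frequent_items = {i for i, c in item_counts.items() if c >= support}
# Between passes: summarize bucket counts as a bit-vector
# (here a list of bools) marking the frequent buckets.
frequent_bucket = [c >= support for c in bucket_counts]

# Pass 2: count a pair only if both items are frequent AND it hashes
# to a frequent bucket; everything else is discarded as a candidate.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket & frequent_items), 2):
        if frequent_bucket[bucket(pair)]:
            pair_counts[pair] += 1

L2 = {pair for pair, count in pair_counts.items() if count >= support}
print(L2)  # frequent pairs
```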

3
Q
  1. Consider a transactional database where milk and diapers are bought together 25 times. If milk alone (the only item in a basket/receipt) is bought at least 5 times, what is the least number of transactions in the database?
    A. 15
    B. 25
    C. 30
    D. 35
A

C. 30

The 25 baskets containing both milk and diapers are disjoint from the at least 5 baskets containing milk alone, so the database must contain at least 25 + 5 = 30 transactions.

4
Q
  1. What is the main goal of the market-basket model for frequent itemset mining in supermarket shelf management?
    a. Maximize the number of items on the shelf
    b. Identify items that are bought together by sufficiently many customers
    c. Minimize the sales data collected with barcode scanners
    d. Reduce the variety of products in the supermarket
A

b. Identify items that are bought together by sufficiently many customers

5
Q
  1. What is the key concept exploited by the Apriori algorithm to efficiently mine frequent itemsets?
    A. Association rule generation
    B. Monotonicity
    C. Support count pruning
    D. Non-binary data handling
A

B. Monotonicity: if an itemset is frequent, then every subset of it is also frequent, so candidate itemsets need only be generated from smaller itemsets already known to be frequent.

6
Q
  1. Which data structure is used for storing itemset counts in the PCY algorithm?
    A. Hash table
    B. Priority Queues
    C. Stack
    D. Double array (Matrix)
A

A. Hash table

7
Q
  1. What is the main memory requirement for the A-Priori algorithm?
    A. Proportional to the number of frequent items
    B. Proportional to the square of frequent items
    C. Proportional to the size of the dataset
    D. Proportional to the number of baskets
A

B. Proportional to the square of frequent items: the memory bottleneck is Pass 2, which must hold a count for every candidate pair of frequent items (roughly m^2/2 counts for m frequent items).

8
Q
  1. True | False The Park-Chen-Yu (PCY) algorithm can eliminate potential candidate pairs even if the candidate pair consists of two frequent items.
A

True. Even if both items of a pair are individually frequent, PCY eliminates the pair if it hashes to a bucket whose total count is below the support threshold.

9
Q
  1. True | False If item i does not appear in s baskets, then no pair including i can appear in s baskets.
A

True

10
Q
  1. True | False The A-priori algorithm is more efficient at mining frequent itemsets from large datasets.
A

True

11
Q
  1. True | False An item can be part of a frequent itemset even if it is individually not a frequent item.
A

False. By monotonicity, every item of a frequent itemset must itself be a frequent item.

12
Q
  1. True | False The PCY algorithm exploits idle memory on the first pass to reduce memory required in the second pass.
A

True

13
Q
  1. True | False Pairs that hash to a bucket with total count less than support threshold can be eliminated as candidates in PCY algorithm.
A

True
