Chapter 3 Flashcards

Question

What is the occurrence frequency or support count of an itemset?

Answer 1

The number of transactions that contain the itemset
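As a minimal sketch of this definition, the support count can be computed by checking each transaction for the itemset. The transactions below are invented toy data, and `support_count` is a hypothetical helper name, not something from the course material.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


count = support_count({"bread", "milk"}, transactions)
# Support expressed as a fraction of all transactions.
fraction = count / len(transactions)
print(count, fraction)
```

Dividing the count by the number of transactions gives the relative support, which is the form used when a threshold is stated as a fraction of the baskets.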

Answer 2

A set of items that appears in at least fraction s of the baskets, where s, the support, is some predefined, chosen constant.

Answer 3

The set of frequent k-itemsets

Answer 4

That of mining frequent itemsets.

Answer 5

- Find all frequent itemsets. Each of these itemsets will occur at least as frequently as a predetermined minimum support count, min_sup.
- Generate strong association rules from the frequent itemsets. Of all the association rules that could be generated from the frequent itemsets, only those above a minimum level of confidence are retained as strong association rules.

Answer 6

One that satisfies minimum support and minimum confidence.

Answer 7

Generating strong association rules is much less costly than finding all frequent itemsets. Hence, the overall performance of mining association rules is determined by the first step.

Answer 8

There are three criteria:
- Based on levels of abstraction
- Based on the number of data dimensions involved in the rule
- Based on the type of values handled in the rule

Answer 9

The items bought are referenced at different levels of abstraction (e.g. computer is a higher-level abstraction than laptop). The mined rule set is then said to consist of multi-level association rules.

Answer 10

If the items or variables in an association rule reference only one dimension, then it is a single-dimensional association rule. A rule that references two or more dimensions, such as age and income, is a multi-dimensional association rule.

Answer 11

If a rule involves associations between the presence or absence of items, it is a Boolean association rule. If the rule involves associations between quantitative items or variables, these are discretised and the rule is referred to as a quantitative association rule.

Answer 12

Finding all frequent itemsets

Answer 13

In this algorithm, one considers all possible subsets of I (items in the shop) and in each case, the support is calculated. Only subsets above the minimum support threshold are considered to be frequent itemsets.

Answer 14

This algorithm requires the evaluation of all subsets of the items I, which is a large number. There are 2^m subsets of I, and calculating the support of every subset takes on the order of 2^m * n operations, so the computational effort grows exponentially with the number of items m.
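The exhaustive approach can be sketched as follows, to make the exponential growth concrete. The toy transactions and the function name `brute_force_frequent` are assumptions made for illustration, and the threshold is given as an absolute count rather than a fraction.

```python
from itertools import combinations

# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def brute_force_frequent(transactions, min_count):
    """Test every non-empty subset of the item universe against min_count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    checked = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            checked += 1
            # One pass over all transactions per candidate subset.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                frequent[candidate] = count
    return frequent, checked


frequent, checked = brute_force_frequent(transactions, min_count=2)
print(checked)   # number of subsets examined: 2^m - 1 for m items
print(frequent)
```

With four distinct items the loop already examines 15 subsets, and each additional item doubles that figure, which is why this approach does not scale.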

Answer 15

The Apriori algorithm, which utilises the Apriori property.

Answer 16

That if a (k-1)-itemset A is not a frequent itemset, then any superset B of A (i.e. A is a subset of B) will also not be a frequent itemset. It also indicates that all non-empty subsets of a frequent itemset must also be frequent.
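The property follows from the fact that a transaction containing an itemset also contains every subset of it, so a subset can never have a lower support count. A small check on invented toy data (the helper names are hypothetical) illustrates this:

```python
from itertools import combinations

# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def subsets_at_least_as_frequent(itemset, transactions):
    """True if every non-empty subset has support >= that of the itemset."""
    base = support_count(itemset, transactions)
    return all(
        support_count(subset, transactions) >= base
        for r in range(1, len(itemset) + 1)
        for subset in combinations(sorted(itemset), r)
    )


print(subsets_at_least_as_frequent({"bread", "milk", "jam"}, transactions))
```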

Answer 17

An iterative approach, known as level-wise search, where frequent (k-1)-itemsets are used to obtain potential (or candidate) frequent (k)-itemsets.

Answer 18

- Given a specified support threshold s, the first pass finds the items that appear in at least fraction s of the baskets. These are the frequent 1-itemsets, called L1.
- Pairs of items in L1 become the candidate pairs, C2, for the second pass. The pairs in C2 whose support reaches s are the frequent 2-itemsets, called L2. Candidate pairs that do not meet the minimum support requirement are pruned.
- The itemsets in L2 are then used to create C3, which in turn yields L3, and so on.
- This iterative process continues until no more frequent itemsets can be found.
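The passes described above can be sketched as a short level-wise loop. This is an illustrative version under stated assumptions, not a reference implementation: the threshold is an absolute count (`min_count`) rather than a fraction, candidates are built as unions of two frequent (k-1)-itemsets instead of by a prefix-based join, and the transactions are invented toy data.

```python
from itertools import combinations

# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    # First pass: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Build Ck from L(k-1): keep unions of size k whose (k-1)-subsets
        # are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Scan the transactions to count each candidate, then keep Lk.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


result = apriori(transactions, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, since no larger itemset can then be frequent.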

Answer 19

L(k-1) is used to define Lk for k >= 2

Answer 20

The join and prune actions

Answer 21

- To find Lk, a set of candidate k-itemsets is generated by joining L(k-1) with itself.
- This set of candidates is denoted Ck.
- Members of L(k-1) are joinable if their first (k-2) items are in common.
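A minimal sketch of the join, assuming each itemset is stored as a sorted tuple so that "first (k-2) items" is well defined. The function name `join` and the example L2 are made up for illustration.

```python
def join(prev_frequent, k):
    """Join L(k-1) with itself; itemsets are sorted tuples of length k-1."""
    prev = sorted(prev_frequent)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Joinable when the first k-2 items agree; the last items differ.
            if a[:k - 2] == b[:k - 2]:
                candidates.append(a + (b[-1],))
    return candidates


# Hypothetical L2, invented for illustration only.
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
C3 = join(L2, 3)
print(C3)
```

Because the inputs are sorted and share a prefix, appending the last item of the second itemset keeps each candidate sorted and avoids generating the same candidate twice.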

Answer 22

- Ck is a superset of Lk: its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck.
- A scan of the database to determine the count of each candidate in Ck results in the determination of Lk.
- Some members of Ck can be removed before the scan: if any (k-1)-subset of a member of Ck is infrequent (that is, not in L(k-1)), then that member cannot be frequent and can be eliminated straight away.
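The subset check used for pruning can be sketched like this, again assuming sorted tuples. The function name `prune` and the example sets are invented for illustration.

```python
from itertools import combinations


def prune(candidates, prev_frequent):
    """Drop candidates that have any (k-1)-subset missing from L(k-1)."""
    prev = set(prev_frequent)
    return [
        c for c in candidates
        if all(s in prev for s in combinations(c, len(c) - 1))
    ]


# Hypothetical L2 and C3, invented for illustration only.
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
C3 = [("A", "B", "C"), ("B", "C", "D")]
print(prune(C3, L2))
```

Here ("B", "C", "D") is removed without touching the database, because its subset ("C", "D") is not in L2.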

Answer 23

When Lk = Null

Answer 24

Generate strong association rules from them. Strong association rules satisfy minimum support and minimum confidence.
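As a rough sketch of this step, each frequent itemset can be split into an antecedent and a consequent, keeping the rule when its confidence reaches the threshold. Confidence is taken here in its usual form, support(A and B together) divided by support(A). The support counts and the name `strong_rules` are invented for illustration.

```python
from itertools import combinations


def strong_rules(frequent, min_conf):
    """frequent maps every frequent itemset (frozenset) to its support count."""
    rules = []
    for itemset, count in frequent.items():
        # Every non-empty proper subset can serve as an antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = count / frequent[a]
                if confidence >= min_conf:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical support counts, invented for illustration only.
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"jam"}): 2,
    frozenset({"bread", "milk"}): 3,
}
rules = strong_rules(frequent, min_conf=0.8)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), confidence)
```

The lookup `frequent[a]` relies on every subset of a frequent itemset also being frequent, so its count is already available and no further database scan is needed.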

Answer 25

[See flashcard]

Answer 26

The rules are generated from frequent itemsets.

Answer 27

- The algorithm is very scalable, i.e. capable of working with large amounts of transactional data
- Results in rules that are very easy to understand
- Useful for data mining and discovering unexpected knowledge in databases

Answer 28

- Not very helpful for small datasets
- Requires effort to separate the true insight from common sense
- Easy to draw spurious conclusions from random patterns

(53 cards)