Data Mining Flashcards

Question

Give an example of where association is used.

Answer 1

Supermarket shelf management. Goal: identify items that are bought together by sufficiently many customers. Method: process all the transaction data collected with barcode scanners to find dependencies among items.

Answer 2

minimum number of instances

Answer 3

Finds the one attribute that makes the fewest prediction errors. Generates rules that include only one attribute plus the class.

Answer 4

For each attribute A:
  For each value V of that attribute, create a rule:
    1. count how often each class appears
    2. find the most frequent class, c
    3. make a rule "if A=V then C=c"
  Calculate the error rate of the rules.
Pick the attribute whose rules produce the lowest error rate.
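The procedure above (commonly known as 1R) can be sketched in a few lines of Python. This is only an illustration: the function name `one_r`, the dict-per-row data layout, and the toy weather rows are assumptions made for the example, not part of the card.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (best_attribute, rules, errors) for a list of dict rows."""
    best = None
    for attr in attributes:
        # 1. Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # 2-3. For each value V, make the rule "if attr = V then class = c",
        #      where c is the most frequent class for V.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the instances that do not belong to the predicted class.
        # The number of rows is the same for every attribute, so comparing
        # error counts is equivalent to comparing error rates.
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical toy data, invented for this sketch.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
attr, rules, errors = one_r(rows, ["outlook", "windy"], "play")
print(attr, errors)
```

On this toy data the rules built on `outlook` misclassify one row and those built on `windy` misclassify two, so `outlook` is picked.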

Answer 5

Finds frequent itemsets using candidate generation. Finds associations between data items. Generates association rules that involve several attributes and do not focus on any particular attribute.

Answer 6

1. Set a minimum coverage; 2. Find all one-attribute associations that satisfy the minimum coverage; 3. Find all two-attribute associations that satisfy the minimum coverage; 4. Continue with larger associations until either a specified maximum number of attributes is reached or no further associations satisfy the minimum coverage; 5. Set a minimum accuracy (confidence); 6. Generate rules from each association that satisfy the minimum accuracy.
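A minimal sketch of these six steps, assuming transactions are given as sets of item names. The names `frequent_itemsets`, `association_rules`, `min_coverage`, `min_accuracy`, and `max_size`, and the toy baskets, are hypothetical choices for this example; it favours readability over the pruning an optimized implementation would use.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_coverage, max_size=3):
    """Level-wise search for item sets found in at least min_coverage transactions."""
    transactions = [frozenset(t) for t in transactions]

    def coverage(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 2: one-item associations that satisfy the minimum coverage.
    items = sorted({i for t in transactions for i in t})
    level = [s for s in (frozenset([i]) for i in items) if coverage(s) >= min_coverage]
    frequent = {}
    size = 1
    # Steps 3-4: grow by one item per pass until the maximum size is reached
    # or no candidate satisfies the minimum coverage.
    while level and size <= max_size:
        for s in level:
            frequent[s] = coverage(s)
        candidates = {a | b for a in level for b in level if len(a | b) == size + 1}
        level = [c for c in candidates if coverage(c) >= min_coverage]
        size += 1
    return frequent

def association_rules(frequent, min_accuracy):
    """Steps 5-6: split each item set into IF/THEN parts and keep accurate rules."""
    found = []
    for itemset, cov in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # accuracy = coverage(whole set) / coverage(IF part)
                accuracy = cov / frequent[lhs]
                if accuracy >= min_accuracy:
                    found.append((lhs, itemset - lhs, accuracy))
    return found

# Hypothetical baskets, invented for this sketch.
baskets = [
    {"nappies", "beer", "milk"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "bread"},
    {"nappies", "beer", "bread"},
]
frequent = frequent_itemsets(baskets, min_coverage=2)
found = association_rules(frequent, min_accuracy=0.75)
for lhs, rhs, accuracy in found:
    print(sorted(lhs), "->", sorted(rhs), accuracy)
```

Here {nappies, beer} appears in three of the five baskets, and nappies alone appears in four, so the rule "IF nappies THEN beer" has accuracy 3/4.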

Answer 7

Used to generate a decision tree from a dataset. Several rules can be generated, and it accepts only nominal values.

Answer 8

Classify examples by sorting them down the tree from the root node to some leaf node. The learned function is represented by the tree. Each node in the tree tests some attribute of an instance. Branches represent values of attributes. Follow the tree from the root to a leaf for the output value.
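The root-to-leaf walk can be illustrated with a small sketch. The nested-dict tree layout (keys `attribute` and `branches`, with plain strings as leaves) and the example tree are assumptions made here for illustration only.

```python
def classify(tree, instance):
    """Follow branches from the root until a leaf (a plain class label) is reached."""
    node = tree
    while isinstance(node, dict):
        # Each internal node tests one attribute of the instance...
        value = instance[node["attribute"]]
        # ...and the branch for that attribute value leads to the next node.
        node = node["branches"][value]
    return node

# Hypothetical hand-built tree, invented for this sketch.
tree = {
    "attribute": "outlook",
    "branches": {
        "overcast": "yes",
        "rainy": "no",
        "sunny": {"attribute": "windy", "branches": {"yes": "no", "no": "yes"}},
    },
}
label = classify(tree, {"outlook": "sunny", "windy": "no"})
print(label)
```

The instance is tested on `outlook` at the root, follows the `sunny` branch, is tested on `windy`, and ends at the leaf `yes`.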

Answer 9

Entropy is a measure of 'degree of doubt'. The higher it is, the more doubt there is about the possible conclusions. The attribute whose values leave the lowest entropy is the most useful determiner.
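To make the idea concrete, here is a small sketch that computes entropy in bits (the standard Shannon formula, sum of p * log2(1/p) over the classes) and the weighted entropy left after splitting on an attribute. The function names `entropy` and `split_entropy` and the toy rows are assumptions for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def split_entropy(rows, attribute, target):
    """Weighted average entropy of the target after grouping rows by an attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

# Hypothetical toy data, invented for this sketch.
rows = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
before = entropy([row["play"] for row in rows])
by_outlook = split_entropy(rows, "outlook", "play")
by_windy = split_entropy(rows, "windy", "play")
print(before, by_outlook, by_windy)
```

In this toy data `outlook` removes all doubt about the class while `windy` removes none, so `outlook` would be the more useful attribute to test first.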

Answer 10

It generates a detailed decision tree. Given training data, it is always able to generate a tree. It is easily implemented. The output is easy to understand and interpret. The process is simple. Its running time increases only linearly with the complexity of the problem.

Answer 11

Wholly spurious correlations are possible, since the algorithm takes no account of any meaning that the data it works on may have. The algorithm considers just one attribute at a time. When inducing rules from large sets of examples in which there are a large number of possible outcomes, the algorithm can be very sensitive to apparently trivial changes in the set of examples. The algorithm cannot generate uncertain rules or handle uncertain data.

Answer 12

Transactions item set: {nappies, beer} | IF bought_nappies THEN bought_beer_likely

Answer 13

```
IF buy_time in December
   and cost > 500
   and type_of_item = electronics
   and location = overseas
   and ...etc...
THEN possibly_fraudulent = yes
```

Answer 14

Retail/Marketing: identify buying patterns; associations among customer demographics.
Banking: patterns of fraudulent use; identify loyal customers; spending of customer groups.
Insurance: claims analysis.
Medicine: successful therapies.

(39 cards)