B04 Association Rules Flashcards

Exam Prep

1
Q

__________ are involved with summarizing or grouping data in new and interesting ways. In these types of models, no single feature is more important than any other. The process of training a __________ is known as __________.

A

Descriptive Models

Descriptive Model

Unsupervised Learning

2
Q

The application of machine learning to customer retail transactional data in order to detect patterns in purchasing behavior.

A

Market Basket Analysis

3
Q

The result of a market basket analysis is a collection of __________ that specify patterns found in the relationships among items or itemsets.

A

Association Rules

4
Q

{beer, milk} → {diaper} (LHS → RHS). Define the LHS and RHS.

A

LHS: the condition that needs to be met to trigger the rule.

RHS: the expected result of meeting that condition.

5
Q

Association Rules imply causality between items. True or False?

A

False

6
Q

Kinds of Association Rules 1.______ 2.______ 3.______

A

  1. Actionable
  2. Trivial
  3. Inexplicable

7
Q

Define Itemset

A

A collection of one or more items, e.g. {beer, bread, diaper, milk, eggs}.

8
Q

Define Transaction

A

The itemset for an observation. e.g. T(3) = {milk, diaper, beer, coke}

9
Q

Define Support Count (σ)

A

Frequency of an itemset. e.g. σ({beer, milk, diaper}) = 2

10
Q

Define Support (s)

A

How frequently a rule occurs in the dataset.

Support is the fraction of transactions containing an itemset.

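The support-count and support definitions from the cards above can be sketched in plain Python. The transactions below are a hypothetical toy dataset (chosen so that σ({beer, milk, diaper}) = 2, matching the earlier card); the function names are illustrative.

```python
# Hypothetical toy transactions (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},   # T(3) from the earlier card
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]

def support_count(itemset, transactions):
    """Support count (sigma): number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Support (s): fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"beer", "milk", "diaper"}, transactions))  # 2
print(support({"beer", "milk", "diaper"}, transactions))        # 2/5 = 0.4
```

Note that `itemset <= t` is Python's subset test on sets, which directly encodes "the transaction contains the itemset."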
11
Q

Define Confidence

A

The predictive power or accuracy of a rule.

Confidence is the support of the itemset containing both X and Y divided by the support of the itemset containing only X.

12
Q

Define Lift

A

The increased likelihood that a rule occurs in a dataset relative to its typical rate of occurrence.

Lift is the confidence of the rule X → Y divided by the support of the itemset containing only Y.

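Confidence and lift follow directly from support, as the two cards above define them. A minimal sketch, assuming the same hypothetical toy transactions (the dataset and function names are illustrative, not from the cards):

```python
# Hypothetical toy transactions (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]

def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# For the rule {beer, milk} -> {diaper} on this toy data:
print(confidence({"beer", "milk"}, {"diaper"}, transactions))  # 1.0
print(lift({"beer", "milk"}, {"diaper"}, transactions))        # 1.0 / 0.8 = 1.25
```

A lift above 1 (here 1.25) means the rule fires more often than would be expected if the LHS and RHS were independent; a lift of exactly 1 indicates no association.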
13
Q

Rules with ___ support may occur simply by chance and are typically not actionable

A

Low

14
Q

The Apriori Principle is based on the __________ property of support.

A

anti-monotone

15
Q

The support of an itemset _____ (always, sometimes, never) exceeds that of its subsets.

A

Never

16
Q

How do you reduce the computational complexity of Association rules?

A
  1. Apriori Algorithm - pruning the itemset lattice
  2. FP-Growth - reduce the number of comparisons by using advanced data structures to store the candidate itemsets or to compress the dataset
17
Q

Approaches to Association Rules include:

A
  1. Brute-force approach - generate every itemset (computationally expensive)
  2. Frequent itemset generation - generate every itemset whose support is greater than an established minimum (computationally expensive)
  3. Apriori approach
  4. FP-Growth
18
Q

How does the Apriori approach work?

A

Pruning - if an itemset is identified as infrequent, then its supersets need not be generated or tested.
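This pruning rests on the anti-monotone property from the earlier cards: an itemset's support never exceeds that of its subsets. A minimal sketch of level-wise Apriori frequent-itemset generation, assuming a hypothetical toy dataset and a minimum-support threshold of 0.6 (both illustrative):

```python
from itertools import combinations

# Hypothetical toy transactions (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori_frequent_itemsets(transactions, minsup):
    items = sorted(set().union(*transactions))
    frequent = []
    # Level 1: frequent individual items.
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= minsup]
    while level:
        frequent.extend(level)
        level_set = set(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset (anti-monotonicity):
        # such supersets are never even counted against the data.
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == len(a) + 1 and all(
                    frozenset(sub) in level_set
                    for sub in combinations(union, len(a))
                ):
                    candidates.add(union)
        # Keep only candidates that survive the support threshold.
        level = [c for c in candidates if support(c, transactions) >= minsup]
    return frequent
```

On this toy data with `minsup = 0.6`, the pair {bread, beer} falls below the threshold, so the triples {bread, diaper, beer} and {bread, milk, beer} are pruned without ever being counted.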

19
Q

Advantages of Association Rules?

A
  1. Capable of working with large amounts of transactional data.
  2. Rules are easy to understand.
  3. Useful for “data mining” and discovering unexpected patterns in data.
20
Q

Weaknesses of Association Rules?

A
  • Not very useful for small data sets.
  • Separating true insight from common sense requires some effort.
  • Easy to draw misleading conclusions from random patterns.