Unit 3 - Sequence Pattern Mining Algorithm Flashcards
What is sequence pattern mining?
The search for association patterns that occur over time
e.g.
1. 01 Jan 2010: Wheat biscuits
2. 03 Jan 2010: Soy milk
3. 05 Jan 2010: Honey cereal
What is an itemset in sequence pattern mining?
The set of items bought by a customer, grouped by timestamp (e.g. all the products a customer bought in a supermarket on 31 May 08).
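A minimal sketch of that grouping, assuming purchases arrive as (customer, date, item) rows. The sample data, the year in the dates, and the `build_sequences` name are hypothetical and only illustrate the idea:

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, item).
purchases = [
    ("c1", date(2008, 5, 31), "bread"),
    ("c1", date(2008, 5, 31), "butter"),
    ("c1", date(2008, 6, 2), "jam"),
    ("c2", date(2008, 5, 31), "bread"),
]

def build_sequences(rows):
    """Group items into one itemset per (customer, date), ordered by date."""
    by_customer = defaultdict(lambda: defaultdict(set))
    for customer, day, item in rows:
        by_customer[customer][day].add(item)
    return {
        customer: [frozenset(by_day[day]) for day in sorted(by_day)]
        for customer, by_day in by_customer.items()
    }

sequences = build_sequences(purchases)
```

Here customer c1 ends up with two itemsets: {bread, butter} for the first date, then {jam} for the second.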
What is sequence pattern mining useful for?
It is useful when the data to be mined is sequential in nature
What is the goal of sequence pattern mining?
To discover frequent sequences of itemsets in a dataset
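One way to make "frequent" concrete is to measure the share of customers whose history contains a given sequence in order. The sketch below assumes each customer history is a list of itemsets; the data and the `contains` and `support` helpers are hypothetical illustrations, not a specific published algorithm:

```python
def contains(sequence, pattern):
    """True if the itemsets of `pattern` appear, in order, within `sequence`."""
    pos = 0
    for itemset in sequence:
        # Match the next pattern itemset as early as possible (greedy scan).
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Hypothetical customer histories, one list of itemsets per customer.
histories = [
    [{"wheat biscuits"}, {"soy milk"}, {"honey cereal"}],
    [{"wheat biscuits", "tea"}, {"honey cereal"}],
    [{"soy milk"}, {"wheat biscuits"}],
]

pattern = [{"wheat biscuits"}, {"honey cereal"}]
pattern_support = support(histories, pattern)  # 2 of the 3 histories match
```

A pattern would count as frequent when its support meets a chosen minimum threshold.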
What’s a sequence?
A list of items that tend to occur in a predictable order
In which sorts of business is sequence pattern mining useful?
The retail industry is the most common area in which time-ordered transactions are generated. Insurance claims, bank transactions and medical procedures are other potential applications. Sequence pattern mining can be used to improve decision-making in a wide variety of applications such as market basket analysis, churn analysis and prevention, fraud detection, and security analysis.
What types of decision-making can sequence pattern mining support?
> Knowing what customers will buy next
- help to target appropriate customers
- identify cross-selling opportunities
> Based on purchase patterns & attributes
- segment customers to better understand profile
- determine response propensities (to product offers) of each segment
- anticipate & prevent customer churn
- detect patterns of potentially harmful behaviour (fraud detection)
> Identify sequential symptoms to aid effective medical diagnosis
What is phase 1 of sequence pattern mining?
Continuously constructs a lattice of all large itemsets. The algorithm removes an itemset from the lattice when its maximum support (maxSupport) falls below the defined support threshold. This continues until the end of the transactions is reached.
What is phase 2 of sequence pattern mining?
Rescans the transactions and continues to remove itemsets whose maxSupport is below the defined support threshold. Phase 2 can then accurately compute the number of occurrences of each remaining itemset in the lattice.
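The two phases can be sketched roughly as below. This is a heavily simplified illustration, not the exact algorithm behind these cards. It assumes a loose upper bound for maxSupport: an itemset might have occurred in every transaction seen before it was first tracked. The names `phase_one`, `phase_two` and `max_size`, and the sample data, are hypothetical.

```python
from itertools import combinations

def phase_one(transactions, threshold, max_size=2):
    """Single pass: track itemsets and prune those whose maxSupport is too low."""
    lattice = {}  # itemset -> {"count": hits since tracked, "max_missed": bound on earlier hits}
    for i, transaction in enumerate(transactions, start=1):
        items = frozenset(transaction)
        # Count the itemsets that are already tracked.
        for itemset, stats in lattice.items():
            if itemset <= items:
                stats["count"] += 1
        # Start tracking itemsets seen here that are not in the lattice yet.
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                candidate = frozenset(combo)
                if candidate not in lattice:
                    lattice[candidate] = {"count": 1, "max_missed": i - 1}
        # Remove itemsets whose upper bound on support fell below the threshold.
        for itemset in list(lattice):
            stats = lattice[itemset]
            if (stats["count"] + stats["max_missed"]) / i < threshold:
                del lattice[itemset]
    return lattice

def phase_two(transactions, lattice, threshold):
    """Rescan: count the surviving itemsets exactly and drop the infrequent ones."""
    exact = {itemset: 0 for itemset in lattice}
    for transaction in transactions:
        items = frozenset(transaction)
        for itemset in exact:
            if itemset <= items:
                exact[itemset] += 1
    n = len(transactions)
    return {s: c for s, c in exact.items() if c / n >= threshold}

# Hypothetical transactions.
baskets = [["milk", "bread"], ["milk", "bread"], ["milk", "eggs"], ["bread"]]
candidates = phase_one(baskets, threshold=0.5)
frequent = phase_two(baskets, candidates, threshold=0.5)
```

In this sample, phase one still holds itemsets containing eggs because their upper bound is loose; the exact recount in phase two removes them.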