Chapter 6 test deck Flashcards
frequent pattern
a set of items, subsequences, or substructures that occurs frequently in a data set
an intrinsic and important property of a data set
frequent itemset
a set of items that appear frequently together in a transaction data set, e.g. milk and bread
frequent sequential pattern
e.g. buying a PC first, then a digital camera, and then a memory card
substructures
refer to different structural forms such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a frequent structured pattern
frequent pattern mining
searches for recurring relationships in a given data set and is the foundation for many essential data mining tasks
market basket analysis
the earliest form of frequent pattern mining for association rules
association rule
each item has a Boolean variable representing the presence or absence of that item
each basket can then be represented by a Boolean vector of values assigned to these variables
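A minimal Python sketch of the Boolean-vector representation, assuming a small hypothetical item universe; `ITEMS`, `baskets`, and `to_boolean_vector` are illustrative names, not part of any library:

```python
# Hypothetical item universe; each vector position corresponds to one item.
ITEMS = ["bread", "butter", "milk", "beer", "diaper"]


def to_boolean_vector(basket, items=ITEMS):
    """Return one Boolean per item: True if the item is in the basket."""
    present = set(basket)
    return [item in present for item in items]


# Hypothetical baskets.
baskets = [
    {"milk", "bread"},
    {"beer", "diaper", "milk"},
    {"bread", "butter"},
]

vectors = [to_boolean_vector(b) for b in baskets]
for basket, vec in zip(baskets, vectors):
    print(sorted(basket), [int(v) for v in vec])
```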
support
the usefulness of a rule: support(A ⇒ B) = P(A ∪ B), the fraction of transactions that contain both A and B
confidence
the certainty of a rule: confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
the reliability of the inference made by a rule
the higher the confidence, the more likely it is for B to be present in transactions that contain A
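A small sketch of how support and confidence could be computed for a rule A ⇒ B, assuming transactions are stored as Python frozensets; the helper names and toy data are hypothetical:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A ∪ B) / support(A) = P(B | A).

    Undefined (ZeroDivisionError) if A never occurs.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy transactions.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "butter"}),
]

print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # about 0.667
```

Two of the three milk transactions also contain bread, which is why the confidence of milk ⇒ bread is about 2/3 even though its support is only 0.5.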
itemset
a set of one or more items
occurrence frequency
the number of transactions that contain itemset X; also called the frequency, support, support count, or count of the itemset
relative support
the fraction of transactions that contain X
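To contrast the absolute and relative counts, a sketch assuming a hypothetical five-transaction data set; `support_count` (occurrence frequency) and `relative_support` are illustrative helper names:

```python
def support_count(itemset, transactions):
    """Occurrence frequency: number of transactions that contain the itemset."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def relative_support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


# Hypothetical five-transaction data set.
transactions = [
    frozenset({"beer", "diaper", "nuts"}),
    frozenset({"beer", "diaper"}),
    frozenset({"beer", "coffee"}),
    frozenset({"diaper", "eggs"}),
    frozenset({"nuts", "eggs"}),
]

x = {"beer", "diaper"}
print(support_count(x, transactions))     # 2
print(relative_support(x, transactions))  # 0.4
```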
An association rule does not necessarily imply causality
mining association rules
- find all frequent itemsets: each of these itemsets will occur at least as frequently as a predetermined minimum support count
- generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence
apriori property
all nonempty subsets of a frequent itemset must also be frequent
antimonotonicity
if a set cannot pass a test, all of its supersets will fail the same test as well. The property is monotonic in the context of failing a test
apriori
a candidate generation-and-test approach
uses the horizontal data format
{TID: itemset} format
eclat
frequent pattern mining with the vertical data format
Equivalence CLAss Transformation algorithm
{item: TID_set} format
the support count of an itemset is simply the length of the TID_set of the itemset
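A minimal recursive sketch of the Eclat idea, assuming a tiny hypothetical data set and an absolute minimum support count. It converts horizontal {TID: itemset} data to vertical {item: TID_set} form and then grows itemsets by intersecting TID sets; `to_vertical` and `eclat` are hypothetical names:

```python
from collections import defaultdict


def to_vertical(db):
    """Convert horizontal {TID: itemset} data to vertical {item: TID_set}."""
    vertical = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def eclat(prefix, candidates, min_count, out):
    """Depth-first Eclat over (item, TID_set) pairs.

    The support count of an itemset is len(TID_set); extending an itemset
    by one item is a set intersection, so no extra DB scan is needed.
    """
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_count:
            continue
        itemset = prefix | {item}
        out[itemset] = len(tids)
        # Items that can extend this prefix (its equivalence class), if frequent.
        suffix = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_count:
                suffix.append((other, common))
        eclat(itemset, suffix, min_count, out)
    return out


# Hypothetical horizontal data: TID -> items bought.
db = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}
vertical = to_vertical(db)  # e.g. "a" -> {1, 2, 3, 5}
frequent = eclat(frozenset(), sorted(vertical.items()), min_count=3, out={})
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

With a minimum count of 3, every single item and every pair is frequent, but {a, b, c} appears only in TIDs 1 and 5, so the intersection {1, 2, 5} ∩ {1, 3, 5} is too small and it is never reported.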
Downward closure property of frequent patterns
any subset of a frequent itemset must be frequent
if {beer, diaper, nuts} is frequent, so is {beer, diaper}
because every transaction containing {beer, diaper, nuts} also contains {beer, diaper}
scalable mining methods
- apriori
- frequent pattern growth (FP-growth), not covered in class
- vertical data format approach
apriori pruning principle
if any itemset is infrequent, its supersets should not be generated or tested
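The pruning check can be sketched as a helper that rejects a k-itemset candidate as soon as one of its (k-1)-subsets is missing from the frequent (k-1)-itemsets; the helper name and the sample frequent 2-itemsets are hypothetical:

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-itemset candidate is not frequent,
    in which case the candidate can be pruned without counting it."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


# Hypothetical frequent 2-itemsets.
L2 = {frozenset(p) for p in [("beer", "diaper"), ("beer", "nuts"),
                             ("diaper", "nuts"), ("diaper", "milk")]}

print(has_infrequent_subset(frozenset({"beer", "diaper", "nuts"}), L2))  # False: keep
print(has_infrequent_subset(frozenset({"beer", "diaper", "milk"}), L2))  # True: {beer, milk} is infrequent
```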
apriori method
initially, scan the DB once to get the frequent 1-itemsets
generate length-(k+1) candidate itemsets from length-k frequent itemsets
test the candidates against the DB
terminate when no frequent or candidate set can be generated
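Putting these steps together, a compact level-wise Apriori sketch in Python. It assumes an absolute minimum support count and in-memory transactions, and uses a simplified join (union any two frequent (k-1)-itemsets whose union has size k) instead of the textbook's sorted-prefix join; after pruning, both yield the same candidates. The data and names are hypothetical:

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Initial DB scan: count single items to get the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:  # terminate once no frequent (k-1)-itemsets remain
        prev = set(current)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                # Join: two (k-1)-itemsets sharing k-2 items form a k-itemset.
                if len(union) != k:
                    continue
                # Prune (Apriori property): all (k-1)-subsets must be frequent.
                if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                    candidates.add(union)

        # Test: one DB scan to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions.
transactions = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "coffee"},
    {"beer", "diaper", "eggs"},
    {"nuts", "eggs", "milk"},
    {"nuts", "coffee", "diaper", "eggs", "milk"},
]
result = apriori(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Coffee and milk drop out after the first scan, and of the six candidate pairs only {beer, diaper} reaches a count of 3, so the loop stops at k = 3 with no candidates left.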
Association rule generation
- for each frequent itemset I, generate all nonempty subsets of I
- for every nonempty subset s of I, output the rule s ⇒ (I - s) if support_count(I) / support_count(s) ≥ min_conf
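A sketch of the rule-generation step, assuming the frequent itemsets and their support counts are already in a dict (for example, Apriori output, which contains every subset of each frequent itemset). Only proper subsets s are used, so that I - s is nonempty; the function name and counts are hypothetical:

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Yield (s, I - s, confidence) for every strong rule s => I - s.

    frequent maps frozenset itemsets to support counts and must include
    every subset of each itemset (Apriori output does, by the Apriori property).
    """
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a 1-itemset has no nonempty proper subset
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                s = frozenset(subset)
                conf = count / frequent[s]  # support_count(I) / support_count(s)
                if conf >= min_conf:
                    yield s, itemset - s, conf


# Hypothetical support counts for a few frequent itemsets.
frequent = {
    frozenset({"beer"}): 3,
    frozenset({"diaper"}): 4,
    frozenset({"beer", "diaper"}): 3,
}
rules = list(generate_rules(frequent, min_conf=0.8))
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Here beer ⇒ diaper has confidence 3/3 = 1.0 and is kept, while diaper ⇒ beer has 3/4 = 0.75 and falls below the 0.8 threshold.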
pattern evaluation methods
strong association rules can be uninteresting and misleading
correlation measure: lift
the occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and correlated as events
lift(A, B) = P(A ∪ B) / (P(A)P(B))
negatively correlated
lift(A, B) < 1, meaning that the occurrence of one is likely to lead to the absence of the other
positively correlated
lift(A, B) > 1, meaning that the occurrence of one implies the occurrence of the other
independent
lift(A, B) = 1, meaning there is no correlation between them
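A short sketch computing lift from support counts and labeling the result with the three cases above; the counts are hypothetical and chosen to show a rule with fairly high confidence whose lift is still below 1:

```python
def lift(count_ab, count_a, count_b, n):
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)).

    With support counts over n transactions this simplifies to
    count_ab * n / (count_a * count_b), which avoids extra float rounding.
    """
    return (count_ab * n) / (count_a * count_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the correlation labels used in these flashcards."""
    if value < 1 - tol:
        return "negatively correlated"
    if value > 1 + tol:
        return "positively correlated"
    return "independent"


# Hypothetical counts: 1000 transactions, A in 500, B in 800, both in 350.
n, count_a, count_b, count_ab = 1000, 500, 800, 350
conf_a_to_b = count_ab / count_a             # 0.7: looks like a strong rule
value = lift(count_ab, count_a, count_b, n)  # 0.875
print(conf_a_to_b, value, interpret(value))
```

The rule A ⇒ B has 70% confidence, yet its lift is 0.875, so A and B are negatively correlated; this is how a strong association rule can be misleading.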