C4 Flashcards
pattern
describes local structure in the data
pattern mining
given a database D, a pattern language and a set of constraints C, find the set of patterns S in the pattern language such that each pattern satisfies each constraint on D and S is maximal
=> find all patterns satisfying the constraints
problems with pattern mining
- high thresholds: few, already well-known patterns
- low thresholds: pattern explosion
- many similar patterns
pattern set mining
find the set of patterns S in the pattern language such that each pattern in S satisfies each constraint on D and S is optimal with respect to optimisation criterion G
find a global model of local patterns
MDL-based pattern set mining
find the set of patterns S in the pattern language such that each pattern in S satisfies each constraint on D and S is MDL optimal
find a model that compresses the data
compressing with code tables
- lossless encoding/decoding
- code table consisting of itemsets and their codes
cover algorithm: encodes a single transaction by replacing each occurrence of a code table itemset with its code
optimal code length when compressing with code tables
for each x in code table CT:
usage(x) = |{t in D : x in cover(CT, t)}|
P(x) = usage(x) / sum_{y in CT} usage(y)
according to Shannon, the optimal code for distribution P has length L(code_CT(x)) = -log(P(x))
total encoded size of data set D with code table CT
L(D, CT) = L(D|CT) + L(CT)
where L(D|CT) is the size of D encoded with CT and L(CT) the size of the code table
KRIMP
iteratively consider candidate patterns, add a pattern if it results in a total encoding with a smaller compressed size, otherwise discard it
SLIM
KRIMP is computationally expensive (many candidates, and database scans to construct a cover for each)
SLIM instead combines pairs of patterns already in the code table and estimates the resulting improvement in compression
what does it mean if a pattern-based model is characteristic?
different data distributions get different models and different models imply different data distributions
optimal MDL results are characteristic
compression based classification
Split the dataset by class, apply KRIMP to obtain a code table per class. Then encode an unseen transaction with each code table and assign it to the class whose code table yields the shortest code length
data-driven clustering algorithm
inspired by k-means
1. start with k equal-sized random clusters D_i
2. construct code table CT_i for each D_i
3. use Bayes optimal choice to re-assign transactions
4. repeat 2 and 3 until clusters are stable
dissimilarity measure
MDL says: the optimal compressor induced from a database x compresses x better than a compressor induced from a database y
C_x(y) is the size of database y as compressed by the compressor induced from database x
compressor dissimilarity DS(x,y) = max{ (C_y(x) - C_x(x)) / C_x(x), (C_x(y) - C_y(y)) / C_y(y) }
the higher the dissimilarity, the higher the classification accuracy tends to be