4. Alpha algorithm Flashcards
What are the requirements for the alpha algorithm?
To be able to construct a petri net the alpha algorithm requires:
- all possible directly follows relations to be present in the log (i.e. if b and c are both possible options after a then we need at least two traces with ab and ac for sufficiency)
- log generated by structured workflow net (no duplicate labels, no test loops i.e. place in both pre and postset of a transition, no tau transitions, no loops of length 2, cannot mix choice and parallelism??)
- log contains only complete cases and is noise free
Alpha algorithm guarantees rediscoverability (ability to recreate correct model from event data) for these nets
What is an event log for the purpose of process discovery?
A multi(set) of sequences of events
What should I not forget to do in the alpha algorithm?
Label the places!
Check the created net against the log (are these traces actually possible?)
What are the steps for the alpha algorithm?
- Get all events
- Find which elements are ever first (start activities)
- Find which elements are ever last (end activities)
- Go through all event logs to find sequences of letters (do they never happen next to each other, always in same order, or interchangeably)
- Find all maximal pairs (pairs (A, B) of sets of activities such that every element a∈A and every element b∈B are causally related (i.e., a →L b), all elements in A are independent (a1#La2), and all elements in B are independent (b1#Lb2).)
- Will have a place between events on LHS and RHS of pair
Why can you not have self-loops with alpha algorithm?
If you have bbbb in sequence leads to b||b- then b would not be independent of itself so we could not have it as A or B in a pair and it therefore would not be connected to any place
Why can you not have loops of 2 in the alpha algorithm?
e,f,e,f leads to e||f. f will not have a causal relation with anything except e so will then not be connected to any place
How to calculate fitness?
Fitness is fraction of the behaviour in the log that is explained by the model
fitness = |L∩M|/|L|
Fitness = 1, all traces explained by model, =0 no traces are explained
What is token based replay?
Fire transitions as they appear in the trace and count how many tokens are missing/ left behind
Count how many consumed (including final one), produced (including initial token), missing (we need to add into the net to enable a certain transition), and remaining
Make sure to multiply this by the frequency of the trace
Fitness = 1/2(1-missing/consumed) + 1/2(1-remaining/produced)
Does token based fitness = best model?
No, the flower model has perfect fitness
Is alpha algorithm practical?
No sensitive to noise, infrequent behaviour and cannot handle complex routing constructs
If b always follows a what does the alpha algorithm conclude?
We know then that a->b in that a is always directly followed by b in the log
The alpha algorithm then ASSUMES (not necessarily correct) that they therefore have a causal relation (same for independence etc. alpha algorithm makes assumptions)
Limitations of alpha algorithm?
- Need complete log of ordering relations
- Might create unnecessarily complex nets
- Cannot handle loops of length 1 or 2
- Struggles to find non-local dependencies (i.e. non-free choice process constructs)
What is alpha algorithm?
Process discovery algorithm to find a (sound?) workflow net