Lecture 5 - KMP Flashcards
What does KMP stand for?
Knuth Morris Pratt (algorithm)
What kind of algorithm is KMP? (hint: on...)
An online algorithm: it never needs to back up in the text. Instead, it preprocesses the string being searched for to build a border table.
What is a border table?
An array b with one entry b[j] for each position j of s, the string being searched for.
What happens if we get a mismatch?
We stay on the current text character and compare it with the character of the string at the position given by the border table.
What is a suffix?
A substring that ends at position n-1 of a string of length n.
What is a prefix?
A substring that begins at position 0 of the string.
What is a border of a string?
A border of the string s is a substring that is both a prefix and a suffix, but it cannot be the string itself
What are the borders of acac gat acac?
ac and acac are both (non-empty) borders; acac is the longest one.
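A quick way to check an answer like this is to compare every proper prefix with the suffix of the same length. The sketch below does exactly that; the function name `borders` is made up for illustration, and it lists only the non-empty borders.

```python
def borders(s: str) -> list[str]:
    """Return every non-empty border of s, shortest first."""
    # A border of length k exists when the first k characters equal the
    # last k characters. k stops at len(s) - 1 because the string itself
    # does not count as a border.
    return [s[:k] for k in range(1, len(s)) if s[:k] == s[-k:]]


print(borders("acacgatacac"))  # ['ac', 'acac']
```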
What is the border of an empty string?
an empty string itself is the border (length 0)
What is the size of the border table?
Same size as the string we are looking for.
What is b[j] of the border table?
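The length of the longest border of the substring s[0..j-1], i.e. of the first j characters of the string.
What is b[0] always set to?
0, as it refers to the empty substring (the first 0 characters of the string), whose only border has length 0.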
Fill in the border table of ababaca.
0 0 0 1 2 3 0
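The table can be checked straight from the definition: for each j, take the first j characters and look for their longest border. This is a deliberately simple sketch (the name `border_table_naive` is an illustrative choice), not an efficient method.

```python
def border_table_naive(s: str) -> list[int]:
    """b[j] = length of the longest border of s[0..j-1], by brute force."""
    table = []
    for j in range(len(s)):
        prefix = s[:j]  # the first j characters
        longest = 0
        # Try candidate border lengths from longest to shortest;
        # a border must be shorter than the prefix itself.
        for k in range(len(prefix) - 1, 0, -1):
            if prefix[:k] == prefix[-k:]:
                longest = k
                break
        table.append(longest)
    return table


print(border_table_naive("ababaca"))  # [0, 0, 0, 1, 2, 3, 0]
```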
After mismatch what happens in the KMP algorithm visually?
The string s is shifted to the right along the text until the characters of s to the left of i (the current position in t) match the text characters to the left of i. The size of this shift determines the new value of j (the index into the string s), namely b[j].
Is it often that strings have non-empty borders?
No.
What happens if we cannot get a border match after a mismatch?
We reset j to 0 (start comparing from the beginning of the string), while i remains unchanged.
Edge case: if j is already 0 (the first character of the string), increment i by 1 instead.
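The mismatch rules above can be sketched as a single loop over the text. This is a minimal illustration rather than the lecture's own code: the name `kmp_search` and the convention of returning -1 when there is no match are assumptions, and the border table is built here by a short brute-force expression just to keep the example self-contained.

```python
def kmp_search(t: str, s: str) -> int:
    """Return the start of the first occurrence of s in t, or -1 if none."""
    n, m = len(t), len(s)
    if m == 0:
        return 0  # assumption: the empty string matches at position 0

    # b[j] = length of the longest border of s[0..j-1].
    # Brute-force construction, chosen for brevity, not speed.
    b = [max((k for k in range(j) if s[:k] == s[j - k:j]), default=0)
         for j in range(m)]

    i = j = 0  # i: position in the text t, j: position in the string s
    while i < n:
        if t[i] == s[j]:
            i += 1
            j += 1
            if j == m:
                return i - m  # every character of s has matched
        elif j > 0:
            j = b[j]  # mismatch: stay on t[i], fall back in s (0 if no border)
        else:
            i += 1  # mismatch on the first character of s: move on in t
    return -1


print(kmp_search("bacbabababacaca", "ababaca"))  # 6
```

Note that i is never decreased, which is what makes the search online.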
What is the complexity of the KMP algorithm?
O(n) in the worst case: the loop runs at most 2n times, because i and k never decrease and every iteration increases one of them (taking k to be i - j, the text position where s is currently aligned).
-> refer to slides
-> this becomes O(m+n) when combined with the efficient algorithm for creating the border table
Why is KMP better than the brute force approach?
Because we avoid re-comparing characters that we already know match.
What is the complexity of creating the border table?
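With the naive method, one entry b[j] costs O(j^2): there are up to j candidate border lengths, and checking each one takes O(j) character comparisons. Repeating this for all m entries
hence gives O(m^3) overall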
-> there are better approaches, refer to live lecture from week 5
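One well-known better approach builds each entry from the entries already computed, which takes O(m) time overall. The sketch below shows that standard construction; it is not necessarily the exact version presented in the lecture, and the name `border_table` is an illustrative choice.

```python
def border_table(s: str) -> list[int]:
    """b[j] = length of the longest border of s[0..j-1], in O(m) time."""
    m = len(s)
    b = [0] * m  # b[0] and b[1] are always 0
    for j in range(2, m):
        # Start from the longest border of s[0..j-2] and try to extend it
        # by the newly added character s[j-1].
        k = b[j - 1]
        while k > 0 and s[k] != s[j - 1]:
            k = b[k]  # fall back to the next shorter border
        if s[k] == s[j - 1]:
            k += 1
        b[j] = k
    return b


print(border_table("ababaca"))  # [0, 0, 0, 1, 2, 3, 0]
```

The inner while loop can only undo increases made by earlier iterations, which is why the total work stays linear in m.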
LOOK OVER THE COMPLEXITY FOR BORDER TABLE CREATION