Pattern Matching Flashcards
how does boyer-moore work
- idea: makes use of characters that are not in the pattern (bad characters) by skipping them completely
- implementation
1. build last occurence table that records the index of the last occurrence of letters in the pattern; bad characters have index of -1
2. loop from 0 to n - m
2.1 for every iteration start an inner loop that begins with pattern’s end to its start
2.22 when a mismatch is found, look up the table and deterimine how many indices need to be shifted - time comlexity
- single occurrence: O(m) in best case, O(mn) in worst case
– all occurrences: O(m + n/m) in best case, O(mn) in worst case
- single occurrence: O(m) in best case, O(mn) in worst case
how does KMP work
- idea: make use of prefix and suffix that pattern and text string both share to efficiently skip over
- implementation
1. build up the failure table that tells you the length of the longest suffix of the substring the ends right before the current index that is also the substring’s prefix
2. loop through the text string and every character in it will be looked up only once
2.1 for every iteration loop the pattern
2.2 if the inner loop terminates then a match is found
2.3 otherwise there’s a mismatch where we look up the failure table to decide where to begin looking in the pattern for the next outer loop by looking at f[j - 1]
time complexity:
- single occurrence: O(m) for best case and O(m + n) for worst case
- all occurrences: O(m + n) for all cases
how does Rabin-Karp work
idea: convert strings to integers to compare
implementation:
1. calculate the initial hashes for the pattern and first length-m substring of the text
2. compare text hash with pattern hash
2.1 if they are equal, compare character by character
2.2 if not don’t compare
3. update text hash to the next substring by rolling hash and repeat step 2
3.1 newHash = (oldHash - h(oldChar) * base^(m-1)) * base + h(newChar)
time complexity:
- single occurrence: O(m) in best case, and O(mn) in worst case
- all occurrences: O(m + n) in best case and O(mn) in worst case