Matching Algorithms Flashcards
What is the key idea behind the Boyer-Moore Algorithm?
The goal is to skip scanning all the characters. The idea is to start matching the characters from right of the pattern to left of the pattern. You encounter a mismatch between text and pattern at a searchIndex. You go backwards and check where you could have a possible match in the pattern. If there is none then you skip all the characters and shift the searchIndex by the full length of the pattern = m. You encounter a match you shift forward by the length of the pattern and shift backward by the position of the matched character in order to align that character with the text.
Boyer-Moore functional code
**public** **static** bool BoyerMooreMatch(string text, string pattern) **{****if**(pattern.Length \> text.Length)**{****return** **false**; **}** var j = pattern.Length - 1; var i = j; var n = pattern.Length; **while** (i \< text.Length && n \> 0) **{****if**(pattern[j] == text[i])**{**j--; i--; n--;**}****else****{**// we want to align the pattern to the text // after failing to match the character we search for the occurence of the // last character in the pattern matching the text**int**k = SearchCharBackwards(pattern, j, text[i]); // we move forward the whole pattern length and move backward by the position of the found character i += pattern.Length - k - 1; j = pattern.Length - 1; n = pattern.Length;**}****}****return**n == 0;**}**
Was ist die Idee hinter Knuth-Morris-Prat Algorithm?
Bei einer naiven Suche muss der Text an jeder neu gescannt werden, sobald ein Mismatch zwischen Muster und Text festgestellt wird.
Bei KMP Algorithmus möchten wir verhindern, dass wir jedes Mal den gesamten Text absuchen müssen. Dafür suchen wir nach Wiederholungen in dem Muster. An jeder Stelle im Muster möchten wir wissen, was die Länge des längsten Präfixes ist, was als Suffix vorhanden ist. Mit diesem Wissen, wissen wir, was schon gematched worden ist und können an der richtigen Stelle im Muster unsere Suche fortstetzen. Somit müssen wir wie in der naiven Suche nicht mit der Text-Position permanent zurückgehen
KMP Matching Table functional code
int[] computeLspTable(String pattern) **{** int[] lsp = **new** int[pattern.length()]; lsp[0] = 0; // Base case **for** (**int** i = 1; i \< pattern.length(); i++) **{**// Start by assuming we're extending the previous LSP // the previous lsp tells us how many characters have been matched up to the last position in the pattern [i-1] // this is also the position of the character where a potential next match within the pattern // could start **int** j = lsp[i - 1]; **while** (j \> 0 && pattern.charAt(i) != pattern.charAt(j)) j = lsp[j - 1]; **if** (pattern.charAt(i) == pattern.charAt(j)) j++; lsp[i] = j; **}****return**lsp;**}**
Naive Search Compact Code
**def** **NaiveStringMatch**(s, p): m = len(p) n = len(s) i = **0** j = **0****if**(n \< m):**return**False**while**(i \< n):**if**(p[j] == s[i]): j = j +**1**i = i +**1****else**: # we substract the number of matched characters and move to the next# index in the string i = i - j + **1** j = **0****if**(j == (len(p) -**1**)):**return**True**return** False
KMP Search Algorithm
**int** search(String pattern, String text) **{** int[] lsp = computeLspTable(pattern); **int** j = 0; // Number of chars matched in pattern **for** (**int** i = 0; i \< text.length(); i++) **{****while**(j \> 0 && text.charAt(i) != pattern.charAt(j))**{**// Fall back in the pattern j = lsp[j - 1]; // Strictly decreasing**}****if** (text.charAt(i) == pattern.charAt(j)) **{**// Next char matched, increment position j++; **if** (j == pattern.length()) **return** i - (j - 1); **}****}****return** -1; // Not found **}**
Was sind mögliche Präfixes und Suffixes im Sinne vom KMP Algorithmus und was ist die Idee dahinter?
- Substrings bis zu einer gewissen Position mit Left(pattern, j) und Right(pattern, j) miteinander matchen.
- Somit können wir rogue mit folgendem Algorithmus die Matching-Tabelle bauen
- for (int k = 1; k <= j; j++)
- präfix := Left(pattern, k)
- suffix := Right(pattern, k)
- for (int k = 1; k <= j; j++)
- Diese Information wird benutzt, um Wiederholungen innerhalb von Muster zu erkennen, um zu verhindern, dass bei jedem Mismatch die String-Suche an der Stelle i-j+1 anfangen muss
Idea Boyer-Moore matching Algorithm
- we match from right to left
- we start with i set to the length of the pattern
- with j we move back in the pattern
- if we find a mismatch we align the pattern to the text
- we are at position j in the pattern and position i in the text
- we search text[i] in the pattern starting from position j backwards
- this is our k
- if we find a mismatch we align the pattern to the text
Was markiert j = lsp[i-1] an der Position i?
- When we have a mismatch at j
- We know that characters pat[0..j-1] match with txt[i-j+1…i-1] (Note that j starts with 0 and increment it only when there is a match).
- We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
- From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.
Was ist die Idee hinter dem Nearest Neighbour Algorithmus?
- ausgehend von einer sortierten Liste
- Liste wird in zweier Pakete (rekursiv) ähnlich wie bei MergeSort gesplittet
- Die Distanz wird in den zweier Listen bestimmt
- Am Ende ergibt sich dann die finale Distanz aus dem Minimum
- Distance(A1), Distance(A2), Max(A1) - Min(A2)
- das ist dafür da, um die Distanz der Nachbarn zu bestimmen
Nearest Neighbor functional Code
Eingabe: Sortiertes Feld natürlicher Zahlen
Ausgabe: Minimaler Abstand r
NearestNeighbor(A)
- Falls A.length() = 2 gib A[1] - A[0] zurück
- Wähle pivot := A.length() / 2
- (split)
- A1 = [A[n] | n < pivot]
- A2 = [A[n] | n >= pivot]
- Rufe Nearest Neighbor rekursiv mit den beiden Listen auf
- r1 = NN(A1)
- r2 = NN(A2)
- Merge die Ergebnisse
- Gib Min(r1, r2, A2.Min() - A1.Max())
Wie kann ich eine Liste rekursiv in Teillisten splitten?
- SplitList(List<int> a, List<list>> output
</list></int><ul>
<li>if (a.Length <= 2)
<ul>
<li>output.Add(a)</li>
<li>return</li>
</ul>
</li>
<li>int pivot = a.Length / 2</li>
<li>var left = [A[n] | n < pivot]</li>
<li>var right = [A[n] | n >= pivot]</li>
<li>SplitList(left, output)</li>
<li>SplitList(right, output)</li>
</ul></list></int>