Matching Algorithms Flashcards

1
Q

What is the key idea behind the Boyer-Moore Algorithm?

A

The goal is to skip scanning all the characters. The idea is to start matching the characters from right of the pattern to left of the pattern. You encounter a mismatch between text and pattern at a searchIndex. You go backwards and check where you could have a possible match in the pattern. If there is none then you skip all the characters and shift the searchIndex by the full length of the pattern = m. You encounter a match you shift forward by the length of the pattern and shift backward by the position of the matched character in order to align that character with the text.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Boyer-Moore functional code

A
**public** **static** bool BoyerMooreMatch(string text, string pattern) **{****if**(pattern.Length \> text.Length)**{****return** **false**; **}** var j = pattern.Length - 1; var i = j; var n = pattern.Length; **while** (i \< text.Length && n \> 0) **{****if**(pattern[j] == text[i])**{**j--; i--; n--;**}****else****{**// we want to align the pattern to the text // after failing to match the character we search for the occurence of the // last character in the pattern matching the text**int**k = SearchCharBackwards(pattern, j, text[i]); // we move forward the whole pattern length and move backward by the position of the found character i += pattern.Length - k - 1; j = pattern.Length - 1; n = pattern.Length;**}****}****return**n == 0;**}**
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Was ist die Idee hinter Knuth-Morris-Prat Algorithm?

A

Bei einer naiven Suche muss der Text an jeder neu gescannt werden, sobald ein Mismatch zwischen Muster und Text festgestellt wird.

Bei KMP Algorithmus möchten wir verhindern, dass wir jedes Mal den gesamten Text absuchen müssen. Dafür suchen wir nach Wiederholungen in dem Muster. An jeder Stelle im Muster möchten wir wissen, was die Länge des längsten Präfixes ist, was als Suffix vorhanden ist. Mit diesem Wissen, wissen wir, was schon gematched worden ist und können an der richtigen Stelle im Muster unsere Suche fortstetzen. Somit müssen wir wie in der naiven Suche nicht mit der Text-Position permanent zurückgehen

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

KMP Matching Table functional code

A
int[] computeLspTable(String pattern) **{** int[] lsp = **new** int[pattern.length()]; lsp[0] = 0; // Base case **for** (**int** i = 1; i \< pattern.length(); i++) **{**// Start by assuming we're extending the previous LSP // the previous lsp tells us how many characters have been matched up to the last position in the pattern [i-1] // this is also the position of the character where a potential next match within the pattern // could start **int** j = lsp[i - 1]; **while** (j \> 0 && pattern.charAt(i) != pattern.charAt(j)) j = lsp[j - 1]; **if** (pattern.charAt(i) == pattern.charAt(j)) j++; lsp[i] = j; **}****return**lsp;**}**
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Naive Search Compact Code

A
**def** **NaiveStringMatch**(s, p): m = len(p) n = len(s) i = **0** j = **0****if**(n \< m):**return**False**while**(i \< n):**if**(p[j] == s[i]): j = j +**1**i = i +**1****else**: # we substract the number of matched characters and move to the next# index in the string i = i - j + **1** j = **0****if**(j == (len(p) -**1**)):**return**True**return** False
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

KMP Search Algorithm

A
**int** search(String pattern, String text) **{** int[] lsp = computeLspTable(pattern); **int** j = 0; // Number of chars matched in pattern **for** (**int** i = 0; i \< text.length(); i++) **{****while**(j \> 0 && text.charAt(i) != pattern.charAt(j))**{**// Fall back in the pattern j = lsp[j - 1]; // Strictly decreasing**}****if** (text.charAt(i) == pattern.charAt(j)) **{**// Next char matched, increment position j++; **if** (j == pattern.length()) **return** i - (j - 1); **}****}****return** -1; // Not found **}**
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Was sind mögliche Präfixes und Suffixes im Sinne vom KMP Algorithmus und was ist die Idee dahinter?

A
  • Substrings bis zu einer gewissen Position mit Left(pattern, j) und Right(pattern, j) miteinander matchen.
  • Somit können wir rogue mit folgendem Algorithmus die Matching-Tabelle bauen
    • for (int k = 1; k <= j; j++)
      • präfix := Left(pattern, k)
      • suffix := Right(pattern, k)
  • Diese Information wird benutzt, um Wiederholungen innerhalb von Muster zu erkennen, um zu verhindern, dass bei jedem Mismatch die String-Suche an der Stelle i-j+1 anfangen muss
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Idea Boyer-Moore matching Algorithm

A
  • we match from right to left
  • we start with i set to the length of the pattern
  • with j we move back in the pattern
    • if we find a mismatch we align the pattern to the text
      • we are at position j in the pattern and position i in the text
      • we search text[i] in the pattern starting from position j backwards
        • this is our k
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Was markiert j = lsp[i-1] an der Position i?

A
  • When we have a mismatch at j
    • We know that characters pat[0..j-1] match with txt[i-j+1…i-1] (Note that j starts with 0 and increment it only when there is a match).
    • We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
    • From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Was ist die Idee hinter dem Nearest Neighbour Algorithmus?

A
  • ausgehend von einer sortierten Liste
    • Liste wird in zweier Pakete (rekursiv) ähnlich wie bei MergeSort gesplittet
    • Die Distanz wird in den zweier Listen bestimmt
    • Am Ende ergibt sich dann die finale Distanz aus dem Minimum
      • Distance(A1), Distance(A2), Max(A1) - Min(A2)
      • das ist dafür da, um die Distanz der Nachbarn zu bestimmen
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Nearest Neighbor functional Code

A

Eingabe: Sortiertes Feld natürlicher Zahlen

Ausgabe: Minimaler Abstand r

NearestNeighbor(A)

  1. Falls A.length() = 2 gib A[1] - A[0] zurück
  2. Wähle pivot := A.length() / 2
  3. (split)
    1. A1 = [A[n] | n < pivot]
    2. A2 = [A[n] | n >= pivot]
  4. Rufe Nearest Neighbor rekursiv mit den beiden Listen auf
    1. r1 = NN(A1)
    2. r2 = NN(A2)
  5. Merge die Ergebnisse
    1. Gib Min(r1, r2, A2.Min() - A1.Max())
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Wie kann ich eine Liste rekursiv in Teillisten splitten?

A
  • SplitList(List<int> a, List<list>&gt; output
    </list></int><ul>
    <li>if (a.Length &lt;= 2)
    <ul>
    <li>output.Add(a)</li>
    <li>return</li>
    </ul>
    </li>
    <li>int pivot = a.Length / 2</li>
    <li>var left = [A[n] | n &lt; pivot]</li>
    <li>var right = [A[n] | n &gt;= pivot]</li>
    <li>SplitList(left, output)</li>
    <li>SplitList(right, output)</li>
    </ul></list></int>
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly