Dynamic Programming Flashcards
What Is Dynamic Programming?
Optimizing recursive functions so that previously computed values are stored and do not require recomputing.
What is Memoizing?
Caching the previous computed into an array, check if it was computed already then don’t need to waste time again. Sort of like memorizing
Type of Pattern Matching For Dynamic
It is Approximate Pattern Matching
Resulting in usual O(nm) time being a bit slower and O(n) space
Two types of Algorithms for Approximate Pattern Matching
Global Comparison O(mn)
Local Comparison O(mn)
Why Use a Global Algorithm?
Aligns entire two seq. of roughly equal size,
Why Use Local algorhtm?
Used for less similar strings, suspected to contain similar substring, like spell check
What is the Edit Distance Problem?
ED(X, Y) is the smallest amount of insertions, deletions, and substitutions to transform X to Y.
Ex.
X = GTTACTCGA
Y = GCTTGCCG
G - T T A C T C G A
| | | | | |
G C T T G C - C G -
ED(X, Y) = 4
Input: Two strings
Edit Distance Algorithm
Algorithm EditDistance(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← j
2: for i ← 1 to m do A[i, 0] ← i
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← min(A[i, j], A[i−1, j] + 1)
7: A[i, j] ← min(A[i, j], A[i, j−1] + 1)
8: return A[m, n]
Edit Distance Complexity
Time Complexity O(mn)
Space Complexity O(mn)
- Can improve Space with only storing last two
“it is very unlikely”. a better version made
Edit Distance Explain
You make the big problem into a bunch of subproblems we do using the prefixes of both strings
Only comparing the last letter
Have a 2d Matrix that stores all possible subproblems turning any prefix of X into any prefix of Y. If the last letter same then it is the same ED(X[1..M-1],Y[1..N-1]. Go through the array solve for each sub-problem then return the main problem which is A[m,n].
Also use this lemma for a start:
For every X: ED(X, ε) = ED(ε, X) = |X|,
ED(X[1. .m], Y[1. .n]) = min{
ED(X[1. .m), Y[1. .n)) + δ(X[m], Y[n]),
ED(X[1. .m), Y[1. .n]) + 1,
ED(X[1. .m], Y[1. .n)) + 1
},
δ(X[m], Y[n]), (same letter = 1 else 0)
X[M-1] X[M]
Y[n-1] -> [ REPLACE ] [ INSERT ]
Y[n] -> [ DELETE ] [ HERE]
How to get Alignment from Edit Distance
Alignment:
A B C D E F
A Z C - E D
| |
Start from A[m,n] and traverse to A[0,0], moving only left, up, or upper left in the direction of the minimum value.
L, D, D’, U
D if the value remains the same, D’ if the value changes.
Store the string during the traversal,
then reverse IT.
Meaning of directions:
L: Add new character in X
U: Add new character in Y
D: Keep the same character, no change
D’: Change the character from X to Y
Iterate through string:
L: Character above, - below.
D: Same character above and below. use ‘|’ between
D’: Character above, different character below.
U: - above, character below
A B C D E F 0 1 2 3 4 5 6 A 1 0 1 2 3 4 5 Z 2 1 1 2 3 4 5 C 3 2 2 1 2 3 4 E 4 3 3 2 2 2 3 D 5 4 4 3 2 3 3 A B C D E F 0* 1 2 3 4 5 6 A 1 0* 1 2 3 4 5 Z 2 1 1* 2 3 4 5 C 3 2 2 1* 2* 3 4 E 4 3 3 2 2 2* 3 D 5 4 4 3 2 3 3*
3 -> 2 -> 2 -> 1 -> 1 -> 0 -> 0
D’ D L D D’ D
Reverse:
D D’ D L D D’
Alignment:
A B C D E F
A Z C - E D
| |
Edit Distance With Better Space but Alignment Gone
Only Store last two rows/columns are kept, space can be reduced to O(n) Space
Algorithm EditDistance(X, m, Y, n)
1: b ← 0
2: for j ← 0 to n do A[b, j] ← j
3: for i ← 1 to m do
4: b ← 1 − b
5: A[b, 0] ← i
6: for j ← 1 to n do
7: A[b, j] ← A[1−b, j−1] + δ(X[i], Y[j])
8: A[b, j] ← min(A[b, j], A[1−b, j] + 1)
9: A[b, j] ← min(A[b, j], A[b, j−1] + 1)
10: return A[b, n]
Longest Common Subsequence
Longest Sequence of non-consecutive letters in two strings
EX:
TWO STRINGS
ABCDAF
ACBCF
LONGEST SEQUENCE: ABCF
LCS = 4
Very similar problem to Edit Distance as we have to solve for every prefix of both in a double matrix.
Longest Common Subsequence LEMMA Explain
The Lemma is basically knowing that if the two end letters are the same it is basically LCS of both of the Strings with the last letter removed + 1. If they are not the same then it will be the max LCS of all combo of removing the last letter for both strings. Since the Non-consecutive sequence, the sequence continues even after the last doesn’t match. as we are comparing the whole prefix with the last element
ADD when the last letter is the same, for edit distance we add when they are different
LCS(X[1..m],Y[1..n]) = max {
LCS(X[1..m),Y[1..n)) + f(X[m], Y[n]),
LCS(X[1..m],Y[1..n)),
LCS(X[1..m),Y[1..n])
}
Longest Common Subsequence Algorithm
Algorithm LCS(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← 0
2: for i ← 1 to m do A[i, 0] ← 0
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← max(A[i, j], A[i−1, j])
7: A[i, j] ← max(A[i, j], A[i, j−1])
8: return A[m, n]