Dynamic Programming Flashcards
What Is Dynamic Programming?
Optimizing recursive functions so that previously computed values are stored and do not require recomputing.
What is Memoizing?
Caching the previous computed into an array, check if it was computed already then don’t need to waste time again. Sort of like memorizing
Type of Pattern Matching For Dynamic
It is Approximate Pattern Matching
Resulting in usual O(nm) time being a bit slower and O(n) space
Two types of Algorithms for Approximate Pattern Matching
Global Comparison O(mn)
Local Comparison O(mn)
Why Use a Global Algorithm?
Aligns entire two seq. of roughly equal size,
Why Use Local algorhtm?
Used for less similar strings, suspected to contain similar substring, like spell check
What is the Edit Distance Problem?
ED(X, Y) is the smallest amount of insertions, deletions, and substitutions to transform X to Y.
Ex.
X = GTTACTCGA
Y = GCTTGCCG
G - T T A C T C G A
| | | | | |
G C T T G C - C G -
ED(X, Y) = 4
Input: Two strings
Edit Distance Algorithm
Algorithm EditDistance(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← j
2: for i ← 1 to m do A[i, 0] ← i
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← min(A[i, j], A[i−1, j] + 1)
7: A[i, j] ← min(A[i, j], A[i, j−1] + 1)
8: return A[m, n]
Edit Distance Complexity
Time Complexity O(mn)
Space Complexity O(mn)
- Can improve Space with only storing last two
“it is very unlikely”. a better version made
Edit Distance Explain
You make the big problem into a bunch of subproblems we do using the prefixes of both strings
Only comparing the last letter
Have a 2d Matrix that stores all possible subproblems turning any prefix of X into any prefix of Y. If the last letter same then it is the same ED(X[1..M-1],Y[1..N-1]. Go through the array solve for each sub-problem then return the main problem which is A[m,n].
Also use this lemma for a start:
For every X: ED(X, ε) = ED(ε, X) = |X|,
ED(X[1. .m], Y[1. .n]) = min{
ED(X[1. .m), Y[1. .n)) + δ(X[m], Y[n]),
ED(X[1. .m), Y[1. .n]) + 1,
ED(X[1. .m], Y[1. .n)) + 1
},
δ(X[m], Y[n]), (same letter = 1 else 0)
X[M-1] X[M]
Y[n-1] -> [ REPLACE ] [ INSERT ]
Y[n] -> [ DELETE ] [ HERE]
How to get Alignment from Edit Distance
Alignment:
A B C D E F
A Z C - E D
| |
Start from A[m,n] and traverse to A[0,0], moving only left, up, or upper left in the direction of the minimum value.
L, D, D’, U
D if the value remains the same, D’ if the value changes.
Store the string during the traversal,
then reverse IT.
Meaning of directions:
L: Add new character in X
U: Add new character in Y
D: Keep the same character, no change
D’: Change the character from X to Y
Iterate through string:
L: Character above, - below.
D: Same character above and below. use ‘|’ between
D’: Character above, different character below.
U: - above, character below
A B C D E F 0 1 2 3 4 5 6 A 1 0 1 2 3 4 5 Z 2 1 1 2 3 4 5 C 3 2 2 1 2 3 4 E 4 3 3 2 2 2 3 D 5 4 4 3 2 3 3 A B C D E F 0* 1 2 3 4 5 6 A 1 0* 1 2 3 4 5 Z 2 1 1* 2 3 4 5 C 3 2 2 1* 2* 3 4 E 4 3 3 2 2 2* 3 D 5 4 4 3 2 3 3*
3 -> 2 -> 2 -> 1 -> 1 -> 0 -> 0
D’ D L D D’ D
Reverse:
D D’ D L D D’
Alignment:
A B C D E F
A Z C - E D
| |
Edit Distance With Better Space but Alignment Gone
Only Store last two rows/columns are kept, space can be reduced to O(n) Space
Algorithm EditDistance(X, m, Y, n)
1: b ← 0
2: for j ← 0 to n do A[b, j] ← j
3: for i ← 1 to m do
4: b ← 1 − b
5: A[b, 0] ← i
6: for j ← 1 to n do
7: A[b, j] ← A[1−b, j−1] + δ(X[i], Y[j])
8: A[b, j] ← min(A[b, j], A[1−b, j] + 1)
9: A[b, j] ← min(A[b, j], A[b, j−1] + 1)
10: return A[b, n]
Longest Common Subsequence
Longest Sequence of non-consecutive letters in two strings
EX:
TWO STRINGS
ABCDAF
ACBCF
LONGEST SEQUENCE: ABCF
LCS = 4
Very similar problem to Edit Distance as we have to solve for every prefix of both in a double matrix.
Longest Common Subsequence LEMMA Explain
The Lemma is basically knowing that if the two end letters are the same it is basically LCS of both of the Strings with the last letter removed + 1. If they are not the same then it will be the max LCS of all combo of removing the last letter for both strings. Since the Non-consecutive sequence, the sequence continues even after the last doesn’t match. as we are comparing the whole prefix with the last element
ADD when the last letter is the same, for edit distance we add when they are different
LCS(X[1..m],Y[1..n]) = max {
LCS(X[1..m),Y[1..n)) + f(X[m], Y[n]),
LCS(X[1..m],Y[1..n)),
LCS(X[1..m),Y[1..n])
}
Longest Common Subsequence Algorithm
Algorithm LCS(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← 0
2: for i ← 1 to m do A[i, 0] ← 0
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← max(A[i, j], A[i−1, j])
7: A[i, j] ← max(A[i, j], A[i, j−1])
8: return A[m, n]
Longest Common Subsequence Time Complexity
Time Complexity O(mn)
Space Complexity O(mn)
Longest Common Subsequence Alignment
Start at A[m,n] What you do is only put the letter the result string when the current [row][col] is the max out of [row-1][coll],[row][col-1],[row-1][col-1]. and then move in the direction of the diagonal or [row-1][col-1]
if it is not the max then don’t print just go to the row col that is equal to the current [row][col]. after you traverse all the way to A[0,0] reverse the string then it will be aligned
Longest Common Subsequence Better Space, No Alignment
Complexity with this
Time Complexity: O(mn)
Space: O(n)
Algorithm LCS(X, m, Y, n)
1: b ← 0
2: for j ← 0 to n do A[b, j] ← 0
for i ← 0 to n do A[i, b] ← 0
3: for i ← 1 to m do
4: b ← 1 − b
6: for j ← 1 to n do
7: A[b, j] ← A[1−b, j−1] + δ(X[i], Y[j])
8: A[b, j] ← max(A[b, j], A[1−b, j])
9: A[b, j] ← max(A[b, j], A[b, j−1])
10: return A[b, n]
Approximate Pattern Matching
COMPLEXITY:
TIME: O(mn)
SPACE: O(mn)
Input: pattern and text
Output: Index positions where Edit distance is between some amount
USES EDIT DISTANCE SO ALGO VERY SIMILAR;
Algorithm APRROXPM(X, m, Y, n)
1: for j ← 1 to n do A[0, j] ← 0
2: for i ← 1 to m do A[i, 0] ← i
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← min(A[i, j], A[i−1, j] + 1)
7: A[i, j] ← min(A[i, j], A[i, j−1] + 1)
8:for j ← 1 to n do
9: if A[m, j] ≤ t then output(j)
//WANT TO KNOW ED OF THE FULL PATTERN IN THE TEXT, SO ALREADY A[m, j]
Subset Sum Problem
Input: Array of positive integers and a positive integer T
Output: TRUE FALSE if there exist a subset of the array that its sum is T
EX:
■ Input: X = [4, 2, 7, 4, 9, 2], T = 20
■ Output: TRUE (20 = 9 + 7 + 4)
■ Input: X = [4, 2, 7, 4, 9, 2], T = 40
■ Output: FALSE
■ Input: X = [4, 2, 7, 4, 9, 2], T = 5
■ Output: FALSE
Subset Sum LEMMA
IF T = 0 It will always be TRUE
IF the Array is empty it will always be FALSE
When T> 0 and array_size > 0 then :
SSum(X, n, T) = {
IF X[n] <= T:
SSum(X,n-1,T or SSum(X,n-1,T-X[n]
OTHERWISE: SSum(X,n-1,T) }
- If the last element we are examining is less than or equal to T then we check if we already hit TRUE OR FALSE. IF TRUE we out then it is true for every element after. IF FALSE then this element might be a possibility so we SSum(X, n-1, T- X[n]).
We are breaking the array into smaller pieces to find the piece with the subset sum, starting from last and checking before.
If the X[n] or last element we are examining is greater than T then it not in solution so just check the element before
ex.
X = [2, 3, 7, 8, 10]
T = 11
expected: TRUE (11 = 3 + 8)
Initial setup for S[][]
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T - - - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
First iteration:
Compare X[i] = 2 with 1, 2 is not <= 1, inherit row above
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F - - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
Second iteration:
Compare X[i] = 2 with 2, 2 <= 2, get row above OR j-X[i] in above row.
S[i-1, j] = F
S[i-1, j-X[i]] = S[i-1, j-2] = T
= T
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F T - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
Final iteration:
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F T F F F F F F F F F 3 T F T T F T F F F F F F 7 T F T T F T F T F T T F 8 T F T T F T F T T T T T 10 T F T T F T F T T T T T
Check bottom right value = TRUE
SubSet Sum Complexity
Time Complexity: O(nT)
Space: O(nT) can be O(T) by only storing the last cols
SubSet Sum Algorithm
Algorithm SubsetSum(X, n, T)
1: for i ← 0 to n do S[i, 0] ← TRUE
2: for j ← 1 to T do S[0, j] ← FALSE
3: for i ← 1 to n do
4: for j ← 1 to T do
5: if X[i] ≤ j then
6: S[i, j] ← S[i−1, j] or S[i−1, j−X[i]]
7: else S[i, j] ← S[i−1, j]
8: return S[n, T]
Detailed Solution to the SubSet Sum Algo
To print out the solution to the algorithm
Prints the exact values that make the subset sum
def PrintSubsetSum(X, n, T):
SubsetSum(X, n, T)
if S[n, T] == FALSE: return i = n j = T while j > 0: if S[i - 1, j] == True: i -= 1 else: print(X[i]) j -= X[i] i -= 1
Basically going up the T in the 2d array because once true all the bottoms are true
ex.
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T* F T F F F F F F F F F 3 T F T T* F T F F F F F F 7 T F T T* F T F T F T T F 8 T F T T F T F T T T T T* 10 T F T T F T F T T T T T*
Column jumping happens at X[i] = 8 and 3.
Knapsack Problem
This is a generalized problem of the Subset Sum problem.
Input: two arrays of positive integers, one is weight, and the other is value and nonnegative T
output: the maximal value of input items that does not exceed weight T
The robber trying to maximize his heist
Knapsack LEMMA
solution to Knapsack on W[1 . . n], V[1 . . n], and T as Kn(W, V, n, T).
■ For every W and V: Kn(W, V, n, 0) = 0,
■ For T > 0, it holds: Kn(W, V, 0, T) = 0,
Kn(W, V, n, T) = {
if W[n] ≤ T:
max{
Kn(W, V, n−1, T),
Kn(W, V, n−1, T−W[n]) + V[n])
}
otherwise:
Kn(W, V, n−1, T) .
If the current weight is greater than the target weight limit then we do not do anything and we see if the previous is true or not.
If the current weight is less than the target then we check if the previous index has a solution and we get the max of that value with the value with the current one being checked in the solution set. The max will be the answer because we are trying to optimize. Very close to Subset Sum.
Knapsack Complexity
Time Complexity: O(nT)
Space: O(nT) can make it to T by storing the last two columns
Knapsack Problem Algorithm
Algorithm Knapsack(W, V, n, T)
1: for i ← 0 to n do K[i, 0] ← 0
2: for j ← 1 to T do K[0, j] ← 0
3: for i ← 1 to n do
4: for j ← 1 to T do
5: if W[i] ≤ j then
6: K[i, j] ← max(
K[i−1, j],
K[i−1, j − W[i]] + V[i]
)
7: else K[i, j] ← K[i−1, j]
8: return K[n, T]
Largest Increasing Subsequence (Non-Consecutive) Problem
Input: Array of integers
Output: The largest size of increasing nonconsecutive sequence.
EX: A = [3, 1, 5, 2, 6], then the algorithm should output 3 (corresponding to an example
LIS: [3, 5, 6]).
Largest Increasing Subsequence (Non-Consecutive) Complexity
Time Complexity: O(n^2)
Space: O(n)
Largest Increasing Subsequence (Non-Consecutive) LEMMA
So the lemma is if array contains 1 element then LIS is always 1.
When the array is less than 1 then 0
Denote solution to LIS to any index as LIS(k)
When the Array size is greater than 1
LIS[n] = 1 + max{LIS[k], k < 5; when A[k] < A[n];
The largest LIS of the smallest number before it plus the number you will add to the end is the solution
Largest increasing Subsequence (Non-Consecutive) ALGORITHM
procedure LIS(A[1 . . . n])
LISA[1..n] ← 1
for i ← 2 to n do
for j ← 1 to i do
if A[i] > A[j] and LISA[i] < LISA[j] + 1 then
LISA[i] ← LISA[j] + 1
return max(LISA[1 . . . n])
Optimal BST Problem
FIRST Define BST: Every node has a key, All nodes to the left are smaller and all nodes to the right are greater. Array representation is left to right.
Two arrays:
1. a depth array that contains the depth of each node [i]; d[i], where d[i] is defined as the number of nodes
on the path from the node corresponding to ith smallest key to the root of T
2. Another array contains the amount of time node [i] is accessed called frequency array; f[i]
Input: Given the frequency array f[i]
Output: We want to print the Minimum Cost of traversing through all nodes i f[i] times. So Cost is defined as
Cost(T, f) = Sum from i= 1 to n {d[i] * f[i]}
ex.
Values = [10 11 12]
f = [2 1 2]
BST #1:
10
\
11
\
12
index = [1 2 3]
value = [10 11 12]
d = [1 2 3]
Cost(T, f) = (21) + (12) + (2*3) = 10
BST #2:
11
/ \
10 12
index = [1 2 3]
value = [10 11 12]
d = [2 1 2]
Cost(T, f) = (22) + (11) + (2*2) = 9
BST #2 is more efficient than BST #1.
Optimal BST Complexity
Time Complexity: O(n^3)
Optimal BST LEMMA
the F[i] array is the cumulative sum of the f[i] array up to that point
The F array in your OptimalBST algorithm is used to store the prefix sums of the frequencies of the keys.
The prefix sum is a technique used in algorithms to calculate the sum of elements in a range (i, j) efficiently. Instead of summing up elements each time a sum is needed, prefix sum calculates and stores the sum of elements up to a certain position beforehand.
In your algorithm, F[i] stores the sum of frequencies from f[1] to f[i]. This allows for constant-time retrieval of the sum of frequencies between any two keys, which is used in line 10 of your algorithm. The expression (F[j] - F[i-1]) gives the sum of frequencies from f[i] to f[j], which represents the total search cost for keys between i and j.
This technique significantly improves the efficiency of your algorithm by reducing repeated computation. Instead of recalculating the sum every time it’s needed, you calculate it once and reuse it, which is a common strategy in dynamic programming.
Optimal BST Algo
Algorithm OptimalBST(f[1 . . n])
1: F[1] ← 0
2: for i ← 1 to n do
3: F[i + 1] ← F[i] + f[i]
4: for i ← 1 to n + 1 do A[i, i] ← 0
5: for l ← 1 to n do
6: for i ← 1 to n + 1 − l do
7: j ← i + l
8: A[i, j] ← ∞
9: for r ← i to j − 1 do
10: if A[i, r] + A[r+1, j] < A[i, j] then
11: A[i, j] ← A[i, r] + A[r+1, j]
12: A[i, j] ← A[i, j] + (F[j] − F[i])
13: return A[1, n + 1]
Chain Matrix Multiplication Problem
Trying to multiply n >= 2 amount of matrices together
■ Recall that matrix mul is associative, i.e., for all A, B, C (of right size), (A×B)×C = A×(B ×C),
■ When multiplying m × k and k × p matrices, we perform O(mkp) operations,
When it is n >= 3 you can do the multiplication in a different order so the close of operations can vary
Input: an array S[1..n] which is of integer pairs such that ith matrix is of size S[i].h * S[i].w. We assume for that for every i matrix width the matrix after has its height equal to the ith width. ensures matrix multiplication rule.
Output: Cost of optimal chain matrix multiplication of n matrices with sizes stored in S. We denote this number as OptMul(S).
Can multiply a bunch of different ways since it is associative
Chain Matrix Multiplication LEMMA
■ Let S[1 . . n] be our input,
■ If n = 1 then OptMul(S) = 0,
■ If n ≥ 2, then
Optmul(S) = min from 1-n {
Optmul(S[1..i])+Optmul(S[i+1..n])+S[1].hS[i].wS[n].w