Lecture 4 - String Difference Flashcards
Applications of string distance?
- biology (DNA and protein sequences)
- file comparison (diff on Unix)
- spelling correction, speech recognition
What is string distance?
The smallest number of basic operations needed to transform s to t.
What are the 3 basic operations in transforming strings?
insert
delete
substitute
Is it possible to have a model with costs allocated to operations?
Yes. The example in the lectures focuses on the UNIT MODEL.
What is the unit model for string distance?
The model in which each of the 3 basic operations costs 1 unit.
What do string comparison algorithms use?
dynamic programming
How is dynamic programming used in string transformations?
We build up a table of solutions to subproblems that get bigger and bigger. This is called the TABULAR METHOD. The final entry, d(m, n), is the optimal answer.
What is string distance?
The minimum number of operations needed to transform string t to string s.
What is string transformation?
The actual sequence of operations that changes string t to string s.
Our dynamic programming method for string transformation and distance is what …
edit distance
What are the 4 cases for string transformations?
- match
- mismatch replace
- mismatch delete
- mismatch insert
In the tabular method what is insert?
left element i.e. d(i, j-1)
In the tabular method what is delete?
element on top i.e. d(i-1 , j)
In the tabular method what is replace?
diagonal element i.e. d(i-1 , j-1)
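Note: the four cases and the three neighbouring cells can be combined into one recurrence. This is a sketch under the unit model, assuming s (length m) indexes the rows with i and t (length n) indexes the columns with j; the symbols s_i and t_j are notation introduced here, not taken from the lecture.

```latex
d(i, j) =
\begin{cases}
  d(i-1,\, j-1) & \text{if } s_i = t_j \quad \text{(match)} \\[4pt]
  1 + \min \bigl\{\, d(i-1,\, j-1),\; d(i-1,\, j),\; d(i,\, j-1) \,\bigr\} & \text{if } s_i \neq t_j
\end{cases}
```

In the mismatch case the three terms inside the minimum are replace (diagonal), delete (top) and insert (left), in that order.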
Why do we use the tabular method?
Because each overlapping subproblem is solved once and its result reused instead of being recomputed, which makes the program more efficient.
What is the space complexity of string distance / transformation?
O(mn)
How can we reduce the space complexity of the tabular method?
Keep only 2 rows, each as long as string t (plus 1): the current row and the row above it. This is O(n). It gives the distance only; traceback still needs the full table.
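Note: a minimal sketch of the two-row idea, assuming the unit model and that t indexes the columns. The function name `edit_distance_two_rows` is hypothetical and chosen only for this illustration.

```python
def edit_distance_two_rows(s: str, t: str) -> int:
    """Unit-cost edit distance from s to t, keeping only two rows of the table."""
    n = len(t)
    prev = list(range(n + 1))        # row 0: d(0, j) = j
    for i in range(1, len(s) + 1):
        curr = [i] + [0] * n         # column 0: d(i, 0) = i
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1]          # match: copy the diagonal
            else:
                curr[j] = 1 + min(
                    prev[j - 1],   # replace (diagonal)
                    prev[j],       # delete (top)
                    curr[j - 1],   # insert (left)
                )
        prev = curr                  # the current row becomes the row above
    return prev[n]
```

Only `prev` and `curr` are alive at any time, so the extra space is proportional to the length of t, while the running time is unchanged.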
What is the time complexity of the edit distance algorithm with the tabular method?
O(mn)
What row / column can be filled in even before running the algorithm in the tabular method?
Row 0 and column 0. Each cell is simply filled with its own index:
d(i , 0) = i
d(0 , j) = j
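Note: a minimal sketch of the full tabular method under the unit model, assuming s indexes the rows and t indexes the columns. The function name `edit_distance` is hypothetical; the lecture's own pseudocode may differ in detail.

```python
def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance from s to t using the full (m+1) x (n+1) table."""
    m, n = len(s), len(t)
    # d[i][j] = distance between the first i characters of s and the first j of t
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                  # column 0: delete all i characters of s
    for j in range(n + 1):
        d[0][j] = j                  # row 0: insert all j characters of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]      # match: no cost
            else:
                d[i][j] = 1 + min(
                    d[i - 1][j - 1],   # replace (diagonal)
                    d[i - 1][j],       # delete (top)
                    d[i][j - 1],       # insert (left)
                )
    return d[m][n]                   # bottom-right cell is the answer
```

The two nested loops visit each of the (m+1)(n+1) cells once and do constant work per cell, which is where the O(mn) time and space figures come from.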
What is a diagonal step in traceback?
substitution or match
What is a horizontal step in traceback?
insert
What is a vertical step in traceback?
delete
Is traceback unique?
No, there can be many optimal alignments.
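Note: a minimal sketch of traceback under the unit model, assuming s indexes the rows and t the columns. The function name `edit_script` and the tuple format are hypothetical. Where several steps are equally good, this sketch prefers diagonal, then vertical, then horizontal, so it returns just one of the possible optimal alignments.

```python
def edit_script(s: str, t: str) -> list[tuple[str, str, str]]:
    """Return one optimal list of (operation, char of s, char of t) steps."""
    m, n = len(s), len(t)
    # Fill the full table first; traceback needs every row.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])

    # Walk back from the bottom-right cell to the top-left cell.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
            ops.append(("match", s[i - 1], t[j - 1]))      # diagonal step
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("replace", s[i - 1], t[j - 1]))    # diagonal step
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", s[i - 1], ""))           # vertical step
            i -= 1
        else:
            ops.append(("insert", "", t[j - 1]))           # horizontal step
            j -= 1
    ops.reverse()                    # steps were collected from the end
    return ops
```

Changing the order of the branches can yield a different but equally short sequence of operations, which is why traceback is not unique.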