Algorithms Flashcards
Global Alignment
A global alignment aligns two sequences from beginning to end, aligning each letter in each sequence only once. An alignment is produced, regardless of whether or not there is similarity between the sequences. A global alignment should only be used on sequences that share significant similarity over most of their extents, and then it will sometimes return a better presentation. Similarity(X,Y): For i = 0,...,m: SIM[i,O] = i*g For j = 1,...,n: SIM[0,j] = j*g For i = 1,...,m: For j = 1,...,n: SIM[i,j] = max( SIM[i-1,j-1] + s(X[i],Y[j]), SIM[i-1,j]+g, SIMI[i,j-1]+g ) EndFor EndFor Return SIM[m,n]
Local Alignment
Local alignment can be used to align two sequences, but will only align those portions of the sequences that share similarity. If there is no similarity, no alignment will be returned. Local alignments algorithms (such as BLAST) are most often used. Local Similarity(X,Y): S=0 For i=0,...,.m: SIM[i,O] = i*g For j =1,...,n: SIM[0,j] = j*g For i=1,...,m: For j = 1,...,n: SIM[i,j] = max( 0, SIM[i-1,j-1] + s(X[i],Y[j]), SIM[i-1,j]+g, SIM[i,j-1]+g) S=max(S,SIM[i,j]) EndFor EndFor Return S
Suffix-Prefix Trie algorithm
The suffix-prefix trie algorithm is composed of two fases:
First Phase: Building the Prefix Trie
Every node is characterized by:
The range of rows in the suffix-array it represents;
The children it has, and its ancestor.
The root node is the only node without ancestors, and represents every row of the matrix. (i.e. range = [0, n], where n = length of the reference string)
The trie is built in such a way that it contains in its paths from root to leaves all suffixes of the reference string, in reverse order, such that the leaves of the trie are always the first character of the suffix their path represents.
Second Phase: Querying the Prefix Trie for matches
For the exact matching we just query the Trie by following the path (if it exist) of the query string
For inexact matching of order k we perform the operation described above, but for every possible string derived from the query string having k mismatches.
The suffix-prefix trie is faster than other alignment techniques because local alignment and global alignment have both quadratic running times, while querying a prefix trie has linear time cost.
The suffix-prefix trie can also be constructed one time and reused.
The downside of the prefix-trie algorithm is that it needs to be built, and this comes at a cost that is super-linear in both time and memory, but it can be amortised overtime since it can be reused.
BLAST
Blast is a hashed seed-extend algorithm, primarily designed to identify homologous sequences
BLAST first finds highly conserved or identical sequences which are then extended with a local alignment, this results in faster alignment wrt local alignment over a whole sequence.