Sequence Alignment Flashcards
Sequence alignment problem
2 sequences, finding the best alignment
can be gaps and mismatch
-> minimize these
Global alignment: used with similar sequences/genes. Align the whole given sequence, every character of best sequence is present
Local alignment: used with dissimilar genes/sequence to find similar regions. Best alignment of subsequences (can be found by doing global alignment of subsequences)
=> scoring function to determine what is the best alignment
Optimal alignment maximize the score !
optimal alignment A* and optimal score M(A*)
Scoring function
f(xi,yi) x and y are the symbols at i position
score of the alignment is the sum of this scoring function at each position. Score is denoted M
scoring function: eg 1 match, -1 mismatch or gap
Can be represented in a substitution matrix: rows and columns nucleotides corresponding index the score
Needleman-Wunsh Algorithm
GLOBAL ALIGNMENT
Find optimal score given a scoring function in a feasible time
Computational cost proportional to the length of both sequences.
Use recursion
Do a matrix
Have one sequence as a column and one sequence as a row.
Score start at 0. Score can be negative
Going horizontally or vertically is adding a gap to the other sequence. Update score accordingly: further down or right more and more score deducted.
Going Diagonally is a math or a mismathc: update score accordingly.
Bottom right is best score: trace back to find the alignment that leads to it.
With arrows indicate the path that lead to the best score !
Smith Waterman algorithm
LOCAL ALIGNMENT
ignore badly aligning regions. Adaptation of Needleman-Wunsh
Build the matrix, find the best score in it: trace back to 0. This is the best local alignment !
Do a matrix
Have one sequence as a column and one sequence as a row.
Score start at 0. Score cannot be negative !!
Going horizontally or vertically is adding a gap to the other sequence. Update score accordingly: further down or right more and more score deducted.
Going Diagonally is a math or a mismathc: update score accordingly.
Check alignment
random test based on properties of the second sequence (permutations or random generation), check optimal sequence, do histogram of scores and with p-value see if the alignment found is statistically significant or not
BLAST
Basic Local Alignment Search Tool
dictionary of all the words, local alignment with match in the dictionary.
Not guarenteed to have an optimal alignment but faster than Smith Waterman
Indexing Based Local Alignment
Dictionnary of All words of length k, alignment are initiated when have score above S. Alignment extended (on left and right) ungapped until score is under statistical treshold.
Allow gaps: add a search area around in the matrix
Outputs only alignements above statistical treshold
Sensitivity speed tradeoff
Using longer words makes it faster but if too long won’t find any match
to improve -> inexact words
or using patterns