3. Pairwise alignment Flashcards
alignment score
is used to measure the quality of the alignment
dot plot
intrasequence comparison
more than 40% identity
homology!
20-40% identity
homology probable
below 20% identity
homology possible but unlikely
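A minimal sketch of how the identity thresholds above could be applied to an existing pairwise alignment. The function names are hypothetical. The choice of denominator (all aligned columns, gaps included) is an assumption, since percent identity is defined in several ways in practice.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two gapped, equal-length rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns where both rows hold a gap carry no information; skip them.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def homology_call(identity: float) -> str:
    """Map percent identity to the rule-of-thumb categories from the cards."""
    if identity > 40:
        return "homology"
    if identity >= 20:
        return "homology probable"
    return "homology possible but unlikely"
```

These cut-offs are rules of thumb; alignment length also matters when judging whether a given identity is meaningful.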
sliding window approach
- remove noise
- compare multiple residues at the same time
- only place a dot if at least nmatch residues match within a window of wsize residues
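The sliding window idea can be sketched as follows. This assumes that a dot is placed when at least nmatch of the wsize residue pairs along the diagonal are identical; the function names and default values are illustrative only.

```python
def dot_plot(seq_a: str, seq_b: str, wsize: int = 3, nmatch: int = 2) -> list:
    """Return a grid of booleans; True marks a dot at window start (i, j)."""
    rows = max(len(seq_a) - wsize + 1, 0)
    cols = max(len(seq_b) - wsize + 1, 0)
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Compare wsize residues along the diagonal starting at (i, j).
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(wsize))
            row.append(matches >= nmatch)
        plot.append(row)
    return plot


def render(plot: list) -> str:
    """Draw the plot as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in plot)
```

Passing the same sequence as both seq_a and seq_b gives an intrasequence comparison, where repeats show up as diagonals parallel to the main diagonal.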
best scoring alignment
optimal alignment
next best scoring alignment
suboptimal alignment
the optimal alignment is not necessarily
correct
scoring systems (3)
- theoretical, count nr of mutations
- physicochemical properties
- based on evidence from evolution
Substitution matrices
e.g. PAM and BLOSUM
PAM stands for
point accepted mutation
PAM
- based on observed aa substitution frequencies
- logP(aa1->aa2)
- several matrices; the number represents how many accepted mutations per 100 residues
- 1 PAM - 1 accepted aa change/100 residues
Blosum
- derived from multiple local alignments
- should reflect evolutionary events if the alignment is correct
- e.g. BLOSUM62 is a matrix calculated from comparisons of sequences with no more than 62% identity
PAM number reflects
evolutionary distance
BLOSUM number reflects
sequence similarity
choice of matrix depends on the situation
- distantly related
high PAM nr (eg 250)
Low BLOSUM nr (50)
choice of matrix depends on the situation
- closely related
low PAM nr (120)
high Blosum nr (eg 80)
choice of matrix depends on the situation
- short sequences
low PAM nr (40), high BLOSUM nr (80)
Global
looking at entire sequence
should be fairly similar in length
Local
looking only at a part of sequence, eg domains known to be similar
Gap penalties
score deducted each time a gap is added to obtain the optimal alignment
Different gap penalties
Gap opening penalty - often higher, applied when introducing a new gap
Gap extension penalty - often lower, so extending a gap is cheaper than opening a new one
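A small sketch of how the two penalties combine into an affine gap cost. It assumes the convention that the first gap position pays the opening penalty and each further position pays the extension penalty; some tools instead charge opening plus extension for every position. The function name and default values are illustrative only.

```python
def affine_gap_penalty(length: int, gap_open: float = 10.0,
                       gap_extend: float = 0.5) -> float:
    """Cost of one contiguous gap of the given length (assumed convention)."""
    if length <= 0:
        return 0.0
    # The first position opens the gap; the remaining ones only extend it.
    return gap_open + gap_extend * (length - 1)
```

With these values, one gap of length 3 costs 11.0 while three separate gaps of length 1 cost 30.0, which is why the optimal alignment tends to group gaps together.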
Different gap penalties in different settings
high for closely related
low for distantly related
dynamic programming
algorithm for calculating optimal alignment
Needleman-Wunsch
global alignment
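A compact sketch of global alignment by dynamic programming. To stay short it assumes a simple match/mismatch score and a linear gap penalty rather than a substitution matrix with separate opening and extension penalties; the function name and default scores are illustrative only.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

When several alignments tie for the best score, this traceback returns only one of them (it prefers the diagonal step).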
Smith-Waterman
local alignment
key difference local vs global
local: negative numbers are set to 0
local: start traceback from the highest number in the matrix and stop at 0
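The two differences can be seen in a short sketch of local alignment. As an assumption for brevity it uses a simple match/mismatch score and a linear gap penalty; the function name and default scores are illustrative only.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    n, m = len(a), len(b)
    # First row and column stay 0: a local alignment may start anywhere.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Difference 1: negative values are replaced by 0.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Difference 2: traceback starts at the highest cell and stops at 0.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Only the best-scoring region is returned, so flanking residues that do not align well are left out of the result.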
in local what can you do to find suboptimal alignment
set the cells of the optimal alignment to 0 and recalculate