5. Multiple sequence alignment Flashcards
Consensus sequences
uses more information than pairwise alignment
PROSITE patterns
generating the consensus profile
-> database for motifs and patterns
-> rules for describing patterns
SYNTAX
R
Required residue
SYNTAX
x
any residue
[ ]
multiple residues allowed
{ }
residues not allowed
x(i,j)
any residue repeated between i and j times
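The syntax rules above map closely onto regular expressions. A minimal sketch of that translation, assuming pattern elements are separated by hyphens as in PROSITE notation; the function name `prosite_to_regex` and the example pattern are made up for illustration, and only the elements listed on these cards are handled:

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative helper (not part of any library); handles only a required
    residue, x, [ ], { } and repeat counts such as x(3) or x(2,4).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count: (n) or (i,j).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # residues not allowed
        else:
            regex = core                     # required residue, or [ ] set
        if low and high:
            regex += "{" + low + "," + high + "}"
        elif low:
            regex += "{" + low + "}"
        parts.append(regex)
    return "".join(parts)

# Made-up example pattern, not a real PROSITE entry.
regex = prosite_to_regex("R-x(2,3)-[ST]-{P}")
hit = re.search(regex, "AARGGSAK")
miss = re.search(regex, "AARGGSP")
```

Here `regex` is `R.{2,3}[ST][^P]`; the first sequence matches and the second does not, because a P follows the S.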
frequency table
tabulates how common each nucleotide/residue is at every alignment position
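A frequency table can be sketched in a few lines. This assumes an ungapped toy DNA alignment in which every sequence has the same length; the function name and the example sequences are illustrative only:

```python
from collections import Counter

def frequency_table(alignment):
    """Return one {residue: relative frequency} dict per alignment column."""
    n = len(alignment)
    table = []
    for column in zip(*alignment):          # iterate column by column
        counts = Counter(column)
        table.append({res: c / n for res, c in counts.items()})
    return table

# Toy alignment (invented data).
alignment = ["ACGT", "ACGA", "ATGT", "ACCT"]
table = frequency_table(alignment)
```

In this toy case the first column contains only A, while the second column is 75% C and 25% T.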
PSSM (abbreviation)
position-specific scoring matrix
PSSM
- known residue preference at each alignment position
- PSSM values are weighted sums of standard substitution matrix scores
- used to extend a previous search and find more family members
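The "weighted sum" idea can be sketched as follows: the score for residue a at a position is the sum over residues b of (frequency of b at that position) × S(a, b). The substitution matrix here is a toy +1/-1 match/mismatch scheme for DNA, chosen only to keep the example short; a real protein PSSM would use a standard matrix such as BLOSUM62 and usually sequence weighting and pseudocounts as well. All names are illustrative.

```python
from collections import Counter

ALPHABET = "ACGT"

def toy_substitution(a, b):
    """Assumed toy substitution score: +1 for a match, -1 for a mismatch."""
    return 1.0 if a == b else -1.0

def build_pssm(alignment):
    """Score each residue per column as a frequency-weighted sum of substitution scores."""
    n = len(alignment)
    pssm = []
    for column in zip(*alignment):
        freqs = {res: c / n for res, c in Counter(column).items()}
        pssm.append({
            a: sum(f * toy_substitution(a, b) for b, f in freqs.items())
            for a in ALPHABET
        })
    return pssm

def score(pssm, seq):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(col[res] for col, res in zip(pssm, seq))

# Toy alignment (invented data).
pssm = build_pssm(["ACGT", "ACGA", "ATGT", "ACCT"])
family_like = score(pssm, "ACGT")
unrelated = score(pssm, "TTTT")
```

A sequence resembling the family scores higher than an unrelated one, which is what makes the matrix useful for finding further family members.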
Logos
- Tall column height - high conservation
- Short column height - low conservation
- height of each letter within a column reflects how frequent that residue is at the position
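Column and letter heights can be computed from the column frequencies. A minimal sketch using the usual information-content measure (log2 of the alphabet size minus the column entropy, in bits), with each letter's height being its frequency times that value. The small-sample correction used by real logo tools is left out, and the names and data are illustrative:

```python
import math
from collections import Counter

def logo_heights(alignment, alphabet_size=4):
    """Return per-column {residue: letter height in bits} for an ungapped alignment."""
    heights = []
    for column in zip(*alignment):
        n = len(column)
        freqs = {res: c / n for res, c in Counter(column).items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        information = math.log2(alphabet_size) - entropy   # total column height
        heights.append({res: f * information for res, f in freqs.items()})
    return heights

# Toy DNA alignment (invented data); use alphabet_size=20 for proteins.
heights = logo_heights(["ACGT", "ACGA", "ATGT", "ACCT"])
column_totals = [sum(col.values()) for col in heights]
```

The fully conserved first column reaches the maximum of 2 bits for DNA, while the mixed columns are shorter.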
Iterated sequence searches
- discover known sequences in family
- a single query sequence cannot detect all family members
PSI-BLAST
Position-specific iterated BLAST
- BLAST, use E-value cutoff to select sequences
- Iterate until convergence
a. make PSSM
b. search database
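The iterate-until-convergence loop can be illustrated with a toy model. This is not PSI-BLAST itself: it assumes ungapped sequences of equal length, an in-memory list instead of a database, a toy +1/-1 profile, and a plain score cutoff standing in for the E-value cutoff. All function names and data are invented for the sketch.

```python
from collections import Counter

ALPHABET = "ACGT"

def build_profile(seqs):
    """Step a: make a toy PSSM (2 * frequency - 1 per residue and column)."""
    n = len(seqs)
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        profile.append({a: (2 * counts[a] - n) / n for a in ALPHABET})
    return profile

def profile_score(profile, seq):
    return sum(col[res] for col, res in zip(profile, seq))

def iterated_search(query, database, cutoff, max_rounds=10):
    """Repeat profile building and searching until the hit set stops changing."""
    hits = {query}
    for _ in range(max_rounds):
        profile = build_profile(sorted(hits))
        # Step b: search the database with the current profile.
        new_hits = {query} | {s for s in database
                              if profile_score(profile, s) >= cutoff}
        if new_hits == hits:      # convergence: no new sequences found
            break
        hits = new_hits
    return hits

# Toy database (invented data).
database = ["ACGTAA", "ACGAAA", "TTTTTT"]
family = iterated_search("ACGTAC", database, cutoff=3.0)
first_round = profile_score(build_profile(["ACGTAC"]), "ACGAAA")
```

With the query alone, `ACGAAA` scores 2 and is missed; once `ACGTAA` joins the profile in the second round, it scores 3 and is picked up, which shows why one sequence is not enough to detect the whole family.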
Hidden Markov Models
- classic illustration: hidden weather states inferred from observations
- give the probability of e.g. a sequence under the model
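The weather illustration and the "probability of a sequence" idea can be combined in a short sketch of the forward algorithm, which sums over all hidden state paths. The states, observations, and every probability below are invented numbers for illustration; in a profile HMM the states would instead be match, insert, and delete positions and the observations would be residues.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model), summed over all hidden state paths."""
    # Initialise with the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Extend one observation at a time.
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())

# Illustrative model parameters (assumed, not taken from the source).
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

prob = forward(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
```

The hidden states (the weather) are never observed directly; only the activities are, and the model assigns a probability to the whole observed sequence.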
One of the most common MSA methods
progressive alignment
Progressive alignment
- add one sequence at a time
- order can be crucial
- compare pairwise
- align in order of similarity
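The ordering step can be sketched on its own: score every pair, start with the most similar pair, then repeatedly add the sequence most similar to one already chosen. This is a simplified greedy stand-in for a guide tree and only decides the order; it does not build the alignment. The scoring scheme (+1 match, -1 mismatch, -1 gap), the function names, and the sequences are assumptions for illustration.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def addition_order(seqs):
    """Return sequence names in the order they would be added (needs >= 2 sequences)."""
    names = list(seqs)
    sim = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
           for p in combinations(names, 2)}
    first = max(sim, key=sim.get)            # most similar pair goes first
    order = sorted(first)
    remaining = [n for n in names if n not in first]
    while remaining:
        # Next: the sequence most similar to any already-added sequence.
        nxt = max(remaining,
                  key=lambda n: max(sim[frozenset((n, m))] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy sequences (invented data).
seqs = {"d": "GGGGGGGG", "c": "ACCAACGA", "a": "ACGTACGT", "b": "ACGTACGA"}
order = addition_order(seqs)
```

For these toy sequences the closest pair (`a`, `b`) is aligned first and the divergent sequence `d` is added last, so early errors are less likely to be locked into the alignment.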
Other alignment methods
- iterative methods
- machine learning methods
- phylogeny-based approaches
MSA programs
Clustal Omega
MAFFT
MSAProbs
MAFFT
considered the best program for creating MSAs
rebuilds the guide tree until the alignment no longer improves
divides the alignment into sub-alignments and regions