EBI MSA Flashcards
Using EBI for MSA
- Copy and paste the sequences (in FASTA format) into the input box
- Modify parameters if necessary
- Run MSA by clicking “submit”
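Before pasting, it can help to check that the text really is valid FASTA. The following is a minimal sketch, not part of the EBI tool: the function names `parse_fasta` and `check_msa_input` are hypothetical, and the "at least two sequences" rule is an assumption (the web service may enforce its own limits).

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header line")
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa_input(text):
    """Return the parsed records, or raise ValueError if the input looks unusable."""
    records = parse_fasta(text)
    if len(records) < 2:  # assumption: an alignment needs at least two sequences
        raise ValueError("an MSA needs at least two sequences")
    for header, seq in records:
        if not seq:
            raise ValueError(f"record '{header}' has no sequence")
    return records


example = """>seq1
MKTAYIAK
QRQISFVK
>seq2
MKTAYIAR
"""
records = check_msa_input(example)
```

Sequence lines that wrap over several lines are joined into one string per record, which is how FASTA is normally read.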
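Interpreting the output
- For most MSA software, the line beneath each alignment block marks how conserved each column is:
  “*” indicates a residue that is identical in all input sequences
  “:” indicates highly conserved residues (substitutions between strongly similar residues)
  “.” indicates somewhat conserved residues (substitutions between weakly similar residues)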
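The symbols can be reproduced for a small protein alignment with a sketch like the one below. The "strong" and "weak" residue groups are the ones commonly listed for the Clustal programs, reproduced here from memory, so treat them as an assumption and check the tool's documentation before relying on them. The function names are hypothetical.

```python
# Residue groups as commonly listed for Clustal (assumption, see text above).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return " "  # a column containing a gap gets no symbol
    if len(residues) == 1:
        return "*"  # identical in all sequences
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall inside one weak group
    return " "


def consensus_line(aligned):
    """Build the symbol line for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(col) for col in zip(*aligned))


aligned = ["MKSL-A",
           "MRTLGA",
           "MKALCV"]
line = consensus_line(aligned)
```

For this toy alignment the columns are: identical M, K/R (strong), S/T/A (strong), identical L, a gapped column, and A/V (weak).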
Two approaches to aligning multiple sequences
a) Multi-dimensional dynamic programming extends pairwise dynamic programming to more than two sequences. Figure X shows an example of the “Manhattan grid” for aligning 3 sequences by dynamic programming. The algorithm finds the optimal “path” through a 3-dimensional cube, and that path gives the optimal alignment of the 3 sequences. Although this strategy guarantees an optimal alignment, the number of cells to fill grows exponentially with the number of sequences (roughly L^N cells for N sequences of length L), which makes it impractical for more than a handful of long sequences.
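The 3-sequence case can be sketched in a few lines. This is an illustration only: the sum-of-pairs scoring and the score values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, not taken from any particular program, and `align3` is a hypothetical name.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -1  # illustrative scores (assumption)


def pair_score(a, b):
    """Score one pair of symbols inside an alignment column."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(x, y, z):
    """Sum-of-pairs score of a three-symbol column."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


# The 7 moves in the cube: each sequence either advances (1) or takes a gap (0).
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def align3(a, b, c):
    """Return (optimal score, [row_a, row_b, row_c]) for three sequences."""
    la, lb, lc = len(a), len(b), len(c)
    score = {(0, 0, 0): 0}
    back = {}
    # Lexicographic order guarantees every predecessor cell is already filled.
    for i, j, k in product(range(la + 1), range(lb + 1), range(lc + 1)):
        if (i, j, k) == (0, 0, 0):
            continue
        best = None
        for di, dj, dk in MOVES:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = (a[pi] if di else "-", b[pj] if dj else "-", c[pk] if dk else "-")
            cand = score[(pi, pj, pk)] + column_score(*col)
            if best is None or cand > best:
                best = cand
                back[(i, j, k)] = ((pi, pj, pk), col)
        score[(i, j, k)] = best
    # Trace the optimal path back from the far corner of the cube.
    cols = []
    cell = (la, lb, lc)
    while cell != (0, 0, 0):
        cell, col = back[cell]
        cols.append(col)
    cols.reverse()
    rows = ["".join(col[n] for col in cols) for n in range(3)]
    return score[(la, lb, lc)], rows


best_score, rows = align3("ACGT", "AGT", "ACT")
```

The cube has (len(a)+1) x (len(b)+1) x (len(c)+1) cells and each cell looks at up to 7 predecessors. Every extra sequence adds another dimension, which is why the approach does not scale.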
b) Progressive alignment is a heuristic approach: it does not guarantee an optimal alignment, but it finishes within a practical amount of time. The software “Clustal W” is an example of this strategy. Clustal W aligns multiple sequences in 4 steps:
i) Pairwise alignment of all possible pairs of sequences
ii) Construction of a pairwise identity (distance) matrix and, from it, a guide tree
iii) Progressive alignment of all sequences in the order given by the guide tree, starting from the most similar pair
iv) Generation of a phylogenetic tree, patterns and profiles from the final alignment of all sequences
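Steps i) and ii) can be sketched as follows. This is a simplified illustration, not Clustal W's actual implementation: it uses a plain Needleman-Wunsch global alignment with made-up scores, defines percent identity as identical columns divided by alignment length (an assumption), and stops at finding the most similar pair rather than building the guide tree. It also assumes non-empty sequences.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback from the bottom-right corner.
    i, j = n, m
    ra, rb = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))


def percent_identity(a, b):
    """Identical aligned columns as a percentage of the alignment length."""
    ga, gb = global_align(a, b)
    same = sum(x == y and x != "-" for x, y in zip(ga, gb))
    return 100.0 * same / len(ga)


def identity_matrix(seqs):
    """Step i) and ii): align every pair and record its percent identity."""
    return {(p, q): percent_identity(seqs[p], seqs[q])
            for p, q in combinations(list(seqs), 2)}


seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTACGT"}
matrix = identity_matrix(seqs)
closest_pair = max(matrix, key=matrix.get)  # where step iii) would start
```

In this toy data set A and B are the most similar pair, so a progressive aligner would align them first and add C afterwards.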
Limitations of progressive alignment
- The accuracy of the final alignment depends on a good initial alignment. Errors made in the initial alignment are carried forward as further sequences are added. One solution is to revisit and refine the initial alignment iteratively.
- Sequences with tandem repeats may not align properly.
Four applications of MSA
- Find similarity among distantly related sequences where similarity is too weak to be identified by pairwise sequence alignment
- Identify genes conserved across organisms
- Find conserved regions (patterns and profiles) among sequences and yield insights on the structural features and functions of the sequences
- Identify mutations or rearrangements
Clustal Omega
Paste the protein sequences in FASTA format, then submit the job.