4 - Multiple Sequence Alignment and profiles Flashcards
BLASTP and BLASTX filter out low complexity regions with the program SEG. Why?
- SEG replaces them with X so that compositionally biased regions cannot inflate the score and make matches look better than they really are.
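The masking idea can be sketched with a sliding-window entropy filter. This is not SEG's actual algorithm, only a simplified stand-in: the function names, the window size and the entropy threshold below are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue that falls in a low-entropy window with 'X'.

    window and threshold are assumed values chosen for illustration.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)
```

A homopolymer run such as a stretch of Q residues has an entropy near zero, so it is replaced with X, while a window made of many different residues is left untouched.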
If you perform permutation tests and get a distribution of scores far below the real score, are the sequences likely homologous?
Yes
Permutation tests are more sensitive in general and are good for detecting very distant relationships
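A minimal sketch of a permutation test: shuffle one sequence many times (this keeps its composition), rescore each shuffle, and compare the real score with that null distribution. The simple match/mismatch scoring, the linear gap penalty and all function names are assumptions for illustration; real tools score with substitution matrices such as BLOSUM.

```python
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def permutation_test(a: str, b: str, n_perm: int = 200, seed: int = 0):
    """Return (real score, z-score, empirical p-value) against shuffles of b."""
    rng = random.Random(seed)
    real = local_score(a, b)
    letters = list(b)
    null = []
    for _ in range(n_perm):
        rng.shuffle(letters)  # same composition, random order
        null.append(local_score(a, "".join(letters)))
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    z = (real - mean) / sd if sd > 0 else float("inf")
    # Add-one correction so the p-value is never exactly zero.
    p = (sum(s >= real for s in null) + 1) / (n_perm + 1)
    return real, z, p
```

A real score many standard deviations above the shuffled scores (large z, small p) is the situation described in the card above.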
What is the twilight zone of evolutionary distance?
Around 15-23% identity
List the rules of thumb for the following alignments
A) Sequence > 100 AA, >25% identity
B) Sequence > 100 AA, 15-25% identity
C) <15% identity
A) Sequence > 100 AA, >25% identity
- Probably significantly homologous
B) Sequence > 100 AA, 15-25% identity
- Probably homologous, but need rigorous testing (including permutation tests)
C) <15% identity
- Probably not significant; look for motifs in multiple alignments as well as similarity in tertiary structure
List two benefits to multiple alignments
- Work better than pairwise alignment methods for detecting distant sequence relationships
- Pre-requisite for estimating phylogenetic trees
Describe progressive multiple alignments
E.g. Clustal
- A heuristic method, and therefore not guaranteed to find the optimal alignment
- Requires n choose 2 pairwise alignments as a starting point
Pairwise alignments: n! / (2!(n-2)!) = n(n-1)/2
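A quick check of how fast the number of pairwise alignments grows with n. The function name is an illustrative assumption; math.comb is in the Python standard library (3.8+).

```python
import math


def pairwise_alignment_count(n: int) -> int:
    """Number of distinct sequence pairs among n sequences: n choose 2."""
    return math.comb(n, 2)


for n in (4, 10, 100):
    # Both expressions give the same count: n choose 2 = n(n-1)/2.
    print(n, pairwise_alignment_count(n), n * (n - 1) // 2)
```

For 4, 10 and 100 sequences this gives 6, 45 and 4950 pairwise alignments, so the cost of the starting step grows quadratically.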
Give the steps of ClustalW
- Run pairwise alignments to calculate a distance matrix (the distance between every pair of sequences)
- Build a neighbour-joining guide tree from the distance matrix
- Align the two most closely related sequences using Needleman-Wunsch (NW)
- Add the next most similar sequence or group of sequences, following the guide tree
- Build up the alignment step by step, treating each step as a pairwise alignment in which either member of the pair may be a group of already-aligned sequences
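The first stages can be sketched as follows: align every pair with NW, turn percent identity into a distance, and pick the closest pair as the starting point. This is a simplified illustration, not ClustalW's implementation. The scoring values and function names are assumptions, and the neighbour-joining tree and the group-to-group alignment steps are left out.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns both gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fractional identity over aligned (non-gap) columns."""
    x, y = needleman_wunsch(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """Distances for all n choose 2 pairs, keyed by (name, name)."""
    return {(u, v): distance(seqs[u], seqs[v]) for u, v in combinations(seqs, 2)}


def closest_pair(dist):
    """The pair of sequences that would be aligned first."""
    return min(dist, key=dist.get)
```

Usage: closest_pair(distance_matrix({"s1": "HEAGAWGHEE", "s2": "HEAGAWGHEA", "s3": "KLMNPQRSTV"})) returns ("s1", "s2"), the pair a progressive method would align first.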
Give 1 advantage and 3 disadvantages of ClustalW
Pros
- Fast
Cons
- No objective function (optimality criterion)
- No way of quantifying whether or not the alignment is good
- Local minimum problem: if an error is introduced early, it cannot be corrected later in the procedure.
How are sequences weighted with ClustalW?
- Calculated from guide tree
- Weights are normalized, so that the largest weight is 1
- Closely related sequences have a large amount of the same information, so they are downweighted
- These weights are used as simple multiplication factors when deriving the score of an alignment of groups or pairs
Weights allow you to take advantage of similar sequences when you already know the phylogeny or other information that is relevant to weighting.
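A minimal sketch of how such weights might be applied when scoring one column of one group against one column of another. It assumes the score is the average of all residue-pair scores, each multiplied by the weights of the two sequences involved. The raw weights, the function names and the toy scoring function are illustrative assumptions.

```python
def normalize_weights(raw):
    """Scale weights so that the largest weight is 1."""
    top = max(raw.values())
    return {name: w / top for name, w in raw.items()}


def weighted_column_score(col_a, col_b, weights_a, weights_b, score):
    """Average weighted substitution score between two alignment columns.

    col_a / col_b hold the residues of each group at one position;
    weights_a / weights_b hold the matching sequence weights.
    """
    total = 0.0
    for res_a, w_a in zip(col_a, weights_a):
        for res_b, w_b in zip(col_b, weights_b):
            total += w_a * w_b * score(res_a, res_b)
    return total / (len(col_a) * len(col_b))


# Two near-identical sequences share information, so each gets a small weight.
weights = normalize_weights({"human": 0.2, "chimp": 0.2, "yeast": 0.8})
identity = lambda x, y: 1.0 if x == y else 0.0  # toy scoring function
col_score = weighted_column_score(
    ["A", "A"], ["A"],
    [weights["human"], weights["chimp"]], [weights["yeast"]],
    identity,
)
```

Here the two closely related sequences together contribute no more than one divergent sequence would, which is the point of the downweighting.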
How does Clustal deal with gap penalties?
There are two gap penalties: the gap opening penalty (GOP) and the gap extension penalty (GEP).
These can be set by the user, but Clustal also adjusts them automatically according to the following criteria:
- Dependence on the site properties
- Dependence on the similarity of the sequences
The percent identity of the sequences is used as a scaling factor to increase the GOP for closely-related sequences and decrease it for more distantly-related sequences
Describe Clustal's position-specific gap penalties and its reaction to gaps already present at a position
- Before any pair of sequences (or groups of sequences) is aligned, a table of GOPs is generated for each position in the two sets of sequences
- The GOP is manipulated in a position-specific manner, so that it can vary over the sequences
- If there are already gaps at a position, the GOP is reduced in proportion to the number of sequences with a gap at this position and the GEP is lowered by half
- Positions near an existing gap (within 8 residues) have an increased GOP
These rules discourage the opening of too many gaps close together but encourage them to exactly line up
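These rules can be sketched as two small functions. The 8-residue window and the halved GEP come from the card above. The 0.3 scale factor and the exact form of the increase near gaps are assumptions for illustration, as are the function names.

```python
def position_gop(base_gop, n_seqs, n_with_gap, dist_to_gap=None,
                 window=8, gap_scale=0.3):
    """Gap opening penalty at one alignment position.

    n_with_gap: sequences that already have a gap at this position.
    dist_to_gap: distance (in residues) to the nearest existing gap, if any.
    gap_scale and the near-gap formula are assumed, not confirmed, values.
    """
    if n_with_gap > 0:
        # Cheaper to open a gap where other sequences already have one.
        return base_gop * gap_scale * (n_seqs - n_with_gap) / n_seqs
    if dist_to_gap is not None and dist_to_gap < window:
        # More expensive to open a new gap close to an existing one.
        return base_gop * (2 + (window - dist_to_gap) * 2 / window)
    return base_gop


def position_gep(base_gep, n_with_gap):
    """Gap extension penalty: halved where a gap is already present."""
    return base_gep / 2 if n_with_gap > 0 else base_gep
```

With a base GOP of 10 and 4 sequences, a position where 2 sequences already have a gap gets a lower GOP than the base value, while a gap-free position 4 residues from an existing gap gets a higher one.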
Describe Clustal's treatment of gaps in protein loops
- A run of at least 5 hydrophilic residues has a decreased GOP because these runs usually indicate loop regions in protein structures
- Any position with no gaps that is spanned by a run of 5 hydrophilic residues has its GOP lowered by 3x
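A minimal sketch of finding the positions covered by such runs. The set of residues treated as hydrophilic is an assumption here (D, E, G, K, N, Q, P, R, S), and the function name is illustrative.

```python
# Assumed hydrophilic residue set; treat it as a configurable parameter.
HYDROPHILIC = set("DEGKNQPRS")


def hydrophilic_run_positions(seq, min_run=5):
    """Return the 0-based positions covered by runs of >= min_run hydrophilic residues."""
    positions = set()
    run_start = None
    # The '#' sentinel is not hydrophilic, so it closes a run that reaches the end.
    for i, res in enumerate(seq + "#"):
        if res in HYDROPHILIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                positions.update(range(run_start, i))
            run_start = None
    return positions
```

For "AVLIDEKRSGAVL" the run DEKRSG covers positions 4 to 9, so those positions would get the reduced GOP. A run of only 3 hydrophilic residues is ignored.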
Why is it better to delay the alignment of divergent sequences when making multiple alignments?
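The most divergent sequences are usually the most difficult to align, and errors made early in a progressive alignment cannot be corrected later, so it is safer to add these sequences last.
The user can set a percent-identity cutoff: sequences that fall below it are held back until all the others have been aligned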
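A minimal sketch of the cutoff idea, assuming a sequence is delayed when its best identity to any other sequence falls below the cutoff. The 0.40 default, the data layout and the function name are illustrative assumptions.

```python
def split_by_divergence(identity, names, cutoff=0.40):
    """Split sequence names into (align in normal order, delayed until the end).

    identity maps (name, name) pairs to fractional identity.
    """
    def best_identity(name):
        # Highest identity between this sequence and any other sequence.
        return max(v for pair, v in identity.items() if name in pair)

    on_time = [n for n in names if best_identity(n) >= cutoff]
    delayed = [n for n in names if best_identity(n) < cutoff]
    return on_time, delayed


identity = {("a", "b"): 0.90, ("a", "c"): 0.20, ("b", "c"): 0.25}
on_time, delayed = split_by_divergence(identity, ["a", "b", "c"])
```

Here "c" is at most 25% identical to anything else, so it is held back while "a" and "b" are aligned first.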
What are the two major changes in Clustal Omega?
- A faster method for calculating the distance matrix
- Incorporates a Hidden Markov Model into the main alignment engine
Why should the output of a multiple alignment algorithm always be checked?
- Obvious mistakes can be made
- Some sequences will ruin the alignment because they are too divergent
- For phylogenetic inference, you should become familiar with a manual alignment editor.