4 - Multiple Sequence Alignment and profiles Flashcards

1
Q

BLASTP and BLASTX filter out low complexity regions with the program SEG. Why?

A
  • SEG replaces low-complexity residues with X so that compositionally biased regions cannot make matches look better than they really are.
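
A minimal sketch of the masking idea, not SEG's actual algorithm: it slides a window along the sequence and replaces residues with X wherever the Shannon entropy of the window is low. The function names, the window size (12) and the threshold (2.2 bits) are assumptions chosen for illustration.

```python
import math
from collections import Counter

def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue that falls in a low-entropy window with 'X'.

    window and threshold are illustrative assumptions, not SEG defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

# A poly-Q stretch is masked; the flanking residues far from it are kept.
print(mask_low_complexity("ACDEFGHIKL" + "Q" * 15 + "MNPRSTVWY"))
```

Because overlapping windows are masked in full, a few residues flanking the repeat are also replaced; a real filter refines the boundaries more carefully.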
2
Q

If you perform permutation tests and get a distribution of scores well below the real score, are the aligned sequences likely to be homologous?

A

Yes

Permutation tests are more sensitive in general and are good for very distant relationships
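
A rough sketch of a permutation test, under stated assumptions: one sequence is shuffled many times (which preserves its composition), each shuffle is re-aligned, and the real score is compared with the shuffled distribution as a z-score. The scoring scheme (match 1, mismatch -1, linear gap -2), the number of shuffles and all function names are hypothetical choices, not part of the card.

```python
import random
import statistics

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def permutation_test(a, b, n_shuffles=100, seed=0):
    """Return the real score, the shuffled scores, and a z-score."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, order destroyed
        shuffled_scores.append(nw_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        z = 0.0 if real == mean else float("inf")
    else:
        z = (real - mean) / sd
    return real, shuffled_scores, z
```

A real score far above the shuffled distribution (a large z) suggests the similarity is not explained by composition alone.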

3
Q

What is the twilight zone of evolutionary distance?

A

Around 15-23% identity

4
Q

List the rules of thumb for the following alignments

A) Sequence > 100 AA, > 25% identical

B) Sequence > 100 AA, 15-25% identity

C) <15% identity

A

A) Sequence > 100 AA, > 25% identical
- Probably significantly homologous

B) Sequence > 100 AA, 15-25% identity
- Probably homologous, but need rigorous testing (including permutation tests)

C) <15% identity
- Not significant; look for motifs in multiple alignments as well as tertiary structure

5
Q

List two benefits to multiple alignments

A
  • Work better than pairwise alignment methods for detecting distant sequence relationships
  • Prerequisite for estimating phylogenetic trees
6
Q

Describe progressive multiple alignments

A

E.g. Clustal

  • A heuristic method, and therefore not guaranteed to find the optimal alignment
  • Requires n choose 2 pairwise alignments as a starting point

Number of pairwise alignments: n! / (2! (n-2)!) = n(n-1)/2
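
A quick check of how fast the number of pairwise alignments grows, using Python's standard `math.comb`. The helper name `n_pairwise_alignments` is made up for this example.

```python
import math

def n_pairwise_alignments(n):
    """Number of distinct sequence pairs among n sequences: n choose 2."""
    return math.comb(n, 2)

for n in (3, 10, 100):
    # The closed form n(n-1)/2 gives the same value.
    print(n, n_pairwise_alignments(n), n * (n - 1) // 2)
```

For 100 sequences this is already 4950 pairwise alignments, which is why the distance-matrix step dominates the cost for large inputs.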

7
Q

Give the steps of ClustalW

A
  1. Pairwise alignments to calculate a distance matrix (distance between all pairs of sequences)
  2. Build a neighbour-joining guide tree from the distance matrix
  3. Align the two most closely related sequences using Needleman-Wunsch (NW)
  4. Choose the next most similar sequence or set of sequences according to the guide tree
  5. The alignment is built up, with each step treated as a pairwise alignment, sometimes with each member of a pair containing more than one sequence
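
Steps 1 and 3 above can be sketched as follows. This is a simplified illustration, not ClustalW's implementation: it uses a toy scoring scheme (match 1, mismatch -1, linear gap -2), defines distance as 1 minus the fraction of identical alignment columns, and picks the closest pair directly instead of building a neighbour-joining tree. All names and parameters are assumptions.

```python
from itertools import combinations

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns both gapped strings."""
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def distance(a, b):
    """1 minus the fraction of alignment columns holding identical residues."""
    x, y = needleman_wunsch(a, b)
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 1 - identical / len(x)

def distance_matrix(seqs):
    """seqs: dict of name -> sequence. Returns {(name1, name2): distance}."""
    return {(m, n): distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}

seqs = {"s1": "HEAGAWGHEE", "s2": "HEAGAWGHEE", "s3": "PAWHEAE"}
dist = distance_matrix(seqs)
print(min(dist, key=dist.get))  # the pair that would be aligned first
```

In the full procedure the guide tree, not a single minimum, decides the order in which the remaining sequences and groups are merged.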
8
Q

Give 1 advantage and 3 disadvantages of ClustalW

A

Pros
- Fast

Cons

  • No objective function (optimality criterion)
  • No way of quantifying whether or not the alignment is good
  • Local minimum problem: if an error is introduced early, it cannot be corrected later in the procedure.
9
Q

How are sequences weighted with ClustalW?

A
  • Calculated from guide tree
  • Weights are normalized, so that the largest weight is 1
  • Closely related sequences have a large amount of the same information, so they are downweighted
  • These weights are used as simple multiplication factors when deriving the score of an alignment of groups or pairs

Weights allow you to take advantage of similar sequences when you already know the phylogeny or other information that is relevant to weighting.
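
A small sketch of how such weights might be applied. Normalizing to a maximum of 1 and multiplying pairwise scores by the weights follow the card; averaging over all cross-group pairs, the toy identity score and the function names are assumptions for illustration. The raw weights would normally come from guide-tree branch lengths and are simply given here.

```python
def normalize_weights(raw):
    """Scale weights so that the largest weight is 1."""
    largest = max(raw.values())
    return {name: w / largest for name, w in raw.items()}

def weighted_column_score(col_a, col_b, weights, score):
    """Score one column of group A against one column of group B.

    col_a, col_b: dicts of sequence name -> residue at that column.
    Each pairwise score is multiplied by both sequence weights, then
    the products are averaged over all cross-group pairs (an assumption).
    """
    total = 0.0
    for name_a, res_a in col_a.items():
        for name_b, res_b in col_b.items():
            total += weights[name_a] * weights[name_b] * score(res_a, res_b)
    return total / (len(col_a) * len(col_b))

# Two near-identical sequences share information, so each gets a low weight.
weights = normalize_weights({"human": 1.0, "chimp": 1.0, "yeast": 4.0})
identity = lambda x, y: 1.0 if x == y else 0.0  # toy scoring function
print(weighted_column_score({"human": "A", "chimp": "A"}, {"yeast": "A"},
                            weights, identity))
```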

10
Q

How does Clustal deal with gap penalties?

A

There are two kinds: gap opening penalties (GOP) and gap extension penalties (GEP).

These can be set by the user, but Clustal will attempt to adjust them according to the following criteria:

  • Dependence on the site properties
  • Dependence on the similarity of the sequences

The percent identity of the sequences is used as a scaling factor to increase the GOP for closely related sequences and decrease it for more distantly related sequences.

11
Q

Describe Clustal’s position-specific gap penalties and its reactions to gaps already present at a position

A
  • Before any pair of sequences (or groups of sequences) is aligned, a table of GOPs is generated for each position in the two sets of sequences
  • The GOP is manipulated in a position-specific manner, so that it can vary over the sequences

If there are already gaps at a position, the GOP is reduced in proportion to the number of sequences with a gap at this position, and the GEP is lowered by half.

Positions near an existing gap (within 8 residues) have an increased GOP.

These rules discourage the opening of too many gaps close together but encourage new gaps to line up exactly with existing ones.
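
The rules above can be sketched as a per-column penalty table. The proportional GOP reduction, the halved GEP and the 8-residue neighbourhood come from the card; the exact size of the increase near a gap (a factor that falls from about 4 to 2 with distance) and the base penalties are assumptions for illustration, as are all names.

```python
def gap_penalty_tables(group, base_gop=10.0, base_gep=0.5, near=8):
    """Return per-column GOP and GEP lists for an already aligned group.

    group: list of equal-length gapped sequences ('-' marks a gap).
    """
    n_seqs = len(group)
    length = len(group[0])
    gap_counts = [sum(seq[i] == "-" for seq in group) for i in range(length)]
    gap_cols = [i for i, c in enumerate(gap_counts) if c > 0]

    gops, geps = [], []
    for i in range(length):
        if gap_counts[i] > 0:
            # Existing gap: GOP shrinks as more sequences have a gap; GEP halves.
            gops.append(base_gop * (n_seqs - gap_counts[i]) / n_seqs)
            geps.append(base_gep / 2)
            continue
        dist = min((abs(i - g) for g in gap_cols), default=None)
        if dist is not None and dist <= near:
            # Close to a gap: raise GOP (assumed formula, larger when closer).
            gops.append(base_gop * (2 + (near - dist) * 2 / near))
        else:
            gops.append(base_gop)
        geps.append(base_gep)
    return gops, geps

gops, geps = gap_penalty_tables(["ACD-EFGHIKLMNPQ", "ACDWEFGHIKLMNPQ"])
print(gops)
```

Opening a new gap at the existing gap column is cheap, opening one right beside it is expensive, and columns more than 8 residues away keep the base penalty.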

12
Q

Describe Clustal’s treatment of gaps in protein loops

A
  • A run of at least 5 hydrophilic residues has a decreased GOP because these runs usually indicate loop regions in protein structures
  • Any position with no gaps that is spanned by a run of 5 hydrophilic residues has its GOP lowered by 3x
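
A minimal sketch of the loop rule for a single ungapped sequence: find positions covered by runs of at least 5 hydrophilic residues and divide their GOP by 3. The residue set `GPSNDQEKR` is an assumption about which residues count as hydrophilic, and the function names are invented for this example.

```python
HYDROPHILIC = set("GPSNDQEKR")  # assumed hydrophilic residue set

def hydrophilic_positions(seq, min_run=5):
    """Indices covered by runs of at least min_run hydrophilic residues."""
    positions = set()
    run_start = None
    for i, res in enumerate(seq + "#"):  # sentinel closes a trailing run
        if res in HYDROPHILIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                positions.update(range(run_start, i))
            run_start = None
    return positions

def lower_gop_in_loops(seq, gops, factor=3.0):
    """Divide the GOP by factor at positions inside hydrophilic runs."""
    loop = hydrophilic_positions(seq)
    return [g / factor if i in loop else g for i, g in enumerate(gops)]

print(lower_gop_in_loops("AAGPSNDAA", [9.0] * 9))
```

The sketch ignores the "no gaps at this position" condition because it looks at one sequence only; in a group alignment, the gap rules would take priority at gapped columns.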
13
Q

Why is it better to delay the alignment of divergent sequences when making multiple alignments?

A

The most divergent sequences are usually the most difficult to align.

The user can set a percent identity cutoff; sequences that fall below it are held back until all the others have been aligned

14
Q

What are the two major changes in Clustal Omega?

A
  • Faster method for calculating the distance matrix
  • Incorporates a Hidden Markov Model into the main alignment engine

15
Q

Why should the output of a multiple alignment algorithm always be checked?

A
  • Obvious mistakes can be made
  • Some sequences will ruin the alignment because they are too divergent
  • For phylogenetic inference, you should become familiar with a manual alignment editor.
16
Q

How does database searching with conserved elements of a multiple sequence alignment (motifs, patterns, or profiles) improve the sensitivity of the search?

A

Upweighting important (conserved) sequence elements and downweighting less important (less conserved) sequence features

A query can be similar to all sequences in an alignment without being very similar to any single one (less than 40% identity). You therefore need some way of summarizing information from all the sequences in the multiple alignment at once:

  • Profiles
  • PSSMs
  • HMMs
  • Sequence logos, etc.
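
A minimal sketch of one such summary, a position-specific scoring matrix (PSSM), built from an ungapped toy alignment. It uses log-odds scores against a uniform background with a pseudocount of 1; both choices, and the function names, are simplifying assumptions (real tools use observed background frequencies and sequence weighting).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Per-column log2-odds scores for each amino acid (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_query(pssm, query):
    """Sum the column scores of an ungapped query of the same length."""
    return sum(col[res] for col, res in zip(pssm, query))

pssm = build_pssm(["ACD", "ACE", "ACD"])
for query in ("ACD", "ACE", "WWW"):
    print(query, round(score_query(pssm, query), 2))
```

Conserved columns (here the first two) contribute large positive scores for the conserved residue, which is the upweighting the card describes; variable columns contribute less.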