Bioinformatics 8: advanced searching and multiple alignment Flashcards

Question 1

Q

Why might you want to filter query sequences?

Answer

A

Statistical models of alignment assume that all matching residues are of equal significance

But this is not the case, e.g. low-complexity regions (poly-A tracts etc.), short-period repeats and generic protein secondary structures (coiled coils), which give spuriously high-scoring matches

Essential in repeat-rich genomes, e.g. human (~45% repetitive sequence)

Question 2

Q

How could you filter query sequences?

Answer

A

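Use a ‘masked’ query sequence, in which less meaningful regions are replaced with a null character (typically N for DNA, X for protein)

Masking is done with filtering/masking programs (e.g. DUST for DNA, SEG for protein, RepeatMasker for interspersed repeats)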
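A minimal sketch of the idea behind low-complexity masking, assuming a simple sliding-window Shannon-entropy criterion. This is not how SEG or DUST actually score complexity; the function names, window size and entropy cutoff are illustrative assumptions. Every residue that falls inside a window whose entropy is below the cutoff is replaced with a mask character (N here, as for DNA).

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 8, min_entropy: float = 1.0,
                        mask_char: str = "N") -> str:
    """Hypothetical masker: replace residues in low-entropy windows with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < min_entropy:
            # overwrite the whole window with the mask character
            masked[start:start + window] = mask_char * window
    return "".join(masked)


query = "ACGTTGCA" + "A" * 20 + "GATCCTAG"  # poly-A tract in the middle
print(mask_low_complexity(query))
```

With these settings the poly-A tract (plus a few flanking residues that sit in low-entropy windows) is masked, while the mixed-composition ends of the query are left intact for the search.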

Question 3

Q

What % identity range could reflect either a real relationship or just noise (as suggested by Doolittle in 1981)?

Answer

A

18-25% (Twilight zone)
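As a worked illustration, assuming % identity is counted over aligned columns where neither sequence has a gap (definitions vary between tools, especially in the denominator), the sketch below computes identity from a pairwise alignment and bins it with the 18-25% twilight-zone bounds from this card. The function names and bin labels are hypothetical, a rough reading of the card rather than a standard.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """% identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def classify(identity: float, low: float = 18.0, high: float = 25.0) -> str:
    """Bin a % identity using the twilight-zone bounds quoted on this card."""
    if identity < low:
        return "likely noise"
    if identity <= high:
        return "twilight zone"
    return "more likely a real relationship"


pid = percent_identity("AC-DE", "ACFDQ")  # 3 identical out of 4 ungapped columns
print(pid, classify(pid))
```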

Question 4

Q

Explain iterative searching (e.g. in BLAST) and how it identifies distantly related sequences

Answer

A

Protein A (query) and protein C (database) may be distantly related, but too dissimilar for BLAST to detect the relationship directly

A third protein, B, similar enough to both, is first detected in the database using protein A as the query

Protein C is then detected using protein B as the query: this is one iteration

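-> PSI-BLAST (Position-Specific Iterated BLAST) is the most widely used tool; rather than searching with each hit separately, it builds a profile from the hits of each round and uses that as the next query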
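The A -> B -> C logic can be sketched as a toy transitive search, assuming ungapped % identity between equal-length sequences as a stand-in for a BLAST score and a fixed 60% reporting threshold. This illustrates iteration in general; it is not PSI-BLAST, which searches with a profile rather than with each hit individually. All names and sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Toy similarity score: ungapped % identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy model needs equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def search(query: str, database: dict[str, str], threshold: float) -> set[str]:
    """One search round: names of database entries scoring >= threshold."""
    return {name for name, seq in database.items() if identity(query, seq) >= threshold}


def iterative_search(query: str, database: dict[str, str],
                     threshold: float = 60.0, max_rounds: int = 10) -> set[str]:
    """Re-search with every newly found hit until no new hits appear."""
    found: set[str] = set()
    queries = [query]
    for _ in range(max_rounds):
        new_hits: set[str] = set()
        for q in queries:
            new_hits |= search(q, database, threshold) - found
        if not new_hits:  # converged: this round detected nothing new
            break
        found |= new_hits
        queries = [database[name] for name in sorted(new_hits)]
    return found


protein_a = "AAAAAAAAAA"  # query
database = {
    "B": "AAAAAACCCC",  # 60% identical to A
    "C": "AAACCCCCCC",  # only 30% identical to A, but 70% identical to B
    "D": "GGGGGGGGGG",  # unrelated
}
print(sorted(search(protein_a, database, 60.0)))      # ['B']
print(sorted(iterative_search(protein_a, database)))  # ['B', 'C']
```

A single search with A misses C, but using B as the next query recovers it, exactly as described in the card.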

Question 5

Q

Problems with iterative searching and provided solutions?

Answer

A

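1) The number of BLAST searches grows rapidly with each iteration, as every new hit becomes another query
2) Erroneous (false-positive) hits in the first iteration can bias all later results

Solutions

1) A sequence profile stores the sequences matched so far, so each iteration needs only one search -> iterate until no new matches are found (convergence)
2) “Triage” (manual checking) of the hits after the first iteration, discarding doubtful sequences before they bias later rounds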
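The profile idea in solution 1 can be sketched as a position-specific scoring matrix (PSSM) built from the sequences matched so far. This is a simplified version under stated assumptions: ungapped, equal-length aligned hits, a uniform background over the 20 amino acids and a flat pseudocount. PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts, and the function names here are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds (bits) PSSM from ungapped aligned hits, uniform background."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        total = len(aligned) + pseudocount * len(ALPHABET)
        # observed frequency (with pseudocount) relative to background
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in ALPHABET})
    return pssm


def score(pssm: list[dict[str, float]], seq: str) -> float:
    """Score an ungapped sequence against the profile, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["ACDE", "ACDE", "ACDF"]  # sequences matched so far (hypothetical)
profile = build_pssm(hits)
print(score(profile, "ACDE"), score(profile, "WWWW"))
```

One search with this single matrix replaces separate searches with every hit. Triage (solution 2) would correspond to removing doubtful sequences from `hits` before the matrix is rebuilt.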

Question 6

Q

What is PHI-BLAST?

Answer

A

Pattern Hit Initiated BLAST

An extension of PSI-BLAST that uses a pattern (e.g. an insulin-family motif) to start the search, so only database sequences containing the pattern are reported
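PHI-BLAST patterns are based on PROSITE pattern syntax. The sketch below shows only the pattern-hit step: translating a simple PROSITE-style pattern into a Python regular expression and keeping database sequences that contain it. It assumes a subset of the syntax (residues, x, [allowed], {forbidden}, and (n) or (n,m) repeats; no < or > anchors). The pattern and sequences are hypothetical, not the real insulin motif, and the alignment extension PHI-BLAST performs around each hit is omitted.

```python
import re

# One PROSITE-style element: residue, x, [allowed] or {forbidden}, optional (n) or (n,m)
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern (no < or > anchors) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # single residue or [..] class is already valid regex
        if lo is not None:
            regex += f"{{{lo},{hi}}}" if hi is not None else f"{{{lo}}}"
        parts.append(regex)
    return "".join(parts)


def pattern_hits(pattern: str, database: dict[str, str]) -> set[str]:
    """Names of database sequences containing at least one pattern match."""
    regex = re.compile(prosite_to_regex(pattern))
    return {name for name, seq in database.items() if regex.search(seq)}


database = {"s1": "AACGGLACAA", "s2": "AACGGLPCAA", "s3": "MKTAYIAKQR"}
print(prosite_to_regex("C-x(2)-[LIVM]-{P}-C"))  # C.{2}[LIVM][^P]C
print(pattern_hits("C-x(2)-[LIVM]-{P}-C", database))  # {'s1'}
```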

Question 7

Q

Applications of MSA (Multiple sequence alignment)?

Answer

A

Finding new related sequences

Genome sequence assembly

Phylogeny (highly conserved positions help to establish the evolutionary tree)

Protein structure prediction (conserved domains, motifs etc.)
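Several of these applications rely on spotting conserved columns in an MSA. A minimal sketch, assuming conservation is measured as Shannon entropy per column with gap characters ignored (one common choice among many); the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def column_entropy(column: str, gap: str = "-") -> Optional[float]:
    """Shannon entropy (bits) of one MSA column, ignoring gaps; None if all gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return None
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_columns(msa: list[str], max_entropy: float = 0.0) -> list[int]:
    """Indices of columns with entropy <= max_entropy (0 means fully identical)."""
    result = []
    for i in range(len(msa[0])):
        h = column_entropy("".join(seq[i] for seq in msa))
        if h is not None and h <= max_entropy:
            result.append(i)
    return result


msa = ["ACDE", "ACDF", "AGDE"]
print(conserved_columns(msa))  # [0, 2]
```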

Question 8

Q

Purpose of progressive alignment? Overview?

Answer

A

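Exact MSA is very computationally demanding (the cost grows exponentially with the number of sequences), so progressive alignment is used as a faster heuristic that is still effective

Sequences are first clustered by pairwise similarity (e.g. by programs like Clustal) to create a ‘guide’ tree

-> sequences are then progressively aligned following this guide tree: the most similar sequences first, with more distant sequences or groups added later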
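A minimal sketch of the guide-tree step, assuming a precomputed pairwise distance matrix (the distances below are made up) and UPGMA clustering, one of the simplest options; real programs may use other methods such as neighbour-joining. The returned merge order is the order in which a progressive aligner would align sequences, then groups (profiles); the pairwise and profile alignment steps themselves are not shown.

```python
def upgma_guide_tree(dist: dict[str, dict[str, float]]):
    """Build a guide tree by UPGMA; returns (nested-tuple tree, list of merges)."""
    clusters = {name: (name, 1) for name in dist}  # label -> (subtree, size)
    d = {frozenset((a, b)): dist[a][b] for a in dist for b in dist if a != b}
    merges = []
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        for other in clusters:  # size-weighted average distance to the new cluster
            d[frozenset((new, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        merges.append((a, b))
    [(tree, _)] = clusters.values()
    return tree, merges


pairs = {("A", "B"): 0.1, ("A", "C"): 0.4, ("A", "D"): 0.5,
         ("B", "C"): 0.45, ("B", "D"): 0.55, ("C", "D"): 0.3}
names = ["A", "B", "C", "D"]
dist = {x: {y: 0.0 for y in names} for x in names}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = v

tree, merges = upgma_guide_tree(dist)
print(tree)    # (('A', 'B'), ('C', 'D'))
print(merges)  # [('A', 'B'), ('C', 'D'), ('(A,B)', '(C,D)')]
```

Read bottom-up: A and B would be aligned first, then C and D, and finally the two resulting groups would be aligned to each other.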
