Lecture 18 - Sequence Similarity Flashcards
What is an open reading frame?
A stretch of a reading frame, running from a start codon to a stop codon, that has the potential to be translated.
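As a rough illustration of the definition above, the sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. The function name `find_orfs` and the `min_codons` cutoff are hypothetical, and the reverse strand is ignored for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start and end are 0-based slice indices, so dna[start:end] includes the stop codon.
    min_codons is an assumed cutoff: the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])
```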
How can the function of unknown proteins be inferred?
By similarity of sequence to known proteins.
What was the first protein database?
Protein Information Resource (PIR)
Give some protein sequence databases.
- Swiss-Prot
- TrEMBL
- UniProt
Which protein database is manually annotated?
Swiss-Prot
Give three DNA sequence databases in order from early to late.
- GenBank
- European Molecular Biology Laboratory (EMBL) database
- DNA Data Bank of Japan (DDBJ)
What is sequence alignment?
A way of arranging the primary sequences of DNA, RNA or protein to identify regions of similarity.
What is pairwise alignment?
Comparing two sequences.
In a database search, a query sequence is compared pairwise with every sequence in the database to find the best match.
What is global alignment?
An attempt to match every residue in two sequences.
When is global alignment most useful?
When the sequences are similar and of roughly equal length.
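A minimal sketch of global alignment scoring, using the Needleman-Wunsch dynamic-programming approach (a standard technique that is not named in the lecture). The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of a against ALL of b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # The bottom-right cell covers every residue of both sequences.
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATACA"))
```

Because every residue must be placed, a large length difference forces many gap penalties, which is why global alignment suits sequences of similar length.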
When is local alignment more useful?
For dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.
What is local alignment?
An attempt to match only the most similar regions of two sequences, rather than their full lengths.
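A minimal sketch of local alignment scoring, using the Smith-Waterman dynamic-programming approach (a standard technique that is not named in the lecture; BLAST itself uses a faster heuristic). The scoring values and the function name are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best score over all pairs of sub-regions of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere,
            # so dissimilar flanking regions are simply ignored.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# The shared GATTACA motif is found despite unrelated flanking sequence.
print(local_alignment_score("AAAGATTACATTT", "CCCGATTACAGGG"))
```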
What alignment method does BLAST use?
Local alignment
What has BLAST been designed for?
Speed
What are the following BLAST programmes for?
a) blastn
b) blastp
c) blastx
d) tblastn
e) tblastx
a) blastn = nucleotide query vs. nucleotide database
b) blastp = protein query vs. protein database
c) blastx = translated nucleotide query vs. protein database
d) tblastn = protein query vs. translated nucleotide database
e) tblastx = translated nucleotide query vs. translated nucleotide database
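The five programmes can be summarised as a lookup on three things: the query type, the database type, and whether the comparison is made at the nucleotide or protein level. The sketch below is only a memory aid; the table layout and the helper `choose_blast_program` are hypothetical and not part of any BLAST toolkit.

```python
# (query type, database type, level of comparison) -> programme
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", "nucleotide"): "blastn",
    ("protein", "protein", "protein"): "blastp",
    ("nucleotide", "protein", "protein"): "blastx",      # query is translated
    ("protein", "nucleotide", "protein"): "tblastn",     # database is translated
    ("nucleotide", "nucleotide", "protein"): "tblastx",  # both are translated
}

def choose_blast_program(query, database, compare_as):
    """Look up the programme name for a query/database/comparison combination."""
    key = (query, database, compare_as)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST programme for combination {key}")
    return BLAST_PROGRAMS[key]

print(choose_blast_program("protein", "nucleotide", "protein"))
```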
Which sequences are best to compare between species?
Protein sequences, because they evolve more slowly.
Which BLAST programme should you use when mapping mRNA or gene sequences to genomic DNA from the same organism?
blastn
What is a score?
A value calculated from the number of matching or similar amino acids in the alignment.
What is an expect value (E-value)?
The number of alignments with an equal or better score expected to occur by chance; depends on the score, the length of the query sequence and the size of the database.
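The dependence on score, query length and database size can be sketched with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), which is not given in the lecture. The values of `k` and `lam` below are illustrative placeholders only; real values depend on the scoring matrix and gap penalties in use.

```python
import math

def expect_value(raw_score, query_len, db_len, k=0.041, lam=0.267):
    """Expected number of chance alignments scoring at least raw_score.

    k and lam are placeholder statistical parameters (assumed, not from the lecture).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

# A higher score lowers E; a larger database raises it.
print(expect_value(100, query_len=300, db_len=1_000_000))
print(expect_value(100, query_len=300, db_len=2_000_000))
print(expect_value(150, query_len=300, db_len=1_000_000))
```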
What are identities?
The number of identical amino acids in the alignment.
What are positives?
The number of identical or similar amino acids in the alignment.
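A small sketch of how identities and positives could be counted from two already-aligned protein strings. BLAST decides similarity from a substitution matrix; the `SIMILAR_GROUPS` list here is a simplified stand-in assumed for illustration, and the function name is hypothetical.

```python
# Toy similarity groups (an assumption; BLAST uses a substitution matrix instead).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]

def identities_and_positives(aligned_a, aligned_b):
    """Count identical and similar residue pairs; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward neither total
        if x == y:
            identities += 1
            positives += 1  # an identical pair is also a positive
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            positives += 1
    return identities, positives

print(identities_and_positives("MKV-LD", "MRVALE"))
```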
What is a protein family?
A group of evolutionarily-related proteins.
What do members of a protein family share?
Similar three-dimensional structures, similar functions and significant sequence similarity.
What can create gene families?
Gene duplication