Lecture 18 - Sequence Similarity Flashcards
What is an open reading frame?
A stretch of a reading frame, running from a start codon to the next in-frame stop codon, that has the potential to be translated.
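A minimal sketch of ORF detection. It assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA) and scans only the three forward reading frames. The function `find_orfs` and its `min_codons` cutoff are hypothetical names used for illustration.

```python
# Standard stop codons; ATG is assumed to be the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, orf) tuples for ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is exclusive and
    includes the stop codon. min_codons counts the stop codon too.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close it at the first in-frame stop
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

A full ORF finder would also scan the three frames of the reverse complement.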
How can the function of unknown proteins be inferred?
By similarity of sequence to known proteins.
What was the first protein database?
Protein Information Resource (PIR)
Give some protein sequence databases.
- Swiss-Prot
- TrEMBL
- UniProt
Which protein database is manually annotated?
Swiss-Prot
Give three DNA sequence databases in order from early to late.
- GenBank
- European Molecular Biology Laboratory (EMBL)
- DNA Data Bank of Japan (DDBJ)
What is sequence alignment?
A way of arranging the primary sequences of DNA, RNA or protein to identify regions of similarity.
What is pairwise alignment?
Comparing two sequences with each other.
In a database search, a query sequence is aligned pairwise against every sequence in the database to find the best match.
What is global alignment?
An attempt to align every residue in both sequences, from end to end.
When is global alignment most useful?
When the sequences are similar and of roughly equal length.
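Global alignment is usually computed by dynamic programming (the Needleman-Wunsch algorithm). The sketch below returns only the optimal score; the match, mismatch and gap values are arbitrary assumptions, not a scoring scheme from the lecture.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: best end-to-end alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```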
When is local alignment more useful?
For dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.
What is local alignment?
An attempt to align only the most similar regions of two sequences.
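Local alignment can be computed like global alignment except that no cell may fall below zero (the Smith-Waterman algorithm), so the alignment can start and end anywhere in either sequence. A sketch returning only the best local score, with assumed scoring values:

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score: best alignment between any substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    dp = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 discards a poorly scoring prefix.
            dp[i][j] = max(0, diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            best = max(best, dp[i][j])
    return best

print(local_alignment_score("TTTTGATTACATTTT", "CCCGATTACACCC"))  # 14
```

The shared GATTACA region scores 7 x 2 = 14 even though the flanking sequences are unrelated.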
What alignment method does BLAST use?
Local alignment
What has BLAST been designed for?
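Speed. It is a heuristic method: it first finds short matching words and only extends those, rather than running a full dynamic-programming alignment against every sequence.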
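A sketch of the seeding idea only, under simplifying assumptions: exact matches of a hypothetical word length `k = 3`. Protein BLAST also accepts similar-scoring "neighbourhood" words, and real word sizes depend on the program, so this is not BLAST's implementation.

```python
from collections import defaultdict

def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    # Index every k-letter word of the subject by its start positions.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    # Look up each query word; every hit is a candidate seed.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

print(word_hits("GATTACA", "TTGATTC"))  # [(0, 2), (1, 3)]
```

Both hits lie on the same diagonal (query position minus subject position is -2), which is the kind of seed cluster that would then be extended into an alignment.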
What are the following BLAST programmes for?
a) blastn
b) blastp
c) blastx
d) tblastn
e) tblastx
a) blastn = nucleotide query vs. nucleotide database
b) blastp = protein query vs. protein database
c) blastx = translated nucleotide query vs. protein database
d) tblastn = protein query vs. translated nucleotide database
e) tblastx = translated nucleotide query vs. translated nucleotide database
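The translated searches (blastx, tblastn, tblastx) translate nucleotide sequences in all six reading frames: three on the given strand and three on its reverse complement. A sketch of six-frame translation, assuming the standard genetic code (NCBI translation table 1); the BLAST programs let you choose other genetic codes where needed.

```python
from itertools import product

BASES = "TCAG"
# Standard code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    # Incomplete trailing codons are dropped; unknown codons become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna):
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"], frames["-1"])  # MA* LGH
```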
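What is it best to compare between species?
Protein sequences, because they change more slowly than DNA sequences: the genetic code is degenerate, so many DNA changes (synonymous substitutions) do not alter the protein.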
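The effect of synonymous changes shows up with two short, made-up coding sequences that differ only at third codon positions. The codon dictionary below is deliberately partial: it holds only the six standard-code codons this example needs.

```python
# Partial codon table (standard genetic code), only the codons used here.
CODONS = {"CTT": "L", "CTG": "L", "GCT": "A", "GCA": "A", "AAA": "K", "AAG": "K"}

def translate(dna):
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

def percent_identity(x, y):
    # Ungapped comparison of two equal-length sequences.
    return 100 * sum(p == q for p, q in zip(x, y)) / len(x)

seq1 = "CTTGCTAAA"
seq2 = "CTGGCAAAG"
print(round(percent_identity(seq1, seq2), 1))               # 66.7 (DNA)
print(percent_identity(translate(seq1), translate(seq2)))  # 100.0 (protein)
```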
Which BLAST programme should you use when mapping mRNA or gene sequences to genomic DNA from the same organism?
blastn
What is a score?
A value calculated from the number of identical or similar amino acids in the alignment, with penalties for mismatches and gaps.
What is the expect (E) value?
The number of alignments with at least this score expected to occur by chance; it depends on the score, the length of the query sequence and the size of the database.
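BLAST's E-values follow the Karlin-Altschul form E = K m n e^(-λS), where m is the query length, n the total database length, S the raw score, and K and λ constants set by the scoring system. A sketch with illustrative K and λ (roughly the values reported for BLOSUM62 with gap costs 11/1); real searches also correct m and n for edge effects.

```python
import math

def expect_value(score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance alignments scoring >= score.

    lam and k are assumed illustrative constants, not taken from the lecture.
    """
    return k * query_len * db_len * math.exp(-lam * score)

for db_len in (1_000_000, 2_000_000):
    print(db_len, expect_value(50, 300, db_len))
```

E grows in proportion to the database size and falls exponentially as the score rises, so the same alignment looks less significant when it is found in a larger database.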
What are identities?
The number of identical amino acids in the alignment.
What are positives?
The number of identical or similar amino acids in the alignment (identities also count as positives).
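A sketch of how score, identities and positives are counted for one gapped alignment. The scoring values and similarity groups are simplified, hypothetical assumptions; BLAST instead uses a substitution matrix such as BLOSUM62 and counts a pair as positive when its matrix score is greater than zero.

```python
# Hypothetical, simplified groups of similar amino acids.
SIMILAR_GROUPS = ("AVLIM", "FWY", "KRH", "DE", "STNQ")

def is_similar(x, y):
    return any(x in group and y in group for group in SIMILAR_GROUPS)

def summarise_alignment(top, bottom, identical=2, similar=1, mismatch=-1, gap=-2):
    """Return (score, identities, positives) for two aligned rows ('-' = gap)."""
    score = identities = positives = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += identical
            identities += 1
            positives += 1  # identities also count as positives
        elif is_similar(x, y):
            score += similar
            positives += 1
        else:
            score += mismatch
    return score, identities, positives

print(summarise_alignment("MKV-LE", "MRVALD"))  # (6, 3, 5)
```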
What is a protein family?
A group of evolutionarily-related proteins.
What do members of a protein family share?
Similar 3-dimensional structures, functions and sequence similarity.
What can create gene families?
Gene duplication
What are cytochrome P450s?
A group of enzymes involved in the oxidative metabolism of a large number of natural compounds.
Which kingdoms are cytochrome P450s found in?
- Animals
- Plants
- Fungi
- Bacteria
Give some functions of cytochrome P450s.
- Detoxification of drugs and toxins
- Biosynthesis of steroids and fatty acids
- Biosynthesis of toxins
- Alkane assimilation
What is Magnaporthe oryzae?
Rice blast fungus.
What is Neurospora crassa?
Red bread mould, a saprotroph (it lives on dead organic matter).
How many cytochrome P450 genes are there in
a) M. oryzae?
b) N. crassa?
a) M. oryzae = 122
b) N. crassa = 37
Why are cytochrome P450s important for pathogens?
They need them to detoxify anti-fungal chemicals produced by the host and to synthesise toxins that help them invade the host.
What are cytochrome P450s classified into families based on?
Sequence homology
Are amino acid sequences well conserved or not well conserved between cytochrome P450 families?
Not well conserved.
What is Pfam?
A database of protein families, each represented by a multiple sequence alignment and a hidden Markov model.
What is a hidden Markov model (HMM)?
A statistical model that considers all of the possible combinations of matches, mismatches and gaps to generate an alignment of a set of sequences.
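The "all possible combinations" part corresponds to the forward algorithm: the probability of a sequence summed over every path through the hidden states. The sketch uses a hypothetical two-state HMM over two made-up symbols ('h' for hydrophobic, 'p' for polar) rather than a real profile HMM; Pfam's profile HMMs have match, insert and delete states for each alignment column, but are scored with the same kind of recursion.

```python
from itertools import product

# Hypothetical two-state HMM; every probability here is made up.
STATES = ("S1", "S2")
START = {"S1": 0.6, "S2": 0.4}
TRANS = {"S1": {"S1": 0.7, "S2": 0.3},
         "S2": {"S1": 0.4, "S2": 0.6}}
EMIT = {"S1": {"h": 0.5, "p": 0.5},
        "S2": {"h": 0.1, "p": 0.9}}

def forward(obs):
    """P(obs), summed over all hidden-state paths by dynamic programming."""
    # f[s] = probability of the symbols seen so far, ending in state s
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        f = {s: EMIT[s][symbol] * sum(f[r] * TRANS[r][s] for r in STATES)
             for s in STATES}
    return sum(f.values())

def brute_force(obs):
    """Same quantity by enumerating every state path explicitly (exponential)."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][obs[t]]
        total += p
    return total

print(forward("hpph"), brute_force("hpph"))  # equal up to rounding
```

The forward recursion does the same work as the enumeration in time linear in the sequence length instead of exponential.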
What is a domain?
A segment of polypeptide chain that can fold into a three-dimensional structure irrespective of the presence of other segments of the chain.
What is Myosin V involved in?
Actin-dependent transport of vesicles.
What is TPA?
Tissue plasminogen activator: an extracellular protease that helps dissolve blood clots.
How many domains does TPA have?
Four, each coded for by a different exon.