Ch 2.2 Flashcards
Annotation
Identification of the functionally important sections of a sequenced genome: coding genes and noncoding parts (e.g. regulatory elements).
How to annotate a genome
What to look for?
– Find ORFs (open reading frames): a start codon followed in frame by a stop codon
– Binding sites: TATA and CAAT boxes in promoters
– Other motifs (amino acid sequence patterns of possible functional significance, e.g. transmembrane)
– Similarity to a known gene in another organism (species)
– Alignment to expressed sequences (cDNA, EST)
– Codon bias
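The ORF item in the list above can be sketched in a few lines. This is a minimal illustration, not a real gene finder: it scans only the three forward reading frames, assumes the standard start codon ATG and stop codons TAA/TAG/TGA, and the function name `find_orfs`, the `min_codons` threshold, and the example sequence are all made up for this sketch.

```python
# Minimal ORF scan on the forward strand only (illustrative sketch).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ORFs in the three forward frames.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the current open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


example = "CCATGAAATTTTAGGG"  # made-up sequence
print(find_orfs(example))    # [(2, 14)], i.e. ATGAAATTTTAG
```

A fuller annotation pass would also scan the reverse complement and combine ORF hits with the other evidence in the list (motifs, similarity, expressed sequences, codon bias).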
Is it a TATA box?
For any 15bp of a newly sequenced genome:
1) What is the probability of having an A in position 1? If the genome-wide average AT content is 56%, Pb(A) = 0.56 * 0.5 = 0.28. But if the 15 bp sequence is part of a TATA box region, Pb(A) = 61/389 = 0.16.
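The arithmetic for position 1 can be checked directly. The sketch below uses only the numbers on this card (56% AT content, 61 of 389 sites) and assumes, as the card does, that A and T are equally frequent in the background. The log-odds line is an added illustration of how the two probabilities might be compared, not something stated on the card.

```python
import math

at_content = 0.56                    # genome-wide A+T fraction, from the card
p_a_background = at_content * 0.5    # assumes A and T are equally frequent

a_count, n_sites = 61, 389           # A at position 1 among TATA-box sites, from the card
p_a_tata = a_count / n_sites

# Negative log-odds: an A here is less likely under the TATA model than by chance.
log_odds = math.log2(p_a_tata / p_a_background)

print(round(p_a_background, 2))  # 0.28
print(round(p_a_tata, 2))        # 0.16
print(log_odds < 0)              # True
```

Deciding whether a whole 15 bp window is a TATA box would repeat this comparison at every position; the counts for the other positions are not given here.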
Alignment algorithms
Rely on a scoring system: matches are given a score and gaps are penalized. 1) Place the two sequences in a table, with one defining the columns and the other defining the rows. 2) Place a score (1) wherever the two sequences are identical. 3) Join the scores with a line. Move
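Steps 1 to 3 can be sketched as a simple dot matrix. This is only an illustration of the table-and-diagonal idea: it scores matches as 1, does not implement gap penalties, and the function names and the two example sequences are made up for this sketch.

```python
def dot_matrix(seq_a, seq_b):
    """Rows follow seq_a, columns follow seq_b; 1 marks identical residues."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def longest_diagonal(matrix):
    """Length of the longest unbroken run of 1s along any top-left to bottom-right diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    run = [[0] * cols for _ in range(rows)]  # run[i][j] = diagonal run ending at (i, j)
    best = 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j]:
                run[i][j] = 1 + (run[i - 1][j - 1] if i and j else 0)
                best = max(best, run[i][j])
    return best


m = dot_matrix("GATTACA", "ATTAC")
for row in m:
    print(row)
print(longest_diagonal(m))  # 5, because ATTAC matches GATTACA starting at its second base
```

An unbroken diagonal corresponds to a stretch of matches without gaps; stepping off the diagonal is where a scoring scheme would charge a gap penalty.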
BLAST
Alignment searches: retrieves sequences in a database that are similar to a query. Sacrifices alignment quality (scores) for speed and the ability to retrieve as many matches as possible. Not a good tool for carefully aligning sequences in order to compare them.
ESTs (Expressed Sequence Tags):
For each cloned cDNA, sequence a short segment from the 5’ or 3’ end.
Mimivirus
Larger than most viruses; has both DNA and RNA (instead of one or the other); carries genes for parts of its own protein synthesis machinery; may have eventually become an intracellular parasite? 911 protein-coding genes.
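Why might G-C content be high?
High G-C content may be favored at higher temperatures: heat tends to denature DNA, and G-C pairs (three hydrogen bonds) are harder to break apart than A-T pairs (two). UV exposure is another proposed factor, e.g. in cyanobacteria.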
Is there a departure from a random use of each nucleotide (25%) in genomes?
GC content: in bacteria it varies between 30% and 75% [environmental challenges: high temperature and UV]. Mitochondrial and chloroplast genomes are A+T rich [mutational bias vs. adaptation to an intracellular lifestyle]. Vertebrate genomes are 5’-CG-3’ deficient [methylation of C in 5’-CG-3’ dinucleotides elevates the rate of C-to-T mutation by deamination].
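Both quantities on this card are easy to measure on a sequence. The sketch below computes GC content and an observed/expected ratio for 5’-CG-3’ dinucleotides. The expected count used here (number of C times number of G, divided by sequence length) is one common convention and is an assumption of this sketch, as are the function names and the toy sequence.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CG dinucleotides divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    n = len(seq)
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = seq.count("C") * seq.count("G") / n  # assumed convention
    return observed / expected if expected else float("nan")


toy = "ATGCGCAT"  # made-up sequence
print(gc_content(toy))   # 0.5
print(cpg_obs_exp(toy))  # 2.0
```

A ratio well below 1 over a long stretch of genome is what "5’-CG-3’ deficient" means in practice; a ratio near 1 means CG occurs about as often as the base composition predicts.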
Why a nucleotide bias for A in influenza? (3 possible explanations)
Either 1) a need for a certain amino acid composition in the protein, 2) translation efficiency (a higher % of certain tRNAs in the cell), or 3) mutation bias.
Nuc Bias: aa composition
A need for a certain amino acid composition in the protein: out of 332 aa, Asn is 9.25% and Lys is 6.27%, so there is no aa bias.
Nuc Bias: Translation efficiency
Translation efficiency (a higher % of certain tRNAs in the cell)? Check Relative Synonymous Codon Usage (RSCU): note the excess of A-ending synonymous codons.
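RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons. The sketch below assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` and the toy sequence are made up for illustration and are not influenza data.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid that occurs in the coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        mean = total / len(family)  # expected count under equal usage
        for c in family:
            values[c] = counts[c] / mean
    return values


toy = "AAAAAAAAAAAG"  # made-up: three AAA codons and one AAG codon (both Lys)
result = rscu(toy)
print(result["AAA"], result["AAG"])  # 1.5 0.5
```

An excess of A-ending codons would show up as RSCU above 1 for the A-ending member of each synonymous family.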
Nuc Bias: mutation bias
Mutation bias: if so, T (or U)-ending synonymous codons should be equally frequent, because a mutation bias should affect both strands; mutation cannot tell the coding strand from the other strand. (A region, the D-loop, of most mammalian mitochondrial genomes is affected by mutation bias.)
What are salient characteristics of eukaryote genomes?
Intergenic space between coding genes. Duplications (duplicates usually provide opportunities to acquire similar or new functions). Specialization driven by regulation rather than by more genes: you can become highly specialized by having more genes or by having more ways to transcribe them (highly regulated transcription, a high number of transcription factors). Proteome complexity and cell specialization require careful regulation of gene expression.
Human complexity is not in the numbers, explain.
Humans have ~21,000 genes, fewer than twice as many as fruit flies, yet the human genome is 27 times larger than the fruit fly genome. About 60% of all human genes produce more than one mRNA through alternative splicing, so alternative splicing appears to be another source of our complexity through regulation. Only 1% of our genes have not been found in other species.