Ch 2.2 Flashcards

Question 1

Q

Annotation

Answer

A

Identification of functionally important sections of a sequenced genome: coding genes and noncoding parts (e.g., regulatory elements).

Question 2

Q

How to annotate a genome

Answer

A

What to look for?
– ORFs (open reading frames): start and stop codons
– Binding sites: TATA and CAAT boxes in promoters
– Other motifs (amino acid sequence patterns of possible functional significance, e.g., transmembrane domains)
– Similarity to a known gene in another organism (species)
– Alignment to expressed sequences (cDNA, EST)
– Codon bias

Question 3

Q

Is it a TATA box?

Answer

A

For any 15 bp of a newly sequenced genome:
1) What is the probability of having an A in position 1?
If the genome-wide average AT content is 56% (and A and T are equally frequent), P(A) = 0.56 × 0.5 = 0.28.
But if the 15 bp sequence is part of a TATA box region, P(A) = 61/389 ≈ 0.16.
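The arithmetic above can be sketched as a comparison of two models for position 1: a genome-wide background model (assuming A = T and G = C, as in the calculation) and the TATA-box frequency 61/389, which is read here as 61 A's among 389 aligned TATA-box sequences (an assumption). The helper names are hypothetical.

```python
# Sketch: probability of an A at one position under a background model
# vs. a TATA-box position frequency. Assumes A = T and G = C in the background.

def background_probs(at_content: float) -> dict[str, float]:
    """Per-nucleotide probabilities from AT content, assuming A = T and G = C."""
    gc_content = 1.0 - at_content
    return {"A": at_content / 2, "T": at_content / 2,
            "G": gc_content / 2, "C": gc_content / 2}

def sequence_prob(seq: str, probs: dict[str, float]) -> float:
    """Probability of seq if every position is independent and follows probs."""
    p = 1.0
    for base in seq:
        p *= probs[base]
    return p

bg = background_probs(0.56)
p_a_background = bg["A"]      # 0.56 * 0.5
p_a_tata_pos1 = 61 / 389      # assumed: A count / aligned TATA boxes at position 1
ratio = p_a_tata_pos1 / p_a_background  # likelihood ratio (TATA model vs background)

print(f"background P(A) = {p_a_background:.2f}")
print(f"TATA pos 1 P(A) = {p_a_tata_pos1:.2f}")
print(f"likelihood ratio = {ratio:.2f}")
```

A ratio below 1 means an A at position 1 is, on its own, less expected in a TATA box than in background sequence. Scoring a whole 15 bp window would multiply such ratios over all positions using a full position frequency table, which is not given here.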

Question 4

Q

Alignment algorithms

Answer

A

Rely on a scoring system: matches are given a score value and gaps are penalized.
1) Place the two sequences in a table, one defining the columns and the other defining the rows.
2) Place a score (1) wherever the two sequences are identical.
3) Join the scores with a line. Move
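A minimal sketch of both ideas on this card: the dot-plot table from steps 1 and 2, and, as one standard example of "matches scored, gaps penalized", a Needleman-Wunsch global alignment score. The scores (match +1, mismatch 0, gap -1) and the function names are assumptions for illustration, not necessarily the scheme used in the course.

```python
# 1) Dot plot: cell [i][j] is 1 where row base i equals column base j.
# 2) Needleman-Wunsch global alignment score with assumed toy scores.

def dot_matrix(seq_rows: str, seq_cols: str) -> list[list[int]]:
    """Return a 0/1 matrix: 1 if seq_rows[i] == seq_cols[j]."""
    return [[1 if a == b else 0 for b in seq_cols] for a in seq_rows]

def show(matrix: list[list[int]], seq_rows: str, seq_cols: str) -> None:
    """Print the dot plot with '*' for identities and '.' otherwise."""
    print("  " + " ".join(seq_cols))
    for base, row in zip(seq_rows, matrix):
        print(base + " " + " ".join("*" if x else "." for x in row))

def global_score(s: str, t: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Best global alignment score of s and t by dynamic programming."""
    n, m = len(s), len(t)
    # dp[i][j] = best score for aligning s[:i] with t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # s[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # t[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]

rows, cols = "GATTACA", "GATCA"
show(dot_matrix(rows, cols), rows, cols)
print("global score:", global_score(rows, cols))
```

Runs of `*` along a diagonal in the printed table are the segments that step 3 joins with a line.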

Question 5

Q

BLAST

Answer

A

Alignment search tool: retrieves sequences in a database that are similar to a query. It sacrifices alignment quality (optimal scores) for speed and the ability to retrieve as many matches as possible. Not a good tool for producing alignments to compare sequences.
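The speed comes largely from seeding: instead of aligning the query against every database sequence in full, look only for short shared words. The sketch below is a simplified illustration of that first step only; real BLAST also extends seeds into local alignments and scores them statistically (E-values), and protein searches use scored "neighborhood" words rather than only exact matches. The word size k = 4, the toy database, and the function names are hypothetical.

```python
from collections import defaultdict

def index_words(query: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word of the query to its start positions."""
    words = defaultdict(list)
    for i in range(len(query) - k + 1):
        words[query[i:i + k]].append(i)
    return words

def find_seeds(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, int, int]]:
    """Return (db_name, query_pos, db_pos) for every exact k-word match."""
    words = index_words(query, k)
    hits = []
    for name, seq in database.items():
        for j in range(len(seq) - k + 1):
            for i in words.get(seq[j:j + k], []):
                hits.append((name, i, j))
    return hits

def rank_by_hits(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, int]]:
    """Rank database entries by number of seed hits (most first)."""
    counts = defaultdict(int)
    for name, _, _ in find_seeds(query, database, k):
        counts[name] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

db = {"seq1": "TTGATTACAGG", "seq2": "CCCCCCCC", "seq3": "GATTC"}  # hypothetical toy database
print(rank_by_hits("GATTACA", db))
```

Sequences with no shared word (here `seq2`) are never examined further, which is where both the speed and the loss of sensitivity come from.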

Question 6

Q

ESTs (Expressed Sequence Tags)

Answer

A

For each cloned cDNA, sequence a short segment from the 5’ or 3’ end.

Question 7

Q

Mimivirus

Answer

A

Larger than most viruses; has both DNA and RNA (instead of one or the other); encodes some of its own protein-synthesis machinery; eventually became an intracellular parasite? 911 protein-coding genes.

Question 8

Q

Why might GC content be high?

Answer

A

High GC content at higher temperatures: G-C pairs are harder to break, and high temperature tends to denature DNA. High GC content is also seen in cyanobacteria exposed to UV.
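One quick way to see that GC-rich DNA needs more heat to separate is the Wallace rule, a rough rule of thumb for short oligonucleotides only (not whole genomes): each A/T adds about 2 °C and each G/C about 4 °C to the melting temperature. The sketch below is an illustration under that assumption; the function name and the two toy 14-mers are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (°C) of a short oligo: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

at_rich = "ATTATAATATTAAT"  # hypothetical 14-mer, all A/T
gc_rich = "GCCGCGGCGCCGGC"  # hypothetical 14-mer, all G/C
print(wallace_tm(at_rich), wallace_tm(gc_rich))
```

Same length, but the all-GC oligo gets twice the estimated melting temperature.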

Question 9

Q

Is there a departure from a random use of each nucleotide (25%) in genomes?

Answer

A

GC content: in bacteria it varies between 30% and 75% [environmental challenges: high temperature and UV]. Mitochondrial and chloroplast genomes are A+T rich [mutational bias vs. adaptations to an intracellular lifestyle]. Vertebrate genomes are 5’-CG-3’ deficient [methylation of C in 5’-CG-3’ dinucleotides elevates the C-to-T mutation rate through deamination].
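Both departures from 25% per nucleotide can be measured directly from a sequence. A minimal sketch, assuming the common CpG observed/expected ratio, count(CG) × length / (count(C) × count(G)), as the measure of 5’-CG-3’ deficiency (values well below 1 indicate depletion). Function names and the toy sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq: str) -> float:
    """Observed CG dinucleotides relative to the number expected from C and G counts."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cg = seq.count("CG")
    return cg * len(seq) / (c * g)

toy = "ATGCCTGAGCATTGCAAGCTGA"  # hypothetical sequence
print(f"GC content: {gc_content(toy):.2f}")
print(f"CpG obs/exp: {cpg_obs_exp(toy):.2f}")
```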

Question 10

Q

Why is there a nucleotide bias for A in influenza? (3 possible reasons)

Answer

A

Three possible explanations: 1) a need for a certain amino acid composition in the protein; 2) translation efficiency (a higher proportion of certain tRNAs in the cell); 3) mutation bias.

Question 11

Q

Nuc Bias: aa composition

Answer

A

A need for a certain amino acid composition in the protein? Out of 332 amino acids: Asn 9.25%, Lys 6.27%; no amino acid bias.
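Percentages like these come from counting each residue and dividing by protein length. A minimal sketch, assuming a protein given in one-letter codes (N = Asn, K = Lys); the function name and toy peptide are hypothetical and are not the 332-residue protein on the card.

```python
from collections import Counter

def aa_composition(protein: str) -> dict[str, float]:
    """Percentage of each amino acid (one-letter codes) in a protein sequence."""
    protein = protein.upper()
    if not protein:
        return {}
    counts = Counter(protein)
    total = len(protein)
    return {aa: 100 * n / total for aa, n in sorted(counts.items())}

comp = aa_composition("MNKNLKAN")  # hypothetical toy peptide
print(f"Asn (N): {comp.get('N', 0):.2f}%  Lys (K): {comp.get('K', 0):.2f}%")
```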

Question 12

Q

Nuc Bias: Translation efficiency

Answer

A

Translation efficiency (a higher proportion of certain tRNAs in the cell)? Relative Synonymous Codon Usage (RSCU): note the excess of A-ending synonymous codons.
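RSCU divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally; values above 1 mark over-used codons. A minimal sketch follows, covering only a small subset of the standard genetic code (Lys, Asn, Gly, RNA alphabet) to stay short; the function name and toy sequence are hypothetical.

```python
from collections import Counter

# Subset of the standard genetic code (RNA codons), enough for a demo.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Asn": ["AAU", "AAC"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}

def rscu(cds: str) -> dict[str, float]:
    """RSCU for each codon in SYNONYMOUS whose amino acid occurs in cds."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    result = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(family)  # count per codon under equal usage
        for c in family:
            result[c] = counts[c] / expected
    return result

print(rscu("AAAAAAAAGGGAGGA"))  # hypothetical coding sequence
```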

Question 13

Q

Nuc Bias: mutation bias

Answer

A

Mutation bias? If so, T (or U)-ending synonymous codons should be as frequent as A-ending ones, because mutation should affect both strands: it cannot differentiate the coding strand from the complementary strand. (A region of most mammalian mitochondrial genomes, the D-loop, is affected by mutation bias.)
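The test on this card compares how often codons end in A versus U. A minimal sketch, with two simplifications labeled as assumptions: it counts the third base of every codon (not only synonymous sites), and it assumes an RNA sequence read in frame from position 0. Function names and the toy sequence are hypothetical.

```python
from collections import Counter

def third_position_freqs(cds: str) -> dict[str, float]:
    """Fraction of codons ending in each base (RNA alphabet, frame starts at 0)."""
    cds = cds.upper()
    ends = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not ends:
        return {base: 0.0 for base in "ACGU"}
    counts = Counter(ends)
    return {base: counts[base] / len(ends) for base in "ACGU"}

def a_over_u(cds: str) -> float:
    """A-ending / U-ending codon ratio; the card's mutation-bias hypothesis predicts about 1."""
    freqs = third_position_freqs(cds)
    return freqs["A"] / freqs["U"] if freqs["U"] else float("inf")

toy = "AAAGGUCCAUUG"  # hypothetical coding sequence
print(third_position_freqs(toy), a_over_u(toy))
```

A ratio well above 1 (A-ending excess without a matching U-ending excess) argues against mutation bias alone.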

Question 14

Q

What are salient characteristics of eukaryote genomes?

Answer

A

– Intergenic space between coding genes
– Duplications (duplicates usually provide opportunities to acquire similar or new functions)
– Specialization driven by regulation rather than by more genes: you can become highly specialized by having more genes or by having more ways to transcribe the genes (highly regulated transcription, a high number of transcription factors)
Proteome complexity and cell specialization require careful regulation of gene expression.

Question 15

Q

Human complexity is not in the numbers. Explain.

Answer

A

Humans have ~21,000 genes, fewer than twice as many as fruit flies, even though the human genome is 27 times larger than the fruit fly genome. About 60% of all human genes produce more than one mRNA through alternative splicing, so alternative splicing appears to be another source of our complexity, acting through regulation. Only 1% of our genes have not been found in other species.