Lecture 06 Flashcards
High-throughput sequencing
a. Resequencing
- Discuss resequencing(2)
- How high must coverage be?(1)
- What is done in cancer genomics?(1)
High-throughput sequencing
a. Resequencing
- once a reference genome is available, new genomes are not assembled de novo; their reads are mapped to the existing reference
- highly repetitive regions pose challenges during mapping
- coverage must be high enough that the sequencing error rate is lower than the frequency of
natural variation (e.g., SNPs)
- cancer genomics: also sequence healthy control cells from the same patient, for comparison with the tumour genome
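The coverage rule above can be made concrete with a rough sketch. It assumes independent read errors, a simple majority-vote consensus, and that all erroneous reads agree on the same wrong base (a pessimistic simplification). The function names and the example numbers (1% per-base error, roughly 1 SNP per 1,000 bp) are illustrative assumptions, not values from the lecture.

```python
from math import comb


def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average number of reads covering each base: (reads x read length) / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def consensus_error(depth, per_base_error):
    """Probability that more than half of `depth` reads are wrong at one position."""
    first_majority = depth // 2 + 1
    return sum(
        comb(depth, k) * per_base_error**k * (1 - per_base_error) ** (depth - k)
        for k in range(first_majority, depth + 1)
    )


# Illustrative assumptions (not from the lecture).
PER_BASE_ERROR = 0.01   # assumed raw error rate per base
SNP_FREQUENCY = 0.001   # assumed frequency of natural variation

for depth in (1, 3, 5, 11, 31):
    err = consensus_error(depth, PER_BASE_ERROR)
    verdict = "below" if err < SNP_FREQUENCY else "above"
    print(f"depth {depth:>2}: consensus error {err:.2e} ({verdict} assumed SNP frequency)")
```

Under these assumptions a single read is not enough to tell a real SNP from an error, while even modest depth pushes the consensus error below the assumed variation frequency.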
High-throughput sequencing
Exome sequencing
a. What is it?(1)
b. What does it do?(1)
c. How many exons in human genome?(1)
High-throughput sequencing
Exome sequencing
a. resequencing project that sequences only exons/coding regions
b. identify trait-linked mutations in protein-coding regions
c. 180K exons in human genome: ~30 Mb or 1% of entire genome
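As a quick sanity check on the figures in item c, the arithmetic can be sketched as below. The exon count and exome size come from the card; the total genome size of about 3.1 Gb is an assumption added for illustration.

```python
EXON_COUNT = 180_000          # from the card: ~180K exons
EXOME_BP = 30_000_000         # from the card: ~30 Mb
GENOME_BP = 3_100_000_000     # assumption: haploid human genome of roughly 3.1 Gb


def exome_fraction(exome_bp=EXOME_BP, genome_bp=GENOME_BP):
    """Fraction of the genome covered by exons."""
    return exome_bp / genome_bp


def mean_exon_length(exome_bp=EXOME_BP, exon_count=EXON_COUNT):
    """Average exon length in bp implied by the exome size and exon count."""
    return exome_bp / exon_count


print(f"exome fraction: {exome_fraction():.2%}")
print(f"mean exon length: {mean_exon_length():.0f} bp")
```

The implied mean exon length is of the same order as the average exon length quoted elsewhere in these notes.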
What’s in a genome? (List the five main features)(5)
a. Protein-coding genes (in human genome)
b. Non-protein-coding regions – encode RNA molecules
c. Pseudogenes – degenerate genes that have mutated so far from original sequences that the encoded proteins are non-functional
d. Binding sites for ligands that regulate gene expression (e.g. promoters)
e. Repetitive elements of unknown functions – see Box 1.12
Discuss protein coding genes
a. What do they follow?(1)
b. How many are there?(1)
c. What is the incidence of genes in genome?(2)
d. What is the gene structure?(2)
e. What do they occupy?(1)
f. How are they distributed?(1)
g. How do they appear?(1) e.g(1)
h. How are unrelated genes separated?(2)
i. What is gene transcription under control of?
j. How do they occur?(1)
Protein-coding genes (in human genome)
a. central dogma: DNA → mRNA → protein
b. ~23,000 such genes
c. incidence of genes in genome
- gene-poor regions: subtelomeric areas on all chr’s; chr’s 18 and X(evo)
- gene-rich regions: chr’s 19 and 22
d. gene structure
- exons interrupted by introns; ave. exon length = 200 bp; intron lengths differ, resulting in gene size differences (e.g. insulin = 1.7 kb, LDL receptor = 5.45 kb and dystrophin = 2,400 kb; Titin)
- splice signal sites delineate intron-exon junctions
e. occupy a small fraction of the human genome – no more than about 2–3% of
the overall sequence
f. distributed unevenly across all chr’s; appear on both strands
g. many appear in multiple copies, either identical or diverged into families
- e.g., over 400 functional, related olfactory-receptor genes in humans
h. unrelated genes are fairly well separated
- some, however, do overlap; for example, entire genes may appear on the –ve strand, within an intron of another gene
i. gene transcription may be under control of cis (upstream or downstream) and trans regulatory elements (elsewhere in genome/diff chr’s)
j. often, closely-related genes occur in same area
- identical copies may still occur on different chr’s (e.g. ubiquitin)
- evolution – gene duplication + divergence; further duplication
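The central dogma in item a can be illustrated with a minimal sketch. It assumes the input is the coding strand of an intron-free sequence (so splicing is ignored) and uses the standard genetic code; the function names `transcribe` and `translate` are hypothetical helpers, not part of any library.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand_dna):
    """DNA -> mRNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, "->", translate(mrna))
```

A real gene would first need its introns removed at the splice sites before translation; this sketch skips that step.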
Discuss the relation of the genome sequence and proteome
- Ideally what happens?
a. What does alternative splicing result in? discuss(3)
b. What does RNA editing produce?(3)
c. What post-translational modifications occur?
d. Where does special combinatorial DNA splicing occur?
1. Ideally, genome sequence -> proteome; however, the genome-proteome relationship varies (see Box 1.11)
a. alternative splicing – mature mRNA is formed from diff. combinations of exons, but always in the order of appearance
- affects 95% of multi-exon protein-coding genes in human genome
- genes with multiple promoters – if reading frames are ‘out-of-phase’ then different proteins result
b. RNA editing – produces one or more proteins whose amino acid sequences
differ from what is predicted in genome
- Vitis vinifera – transcripts for mitochondrial proteins undergo C → U editing
- humans – some genes experience A → I change; tissue-specific
c. post-translational modifications – complexes of polypeptide chains (e.g. Hb)
d. special combinatorial DNA splicing – e.g. antibodies
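The rule in item a (different combinations of exons, always in their order of appearance) can be sketched as follows. This is a deliberately simplified model that treats every non-empty ordered subset of exons as a possible mature mRNA; real splicing is far more constrained, and the exon labels and function name are hypothetical.

```python
from itertools import combinations


def splice_isoforms(exons):
    """List every non-empty selection of exons, preserving genomic order."""
    isoforms = []
    for size in range(1, len(exons) + 1):
        # combinations() never reorders items, so exon order is always kept.
        isoforms.extend(combinations(exons, size))
    return isoforms


for isoform in splice_isoforms(["E1", "E2", "E3"]):
    print("-".join(isoform))
```

Exon skipping (for example E1-E3) appears in the list, while a reordered transcript such as E3-E1 never does, which is the point of the "order of appearance" rule.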
1. What do non-protein-coding regions encode?(1)
Discuss what you need to know about these regions(3)
- Non-protein-coding regions – encode RNA molecules
a. besides mRNA, there are also tRNAs, rRNAs, miRNAs, siRNAs and piRNAs
b. about 3,000 genes encoding the ‘RNA-ome’ – thus, excl. mRNA
c. most control gene expression (e.g., miRNA and siRNA)
- What are pseudogenes?(1)
- Discuss what you need to know about pseudogenes(3)
1. Pseudogenes – degenerate genes that have mutated so far from
original sequences that the encoded proteins are non-functional
a. processed pseudogenes – picked up by viruses from mRNA and reverse
transcribed
b. lack introns and promoters; at times, they are transcribed and play regulatory roles by competing with mRNAs for miRNAs
c. some retain function – rescued by translational read-through of stop-codon
Discuss repetitive elements of unknown functions(3)
Repetitive elements of unknown functions – see Box 1.12
a. large fraction of genome; LINEs (21%) and SINEs (13%); minisatellites and
microsatellites (collectively 15%)
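To put "large fraction" in perspective, the percentages in item a can be added up and compared with the protein-coding fraction. The repeat percentages are taken from this card; the 3% upper bound for protein-coding sequence is the figure from the protein-coding genes card, and the variable names are illustrative.

```python
REPEAT_PERCENT = {
    "LINEs": 21,
    "SINEs": 13,
    "mini- and microsatellites": 15,
}
PROTEIN_CODING_PERCENT_MAX = 3  # upper bound of the 2-3% quoted for protein-coding genes

total_repeat_percent = sum(REPEAT_PERCENT.values())
ratio = total_repeat_percent / PROTEIN_CODING_PERCENT_MAX

print(f"repetitive elements listed: {total_repeat_percent}% of the genome")
print(f"at least {ratio:.0f}x the protein-coding fraction")
```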