Week 3 Flashcards
Genome
all DNA and identification of all DNA elements
Transcriptome
All transcripts expressed (list plus analysis of expression)
Proteome
all proteins expressed (list plus analysis of modification)
Bacterial Genome to bacterial proteome
Large-scale ORF finding: look for open reading frames (ORFs) in the bacterial genome.
Take the resulting protein files and build a database, which we would call the proteome of the bacterium.
Simple for bacteria because the coding region in the DNA is not interrupted.
So you can go from DNA to the protein coding capacity of that DNA very simply.
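A minimal sketch of the ORF scan described above, assuming a clean A/C/G/T sequence and the standard genetic code. The function name `find_orfs` and the `min_codons` cutoff are illustrative, and only the three forward-strand frames are scanned (a real ORF finder also checks the reverse strand).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=2):
    """Return the translations of ATG...stop ORFs in the three forward frames."""
    dna = dna.upper()
    proteins = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            protein = []
            j = i
            stop_found = False
            while j + 3 <= len(dna):
                aa = CODON_TABLE.get(dna[j:j + 3], "X")  # "X" for unknown codons
                if aa == "*":
                    stop_found = True
                    break
                protein.append(aa)
                j += 3
            if stop_found and len(protein) >= min_codons:
                proteins.append("".join(protein))
            # Resume after the stop codon, or after this ATG if no stop was found.
            i = j + 3 if stop_found else i + 3
    return proteins


print(find_orfs("CCATGAAATTTTAGGG"))
```

Because bacterial coding regions are uninterrupted, the translated ORFs can go straight into the proteome database.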
Eukaryotic genome to eukaryotic proteome
Even with a large-scale ORF finder, we can’t go directly from the genome to the proteome,
because of splicing.
We need the transcriptome to get to the proteome.
What is the Transcriptome?
All expressed RNA
major problem with annotating the genome particularly determining the proteome
A major problem with annotating the genome, particularly determining the proteome (the total coding capacity of the genome), is the modification that RNA undergoes in eukaryotes.
Eukaryotic mRNA
Extensively processed
5’ cap (7-methylguanine, sugar, 5’-to-5’ triphosphate linkage)
AUG is the first codon of the ORF (the start codon).
Presence of the poly-A tail helps us annotate the proteome.
Reverse transcriptase
The DNA copy (cDNA) is made by reverse transcriptase, which requires a DNA primer.
Use an oligo-dT primer that hybridizes with the poly-A tail.
Only RNAs with a poly-A tail are copied, so the total transcriptome is not represented.
Before nanopore sequencing, this was the only way to sequence RNA.
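A toy sketch of why oligo-dT priming misses part of the transcriptome: only RNAs ending in a run of A residues give the primer somewhere to hybridize. The function names and the `min_tail` threshold of 10 are assumptions for illustration, not values from the cards.

```python
def primes_with_oligo_dt(rna, min_tail=10):
    """True if the RNA ends in at least min_tail A residues (a poly-A tail)."""
    rna = rna.upper()
    tail_length = len(rna) - len(rna.rstrip("A"))
    return tail_length >= min_tail


def reverse_transcribe(rna):
    """Return the first-strand cDNA (5'->3'), the reverse complement of the RNA."""
    complement = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna.upper()))


polyadenylated = "AUGGCC" + "A" * 12
no_tail = "AUGGCCGU"

for rna in (polyadenylated, no_tail):
    if primes_with_oligo_dt(rna):
        print(rna, "->", reverse_transcribe(rna))
    else:
        print(rna, "-> not primed, so absent from the cDNA library")
```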
Proteome comes from
Translation of mRNA.
Another processing event that causes a large amount of problems with annotation is
RNA polymerase transcribes DNA into the primary RNA transcript, which is not the translated transcript.
Intronic sequences are removed through splicing.
Once the intronic sequences are removed, the result is a mature messenger RNA.
cDNA is simply a
complementary copy of the mature mRNA after the intronic sequences are removed.
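A minimal sketch of splicing as string surgery, assuming the exon coordinates are already known (0-based, end-exclusive). The toy sequence and the `splice` helper are made up for illustration.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals (start, end) of a primary transcript in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


exon1 = "AUGGCC"
intron = "GUAAGUUUUCAG"  # toy intron starting with GU and ending with AG
exon2 = "GAAUAA"
pre_mrna = exon1 + intron + exon2

mature_mrna = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature_mrna)  # the intron is gone; only exon 1 + exon 2 remain
```

The cDNA is then the complementary DNA copy of `mature_mrna`, not of `pre_mrna`.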
A large amount of the genome is not expressed
Intergenic regions, which are not transcribed, and intronic regions, which are transcribed and then spliced out.
Spliced cDNA sequence
The cDNA is a DNA copy of the expressed sequence; it aligns to the genomic sequence in a broken pattern, with each aligned block marking an exon.
We determine what the cDNA sequence encodes and send this information to the proteome database.
We can’t easily get this information from the genomic sequence because the sequence is encoded within these three exons, which are separated from one another.
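A toy sketch of the "broken pattern": a greedy exact-match aligner that places the longest possible cDNA prefix on the genome, then continues downstream with the rest. This is an illustration only, not how real spliced aligners work (they tolerate mismatches and use splice-site signals to settle ambiguous junctions); `align_spliced` and `min_block` are hypothetical names.

```python
def align_spliced(cdna, genome, min_block=4):
    """Return genome intervals (start, end) that the cDNA covers, in order.

    Returns None if some part of the cDNA cannot be placed.
    """
    blocks = []
    c = 0  # next unaligned cDNA position
    g = 0  # leftmost genome position still allowed
    while c < len(cdna):
        hit = None
        # Try the longest remaining chunk first, then shorter ones.
        for length in range(len(cdna) - c, min_block - 1, -1):
            start = genome.find(cdna[c:c + length], g)
            if start != -1:
                hit = (start, start + length)
                break
        if hit is None:
            return None
        blocks.append(hit)
        c += hit[1] - hit[0]
        g = hit[1]
    return blocks


genome = "TT" + "ATGGCC" + "GTAAGTTTTCAG" + "CAATAA" + "CC"
cdna = "ATGGCC" + "CAATAA"
print(align_spliced(cdna, genome))
```

The gap between the two blocks is the intron; the blocks themselves are the exons whose joined sequence goes on to be translated.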
Alternative Splicing
Genes undergo alternative splicing.
When you align different cDNA sequences to the genome, you find that for some genes the alignments differ considerably from one cDNA to another,
indicating that they came from transcripts that have undergone alternative splicing.
This gene produces six distinct messenger RNA transcripts
that encode three distinct polypeptides.
When you align these cDNA sequences to Drosophila DNA, you find six different alignment patterns, due to six different splicing patterns of the mRNA transcripts.
Alternative splicing benefit
increases the number of proteins that can be encoded by a single gene.
One gene produces one pre-messenger RNA that can be spliced in multiple ways: all three exons are included in the mRNA, but any one of 12 alternative versions of exon 4 can be added.
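A small sketch that enumerates the isoforms in this example, assuming three always-included exons plus exactly one of 12 exon 4 variants. The exon labels (`E1`, `E4.1`, ...) are invented for illustration.

```python
from itertools import product

# Each position lists the exon choices available at that point in the mRNA.
exon_choices = [
    ["E1"],
    ["E2"],
    ["E3"],
    [f"E4.{i}" for i in range(1, 13)],  # 12 mutually exclusive versions of exon 4
]

# One isoform per combination of choices.
isoforms = ["-".join(combo) for combo in product(*exon_choices)]

print(len(isoforms))
print(isoforms[0])
print(isoforms[-1])
```

With more than one variable position, the isoform count is the product of the number of choices at each position.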
Alternative poly A sites
Different poly-A sites result in exon 1 being spliced to either exon 2 or exon 3.
3’ end is cleaved at alternate positions.
Alternative promoters
Results in certain exons being included or excluded from transcription
Exon included/excluded/mutually exclusive
exon can be included or excluded
you can have one exon or the other just not both
Alternative 5’/3’ splice site
Earlier or later splicing at the 5’ or 3’ end.
Retained Intron
In some messages splicing occurs such that the intron remains in the mature mRNA.
Major goals of RNA seq Analysis
Count the relative number of transcripts in the sample.
Determine the structure of the transcripts in the sample.
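A sketch of the counting goal, using transcripts per million (TPM) as one common way to express relative abundance. TPM is not named in these cards; it is an assumption here, as are the read counts and transcript lengths.

```python
def tpm(read_counts, lengths):
    """Length-normalize read counts, then scale so all values sum to one million."""
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    return {name: rate / total * 1_000_000 for name, rate in rates.items()}


# Hypothetical sample: same read count, but transcript B is twice as long.
read_counts = {"A": 100, "B": 100}
lengths = {"A": 1000, "B": 2000}

relative = tpm(read_counts, lengths)
for name, value in relative.items():
    print(name, round(value))
```

Dividing by length matters because a longer transcript yields more reads per molecule, so raw read counts alone overstate its abundance.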
Differential Cell Expression
Distinctive set of mRNA expressed by a cell.