PART III FROM GENOTYPE TO PHENOTYPE Flashcards
Which areas of the genome are genes concentrated in?
- G/C-rich areas
What percentage of the genome encodes protein or non-coding RNA?
- < 2%
What percentage of the genome is regulatory/introns?
- 25%
What rough % of the genome is junk DNA?
- > 50%
What is the genes and gene product mismatch problem?
- That there are about 20 000 genes but more than 500 000 proteins
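As a rough worked figure, the two numbers on this card imply that each gene must account for more than 25 protein products on average. A minimal sketch, using only the approximate counts quoted above (the function name is made up for illustration):

```python
# Approximate counts quoted on the card above (not precise measurements).
GENES = 20_000
PROTEINS = 500_000  # "more than", so the ratio below is a lower bound


def min_products_per_gene(proteins=PROTEINS, genes=GENES):
    """Average number of distinct protein products each gene must supply."""
    return proteins / genes


print(min_products_per_gene())  # 25.0
```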
What is a putative gene?
- A sequence believed to be a gene on the basis of an ORF, but whose protein product and function are not yet known
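A minimal sketch of how an ORF might be spotted in a DNA sequence: scan each forward reading frame for an ATG followed, in frame, by a stop codon. The function name and the `min_codons` cut-off are hypothetical, and real gene finders also check the reverse strand and much more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open; the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

An ORF found this way is only a candidate (putative) gene until a matching transcript or protein is identified.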
What is the transcriptome?
- The COMPLETE collection of RNAs produced from a genome, BUT not every RNA is present in every cell, and eukaryotic RNAs are spliced
What does alternative splicing give rise to?
- Different protein isoforms from the SAME gene (this partially explains the gene product mismatch problem)
How can an RNA sequence be deduced?
- By making and analysing cDNA
What are cDNAs and ESTs (Expressed Sequence Tags) used to analyse?
- Used to analyse gene structure, and presence + levels of specific RNA in cells
What is transcriptomics?
- The study of THOUSANDS of RNAs simultaneously
Is the whole transcriptome produced in cells?
- NO
- Because only a subset of genes is active in any cell
What are the 3 major classes of RNAs that make up the eukaryotic transcriptome?
- Ribosomal RNAs (rRNA) transcribed by RNA polymerase I
- Protein-encoding RNAs (mRNA) and microRNAs (miRNA) transcribed by RNA polymerase II
- Small RNAs (including tRNA) transcribed by RNA polymerase III
Are genes organised into operons in eukaryotes?
- NO
What is the splicing process?
- Where eukaryotic mRNA is produced by excision of non-coding segments (introns) from precursor (pre-mRNA)
Is splicing SEQUENCE specific and if so what can be found out from this?
- YES!
- Intron/exon boundaries can be predicted using bioinformatics genomic sequence analyses
- But there is NO single specific splice sequence that is cut out…more an overall general pattern
What is the key to gene identification in eukaryotic genome analyses?
- Accurately predicting splice junctions
Via what process can related but DIFFERENT polypeptides be generated from the same primary transcript?
- Alternative splicing
What allows for different isoforms of a transcript specifically?
- Different EXONS being incorporated OR omitted from the final mRNA
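A minimal sketch of how including or omitting exons multiplies the transcripts from one gene. It assumes a toy gene in which some internal exons are optional; the exon names, sequences, and function name are all made up for illustration.

```python
from itertools import combinations


def isoforms(exons, optional):
    """Return {label: mRNA sequence} for every combination of skipped optional exons.

    exons: list of (name, sequence) tuples in genomic order
    optional: set of exon names that may be left out of the final mRNA
    """
    skippable = [name for name, _ in exons if name in optional]
    results = {}
    for k in range(len(skippable) + 1):
        for skipped in combinations(skippable, k):
            kept = [(name, seq) for name, seq in exons if name not in skipped]
            label = "-".join(name for name, _ in kept)
            results[label] = "".join(seq for _, seq in kept)
    return results


toy_gene = [("E1", "ATG"), ("E2", "GCC"), ("E3", "TTT"), ("E4", "TAA")]
for label, mrna in isoforms(toy_gene, optional={"E2", "E3"}).items():
    print(label, mrna)
```

With two optional exons this toy gene yields four distinct transcripts; each extra optional exon doubles the count, which is why a modest number of genes can give rise to far more proteins.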
What process explains why relatively few genes in genome can give rise to vastly greater number of proteins?
- Alternative splicing
Can splicing errors cause disease via mutations?
- YES!
- Mutations can occur in splice donor or acceptor sequences OR generate NEW (cryptic) splice sequences
- e.g. An exon being omitted (skipped) deletes a section of the protein –> severely affects the structure
How can the use of false (cryptic) acceptor or donor sites severely affect the protein structure?
- By truncating (shortening) or lengthening exons
What are the old definition and the 2 new definitions of the gene, respectively?
OLD: One gene encodes one protein
NEW 1: Single transcription unit (gene) encodes one set of protein isoforms
NEW 2 (newest): A single polypeptide is the product of a single gene
What 3 things do we need to know from each gene in terms of RNA?
- Where and when it is transcribed into RNA
- How it is spliced, and how many spliceoforms there are
- Whether particular spliceoforms are restricted to particular cells or growth stage
Can these 3 things (where and when a gene is transcribed into RNA; how it is spliced and how many spliceoforms there are; whether particular spliceoforms are restricted to particular cells or growth stages) be directly deduced from genomic DNA sequence with CONFIDENCE?
- NO
- Rely on analysis of cDNA and ESTs derived from RNA
What is a method to sequence RNA that is stable?
- Make a DNA copy (cDNA), as DNA is stable, easy to amplify, and easy to sequence
Why is RNA unstable?
- Because it is HIGHLY susceptible to nucleases
What is used to produce DNA from an RNA template (like in some viruses)?
- Reverse transcriptase
What 4 things does creating a complementary DNA (cDNA) rely on?
- RNA can base pair with DNA
- mRNA has a polyadenylated (poly-A) tail, so a poly-T DNA primer (TTTTTT) can base pair with it
- A retroviral enzyme –> reverse transcriptase can produce DNA from RNA
- No pre-existing gene sequence info is required to generate a cDNA
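The base-pairing logic behind the points above can be sketched in a few lines: the first cDNA strand is the DNA reverse complement of the mRNA, so it begins with the poly-T primer that paired with the poly-A tail. This is a toy illustration only; the function name and the example sequence are made up.

```python
# Pairing of each RNA base with the DNA base built opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


mrna = "AUGGCCAAAAAA"  # toy mRNA ending in a short poly-A tail
print(first_strand_cdna(mrna))  # TTTTTTGGCCAT
```

Note that nothing about the gene itself had to be known in advance: the primer only depends on the poly-A tail.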
What does producing a cDNA using PCR require?
- Pre-existing sequence information to design primers
What are ESTs? (Expressed Sequence Tag)
- cDNAs made from mRNAs originating from a specific cell or tissue (DNA copies of mRNA or mRNA fragments)
- represent a SNAPSHOT of the mRNA at that time and place
- If there is a transcriptionally ACTIVE gene it will be evident in Expressed Sequence Tag databases
What is the collection of colonies of ESTs known as?
- The library –> the EST from each colony is then sequenced and the data lodged in a database
What are the 3 uses of EST and EST databases?
- Gene verification
- Gene structure
- Gene expression
How can EST and EST databases apply to Gene Verification?
- If a DNA sequence from the genome matches EXACTLY to a specific EST, it can be concluded that the genomic DNA is TRANSCRIBED and that it represents a gene (or gene fragment)
How can EST and EST databases apply to Gene Structure?
- In identifying intron and exon boundaries
- ESTs will only match exons –> so segments that do not match with an EST derived from that gene are introns
Do ESTs only match with introns or exons?
- They only match with EXONS
How can EST and EST databases apply to Gene Expression? (5 things…Identify:)
- Identify specific cells or tissue in which the gene is active
- Identify LEVEL of gene activity
- Identify alterations in gene activity in disease
- Identify transcription start and end points
- Identify alternative splicing patterns
What happens if you BLAST an EST sequence BACK onto a genomic sequence and why?
- It will ONLY MATCH EXONS because ESTs are made from POST spliced mRNA
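A minimal sketch of the idea, assuming exact matches only (real tools such as BLAST tolerate mismatches and score alignments). The EST is split greedily into blocks that match the genome left to right; the blocks are exons and the unmatched gaps between them are introns. The function names, the `min_block` cut-off, and the sequences are hypothetical.

```python
def map_est_to_genome(est, genome, min_block=4):
    """Split an EST into exact-matching blocks found left to right in the genome.

    Returns half-open (start, end) genome intervals, one per matched block.
    Raises ValueError if some part of the EST (including a leftover shorter
    than min_block) cannot be placed.
    """
    blocks = []
    est_pos = 0
    genome_pos = 0
    while est_pos < len(est):
        best = None
        # Try the longest remaining piece of the EST first, then shorter ones.
        for length in range(len(est) - est_pos, min_block - 1, -1):
            hit = genome.find(est[est_pos:est_pos + length], genome_pos)
            if hit != -1:
                best = (hit, hit + length)
                break
        if best is None:
            raise ValueError("EST does not map to this genomic sequence")
        blocks.append(best)
        est_pos += best[1] - best[0]
        genome_pos = best[1]
    return blocks


def introns_between(blocks):
    """Return the unmatched genome intervals lying between consecutive blocks."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


genome = "ATGGCC" + "GTAAGTTTCAG" + "TTTGGA"  # exon + intron + exon
est = "ATGGCCTTTGGA"                          # spliced transcript
exon_blocks = map_est_to_genome(est, genome)
print(exon_blocks)                   # [(0, 6), (17, 23)]
print(introns_between(exon_blocks))  # [(6, 17)]
```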
What is the number of clones containing the same EST in one library PROPORTIONAL to?
- Proportional to the transcriptional activity of the gene
Do ESTs have a 5’ end matching the transcriptional start point of their gene?
- NO
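A minimal sketch of this proportionality: count how many clones in a library carry an EST from each gene and express the counts as fractions of the library. The gene labels and the function name are made up for illustration.

```python
from collections import Counter


def relative_expression(clone_gene_ids):
    """Return each gene's share of the clones in one EST library."""
    counts = Counter(clone_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Toy library: one entry per sequenced clone, labelled by the gene its EST matches.
library = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
print(relative_expression(library))
```

In this toy library geneA supplies 6 of the 10 clones, so it is taken to be the most actively transcribed of the three genes in that cell or tissue.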
Do ESTs represent genes active in EVERY CELL?
- NO
What does the program UniGene do?
- Matches ESTs from various sources and organises them into transcript families
Is each UniGene entry a collection of ESTs derived from MULTIPLE GENES or a SINGLE GENE?
- SINGLE GENE!
What are microarrays used for (in general)?
- to assess where, when, and how many genes are expressed in specific cells or tissues
What does ‘deep sequencing’ rely on?
- ESTs
What does having no hits in one section of an ENCODE read mean?
- Alternative splicing has occurred (e.g. Exon 4 removed)
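A minimal sketch of how such a gap might be picked out, assuming the exon coordinates are already known and the read's hits are given as genome intervals. All names and coordinates here are hypothetical.

```python
def exons_without_hits(exons, hits):
    """Return the names of exons that no hit overlaps.

    exons: dict of exon name -> (start, end), half-open genome coordinates
    hits: list of (start, end) intervals where the transcript matched
    """
    missing = []
    for name, (e_start, e_end) in exons.items():
        covered = any(h_start < e_end and e_start < h_end for h_start, h_end in hits)
        if not covered:
            missing.append(name)
    return missing


exons = {"exon3": (100, 150), "exon4": (200, 260), "exon5": (300, 340)}
hits = [(100, 150), (300, 340)]
print(exons_without_hits(exons, hits))  # ['exon4']
```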
What is the simplest and BEST way to determine if a gene is real?
- Identification of a MATCHING RNA transcript (determine transcription start and end points AND to map intron/exon boundaries)
What does transcriptomics via deep sequencing enable?
- The simultaneous identification and study of THOUSANDS of transcripts produced by a specific cell or tissue
What are the two methods that allow transcripts from MANY genes to be assessed simultaneously?
- Microarray analysis
- RNA deep sequencing