Transcriptomics Flashcards
What is the transcriptome?
The complete set of transcripts (usually taken to mean mRNAs) and their relative levels of expression in a biological entity.
It links genotype to phenotype.
Why is transcriptomics good to use as a proxy for protein measurements?
mRNA levels are easier to measure than protein levels.
mRNA amount is usually positively correlated with protein amount (but not always, and there is a time lag).
What may be the cause of diversity between humans and chimps?
Humans and chimps are about 99% genetically similar, yet there are dramatic phenotypic differences.
Mary-Claire King and Allan Wilson (1975) compared chimp and human proteins and found a very high degree of similarity.
Differences in gene regulation must therefore be the source of diversity.
Why must we not rely on just transcriptomics?
It is hypothesis-generating rather than evidence-generating.
Many RNAs are not protein-coding, e.g. miRNAs.
What are cis/trans regulatory elements?
Cis-regulatory elements lie on the same DNA molecule as the gene they regulate, often close to its promoter (e.g. TF binding sites). Trans-regulatory elements are diffusible products (e.g. transcription factors) that can regulate genes distant from the gene from which they were transcribed.
What are 2 examples of morphological variation due to cis regulation?
Drosophila wing spots - black spot due to an extra TF binding site.
Sticklebacks - marine fish have pelvic spines, whereas freshwater (FW) fish don't. The gene is still present in FW fish but is not activated, because of a change in a TF binding site.
Order of events from transcription to translation
Transcription: DNA -> pre-mRNA.
Capping (5' end) and polyadenylation (3' end).
Splicing - intron excision and exon joining = mature mRNA.
Transported into the cytoplasm for translation by ribosomes.
How many genes are alternatively spliced in humans and Drosophila?
40-75%
Human - each gene is estimated to have 3-8 different transcripts.
Sex determination through alternative splicing in Drosophila
Males and females differ in Sex-lethal (Sxl) expression: females are XX and so make a homodimer of Sisterless protein, whereas males have one X and so make a Sisterless heterodimer. Sisterless is a TF of Sxl.
Females - Sxl protein regulates the splicing of Tra (Transformer) and represses the 'poison exon' (which contains a stop codon) in its own mRNA, so full-length Sxl protein is expressed.
Males - the poison exon is spliced into the Sxl mRNA, so a truncated Sxl protein is produced.
Doublesex gene - the Tra/Tra2 complex binds to repeat sequences in the female-specific exon of doublesex, leading to male- and female-specific splice forms.
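The poison-exon logic above can be sketched in a few lines. This is a toy illustration only: the exon sequences and the names EXON_A, POISON_EXON, EXON_B and translated_codons are made up for the example and are not the real Sxl sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical exon sequences (NOT real Sxl sequence), each a multiple of 3 nt
EXON_A = "ATGGCTGAA"       # starts with the ATG start codon
POISON_EXON = "CCTTAAGGA"  # contains an in-frame TAA stop codon
EXON_B = "GGTCTGAAA"
FINAL_STOP = "TAA"

def translated_codons(mrna):
    """Count codons read from the first ATG up to (not including) the first in-frame stop."""
    start = mrna.find("ATG")
    if start == -1:
        return 0
    count = 0
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count

# Female-like splicing: poison exon skipped, so the full-length product is made
female_mrna = EXON_A + EXON_B + FINAL_STOP
# Male-like splicing: poison exon included, so translation stops early
male_mrna = EXON_A + POISON_EXON + EXON_B + FINAL_STOP

print(translated_codons(female_mrna), translated_codons(male_mrna))
```

Including the poison exon makes the mRNA longer but the protein shorter, which is the point of the card.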
10 basic steps in RNA-seq
- Get mRNA or total RNA from the sample
- Remove contaminating DNA; select mRNA using oligo(dT) (poly-T) beads, which bind the poly(A) tail
- Remove rRNA - the most abundant RNA in the cell but the least informative
- Fragment the RNA
- Reverse transcribe into cDNA
- Preserve strand information (strand-specific RNA-seq)
- Ligate sequencing adaptors (of known sequence)
- PCR amplification
- Select a range of fragment sizes
- Sequence the cDNA ends
What is an issue in data from transcriptome analysis?
Massive batch effects.
Investigated by Gilad and Mizrahi-Man (2015).
The data they reanalysed appeared to show more difference between species within the same organ than between different organs of the same species (PCA analysis); they argued that this pattern was largely a batch effect.
What 4 things to consider in RNA-seq experimental design?
Technical replicates are unnecessary, as Illumina has low technical variation, unlike microarrays.
To minimise batch effects, do everything together at the same time.
Biological replicates are essential - 3+ from independent batches.
For alignment to a reference genome, use a splice-aware aligner for isoform-specific RNA-seq.
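When not every sample can be processed in a single run, one common way to keep batch from being confounded with condition is to spread the replicates of each condition evenly across batches. This is an assumption added here, not something stated in the cards. A minimal sketch, with hypothetical sample names and a hypothetical helper assign_batches:

```python
import random

def assign_batches(samples, n_batches, seed=0):
    """Spread each condition's replicates across batches (round-robin after shuffling).

    samples: list of (sample_id, condition) pairs.
    Returns a dict mapping batch index -> list of sample_ids.
    """
    rng = random.Random(seed)
    by_condition = {}
    for sample_id, condition in samples:
        by_condition.setdefault(condition, []).append(sample_id)
    batches = {b: [] for b in range(n_batches)}
    for ids in by_condition.values():
        rng.shuffle(ids)  # randomise which replicate lands in which batch
        for k, sample_id in enumerate(ids):
            batches[k % n_batches].append(sample_id)
    return batches

# Hypothetical design: 2 conditions x 3 biological replicates, 3 batches
samples = [("c1", "control"), ("c2", "control"), ("c3", "control"),
           ("t1", "treated"), ("t2", "treated"), ("t3", "treated")]
print(assign_batches(samples, n_batches=3))
```

With this layout every batch holds one control and one treated sample, so a batch effect cannot be mistaken for a treatment effect.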
What is a splice-aware aligner?
An aligner that maps RNA-seq reads to a reference genome while allowing for introns: when a read spans an exon-exon junction, it opens a gap in the alignment and looks downstream for the place where the rest of the read continues.
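The idea can be sketched with exact string matching. This is a toy illustration, not how any of the real aligners work internally (they use indexes, mismatches and splice-site signals); the function spliced_align and the sequences are hypothetical.

```python
def spliced_align(read, genome, min_anchor=4):
    """Return a list of (start, end) genome blocks covering the read, or None.

    First try an ungapped exact match. If that fails, split the read in two,
    place the left part, then look downstream for the right part (the skipped
    genome sequence in between is treated as an intron).
    """
    pos = genome.find(read)
    if pos != -1:
        return [(pos, pos + len(read))]
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            left_end = left_pos + len(left)
            # search strictly downstream, leaving a gap of at least 1 base
            right_pos = genome.find(right, left_end + 1)
            if right_pos != -1:
                return [(left_pos, left_end), (right_pos, right_pos + len(right))]
            left_pos = genome.find(left, left_pos + 1)
    return None

# Hypothetical toy genome: exon, intron (starts GT, ends AG), exon
exon1 = "ATGGCCAAGT"
intron = "GTAAGT" + "T" * 7 + "CAG"
exon2 = "CCTGAACGTA"
genome = exon1 + intron + exon2

junction_read = exon1[-6:] + exon2[:6]  # spans the exon-exon junction
print(spliced_align(junction_read, genome))
```

A non-splice-aware aligner would stop at the first step and report the junction read as unmapped.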
2 approaches to transcriptome assembly
De novo - assemble, then align RNA-seq reads. Scaffold the contigs, then extend them with unassembled reads.
Reference-based - align, then assemble. Use de novo assembly for any unaligned reads.
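The "assemble" step of the de novo approach can be sketched as greedy overlap merging. This is a simplified stand-in chosen for illustration (real assemblers typically use de Bruijn graphs and handle errors and isoforms); the function names and reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Hypothetical reads drawn from one short transcript
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

In the reference-based approach the same reads would instead be aligned to the genome first, and only the unaligned leftovers would go through a step like this.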
Examples of splice-aware aligners
TopHat2, MapSplice, SOAPsplice, PASSion, SpliceMap, RUM, ABMapper, CRAC, GSNAP, HMMSplicer, OLego, BLAT, HISAT