Transcriptomics Flashcards
What is the transcriptome?
Complete set of RNA transcripts (mainly mRNAs, but also non-coding RNAs) and their relative levels of expression in a biological entity
It links genotype to phenotype
Why is transcriptomics good to use as a proxy for protein measurements?
mRNA is easier to measure than protein
mRNA level is usually positively correlated with protein level (but not always, and there is a time lag)
What may be the cause of diversity between humans and chimps?
Although humans and chimps are ~99% genetically similar, there are dramatic phenotypic differences.
Gene regulation is therefore thought to be a major source of the diversity.
Mary-Claire King and Allan Wilson studied protein similarity between chimps and humans and found it to be very high.
Why must we not rely on just transcriptomics?
It is hypothesis-generating rather than evidence-generating.
Many RNAs are not protein-coding, e.g. miRNAs
What are cis/trans regulatory elements?
Cis-regulatory elements lie on the same DNA molecule as the gene they regulate, usually close to its promoter (e.g. TF binding sites). Trans-regulatory elements (e.g. TFs) can regulate genes distant from the gene from which they were transcribed.
What are 2 examples of morphological variation due to cis regulation?
Drosophila wing spots: black spot due to an extra TF binding site
Sticklebacks: marine fish have pelvic spines, whereas freshwater (FW) fish don't. The gene is present in FW fish but not activated, because of a change in a TF binding site.
Order of events from transcription to translation
DNA -> pre-mRNA.
Capping, polyadenylation
Splicing - intron excision and exon joining = mRNA
Transported into cytoplasm for translation by ribosomes.
How many genes are alternatively spliced in humans and Drosophila?
40-75%
Humans: each gene is estimated to have 3-8 different transcripts.
Sex determination through alternative splicing in Drosophila
Males and females differ in Sex-lethal (Sxl) expression: females are XX and so make Sisterless homodimers, whereas males have one X and so make Sisterless heterodimers. Sisterless is a TF of Sxl.
Female: Sxl protein regulates splicing of Tra and represses the ‘poison exon’ (which contains a stop codon) in its own mRNA, so full-length Sxl protein is expressed.
Male: the poison exon is spliced into the Sxl mRNA, so a truncated Sxl protein is produced.
Doublesex (dsx) gene: the Tra/Tra2 complex binds to repeat sequences in the female-specific exon of dsx, leading to male- and female-specific splice forms.
10 basic steps in RNA-seq
- Get mRNA or total RNA from the sample
- Remove contaminating DNA; select mRNA using oligo(dT) beads that bind the poly(A) tail
- Remove rRNA: the most highly expressed RNA in the cell but the least informative
- Fragment RNA
- Reverse transcribe into cDNA
- Keep strand information (strand-specific RNA-seq)
- Ligate sequencing adaptors (of known sequence)
- PCR amplification
- Select a range of fragment sizes
- Sequence cDNA ends
What is an issue in data from transcriptome analysis?
Massive batch effects
Investigated by Gilad and Mizrahi-Man (2015)
Reanalysed data showing more differences between species in the same organ's transcriptome than between different organs of the same individual (PCA); much of that clustering by species was explained by batch effects.
What 4 things should be considered in RNA-seq experimental design?
Technical replicates are unnecessary, as Illumina has low technical variation, unlike microarrays.
To minimise batch effects, do everything together at the same time.
Biological replicates are essential: 3+ from independent batches.
For alignment to a reference genome, use a splice-aware aligner for isoform-specific RNA-seq
What is a splice aware aligner?
An aligner that matches RNA-seq reads to the reference genome while skipping introns. When a read spans an exon-exon junction, it opens a gap in the alignment and looks downstream to continue the match.
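A toy sketch of the "gap and look downstream" idea, using exact string matching only. The function name, the sequences and the GT...AG splice-site check are assumptions added for illustration; real aligners use genome indexes and tolerate mismatches.

```python
def spliced_align(read, genome, min_anchor=3):
    """Toy spliced alignment (exact matches only).

    Returns (start, None) for a contiguous hit,
    (start, (gap_start, gap_end)) for a read split across a gap,
    or None if no placement is found.
    """
    # Case 1: the whole read lies inside one exon.
    pos = genome.find(read)
    if pos != -1:
        return pos, None
    # Case 2: split the read and look downstream for the second piece.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            gap_start = left_pos + len(left)
            # Assumed rule: the gap must start with GT and end with AG.
            if genome[gap_start:gap_start + 2] == "GT":
                right_pos = genome.find(right, gap_start + 2)
                while right_pos != -1:
                    if genome[right_pos - 2:right_pos] == "AG":
                        return left_pos, (gap_start, right_pos)
                    right_pos = genome.find(right, right_pos + 1)
            left_pos = genome.find(left, left_pos + 1)
    return None


# Made-up sequences: two exons separated by one intron.
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTTTTTCTTTCAG", "GGTACCTGA"
genome = "TT" + exon1 + intron + exon2 + "TT"
read = "CCAAGGGTAC"  # last 5 bases of exon1 + first 5 bases of exon2

print(spliced_align(read, genome))  # (6, (11, 28))
```

The returned gap (11 to 28) is exactly the intron, which an aligner that is not splice-aware would fail to skip.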
2 approaches to transcriptome assembly
De novo: assemble, then align RNA-seq reads. Scaffold the contigs, then extend them with unassembled reads.
Reference-based: align, then assemble. Use de novo assembly for any unaligned reads.
Examples of splice-aware aligners
TopHat2, MapSplice, SOAPsplice, PASSion, SpliceMap, RUM, ABMapper, CRAC, GSNAP, HMMSplicer, OLego, BLAT, HISAT
A program used for data quality control
FastQC (run on the FASTQ read files)
Basic workflow for differential gene expression with and without a reference genome and transcriptome
- Experimental design
- Sequencing
- Data quality control
- Read mapping (use the reference genome; if there is none, do transcriptome assembly)
- Differential expression analysis (use the reference transcriptome if available; if not, use the assembled transcriptome)
Why must reads be normalised before differential gene expression analysis?
Samples are sequenced to different depths (more or fewer reads), so raw counts are not directly comparable.
How to normalize reads for DGE analysis
RPKM: reads per kilobase of transcript per million mapped reads.
RPKM = reads mapped to gene / (gene length in kb x total mapped reads in millions)
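A minimal sketch of the RPKM calculation, assuming the standard definition (reads mapped to the gene, divided by gene length in kb and by total mapped reads in millions). The function name and the counts are made up for illustration.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    length_kb = gene_length_bp / 1_000
    depth_millions = total_mapped_reads / 1_000_000
    return gene_reads / (length_kb * depth_millions)


# Hypothetical gene of 2 kb in two samples sequenced to different depths.
sample_a = rpkm(gene_reads=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
sample_b = rpkm(gene_reads=1_000, gene_length_bp=2_000, total_mapped_reads=20_000_000)

print(sample_a, sample_b)  # 25.0 25.0
```

Sample B has twice the raw reads only because it was sequenced twice as deeply; after normalisation the two values agree.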
What distribution do RNA-seq counts usually follow?
Negative binomial
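A small sketch of what sets the negative binomial apart from a Poisson: the variance can exceed the mean (overdispersion). This assumes the common mean-dispersion form, variance = mean + dispersion x mean^2; the function name and values are illustrative only.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial, assuming var = mean + dispersion * mean**2."""
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


# Dispersion 0 reduces to the Poisson case (variance == mean).
for mu in (10, 100, 1000):
    print(mu, nb_variance(mu, 0.0), nb_variance(mu, 0.1))
```

With a non-zero dispersion the variance grows much faster than the mean, so highly expressed genes show more spread between replicates than a Poisson model would allow.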
What sorts of higher-level analysis are done with RNA-seq data?
- Search for biological meaning
- Assign biological functions using homology info
- Map genes to pathways
- Group genes by molecular function, biological process, cellular component or pathway
GO
Gene Ontology
Gives a bin (term) that the gene falls into.
Suggests the function from a hierarchy of functions
How to search for gene function?
Entrez Gene: a database, but searching one gene at a time is slow
so use the GO database instead
3 major categories of GO structure
- Molecular function: e.g. TF, RNA binding
- Biological process: broad goals achieved by a group of molecular functions, e.g. transcription, pre-mRNA splicing
- Cellular component: subcellular location, macromolecular complexes
What is KEGG?
Kyoto Encyclopedia of Genes and Genomes
Pathway maps can show up- and down-regulation of genes
Why is enrichment analysis good?
Can focus on the genes with altered expression, rather than a massive list of all genes in the transcriptome.
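A sketch of one common way to score enrichment: a one-sided hypergeometric test asking whether a GO term or pathway is over-represented among the genes with altered expression. The choice of test, the function name and all numbers are assumptions for illustration, not taken from these notes.

```python
from math import comb


def enrichment_p_value(total_genes, term_genes, selected_genes, overlap):
    """P(at least `overlap` of the selected genes carry the term) by chance.

    total_genes    : genes in the background (e.g. all expressed genes)
    term_genes     : background genes annotated with the term
    selected_genes : genes with altered expression
    overlap        : selected genes annotated with the term
    """
    upper = min(term_genes, selected_genes)
    favourable = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)


# Hypothetical numbers: 20,000 genes, 200 in the term, 500 altered, 15 in both.
# By chance we would expect about 500 * 200 / 20000 = 5 in the overlap.
p = enrichment_p_value(20_000, 200, 500, 15)
print(f"p = {p:.3g}")
```

A small p-value suggests the term is enriched in the altered set; in practice many terms are tested, so the p-values would also need multiple-testing correction.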