Genomics Flashcards
Gene expression
- Exons code for proteins
- Not all genes are active at the same time → diversity in cells → ensured by cell-type-specific gene expression
- Gene expressed = RNA is actually transcribed (produced) from the gene
Measuring RNA production: qPCR
- RT-PCR (reverse transcription PCR) – oldest method
- Quantitative PCR (qPCR) measures how much cDNA of a gene of interest is present in a sample
- RNA cannot be directly measured by PCR → synthesise cDNA → complementary to RNA → reaction w/reverse transcriptase enzyme
- cDNA only includes exons because RNA is spliced
- Gene-specific probe carrying a fluorophore (plus a quencher) hybridises to the cDNA
- When Taq polymerase synthesises the second strand it degrades the probe → fluorophore released → qPCR quantifies fluorescence in the reaction → more light = more starting cDNA = more RNA = more gene expression → signal crosses the detection threshold in earlier cycles
qPCR Normalisation
- Results need to be normalised to measure actual change in transcription level → compensates for initial variation in mRNA amount and technical differences between samples/reactions
- Housekeeping genes included in qPCR as reference → encode essential cell components → stable expression, always above a certain threshold
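Normalising to a housekeeping gene is often done with the 2^-ΔΔCt (Livak) calculation. A minimal sketch, assuming ~100% PCR efficiency (product doubles every cycle); the function name and Ct values are hypothetical:

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    Assumes ~100% PCR efficiency, i.e. product doubles every cycle.
    """
    # Normalise the target to the housekeeping (reference) gene within each sample
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalised values between conditions
    dd_ct = d_ct_treated - d_ct_control
    # A Ct one cycle earlier means roughly twice as much starting template
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: relative to the housekeeping gene, the target
# crosses the threshold 2 cycles earlier in the treated sample
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

If efficiency is well below 100%, the base of 2 overestimates the fold change, so efficiency-corrected variants are used instead.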
qPCR Limitations
- Quick, relatively accurate, cheap but…
- Limited as to how many genes can be tested at any one time ~5-10 (not feasible for thousands of genes)
Microarray
- Revolutionised gene expression analysis
- Allows for detection/comparison of thousands of genes simultaneously
- Relies on base-pairing hybridization with probes for each gene to be measured
- More expensive than PCR, but still relatively cheap
- Can measure:
o Differing expression of genes over time, between tissues and different states
o Co-expression of genes
o Identification of complex genetic diseases
Affy Gene Chip & Array Experiment
- Each gene has 16-20 pairs of probes synthesized on the chip
1. RNA extraction
2. Make cDNA incorporating biotin (biotin binds streptavidin, which carries the fluorescent label)
3. Hybridise labelled cDNA to chip → where it binds its complementary probe, the bound fluorophore emits light → scan and quantify signal
Affy Expression Measurements
- A = absent
- M = marginal
- P = present (P-value gives confidence)
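The P/M/A call is derived from a detection p-value and two cut-offs. A sketch of that mapping; the defaults 0.04 and 0.06 are the thresholds commonly cited for the Affymetrix MAS5 algorithm and should be treated as an assumption:

```python
def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Turn a detection p-value into an Affymetrix-style P/M/A call.

    alpha1/alpha2 defaults are the commonly cited MAS5 thresholds (assumption).
    """
    if p_value < alpha1:
        return "P"  # present: transcript confidently detected
    if p_value < alpha2:
        return "M"  # marginal
    return "A"      # absent: not reliably detected


for p in (0.01, 0.05, 0.30):
    print(p, detection_call(p))  # P, M, A
```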
Microarray Limitations
- Data is very noisy
- Probes not available for all genes; Affy probes only for ~75-80% of human genes
- Cannot detect genes w/very low expression levels
- Data requires lots of statistics and analysis
- Assay does not distinguish expression from different isoforms of the same gene
Next generation sequencing
- Based on fragmenting a genome → adding adaptors (always the same known sequences) to both ends → one adaptor binds an oligo on the flow cell → fragment bends over and the second adaptor binds a neighbouring oligo → bridge-like formation
- Bridge PCR amplification on the flow cell → clusters of identical fragments, still attached via their adaptors, are sequenced by the machine → millions of reads of ~75-150 base pairs → mapped onto reference genome
- Same principle underlies all “Seq” methods
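Mapping reads onto a reference can be illustrated with a toy seed-and-extend approach: index every k-mer of the reference, look up the read's first k bases, then check the full read. Exact matching only; real aligners (e.g. BWA, Bowtie) use compressed indexes and tolerate mismatches. All names and sequences are hypothetical:

```python
def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def map_read(read, reference, index, k):
    """Seed with the read's first k bases, then require an exact full-length match."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTACGTTTGACCA"  # hypothetical reference
k = 4
index = build_kmer_index(reference, k)
print(map_read("GTTTGA", reference, index, k))  # [6]
print(map_read("ACGT", reference, index, k))    # [0, 4]
```

A read matching two places (like "ACGT") is a multi-mapping read, one reason short reads from repetitive regions are hard to place.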
RNA-seq
- Uses next generation sequencing to measure gene expression
- Assumes every mRNA molecule present has an equal chance of being sequenced → number of reads reflects mRNA abundance
- If a gene has 2x as many (normalised) reads in the experiment as in the control, its expression is 2x greater
- Gives accurate measure of gene expression, even for genes w/very low expression levels
- Can identify exact transcript being expressed
- Can potentially identify unknown transcripts with novel splice sites
- Method:
1. Extract all mRNA → convert to cDNA fragments
2. Add sequencing adaptors → obtain short sequence using high-throughput sequencing
3. Resulting sequence reads aligned w/reference genome or transcriptome
4. Base count profile for each gene is created
- Same procedure for control and variant
- Read counts are proportional to gene expression level
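Steps 3-4 can be sketched by counting aligned reads per gene. This simplification assigns a read by its aligned start position only (real counting tools also handle reads spanning exon junctions, overlapping genes and multi-mapping reads); gene names, coordinates and read positions are hypothetical:

```python
def count_reads_per_gene(read_starts, genes):
    """Count reads whose aligned start falls inside each gene.

    genes maps a gene name to a half-open (start, end) interval on the reference.
    Reads outside every gene are left unassigned.
    """
    counts = {name: 0 for name in genes}
    for pos in read_starts:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


genes = {"geneA": (0, 100), "geneB": (200, 300)}  # hypothetical annotation
control = count_reads_per_gene([5, 50, 210, 250, 260, 270], genes)
variant = count_reads_per_gene([5, 20, 50, 99, 210, 250], genes)
print(control)  # {'geneA': 2, 'geneB': 4}
print(variant)  # {'geneA': 4, 'geneB': 2}
```

Both toy samples have 6 reads, so geneA reads as 2x up in the variant; with unequal sequencing depth the counts would need normalising first.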
RNA-Seq Normalisation
- Important to normalise:
o Sequencing depth = how many reads are sequenced by the machine
o Transcript length → longer transcripts produce more fragments/reads; also matters when comparing different organisms (e.g. human vs mouse)
o Amount of fragments in each sample
- 2 main methods to normalise data:
o Raw read count normalisation
o Reads/Fragments Per Kilobase of transcript per Million mapped reads (RPKM – single-end reads; FPKM – paired-end reads)
$\mathrm{RPKM} = \frac{10^9 \cdot C}{N \cdot L}$
C = raw count of reads in transcript
N = number of mappable reads in experiment
L = transcript length (bp)
Normalises for transcript length (dividing by L) and library size (dividing by N); 10^9 = 10^3 (per kilobase) × 10^6 (per million reads)
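The RPKM formula translated directly into a function; the example numbers are hypothetical and show that doubling both length and reads leaves RPKM unchanged:

```python
def rpkm(count, total_mapped_reads, length_bp):
    """Reads per kilobase of transcript per million mapped reads.

    10**9 = 10**3 (bp -> kilobase) * 10**6 (reads -> millions of reads).
    """
    return count * 1e9 / (total_mapped_reads * length_bp)


# Hypothetical numbers: a 4 kb transcript with twice the reads of a 2 kb
# transcript in the same library gets the same RPKM
print(rpkm(1000, 10_000_000, 2000))  # 50.0
print(rpkm(2000, 10_000_000, 4000))  # 50.0
```

FPKM uses the same arithmetic with C counted as fragments (read pairs) instead of single reads.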
Raw Read Count Normalisation – DESeq2
- Aims to make normalized counts for non-differentially expressed genes similar between samples
- Does not aim to adjust count distributions between samples
- Assume that:
o Most genes are not differentially expressed
o Differentially expressed genes divided roughly equally between up- and down-regulated
- Relies on most genes behaving like housekeeping genes (stable expression)
- Normalisation (median of ratios): each gene's count divided by its geometric mean across all samples → median of these ratios within a sample = that sample's size factor → same scaling applied to all genes in the sample
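DESeq2 is an R/Bioconductor package; the Python below only sketches its default median-of-ratios size-factor idea and is not its API. Function names and the toy count matrix are hypothetical:

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a zero geometric mean, so skip them
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference)
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of sample j to the pseudo-reference, for every usable gene
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        # The median is robust to the minority of differentially expressed genes
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts, factors):
    """Divide every count by its sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Hypothetical counts: sample 2 sequenced 2x deeper; gene 4 truly up-regulated
counts = [[10, 20], [100, 200], [50, 100], [30, 300]]
factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
print(normalise(counts, factors))
```

The up-regulated gene 4 does not shift the size factors, because the median ignores it; after normalisation genes 1-3 have equal values in both samples while gene 4 keeps its real difference.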
RNA-Seq Limitation
- Confounded by heterogeneity of the sample:
o Different cell types
o Mutations
o Different cell cycle stage
o Epigenetic modifications
o Stochastic gene expression
Single Cell RNA-Seq
- Allows analysis of single cells
o Enables improvement in resolution of gene expression within samples
o Enables identification of heterogeneity in cell populations i.e. different cell types
o Enables gene expression within single cells/cell types to be categorised
- Workflow: tissue → dissociation of cells → isolation of cells → single cell → RNA extraction → cDNA synthesis → single-cell sequencing → expression profile → cell type identification
- Plots clearly distinguish cell types by their differing gene expression; even cell types that are difficult to separate (e.g. podocytes) are effectively resolved
- Results can be illustrated by heat maps or using dimension reduction analysis tools such as PCA or t-SNE
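A minimal PCA sketch with NumPy on a toy single-cell matrix. It assumes a cells × genes count matrix, a log1p transform and per-gene centring (common choices, not specified in the notes); the data are hypothetical. t-SNE would normally come from a library such as scikit-learn and is not shown:

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project cells onto their top principal components.

    expr: cells x genes array of counts.
    """
    X = np.log1p(expr)               # log-transform to damp very high counts
    X = X - X.mean(axis=0)           # centre each gene across cells
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # PC scores per cell


# Hypothetical data: 3 cells of type A (marker gene 1), 3 of type B (marker gene 2);
# gene 3 is expressed similarly in both
expr = np.array([
    [50, 0, 5], [45, 1, 6], [55, 0, 4],
    [0, 40, 5], [1, 50, 6], [0, 45, 4],
], dtype=float)
scores = pca_scores(expr)
print(np.round(scores[:, 0], 2))  # PC1 separates the two cell types by sign
```

The sign of a principal component is arbitrary, so only the split into two groups is meaningful, not which group is positive.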
Future
Profile gene expression in vivo w/o need of isolating cells
DNA Methylation
- Reversible
- Symmetrical (CpG on both strands) so maintained through cell division
- Adding methyl group (CH3) to carbon 5 of cytosine (5-methylcytosine) by DNA methyltransferases
- In mammals, mainly occurs at CpG sites (C followed by G); dense clusters of CpG sites = CpG islands
- CpG islands used for identification of potential promoter regions
- Methylation of CpG island = silencing of gene expression
- Represses gene expression by:
o Preventing binding of transcription factors
o Modifying chromatin structure to repress transcription
- Methylation is a major factor in epigenetic modifications
- De novo methyltransferases in mammals: DNMT3A and DNMT3B
- During DNA replication, hemi-methylated DNA is created → newly synthesised strand is unmethylated → recognised by DNMT1 (maintenance methyltransferase) → methylates new strand to maintain methylation state
- Repressive histone methylation → chromatin compacted → genes cannot be transcribed
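Scanning for CpG islands, as used to flag potential promoters, can be sketched with the widely used thresholds: length ≥ 200 bp, GC content > 50%, observed/expected CpG ratio > 0.6. Those cut-offs are an assumption here (not given in the notes), as are the function names and test sequences:

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpGs if C and G occurred independently: c * g / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100))   # True: CpG-rich
print(is_cpg_island("CAGG" * 50))  # False: GC-rich but no CpG
print(is_cpg_island("CG" * 50))    # False: too short (100 bp)
```

In practice a window is slid along the genome and these statistics are computed per window.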
DNA methylation and disease
- Methylation patterns in diseased tissue differ from normal tissue → aids identification of disease-causing genes
o Especially in cancer and neurodegeneration → disease often correlates with loss of methylation
o E.g. Alzheimer’s disease (NEP gene); colorectal cancer (MGMT gene); breast cancer (PRLR gene)
- Abnormal methylation can silence tumour suppressor genes
Where does methylation occur?
- Intergenic regions = usually methylated
o Maintains genomic integrity
o Methylated DNA forms compacted chromatin → less accessible for recombination and translocation
o DNMT1-deficient cells display genomic instability
- Repetitive elements = usually methylated
o Transposable elements (TEs) are highly mutagenic if they can transpose within the genome → methylation protects genome from TEs
o Methylated C mutates to T (spontaneous deamination) over evolutionary time → TEs accumulate mutations → cannot transpose
o Methylation prevents recombination
- Gene upstream regions = usually unmethylated
- Promoter regions = usually unmethylated → CpG sites retained → CpG islands
- Lack of methylation → relatively higher density of CpG, due to lower rate of C→T mutation
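The link between methylation and CpG loss can be made concrete with a simple decay model: if each CpG independently mutates C→T with a fixed probability per generation, the expected number left is N0 × (1 − rate)^generations. The model and both rates below are hypothetical, chosen only to show the direction of the effect:

```python
def expected_cpg_remaining(n_cpg, rate_per_generation, generations):
    """Expected CpGs left if each independently survives each generation
    with probability (1 - rate_per_generation)."""
    return n_cpg * (1 - rate_per_generation) ** generations


# Hypothetical rates: methylated CpGs mutate 10x faster than unmethylated ones
methylated = expected_cpg_remaining(1000, 1e-4, 10_000)
unmethylated = expected_cpg_remaining(1000, 1e-5, 10_000)
print(round(methylated), round(unmethylated))  # 368 905
```

The methylated region ends up CpG-depleted while the unmethylated one keeps most of its CpGs, matching why unmethylated promoters stand out as CpG islands.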
Avoiding methylation
- Regions that are methylated at all times lose CpG sites over evolutionary time → methylated C tends to mutate to T → fewer cytosines left to be methylated
- If a transposon accumulates mutations it loses functionality; ~50% of the human genome is made of transposons → 99% of them have lost their ability to act as parasites → cannot move anymore
Identifying DNA Methylation
MeDIP-Seq
* DNA fragmented → antibody recognising methylated cytosine (5-methylcytosine) binds methylated DNA → immunoprecipitation → retain only antibody-bound DNA → next-gen sequencing → sequences mapped back onto genome to identify methylated regions
Bisulphite sequencing
* Samples treated with bisulphite → converts unmethylated C to U (methylated C remains C) → sequence and compare samples to determine methylation, e.g. cancer vs normal cells
* Methylation-specific PCR: primers matching the converted sequence (U, copied as T) amplify only unmethylated DNA; primers matching the unconverted sequence (C retained) amplify only methylated DNA
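Reading methylation from bisulphite data comes down to comparing each reference C with the sequenced base: C means protected (methylated), T means converted (unmethylated). A sketch assuming the read is already aligned to the reference and ignoring the opposite strand; function names and the sequence are hypothetical:

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment followed by PCR.

    Unmethylated C -> U, which PCR copies as T; methylated C is protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """At each reference C: still C in the read -> methylated, T -> unmethylated."""
    return {i: obs == "C"
            for i, (ref, obs) in enumerate(zip(reference, read)) if ref == "C"}


reference = "ACGTCGCA"                     # hypothetical region
read = bisulphite_convert(reference, {1})  # only the C at position 1 is methylated
print(read)                                # ACGTTGTA
print(call_methylation(reference, read))   # {1: True, 4: False, 6: False}
```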
Both
* Expensive
* Great resolution
* MeDIP-Seq requires antibody
X inactivation
- The silencing of one of the two X chromosomes in each cell of female mammals
- Required for dosage compensation to avoid overexpression of X-linked genes relative to males
- Inactivated X chromosome packaged as compacted heterochromatin
o Compaction by chromosome-wide histone methylation – H3K27me3
- Inactivation by Xist gene (long non-coding RNA)
Long non-coding RNA sequences
- Longer than 200 nucleotides
- Thousands identified but function largely unknown
o Target different aspects of gene transcription mechanism
o Can function as co-regulators or transcription factors
- Act in ‘cis’ (same chromosome they are transcribed from) or ‘trans’ (different chromosome)
- ncRNA Evf-2 = a co-activator for homeobox transcription factor Dlx2, involved in forebrain development and neurogenesis
Xist
- 17kb long; acts in cis
- Expressed only from the X chromosome that will be inactivated; first detectable event in X inactivation
- Xist contains many repeats → 6 identified so far
- Repeat A (RepA) required for the silencing function of Xist → binds to PRC2 (Polycomb repressive complex 2 – a histone methyltransferase complex) → lays down histone H3 methylation along the chromosome at Lys27 (H3K27me3)
HOTAIR
- Long ncRNA expressed from HOXC locus on chromosome 12 → represses HOXD on chromosome 2
- ‘HOX’ = important developmental genes
- Acts in trans
- Binds to PRC2 and LSD1 → PRC2 adds repressive H3K27me3 → LSD1 removes active H3K4me marks → combined function produces repressive chromatin structure
- In cancer, HOTAIR acts on regions other than HOXD