Lecture 9 Flashcards
Microarrays - Affymetrix Technology
- Photolithographic synthesis of oligonucleotides on microarrays
- Chip holds up to 1.6 million features
- Two 25-mer oligonucleotides per probe pair: perfect match (PM) & mismatch (MM)
- MM allows quantification & subtraction of non-specific cross-hybridization
- Presence of mRNA detected by a series of probes
- Hybridization of fluorescently labelled mRNA to probes detected by laser
- A probe set consists of 11 PM/MM pairs
=> expression level calculated from all probes
DRAWBACKS - needs existing knowledge on sequence
- high background due to cross hybridization
- large amounts of RNA
- limited dynamic range
1) Probe Array
2) Probe Set
3) Probe Pair
4) Probe Cell
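Sketch: one simple way to turn the 11 PM/MM pairs of a probe set into a single expression value is to average the PM - MM differences. This is an illustrative assumption, not necessarily the exact algorithm used by the Affymetrix software; the function name and the intensities are made up.

```python
from statistics import mean

def average_difference(pm, mm):
    """Mean of (PM - MM) over all probe pairs of one probe set."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    # Subtracting MM removes the estimated non-specific cross-hybridization signal.
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for one probe set with 11 probe pairs.
pm = [520, 610, 480, 700, 655, 590, 530, 615, 600, 575, 640]
mm = [120, 150, 110, 200, 160, 140, 130, 155, 145, 135, 150]
signal = average_difference(pm, mm)
```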
Sanger sequencing
- Introduced in 1977 by Fred Sanger
- Still the de facto standard
To sequence a piece of DNA, you need:
- the DNA you want to sequence (template DNA)
- a short DNA primer that is complementary to the DNA you want to sequence
- an Enzyme called DNA polymerase
- four nucleotides
- four dideoxynucleotides with fluorescent tags attached for detection
- Limitations
- Low throughput
- Inconsistent base quality
- Expensive
Capillary Gel Electrophoresis separates by size
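Sketch: chain termination can be simulated by letting each copy of the new strand stop at a random position when a tagged dideoxynucleotide is incorporated. Sorting the fragments by size (as the capillary gel does) and reading the terminal tags gives the sequence. The function names, the termination probability and the number of molecules are assumptions; strand orientation is ignored for simplicity.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Return the set of (fragment length, terminal base) pairs observed."""
    rng = random.Random(seed)
    new_strand = "".join(COMPLEMENT[b] for b in template)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            # With probability dd_fraction a dideoxynucleotide is added and
            # synthesis stops; its fluorescent tag marks the last base.
            if rng.random() < dd_fraction:
                fragments.add((length, base))
                break
    return fragments

def read_sequence(fragments):
    """Order fragments by size and read off the terminal bases."""
    return "".join(base for _, base in sorted(fragments))
```

With enough molecules every fragment length is represented, so the read-out covers the whole synthesized strand.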
Next Generation Sequencing (Illumina)
- Fragmentation and tagging of genomic/cDNA fragments – provides universal primer allowing complex genomes to be amplified with common PCR primers
- Template immobilization – DNA separated into single strands and captured onto beads (1 DNA molecule/bead)
- Clonal Amplification – Solid Phase Amplification
- Sequencing and Imaging – Cyclic reversible termination (CRT) reaction
Clonal Amplification
- Clonal Amplification – Solid Phase Amplification
- Done in the flow cell
- Priming and extension of the single-stranded, single-molecule template
- Bridge amplification of the immobilized template with primers forms 100-200 million spatially separated template clusters
- Clusters provide free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction
- Each cluster represents a population of identical templates
Massive Parallel Sequencing
- Cyclic Reversible Termination – DNA polymerase bound to primed template adds 1 (of 4) fluorescently modified nucleotide. 3’ terminator group prevents additional nucleotide incorporation.
- Following incorporation, remaining unincorporated nucleotides are washed away. Laser scanning determines identity of incorporated nucleotides.
- Cleavage step removes terminating group and fluorescent dye. Additional wash is performed before starting next incorporation step
- ~250 million data reads (25Gb) with HiSeq2500 (~4 days)
- 2013: Introduction HiSeq X Ten
- 2017: Illumina NovaSeq => 5x HiSeq X output
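Sketch: the cyclic reversible termination loop (incorporate one base, wash and image, cleave) can be mimicked for a single cluster as below. The dye colors and the function name are hypothetical; real chemistry and base calling are far more involved.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye colors, one per nucleotide.
DYE_OF_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF_DYE = {dye: base for base, dye in DYE_OF_BASE.items()}

def crt_read(template, n_cycles):
    """Return the bases called over n_cycles of reversible termination."""
    read = []
    for position in range(min(n_cycles, len(template))):
        # 1) Incorporation: the 3' terminator ensures exactly one
        #    labelled nucleotide is added per cycle.
        incorporated = COMPLEMENT[template[position]]
        # 2) Wash and imaging: the dye color identifies the base.
        signal = DYE_OF_BASE[incorporated]
        read.append(BASE_OF_DYE[signal])
        # 3) Cleavage: terminator and dye are removed, so the next
        #    loop iteration can extend by one more base.
    return "".join(read)
```

The read length equals the number of cycles, which is why longer reads need longer runs.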
Single vs Paired End
- Single Read: read only one end of the DNA fragment
- Paired End: read both ends, requires additional PCR step & washout of forward strands to “reverse” strand direction in flow cell
- Distance between paired ends known => map reads over reference genome, better alignment
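Sketch: the known distance between paired ends helps place a read that matches several genome positions. Given the position of the uniquely mapped mate, pick the candidate closest to the expected distance. The function name and all positions are hypothetical.

```python
def place_mate(anchor_pos, candidates, expected_distance):
    """Choose the candidate position whose distance to the anchored
    mate is closest to the expected paired-end distance."""
    return min(
        candidates,
        key=lambda pos: abs((pos - anchor_pos) - expected_distance),
    )

# The mate maps equally well to three places; only one fits the
# expected distance of about 300 bases from the anchored read.
best = place_mate(anchor_pos=1000, candidates=[200, 1310, 50000],
                  expected_distance=300)
```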
Multiplexing
- Barcoding DNA fragments allows multiple samples in one flowcell
- Separate reads by barcode (demultiplex) before alignment
- Taken to the extreme in single cell RNAseq => DropSeq
1) Library Preparation
2) Pool
3) Sequence
4) Demultiplex
5) Align
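Sketch of the demultiplex step: reads are sorted into per-sample bins by their barcode before alignment. This assumes, for simplicity, that the barcode sits at the start of each read and must match exactly; the sample names and barcodes are made up.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample; the barcode is trimmed from each read."""
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode are kept apart, not discarded.
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)

# Hypothetical pooled run with two samples.
pooled = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCCAAA", "NNNNGATTAC"]
samples = demultiplex(pooled, {"ACGT": "sample_1", "TGCA": "sample_2"}, 4)
```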
- Cells from suspension
- Microparticle and lysis buffer
- Oil
- Cell lysis (in seconds)
- RNA hybridization
- Break droplets
- Reverse transcription with template switching
- PCR (STAMPs as template)
- Sequencing and analysis
-> Each mRNA is mapped to its cell-of-origin and gene-of-origin
-> Each cell’s pool of mRNA can be analyzed - cDNA alignment to genome and group results by cell
- Count unique UMIs for each gene in each cell
-> Create digital expression matrix
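Sketch: counting unique UMIs per gene and cell gives the digital expression matrix. The input format (cell barcode, UMI, gene) is an assumption about what the alignment step delivers; names and records are made up.

```python
from collections import defaultdict

def digital_expression(records):
    """records: iterable of (cell_barcode, umi, gene) tuples.
    Returns {gene: {cell: number of unique UMIs}}."""
    umis = defaultdict(set)
    for cell, umi, gene in records:
        # A set collapses PCR duplicates that share the same UMI.
        umis[(gene, cell)].add(umi)
    genes = sorted({gene for gene, _ in umis})
    cells = sorted({cell for _, cell in umis})
    return {
        gene: {cell: len(umis.get((gene, cell), ())) for cell in cells}
        for gene in genes
    }

records = [
    ("cell_A", "UMI1", "GeneX"),
    ("cell_A", "UMI1", "GeneX"),  # PCR duplicate, counted once
    ("cell_A", "UMI2", "GeneX"),
    ("cell_B", "UMI7", "GeneY"),
]
matrix = digital_expression(records)
```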
Quality Control
- Results can be flawed due to
- unreliable base calls, contamination, sequence duplications
- Phred score: quality measure for nucleotides (base calls) generated by sequencing
- Read quality depends on read position and forward/reverse strand
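Sketch: the Phred score Q relates to the probability P that a base call is wrong by Q = -10 * log10(P). The helper names are hypothetical.

```python
import math

def error_prob_to_phred(p):
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def phred_to_error_prob(q):
    """P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

For example, Q20 corresponds to 1 wrong call in 100 and Q30 to 1 in 1000.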
The FASTQ Files
- Sequences stored in ASCII format together with quality information
- Line 1: begins with ‘@’ followed by sequence identifier
- Line 2: raw sequence
- Line 3: +
- Line 4: base quality values for sequence in Line 2
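Sketch: a minimal reader for the four-line records described above. It assumes each record spans exactly four lines and that qualities use the common Phred+33 ASCII offset; the function name and the example record are made up.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, qualities) for each 4-line record."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Assumed Phred+33 encoding: quality = ASCII code - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]

record = ["@read_1", "ACGT", "+", "II5!"]
parsed = list(parse_fastq(record))
```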
Alignment
- Align millions of sequence fragments to reference genome
- Various short read aligners available
- Work with local seed & extend => search in indexed genome
- Assume beginning of read has few errors
- Sequence alignment is always a tradeoff between speed & accuracy => never exact results!
- BLAST: Basic Local Alignment Search Tool
- Burrows-Wheeler Aligner (BWA) => uses Burrows-Wheeler Transform
- Transforms string so that identical characters tend to be grouped into runs
- Search whether DNA fragment is a substring of reference genome
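Sketch: a naive Burrows-Wheeler Transform (sorting all rotations) plus a backward search that counts how often a fragment occurs as a substring. Real aligners such as BWA build the index far more efficiently and tolerate mismatches; function names are hypothetical and "$" is assumed as end marker.

```python
def bwt(text):
    """Last column of the sorted rotations of text + '$'."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_str, pattern):
    """Backward search: number of times pattern occurs in the original text."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # first[ch] = row where ch starts in the sorted first column.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Narrow the row range to rotations starting with the longer suffix.
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

bwt("banana") gives "annb$aa": the repeated letters end up next to each other, which is what makes the transform searchable and compressible.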
Seed and Extend
- Simplification of modern aligners
- Indexing allows for quick lookup and reduced number of possible matches
- Fine for genome, but what about exome and spliced reads?
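Sketch of seed and extend: index all k-mers of the genome, look up the first k bases of the read (assumed to have few errors) and extend each hit by counting mismatches. Extension here is ungapped, which is a strong simplification; names, sequences and parameters are made up.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map every k-mer to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def seed_and_extend(read, genome, index, k, max_mismatches):
    """Return (position, mismatches) for every acceptable alignment."""
    hits = []
    for pos in index.get(read[:k], []):       # seed: exact k-mer lookup
        candidate = genome[pos:pos + len(read)]
        if len(candidate) < len(read):
            continue                           # read would run off the genome
        mismatches = sum(a != b for a, b in zip(read, candidate))
        if mismatches <= max_mismatches:       # extend: ungapped comparison
            hits.append((pos, mismatches))
    return hits

genome = "GGGACGTTTACGAACCC"
index = build_index(genome, k=3)
hits = seed_and_extend("ACGAACG", genome, index, k=3, max_mismatches=1)
```

The seed "ACG" occurs twice in this genome, but only one location survives the extension step.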
Transcriptome Aligners
- HISAT, STAR, TopHat2, Bowtie
- use Gene Annotations files (GFF3/GTF)
- Alternatively Spliced Transcripts
- Drosophila Example
-> Drosophila Dsx gene
-> Males: exons 1-3, 5-6 -> protein for male development
-> Females: exons 1-4 -> protein for female development
-> Intron upstream of Exon 4 has improper consensus sequence
-> U2AF proteins need splicing activators -> exon 4 not used in males
TopHat Align Strategy
- If alignment of segment s_i fails because of a splice junction, but s_(i-1) & s_(i+1) aligned => look for donor/acceptor sites near x/y within k bases
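Sketch of the idea, not TopHat's actual implementation: try every split point of the unaligned segment so that its left part continues after s_(i-1) (genome position x) and its right part ends where s_(i+1) starts (position y), and accept the split if the skipped region looks like an intron. The canonical GT...AG boundary check, the function name and the toy sequences are assumptions.

```python
def find_junction(genome, segment, x, y):
    """x: genome index right after s_(i-1); y: genome index where s_(i+1) starts.
    Returns (donor, acceptor) = (first intron base, first base after intron)."""
    n = len(segment)
    for split in range(1, n):
        donor = x + split
        acceptor = y - (n - split)
        if acceptor <= donor:
            continue  # no room for an intron
        left_ok = genome[x:donor] == segment[:split]
        right_ok = genome[acceptor:y] == segment[split:]
        # Assumed canonical splice signals: intron starts with GT, ends with AG.
        canonical = (genome[donor:donor + 2] == "GT"
                     and genome[acceptor - 2:acceptor] == "AG")
        if left_ok and right_ok and canonical:
            return donor, acceptor
    return None

# Toy genome: exon "AAACCC", intron "GTTTTTAG", exon "GGGTTT".
genome = "AAACCC" + "GTTTTTAG" + "GGGTTT"
junction = find_junction(genome, "CCCGGG", x=3, y=17)
```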
RNA-seq
- Used for many applications
- Gene expression
- Differential gene expression
- Novel transcripts
- splice junction analysis
- de novo assembly
- SNP analysis
- Allele-specific expression
- small/micro RNA