Lecture 9 Flashcards

1
Q

Microarrays - Affymetrix Technology

A
  • Photolithographic synthesis of oligonucleotides on microarrays
  • Chip holds up to 1.6 million features
  • Two 25-mer oligonucleotides per probe pair: perfect match (PM) & mismatch (MM); allows quantification & subtraction of non-specific cross-hybridization
  • Presence of mRNA detected by a series of probes
  • Hybridization of fluorescently labelled mRNA detected by laser
  • A probe set consists of 11 PM/MM pairs
    => expression level calculated from all probes
    DRAWBACKS
  • needs existing knowledge on sequence
  • high background due to cross hybridization
  • large amounts of RNA
  • limited dynamic range

1) Probe Array
2) Probe Set
3) Probe Pair
4) Probe Cell
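
As a rough illustration of how the PM/MM pairs of one probe set could be combined into a single expression value, the sketch below averages the PM minus MM differences. This simple average-difference summary is an assumption for illustration and not necessarily what Affymetrix software computes; the function name and the intensity values are made up.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over all probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally sized, non-empty PM and MM lists")
    # Subtracting MM removes the estimated non-specific hybridization signal
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for 3 probe pairs (a real probe set has 11 pairs)
pm = [1200.0, 950.0, 1100.0]
mm = [200.0, 150.0, 300.0]
signal = average_difference(pm, mm)
```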

2
Q

Sanger sequencing

A
  • Introduced in 1977 by Fred Sanger
  • Still the de facto standard

To sequence a piece of DNA, you need:
- the DNA you want to sequence (template DNA)
- a short DNA primer that is complementary to the DNA you want to sequence
- an enzyme called DNA polymerase
- the four deoxynucleotides (dNTPs)
- the four dideoxynucleotides (ddNTPs) with fluorescent tags attached for detection

  • Limitations
  • Low throughput
  • Inconsistent base quality
  • Expensive

Capillary gel electrophoresis separates the fragments by size

3
Q

Next Generation Sequencing (Illumina)

A
  • Fragmentation and tagging of genomic/cDNA fragments – provides universal primer allowing complex genomes to be amplified with common PCR primers
  • Template immobilization – DNA separated into single strands and captured onto beads (1 DNA molecule/bead)
  • Clonal Amplification – Solid Phase Amplification
  • Sequencing and Imaging – Cyclic reversible termination (CRT) reaction
4
Q

Clonal Amplification

A
  • Clonal Amplification – Solid Phase Amplification
  • Done in the flow cell
  • Priming and extension of the single-stranded, single-molecule template
  • Bridge amplification of the immobilized template with primers forms 100-200 million spatially separated template clusters
  • Clusters provide free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction
  • Each cluster represents a population of identical templates
5
Q

Massive Parallel Sequencing

A
  • Cyclic Reversible Termination – DNA polymerase bound to primed template adds 1 (of 4) fluorescently modified nucleotide. 3’ terminator group prevents additional nucleotide incorporation.
  • Following incorporation, remaining unincorporated nucleotides are washed away. Laser scanning determines identity of incorporated nucleotides.
  • Cleavage step removes terminating group and fluorescent dye. Additional wash is performed before starting next incorporation step
  • ~250 million data reads (25Gb) with HiSeq2500 (~4 days)
  • 2013: Introduction of HiSeq X Ten
  • 2017: Illumina NovaSeq => 5x HiSeq X output
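
The incorporate, image, cleave cycle can be pictured with a toy simulation in which exactly one base is called per cycle. This is only a sketch: the function name is hypothetical, strand orientation is ignored, and the 100-base read length used in the throughput arithmetic is an assumption inferred from 25 Gb / 250 million reads.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, cycles):
    """Toy cyclic reversible termination: one base call per cycle."""
    calls = []
    for cycle in range(min(cycles, len(template))):
        # Incorporation: the 3' terminator allows only one nucleotide to be added
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging: the dye colour identifies the incorporated nucleotide
        calls.append(incorporated)
        # Cleavage: terminator and dye are removed before the next cycle
    return "".join(calls)

read = sequence_by_synthesis("TACGGA", cycles=4)

# Throughput arithmetic (read length of 100 bases is an assumption)
total_bases = 250_000_000 * 100
```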
6
Q

Single vs Paired End

A
  • Single Read: read only one end of the DNA fragment
  • Paired End: read both ends, requires additional PCR step & washout of forward strands to “reverse” strand direction in flow cell
  • Distance between paired ends known => map reads over reference genome, better alignment
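
One way the known distance between paired ends can help alignment is as a consistency check on candidate mate positions. The sketch below is a simplified illustration; the function name, coordinates and tolerance are hypothetical and real aligners use more elaborate models.

```python
def is_concordant(forward_start, reverse_end, expected_fragment, tolerance):
    """True if the fragment length implied by the two mates fits the library."""
    fragment_length = reverse_end - forward_start
    return abs(fragment_length - expected_fragment) <= tolerance

# Hypothetical mapping positions on the reference genome
ok = is_concordant(1000, 1310, expected_fragment=300, tolerance=50)
bad = is_concordant(1000, 5000, expected_fragment=300, tolerance=50)
```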
7
Q

Multiplexing

A
  • Barcoding DNA fragments allows multiple samples in one flow cell
  • Separate reads by barcode (demultiplex) before alignment
  • Taken to the extreme in single cell RNAseq => DropSeq

1) Library Preparation
2) Pool
3) Sequence
4) Demultiplex
5) Align
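
The demultiplexing step can be sketched as grouping reads by their barcode before alignment. For simplicity this assumes the barcode is the first few bases of each read and must match exactly; the function name, barcodes and sample names are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample, using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode are kept apart rather than discarded
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCAAA", "GGGGTTTTTT"]
result = demultiplex(reads, barcodes, barcode_len=4)
```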

  1. Cells from suspension
  2. Microparticle and lysis buffer
  3. Oil
  4. Cell lysis (in seconds)
  5. RNA hybridization
  6. Break droplets
  7. Reverse transcription with template switching
  8. PCR (STAMPs as template)
  9. Sequencing and analysis
    -> Each mRNA is mapped to its cell-of-origin and gene-of-origin
    -> Each cell’s pool of mRNA can be analyzed
  10. cDNA alignment to genome and group results by cell
  11. Count unique UMIs for each gene in each cell
    -> Create digital expression matrix
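
The last two steps can be sketched as counting distinct UMIs per cell and gene, so that PCR duplicates of the same molecule are counted once. The input format (cell barcode, UMI, gene) and all names below are assumptions for illustration.

```python
from collections import defaultdict

def digital_expression(aligned_reads):
    """Count unique UMIs for each (cell, gene) pair."""
    umis = defaultdict(set)
    for cell, umi, gene in aligned_reads:
        umis[(cell, gene)].add(umi)  # a set collapses duplicate UMIs
    return {key: len(seen) for key, seen in umis.items()}

reads = [
    ("CELL1", "AAAA", "GeneX"),
    ("CELL1", "AAAA", "GeneX"),  # PCR duplicate, same UMI
    ("CELL1", "CCCC", "GeneX"),
    ("CELL2", "AAAA", "GeneX"),
    ("CELL2", "GGGG", "GeneY"),
]
matrix = digital_expression(reads)
```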
8
Q

Quality Control

A
  • Results can be flawed due to
  • unreliable base calls, contamination, sequence duplications
  • Phred score: quality measure for the nucleobases called during sequencing
  • Read quality depends on read position and on forward/reverse strand
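
The Phred score relates to the estimated base-calling error probability P by Q = -10 * log10(P). A minimal sketch of the conversion in both directions (function names are hypothetical):

```python
import math

def phred_to_error_prob(q):
    """Error probability for a given Phred quality score."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for a given error probability."""
    return -10 * math.log10(p)

p_q30 = phred_to_error_prob(30)    # 1 error in 1000 base calls
q_1pct = error_prob_to_phred(0.01)
```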
9
Q

The FASTQ Files

A
  • Sequences stored in ASCII format together with quality information
  • Line 1: begins with ‘@’ followed by sequence identifier
  • Line 2: raw sequence
  • Line 3: +
  • Line 4: base quality values for sequence in Line 2
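
The four-line layout can be read with a few lines of code. This is a minimal sketch that assumes exactly four lines per record and Phred+33 quality encoding (the common case for current Illumina data); the function name and the example record are made up.

```python
import io

def read_fastq(handle):
    """Yield (identifier, sequence, qualities) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33, so the score is the ASCII code minus 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]

example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```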
10
Q

Alignment

A
  • Align millions of sequence fragments to reference genome
  • Various short read aligners available
  • Work with local seed & extend => search in indexed genome
  • Assume beginning of read has few errors
  • Sequence alignment is always a tradeoff between speed & accuracy => never exact results!
  • BLAST: Basic Local Alignment Search Tool
  • Burrows-Wheeler Aligner (BWA) => uses Burrows-Wheeler Transform
  • Transforms the string so that repeated characters tend to be grouped into runs (built from sorted rotations)
  • Search whether DNA fragment is a substring of reference genome
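
To make the transform concrete, here is a naive Burrows-Wheeler transform built from sorted rotations. It is a teaching sketch only: real aligners such as BWA build the index with far more memory-efficient methods, and the "$" end marker is a common convention assumed here.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive version)."""
    s = text + "$"  # end marker, sorts before all letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix
    return "".join(rotation[-1] for rotation in rotations)

transformed = bwt("banana")
```

For "banana" the result is "annb$aa", which shows how identical characters end up next to each other.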
11
Q

Seed and Extend

A
  • Simplification of modern aligners
  • Indexing allows for quick lookup and reduced number of possible matches
  • Fine for the genome, but what about the exome and spliced reads?
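
A heavily simplified seed-and-extend could look like the sketch below: a k-mer index of the reference gives candidate positions for the seed, and each candidate is then extended and checked for mismatches. Taking the seed from the start of the read and allowing only mismatches (no gaps) are simplifying assumptions; all names and sequences are made up.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_extend(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for every acceptable placement of the read."""
    hits = []
    seed = read[:k]  # the beginning of the read is assumed to have few errors
    for pos in index.get(seed, []):
        candidate = reference[pos:pos + len(read)]
        if len(candidate) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, candidate))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTAGCCGATAGGC"
index = build_index(reference, k=4)
hits = seed_and_extend("TAGCCGTT", reference, index, k=4)
```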
12
Q

Transcriptome Aligners

A
  • HISAT, STAR, TopHat2, Bowtie
  • use gene annotation files (GFF3/GTF)
  • Alternatively Spliced Transcripts
  • Drosophila Example
    -> Drosophila Dsx gene
    -> Males: exons 1-3, 5-6 -> protein for male development
    -> Females: exons 1-4 -> protein for female development
    -> Intron upstream of exon 4 has a weak (non-consensus) splice site sequence
    -> U2AF proteins need splicing activators -> exon 4 is not used in males
13
Q

Tophat Align Strategy

A
  • If alignment of segment s_i fails because it spans a splice junction, but s_(i-1) & s_(i+1) aligned => look for donor/acceptor sites near x/y within k bases
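
The idea can be sketched as follows, assuming x is the genome position right after the aligned left neighbour and y is the position where the right neighbour starts. Every split of the unaligned segment is tried, and a split is accepted if both parts match the genome and the gap between them looks like a canonical GT...AG intron. This is a simplification and not TopHat's actual implementation; here the search window is bounded by the segment length rather than a separate k, and all names and sequences are made up.

```python
def find_splice_junctions(genome, segment, x, y):
    """Return (intron_start, intron_end) pairs that explain the segment."""
    junctions = []
    n = len(segment)
    for j in range(n + 1):
        intron_start = x + j          # first j bases lie directly after x
        intron_end = y - (n - j)      # remaining bases lie directly before y
        if intron_end - intron_start < 4:
            continue  # too short to hold both GT and AG
        left_ok = genome[x:intron_start] == segment[:j]
        right_ok = genome[intron_end:y] == segment[j:]
        intron = genome[intron_start:intron_end]
        if left_ok and right_ok and intron.startswith("GT") and intron.endswith("AG"):
            junctions.append((intron_start, intron_end))
    return junctions

# Toy genome: exon, intron (GT...AG), exon
genome = "AAACCCTTG" + "GTAAGTCCCCAG" + "GACTTTGGG"
# Read AAACCCTTGGACTTTGGG in 6-base segments: the middle one spans the junction
junctions = find_splice_junctions(genome, "TTGGAC", x=6, y=24)
```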
14
Q

RNA-seq

A
  • Used for many applications
  • Gene expression
  • Differential gene expression
  • Novel transcripts
  • splice junction analysis
  • de novo assembly
  • SNP analysis
  • Allele specific expression
  • small/micro RNA
15
Q

RNA purification and quality

A
  • RNA Quality Assessment (Agilent 2100 BioAnalyzer)
  • About 80% of total RNA is ribosomal RNA
  • 28S/18S at ~5/2 kb => ratio 2.7:1
  • “Crisp” 28S/18S bands => intact RNA
  • RNA Quantification (Qubit fluorometer)
16
Q

Library Preparation

A

Illumina TruSeq protocol
- RNA isolation
- Poly A purification
- Fragmentation
- cDNA synthesis via random primers
- Adapter ligation
- Size selection
- PCR amplification

17
Q

Sequencing

A

Common Library types
* poly-A RNA > 200bp
* small RNA
* strand specific
* stranded with rRNA reduction

Other considerations:
* single vs. paired end
* low input
* total RNA
* targeted capture
* ribosomal reduction
* degraded RNA
* Fragment size

18
Q

Expression Quantification

A
  • Count Data
    – Mapped reads are summarized at CDS, gene or exon level
    – The number of reads is roughly proportional to
        – the length of the gene
        – the total number of reads in the library
19
Q

Differential Expression

A
  • A gene is declared differentially expressed if an observed difference or change in read counts between two experimental conditions is statistically significant, i.e. if the difference is greater than what would be expected just due to random variation.
  • Statistical tools for microarray based on numerical intensity values
  • Tools for RNA-seq need to analyze read count distributions
  • Do read counts correspond to gene expression?
20
Q

RPKM and Length Bias

A
  • Current RNAseq protocols: RNA fragmentation prior to sequencing => whole transcript covered by fragments
  • Total number of reads per transcript depends on gene expression & length of transcript
  • A long transcript has more reads than a short one at the same expression level
  • Power of experiment is proportional to sample size => more power to detect longer genes
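
RPKM (reads per kilobase of transcript per million mapped reads) divides the read count by transcript length and library size. A minimal sketch with made-up numbers, in which two genes of different length but equal expression get the same value:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical library of 1 million mapped reads
short_gene = rpkm(read_count=100, gene_length_bp=1000, total_mapped_reads=1_000_000)
long_gene = rpkm(read_count=400, gene_length_bp=4000, total_mapped_reads=1_000_000)
```

Note that the long gene still contributes four times as many raw reads, so normalizing the value does not remove the higher statistical power for long genes.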
21
Q

Length Normalization

A
  • does not solve all problems
  • probability to detect long genes higher
  • data binned according to transcript length
  • percentage of diff. expressed transcripts plotted

Major bias is transcript length

22
Q

Count-based methods (R packages)

A
  1. DESeq – based on negative binomial distribution
  2. edgeR – uses an overdispersed Poisson model
  3. baySeq – uses an empirical Bayes approach
  4. TSPM – uses a two-stage Poisson model
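
Why overdispersion matters can be seen from the variances: a Poisson model has variance equal to the mean, while the negative binomial adds a term that grows with the squared mean. The sketch below uses the common parameterization variance = mu + dispersion * mu^2; the dispersion value is made up and the function names are hypothetical.

```python
def poisson_variance(mu):
    """Poisson: variance equals the mean."""
    return mu

def negative_binomial_variance(mu, dispersion):
    """Negative binomial: extra variance grows with the squared mean."""
    return mu + dispersion * mu ** 2

mu = 100.0
var_poisson = poisson_variance(mu)
var_nb = negative_binomial_variance(mu, dispersion=0.1)
```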