Lecture 9 Flashcards

Question 1

Q

Microarrays - Affymetrix Technology

Answer

A

Photolithographic synthesis of oligonucleotides on microarrays
Chip holds up to 1.6 million features
Two 25-mer oligonucleotides per probe pair: perfect match (PM) & mismatch (MM); allows quantification & subtraction of non-specific cross-hybridization
Presence of mRNA detected by a series of probes
Hybridization of fluorescently labelled mRNA to the probes is detected by laser
A probe set consists of 11 PM/MM pairs
=> expression level calculated from all probes
DRAWBACKS
needs existing knowledge on sequence
high background due to cross hybridization
large amounts of RNA
limited dynamic range

1) Probe Array
2) Probe Set
3) Probe Pair
4) Probe Cell
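
The PM/MM subtraction can be sketched in Python. This is a minimal illustration, assuming the expression level is a plain average of PM - MM differences over the probe set; the function name and the intensities are made up, and real Affymetrix analysis algorithms apply further corrections.

```python
from statistics import mean

def probe_set_signal(pm, mm):
    """Average PM - MM difference over all probe pairs of one probe set."""
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs a matching MM probe")
    # The MM intensity estimates non-specific cross-hybridization for its pair
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for a probe set of 11 PM/MM pairs
pm = [520, 610, 480, 700, 655, 590, 530, 620, 575, 640, 600]
mm = [120, 150, 110, 200, 160, 140, 130, 170, 125, 155, 145]
signal = probe_set_signal(pm, mm)
```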

Question 2

Q

Sanger sequencing

Answer

A

Introduced in 1977 by Fred Sanger
Still the de facto standard

To sequence a piece of DNA, you need:
- the DNA you want to sequence (template DNA)
- a short DNA primer that is complementary to the DNA you want to sequence
- an enzyme called DNA polymerase
- the four deoxynucleotides (dATP, dCTP, dGTP, dTTP)
- four dideoxynucleotides with fluorescent tags attached for detection

Limitations
Low throughput
Inconsistent base quality
Expensive

Capillary gel electrophoresis separates the fragments by size

Question 3

Q

Next Generation Sequencing (Illumina)

Answer

A

Fragmentation and tagging of genomic/cDNA fragments – provides universal primer sites, allowing complex genomes to be amplified with common PCR primers
Template immobilization – DNA separated into single strands and captured onto beads (1 DNA molecule/bead)
Clonal Amplification – Solid Phase Amplification
Sequencing and Imaging – Cyclic reversible termination (CRT) reaction

Question 4

Q

Clonal Amplification

Answer

A

Clonal Amplification – Solid Phase Amplification
Done in the flow cell
Priming and extension of the single-strand, single-molecule template
Bridge amplification of the immobilized template with primers forms 100-200 million spatially separated template clusters
Clusters provide free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction
Each cluster represents a population of identical templates

Question 5

Q

Massive Parallel Sequencing

Answer

A

Cyclic Reversible Termination – DNA polymerase bound to the primed template adds 1 (of 4) fluorescently modified nucleotides. 3’ terminator group prevents additional nucleotide incorporation.
Following incorporation, remaining unincorporated nucleotides are washed away. Laser scanning determines identity of incorporated nucleotides.
Cleavage step removes terminating group and fluorescent dye. Additional wash is performed before starting next incorporation step
~250 million data reads (25Gb) with HiSeq2500 (~4 days)
2013: Introduction HiSeq X Ten
2017: Illumina NovaSeq => 5x HiSeq X output

Question 6

Q

Single vs Paired End

Answer

A

Single Read: read only one end of the DNA fragment
Paired End: read both ends, requires additional PCR step & washout of forward strands to “reverse” strand direction in flow cell
Distance between paired ends known => map reads over reference genome, better alignment

Question 7

Q

Multiplexing

Answer

A

Barcoding DNA fragments allows multiple samples in one flowcell
Separate reads by barcode (demultiplex) before alignment
Taken to the extreme in single cell RNAseq => DropSeq

1) Library Preparation
2) Pool
3) Sequence
4) Demultiplex
5) Align
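
The demultiplex step can be sketched as follows. This is a simplified illustration, assuming an exact-match barcode at the start of each read; the barcodes, sample names and the `demultiplex` helper are hypothetical, and real pipelines typically read the barcode from a separate index read and tolerate mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group pooled reads by sample using an exact-match leading barcode.

    reads: iterable of sequence strings, barcode assumed at the 5' end.
    barcodes: dict mapping barcode sequence -> sample name.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the insert goes on to alignment
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            by_sample["undetermined"].append(read)
    return dict(by_sample)

barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical barcodes
pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAACCC", "NNNNGGGGGG"]
samples = demultiplex(pooled, barcodes)
```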

Cells from suspension
Microparticle and lysis buffer
Oil
Cell lysis (in seconds)
RNA hybridization
Break droplets
Reverse transcription with template switching
PCR (STAMPs as template)
Sequencing and analysis
-> Each mRNA is mapped to its cell-of-origin and gene-of-origin
-> Each cell’s pool of mRNA can be analyzed
cDNA alignment to genome and group results by cell
Count unique UMIs for each gene in each cell
-> Create digital expression matrix
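
A minimal sketch of the UMI counting step, assuming each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple. The tuples and the `digital_expression` helper are made up for illustration; real pipelines also correct sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict

def digital_expression(aligned_reads):
    """Count unique UMIs per (cell, gene).

    aligned_reads: iterable of (cell_barcode, umi, gene) tuples, one per read.
    Returns {cell_barcode: {gene: number_of_unique_UMIs}}.
    """
    umis = defaultdict(set)
    for cell, umi, gene in aligned_reads:
        umis[(cell, gene)].add(umi)  # PCR duplicates share a UMI and collapse
    matrix = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        matrix[cell][gene] = len(seen)
    return dict(matrix)

reads = [
    ("CELL1", "AAAT", "Actb"),
    ("CELL1", "AAAT", "Actb"),   # duplicate of the same molecule
    ("CELL1", "CCGA", "Actb"),
    ("CELL1", "GGTC", "Gapdh"),
    ("CELL2", "TTAG", "Actb"),
]
dge = digital_expression(reads)
```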

Question 8

Q

Quality Control

Answer

A

Results can be flawed due to
unreliable base calls, contamination, sequence duplications
Phred score: quality measure for the nucleobases (base calls) generated by sequencing
Read quality depends on read position and on forward/reverse strand
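
The Phred score relates to the base-calling error probability P by Q = -10 * log10(P). A short sketch of the conversion in both directions (the function names are illustrative):

```python
import math

def phred_from_error_prob(p):
    """Phred quality Q for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def error_prob_from_phred(q):
    """Error probability for a Phred quality Q."""
    return 10 ** (-q / 10)

q30_error = error_prob_from_phred(30)   # 1 wrong call expected in 1000
q_for_1pct = phred_from_error_prob(0.01)
```

So Q20 corresponds to a 1% chance that the base call is wrong, and Q30 to 0.1%.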

Question 9

Q

Quality Control

Answer

A

Results can be flawed due to
unreliable base calls, contamination, sequence duplications
Phred score: quality measure for the nucleobases (base calls) generated by sequencing
Read quality depends on read position and on forward/reverse strand

Question 10

Q

The FASTQ Files

Answer

A

Sequences stored in ASCII format together with quality information
Line 1: begins with ‘@’ followed by sequence identifier
Line 2: raw sequence
Line 3: +
Line 4: base quality values for sequence in Line 2
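
The four-line layout can be read with a few lines of Python. This is a sketch, assuming the common Phred+33 encoding (quality character = ASCII code 33 + Phred score); the record and the `parse_fastq_record` helper are made up for illustration.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one 4-line FASTQ record into (identifier, sequence, qualities)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes one Phred score as ASCII code minus offset
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores

record = ["@read_1\n", "GATTACA\n", "+\n", "II5+!I?\n"]  # made-up record
name, seq, quals = parse_fastq_record(record)
```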

Question 11

Q

Alignment

Answer

A

Align millions of sequence fragments to reference genome
Various short read aligners available
Work with local seed & extend => search in indexed genome
Assume beginning of read has few errors
Sequence alignment is always a tradeoff between speed & accuracy => never exact results!
BLAST: Basic Local Alignment Search Tool
Burrows-Wheeler Aligner (BWA) => uses Burrows-Wheeler Transform
Transform permutes the string so that identical characters tend to cluster into runs
Search whether DNA fragment is substring of reference genome
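
A toy sketch of the Burrows-Wheeler Transform and of a substring search on the transformed string (backward search). It assumes a naive construction by sorting all rotations, which only works for short strings; real aligners build the index far more efficiently and allow mismatches. The function names are illustrative.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler Transform via sorted rotations (fine for short strings)."""
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_text, pattern):
    """Count exact matches of pattern using backward search on the BWT."""
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for ch in sorted(set(bwt_text)):
        first[ch] = total
        total += bwt_text.count(ch)

    def rank(ch, i):
        # Occurrences of ch in bwt_text[:i]
        return bwt_text[:i].count(ch)

    lo, hi = 0, len(bwt_text)
    for ch in reversed(pattern):      # match the pattern right to left
        if ch not in first:
            return 0
        lo = first[ch] + rank(ch, lo)
        hi = first[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
hits = count_occurrences(transformed, "ana")
```

Here `transformed` is "annb$aa": the repeated characters end up next to each other, which is what makes the transform compressible and searchable.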

Question 12

Q

Seed and Extend

Answer

A

Simplification of modern aligners
Indexing allows for quick lookup and reduced number of possible matches
Fine for genome, but what about exome and spliced reads?
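
Seed and extend in its simplest form can be sketched like this. The sketch assumes the seed is the first k bases of the read (few errors at the start of the read) and that extension only counts mismatches without gaps; the toy genome, the parameters and the helper names are made up.

```python
from collections import defaultdict

def build_index(genome, k):
    """Map every k-mer to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index

def seed_and_extend(read, genome, index, k, max_mismatches=2):
    """Return (position, mismatches) hits seeded by the read's first k bases."""
    hits = []
    for pos in index.get(read[:k], []):      # seed: exact lookup in the index
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):
            continue                          # read would run off the genome
        # Extend: compare the whole read against the genome window
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

genome = "TTGACCGATTACAGGCTTGACCGTTTACAGG"   # toy reference
index = build_index(genome, k=5)
hits = seed_and_extend("GACCGATTAGA", genome, index, k=5)
```

The index lookup reduces the candidate positions to the two places where the seed occurs; only those are extended and scored.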

Question 13

Q

Transcriptome Aligners

Answer

A

HISAT, STAR, TopHat2, Bowtie
Use gene annotation files (GFF3/GTF)
Alternatively Spliced Transcripts
Drosophila Example
-> Drosophila Dsx gene
-> Males: exons 1-3, 5-6 -> protein for male development
-> Females: exons 1-4 -> protein for female development
-> Intron upstream of exon 4 has a non-consensus (weak) splice site sequence
-> U2AF proteins need splicing activators to bind there -> exon 4 not used in males

Question 14

Q

Tophat Align Strategy

Answer

A

If alignment of segment s_i fails because of a splice junction, but s_(i-1) & s_(i+1) aligned => look for donor/acceptor sites near x/y (the positions where the neighbouring segments aligned) within k bases
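
The idea can be sketched as a search over split points of the unaligned segment. This is a simplified illustration and not TopHat's actual implementation: it assumes exact matching, canonical GT...AG introns only, and that `left_end` and `right_start` (hypothetical names) mark where the neighbouring segments end and begin on the genome.

```python
def find_junction(genome, segment, left_end, right_start, k):
    """Find a split of `segment` that spans a GT...AG intron.

    left_end: genome index just past the aligned segment s_(i-1).
    right_start: genome index where the aligned segment s_(i+1) begins.
    Returns the intron as a half-open genome interval (donor, acceptor_end),
    or None. Only donors within k bases of left_end are tried.
    """
    n = len(segment)
    for j in range(0, min(k, n) + 1):           # j bases lie before the junction
        donor = left_end + j                    # intron would start here
        acceptor_end = right_start - (n - j)    # intron would end here
        if acceptor_end - donor < 4:
            continue                            # no room for GT...AG
        left_ok = genome[left_end:donor] == segment[:j]
        right_ok = genome[acceptor_end:right_start] == segment[j:]
        canonical = (genome[donor:donor + 2] == "GT"
                     and genome[acceptor_end - 2:acceptor_end] == "AG")
        if left_ok and right_ok and canonical:
            return donor, acceptor_end
    return None

# Toy gene: two exons separated by one intron (all sequences made up)
exon1, intron, exon2 = "AAACCCTTTGGA", "GTAAGTCCCTTTTCAG", "CATCATGGGTTT"
genome = exon1 + intron + exon2
transcript = exon1 + exon2
segment = transcript[8:16]       # middle segment, crosses the junction
junction = find_junction(genome, segment, left_end=8, right_start=32, k=6)
```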

Question 15

Q

RNA-seq

Answer

A

Used for many applications
Gene expression
Differential gene expression
Novel transcripts
Splice junction analysis
De novo assembly
SNP analysis
Allele-specific expression
small/micro RNA

Question 16

Q

RNA purification and quality

Answer

A

RNA Quality Assessment (Agilent 2100 BioAnalyzer)
~80% of total RNA is ribosomal RNA
28S/18S at ~5/2 kb => ratio 2.7:1
“Crisp” 28S/18S bands => intact RNA
RNA Quantification (Qubit fluorometer)

Question 17

Q

Library Preparation

Answer

A

Illumina TruSeq protocol
- RNA isolation
- Poly A purification
- Fragmentation
- cDNA synthesis via random primers
- Adapter ligation
- Size selection
- PCR amplification

Question 18

Q

Sequencing

Answer

A

Common Library types
* poly-A RNA > 200bp
* small RNA
* strand-specific
* stranded with rRNA reduction
* Other considerations:
* single vs. paired end
* low input
* total RNA
* targeted capture
* ribosomal reduction
* degraded RNA
* Fragment size

Question 19

Q

Expression Quantification

Answer

A

Count Data
– Summarize mapped reads at CDS, gene or exon level
– The number of reads is roughly proportional to
   – the length of the gene
   – the total number of reads in the library

Question 20

Q

Differential Expression

Answer

A

A gene is declared differentially expressed if an observed difference or change in read counts between two experimental conditions is statistically significant, i.e. if the difference is greater than what would be expected just due to random variation.
Statistical tools for microarrays are based on numerical intensity values
Tools for RNA-seq need to analyze read count distributions
Do read counts correspond to gene expression?

Question 21

Q

RPKM and Length Bias

Answer

A

Current RNAseq protocols: RNA fragmentation prior to sequencing => whole transcript covered by fragments
Total number of reads for a transcript depends on gene expression & length of transcript
A longer transcript yields more reads at the same expression level
Power of experiment is proportional to sample size => more power to detect differential expression in longer genes
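
RPKM (Reads Per Kilobase of transcript per Million mapped reads) divides the read count by transcript length and by library size. A short sketch with made-up numbers:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Two hypothetical genes with equal expression but different lengths
total = 20_000_000
short_gene = rpkm(read_count=500, gene_length_bp=1_000, total_mapped_reads=total)
long_gene = rpkm(read_count=2_500, gene_length_bp=5_000, total_mapped_reads=total)
```

Both genes come out at the same RPKM, although the longer gene contributed five times as many reads. The longer gene still has more statistical power because its estimate rests on more reads.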

Question 22

Q

Length Normalization

Answer

A

Length normalization does not solve all problems
Probability of detecting long genes as differentially expressed remains higher
Shown by binning data according to transcript length and plotting the percentage of differentially expressed transcripts per bin

Major bias is transcript length

Question 23

Q

Count-based methods (R packages)

Answer

A

DESeq – based on the negative binomial distribution
edgeR – uses an overdispersed Poisson model
baySeq – uses an empirical Bayes approach
TSPM – uses a two-stage Poisson model
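
Why an overdispersed model is needed can be illustrated with replicate counts. Under a Poisson model the variance equals the mean; the negative binomial adds a dispersion term, var = mu + alpha * mu^2. The sketch below estimates alpha for a single gene by the method of moments. The counts and the helper name are made up, and the packages above use more robust estimators that share information across genes.

```python
from statistics import mean, variance

def dispersion_estimate(counts):
    """Method-of-moments dispersion alpha from var = mu + alpha * mu**2."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)          # sample variance (n - 1 denominator)
    # Poisson predicts var == mu; clip at 0 when counts are not overdispersed
    return max(0.0, (var - mu) / mu ** 2)

replicates = [90, 110, 150, 70, 80]  # made-up read counts for one gene
alpha = dispersion_estimate(replicates)
```

Here the mean is 100 but the sample variance is 1000, ten times what a Poisson model would predict.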

(23 cards)