Introduction to NGS and Library Constructions Basics Flashcards
slide 1
General Timeline of Sequencing Methods
what are the three generation types of sequencing
- First Generation - Sanger Sequencing
- Second Generation - Illumina (and other NGS methods/platforms)
- Third Generation - Long Read (PacBio/Nanopore)
notes on slide 1
- Only 2nd and 3rd generation sequencing are used
- Sanger is very basic, and Novogene does not use it
- Sanger sequencing reads only one short section of the DNA per reaction. Not super useful at genome scale
- In the 1990s, the whole genome was assembled using Sanger, but from individual sections
slide 3
Sanger sequencing
- Early methods of sequencing created by Frederick Sanger in 1977
- Still used by researchers currently (clone verifications, general CRISPR screens, etc.)
cons:
- Works best on smaller templates (PCR products, plasmids – gDNA too large)
- General read lengths are around 500bp to 800bp
- Cost per base is very high ($1,000+ per Mb)
pros:
- Fast turnaround time
- Longer reads (relative to Illumina)
- Low error rate and cheap cost per run (~$5 per sequencing reaction)
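The "cheap per reaction but expensive per base" point can be checked with quick arithmetic. This is a minimal sketch, assuming the ~$5 per reaction and 500-800 bp read lengths from this card; the function name is illustrative, and overlap between reads and failed reactions are ignored.

```python
def cost_per_mb(cost_per_reaction: float, read_length_bp: int) -> float:
    """Dollars needed to produce 1 Mb of raw Sanger reads (no overlap, no failures)."""
    bases_per_mb = 1_000_000
    reactions_needed = bases_per_mb / read_length_bp
    return cost_per_reaction * reactions_needed


# Compare the short and long ends of the typical read-length range.
for read_length in (500, 800):
    print(f"{read_length} bp reads: ${cost_per_mb(5.0, read_length):,.0f} per Mb")
```

Under these assumptions both ends of the range come out well above $1,000 per Mb, which matches the cost-per-base con listed above.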
slide 3 notes
slide 4 steps in sequencing
1 - PCR with fluorescent, chain terminating ddNTPs
- take original DNA seq., PCR amplified and denatured (template is usually PCR product or cloned plasmid not gDNA)
- Mix with dNTPs and fluorescently labelled ddNTPs
2 - size separation by capillary gel electrophoresis
3 - laser excitation & detection by sequencing machine
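The three steps above can be mimicked with a toy model. This is only a sketch under simplifying assumptions: the template is written 3'->5' so the new strand is its base-by-base complement, every possible terminated fragment is produced exactly once, and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> list[str]:
    """Step 1: a ddNTP can stop extension at any position, so every prefix
    of the newly synthesized strand is produced."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[str]) -> str:
    """Steps 2 and 3: sort fragments by size (capillary electrophoresis),
    then read the labelled terminal base of each one (laser detection)."""
    by_size = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in by_size)


fragments = terminated_fragments("TACGGA")
print(read_sequence(fragments))
```

Reading the last base of each fragment from shortest to longest reconstructs the synthesized strand, which is the idea behind the chromatogram trace.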
slide 5
second generation
No longer produced; didn’t really accomplish the goal of affordable large data outputs
Cost per G was high relative to machine cost ($500K machine); Roche realized this was not their market space
slide 7
illumina
Has become the ‘gold standard’ platform for NGS (by using the SBS technology). Illumina's sequencing technology originally started at Solexa (a start-up in the UK). Solexa launched their sequencer in 2006 and, ultimately, Illumina bought Solexa in 2007, which is where we start seeing major traction (in growth and applications) in the NGS world.
slide 7 notes
slide 8
overall comparison between sequencing platforms
slide 8 notes
slide 9
cost per human genome
In 2001, it cost about $100 million, but by 2020 it cost less than $1,000
slide 9 notes
slide 10
Sequencing Power for Every Scale
The HiSeq and NovaSeq sequencers are the two major platforms we use at Novogene. Other platforms, such as NextSeq and MiSeq, are available, but are generally used for specific reasons
slide 10 notes
slide 11
Flow Cell Surface
Surface of flow cell is coated with a lawn of oligo pairs
Different platforms will have different types of flow cells, which in turn will yield different outputs
slide 12
Sequencing by Synthesis (Basis of Illumina Technology)
DNA: 0.05-1.0 µg
cluster growth
sequencing
slide 12 notes
slide 13
Illumina NGS Workflow
Enabling translation of research discoveries into potential clinical applications
extract: DNA/RNA extraction from original sample (tissue, cells, blood, etc.) -> Generally done by the client but can be done by Novogene
library prep: Take the DNA/RNA and make ’libraries’ (sample material that can be loaded on the sequencer) using commercial kits.
sequence: Put libraries on the sequencer to create raw data (FASTQ files)
data analysis: Take raw data (FASTQ files) and analyze them using various software to make meaningful conclusions
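To make the "raw data (FASTQ files)" step concrete, here is a minimal sketch of reading FASTQ records. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the two reads in the example are made up, and `parse_fastq` is an illustrative name, not part of any pipeline mentioned here.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality_line = handle.readline().strip()
        # Phred+33 (assumed): quality score = ASCII code of the character - 33
        qualities = [ord(ch) - 33 for ch in quality_line]
        yield header[1:], sequence, qualities


# Made-up example data standing in for a real FASTQ file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(parse_fastq(example))
for read_id, sequence, qualities in records:
    print(read_id, sequence, qualities)
```

Each record carries the basecalls plus one quality score per base, which is what downstream analysis software filters and aligns.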
Illumina Based Sequencing
- Library Construction: starting with DNA or RNA and turning it into Illumina-compatible ‘material’
- Cluster Generation: add to flow cell, bridge amplification
- Sequencing: single base at a time, imaging
- Data Analysis: images converted into usable information (basecalls and ‘reads’ -> raw data)
Library Preparation
Main purposes – get ideal insert sizes attached with adapter regions to make them usable on the Illumina platform
The DNA/cDNA sequences will be flanked by adapter sequences
These adapter sequences/regions will include:
- P5 and P7 sequences -> help the library sample bind to the flow cell
- Index sequences, i7 (Index 1) and i5 (Index 2) -> allow for multiplexing (loading of multiple samples on a single lane with a known sequence “divider”)
- R1 and R2 binding sequences -> allow the R1/R2 sequencing primers to bind (standard for most libraries but can be custom)
Basic Illumina Compatible Library Template
P5 & P7 oligo (required) - needed to bind to flow cell
index 2 (optional)
- Also referred to as i5 Index/Index 2. Not always used; only when dual-index or UDI kits are used. Helps with more sample multiplexing, and UDIs can be beneficial for reducing index hopping
read1 primer (required)
- Binding site for the Read 1 sequencing primer (part of the sequencing process)
Ideally Illumina compatible, but can have a custom R1 binding site, which requires an additional custom primer (and changes how lanes can be purchased)
insert DNA (required)
- Where most libraries will hold/fail QC (pre-made service)
The insert sizes need to be of “ideal length” but, depending on the protocol, might be longer or shorter: 250-300 bp for RNA-Seq; 300-350 bp for WGS -> other services vary
read2 primer (required)
- Binding site for the Read 2 sequencing primer (part of the sequencing process)
Ideally Illumina compatible, but can have a custom R2 binding site, which requires an additional custom primer (and changes how lanes can be purchased)
index 1 (required)
- Also referred to as i7 Index/Index 1; required if multiple samples are on the same lane
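The template layout above can be written out as an ordered list, which makes it easy to see how the insert size relates to the full library length. This is a sketch only: the element order follows the card (P5 end to P7 end), but every length other than the insert is a placeholder assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    length_bp: int  # placeholder lengths, not taken from any kit documentation
    required: bool


def build_library(insert_bp: int, dual_index: bool = True) -> list[Element]:
    """Return library elements in order from the P5 end to the P7 end."""
    elements = [
        Element("P5", 29, True),
        Element("Index 2 (i5)", 8, False),  # optional: dual-index / UDI kits only
        Element("Read 1 primer site", 33, True),
        Element("Insert", insert_bp, True),
        Element("Read 2 primer site", 34, True),
        Element("Index 1 (i7)", 8, True),
        Element("P7", 24, True),
    ]
    if not dual_index:
        elements = [e for e in elements if e.required]
    return elements


def total_length(elements: list[Element]) -> int:
    return sum(e.length_bp for e in elements)


library = build_library(insert_bp=300)
print(" - ".join(e.name for e in library), "=", total_length(library), "bp")
```

The point of the sketch is that the fragment size seen at library QC is the insert plus the adapter regions on both sides, so it is always larger than the insert itself.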
anatomy of a library
P5 & P7 ends of adapters bind to flow cell
DNA insert typically ranges 200-600bp (1kb)
different methods of indexing
- inline (part of the insert) - any level of multiplexing
- single index read (<96)
- dual index reads (384+)
Inline indexes are not part of our regular demultiplexing pipeline and will require an additional evaluation/charge.
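Multiplexing works because each read can be assigned back to its sample by its index sequence. The sketch below shows the idea for a single index read, assuming one mismatch is tolerated; the sample names and index sequences are made up for illustration, and this is not the demultiplexing pipeline referred to above.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Group reads by sample.

    reads: iterable of (index_sequence, read_sequence) pairs
    sample_indexes: dict mapping sample name -> expected index sequence
    """
    bins = {sample: [] for sample in sample_indexes}
    bins["undetermined"] = []
    for index_seq, read_seq in reads:
        matches = [
            sample
            for sample, expected in sample_indexes.items()
            if len(expected) == len(index_seq)
            and hamming(expected, index_seq) <= max_mismatches
        ]
        # Assign only when exactly one sample matches; otherwise leave unassigned.
        target = matches[0] if len(matches) == 1 else "undetermined"
        bins[target].append(read_seq)
    return bins


sample_indexes = {"sampleA": "ATCACG", "sampleB": "CGATGT"}  # hypothetical indexes
reads = [("ATCACG", "AAAA"), ("CGATGA", "CCCC"), ("GGGGGG", "TTTT")]
bins = demultiplex(reads, sample_indexes)
print(bins)
```

Reads whose index matches no sample, or more than one, go to an "undetermined" bin rather than being guessed.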
Cluster Generation
1 - attach DNA onto flow cell
2 - DNA folds over into bridge-like shape
3 - attach primer onto DNA
4 - Complementary (reverse) strand is made
5 - Reverse strand and forward strand are separated (denatured)
6 - clonal copies of both forward and reverse strand in a cluster
Cluster Generation
When using the HiSeq platform, cluster generation happens on a separate machine, called the cBot
When using the NovaSeq platform, cluster generation happens on the same machine as sequencing.
the importance of cluster density
Illumina reports “optimal” cluster density for each platform
pM amounts of libraries are used for sequencing
Accurate QC and quantification are essential
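Because loading is done in pM amounts, a measured mass concentration has to be converted to molarity first. This sketch uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names and the example values are assumptions for illustration, not a lab protocol.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def library_molarity_nm(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM."""
    return conc_ng_per_ul / (AVG_MASS_PER_BP * avg_fragment_bp) * 1_000_000


def nm_to_pm(conc_nm: float) -> float:
    """1 nM = 1000 pM."""
    return conc_nm * 1000.0


# Example values (made up): 2.64 ng/uL library with a 400 bp average fragment size.
molarity = library_molarity_nm(2.64, 400)
print(f"{molarity:.2f} nM = {nm_to_pm(molarity):.0f} pM")
```

The result depends on both the concentration and the average fragment size, which is why an error in either QC measurement shifts the cluster density.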
2-dye vs 4 dye chem
Some Illumina platforms (older ones) use a 4-channel chemistry. Newer platforms use a 2-channel chemistry
Some researchers might want to use a 2-dye chemistry vs. a 4-dye chemistry (“better accuracy”).
Not something we want to start discussing with clients, unless they bring it up.
Overall Theme: The more complicated you make the conversation, the more complicated the sale becomes
done!
Once Sequencing is Done … Time to Analyze the Data!!!
Can be done by Novogene, or our clients may have their own software/workflow (pipeline) in place