sequencing only - part 3 & 4 Flashcards
Commonly Asked Question from Clients
- Why don’t you offer other FC types on the X Plus?
As one of Illumina’s largest clients, Novogene operates >80 Illumina sequencers globally. Because we need to stock reagents globally, we buy them in bulk, receive a discount from Illumina, and pass the savings on to you. As a result, running a PE150 FC with us might cost about the same as running shorter reads (e.g. PE100) with another company/core.
In addition, Novogene has established and maintains an efficient workflow for sequencing libraries. By having all libraries pass through the same read strategy/FC, we are able to load and start our sequencers faster (sorting through different flow cell types and read lengths would slow the lab down). This allows us to return data within a competitive TAT.
Scenario – Sequencing Only
I am looking to run my 10X scRNA-Seq libraries on a NovaSeq X Plus 1.5B Lane (PE150) – can you provide costs/TAT for this service?
We currently do not offer the NovaSeq X Plus 1.5B (PE150) lane at Novogene – would you be interested in looking at the NovaSeq X Plus 10B (PE150) lane instead?
I am looking to run my 10X scRNA-Seq libraries on a NovaSeq X Plus 1.5B Lane (PE150) – can you provide costs/TAT for this service?
response #2
At Novogene, we streamline our sequencing workflows through the NovaSeq X Plus 10B (PE150) FC. We understand that filling up an entire lane might be difficult, so we have introduced a partial lane service, where you can buy part of a lane (PE150) at a pro-rated cost, which tends to be more cost-effective than buying an entire 1.5B lane.
Based on the information you have provided, it looks like you want ~400M paired reads. If we were to go with our partial lane service, the cost of sequencing would be $1,260 plus $15 per tube/library for QC. The TAT for this project would be ~1.5 to 2 weeks from start to finish.
Let me know if you have any questions about this workflow or are ready to move forward with an official quote.
Sales Toolbox
Thinking about why a client is asking a question often reveals their thought process and why they want a particular service/platform. If we don’t offer that service/platform, understanding why they are asking for it can help identify alternative pipelines that might be satisfactory for them.
Example: If a client is asking for a MiSeq run, don’t just reply that we don’t have it. Think about it from the client’s perspective; there are a few reasons why the client might be asking for this platform:
- They need small amounts of data -> partial lane service
- They have custom sequencing/index primers -> might work with HiSeq
- They need longer reads -> might be able to shift to NovaSeq SP (PE250) or, if it is a 16S/ITS/18S service, can offer full service at a lower cost overall
Comparison between 10B and 25B (X Plus)
NovaSeq X Plus – 10B
- 8 lanes per FC
- 375Gb/1.25B paired reads output per lane
- ~3,000Gb (3Tb)/10B paired reads output per FC
- 2 FCs per run
NovaSeq X Plus – 25B
- 8 lanes per FC
- 950Gb/3.1B paired reads output per lane
- 7,500Gb/25B paired reads output per FC
- 2 FCs per run
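The Gb and paired-read figures above are linked by simple arithmetic: each paired read contributes two reads of the read length (150bp for PE150). A minimal sketch of that conversion follows; the function name is illustrative, not part of any Novogene or Illumina tool.

```python
def paired_reads_to_gb(paired_reads: float, read_length: int = 150) -> float:
    """Convert a paired-read count to gigabases (Gb) for a paired-end run."""
    bases = paired_reads * 2 * read_length  # two reads per pair
    return bases / 1e9


# 10B flow cell: per lane, then per flow cell
print(paired_reads_to_gb(1.25e9))  # 375.0
print(paired_reads_to_gb(10e9))    # 3000.0
# 25B flow cell
print(paired_reads_to_gb(25e9))    # 7500.0
```

The same conversion turns a client request stated in reads (e.g. "~400M paired reads") into Gb: 400M paired reads at PE150 is 120Gb.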
Pooling Libraries – Barcoding/Indexing
Clients can save money by buying part of a FC, so we put more than 1 client on a FC
Client A, B & C pooled all of their libraries into a tube: tube of pooled/mixed libraries
As Client A, Client B, and Client C’s libraries are pooled into a single tube on a lane … how do we differentiate everyone’s data once sequencing is done?
As Client A’s libraries (#1, #2, and #3) are pooled into a single tube on a lane … how do we differentiate the data from the different samples once sequencing is done?
Methods to Index Libraries
General Overview
Single Indexing
Combinatorial Dual Indexing
Unique Dual Indexing
RECALL what a library looks like
parts of a library:
P5 oligo
index 2
read 1 primer
insert DNA
read 2 primer
index 1
p7 oligo
Index 1 is referred to as i7 Index
Index 2 is referred to as i5 Index
single index
‘Older’ Method for indexing but still used today
Generally, allows for up to 96 samples to be pooled together on a single lane
The index will be a known sequence, generally between 6bp to 8bp – but can be longer (e.g. 10bp)
The sequence must be known, it is attached during the library preparation steps, and will need to be provided to NVG prior to sequencing, usually on the sample information form (SIF), to ensure there are no duplicates on the same lane
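The duplicate check mentioned above can be pictured as grouping samples by their index sequence and flagging any index that appears more than once on the lane. This is only a sketch; the sample names and index sequences are made up for illustration, and it does not represent how the SIF is actually processed.

```python
from collections import defaultdict


def find_duplicate_indexes(sample_to_index):
    """Return {index_sequence: [samples]} for every index shared by 2+ samples."""
    by_index = defaultdict(list)
    for sample, index in sample_to_index.items():
        # Normalize case so "atcacg" and "ATCACG" count as the same index.
        by_index[index.upper()].append(sample)
    return {idx: names for idx, names in by_index.items() if len(names) > 1}


# Hypothetical lane with three single-indexed libraries
lane = {"ClientA_1": "ATCACG", "ClientA_2": "CGATGT", "ClientB_1": "atcacg"}
print(find_duplicate_indexes(lane))  # {'ATCACG': ['ClientA_1', 'ClientB_1']}
```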
Combinatorial Dual-Indexing
‘Newer’ Method for indexing but still not the “best”
Generally, allows for >96 samples to be multiplexed on a single lane
The index will be a known sequence, generally between 6bp to 8bp – but can be longer (e.g. 10bp)
The sequence must be known, it is attached during the library preparation steps, and will need to be provided to NVG prior to sequencing, usually on the sample information form (SIF), to ensure there are no duplicates on the same lane
In this type of indexing, neither Index 1 nor Index 2 is unique throughout the set, but each combination (Index 1 + Index 2) is unique for a specific sample
- Generally, the i7 or i5 is the same (throughout the set) and the other index changes
combo. dual indexing
Instead of using actual sequences for the indexes, assume an index is a color.
i5 has 4 green rectangles
i7 is connected to each i5 green rectangle
- one brown i7 to green i5
- one blue i7 to green i5
- one orange i7 to green i5
- one gray i7 to green i5
unique dual indexing
Similar to combinatorial dual-indexing but each i7 and i5 sequence is unique throughout the set of samples
Allows for a higher level of multiplexing -> 384 (or even higher)
Allows for detection of index hopping
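Why unique dual indexes make hopping detectable: every i7 belongs to exactly one i5, so a read whose i7 and i5 are both known but do not belong together cannot be a real sample. A minimal sketch, with hypothetical index sequences and sample names:

```python
def classify_read(i7, i5, expected_pairs):
    """Assign a read to a sample, or flag it, given a unique dual index set.

    expected_pairs maps (i7, i5) -> sample name.
    """
    if (i7, i5) in expected_pairs:
        return expected_pairs[(i7, i5)]
    known_i7 = {pair[0] for pair in expected_pairs}
    known_i5 = {pair[1] for pair in expected_pairs}
    # Both indexes are valid on their own, but were never paired together.
    if i7 in known_i7 and i5 in known_i5:
        return "index-hopped"
    return "undetermined"


pairs = {("AAGGTTCC", "CCTTAAGG"): "sample_1", ("GGAACCTT", "TTCCGGAA"): "sample_2"}
print(classify_read("AAGGTTCC", "CCTTAAGG", pairs))  # sample_1
print(classify_read("AAGGTTCC", "TTCCGGAA", pairs))  # index-hopped
```

With combinatorial dual indexing the same mismatch can be a legitimate combination for another sample, so a hopped read may be silently assigned to the wrong sample.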
unique dual indexing
Instead of using actual sequences for the indexes, assume an index is a color.
i5: light purple, dark purple, blue and green
i7: turquoise, brown, light green, orange
non-redundant/unique dual indexing
Comparing Combinatorial Dual Indexing vs. Unique Dual Indexing
combo: every i5 is the same color (i.e. the same i5 sequence) and each one is paired with a different-colored i7
unique: every i5 and every i7 is a different color, and each i5 is paired with its own i7
Best way to index, but tends to be most expensive
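The color comparison can be restated as a rule: a set is unique dual indexed only if no i7 and no i5 repeats; if indexes repeat but every (i7, i5) combination is still distinct, it is combinatorial. The sketch below uses the colors from the cards in place of sequences; the way the unique colors are paired up is an assumption for illustration.

```python
def indexing_scheme(pairs):
    """Classify a list of (i7, i5) pairs, one pair per sample."""
    if len(set(pairs)) != len(pairs):
        return "invalid: duplicate combination"
    i7s = {i7 for i7, _ in pairs}
    i5s = {i5 for _, i5 in pairs}
    if len(i7s) == len(pairs) and len(i5s) == len(pairs):
        return "unique dual"
    return "combinatorial dual"


combo = [("brown", "green"), ("blue", "green"), ("orange", "green"), ("gray", "green")]
unique = [("turquoise", "light purple"), ("brown", "dark purple"),
          ("light green", "blue"), ("orange", "green")]
print(indexing_scheme(combo))   # combinatorial dual
print(indexing_scheme(unique))  # unique dual
```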
Demultiplexing (DM) Fee – Longer than 10BP Indexes
NovaSeq X Plus
For Full Lanes, there will never be an initial demultiplexing (DM) fee; if another round of DM needs to happen because of incorrect indexes, one may be assessed.
For partial lane, anytime the indexes are >10bp, a $100 DM fee needs to be added per quote
This fee is added because the run has to be set-up with longer indexes, which causes extra work for the DM/BI teams.
Scenarios of when to charge the fee:
Example 1:
9 Libraries submitted with 12bp indexes (i7 and i5 each) for partial lane.
- Fee to be added: $100
Example 2:
5 Libraries with 6bp indexes for partial lane.
- No fee added
Example 3:
10 Libraries submitted with 12bp indexes (i7 and i5) for full NovaSeq S4 lane
- No fee added
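The fee rule stated above (no initial fee on full lanes; $100 per quote on partial lanes whenever any index is longer than 10bp) can be written as a small decision function. This is a sketch of the stated rule only; it does not cover the re-demultiplexing fee for incorrect indexes, and the function and constant names are illustrative.

```python
DM_FEE_USD = 100               # per quote, from the partial lane rule above
MAX_INDEX_BP_WITHOUT_FEE = 10


def initial_dm_fee(lane_type, index_lengths_bp):
    """Return the initial demultiplexing fee (USD) to add to a quote."""
    if lane_type == "full":
        return 0
    if lane_type != "partial":
        raise ValueError("lane_type must be 'full' or 'partial'")
    has_long_index = any(n > MAX_INDEX_BP_WITHOUT_FEE for n in index_lengths_bp)
    return DM_FEE_USD if has_long_index else 0


print(initial_dm_fee("partial", [12, 12]))  # 100 (i7 and i5 are both 12bp)
print(initial_dm_fee("partial", [6]))       # 0
print(initial_dm_fee("full", [12, 12]))     # 0
```

The fee is per quote, so the number of libraries does not enter the calculation.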
QC Steps for Pre-made Libraries (Sequencing Only)
All samples arriving to Novogene will go through some sample QC process – what is done on the samples for QC will depend on what type of sample is submitted (tissue, cells, RNA, DNA, pre-made library, etc.) and the service that is being requested
QC is a mandatory step that must be done (a client may not substitute their own QC report to speed up the process)
- As Novogene is running >80 Illumina sequencers globally, we are required to have an audit log of all the samples that are going on the sequencers. This is used in case we have any sequencing issues (related to loading, the instrument, or reagents) and helps the Novogene team + Illumina team troubleshoot issues that may occur during sample processing
QC Steps for Pre-made Libraries (Sequencing Only)
Novogene will perform 3 QC steps on pre-made libraries:
- qPCR
- Qubit
- Fragment Analyzer
The cost for the QC is $15 per tube (this could be a single library per tube or a pre-pooled library)
QC, for reasonable sized projects, would take 1-3 working days, once samples arrive to the lab.
- Reasonable is variable but can be considered as <50 tubes
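Because the QC charge is per tube rather than per library, the cost depends only on how many tubes arrive. A minimal sketch of that arithmetic (the function name is illustrative):

```python
QC_FEE_PER_TUBE_USD = 15


def library_qc_cost(n_tubes):
    """QC cost in USD for pre-made libraries, charged per submitted tube."""
    if n_tubes < 1:
        raise ValueError("at least one tube must be submitted")
    return QC_FEE_PER_TUBE_USD * n_tubes


print(library_qc_cost(3))  # 45 -> three libraries submitted individually
print(library_qc_cost(1))  # 15 -> the same three libraries pre-pooled in one tube
```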
Submission of Libraries - Pooling
Client is looking to submit 3 libraries (each with a unique index) and would like 50M paired reads per tube/library. They would like to know how to submit the libraries to Novogene.
Option #1: The client can give us 3 individual libraries for library QC -> we will perform qPCR, Qubit, and Fragment Analyzer and share the results with the client. Once they have given us the green light to move forward, we will pool the libraries (based on how much data is needed and the amount present) and combine them with other libraries before loading onto the sequencer.
QC Cost = $15 per tube x 3 individual libraries = $45
Submission of Libraries - Pooling
Client is looking to submit 3 libraries (each with a unique index) and would like 50M paired reads per tube/library. They would like to know how to submit the libraries to Novogene.
Option #2: The client can (hopefully) QC and pool (i.e. combine) the libraries on their end, normalizing based on the # of reads needed, and submit 1 tube to Novogene. This 1 tube will be QC’ed -> put into a QC report for review. Once the client has given us the green light to move forward, we will combine their pool, based on the # of reads needed for the entire pool, with other libraries before loading onto the sequencer.
QC Cost = $15 per tube x 1 pre-pooled library = $15
What to tell clients when they ask about pooling versus un-pooling?
Clients are welcome to submit either un-pooled or pooled libraries (for partial lane, it is recommended to keep each pool to <50 indexes, and certain library types, such as 10X Multiome, cannot be pooled together)
But help them make an informed decision …
Many of our clients use Bioanalyzer (or equivalent) and Qubit to pool their libraries, while Novogene uses qPCR for pooling. qPCR tends to be more accurate, as it quantifies the amount of actual (sequenceable) library present. Accurate quantification allows us to reach the desired output per library.
If the BA + Qubit method has worked well for you – then I wouldn’t worry too much, just our general recommendations.
Reality vs Expectations
Bioanalyzer takes 90 seconds, Qubit takes 30 seconds – within a few minutes, client will have the information needed to pool libraries based off this method (it is easier, faster, and saves them money)
qPCR takes a couple hours (1.5 to 2 hours), requires special reagents/kit (that tend to not be as cheap and require a minimum number to buy) -> although this is the better method, it is not as feasible for our clients. It’s important to give clients options, with relevant background information so they can make the best decision.
While the $15 per library QC charge would increase costs if un-pooled, we really don’t make much money on them – we provide this at a low cost (qPCR, FA, Qubit) since we realize how important the quantification step is for sequencing.
Guidelines for Submitting Pool vs. Un-pooled
When submitting libraries for our partial lane service – it is ideal to have them submitted individually – this helps us pool more accurately across the entire lane (qPCR based)
- For partial lane – in a given tube, if submitting pooled, the client should ideally have <50 indexes present and request at least 0.5Gb data per sub-library within that pool
For any sequencing workflow, the minimum amount of data per tube should be 1Gb
Certain library types cannot be mixed into a pool, and you cannot have duplicate indexes within the pool
If clients are submitting individual libraries, ‘top-off’ sequencing can be arranged on a per sample basis – versus if a pooled library was submitted, the entire pool must go back on the sequencer, if additional reads are needed
Work with your RSMs to work through examples/scenarios
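The numeric guidelines above (<50 indexes per pooled tube, at least 0.5Gb per sub-library, at least 1Gb per tube, no duplicate indexes) can be checked mechanically before a quote goes out. The sketch below is a hypothetical helper, not an internal tool; it does not check library-type compatibility, which still needs a manual review.

```python
def check_pooled_tube(sublibraries, max_indexes=50,
                      min_gb_per_sublibrary=0.5, min_gb_per_tube=1.0):
    """Check one pooled tube against the partial lane guidelines.

    sublibraries is a list of (index_sequence, requested_gb) tuples.
    Returns a list of problems; an empty list means no guideline was violated.
    """
    problems = []
    indexes = [idx for idx, _ in sublibraries]
    if len(indexes) >= max_indexes:
        problems.append(f"{len(indexes)} indexes in pool (guideline: <{max_indexes})")
    if len(set(indexes)) != len(indexes):
        problems.append("duplicate indexes within the pool")
    if any(gb < min_gb_per_sublibrary for _, gb in sublibraries):
        problems.append(f"a sub-library requests <{min_gb_per_sublibrary}Gb")
    if sum(gb for _, gb in sublibraries) < min_gb_per_tube:
        problems.append(f"tube total is <{min_gb_per_tube}Gb")
    return problems


print(check_pooled_tube([("ATCACG", 0.6), ("CGATGT", 0.6)]))  # []
print(check_pooled_tube([("ATCACG", 0.3), ("ATCACG", 0.3)]))  # three problems
```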
What is nucleotide diversity?
Nucleotide diversity refers to the relative proportion of nucleotides A, C, G and T present in every cycle of the run.
In order for template generation to occur effectively (i.e. produce high/good quality data), it is important that there is an equal proportion of all nucleotides present in a library.
Although what we are sequencing at the end is A, C, G, and Ts, how the sequences appear in the library is highly dependent on the library type (and sometimes species)
- Amplicon Libraries - Have Fixed Bases
- WGBS Libraries – Have a Higher Proportion of T’s sequenced (due to the bisulfite conversion)
Those (amplicon - which have fixed bases and WGBS - which have more Ts due to bisulfite) are types of libraries that have different diversities!!!
Whole-genome bisulfite sequencing (WGBS)
3 Independent Examples
of libraries
diverse/balanced libraries
- normal WGS
- A, C, G & T present at similar % in all cycles 1, 2 & 3
low diversity libraries
- amplicon (fixed # of bases)
- single base difference per cycle
unbalanced libraries
- WGBS (more Ts due to bisulfite)
- A is absent
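Nucleotide diversity can be made concrete by tallying, for each cycle, the fraction of reads showing each base. The sketch below does this for two toy read sets; the reads are invented for illustration and are far shorter and fewer than real data.

```python
from collections import Counter


def base_fractions_per_cycle(reads, n_cycles):
    """For each cycle, return the fraction of reads showing A, C, G and T."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        total = sum(counts.values())
        fractions.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return fractions


balanced = ["ACGT", "CGTA", "GTAC", "TACG"]   # WGS-like: all bases in every cycle
amplicon = ["AGTC", "AGTC", "AGTC", "AGTC"]   # fixed bases: one base per cycle

print(base_fractions_per_cycle(balanced, 3)[0])  # 0.25 for each base
print(base_fractions_per_cycle(amplicon, 3)[0])  # A is 1.0, the rest 0.0
```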
PhiX - A type of library created by illumina
PhiX, created by Illumina, is an index-less library generated from bacteriophage, which has a well-balanced genome (kind of like the normal WGS example from above) (45% GC and 55% AT)
This can be spiked in, with low-diversity or unbalanced libraries, during sequencing to help increase nucleotide diversity
The amount spiked in depends on the type of library being sequenced and on the sequencer.
Can act as a ‘control’, as we are expecting this library to bind to the flow cell and be sequenced.
Can help us with troubleshooting, should a client’s samples have issues during sequencing
Downside: The amount of PhiX that is spiked in during sequencing, will take away from the overall data output from the lane.
If we spike 25% PhiX on a NovaSeq S4 (PE150) lane – we would only expect ~600Gb of raw data for the client’s libraries from the lane (~200Gb is expected to be PhiX, which will show up in the undetermined reads)
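The trade-off is a straight proportion of the lane output. In the sketch below, the ~800Gb lane total is inferred from the 600Gb + 200Gb split in the example above rather than quoted from a specification, and the function name is illustrative.

```python
def output_after_phix(lane_output_gb, phix_fraction):
    """Split a lane's expected output into (library Gb, PhiX Gb)."""
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    phix_gb = lane_output_gb * phix_fraction
    return lane_output_gb - phix_gb, phix_gb


library_gb, phix_gb = output_after_phix(800, 0.25)
print(library_gb, phix_gb)  # 600.0 200.0
```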
Amplicon Libraries – Fixed Bases
Your client has 4 variants he/she wants to submit. Instead of submitting gDNA for WGS, the client is interested in sequencing a specific stretch of the genome, so he/she decides to perform a PCR and submit the product for sequencing.
He/she has designed a primer that will amplify the ~300bp stretch (region of interest)
Same Primer -> PCR Reaction w/ gDNA -> Produces 4 Amplicons
The 4 Amplicons are further PCR’ed to add the Illumina Adapters -> Ready for Sequencing (p5/p7, index, sequencing primer binding sites added)
Sequencing of Amplicon Libraries – Fixed Bases
Assume no PhiX: issues with CF%, pre-phasing/phasing, and color matrix calibration = poor data quality/no data
Many times, amplicons need very little data and are ideal candidates for partial lane but due to the low-diversity nature, we cannot accept but …
image: AGTCCT are highlighted in all 4 amplicons
cycle 1: A
cycle 2: G
cycle 3: T
cycle 4: G
cycle 5: G
Sequencing of Amplicon Libraries – Staggered/Phased Primers
We can introduce ‘random’ bases in our primers to help “shift” the fixed bases (this only works for amplicon-based preps, where we can design the primers).
This can be recommended to clients. If they have already made their libraries, this would involve making them again, but many times it is not “a lot” more work, and it allows for submission through partial lane.
With staggered primers, the fixed bases are shifted diagonally from one amplicon to the next.
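The effect of staggering can be seen by prepending spacers of different lengths to the same fixed start and counting how many distinct bases appear in each cycle. The fixed start AGTCCT is taken from the amplicon card above; the spacer sequences (0 to 3 bases) are hypothetical and only illustrate the idea.

```python
def stagger(fixed_start, spacers=("", "T", "GA", "CTG")):
    """Prepend each spacer to the fixed amplicon start, giving one read per spacer."""
    return [spacer + fixed_start for spacer in spacers]


def distinct_bases_per_cycle(reads, n_cycles):
    """Count how many different bases are seen in each of the first n_cycles."""
    return [len({read[cycle] for read in reads}) for cycle in range(n_cycles)]


fixed = "AGTCCT"
print(distinct_bases_per_cycle([fixed] * 4, 6))      # [1, 1, 1, 1, 1, 1]
print(distinct_bases_per_cycle(stagger(fixed), 6))   # [4, 3, 3, 4, 3, 2]
```

Without spacers every cycle shows a single base; with them, each cycle shows a mix of bases because the fixed sequence starts at a different position in each read.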
PhiX Recommendations
look at ppt :)
Partial Lane Sequencing - Review
A cost-effective option to sequencing when filling up a full lane is not feasible
Buy part of a lane at a pro-rated cost
At a certain point, recommending a full lane (on the X Plus or the 6000) will make sense, assuming the libraries are not problematic and/or can be sequenced through either workflow
- [Price of X Plus Lane]/$9.50 per Gb = Threshold for Recommendation on Full X Plus Lane
For example: $1,799/$9.50 = ~190Gb -> assuming no special library types, duplicate indexes, etc., going with a full lane on the X Plus makes sense
[Price of 6000 Lane]/$8.50 = ~689Gb -> assuming no special library types, duplicate indexes, etc., going with a full lane on the 6000 makes sense
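The threshold is the break-even point where the pro-rated partial lane cost equals the full lane price. A minimal sketch using the X Plus figures from the example above (function names are illustrative, and prices should always come from the current price list):

```python
def full_lane_threshold_gb(full_lane_price_usd, partial_rate_usd_per_gb):
    """Data amount (Gb) at which a partial lane costs as much as a full lane."""
    return full_lane_price_usd / partial_rate_usd_per_gb


def recommend(requested_gb, full_lane_price_usd, partial_rate_usd_per_gb):
    """Suggest 'full lane' once the request reaches the break-even threshold."""
    threshold = full_lane_threshold_gb(full_lane_price_usd, partial_rate_usd_per_gb)
    return "full lane" if requested_gb >= threshold else "partial lane"


print(round(full_lane_threshold_gb(1799, 9.50), 1))  # 189.4, i.e. ~190Gb
print(recommend(120, 1799, 9.50))                    # partial lane
print(recommend(250, 1799, 9.50))                    # full lane
```

This only compares price; special library types or duplicate indexes can still rule out one of the workflows.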
You cannot request a specific amount of PhiX for partial lane sequencing – our lab will add PhiX and/or other ‘normal’ library types to balance the nucleotide diversity based off the library type and/or notes in contract
If library requires a specific amount of PhiX and can be accepted through this workflow, include notes in opportunity on how much PhiX is needed
Partial Lane Sequencing – Special Library Types
look at ppt for the chart :)
Some libraries on the “exceptions” list are there because they are low-diversity/have fixed bases (RRBS); other library types have underproduced historically (ChIP-Seq)