sequencing only - part 3 & 4 Flashcards by Michelle Kusi

Commonly Asked Question from Clients
- Why don’t you offer other FC types on the X Plus?

As one of Illumina’s largest clients, Novogene operates >80 Illumina sequencers globally. Because we stock reagents worldwide, we buy them in bulk, receive a discount from Illumina, and pass the savings on to you. As a result, running a PE150 FC with us might cost about the same as running shorter reads (e.g. PE100) with another company/core.

In addition, Novogene has established and maintained an efficient workflow for sequencing libraries. By having all libraries pass through the same read strategy/FC, we are able to load and start up our sequencers faster (as you can imagine, sorting through different flow cell types and read lengths would slow the lab down when loading/starting up a sequencer). This allows us to return data within a competitive TAT.

Scenario – Sequencing Only

I am looking to run my 10X scRNA-Seq libraries on a NovaSeq X Plus 1.5B Lane (PE150) – can you provide costs/TAT for this service?

We currently do not offer the NovaSeq X Plus 1.5B (PE150) lane at Novogene – would you be interested in looking at the NovaSeq X Plus 10B (PE150) lane instead?

I am looking to run my 10X scRNA-Seq libraries on a NovaSeq X Plus 1.5B Lane (PE150) – can you provide costs/TAT for this service?

response #2

At Novogene, we streamline our sequencing workflows through the NovaSeq X Plus 10B (PE150) FC. We understand that filling up an entire lane might be difficult, so we have introduced a partial lane service, where you can buy part of a lane (PE150) at a pro-rated cost – which tends to be more cost-effective than buying an entire 1.5B lane.

Based on the information you have provided, it looks like you want ~400M paired reads. If we were to go with our partial lane service, the cost of sequencing would be $1,260 plus $15 per tube/library for QC. The TAT for this project would be ~1.5 to 2 weeks from start to finish.

Let me know if you have any questions about this workflow or are ready to move forward with an official quote.

Sales Toolbox

Many times, thinking about clients’ questions can help reveal their thought process and why they are asking for a particular service/platform. If we don’t offer that particular service/platform, understanding why they are asking for it might help us identify alternative pipelines that would be satisfactory for them

Example: If a client is asking for a MiSeq run, don’t just reply that we don’t have it; think about it from the client’s perspective. There are a few reasons why the client might be asking for this platform:
- They need small amounts of data -> partial lane service
- They have custom sequencing/index primers -> might work with HiSeq
- They need longer reads -> might be able to shift to NovaSeq SP (PE250) or, if it is a 16S/ITS/18S service, we can offer full service at a lower cost overall

Comparison between 10B and 25B (X Plus)

NovaSeq X Plus – 10B
- 8 lanes per FC
- 375Gb/1.25B paired reads output per lane
- 3,000Gb/10B paired reads output per FC (8 lanes x 375Gb)
- 2 FCs per run

NovaSeq X Plus – 25B
- 8 lanes per FC
- 950Gb/3.1B paired reads output per lane
- 7,500Gb/25B paired reads output per FC
- 2 FCs per run
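
The per-lane figures above pair a read count with a Gb figure. A minimal sketch of how the two relate for a PE150 run, assuming output in bases is simply paired reads x 2 x read length (the function name is hypothetical, not a Novogene or Illumina tool):

```python
def paired_reads_to_gb(paired_reads: float, read_length: int = 150) -> float:
    """Approximate output in Gb for a paired-end run (2 reads per pair)."""
    bases = paired_reads * 2 * read_length
    return bases / 1e9

# 10B flow cell: 1.25B paired reads per lane at PE150
lane_10b_gb = paired_reads_to_gb(1.25e9)
# 8 lanes per flow cell
fc_10b_gb = 8 * lane_10b_gb
# 25B flow cell: 3.1B paired reads per lane at PE150 (quoted as roughly 950Gb)
lane_25b_gb = paired_reads_to_gb(3.1e9)

print(lane_10b_gb, fc_10b_gb, lane_25b_gb)
```

The same conversion is handy when a client asks for reads but the quote is priced per Gb: ~400M paired reads at PE150 works out to roughly 120Gb.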

Pooling Libraries – Barcoding/Indexing

Clients can save money by buying part of a FC, so we put libraries from more than 1 client onto a FC

Client A, B & C’s libraries are all pooled into a tube: a tube of pooled/mixed libraries

As Client A, Client B, and Client C’s libraries are pooled into a single tube on a lane … how do we differentiate everyone’s data once sequencing is done?

As Client A’s libraries (#1, #2, and #3) are pooled into a single tube on a lane … how do we differentiate the data from the different samples once sequencing is done?

Methods to Index Libraries
General Overview

Single Indexing

Combinatorial Dual Indexing

Unique Dual Indexing

RECALL what a library looks like

parts of a library:
P5 oligo
index 2
read 1 primer
insert DNA
read 2 primer
index 1
p7 oligo

Index 1 is referred to as i7 Index
Index 2 is referred to as i5 Index

single index

‘Older’ Method for indexing but still used today

Generally, allows for up to 96 samples to be pooled together on a single lane

The index will be a known sequence, generally between 6bp and 8bp – but it can be longer (e.g. 10bp)

The sequence must be known: it is attached during the library preparation steps and needs to be provided to NVG prior to sequencing, usually on the sample information form (SIF), to ensure there are no duplicates on the same lane
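
Why the index sequences are needed up front can be illustrated with a small duplicate check. This is only a sketch under assumed inputs: the sample names and 6bp sequences are made up, and `find_duplicate_indexes` is a hypothetical helper, not part of the SIF or any Novogene system.

```python
def find_duplicate_indexes(sample_to_index: dict[str, str]) -> dict[str, list[str]]:
    """Group samples by index sequence and return only the indexes used more than once."""
    samples_by_index: dict[str, list[str]] = {}
    for sample, index in sample_to_index.items():
        # Compare case-insensitively so "atcacg" and "ATCACG" count as the same index
        samples_by_index.setdefault(index.upper(), []).append(sample)
    return {idx: names for idx, names in samples_by_index.items() if len(names) > 1}

# Hypothetical single-index libraries planned for the same lane
lane = {"ClientA_1": "ATCACG", "ClientA_2": "CGATGT", "ClientB_1": "atcacg"}
duplicates = find_duplicate_indexes(lane)
print(duplicates)
```

Two libraries sharing an index on one lane cannot be told apart after sequencing, which is why a non-empty result here would block the pool.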

Combinatorial Dual-Indexing

‘Newer’ Method for indexing but still not the “best”

Generally, allows for >96 samples to be multiplexed on a single lane

The index will be a known sequence, generally between 6bp and 8bp – but it can be longer (e.g. 10bp)
The sequence must be known: it is attached during the library preparation steps and needs to be provided to NVG prior to sequencing, usually on the sample information form (SIF), to ensure there are no duplicates on the same lane

In this type of indexing, neither Index 1 nor Index 2 is unique throughout the set on its own, but each combination (Index 1 + Index 2) is unique to a specific sample
- Generally, either the i7 or the i5 is the same throughout the set and the other index changes
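
A rough sketch of the "each combination is unique" rule, using made-up 6bp/8bp sequences and a hypothetical helper name. Pairs are written as (i7, i5); here the i5 is shared across the set and only the i7 changes.

```python
def duplicated_pairs(samples: dict[str, tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every (i7, i5) combination that is assigned to more than one sample."""
    seen: set[tuple[str, str]] = set()
    duplicates: set[tuple[str, str]] = set()
    for pair in samples.values():
        if pair in seen:
            duplicates.add(pair)
        seen.add(pair)
    return duplicates

# Hypothetical combinatorial set: four different i7s, one shared i5
combinatorial_set = {
    "S1": ("ATCACG", "TATAGCCT"),
    "S2": ("CGATGT", "TATAGCCT"),
    "S3": ("TTAGGC", "TATAGCCT"),
    "S4": ("TGACCA", "TATAGCCT"),
}
shared_i5s = {i5 for _, i5 in combinatorial_set.values()}
print(duplicated_pairs(combinatorial_set), shared_i5s)
```

The i5 alone cannot separate these samples, but no (i7, i5) pair repeats, so the set can still be demultiplexed.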

combo. dual indexing

Instead of using actual sequences for the indexes, assume an index is a color.

i5 has 4 green rectangles
i7 is connected to each i5 green rectangle
- one brown i7 to green i5
- one blue i7 to green i5
- one orange i7 to green i5
- one gray i7 to green i5

unique dual indexing

Similar to combinatorial dual-indexing, but each i7 and each i5 sequence is unique throughout the set of samples

Allows for a higher level of multiplexing -> 384 (or even higher)
Allows for detection of index hopping
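
Why unique dual indexes make index hopping detectable can be sketched as follows. The sequences and the `classify_read` helper are hypothetical. The idea: if every i7 and i5 belongs to exactly one sample, a read whose i7 and i5 are both known but come from different samples cannot be a real library, so it can be flagged rather than assigned.

```python
def classify_read(i7: str, i5: str, expected_pairs: set[tuple[str, str]]) -> str:
    """Classify one read by its observed index pair against the expected (i7, i5) pairs."""
    if (i7, i5) in expected_pairs:
        return "assigned"
    known_i7s = {pair[0] for pair in expected_pairs}
    known_i5s = {pair[1] for pair in expected_pairs}
    # Both indexes exist in the set, but not together: consistent with index hopping
    if i7 in known_i7s and i5 in known_i5s:
        return "possible index hop"
    return "undetermined"

# Hypothetical unique dual index set: no i7 or i5 is reused
udi_pairs = {("ATCACG", "TATAGCCT"), ("CGATGT", "ATAGAGGC")}
print(classify_read("ATCACG", "TATAGCCT", udi_pairs))
print(classify_read("ATCACG", "ATAGAGGC", udi_pairs))
```

With a combinatorial set, a swapped pair can coincide with another sample's valid combination, so the same check would not catch it.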

unique dual indexing

Instead of using actual sequences for the indexes, assume an index is a color.

i5: light purple, dark purple, blue and green
i7: turquoise, brown, light green, orange

non-redundant/unique dual indexing

Comparing Combinatorial Dual Indexing vs. Unique Dual Indexing

combo: every i5 is 1 color (i.e. the same i5 throughout), and each is connected to an i7 of a different color

unique: both the i5s and the i7s are all different colors, and each i5 is connected to its own i7

Unique dual indexing is the best way to index, but tends to be the most expensive

Demultiplexing (DM) Fee – Longer than 10BP Indexes

NovaSeq X Plus
For full lanes, there will never be an initial demultiplexing (DM) fee; if another round of DM is needed because of incorrect indexes, a fee may be assessed.

For partial lanes, any time the indexes are >10bp, a $100 DM fee needs to be added per quote

This fee is added because the run has to be set up with longer indexes, which causes extra work for the DM/BI teams.

Scenarios of when to charge the fee:
Example 1:
9 Libraries submitted with 12bp indexes (i7 and i5 each) for partial lane.
- Fee to be added: $100

Example 2:
5 Libraries with 6bp indexes for partial lane.
- No fee added

Example 3:
10 Libraries submitted with 12bp indexes (i7 and i5) for full NovaSeq S4 lane
- No fee added
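
The fee rule can be written as a small function. This is a sketch of the stated rule only ($100 per quote for partial lanes with indexes >10bp, no initial fee for full lanes); the function and parameter names are assumptions, and the fee for a repeat round of DM is not modeled.

```python
def initial_dm_fee(lane_type: str, longest_index_bp: int, fee_per_quote: int = 100) -> int:
    """Initial demultiplexing fee in dollars for one quote."""
    # Full lanes never carry an initial DM fee, regardless of index length
    if lane_type == "partial" and longest_index_bp > 10:
        return fee_per_quote
    return 0

print(initial_dm_fee("partial", 6))   # 6bp indexes on a partial lane
print(initial_dm_fee("full", 12))     # 12bp indexes on a full lane
print(initial_dm_fee("partial", 12))  # 12bp indexes on a partial lane
```

Note that the fee is per quote, so the number of libraries does not enter the calculation.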

QC Steps for Pre-made Libraries (Sequencing Only)

All samples arriving at Novogene will go through a sample QC process – what is done on the samples for QC depends on the type of sample submitted (tissue, cells, RNA, DNA, pre-made library, etc.) and the service being requested

QC is a mandatory step (clients cannot provide their own QC report in its place to speed up the process)
- As Novogene is running >80 Illumina sequencers globally, we are required to keep an audit log of all the samples that go on the sequencers. This is used in case we have any sequencing issues (related to loading, the instrument, or reagents) and helps the Novogene team + Illumina team troubleshoot issues that may occur during sample processing

QC Steps for Pre-made Libraries (Sequencing Only)

Novogene will perform 3 QC steps on pre-made libraries:
- qPCR
- Qubit
- Fragment Analyzer

The cost for the QC is $15 per tube (this could be a single library per tube or a pre-pooled library)

QC for reasonably sized projects takes 1-3 working days once samples arrive at the lab.
- Reasonable is variable but can be considered as <50 tubes
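
Because QC is charged per tube rather than per library, the QC line on a quote depends on how the client submits. A minimal sketch using the $15 per tube figure above (the function name is hypothetical):

```python
QC_COST_PER_TUBE = 15  # dollars, per the per-tube QC price stated above

def library_qc_cost(num_tubes: int) -> int:
    """QC cost in dollars; a tube may hold a single library or a pre-pooled library."""
    return num_tubes * QC_COST_PER_TUBE

individual = library_qc_cost(3)  # 3 libraries submitted in 3 separate tubes
pre_pooled = library_qc_cost(1)  # the same 3 libraries pre-pooled into 1 tube
print(individual, pre_pooled)
```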

Submission of Libraries - Pooling

Client is looking to submit 3 libraries (each with a unique index) and would like 50M paired reads per tube/library. They would like to know how to submit the libraries to Novogene.

Option #1: The client can give us 3 individual libraries for library QC -> we will perform qPCR, Qubit, and Fragment Analyzer and share the results with the client. Once they have given us the green light to move forward, we will pool the libraries (based on how much data is needed and the amount present) and combine them with other libraries before loading onto the sequencer.

QC Cost = $15 per tube x 3 individual libraries = $45

Submission of Libraries - Pooling

Client is looking to submit 3 libraries (each with a unique index) and would like 50M paired reads per tube/library. They would like to know how to submit the libraries to Novogene.

Option #2: The client can (hopefully) QC and pool (i.e. combine) the libraries on their end, normalizing based on the # of reads needed, and submit 1 tube to Novogene. This 1 tube will be QC’ed -> put into a QC report for review. Once the client has given us the green light to move forward, we will combine their pool, based on the # of reads needed for the entire pool, with other libraries before loading onto the sequencer.

QC Cost = $15 per tube x 1 pre-pooled library = $15

What to tell clients when they ask about pooling versus un-pooling?

Clients are welcome to submit either un-pooled or pooled libraries (for partial lane, it is recommended to keep a pool to <50 indexes, and certain library types, such as 10X Multiome, cannot be pooled together)
But help them make an informed decision …

Many of our clients use a Bioanalyzer (or equivalent) and Qubit to pool their libraries, while Novogene uses qPCR for pooling. qPCR tends to be more accurate, as it quantifies the amount of actual library present. Accurate quantification allows us to reach the desired output per library.

If the BA + Qubit method has worked well for you, then I wouldn’t worry too much; these are just our general recommendations.
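
To show why the quantification matters, here is a generic pooling sketch, not Novogene's actual procedure. It assumes each library's share of the pool (in fmol) should be proportional to the reads requested, and uses the fact that 1 nM equals 1 fmol/uL. The library names, concentrations, and total pool amount are made up.

```python
def pooling_volumes(libraries: dict[str, tuple[float, float]],
                    total_pool_fmol: float = 100.0) -> dict[str, float]:
    """libraries maps name -> (concentration in nM, reads wanted); returns name -> volume in uL."""
    total_reads = sum(reads for _, reads in libraries.values())
    volumes = {}
    for name, (conc_nm, reads) in libraries.items():
        share_fmol = total_pool_fmol * reads / total_reads  # amount proportional to reads wanted
        volumes[name] = share_fmol / conc_nm                # nM is fmol per uL
    return volumes

# Three hypothetical libraries, each needing 50M paired reads
libs = {"lib1": (10.0, 50e6), "lib2": (5.0, 50e6), "lib3": (20.0, 50e6)}
volumes = pooling_volumes(libs)
print(volumes)
```

If a concentration is overestimated, too little of that library goes into the pool and it comes back under-sequenced, which is the practical argument for the more accurate measurement.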

Reality vs Expectations

Bioanalyzer takes 90 seconds, Qubit takes 30 seconds – within a few minutes, client will have the information needed to pool libraries based off this method (it is easier, faster, and saves them money)

qPCR takes a couple hours (1.5 to 2 hours), requires special reagents/kit (that tend to not be as cheap and require a minimum number to buy) -> although this is the better method, it is not as feasible for our clients. It’s important to give clients options, with relevant background information so they can make the best decision.

While the $15 per library QC charge would increase costs if un-pooled, we really don’t make much money on them – we provide this at a low cost (qPCR, FA, Qubit) since we realize how important the quantification step is for sequencing.

Guidelines for Submitting Pool vs. Un-pooled

When submitting libraries for our partial lane service – it is ideal to have them submitted individually – this helps us pool more accurately across the entire lane (qPCR based)
- For partial lane – in a given tube, if submitting pooled, the client should ideally have <50 indexes present and request at least 0.5Gb data per sub-library within that pool

For any sequencing workflow, the minimum amount of data per tube should be 1Gb

Certain library types cannot be mixed into a pool, and you cannot have duplicate indexes within the pool
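
The numeric guidelines above can be collected into a quick pre-submission check. This is a sketch with a hypothetical helper and made-up inputs; it covers only the numeric rules and duplicate indexes, not which library types may be mixed.

```python
def check_pooled_tube(sub_libraries: list[tuple[str, float]]) -> list[str]:
    """sub_libraries is a list of (index sequence, Gb requested); returns guideline problems."""
    problems = []
    indexes = [index for index, _ in sub_libraries]
    if len(indexes) >= 50:
        problems.append("pool should contain fewer than 50 indexes")
    if len(set(indexes)) != len(indexes):
        problems.append("duplicate indexes within the pool")
    if any(gb < 0.5 for _, gb in sub_libraries):
        problems.append("a sub-library requests less than 0.5Gb")
    if sum(gb for _, gb in sub_libraries) < 1.0:
        problems.append("tube requests less than 1Gb in total")
    return problems

ok_pool = [("ATCACG", 0.5), ("CGATGT", 0.5), ("TTAGGC", 1.0)]
bad_pool = [("ATCACG", 0.3), ("ATCACG", 0.3)]
print(check_pooled_tube(ok_pool))
print(check_pooled_tube(bad_pool))
```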

If clients are submitting individual libraries, ‘top-off’ sequencing can be arranged on a per sample basis – versus if a pooled library was submitted, the entire pool must go back on the sequencer, if additional reads are needed

Work with your RSMs to work through examples/scenarios

What is nucleotide diversity?

Nucleotide diversity refers to the relative proportion of nucleotides A, C, G and T present in every cycle of the run.

In order for template generation to occur effectively (i.e. produce high/good quality data), it is important that there is an equal proportion of all nucleotides present in a library.

Although everything we sequence ultimately consists of As, Cs, Gs, and Ts, how the bases are distributed in a library depends heavily on the library type (and sometimes the species)
- Amplicon Libraries - Have Fixed Bases
- WGBS Libraries – Have a Higher Proportion of T’s sequenced (due to the bisulfite conversion)

Those (amplicon - which have fixed bases and WGBS - which have more Ts due to bisulfite) are types of libraries that have different diversities!!!

Whole-genome bisulfite sequencing (WGBS)

3 Independent Examples of Libraries

diverse/balanced libraries
- normal WGS
- A, C, G & T present at similar % in all cycles 1, 2 & 3

low diversity libraries
- amplicon (fixed # of bases)
- single base difference per cycle

unbalanced libraries
- WGBS (more Ts due to bisulfite)
- A is absent
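
Nucleotide diversity per cycle can be sketched as the fraction of each base observed at each read position. The reads below are made up for illustration and the helper name is hypothetical; real runs are assessed by the instrument software, not by a script like this.

```python
from collections import Counter

def base_fractions_per_cycle(reads: list[str], n_cycles: int) -> list[dict[str, float]]:
    """For each cycle (read position), return the fraction of A, C, G and T across reads."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        total = sum(counts.values())
        fractions.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return fractions

balanced_reads = ["ACG", "CGT", "GTA", "TAC"]  # every base present in every cycle
amplicon_reads = ["AGT", "AGT", "AGT", "AGT"]  # fixed bases: one base per cycle

balanced = base_fractions_per_cycle(balanced_reads, 3)
amplicon = base_fractions_per_cycle(amplicon_reads, 3)
print(balanced[0])
print(amplicon[0])
```

In the balanced case each base sits at 25% in every cycle, while the amplicon case shows a single base at 100%, which is the low-diversity pattern described above.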

PhiX - A type of library created by illumina

PhiX, created by Illumina, is an index-less library generated from a bacteriophage, which has a well-balanced genome (45% GC and 55% AT), kind of like the normal WGS example from above
- It can be spiked in with low-diversity or unbalanced libraries during sequencing to help increase nucleotide diversity
- The amount spiked in depends on the type of library being sequenced and the sequencer
- It can act as a ‘control’, as we expect this library to bind to the flow cell and be sequenced
- It can help us with troubleshooting, should a client’s samples have issues during sequencing

Downside: the amount of PhiX that is spiked in during sequencing takes away from the overall data output of the lane. If we spike 25% PhiX on a NovaSeq S4 (PE150) lane, we would only expect ~600Gb of raw data from the lane (~200Gb is expected to be PhiX, which will show up in the undetermined reads)
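
The PhiX trade-off is simple arithmetic. A sketch, assuming the spike-in takes a proportional share of the lane; the ~800Gb lane total is inferred from the 600Gb + 200Gb figures above rather than stated directly, and the function name is hypothetical.

```python
def split_lane_output(lane_output_gb: float, phix_fraction: float) -> tuple[float, float]:
    """Return (Gb left for client libraries, Gb consumed by PhiX) for one lane."""
    phix_gb = lane_output_gb * phix_fraction
    return lane_output_gb - phix_gb, phix_gb

# 25% PhiX on a lane assumed to yield ~800Gb in total
client_gb, phix_gb = split_lane_output(800.0, 0.25)
print(client_gb, phix_gb)
```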

Amplicon Libraries – Fixed Bases

Your client has 4 variants he/she wants to submit. Instead of submitting gDNA for WGS, he/she is interested in sequencing a specific stretch of the genome, so he/she decides to perform a PCR, whose product will be submitted for sequencing. He/she has designed a primer that will amplify the ~300bp stretch (region of interest)
- Same primer -> PCR reaction w/ gDNA -> produces 4 amplicons
- The 4 amplicons are further PCR’ed to add the Illumina adapters (P5/P7, index, sequencing primer binding sites) -> ready for sequencing

Sequencing of Amplicon Libraries – Fixed Bases

Assume no PhiX: issues with CF%, pre-phasing/phasing, and color matrix calibration = poor data quality/no data

Many times, amplicons need very little data and are ideal candidates for partial lane, but due to their low-diversity nature we cannot accept them, but …

Image: AGTCCT is highlighted in all 4 amplicons
- cycle 1: A
- cycle 2: G
- cycle 3: T
- cycle 4: G
- cycle 5: G

Sequencing of Amplicon Libraries – Staggered/Phased Primers

We can introduce ‘random’ bases in our primers to help “shift” the fixed bases (this only works for amplicon-based preps, where we can design the primers). This can be recommended to clients: if they have already made their libraries, they would need to make them again, but many times it is not “a lot” more work, and it will allow for submission through partial lane.

Image: each read is shifted by a different number of bases, so the fixed bases line up diagonally instead of in the same cycle
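
The effect of staggering can be sketched by prepending spacers of different lengths to the same amplicon start and counting distinct bases per cycle. The spacer sequences here are fixed, made-up examples chosen so the result is reproducible; real designs would choose spacers as part of primer design.

```python
def distinct_bases_per_cycle(reads: list[str], n_cycles: int) -> list[int]:
    """Count how many different bases are seen at each cycle across the reads."""
    return [len({read[cycle] for read in reads}) for cycle in range(n_cycles)]

amplicon_start = "AGTCCT"          # fixed bases shared by every amplicon
spacers = ["", "T", "CA", "GTC"]   # hypothetical stagger of 0 to 3 bases

unstaggered = [amplicon_start for _ in spacers]
staggered = [spacer + amplicon_start for spacer in spacers]

print(distinct_bases_per_cycle(unstaggered, 3))
print(distinct_bases_per_cycle(staggered, 3))
```

Without the stagger every cycle shows one base; with it, the first cycle already shows all four.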

PhiX Recommendations

look at ppt :)

Partial Lane Sequencing - Review

A cost-effective option for sequencing when filling up a full lane is not feasible
- Buy part of a lane at a pro-rated cost
- At a certain point, recommending a full lane (on the X Plus or the 6000) will make sense, assuming the libraries are not problematic and/or can be sequenced through either workflow
- [Price of X Plus Lane]/$9.50 per Gb = Threshold for Recommendation on Full X Plus Lane
- For example: $1,799/$9.50 = ~190Gb -> assuming no special library types, duplicate indexes, etc., going with a full lane on the X Plus makes sense
- [Price of 6000 Lane]/$8.50 = ~689Gb -> assuming no special library types, duplicate indexes, etc., going with a full lane on the 6000 makes sense

You cannot request a specific amount of PhiX for partial lane sequencing – our lab will add PhiX and/or other ‘normal’ library types to balance the nucleotide diversity, based on the library type and/or notes in the contract
- If a library requires a specific amount of PhiX and can be accepted through this workflow, include notes in the opportunity on how much PhiX is needed
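
The full-lane threshold is the lane price divided by the partial lane rate per Gb. A sketch using the X Plus figures quoted above ($1,799 per lane, $9.50 per Gb); the function names are hypothetical and the check ignores special library types and duplicate indexes.

```python
def full_lane_threshold_gb(lane_price: float, partial_rate_per_gb: float) -> float:
    """Gb at which partial lane pricing reaches the price of a full lane."""
    return lane_price / partial_rate_per_gb

def recommend_full_lane(gb_needed: float, lane_price: float, partial_rate_per_gb: float) -> bool:
    """True when buying the requested Gb at the partial rate would cost at least a full lane."""
    return gb_needed >= full_lane_threshold_gb(lane_price, partial_rate_per_gb)

x_plus_threshold = full_lane_threshold_gb(1799, 9.50)
print(round(x_plus_threshold, 1))
print(recommend_full_lane(250, 1799, 9.50), recommend_full_lane(100, 1799, 9.50))
```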

Partial Lane Sequencing – Special Library Types

look at ppt for the chart :)

Some libraries on the “exceptions” list are there because they are low-diversity/have fixed bases (RRBS); other library types have historically underproduced (ChIP-Seq)

(32 cards)