week 3: WGS - key topics Flashcards
what is Whole Genome Sequencing
a tool used to examine the DNA composition of an organism and variation between organisms at single nucleotide and chromosomal levels
single nucleotide:
- SNP - single nucleotide polymorphism, when a single base pair differs
- InDel - insertion or deletion of bases in the genome of an organism
chromosomal:
- SV - structural variant, large genomic alterations
- CNV - copy number variation, when sections of the genome are repeated and the number of repeats varies between individuals
WGS service options
Human
Plant & Animal
Microbial
WGS Workflow overview
DNA Sample QC use:
Qubit:
- measures DNA concentration (dsDNA ng/ul)
- Quick and easy bench top tool
- Fluorescent dye binds to dsDNA
- More sensitive than UV absorbance
chart from QC
- sample name
- nucleic acid ID
- concentration
- volume of sample
- total amount (ug)
- sample QC results (pass or fail)
- sample QC memo (whether there was contamination or degradation)
Agarose gel:
- DNA purity and integrity
- Can visualize minor protein contamination and minor gDNA degradation
Library preparation - either PCR + or PCR-Free
Illumina sequencing - PE150 reads
Raw data - FASTQ files
Analysis - optional
sample requirements
sample type: genomic DNA
amount (Qubit): ≥ 0.2 ug
volume: ≥ 20 ul
concentration: ≥ 10 ng/ul
purity (NanoDrop or agarose gel): OD 260/280 of 1.8-2.0, with no contamination or degradation
- this is for the most common workflow (PCR + library prep with ~350bp inserts)
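The thresholds above can be checked mechanically. The following is a minimal sketch, assuming the total amount is computed as concentration x volume; the `Sample` class and function names are hypothetical, and contamination or degradation (judged from the gel or fragment trace) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record holding the numeric fields from the QC chart.
    name: str
    concentration_ng_per_ul: float
    volume_ul: float
    od_260_280: float


def total_amount_ug(sample):
    """Total DNA (ug) = concentration (ng/ul) x volume (ul) / 1000."""
    return sample.concentration_ng_per_ul * sample.volume_ul / 1000.0


def check_requirements(sample):
    """Return the list of unmet requirements; an empty list means all are met."""
    failures = []
    if total_amount_ug(sample) < 0.2:
        failures.append("amount below 0.2 ug")
    if sample.volume_ul < 20:
        failures.append("volume below 20 ul")
    if sample.concentration_ng_per_ul < 10:
        failures.append("concentration below 10 ng/ul")
    if not (1.8 <= sample.od_260_280 <= 2.0):
        failures.append("OD 260/280 outside 1.8-2.0")
    return failures
```

Note that the minimums are consistent with each other: 10 ng/ul x 20 ul = 200 ng = 0.2 ug.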
our QC report assays:
- AATI (routine) - fragment analyzer
- Qubit (alternative: used if the lab has fewer than 95 samples to test that day, or if the first AATI concentration reading is too high (>200 ng/ul) or too low (<5 ng/ul))
library prep
PCR + or PCR-free
for standard library
- Abclonal kit
- insert size: 350bp
- bias of amplification: high bias. Regions that amplify easily may show a higher duplication rate, while regions that are hard to amplify yield fewer fragments, which can affect the analysis
- coverage of genome: low uniformity
- false positive: amplification can introduce false-positive mutations
for PCR-free library
- Abclonal kit
- insert size: 350bp
- bias of amplification: low bias. There is no amplification during library prep, so the original genome information is preserved
- coverage of genome: high uniformity, esp. for high GC region, promoter, repetitive sequence
- false positive: avoids false-positive mutations caused by amplification
PCR-free is not available on X Plus at this time
use the 10B flow cell for PCR-free because the duplication rate on the 25B flow cell is higher (I guess we do not want duplicates!)
sequencing
Illumina (PE150)
- NovaSeq 6000 - we are moving away from this one :)
- NovaSeq X Plus - flow cell options: 10B & 25B
Output based on Gb of data
Raw data (FASTQ files)
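Because output is ordered in Gb and reads are PE150, the number of read pairs behind a given output can be estimated. This is a rough sketch, assuming 1 Gb = 1e9 bases and that every read is the full 150 bp; the function name is hypothetical.

```python
import math

READ_LENGTH_BP = 150  # PE150: each read is 150 bp
READS_PER_PAIR = 2    # paired-end: two reads per fragment


def read_pairs_for_output(output_gb):
    """Estimate the read pairs needed for a raw data output given in Gb."""
    if output_gb <= 0:
        raise ValueError("output_gb must be positive")
    bases = output_gb * 1_000_000_000  # assumes 1 Gb = 1e9 bases
    return math.ceil(bases / (READ_LENGTH_BP * READS_PER_PAIR))
```

For example, `read_pairs_for_output(90)` corresponds to about 300 million read pairs.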
WGS analysis
human WGS
- mapping only
- standard analysis (mapping + SNP + Indel + SV + CNV)
plant & animal WGS
- mapping only
- standard analysis (mapping + SNP + Indel)
- mapping + SV + CNV
- Mapping + SNP + Indel + SV + CNV
Microbial WGS
- Analysis (mapping + SNP + Indel + SV + CNV)
All are priced by Gb! Chart has prices :)
data calculations
coverage depth = average number of times the genome is sequenced
genome size (Gb) x desired coverage (X) = raw data needs (Gb)
3Gb x 30X = 90Gb
Common Genomes
Human: 3Gb
Mouse: 2.75Gb
C. elegans: 0.1Gb
Arabidopsis: 0.135Gb
Drosophila: 0.18Gb
Wheat: 17Gb
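The data calculation can be applied to any of the genomes listed above. This is a small sketch using the genome sizes from these notes; the dictionary keys and function names are illustrative only.

```python
# Genome sizes in Gb, copied from the "Common Genomes" list in these notes.
GENOME_SIZE_GB = {
    "human": 3.0,
    "mouse": 2.75,
    "c_elegans": 0.1,
    "arabidopsis": 0.135,
    "drosophila": 0.18,
    "wheat": 17.0,
}


def raw_data_needed_gb(genome_size_gb, coverage_x):
    """Raw data needs (Gb) = genome size (Gb) x desired coverage (X)."""
    if genome_size_gb <= 0 or coverage_x <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_gb * coverage_x


def coverage_from_data_x(raw_data_gb, genome_size_gb):
    """Inverse calculation: average coverage (X) obtained from a data amount."""
    if raw_data_gb <= 0 or genome_size_gb <= 0:
        raise ValueError("data amount and genome size must be positive")
    return raw_data_gb / genome_size_gb
```

For a human genome at 30X, `raw_data_needed_gb(GENOME_SIZE_GB["human"], 30)` gives the 90 Gb from the worked example.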
hWGS pricing
Package: latest pricing
Separated: latest pricing
Coverage guidelines
- 30X is typical researcher request
- Somatic/rare variants: 100-1000X
- Tumor vs Normal: ≥60X tumor, ≥30X normal
- Population studies: 20-50X
- De novo assembly: 100-1000X
Plant & Animal WGS
- Package: latest pricing
- Separated: latest pricing
- TAT
Microbial WGS pricing
- Package: latest pricing
- Separated: latest pricing
- TAT
Long-read sequencing options
PacBio Revio (new)
- 15-20kb, 90Gb
PacBio Sequel II CLR mode
- Long read length (15-25kb/read)
PacBio Sequel II Hifi mode
- High fidelity (accuracy) (8-10kb/read)
AATI - separates DNA fragments by size and analyzes them!
analyzes the fragments already in the sample (it does not fragment the DNA itself)
5400 fragment analyzer
high-throughput QC
96 samples per run
uses automated capillary electrophoresis to separate DNA fragments by length/size
fluorescent intercalating dye is excited by LED light source
electronic readout is provided for each sample
AATI graph
lower marker - very thin line
sample curve
- if fragment is good, then it is a smooth bell curve. This would mean that there is no major degradation, no major additional fragments, and it passes.
- If fragment is not good, then it is thinner and has bumps at the end. Spiky and not a smooth bell curve. This would mean that there is severe degradation and/or incorrect fragment size, so it fails.
- determination is by calculated percentage from smear analysis set points, not visual estimation
upper marker - a little wider curve
qubit fluorometer (alternative QC)
quick and easy benchtop tool
target-specific fluorescence (dye binds to dsDNA)
greater sensitivity than UV-absorbance methods such as NanoDrop, which sometimes cannot differentiate between DNA & RNA
provides dsDNA concentration (ng/ul)
Abclonal library preparation
fragmented DNA (initial material)
end preparation (step 1)
adaptor ligation (step 2)
size selection (step 3, optional)
amplification (step 4, optional)
Tumor-normal paired samples analysis
- For human samples: if the client is studying a tumor sample and also wants to look at the adjacent normal tissue, both samples are analyzed for SNPs, InDels, and CNVs so the differences between tumor and normal tissue can be seen. This requires a paired sample for each group; the pair is compared during analysis. Example: if you map sequencing data from a client's sample to the human reference genome, you will see variants relative to the reference, but are they germline or specific to the tumor tissue? Distinguishing germline from somatic variants is what the tumor-normal paired-sample human standard analysis is for.
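The germline versus somatic distinction can be illustrated with simple set logic. This is a simplified sketch only: real somatic callers use statistical models rather than exact set differences, and the `Variant` type and function name here are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal variant record: position plus reference/alternate allele.
    chrom: str
    pos: int
    ref: str
    alt: str


def split_germline_somatic(tumor_calls, normal_calls):
    """Split tumor variant calls using the paired normal sample.

    Variants seen in both samples are treated as germline; variants seen
    only in the tumor are candidate somatic variants.
    """
    tumor = set(tumor_calls)
    normal = set(normal_calls)
    return {
        "germline": tumor & normal,
        "somatic_candidates": tumor - normal,
    }
```

Without the paired normal sample, both groups would simply appear as differences from the reference genome.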
de novo WGS Survey
If reference genome is not available, clients can purchase de novo “survey” sequencing & analysis for Plant & Animal WGS or Microbial WGS
Offers a preliminary global view of the genome
Sometimes paired with Long Read services for more complete assembly
Recommended depth ≥ 50X
Which platform to choose?
X Plus - the highest-throughput of the short-read NGS platforms
We recently started offering X Plus pricing after successful platform installation
Quality scores of data are comparable
Lower pricing for X Plus
Sometimes clients choose to stick with 6000 for platform continuity between projects