week 3: WGS - key topics Flashcards
what is Whole Genome Sequencing
a tool used to examine the DNA composition of an organism and variation between organisms at single nucleotide and chromosomal levels
single nucleotide:
- SNP - single nucleotide polymorphisms, when one base pair is different
- InDel - insertion or deletion of bases in the genome of an organism
chromosomal:
- SV - structural variant, large genomic alterations
- CNV - copy number variation, when sections of the genome are repeated and the number of repeats varies between individuals
WGS service options
Human
Plant & Animal
Microbial
WGS Workflow overview
DNA Sample QC use:
Qubit:
- measures DNA concentration (dsDNA ng/ul)
- Quick and easy bench top tool
- Fluorescent dye binds to dsDNA
- More sensitive than UV absorbance
chart from QC
- sample name
- nucleic acid ID
- concentration
- volume of sample
- total amount (ug)
- sample QC results (pass or fail)
- sample QC memo (whether there was contamination or degradation)
Agarose gel:
- DNA purity and integrity
- Can visualize minor protein contamination and minor gDNA degradation
Library preparation - either PCR + or PCR-Free
Illumina sequencing - PE150 reads
Raw data - FASTQ files
Analysis - optional
sample requirements
have sample type (so genomic DNA)
amount (Qubit): more than or equal to 0.2 ug
volume: more than or equal to 20 ul
concentration: more than or equal to 10 ng/ul
purity (NanoDrop or agarose gel): OD 260/280 of 1.8-2.0, with no contamination or degradation
- this is for the most common workflow (PCR + library prep with ~350bp inserts)
our QC report assays:
- AATI (routine) - fragment analyzer
- Qubit (alternative, used when the lab has <95 samples/day to test, or when the first AATI concentration reading is too high (>200 ng/ul) or too low (<5 ng/ul))
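The numeric sample requirements above can be sketched as a quick check. This is only an illustration: the function name and the returned labels are hypothetical, and contamination or degradation still has to be judged from the gel or fragment analyzer trace.

```python
from typing import List

# Thresholds copied from the card (PCR+ library prep, ~350 bp inserts).
MIN_AMOUNT_UG = 0.2
MIN_VOLUME_UL = 20.0
MIN_CONC_NG_PER_UL = 10.0
OD_RANGE = (1.8, 2.0)


def total_amount_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """ng/ul * ul gives ng; divide by 1000 to get ug."""
    return conc_ng_per_ul * volume_ul / 1000.0


def failed_requirements(conc_ng_per_ul: float, volume_ul: float,
                        od_260_280: float) -> List[str]:
    """Return the requirements a sample misses (empty list = passes)."""
    problems = []
    if total_amount_ug(conc_ng_per_ul, volume_ul) < MIN_AMOUNT_UG:
        problems.append("amount")
    if volume_ul < MIN_VOLUME_UL:
        problems.append("volume")
    if conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        problems.append("concentration")
    if not (OD_RANGE[0] <= od_260_280 <= OD_RANGE[1]):
        problems.append("purity")
    return problems
```

For example, a sample at 25 ng/ul in 30 ul holds 0.75 ug. With an OD 260/280 of 1.85 it meets every numeric requirement.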
library prep
PCR + or PCR-free
for standard library
- Abclonal kit
- insert size: 350bp
- bias of amplification: high bias. Regions that amplify easily may show a higher duplication rate, while regions that are hard to amplify yield fewer fragments, which can affect the analysis
- coverage of genome: low uniformity
- false positive: amplification can introduce false positive mutations
for PCR-free library
- Abclonal kit
- insert size: 350bp
- bias of amplification: low bias, since there is no amplification during library prep. The original genome information is preserved
- coverage of genome: high uniformity, esp. for high GC regions, promoters, and repetitive sequences
- false positive: avoids false positive mutations caused by amplification
PCR free not available on Xplus at this time
use 10B flow cell for PCR-Free because duplication rate on 25B flow cell is higher (I guess we do not want duplications!)
sequencing
Illumina (PE150)
- NovaSeq 6000 - moving away from this platform :)
- NovaSeq X Plus - flow cell options: 10B & 25B
Output based on Gb of data
Raw data (FASTQ files)
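Since output is ordered in Gb and reads are PE150 (two 150 bp reads per fragment), a rough conversion from Gb to read pairs can be sketched as follows. The function name is hypothetical, and the result is an approximation that ignores adapter trimming and filtered reads.

```python
READ_LENGTH_BP = 150  # PE150: each read is 150 bp
READS_PER_PAIR = 2    # paired-end: two reads per fragment


def read_pairs_for_output(output_gb: float) -> float:
    """Approximate number of read pairs that make up output_gb of raw data."""
    bases = output_gb * 1e9  # 1 Gb = 1e9 bases
    return bases / (READ_LENGTH_BP * READS_PER_PAIR)
```

Under these assumptions, 90 Gb of PE150 data corresponds to roughly 300 million read pairs.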
WGS analysis
human WGS
- mapping only
- standard analysis (mapping + SNP + Indel + SV + CNV)
plant & animal WGS
- mapping only
- standard analysis (mapping + SNP + Indel)
- mapping + SV + CNV
- Mapping + SNP + Indel + SV + CNV
Microbial WGS
- Analysis (mapping + SNP + Indel + SV + CNV)
All are priced by Gb! Chart has prices :)
data calculations
coverage depth = average number of times the genome is sequenced
genome size (Gb) x desired coverage (X) = raw data needs (Gb)
3Gb x 30X = 90Gb
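The formula above, and its inverse for working out coverage from a fixed amount of data, can be sketched in a few lines. Function names are illustrative only.

```python
def raw_data_gb(genome_size_gb: float, coverage_x: float) -> float:
    """Raw data needed (Gb) = genome size (Gb) x desired coverage (X)."""
    return genome_size_gb * coverage_x


def coverage_from_data(raw_gb: float, genome_size_gb: float) -> float:
    """Average coverage (X) achieved by raw_gb of data on a given genome."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return raw_gb / genome_size_gb
```

For a 3 Gb genome, `raw_data_gb(3, 30)` gives the 90 Gb from the card, and `coverage_from_data(90, 3)` gives back 30X.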
Common Genomes
Human: 3Gb
Mouse: 2.75Gb
C. elegans: 0.1Gb
Arabidopsis: 0.135Gb
Drosophila: 0.18Gb
Wheat: 17Gb
hWGS pricing
Package: latest pricing
Separated: latest pricing
Coverage guidelines
- 30X is typical researcher request
- Somatic/rare variants: 100-1000X
- Tumor vs Normal: ≥60X tumor, ≥30X normal
- Population studies: 20-50X
- De novo assembly: 100-1000X
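Combining these guidelines with the data calculation (genome size x coverage), the minimum data for a project can be estimated roughly as below. The dictionary keys are hypothetical labels, and each range on the card is reduced to its lower bound, so treat the results as a floor, not a recommendation.

```python
# Minimum coverage (X) per application, lower bound of each guideline.
MIN_COVERAGE_X = {
    "typical": 30,
    "somatic_rare_variants": 100,
    "tumor": 60,
    "normal": 30,
    "population": 20,
    "de_novo_assembly": 100,
}


def min_data_gb(application: str, genome_size_gb: float) -> float:
    """Minimum raw data (Gb) for one sample of the given application."""
    return MIN_COVERAGE_X[application] * genome_size_gb


def tumor_normal_pair_gb(genome_size_gb: float) -> float:
    """Minimum raw data (Gb) for one tumor sample plus its matched normal."""
    return (min_data_gb("tumor", genome_size_gb)
            + min_data_gb("normal", genome_size_gb))
```

For a human genome (3 Gb), a tumor/normal pair needs at least 180 Gb + 90 Gb = 270 Gb.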
Plant & Animal WGS
- Package: latest pricing
- Separated: latest pricing
- TAT
Microbial WGS pricing
- Package: latest pricing
- Separated: latest pricing
- TAT
Long-read sequencing options
PacBio Revio (new)
- 15-20kb, 90Gb
PacBio Sequel II CLR mode
- Long read length (15-25kb/read)
PacBio Sequel II Hifi mode
- High fidelity (accuracy) (8-10kb/read)
AATI - separates DNA by size and analyzes them!
analyzes DNA fragments; it does not fragment the DNA itself
5400 fragment analyzer
high-throughput QC
96 samples per run
uses automated capillary electrophoresis to separate DNA fragments by length/size
fluorescent intercalating dye is excited by LED light source
electronic readout is provided for each sample
AATI graph
lower marker - very thin line
sample curve
- if fragment is good, then it is a smooth bell curve. This would mean that there is no major degradation, no major additional fragments, and it passes.
- If fragment is not good, then it is thinner and has bumps at the end. Spiky and not a smooth bell curve. This would mean that there is severe degradation and/or incorrect fragment size, so it fails.
- determination is by calculated percentage from smear analysis set points, not visual estimation
upper marker - a little wider curve
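To illustrate what "calculated percentage from smear analysis set points" means, here is a simplified sketch. It assumes the trace is available as (size in bp, signal) points with the marker peaks already excluded. The function name, the data layout, and any set points passed in are assumptions, not the instrument software's actual method, which integrates area under the curve.

```python
from typing import List, Tuple


def percent_in_range(trace: List[Tuple[float, float]],
                     low_bp: float, high_bp: float) -> float:
    """Percent of total signal that falls between two size set points.

    trace: (size_bp, signal) points from an electropherogram.
    low_bp / high_bp: inclusive set points chosen by the analyst.
    """
    total = sum(signal for _, signal in trace)
    if total <= 0:
        raise ValueError("trace has no signal")
    inside = sum(signal for size, signal in trace
                 if low_bp <= size <= high_bp)
    return 100.0 * inside / total
```

A degraded sample would show a lower percentage in the expected size window because more of its signal sits at small fragment sizes.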
qubit fluorometer (alternative QC)
quick and easy benchtop tool
target-specific fluorescence (dye binds to dsDNA)
more sensitive than UV-absorbance methods such as NanoDrop, which cannot always distinguish DNA from RNA
provides dsDNA concentration (ng/ul)
Abclonal library preparation
fragmented DNA (initial material)
end preparation (step 1)
adaptor ligation (step 2)
size selection (step 3, optional)
amplification (step 4, optional)