week 3: WES Flashcards

1
Q

purpose of training - whole exome sequencing

A

learn more

give more solutions to customers

increase sales opportunity

2
Q

central dogma of biology

A

DNA -> transcription -> pre-mRNA -> processing (splicing, polyadenylation) -> mRNA -> translation (at the ribosome) -> protein

WES/WGS variant analyses are performed on DNA

mRNA-seq expression analysis is performed on mRNA

3
Q

Exon vs intron

A

exon - a segment of a DNA or RNA molecule containing coding info. for a protein or peptide sequence

intron - a segment of a DNA or RNA molecule which does not code for protein and interrupts the sequence of genes

so introns are usually spliced out because they do not code for a protein

4
Q

service overview - review

A

The exome is composed of exons within the genome, the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing and contribute to the final protein product encoded by that gene

1-2% of the genome

Can be done with any species, but a capture kit must exist.

Novogene only performs WES for mouse and human. We use Agilent for both

research use only (RUO)
Agilent SureSelect V6 58M for Human
Agilent SureSelect Mouse All Exon for Mouse

Clinical
Human Clinical WES (CAP/CLIA)
- US CLIA Certified (Agilent SureSelect V6 58M)
- US CLIA Certified (Twist V2)
- China CAP Certified (IDT V1, XGen Panel)

Lab Locations:
Human WES -> China Lab & U.S. Lab (preferred)
Mouse WES -> China Lab only

5
Q

Talking Points
WES vs. WGS

A

Whole exome sequencing cannot:
- Look at introns
- Assess epistatic interactions (gene-gene interactions)
- Detect structural variants (SV, CNV), i.e. chromosome-level changes

When whole genome sequencing is better:
- The client is looking for novel mutations
- The client needs more uniform sequencing coverage

Because the target region is much smaller, exome sequencing allows higher sequencing coverage at a lower cost

6
Q

Workflow

A

Starting with genomic DNA, samples are sheared resulting in small DNA fragments

Libraries are prepared with Illumina compatible adapters and indices

Biotinylated cRNA baits are incubated with the library for 16 hours

Targeted regions are selected using magnetic streptavidin beads

Targeted regions are amplified, producing a sequencing-ready library

Sequence on NovaSeq 6000

7
Q

Sample Requirements

A

human WES
for genomic DNA
- amount (Qubit): ≥ 300ng
- volume: ≥ 15μL
- concentration: ≥ 15ng/μL
- purity: OD 260/280 = 1.8-2.0, no degradation, no contamination
for FFPE
- amount (Qubit): ≥ 400ng
- volume: ≥ 20μL
- concentration: ≥ 20ng/μL
- purity: fragments longer than 1000bp
for cfDNA/ccDNA
- amount (Qubit): ≥ 35ng
- volume: ≥ 20μL
- concentration: ≥ 0.5ng/μL
- purity: fragments of 170bp or its multiples, no genomic DNA contamination

mouse WES
for genomic DNA
- amount (Qubit): ≥ 300ng
- volume: ≥ 15μL
- concentration: ≥ 15ng/μL
- purity: OD 260/280 = 1.8-2.0, no degradation, no contamination
for FFPE
- amount (Qubit): ≥ 400ng
- volume: ≥ 20μL
- concentration: ≥ 20ng/μL
- purity: fragments longer than 1000bp
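
When qualifying a project, the quantity minimums listed above can be checked mechanically. The sketch below is illustrative only: the names `Minimums`, `HUMAN_WES_MINIMUMS` and `unmet_requirements` are made up for this example, and it covers only amount, volume and concentration for human WES. The purity criteria (OD ratio, fragment length, contamination) still need to be judged from the QC report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minimums:
    amount_ng: float
    volume_ul: float
    conc_ng_per_ul: float


# Human WES minimums copied from the list above (names are hypothetical).
HUMAN_WES_MINIMUMS = {
    "gDNA": Minimums(amount_ng=300, volume_ul=15, conc_ng_per_ul=15),
    "FFPE": Minimums(amount_ng=400, volume_ul=20, conc_ng_per_ul=20),
    "cfDNA": Minimums(amount_ng=35, volume_ul=20, conc_ng_per_ul=0.5),
}


def unmet_requirements(sample_type, amount_ng, volume_ul, conc_ng_per_ul):
    """Return the quantity requirements the sample fails (empty list = pass)."""
    minimums = HUMAN_WES_MINIMUMS[sample_type]
    measured = {
        "amount_ng": amount_ng,
        "volume_ul": volume_ul,
        "conc_ng_per_ul": conc_ng_per_ul,
    }
    return [
        f"{name}: {value} is below the minimum of {getattr(minimums, name)}"
        for name, value in measured.items()
        if value < getattr(minimums, name)
    ]


# Hypothetical gDNA sample: 20 uL at 12.5 ng/uL = 250 ng in total.
for problem in unmet_requirements("gDNA", amount_ng=250, volume_ul=20, conc_ng_per_ul=12.5):
    print(problem)
```

An empty list means the sample meets all three quantity minimums for its type.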

8
Q

Analysis Pipeline

A
  1. Data quality control: filtering reads containing adapter or with low quality
  2. Alignment with reference, statistics of sequencing depth and coverage
  3. SNP and InDel calling, annotation and statistics
  4. Somatic variant detection (applies only to tumor-normal paired samples)
    - SNP calling, annotation and statistics
    - InDel calling, annotation and statistics
    - CNV calling, annotation and statistics
9
Q

coverage
Data Output – Raw vs. On-Target

A

6Gb/sample = 50X

12Gb/sample = 100X

How many Gb of data are needed to get 200X coverage?
data amount (Gb) = coverage (X) * 0.12, so 200X = 24Gb

Because of the lack of uniformity in WES capture, the raw coverage of 100x does not guarantee that each exon will have 100x.

If the client requires a MINIMUM of 100x, you must encourage them to sequence to a higher depth (usually to 150x)
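
The rule of thumb on this card can be written as a small helper for quick quoting. This is only a sketch: the function name `data_needed_gb` and the constant `GB_PER_X` are illustrative, and the 0.12 factor is taken directly from the formula above.

```python
GB_PER_X = 0.12  # from the card: data amount (Gb) = coverage (X) * 0.12


def data_needed_gb(coverage_x):
    """Raw data (Gb) per sample for a requested average coverage."""
    if coverage_x <= 0:
        raise ValueError("coverage must be positive")
    return round(coverage_x * GB_PER_X, 2)


for target in (50, 100, 150, 200):
    print(f"{target}X -> {data_needed_gb(target)} Gb/sample")
```

For a client who needs a minimum of 100X, the 150X row is the one to quote from, following the recommendation above.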

10
Q

Human Whole Exome Sequencing Pricing

A
11
Q

Mouse Whole Exome Sequencing

A
12
Q

Quotation Checklist

A

PI Name/Name for Quote:
Species:
Sample Number:
Coverage:
Material Provided:
Bioinformatics Analysis: Yes/No

13
Q

Project Design

A

FFPE Samples -
Higher depth of coverage recommended due to low quality of DNA

Whole exome sequencing is species specific

Cannot use different reference genomes – analysis is based upon the specific reference used to design the capture probes

For paired tumor comparisons (e.g. CNV calling), a matched normal sample is needed

14
Q

what is WES

A

Whole Exome Sequencing (WES) is a high-throughput genetic sequencing technique used to capture and analyze the protein-coding regions of an individual’s genome, known as the exome. WGS, by contrast, sequences the whole genome, which includes the exome as well as introns and other non-coding regions

The exome constitutes only about 1-2% of the entire human genome, but it contains approximately 85% of the known disease-causing genetic variants, making it a cost-effective approach for identifying genetic variations that may be associated with diseases, particularly rare genetic disorders and cancers.

15
Q

WES vs. WGS

A

WES takes a targeted approach that allows deep sequencing of specific genetic regions, reducing the amount of data generated and therefore the cost compared to WGS

WES
- Scope: WES targets and sequences the protein-coding regions of the genome, known as exons, which make up about 1-2% of the entire genome.
- Coverage: It does not capture or analyze non-coding regions of the genome, including introns, intergenic regions, and regulatory elements.
- Cost: WES is generally more cost-effective compared to WGS since it focuses on a smaller portion of the genome. The reduced data volume can result in lower sequencing and analysis costs. (~$200-$300)
- Application: commonly used in clinical settings to identify genetic mutations associated with various diseases, especially rare genetic disorders

WGS
- Scope: WGS sequences the entire genome, including exons, introns, intergenic regions, and regulatory elements.
- Coverage: It provides comprehensive coverage of the entire genome, making it suitable for detecting variations in both coding and non-coding regions.
- Cost: WGS is more expensive than WES due to the larger amount of data generated and the broader scope of sequencing. ($400-900)
- Application: WGS is suitable for a wide range of applications, including identifying coding and non-coding variants, structural variations, and copy number variations and is more suitable for population genetic studies

16
Q

Project Workflow Overview

A

sample prep
(sample QC)

library prep
(library QC)

sequencing
(data QC)

Bioinformatics analysis

17
Q

Novogene Workflow
- DNA QC (both human and mouse)

A

Accepted Starting Material:
gDNA
- Amount: ≥ 0.3μg (300ng)
Need 400ng if gDNA is extracted from FFPE samples (Fragments should be longer than 1,000bp)
- Volume: ≥ 20μL
- Concentration: ≥ 20 ng/μL
- OD260/280 = 1.8- 2.0
- No degradation, no contamination

Main Method: Quantification & Qualification – AATI Fragment Analyzer
- Measures sample purity, integrity, amount, and concentration

Backup: Quantification (Qubit) + Qualification (1% Agarose Gel Electrophoresis)
Davis lab will continue manual agarose gel testing/Qubit for all DNA library types when there is insufficient sample volume (significantly less than 95 samples/day)

18
Q

DNA QC (WES vs. WGS)

A

Accepted Starting Material (WES)
gDNA
Amount: ≥ 0.3μg (300ng)
Need 400ng if gDNA is extracted from FFPE samples (Fragments should be longer than 1,000bp)
Volume: ≥ 20μL
Concentration: ≥ 20 ng/μL
OD260/280 = 1.8- 2.0
No degradation, no contamination

Accepted Starting Material (WGS)
gDNA
Amount: ≥ 0.2μg (200ng)
Need 1.2μg for PCR Free workflow (only available on the NovaSeq6000)
Need 400ng if gDNA is extracted from FFPE samples (Fragments should be longer than 1,000bp)
Volume: ≥ 20μL
Concentration: ≥ 10 ng/μL
OD260/280 = 1.8- 2.0
No degradation, no contamination

NOTE: WES pipelines require ~100ng more starting material than WGS. This is sometimes a limiting factor for projects because the client may not have that much DNA

19
Q

Novogene Workflow

A

Starting with genomic DNA, samples are sheared resulting in small DNA fragments

Libraries are prepared with Illumina-compatible adapters and indices
(at this stage they are fully prepped WGS libraries)

Hybridization: Biotinylated RNA baits are incubated with the library for 16 hours

Capture: Targeted regions are selected using magnetic streptavidin beads

Wash away the biotin and digest RNA

Targeted regions are amplified, producing a sequencing-ready library

20
Q

Library Prep and Calculating Coverage

A

Agilent SureSelect V6 58M Library Prep Kit is a ~60Mb panel. Capture efficiency (productivity) is approx. 50%, so we account for this by sequencing at a 2X factor, e.g. 60Mb x 50X x 2 = 6Gb

50X coverage = 6Gb
100X Coverage = 12Gb
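
The figures above follow from the panel size and the capture efficiency. The sketch below shows that arithmetic, assuming the approximate values quoted on this card (60Mb target, 50% efficiency); the function and constant names are illustrative.

```python
TARGET_SIZE_MB = 60        # approximate panel size from the card
CAPTURE_EFFICIENCY = 0.5   # approx. 50% of the sequenced data is on target


def raw_data_gb(coverage_x, target_mb=TARGET_SIZE_MB, efficiency=CAPTURE_EFFICIENCY):
    """Raw data (Gb) needed so that on-target data reaches the requested coverage."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    on_target_mb = coverage_x * target_mb   # data that must land on the target
    return on_target_mb / efficiency / 1000  # scale up for off-target loss, Mb -> Gb


for coverage in (50, 100):
    print(f"{coverage}X -> {raw_data_gb(coverage)} Gb")
```

With these inputs each 1X of coverage costs 60 / 0.5 / 1000 = 0.12 Gb, which is where the 6Gb and 12Gb figures come from. A kit with a different target size or efficiency would change the factor.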

21
Q

WES vs WGS

A

WES
- selection (capture)
all exons of all known genes (1.5-2% of all human DNA)
- variable read depth at boundaries
- greater sequencing depth
- more cost effective

WGS
- no selection
- entire DNA analyzed including introns, RNA genes, etc
- moderate read depth
- similar read depth across the genome
- can identify copy number variants, repeat expansions
- higher price

Note the increased uniformity when using WGS vs. WES

The reason for this is the way the exons are “baited”. There is inherent bias based on the varying complexities of the exonic regions (GC content, and the presence of repetitive sequences etc.)

Uniformity = the evenness with which different regions of a genome are sequenced.

22
Q

Sample Requirements

A

Selective targeting of exons inherently introduces bias because not all regions will be selected for equally

GC-Rich Regions:
Genomic regions with high GC (guanine-cytosine) content can be challenging to sequence accurately. WES may result in lower coverage in GC-rich exons because high GC content can lead to stronger binding and potential secondary structures that hinder probe access, while low GC content may result in weaker binding and less efficient capture.

Repetitive Elements/ Low-Complexity Regions
WES may struggle to accurately capture and sequence regions containing repeats, leading to lower coverage and potential alignment issues.
In regions with very high similarity, sequences can collapse into a single representation
It can also be difficult to design specific probes that bind uniquely to these regions without cross-hybridizing to similar sequences elsewhere in the genome

Sales Application: be cautious when qualifying projects. If clients need at LEAST 50X coverage of all exons, it may be worthwhile to quote them 100X coverage, knowing that the reads are not always equally distributed.
- with 50X coverage, some regions may get 70X coverage, and others get 25X coverage
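
To make the uniformity point concrete, the sketch below counts how many regions reach a required minimum depth. The per-exon depths are invented for illustration (they average 50X and span the 25X to 70X range mentioned above), and the doubling step assumes that twice the data doubles the depth of every region, which is an idealisation.

```python
def fraction_at_or_above(depths, min_depth):
    """Fraction of regions whose depth meets the required minimum."""
    if not depths:
        raise ValueError("need at least one region")
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-exon depths for a sample sequenced to an average of 50X.
depths_at_50x = [70, 62, 55, 48, 40, 25]
# Idealised assumption: doubling the data doubles the depth of every region.
depths_at_100x = [2 * d for d in depths_at_50x]

print(fraction_at_or_above(depths_at_50x, 50))   # share of exons at >= 50X
print(fraction_at_or_above(depths_at_100x, 50))  # same check after quoting 100X
```

In this made-up example only half of the exons reach 50X at a 50X average, while all of them do at a 100X average.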

23
Q

Analysis Content
WES vs WGS

A

WES
- Data quality control: filtering reads containing adapter or with low quality
- Alignment with reference, statistics of sequencing depth and coverage
- Variant (SNP and InDel) calling, annotation and statistics
- Somatic Variant (paired tumor samples) SNP/InDel/CNV calling, annotation and statistics

WGS
- Data quality control: filtering reads containing adapter or with low quality
- Alignment to reference genome; statistics of sequencing depth and coverage
- Variant (SNP, CNV, InDel and SV) calling, annotation and statistics
- Somatic variant (paired tumor samples) detection
  - SNP calling, annotation and statistics
  - CNV calling, annotation and statistics
  - InDel calling, annotation and statistics
  - SV calling, annotation and statistics

24
Q

Analysis Pipeline

A
  1. Data quality control: filtering reads containing adapter or with low quality
  2. Alignment with reference, statistics of sequencing depth and coverage
  3. SNP and InDel calling, annotation and statistics
  4. Somatic variant detection (applies only to tumor-normal paired samples)
    - SNP calling, annotation and statistics
    - InDel calling, annotation and statistics
    - CNV calling, annotation and statistics

FAQs
Why does CNV detection require paired samples for a WES project?
The probes used in WES have different specificities for different regions, and this preference will affect the detection of CNV. As a result, control samples are needed to establish a baseline.
Why can’t SV be detected in a WES project?
SV size is usually 1kb-3Mb, while a WES project is limited to the capture region, which makes it unsuitable for SV detection.

25
Q

WES vs. WGS

A

WES analysis falls short of WGS when the researcher is interested in…
- Rare/Novel Variants
- Structural Variant (SV) Detection (e.g., large insertions, deletions, inversions, translocations)
  - Requires a comprehensive view of the genome (coding and non-coding variants, structural variations, and copy number variations)

26
Q

Somatic Variant Calling

A

Somatic Variant Calling = Comparing germline cells to tumor cells (Tumor / Normal Pairs)

In lieu of comparing cancer cells to the reference genome, compare them to normal cells from the same person
- Germline mutations = variants you have had since birth (inherited via germ cells)

Sales Application: We often need higher coverage for the “case” (cancer) sample, since mutated genomes carry much more variation and are therefore more complex.
- i.e. >100X coverage for case sample [tumor] & 50X coverage for control [normal]
Compared to germline mutations in genetic diseases, somatic mutations in tumor tissue samples are present at lower frequency and require a higher sequencing depth. This is partly due to the low percentage of tumor cells in tumor tissues, and partly due to the fact that mutations arising in the later stages of cancer development are present in only a small number of tumor cells. High-depth sequencing is used to detect as many mutations as possible that are associated with cancer development.

27
Q

hWES price list

A
28
Q

Data Output

A

6Gb/sample = 50X
12Gb/sample = 100X

A client wants to send 25 samples for WES. They are sending gDNA isolated from human lymph cells and are requesting 150X coverage per sample. They also need help with analysis. How much will this project cost per sample and in total (US lab)?

Solution:
150x Coverage = 18Gb

Package Price + Extra Data

12Gb package price, WBI ($219) + extra data (extra 6Gb X $7.50 = $45) = $264/sample

$264 X 25 samples = $6,600 total project
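
The same calculation can be sketched in code for other sample counts or coverage targets. The prices and package size are the ones used in this worked example and may not match the current price list; the function name `quote` and the constants are illustrative.

```python
PACKAGE_GB = 12             # package size used in the example
PACKAGE_PRICE_USD = 219.00  # 12Gb package price, WBI, from the example
EXTRA_GB_PRICE_USD = 7.50   # price per extra Gb, from the example
GB_PER_X = 0.12             # implied by 6Gb = 50X and 12Gb = 100X


def quote(samples, coverage_x):
    """Return (price per sample, project total) in USD."""
    gb_per_sample = coverage_x * GB_PER_X
    # Only data beyond the package size is billed as extra.
    extra_gb = max(0.0, gb_per_sample - PACKAGE_GB)
    per_sample = PACKAGE_PRICE_USD + extra_gb * EXTRA_GB_PRICE_USD
    return round(per_sample, 2), round(per_sample * samples, 2)


per_sample, total = quote(samples=25, coverage_x=150)
print(f"${per_sample:,.2f}/sample, ${total:,.2f} total")
```

This sketch assumes the 12Gb package is the base for every request, so coverage at or below 100X is simply priced at the package rate.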

29
Q

mouse price list

A
30
Q
A