Module 7.2 Cancer Genome Sequencing 1 Flashcards
Tissue Biopsy
cancer genome sequencing
features (3)
- direct sequencing of archived tumor tissues, tumor microenvironments (connective tissue cells), or cell-free DNA samples from blood
- multiple sequencing types (targeted, RNA, epigenetic, microbial)
- goal is to assemble a parts list of cancer
cancer genome sequencing
parts
genetic structures including both DNA and RNA that are altered in cancer
The Cancer Genome Atlas
(TCGA)
- landmark cancer genomics program
- molecularly characterized over 20,000 primary cancers and matched normal samples spanning different cancer types
- genomic, epigenomic, transcriptomic and proteomic data publicly available through Genomic Data Commons portal
tissue biopsy
sampling methods (3)
- excision biopsy: entire lump or suspicious area (maybe some healthy tissue from same area) is removed
- incisional biopsy: small cut is made into area of abnormal tissue and small sample is removed
- needle biopsy: sample of tissue or fluid is removed with a needle
- wide needle: core biopsy
- thin needle: fine needle biopsy
- tumor purity as low as 10-20%
tissue biopsy
storage methods (2)
- Formalin-fixed paraffin-embedded (FFPE)
- Fresh frozen (FF)
tissue storage method
Formalin-fixed paraffin-embedded
(FFPE)
benefits and drawbacks
Benefits
- most common sources of archived materials
- can be stored in a cabinet at room temperature
- cheap to create
- can be stable for very long time
- most available types of samples for tumor sequencing
Drawbacks
- DNA fragmentation and formalin-induced DNA damage lead to sequencing artifacts
tissue storage method
Formalin-fixed paraffin-embedded
(FFPE)
features
- formalin and wax preserve fragile structures inside and between the cells in tissue
- Proteins preserved in denatured form
- Nucleic acid can be isolated but not preserved very well and may not be ideal for molecular analysis
tissue storage method
Fresh frozen
(FF)
benefits and drawbacks
Benefits
- works very well for molecular genetic analysis
- better if dipped in liquid nitrogen (flash freezing) and stored at -80 °C
Drawbacks
- surgeons may not have access to liquid nitrogen for flash freeze
- biobanks have smaller frozen tissue collections
FFPE
Extraction process
- Paraffin blocks containing tumor are cut with a microtome into thin slices (5-10 micrometers)
- One slice is mounted on a glass slide and stained with H&E to confirm the presence of tumor tissue
- pathologist estimates percent tumor content in the tissue by counting the percentage of nuclei that belong to cancer cells, based on cell pathology
- If the tumor fraction is low (<20%), microdissection of tumor can be performed by superimposing each unstained slice on the H&E template to enrich for tumor content. If it is high, microdissection is not needed.
- dissected areas are deparaffinized and used for DNA extraction
sequencing targets
whole genome
- most expensive
- whole coverage but limited depth
- hard to detect variants in small fraction of cells
whole exome
- only protein-coding genes
- sequencing depths of 100-200x
targeted
- thousands of X coverage
- may not capture large structural variants with high sensitivity
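The depth versus breadth trade-off above follows from simple arithmetic: mean depth is roughly the number of sequenced bases divided by the size of the target. The sketch below is only illustrative. The target sizes (about 3,100 Mb for a genome, 35 Mb for an exome, 1 Mb for a panel) and the 30x whole-genome depth are assumptions, not values from this module, and the calculation ignores off-target reads and duplicates.

```python
# Illustrative target sizes in megabases (assumed values, not from the flashcards).
TARGET_SIZE_MB = {
    "whole genome": 3100.0,
    "whole exome": 35.0,
    "targeted panel": 1.0,
}


def mean_depth(total_bases: float, target_size_mb: float) -> float:
    """Mean depth = sequenced bases / target size (idealized)."""
    return total_bases / (target_size_mb * 1e6)


def bases_needed(depth: float, target_size_mb: float) -> float:
    """Bases that must be sequenced to reach a mean depth over the target."""
    return depth * target_size_mb * 1e6


# Assumed example depths per strategy (exome and panel values follow the cards).
for name, depth in [("whole genome", 30), ("whole exome", 150), ("targeted panel", 2000)]:
    gigabases = bases_needed(depth, TARGET_SIZE_MB[name]) / 1e9
    print(f"{name}: {depth}x needs about {gigabases:.2f} Gb of sequence")
```

Under these assumptions the whole genome needs far more sequence for far less depth, which is why it is the most expensive option and the weakest at detecting variants present in a small fraction of cells.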
cancer genomic sequencing
workflow
- Extract DNA and convert to sequencing library
- Perform paired-end whole-genome sequencing (WGS)
- Assess QC metrics
matched normal
- normal samples and non-cancer cells originating in same tissue from same patient
- can also use patient’s blood sample (typically white blood cells)
Quality Control
Pre-alignment
metrics (6)
- % duplicate reads
- Base quality scores
- % Reads aligned
- % Paired GC content
- Insert size distribution
- PCR duplicates
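Some of the metrics above can be computed directly from raw reads. The following is a minimal sketch, assuming 4-line FASTQ records with Phred+33 quality encoding; the function names are hypothetical, and duplicates are approximated here as reads with an identical sequence. Metrics such as % reads aligned and insert size need an alignment and are not covered.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality_string) from 4-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def duplicate_fraction(seqs: List[str]) -> float:
    """Fraction of reads that repeat an already-seen sequence (naive)."""
    counts = Counter(seqs)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(seqs) if seqs else 0.0


example = """
@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
GGCC
+
!!!!
"""
records = list(parse_fastq(example))
for seq, qual in records:
    print(seq, gc_fraction(seq), mean_phred(qual))
print("duplicate fraction:", duplicate_fraction([s for s, _ in records]))
```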
Quality Control
Post-alignment
sources of mapping errors (6)
- inappropriate reference genome
- polymorphisms
- sequencing errors
- segmental duplications
- repetitive sequences
- incomplete reference genome
Factors affecting observed VAF
3
1. Tumor purity (Tumor fraction in tissue sample- somatic)
2. Intra-tumor heterogeneity (different subclones or wild type normal cells within tumor)
3. Copy number (at locus)
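The three factors above can be combined into one expected-VAF expression. This is a simplified sketch, not a definitive model: it assumes a single mutated subclone, a known number of mutant copies per mutated cell, and diploid normal cells. All parameter names are hypothetical.

```python
def expected_vaf(
    purity: float,
    cancer_cell_fraction: float,
    mutant_copies: int = 1,
    tumor_copy_number: int = 2,
    normal_copy_number: int = 2,
) -> float:
    """Expected VAF of a somatic variant under a simple mixture model.

    purity: fraction of cells in the sample that are tumor cells
    cancer_cell_fraction: fraction of tumor cells carrying the mutation
    mutant_copies: mutated copies per mutated tumor cell at the locus
    tumor_copy_number / normal_copy_number: total copies at the locus
    """
    mutant_alleles = purity * cancer_cell_fraction * mutant_copies
    total_alleles = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    return mutant_alleles / total_alleles


# Clonal heterozygous mutation, pure diploid tumor.
print(expected_vaf(purity=1.0, cancer_cell_fraction=1.0))
# Same mutation at 50% purity.
print(expected_vaf(purity=0.5, cancer_cell_fraction=1.0))
# Same mutation on one of four copies (amplified locus) at 50% purity.
print(expected_vaf(purity=0.5, cancer_cell_fraction=1.0, tumor_copy_number=4))
```

Lower purity, a smaller subclone, or extra unmutated copies at the locus each pull the observed VAF down.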
Cancer genomic sequencing
Variant allele frequency
features
- VAF = # of reads supporting candidate mutation / read depth at position
- key determinant in finding a somatic variant
- heterozygous subclonal mutation present in 20% of diploid tumor cells = 10% VAF -> at 60X depth = 6 variant reads (3 reads if tumor purity = 50%)
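The VAF definition and the 60X example above can be checked numerically. The sketch below also estimates how often a variant would reach a minimum number of supporting reads, assuming variant reads follow a binomial distribution at the given depth. The 3-read threshold is a hypothetical caller setting, not something stated in this module.

```python
from math import comb


def vaf(alt_reads: int, depth: int) -> float:
    """VAF = reads supporting the candidate mutation / read depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def expected_variant_reads(vaf_value: float, depth: int) -> float:
    """Average number of variant-supporting reads at a given depth."""
    return vaf_value * depth


def prob_at_least(min_reads: int, depth: int, vaf_value: float) -> float:
    """P(variant reads >= min_reads) under a binomial sampling assumption."""
    p_fewer = sum(
        comb(depth, i) * vaf_value**i * (1 - vaf_value) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


print(expected_variant_reads(0.10, 60))  # about 6 reads at 100% purity
print(expected_variant_reads(0.05, 60))  # about 3 reads at 50% purity
# Chance of seeing at least 3 supporting reads (hypothetical threshold).
print(prob_at_least(3, 60, 0.10), prob_at_least(3, 60, 0.05))
```

Because the read count is a sample, a variant expected to yield 3 reads will often yield fewer, which is one reason low-VAF subclonal variants are easy to miss at moderate depth.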
candidate somatic mutations
- genomic positions for which alternate allele supported by tumor reads is not present in matched normal sample
- SNVs and Indels most common
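The tumor-versus-normal comparison described above can be sketched as a simple filter. This is a toy illustration rather than a real variant caller: the variant tuples, coordinates, and read-count thresholds are all hypothetical.

```python
from typing import Dict, List, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def candidate_somatic(
    tumor_alt_reads: Dict[Variant, int],
    normal_alt_reads: Dict[Variant, int],
    min_tumor_alt: int = 3,   # hypothetical minimum support in the tumor
    max_normal_alt: int = 0,  # hypothetical maximum support in the normal
) -> List[Variant]:
    """Keep variants supported in the tumor but absent from the matched normal."""
    return sorted(
        variant
        for variant, n_alt in tumor_alt_reads.items()
        if n_alt >= min_tumor_alt
        and normal_alt_reads.get(variant, 0) <= max_normal_alt
    )


tumor = {
    ("chr1", 1000, "T", "G"): 12,  # tumor only: candidate somatic
    ("chr1", 2000, "A", "G"): 20,  # also in normal: likely germline
    ("chr2", 3000, "C", "T"): 1,   # too little support: likely noise
}
normal = {("chr1", 2000, "A", "G"): 18}
print(candidate_somatic(tumor, normal))
```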
subclonal mutation
mutation that is present in a subset of tumor cells in a tumor sample or biopsy
transversion
point mutation that changes purine to pyrimidine and vice versa
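As a small illustration of the definition above, a single-base substitution can be classified by checking whether the reference and alternate bases belong to the same chemical class (purines A and G, pyrimidines C and T). The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snv(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snv("C", "T"))  # pyrimidine to pyrimidine
print(classify_snv("C", "A"))  # pyrimidine to purine
```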
tumor ploidy
- amount of DNA in tumor cell
- diploid: grows more slowly
- aneuploid: abnormal amount of DNA
- helps determine how malignant a tumor is
removing germline variants in sequencing without matched normal
- normal tissue not always available in clinical applications
- filter annotated SNPs found in databases such as dbSNP and gnomAD (some SNVs may still be misclassified because of the specific workflow or germline variants specific to the patient)
- use panel of normals collected from different individuals, but processed in same way as tumor samples
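The two tumor-only filtering strategies above can be sketched as set operations. This is a toy example under stated assumptions: the known germline set stands in for a database lookup (no real dbSNP or gnomAD API is used), the panel of normals is a list of per-individual variant sets, and the `min_normals` cutoff is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def filter_without_matched_normal(
    tumor_variants: Iterable[Variant],
    known_germline: Set[Variant],
    panel_of_normals: List[Set[Variant]],
    min_normals: int = 2,  # hypothetical: seen in at least this many normals
) -> List[Variant]:
    """Drop variants found in a germline database or recurring in a panel of normals."""
    pon_counts = Counter(v for sample in panel_of_normals for v in set(sample))
    kept = []
    for variant in tumor_variants:
        if variant in known_germline:
            continue  # annotated population polymorphism
        if pon_counts[variant] >= min_normals:
            continue  # recurrent in normals: germline or workflow artifact
        kept.append(variant)
    return kept


tumor_calls = [
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
]
germline_db = {("chr1", 100, "A", "G")}
pon = [{("chr2", 200, "C", "T")}, {("chr2", 200, "C", "T")}, set()]
print(filter_without_matched_normal(tumor_calls, germline_db, pon))
```

Processing the panel of normals with the same workflow as the tumor matters because it lets recurrent workflow-specific artifacts be removed along with germline variants.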
Variant error sources
4
- Library preparation: DNA polymerase for DNA synthesis and amplification can induce artifacts
- Oxidative damage (e.g. C>A): guanine oxidation (8-oxoguanine) during fragmentation by shearing can lead to low-frequency G>T transversions, seen as C>A on the opposite strand
- FFPE-induced DNA damage (e.g. C>T): DNA fragmentation and base changes induced by formaldehyde, especially deamination of cytosine to uracil, which is read as thymine (C>T) = high noise levels
- Sample contamination: matched normal sample may be contaminated by tumor cells, or a sample may be contaminated with DNA from a different patient
artifacts
variations introduced by non-biological processes
CNV detection
- segment genome into regions with distinct copy numbers using statistical techniques
- use matched control or normalization to statistically remove bias
- More advanced methods incorporate minor allele frequencies inferred from heterozygous SNPs for segmentation and to detect allele specific copy number variations
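The first two steps above (normalize against a matched control, then segment) can be sketched with read depth per genomic bin. This is a deliberately naive illustration: median centering assumes most bins are copy-neutral, and the tolerance-based segmentation is a stand-in for the statistical techniques real tools use. Function names and the tolerance value are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple


def log2_ratios(tumor_depth: List[float], normal_depth: List[float]) -> List[float]:
    """Per-bin log2(tumor / normal), centered on the median ratio."""
    raw = [t / n for t, n in zip(tumor_depth, normal_depth)]
    center = median(raw)  # assumes most bins are copy-neutral
    return [math.log2(r / center) for r in raw]


def segment(ratios: List[float], tolerance: float = 0.3) -> List[Tuple[int, int, float]]:
    """Merge adjacent bins into (start, end_exclusive, mean) segments.

    A new segment starts when the next bin differs from the running
    segment mean by more than `tolerance` (naive, not a statistical test).
    """
    segments = []
    start = 0
    for i in range(1, len(ratios) + 1):
        seg_mean = sum(ratios[start:i]) / (i - start)
        if i == len(ratios) or abs(ratios[i] - seg_mean) > tolerance:
            segments.append((start, i, seg_mean))
            start = i
    return segments


tumor = [100, 100, 100, 200, 200, 100, 100]
normal = [50, 50, 50, 50, 50, 50, 50]
ratios = log2_ratios(tumor, normal)
print(ratios)
print(segment(ratios))
```

In this toy input, bins 3 and 4 have twice the relative depth of their neighbors, so they form their own segment with a log2 ratio of 1.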
B allele frequency
(BAF)
- BAF is an estimate of the frequency of B allele of a given SNP in population of cells from which DNA was extracted
- In a normal diploid cell, BAF at any SNP is either 0 (AA), 0.5 (AB) or 1 (BB), and the expected log R ratio is 0.
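The two quantities above can be computed from simple counts. The sketch below assumes allele-specific read counts stand in for the allele signal and that the log R ratio is the log2 of observed over expected total signal. The function names are hypothetical.

```python
import math


def baf(a_reads: int, b_reads: int) -> float:
    """B allele frequency = B-allele signal / total signal at a SNP."""
    total = a_reads + b_reads
    if total == 0:
        raise ValueError("no reads at this SNP")
    return b_reads / total


def log_r_ratio(observed_total: float, expected_total: float) -> float:
    """log2(observed / expected) total signal; 0 means no copy-number change."""
    return math.log2(observed_total / expected_total)


def nearest_normal_genotype(baf_value: float) -> str:
    """Closest of the three BAF states expected in a normal diploid cell."""
    states = {"AA": 0.0, "AB": 0.5, "BB": 1.0}
    return min(states, key=lambda genotype: abs(states[genotype] - baf_value))


print(baf(30, 30), nearest_normal_genotype(baf(30, 30)))
print(baf(29, 31), nearest_normal_genotype(baf(29, 31)))
print(log_r_ratio(60, 60))
```

BAF values that sit clearly between these three states in a tumor sample suggest an allelic imbalance, such as a gain or loss of one allele in a fraction of cells.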
Structural variant detection
- identified by split reads and clusters of discordant read pairs
- breakpoint junctions often show complex patterns -> poor alignment
- structural variant algorithms include local assembly step
- contigs assembled from raw reads improve read mapping and characterization of inserted sequences at breakpoints
- read-depth data can provide additional information to improve detection of deletions and amplifications
- Somatic structural variants are harder to detect due to low VAF
- number of supporting reads will fluctuate due to non-uniform read coverage across genome and sampling variation
- dynamic determination of appropriate threshold (e.g. number of supporting split reads) depending on local context + various filters to increase detection sensitivity
split read
- a single read that spans a breakpoint, so only one part of the read aligns contiguously to the reference genome (the rest aligns elsewhere or is clipped)
- provide evidence of a breakpoint and type of structural variant present in sample genome
discordant read pair
reads in a pair mapped to different chromosomes, in incompatible orientations, or at a distance outside the insert size range of the sequencing library
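The three criteria in this definition can be written as a small classifier. This is a sketch under stated assumptions: mates are expected in forward/reverse orientation (typical for paired-end libraries), distance is approximated from the leftmost aligned positions, and the insert-size limits are hypothetical defaults. It does not use any real alignment library.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int       # leftmost aligned position of mate 1
    reverse1: bool  # True if mate 1 aligns to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(pair: ReadPair, min_insert: int = 100, max_insert: int = 600) -> str:
    """Label a read pair as concordant or give the reason it is discordant."""
    if pair.chrom1 != pair.chrom2:
        return "discordant: different chromosomes"
    # Order mates by position; expect forward mate on the left, reverse on the right.
    (left_pos, left_rev), (right_pos, right_rev) = sorted(
        [(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)]
    )
    if left_rev or not right_rev:
        return "discordant: orientation"
    span = right_pos - left_pos  # approximation that ignores read length
    if not (min_insert <= span <= max_insert):
        return "discordant: insert size"
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, False, "chr1", 1300, True)))
print(classify_pair(ReadPair("chr1", 1000, False, "chr2", 1300, True)))
print(classify_pair(ReadPair("chr1", 1000, False, "chr1", 50000, True)))
```

Clusters of pairs with the same discordance type point to a candidate event: for example, many pairs with an oversized span over the same region are consistent with a deletion.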
Copy-neutral LOH
- copy-neutral loss of heterozygosity
- one allele is lost and the remaining allele is duplicated, so both copies now match the same allele
- same copy number but now homozygous for allele 1 or 2
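To see why copy-neutral LOH is visible in allele frequencies but not in total copy number, the expected BAF and log R ratio at a germline-heterozygous SNP can be modeled from tumor purity and allele-specific copy numbers. This is a simplified sketch: it assumes normal cells carry one A and one B copy, a baseline of two copies, and a single tumor population. The parameter names are hypothetical.

```python
import math
from typing import Tuple


def expected_baf_and_logr(purity: float, tumor_a: int, tumor_b: int) -> Tuple[float, float]:
    """Expected (BAF, log R ratio) at a germline-heterozygous SNP.

    purity: fraction of tumor cells in the sample
    tumor_a, tumor_b: copies of the A and B alleles in tumor cells
    Normal cells are assumed to carry one A and one B copy.
    """
    b_copies = purity * tumor_b + (1 - purity) * 1
    total_copies = purity * (tumor_a + tumor_b) + (1 - purity) * 2
    return b_copies / total_copies, math.log2(total_copies / 2)


print("heterozygous, no change:", expected_baf_and_logr(1.0, 1, 1))
print("copy-neutral LOH:       ", expected_baf_and_logr(1.0, 0, 2))
print("CN-LOH at 50% purity:   ", expected_baf_and_logr(0.5, 0, 2))
print("one-copy deletion:      ", expected_baf_and_logr(1.0, 0, 1))
```

Both copy-neutral LOH and a one-copy deletion shift BAF away from 0.5, but only the deletion lowers the log R ratio, which is how the two can be told apart under this model.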
tumor profiling
Guide patient care
a. Predictive biomarkers for therapy selection
can be either tumor subtype specific or tumor agnostic
b. Assist with cancer subtype diagnosis
c. Identify variants that confer increased heritable cancer risk
Basic clinical research
a. Cancer pathogenesis and progression
b. Biomarker discovery
c. New drug development
EGFR Leu858Arg
- epidermal growth factor receptor
- leucine replaced with arginine at position 858
- predictive biomarker for use of EGFR tyrosine kinase inhibitors in treating non-small cell lung cancer patients
DNA biomarker
molecules that indicate normal or abnormal process taking place in your body and may be sign of an underlying condition or disease
MSI-H and TMB-H
Biomarkers associated with tumor instability and response to anti-PD-1 immune checkpoint inhibitors for treating multiple cancer types
- Microsatellite Instability-High (MSI-H): associated with deficiency in mismatch repair
- Tumor Mutational Burden-High (TMB-H): >10 mutations per megabase of DNA
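The TMB-H cutoff above is a rate, so it depends on how much DNA was actually sequenced. A minimal sketch, assuming TMB is computed as somatic mutations per megabase of covered sequence; the 1.2 Mb panel size and mutation counts are hypothetical examples, and which mutation types are counted varies by assay.

```python
def tumor_mutational_burden(n_mutations: int, covered_bases: int) -> float:
    """Somatic mutations per megabase of sequenced (covered) DNA."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations * 1e6 / covered_bases


def is_tmb_high(tmb: float, threshold: float = 10.0) -> bool:
    """Apply the >10 mutations/Mb cutoff from the card above."""
    return tmb > threshold


panel_bases = 1_200_000  # hypothetical 1.2 Mb targeted panel
for count in (6, 18):
    tmb = tumor_mutational_burden(count, panel_bases)
    print(f"{count} mutations -> {tmb:.1f} mut/Mb, TMB-H: {is_tmb_high(tmb)}")
```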