Module 7.2 Cancer Genome Sequencing 1 Flashcards
Tissue Biopsy
cancer genome sequencing
features (3)
- direct sequencing of archived tumor tissues, tumor microenvironments (connective tissue cells), or cell-free DNA samples from blood
- multiple sequencing types (targeted, RNA, epigenetic, microbial)
- goal is to assemble a parts list of the cancer
cancer genome sequencing
parts
genetic structures including both DNA and RNA that are altered in cancer
The Cancer Genome Atlas
(TCGA)
- landmark cancer genomics program
- molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types
- genomic, epigenomic, transcriptomic and proteomic data publicly available through Genomic Data Commons portal
tissue biopsy
sampling methods (3)
- excision biopsy: entire lump or suspicious area (maybe some healthy tissue from same area) is removed
- incisional biopsy: small cut is made into area of abnormal tissue and small sample is removed
- needle biopsy: sample of tissue or fluid is removed with a needle
- wide needle: core biopsy
- thin needle: fine needle biopsy
- tumor purity as low as 10-20%
tissue biopsy
storage methods (2)
- Formalin-fixed paraffin-embedded (FFPE)
- Fresh frozen (FF)
tissue storage method
Formalin-fixed paraffin-embedded
(FFPE)
benefits and drawbacks
Benefits
- most common source of archived material
- can be stored in a cabinet at room temperature
- cheap to create
- can be stable for very long time
- most widely available sample type for tumor sequencing
Drawbacks
- DNA fragmentation and formalin-induced DNA damage = sequencing artifacts
tissue storage method
Formalin-fixed paraffin-embedded
(FFPE)
features
- formalin and wax preserve fragile structures inside and between the cells in tissue
- proteins are preserved in denatured form
- nucleic acids can be isolated but are not preserved very well, so they may not be ideal for molecular analysis
tissue storage method
Fresh frozen
(FF)
benefits and drawbacks
Benefits
- works very well for molecular genetic analysis
- better if dipped in liquid nitrogen (flash freezing) and stored at -80 °C
Drawbacks
- surgeons may not have access to liquid nitrogen for flash freeze
- biobanks have smaller frozen tissue collections
FFPE
Extraction process
- Paraffin blocks containing tumor are cut with a microtome into thin slices (5-10 micrometers)
- One slice is mounted on a glass slide and stained with H&E to confirm the presence of tumor tissue
- Pathologist estimates the percent tumor content of the tissue as the percentage of nuclei that belong to cancer cells, based on cell pathology
- If the tumor fraction is low (<20%), microdissection of the tumor can be performed by superimposing each unstained slice on the H&E template to enrich for tumor content. If it is high, microdissection is not needed.
- Dissected areas are deparaffinized and used for DNA extraction
sequencing targets
whole genome
- most expensive
- whole coverage but limited depth
- hard to detect variants in small fraction of cells
whole exome
- only protein-coding genes
- sequencing depths of 100-200x
targeted
- thousands of X coverage
- may not capture large structural variants with high sensitivity
cancer genomic sequencing
workflow
- Extract DNA and convert to sequencing library
- Perform paired-end WGS
- Assess QC metrics
matched normal
- normal sample of non-cancer cells originating in the same tissue from the same patient
- can also use patient’s blood sample (typically white blood cells)
Quality Control
Pre-alignment
metrics (6)
- % duplicate reads
- Base quality scores
- % Reads aligned
- % Paired GC content
- Insert size distribution
- PCR duplicates
Quality Control
Post-alignment
sources of mapping errors (6)
- inappropriate reference genome
- polymorphisms
- sequencing errors
- segmental duplications
- repetitive sequences
- incomplete reference genome
Factors affecting observed VAF
3
1. Tumor purity (fraction of tumor cells in the tissue sample, i.e. the cells that carry the somatic variants)
2. Intra-tumor heterogeneity (different subclones or wild type normal cells within tumor)
3. Copy number (at locus)
Cancer genomic sequencing
Variant allele frequency
features
- VAF = # of reads supporting candidate mutation / read depth at position
- key determinant in finding a somatic variant
- heterozygous subclonal mutation present in 20% of diploid tumor cells = 10% VAF -> at 60x depth = 6 variant reads (3 reads if tumor purity = 50%)
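The arithmetic on this card can be checked with a short Python sketch. It assumes a heterozygous mutation (one mutant copy) at a diploid locus unless told otherwise; the function and parameter names are illustrative and not from the module.

```python
def observed_vaf(variant_reads, depth):
    """VAF = reads supporting the candidate mutation / read depth at the position."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return variant_reads / depth


def expected_vaf(purity, cell_fraction, mutant_copies=1,
                 tumor_copies=2, normal_copies=2):
    """Expected VAF in a mixed tumor/normal sample (assumed simple mixture model).

    purity        : fraction of cells in the sample that are tumor cells
    cell_fraction : fraction of tumor cells carrying the mutation
    mutant_copies : mutated copies per mutated cell (1 = heterozygous)
    tumor_copies  : total copy number at the locus in tumor cells
    normal_copies : total copy number at the locus in normal cells
    """
    mutant_alleles = purity * cell_fraction * mutant_copies
    total_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant_alleles / total_alleles


def expected_variant_reads(depth, vaf):
    """Average number of reads expected to support the variant."""
    return depth * vaf


# Card example: mutation in 20% of diploid tumor cells, sequenced at 60x.
vaf_pure = expected_vaf(purity=1.0, cell_fraction=0.20)
reads_pure = expected_variant_reads(60, vaf_pure)

# Same mutation when only half of the sampled cells are tumor cells.
vaf_half = expected_vaf(purity=0.5, cell_fraction=0.20)
reads_half = expected_variant_reads(60, vaf_half)

print(round(vaf_pure, 3), round(reads_pure, 1))
print(round(vaf_half, 3), round(reads_half, 1))
```

The same function shows how all three factors (purity, subclonality, copy number at the locus) shift the observed VAF; the read counts are averages, so real counts scatter around them.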
candidate somatic mutations
- genomic positions for which alternate allele supported by tumor reads is not present in matched normal sample
- SNVs and Indels most common
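A minimal Python sketch of this definition: keep sites where the tumor supports an alternate allele that the matched normal does not show. The input layout (site mapped to alt reads and depth) and both thresholds are hypothetical choices for illustration, not values from the module or from any particular variant caller.

```python
def candidate_somatic(tumor_counts, normal_counts,
                      min_tumor_alt_reads=3, max_normal_alt_reads=0):
    """Return (site, tumor VAF) for sites with tumor alt support but none in the normal.

    tumor_counts / normal_counts map a site key, e.g. (chrom, pos, ref, alt),
    to a tuple (alt_reads, depth). Thresholds are illustrative assumptions.
    """
    candidates = []
    for site, (alt_reads, depth) in tumor_counts.items():
        normal_alt, normal_depth = normal_counts.get(site, (0, 0))
        if depth <= 0 or normal_depth <= 0:
            # Without coverage in both samples the site cannot be evaluated.
            continue
        if alt_reads >= min_tumor_alt_reads and normal_alt <= max_normal_alt_reads:
            candidates.append((site, alt_reads / depth))
    return candidates


tumor = {
    ("chr1", 100, "C", "T"): (6, 60),   # alt seen only in the tumor
    ("chr1", 200, "G", "A"): (30, 60),  # alt also in the normal: likely germline
    ("chr2", 50, "A", "G"): (5, 40),    # no coverage in the normal
}
normal = {
    ("chr1", 100, "C", "T"): (0, 30),
    ("chr1", 200, "G", "A"): (14, 30),
}
somatic = candidate_somatic(tumor, normal)
print(somatic)
```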
subclonal mutation
mutation that is present in a subset of tumor cells in a tumor sample or biopsy
transversion
point mutation that changes purine to pyrimidine and vice versa
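For illustration, a small Python sketch that classifies a single-base substitution. The complementary term "transition" (purine to purine or pyrimidine to pyrimidine) is not defined on this card and is added here only for contrast; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be different bases from A, C, G, T")
    # Same chemical class on both sides means a transition; otherwise a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("C", "A"))  # pyrimidine -> purine
print(classify_substitution("C", "T"))  # pyrimidine -> pyrimidine
```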
tumor ploidy
- amount of DNA in tumor cell
- diploid: grows more slowly
- aneuploid: abnormal amount of DNA
- helps determine how malignant a tumor is
removing germline variants in sequencing without matched normal
- normal tissue not always available in clinical applications
- filter annotated SNPs found in databases such as dbSNP and gnomAD (SNVs may still be incorrectly identified due to the specific workflow or patient-specific germline variants)
- use panel of normals collected from different individuals, but processed in same way as tumor samples
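The two filters above can be sketched in Python as set and count lookups. Everything here is a simplified assumption: variants are keyed as (chrom, pos, ref, alt), the known-germline set stands in for a database lookup, and the panel of normals is a plain dict of how many normal samples showed each variant. The cutoff `max_pon_count` is a hypothetical parameter.

```python
def filter_tumor_only(calls, known_germline, panel_of_normals, max_pon_count=1):
    """Drop calls that look germline or recurrent in normals (illustrative only).

    calls            : list of variant keys, e.g. (chrom, pos, ref, alt)
    known_germline   : set of variant keys annotated in a population database
    panel_of_normals : dict mapping variant key -> number of normals showing it
    max_pon_count    : keep a call only if it appears in at most this many normals
    """
    kept = []
    for variant in calls:
        if variant in known_germline:
            continue  # annotated population SNP
        if panel_of_normals.get(variant, 0) > max_pon_count:
            continue  # recurrent in normals: germline or workflow artifact
        kept.append(variant)
    return kept


calls = [
    ("chr1", 100, "C", "T"),
    ("chr1", 200, "G", "A"),
    ("chr3", 75, "T", "C"),
]
known_germline = {("chr1", 200, "G", "A")}
panel_of_normals = {("chr3", 75, "T", "C"): 12}

print(filter_tumor_only(calls, known_germline, panel_of_normals))
```

Because the panel is processed the same way as the tumor samples, it also catches recurrent workflow artifacts that a population database would not list.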
Variant error sources
4
- Library preparation: DNA polymerases used for DNA synthesis and amplification can introduce artifacts
- Oxidative damage (e.g. C>A): guanine oxidation during fragmentation by shearing can lead to low-frequency C-to-A transversions
- FFPE-induced DNA damage (e.g. C>T): DNA fragmentation and base changes induced by formaldehyde, especially deamination of cytosine so that it is read as thymine = high noise levels (C>T)
- Sample contamination: matched normal sample may be contaminated by tumor cells, or normal sample contaminating a tumor sample from a different patient
artifacts
variations introduced by non-biological processes
CNV detection
- segment genome into regions with distinct copy numbers using statistical techniques
- use matched control or normalization to statistically remove bias
- More advanced methods incorporate minor allele frequencies inferred from heterozygous SNPs for segmentation and to detect allele specific copy number variations
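A toy Python sketch of the first two bullets: normalize tumor depth against a matched control per bin, then split the genome wherever the log2 ratio jumps. The jump-based splitting is a deliberately naive stand-in for the statistical segmentation the card refers to, and the bin depths, function names, and `max_jump` threshold are all assumptions made for illustration.

```python
import math


def log2_ratios(tumor_depth, normal_depth):
    """Per-bin log2(tumor/normal), after scaling each sample by its total depth.

    Assumes every bin has positive depth in both samples.
    """
    tumor_total = sum(tumor_depth)
    normal_total = sum(normal_depth)
    return [
        math.log2((t / tumor_total) / (n / normal_total))
        for t, n in zip(tumor_depth, normal_depth)
    ]


def naive_segments(ratios, max_jump=0.3):
    """Group adjacent bins until the ratio changes by more than max_jump.

    Returns (first_bin, last_bin, mean_log2_ratio) tuples. Not a real
    statistical segmentation method; a toy illustration only.
    """
    segments = []
    start = 0
    for i in range(1, len(ratios) + 1):
        if i == len(ratios) or abs(ratios[i] - ratios[i - 1]) > max_jump:
            chunk = ratios[start:i]
            segments.append((start, i - 1, sum(chunk) / len(chunk)))
            start = i
    return segments


tumor_bins = [100, 100, 100, 200, 200, 200]   # last three bins look amplified
normal_bins = [100, 100, 100, 100, 100, 100]
segments = naive_segments(log2_ratios(tumor_bins, normal_bins))
for first, last, mean_ratio in segments:
    print(first, last, round(mean_ratio, 2))
```

Scaling by total depth centers the ratios on the sample average, so a large gain pushes unchanged regions slightly below zero; this is one reason real tools use more careful normalization.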