Module 6.3 Genomic Variants Flashcards

Question

somatic mutation

Answer 1

- any mutation that occurs in a cell of the human body
- not usually transmitted to offspring
- present in all descendant cells of that cell (mutant clone)
- many cancers result from accumulated somatic mutations

Answer 2

- relative frequency of an allele at a particular locus in a population
- used to describe the amount of variation at a particular locus or across multiple loci in a population
- number of chromosomes in the population carrying the specified allele divided by the total number of chromosomes in the population (or sample)
- e.g. MAF of A = (2 x AA + 1 x Aa) / (15 individuals x 2 chromosomes), where AA and Aa are the numbers of homozygous and heterozygous individuals
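The allele frequency formula above can be sketched in a few lines of Python. This is a minimal illustration for a diploid sample; the function name and the genotype counts (3 AA, 4 Aa) are hypothetical and not taken from the card.

```python
def allele_frequency(n_hom: int, n_het: int, n_individuals: int) -> float:
    """Frequency of an allele in a diploid sample.

    n_hom: individuals homozygous for the allele (carry 2 copies)
    n_het: individuals heterozygous for the allele (carry 1 copy)
    n_individuals: total individuals sampled (2 chromosomes each)
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    if n_hom < 0 or n_het < 0 or n_hom + n_het > n_individuals:
        raise ValueError("genotype counts are inconsistent with sample size")
    return (2 * n_hom + n_het) / (2 * n_individuals)


# Hypothetical counts: 3 AA and 4 Aa among 15 individuals -> 10 / 30
print(allele_frequency(3, 4, 15))
```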

Answer 3

- measures the proportion of variant alleles at a genomic locus within the cell population of a human sample
- NGS experiment: VAF = (mutant reads that cover the position) / (all reads that cover the position)
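As a rough sketch of the NGS formula, the snippet below computes VAF from the bases observed at one position. The pileup string and the function name are made up for illustration; real pipelines read these counts from aligned reads.

```python
from collections import Counter


def vaf_from_pileup(bases: str, alt: str) -> float:
    """VAF = reads carrying the variant base / all reads covering the position."""
    counts = Counter(bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover the position")
    return counts[alt.upper()] / depth


# Hypothetical position covered by 10 reads, 3 of which carry the variant G
print(vaf_from_pileup("AAGAGAAGAA", "G"))  # 3 / 10
```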

Answer 4

- homozygous: VAF = 100%
- heterozygous: VAF = 50%

Answer 5

- depends on where and how the biological sample is collected and how many mutant somatic cells are within the sample population (n = mutant cells, N = normal cells)
- **homozygous: VAF = 2n / 2(N+n)**
  - e.g. 3 mutant cells of 20 = (2 x 3) / (2 x (17+3)) = 15%
- **heterozygous: VAF = n / 2(N+n)**
  - e.g. 3 mutant cells of 20 = 3 / (2 x (17+3)) = 7.5%
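The two somatic VAF formulas can be checked with a short Python sketch. It assumes diploid cells and reuses the card's example of 3 mutant cells among 20; the function name is illustrative only.

```python
def somatic_vaf(mutant_cells: int, normal_cells: int, homozygous: bool) -> float:
    """Expected VAF in a mixed sample of diploid cells.

    Each cell contributes 2 alleles; a mutant cell contributes 2 mutant
    alleles if homozygous, 1 if heterozygous.
    """
    total_cells = mutant_cells + normal_cells
    if total_cells <= 0:
        raise ValueError("sample must contain at least one cell")
    mutant_alleles = (2 if homozygous else 1) * mutant_cells
    return mutant_alleles / (2 * total_cells)


print(somatic_vaf(3, 17, homozygous=True))   # 6 / 40 = 15%
print(somatic_vaf(3, 17, homozygous=False))  # 3 / 40 = 7.5%
```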

Answer 6

1. PCR-based (known variants)
2. Microarray (known variants)
3. Genome sequencing
   - whole genome
   - targeted

Answer 7

- human genome project (reference genome)
- currently focuses on individual genomes
- genetic variations among individuals
- study genetic basis of diseases
- facilitate personalized medicine

Answer 8

- focuses on specific regions of the genome
- allows deep sequencing to look for variants present at very low allele frequencies
- widely used, efficient, and cost effective
- enrichment (selection of region of interest) is the critical step

Answer 9

1. hybrid capture
2. amplification

Answer 10

1. convert DNA samples into sequencing libraries (fragmentation and adapter ligation)
2. do low-cycle PCR to amplify libraries and ensure all molecules (target or non-target) have adapters

**Hybridization**

3. denature all library DNA molecules
4. mix library molecules with blocking oligos (to prevent non-specific interactions between molecules) and biotinylated probes
5. incubate mixture in hybridization buffer optimized for oligo and probe binding
   - blocking oligos bind to adapter sequences
   - probes bind to region of interest
6. after hybridization, add streptavidin-coated magnetic beads to separate target region from rest of genome
7. remove probes
8. take enriched DNA through 2nd round of PCR and NGS

Answer 11

- synthetic DNA or RNA single-stranded oligos specific to region of interest
- 100-120 bp long with biotin attached to one end of probe
- collection of probes = panel

Answer 12

**Benefits**
- can easily target hundreds to millions of bases in genome
- easier to scale for sequencing more complex and bigger target regions
- allows you to target more genes and supports more comprehensive profiling

**Drawbacks**
- takes longer to complete experiment
- more laborious

Answer 13

includes all exons and upstream regulatory regions

Answer 14

- can use whole genomic DNA or fragmented DNA
- regions of interest amplified using sequence-specific primers
- singleplex or multiplex
- many different PCR versions

Answer 15

1. sequence-specific primers have part of the sequencing adapter (Read 1 and Read 2) attached to the 5’ end
2. first-round PCR product is amplified using a 2nd set of primers complementary to the adapter sequence (e.g. P5 or index + P7) to add the full sequencing adapter to amplicons

Answer 16

1. multiplex primers that only contain sequences targeting regions of interest
2. after PCR, amplification products are ligated with adapters to create full library molecules for sequencing

Answer 17

**Benefits**
- simpler workflow
- smaller amounts of DNA required
- faster turnaround

**Drawbacks**
- multiplex PCR for big regions is very challenging and often requires longer development time

Answer 18

1. Depth of coverage
2. On-target rate
3. GC bias
4. Uniformity
5. Duplication rate

Answer 19

- number of times a base within the target region is represented in sequencing data
- expressed as a multiple (e.g. 5X)

**Avg coverage of a position** = (read count x read length) / total target size

**Avg coverage of a variant base** = VAF x average coverage of a position
- 50% VAF x 10 reads = 5 variant reads (5 wild type)
- 10% VAF x 10 reads = 1 variant read (9 wild type)
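The two coverage formulas can be worked through in Python. The read count, read length, and target size in the first example are hypothetical values chosen for round numbers; the second pair of calls repeats the card's own 10-read examples.

```python
def mean_coverage(read_count: int, read_length: int, target_size: int) -> float:
    """Average coverage of a position = (read count x read length) / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return read_count * read_length / target_size


def expected_variant_reads(vaf: float, coverage: float) -> float:
    """Average coverage of a variant base = VAF x coverage of the position."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    return vaf * coverage


# Hypothetical run: 1,000,000 reads of 150 bp over a 3,000,000 bp target -> 50X
print(mean_coverage(1_000_000, 150, 3_000_000))

# Card examples at 10X coverage
print(expected_variant_reads(0.5, 10))  # 5 variant reads
print(expected_variant_reads(0.1, 10))  # 1 variant read
```

The second function shows why low-frequency variants need deep sequencing: at 10X, a 10% VAF variant is expected in only one read.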

Answer 20

1. quality and amount of input sample
2. number and type of variants
3. expected variant frequencies
4. coverage depths typically reported for similar studies

Answer 21

measures the specificity of the target enrichment method

% On-Target Bases:
- percentage of sequenced bases that map to the target region

% Reads On-Target:
- percentage of all sequencing reads that overlap the target region by at least one base
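The difference between the two on-target metrics is easier to see with a small Python sketch. It assumes reads and targets are half-open `(start, end)` intervals on a single chromosome and that target intervals do not overlap each other; the coordinates are invented for illustration.

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def on_target_metrics(reads, targets):
    """Return (% on-target bases, % reads on-target)."""
    if not reads:
        raise ValueError("no reads supplied")
    total_bases = sum(end - start for start, end in reads)
    on_target_bases = 0
    on_target_reads = 0
    for r_start, r_end in reads:
        covered = sum(overlap(r_start, r_end, t_start, t_end)
                      for t_start, t_end in targets)
        on_target_bases += covered
        if covered > 0:  # overlaps the target by at least one base
            on_target_reads += 1
    return (100 * on_target_bases / total_bases,
            100 * on_target_reads / len(reads))


# Hypothetical data: one 100 bp target and four 50 bp reads
targets = [(100, 200)]
reads = [(90, 140), (150, 200), (195, 245), (300, 350)]
print(on_target_metrics(reads, targets))  # (47.5, 75.0)
```

Here three of four reads touch the target, but fewer than half of the sequenced bases fall inside it, so % reads on-target is the more lenient of the two numbers.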

Answer 22

- suboptimal probe design
- poorly optimized protocols
- problem during the library preparation or enrichment process

Answer 23

- uneven coverage of AT-rich and GC-rich regions (GC content) during sequencing
- use GC bias distribution plots
  - green dots: GC-normalized experimental coverage (skewed in bad run)
  - blue bars: % of GC in 100-base windows of reference genome
- can be from bad library preparation
- can help determine if more sequencing is required for desired sequencing depths across all target regions

Answer 24

reveals uneven coverage of sequenced regions

**Fold-80 base penalty score**
- describes how much more sequencing is required to bring 80% of target bases to mean coverage
- Fold-80 = 1: uniform coverage
- Fold-80 > 1: uneven coverage
- Fold-80 = 2: requires 2x as much sequencing for 80% of target bases to reach mean coverage
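One common way to compute a Fold-80 score is to divide the mean coverage by the coverage that 80% of target bases meet or exceed (the 20th percentile). The sketch below assumes that definition, which is not stated on the card, and uses invented per-base coverage values.

```python
def fold_80(per_base_coverage):
    """Fold-80 penalty, assumed here to be mean coverage / 20th-percentile coverage."""
    n = len(per_base_coverage)
    if n == 0:
        raise ValueError("no coverage values supplied")
    ordered = sorted(per_base_coverage)
    # At least 80% of bases have coverage >= ordered[n // 5]
    p20 = ordered[n // 5]
    if p20 == 0:
        raise ValueError("20th-percentile coverage is zero; score undefined")
    mean = sum(ordered) / n
    return mean / p20


# Perfectly uniform coverage -> 1.0
print(fold_80([30] * 10))

# Hypothetical uneven run: mean 40X, but 80% of bases only reach 20X -> 2.0
print(fold_80([10, 10, 20, 40, 40, 40, 50, 60, 60, 70]))
```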

Answer 25

- fraction of mapped reads marked as duplicate reads in the dataset
- duplication inflates coverage in certain regions
- may overrepresent SNPs, cause false variant calls, and inflate the variant allele frequency calculation
- **deduplication**: removal of duplicate reads from sequencing data during bioinformatic processing
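A simplified sketch of duplication rate and deduplication follows. It assumes a read is a duplicate when it shares chromosome, start, end, and strand with an earlier read; production tools use additional information such as mate coordinates, and the reads below are invented.

```python
def deduplicate(reads):
    """Keep the first read seen for each (chrom, start, end, strand) key."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def duplication_rate(reads):
    """Fraction of mapped reads that are duplicates of an earlier read."""
    if not reads:
        raise ValueError("no reads supplied")
    return (len(reads) - len(deduplicate(reads))) / len(reads)


# Hypothetical mapped reads: the first position is seen three times
reads = [
    ("chr1", 100, 250, "+"),
    ("chr1", 100, 250, "+"),
    ("chr1", 100, 250, "+"),
    ("chr1", 100, 250, "-"),
    ("chr1", 180, 330, "+"),
]
print(duplication_rate(reads))  # 2 duplicates / 5 reads = 0.4
print(len(deduplicate(reads)))  # 3 unique reads remain
```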

Answer 26

1. Optical duplicates
2. ExAmp clustering
3. True biological duplication
4. Library prep
5. PCR amplification

Answer 27

- due to instrument system error
- 1 large cluster called as 2 clusters
- Illumina (non-patterned flow cell)
- Complete Genomics (large cluster or DNA ball)

Answer 28

- caused by underclustering on a patterned flow cell (Illumina)
- library molecules from the 1st cluster are free to go back into solution and spill over into neighboring empty nanowells to create a 2nd cluster

Answer 29

- both strands (sisters) of one double-stranded molecule happen to be in the library
- sister strands create two clusters with identical start and end positions that appear as duplicated reads
- all Illumina platforms

Answer 30

- non-random fragmentation method = higher chance of getting two molecules with the same ends
- appear as duplicated reads but are from two different molecules
- try to use larger DNA input samples and paired-end sequencing
- all Illumina platforms

Answer 31

- PCR copies original molecules, so duplicates may arise
- try to reduce cycle number when possible
- all Illumina platforms
