Next Generation Sequencing Flashcards

Question 1

Q

Typical design of a gene panel for NGS

Answer

A

Entire exonic sequence of the genes
+10 base pairs into intronic sequences (NOT deep intronic sequences)
Promoters are NOT covered (eg, TERT promoter)
Large indels (about 100 bp or more) are usually missed due to insufficient priming

Question 2

Q

Hotspot Panels

Answer

A

Focus on hotspot regions that are frequently affected by SNVs and small indels

Hotspot panels are not faster, but can be run on poorer-quality DNA or less DNA

Question 3

Q

NGS is run in. . .

Answer

A

. . . batches, to reduce costs.

Question 4

Q

Meta-mutational data

Answer

A

For example, MSI or UV signature – patterns of mutation

These require a larger DNA sequence input and more reads, since they are effectively statistical assays that require a large N.

Question 5

Q

Overestimation of tumor percentage risks . . .

Answer

A

. . . a false negative.
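A quick worked example of the logic. It assumes a clonal heterozygous mutation in a diploid region, so the expected VAF is about half the tumor fraction. It also uses a hypothetical 5% limit of detection. Copy-number changes or subclonality would shift these numbers.

```python
def expected_vaf(tumor_fraction: float) -> float:
    """Expected VAF of a clonal, heterozygous variant in a diploid region.

    Only tumor cells carry the variant, and only on one of their two alleles,
    so the variant allele fraction is half the tumor fraction.
    """
    return tumor_fraction / 2


LOD = 0.05  # hypothetical assay limit of detection (5% VAF)

estimated_purity = 0.40  # tumor percentage reported for the block (overestimate)
actual_purity = 0.08     # tumor fraction actually present in the tested tissue

print(f"VAF expected from the estimate: {expected_vaf(estimated_purity):.0%}")
print(f"VAF actually present: {expected_vaf(actual_purity):.0%}")
print("Detectable:", expected_vaf(actual_purity) >= LOD)
```

In this scenario the lab expects a VAF around 20%. The true 4% falls below the limit of detection, so a real mutation comes back negative.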

Question 6

Q

Evidence Tiers

Question 7

Q

Evidence tiers are primarily determined by. . .

Answer

A

. . . evidence type, not necessarily evidence quality

Question 8

Q

Assessing VAF

Question 9

Q

Sample isolation techniques

Question 10

Q

Emulsion PCR

Answer

A

PCR in which the aqueous phase is partitioned into many individual droplets within an oil emulsion.

Enables many parallel reactions to occur simultaneously.

This is the fundamental technique behind the massively parallel component of next generation sequencing. Many reactions run in tiny emulsion droplets, each of which ideally contains a different template, allowing numerous separate but simultaneous PCRs.

Question 11

Q

Amplification in emulsion PCR
(454 method)

Answer

A

The genome is fragmented through one of numerous possible techniques.

3’ overhangs are digested and 5’ overhangs are filled in to create a library of blunt-ended dsDNA fragments. Then, A and B dsDNA adaptors are ligated to the ends of each DNA fragment. The A and B adaptors have 3’ hydroxyls but lack 5’ phosphates, which prevents the adaptors from ligating to one another (e.g., A-B adaptor dimers).

The non-ligated strand of each dsDNA adaptor is melted off, and the resulting gaps are filled in by a polymerase. PCR enrichment then proceeds with A’ and B’ primers, which selectively amplify the library fragments. A-A and B-B fragments form lariats, and fragments with only an A or only a B adaptor extend only linearly, so only A-B fragments amplify.

Question 12

Q

Illumina method for library adaptor ligation and amplification

Answer

A

Rather than using separate A and B adaptors that both lack a 5’ phosphate, Illumina uses a single type of Y-shaped adaptor. Its two strands are complementary at the end that ligates to the library dsDNA, and then branch out into non-complementary strands carrying the A and B’ sequences.

When the first round of PCR takes place, each product acquires full adaptor sequences at both ends (A and B’ on one strand, B and A’ on the other). The library then amplifies.

Question 13

Q

Emulsion PCR setup in 454 NGS (after library preparation)

Answer

A

Ideally, you have set a library fragment:emulsion bubble ratio such that each bubble contains at most one fragment, minimizing bubbles that contain multiple fragments.

Each bubble also contains a magnetic bead carrying the A’ primer for PCR, as well as a B’ primer that is free-floating in solution.
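The "at most one fragment per bubble" goal is usually reasoned about with a Poisson distribution. This sketch assumes fragments land in droplets randomly and independently, so the fraction of droplets holding k fragments follows Poisson(ratio).

```python
import math


def droplet_occupancy(ratio: float) -> dict:
    """Poisson estimate of how many library fragments land in each droplet.

    ratio = mean fragments per droplet (the fragment:droplet ratio).
    """
    p_empty = math.exp(-ratio)
    p_one = ratio * math.exp(-ratio)
    return {"empty": p_empty, "one": p_one, "multiple": 1 - p_empty - p_one}


for ratio in (0.1, 0.3, 1.0):
    occ = droplet_occupancy(ratio)
    print(f"ratio {ratio}: empty {occ['empty']:.1%}, "
          f"one fragment {occ['one']:.1%}, multiple {occ['multiple']:.1%}")
```

At a 1:1 ratio, roughly 26% of droplets hold more than one fragment. This is why a lower ratio is preferred, even though it leaves many droplets empty.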

Question 14

Q

Sequencing step in 454 NGS

Answer

A

454 sequencing is a pyrosequencing-based approach. After library preparation and amplification, beads with attached library amplification product are isolated singly into picoliter wells.

Pyrosequencing via flow-based sequencing by synthesis is performed in each well.

When the correct nucleotide is flowed in, it is added to the strand and a pyrophosphate is released. The pyrophosphate is then used by ATP sulfurylase to make ATP, which powers firefly luciferase to oxidize luciferin and create a flash of light, indicating that this was the correct nucleotide. This occurs simultaneously across the millions of copies of the library amplification product bound to the same bead.
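This is a noise-free sketch of what flow-based sequencing by synthesis records. It assumes a repeating TACG flow order (as used on 454 instruments) and that light output scales with the number of bases incorporated in a single flow. Notice that a homopolymer run (GGG) shows up as one flow with three times the signal, not as three separate events.

```python
def flowgram(sequence: str, flow_order: str = "TACG", n_flows: int = 12) -> list:
    """Ideal (noise-free) flowgram for flow-based sequencing by synthesis.

    Nucleotides are flowed one type at a time in a repeating order. At each
    flow, the strand extends through the entire run of matching bases, so the
    signal equals the homopolymer length (0 if the flowed base does not match).
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


print(flowgram("TTACGGGA"))
```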

Question 15

Q

What is the rate-limiting step of the 454 NGS sequencing phase?

Answer

A

The speed at which nucleotides are flowed into the picoliter wells.

Question 16

Q

What is the biggest challenge in the 454 NGS sequencing phase?

Answer

A

Quickly washing the wells to ensure that only one nucleotide is present at a time for sequencing by synthesis.

Question 17

Q

Why does the signal:noise ratio decrease with read position in 454 NGS?

Answer

A

Not every strand on the bead incorporates a nucleotide every time, so more and more strands fall out of sync (out of phase) with the rest as the run goes on.

This mostly creates problems with homopolymer runs (serial identical nucleotides) longer than about 5. Signal intensity scales with the number of bases incorporated, and the declining signal:noise ratio makes it difficult to precisely estimate the length of long runs.
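Both effects can be put into rough numbers under a deliberately simplified model. It assumes each strand independently fails to extend with a fixed, hypothetical probability at every extension event. It also assumes the signal is proportional to run length, so telling n bases from n + 1 relies on a relative signal step of 1/n.

```python
def in_phase_fraction(n_events: int, p_fail: float) -> float:
    """Fraction of strands still in phase after n extension events,
    assuming each strand independently fails to extend with probability
    p_fail at every event (a simplified incomplete-extension model)."""
    return (1 - p_fail) ** n_events


def relative_step(run_length: int) -> float:
    """Relative signal increase going from a run of n bases to n + 1 bases,
    assuming signal is proportional to the number of bases incorporated."""
    return 1 / run_length


p_fail = 0.005  # hypothetical per-event failure rate
for n in (10, 100, 200):
    print(f"after {n} events: {in_phase_fraction(n, p_fail):.1%} in phase")
for length in (1, 2, 5, 8):
    print(f"{length} -> {length + 1} bases: signal rises by {relative_step(length):.0%}")
```

As in-phase signal shrinks and noise grows, the ever-smaller relative step between long runs becomes harder to resolve.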

Question 18

Q

Illumina PCR setup

Answer

A

Rather than emulsion PCR, Illumina NGS performs its initial PCR amplification of library fragments by bridge amplification, which forms clusters.

A and B primers/adaptors are covalently attached to tiles in an 8-lane glass microfluidics chamber (flow cell). Library fragments with attached A and B’ sequences are flowed in and bind to the primers. They are then amplified by in situ PCR, and separation of library fragments is effectively achieved by each one forming a cluster at its own site on the tile.

They are then prepared for sequencing in-situ.

Question 19

Q

Illumina sequencing step

Answer

A

The PCR amplified library (still bound to the glass in clusters on the fluidics chamber) is subjected to sequencing by synthesis.

All four fluorophore-conjugated nucleotides (each with a different color) are added, but each carries a reversible 3’ block, so each strand incorporates only one nucleotide per cycle. Imaging after each cycle identifies the next base of each cluster's sequence.

Key differences between Illumina and 454:
In Illumina, each nucleotide has its own color.
In Illumina, the 3’OH is blocked and must be unblocked before each new nucleotide can be added. This makes Illumina better for determining homopolymer (serial nucleotide) lengths.

Question 20

Q

ABI SOLiD NGS Sequencing

Answer

A

ABI SOLiD uses emulsion PCR similar to 454, but unlike 454 the sequencing step is based on sequencing by ligation.

Sequencing by ligation relies on T4 ligase and is based on colors associated with pairs of adjacent nucleotides (two-base encoding). The end result is that each base is read twice, giving you an idea of where errors have been made. This makes ABI SOLiD a much more accurate method; however, it is slower and more expensive. It also requires a different type of software for analysis.

Also limited by very short average read length compared to sequencing by synthesis – 50 nucleotides vs >100 nucleotides.

Due to being a later method which is cumbersome and requires distinct software, it has never really caught on.
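Two-base (color-space) encoding is easier to see in a small sketch. It assumes the usual 2-bit base codes A=0, C=1, G=2, T=3, under which a SOLiD color is the XOR of the two adjacent bases' codes. A true substitution changes two adjacent colors, while a single miscalled color changes only one. That difference is how reading each base twice helps flag errors.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list:
    """One color (0-3) per pair of adjacent bases: XOR of their 2-bit codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list) -> str:
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


ref = "TACGGA"
snp = "TACTGA"  # single G>T substitution at index 3
changed = [i for i, (a, b) in enumerate(zip(encode(ref), encode(snp))) if a != b]
print(encode(ref), encode(snp), changed)  # a real SNP flips two adjacent colors

# A single miscalled color instead corrupts every base decoded after it:
bad_colors = encode(ref)
bad_colors[3] ^= 1
print(decode("T", bad_colors))
```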

Question 21

Q

Ion Torrent (IT) NGS Sequencing

Answer

A

Very similar to 454. However, rather than detecting the pyrophosphate via ATP sulfurylase and luciferase, it detects the H+ released during nucleotide incorporation, using a pH sensor built into each well of a semiconductor chip. Flow-based sequencing by synthesis. Referred to as “proton detection sequencing.”

Fast, but has the same homopolymer error problems that 454 sequencing does. Also, throughput is at best ~1 gigabase per run.

Question 22

Q

MinION (Nanopore) NGS Sequencing

Answer

A

A form of sequencing by current differential. A single DNA molecule at a time passes through a molecular ratchet into a nanopore across a charged membrane, and changes in the current are measured.

The current signal reflects 3-nucleotide combinations at a time, giving effectively 64 distinct signals, and each base position is read 3 times.

Errors come in the form of false indels created by imperfect ratcheting. Error rate ~4%.

Costs only $900, is pen-sized, and plugs into a laptop via USB.
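This sketch makes the 64-signal and read-3-times arithmetic concrete. It takes the card's figure of 3 nucleotides per current reading as given (K below) and slides a 3-base window along a toy sequence.

```python
from itertools import product

K = 3  # nucleotides contributing to each current reading, per the card

all_kmers = ["".join(p) for p in product("ACGT", repeat=K)]
print(len(all_kmers))  # 64 distinct 3-mers, i.e. 64 possible signal levels


def windows_covering(seq: str, pos: int, k: int = K) -> list:
    """All k-base windows of seq that include position pos.

    Each window is a separate current reading that 'sees' that base.
    """
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


print(windows_covering("GATTACA", 3))  # the base at index 3 is read 3 times
```

Bases at the very ends of a molecule fall in fewer windows, so they are read fewer times.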

Question 23

Q

FASTA format

Answer

A

A header line starting with “>”, followed by the sequence on the next line(s):
>name
ACTGATGACTGCC...
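A minimal FASTA reader. It assumes each header line starts with ">" and that a sequence may wrap over several lines. The records are toy data.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}.

    A header line starts with '>'; the sequence lines that follow are joined.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1
ACTGATGACTGCC
TTAG
>seq2
GGCA
"""
print(parse_fasta(example))
```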

Question 24

Q

FASTQ format

Answer

A

FASTA + quality score: each record has four lines (@name, sequence, +, and a quality string with one character per base)
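A minimal parser for the common four-line FASTQ layout. It does not handle the rarely used variant where sequences wrap over several lines. The reads are toy data.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str  # one ASCII-encoded quality character per base


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as 4 lines per read."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ text must have 4 lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"{header}: sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n#+5I\n"
for record in parse_fastq(example):
    print(record)
```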

Question 25

Q

Common start sites as a sign of bad NGS

Answer

A

If you have tons of reads starting at the same site, something has gone wrong. This is the result of PCR artificially duplicating reads from only a few original molecules.

You were probably only sampling a small subset of your alleles due to some problem early on, e.g., poor sample quality, poor DNA shearing, or non-random shearing.
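Here is a rough way to put a number on it. It assumes you already have each read's mapped start position. Real duplicate-marking tools also consider strand and mate position, so this is only a simplified proxy. The positions are made up.

```python
from collections import Counter


def start_site_duplication(starts: list) -> float:
    """Fraction of reads that share a start position with another read
    (counting all but one read per start site as duplicates)."""
    if not starts:
        return 0.0
    counts = Counter(starts)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(starts)


good = [101, 157, 203, 260, 318, 377, 402, 455]
bad = [101, 101, 101, 101, 203, 203, 203, 455]
print(start_site_duplication(good), start_site_duplication(bad))
```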

Question 26

Q

What can you do to salvage a somewhat poor-quality NGS run?

Answer

A

The quality varies with position in the NGS read, and you can track it with the FASTQ quality score.

So. . . you can just read a sequence until its quality score drops below a threshold corresponding to a predetermined error rate. This is called quality trimming.

This means that instead of 100 bases where the last 25 are gibberish, you get 75 quality bases. With the number of reads you usually have in NGS, that length is sufficient, so you can salvage good-quality data at the cost of a slightly shorter average read length.
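A minimal sketch of the simplest version of quality trimming. It assumes Phred+33 quality encoding and a hypothetical Q20 cutoff (Q20 corresponds to a 1% predicted error rate).

```python
def quality_trim(sequence: str, quality: str, min_q: int = 20, offset: int = 33):
    """Keep bases from the 5' end up to (not including) the first base whose
    Phred score falls below min_q.

    offset=33 assumes Phred+33 encoding; min_q=20 is a hypothetical cutoff.
    """
    for i, ch in enumerate(quality):
        if ord(ch) - offset < min_q:
            return sequence[:i], quality[:i]
    return sequence, quality


seq = "ACGTACGTAC"
qual = "IIIIIII#+#"  # 'I' = Q40, '#' = Q2, '+' = Q10
trimmed_seq, trimmed_qual = quality_trim(seq, qual)
print(trimmed_seq)
```

Many trimmers use a sliding-window average instead, so a single low-quality base does not cut an otherwise good read short.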

Question 27

Q

Adaptor trimming

Answer

A

When an NGS fragment is shorter than the read length, the read runs on into the adaptor sequence.

But, since you know your adaptor sequences, you can automatically filter these out.
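A minimal sketch of adaptor trimming. It assumes exact matching and an adaptor at the 3’ end. The adaptor string is only an example, and real trimmers also tolerate mismatches. It removes either a full adaptor inside the read or a partial adaptor hanging off the read's end.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove a known 3' adaptor from a read (exact matching only)."""
    idx = read.find(adaptor)
    if idx != -1:  # full adaptor present: cut at its start
        return read[:idx]
    # Partial adaptor: the read's suffix equals a prefix of the adaptor.
    # Try the longest possible overlap first, down to min_overlap bases.
    for k in range(min(len(adaptor) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


adaptor = "AGATCGGAAGAGC"  # example adaptor sequence
print(trim_adaptor("GATTACA" + adaptor + "TTTT", adaptor))
print(trim_adaptor("GATTACAAGATCGG", adaptor))
```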

Question 28

Q

Size trimming

Answer

A

Sometimes when you have short NGS fragments, post-trimming your total usable length is only ~15 base pairs.

This makes mapping a problem, as such a short sequence can map to many locations in the genome.

So, you can just exclude these fragments entirely.
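A back-of-the-envelope check on why ~15 bp is too short. It treats the genome as random sequence of roughly 3.1 billion bases and counts both strands. Real genomes are repetitive, so this understates the problem. The 25 bp cutoff in the filter is a hypothetical choice.

```python
GENOME_SIZE = 3.1e9  # approximate human genome length in bp


def expected_random_hits(k: int, genome_size: float = GENOME_SIZE) -> float:
    """Expected chance occurrences of a given k-mer on both strands,
    treating the genome as random sequence."""
    return 2 * genome_size / 4 ** k


def size_filter(reads: list, min_length: int = 25) -> list:
    """Drop reads shorter than min_length (a hypothetical cutoff)."""
    return [r for r in reads if len(r) >= min_length]


for k in (15, 20, 25):
    print(f"{k} bp: ~{expected_random_hits(k):.2g} chance matches")
```

A random 15-mer is expected to occur several times by chance alone. By 25 bp, chance matches become very unlikely.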

Question 29

Q

Two approaches to mapping sequences

Answer

A

De novo: Purely piecing sequences together – no reference sequence.

Resequencing: True mapping onto a reference genome sequence.

Question 30

Q

Multiplex NGS

Answer

A

Very commonly done in order to save money on NGS runs, and actually very simple.

Build a distinct barcode sequence into the adaptors for each patient. It is read along with the sequence and can be used to identify which patient the fragment came from, at the cost of a small number of read bases (say, 10).

These sequences are read and then “trimmed” during the demultiplexing step, which should happen before any other trimming.
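A minimal demultiplexing sketch. It assumes an inline 10 bp barcode at the start of each read and allows one mismatch. Many Illumina runs instead put the barcode in a separate index read. The patient names, barcodes, and reads are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes: dict, max_mismatch: int = 1) -> dict:
    """Assign reads to patients by their leading barcode, then strip it."""
    bc_len = len(next(iter(barcodes.values())))
    bins = {patient: [] for patient in barcodes}
    bins["undetermined"] = []
    for read in reads:
        tag, insert = read[:bc_len], read[bc_len:]
        hits = [p for p, bc in barcodes.items()
                if hamming(tag, bc) <= max_mismatch]
        if len(hits) == 1:      # unambiguous match: keep the trimmed insert
            bins[hits[0]].append(insert)
        else:                   # no match or ambiguous: set the read aside
            bins["undetermined"].append(read)
    return bins


barcodes = {"patient_A": "ACGTACGTAC", "patient_B": "TGCATGCATG"}
reads = ["ACGTACGTACGGGTTT", "ACGTACGTAACCCAAA",
         "TGCATGCATGAAATTT", "NNNNNNNNNNACGT"]
print(demultiplex(reads, barcodes))
```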

Question 31

Q

What reference genome should you be using?

Answer

A

hg19

(And in rare cases you may need to convert old data from hg18 to hg19, but try to stick to hg19)

Question 32

Q

dbSNP

Answer

A

SNP database that catalogs known SNPs and other short variants (not only benign ones)

Important reference tool

Question 33

Q

As a specimen ages, when does RNA-based NGS become less of an option?

Answer

A

At a specimen age of 7-8 years it starts to get questionable, but in some cases specimens 10 years old can still give decent-quality reads.

Question 34

Q

“RT check”

Answer

A

Checks that RNA quality is sufficient for reverse transcription.

The metric is a measure of GAPDH RNA.

Question 35

Q

Coverage that we deem adequate (rather than inadequate/conditional)

Answer

A

100 reads

Question 36

Q

Amplicon vs hybrid capture sequencing

Question 37

Q

Interpreting FASTQ quality score

Answer

A

The Phred quality score of each base is encoded as a single ASCII character. This compresses the data and lets each character correspond to one position in the FASTQ sequence.
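A small decoding sketch. It assumes the Phred+33 offset used by modern Illumina and Sanger-style FASTQ; some older Illumina files used +64. Each character's code minus 33 gives Q, and Q = −10·log10(P_error).

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a quality string: character code minus the encoding offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


qual = "II5+#"
for ch, q in zip(qual, phred_scores(qual)):
    print(f"{ch!r}: Q{q}, P(error) = {error_probability(q):.4f}")
```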

Question 38

Q

Full, formal variant reporting format (genome, cDNA, protein)

Question 39

Q

SAM/BAM

Answer

A

The standard output file format of an alignment tool.

SAM = sequence alignment map
BAM = binary alignment map (binary equivalent of SAM)

Header - contains information about the data file, such as genome build version and reference sequence.

Header lines are identified by @##, where ## is a two-letter code identifying the type of data being entered (e.g., @HD, @SQ, @RG, @PG).

Includes two quality metrics: the original FASTQ quality score AND the mapping quality score, both of which play different but important roles in variant calling.
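A minimal sketch of where those two quality metrics sit in a SAM alignment line. It uses the 11 mandatory tab-separated fields from the SAM specification: MAPQ is field 5 and QUAL is field 11. The header and read are a toy example. For real work, a dedicated library such as pysam is the usual choice.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam(text: str):
    """Split SAM text into header lines (starting with '@') and records."""
    header, records = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        values = line.split("\t")
        record = dict(zip(SAM_FIELDS, values[:11]))
        for field in INT_FIELDS:
            record[field] = int(record[field])
        record["TAGS"] = values[11:]  # optional TAG:TYPE:VALUE fields
        records.append(record)
    return header, records


example = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:5000",
    "read1\t0\tchr1\t1001\t60\t4M\t*\t0\t0\tACGT\tII5#",
])

header, records = parse_sam(example)
rec = records[0]
print(rec["MAPQ"], rec["QUAL"])  # mapping quality vs per-base quality string
```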

Question 40

Q

The central dogma of NGS analysis

Question 41

Q

Variant reporting/tiering for somatic mutations

Question 42

Q

Important questions to know before beginning NGS analysis

Answer

A

DNA or RNA-based assay?
Is the quality sufficient (FASTQ quality score, alignment quality score, mean start sites, etc.)?
What type of specimen was the material acquired from (may preclude certain analyses)?

Question 43

Q

High expression artifact

Answer

A

Seen in RNA-based fusion assays

If a certain gene is expressed very highly, you will see a lot of artifactual “fusions” involving that gene.

Question 44

Q

Near-haploidization effect in oncocytic tumors

Answer

A

Multiple oncocytic tumors, including Hürthle cell carcinoma, frequently display near-haploidization followed by endoreduplication, resulting in homozygosity of many genes.

Chromosome 7 is often unaffected by this process.