Next Generation Sequencing Flashcards

1
Q

Typical design of a gene panel for NGS

A
  • Entire exonic sequence of the genes
  • +10 base pairs into intronic sequences (NOT deep intronic sequences)
  • Promoters are NOT covered (eg, TERT promoter)
  • Large indels (about 100 bp or more) are usually missed due to insufficient priming
2
Q

Hotspot Panels

A

Focus on hotspot regions, which are frequently associated with SNVs and small indels

Hotspot panels are not faster, but they can be run on poorer-quality DNA or on less DNA

3
Q

NGS sequencing is run in. . .

A

. . . batches, to reduce costs.

4
Q

Meta-mutational data

A

Patterns of mutation across many sites, for example MSI or a UV signature

These require more sequenced DNA, since they are effectively statistical assays that need a large N.

5
Q

Overestimation of tumor percentage risks . . .

A

. . . a false negative.
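
A rough worked example of why this happens, assuming a clonal heterozygous variant in a diploid region with no copy-number change. The function name and the 5% limit of detection are illustrative assumptions, not values from any particular assay.

```python
def expected_vaf(tumor_fraction: float) -> float:
    """Expected VAF of a clonal heterozygous variant in a diploid region."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Tumor cells carry the variant on one of two alleles; normal cells carry none.
    return tumor_fraction / 2.0


ASSUMED_LIMIT_OF_DETECTION = 0.05  # hypothetical assay cutoff

estimated = expected_vaf(0.40)  # estimated tumor percentage: 40%
actual = expected_vaf(0.06)     # true tumor percentage: 6%

print(f"expected VAF at estimated tumor percentage: {estimated:.2f}")
print(f"expected VAF at actual tumor percentage: {actual:.2f}")
print("variant detectable:", actual >= ASSUMED_LIMIT_OF_DETECTION)
```

If the tumor percentage is overestimated, a real variant can sit below the detection limit while the sample still looks adequate on paper.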

6
Q

Evidence Tiers

A
7
Q

Evidence tiers are primarily determined by. . .

A

. . . evidence type, not necessarily evidence quality

8
Q

Assessing VAF

A
9
Q

Sample isolation techniques

A
10
Q

Emulsion PCR

A

PCR, but the aqueous phase is broken up and spread across many individual droplets within an oil emulsion.

Enables many parallel reactions to occur simultaneously.

This is the fundamental technique which produces the massively parallel component of next generation sequencing – many reactions are run in tiny emulsion chambers which in theory may contain different substrate and allow for numerous separate but simultaneous PCRs.

11
Q

Amplification in emulsion PCR
(454 method)

A

The genome is fragmented through one of numerous possible techniques.

3’ overhangs are digested and 5’ overhangs are filled in to create a library of blunt dsDNA fragments. Then, A and B dsDNA adaptor sequences are added to the ends of each DNA fragment. A and B adaptors have 3’ hydroxyls, but lack 5’ phosphates (to prevent A-B pairing).

The non-ligated halves of the dsDNA adaptor sequences are melted off and the overhangs are filled in by PCR. PCR enrichment then ensues with A’ and B’ primers, which selectively amplify the library fragments that carry one adaptor of each type (A-A and B-B form lariats, A- and B- only extend linearly, therefore only A-B amplifies).

12
Q

Illumina method for library adaptor ligation and amplification (in place of the 454 A/B adaptor approach)

A

Rather than using separate A and B adapters which both lack a 5’ phosphate, Illumina uses a single Y-shaped adapter on both ends: it has a region of homology at the library dsDNA interface, but then branches out into nonhomologous strands with sequences of A and B’.

When the first round of PCR takes place, you then get your full adaptor sequences connected to the first PCR product (A and B’, B and A’), and the PCR amplifies.

13
Q

Emulsion PCR setup in 454 NGS (after library preparation)

A

Ideally, you have created a library fragment:emulsion bubble ratio such that each bubble contains at most one fragment, minimizing bubbles with multiple fragments.

Each bubble also contains a magnetic bead with the A’ primer for PCR, as well as a B’ primer which is free floating in solution.

14
Q

Sequencing step in 454 NGS

A

454 NGS is a pyrosequencing-based approach. After library preparation and amplification, beads with attached library amplification product are singly isolated into picoliter wells.

Pyrosequencing via a flow-based sequencing by synthesis is performed in each well.

When the correct nucleotide is flowed in, it is added to the strand and a pyrophosphate is released. The pyrophosphate is then used by ATP sulfurylase to make ATP, which powers firefly luciferase to oxidize luciferin and create a flash of light, indicating that this was the correct nucleotide. This occurs across the millions of library amplification products bound to the same bead.
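
The flow-based readout can be sketched as an idealized "flowgram": one nucleotide is flowed at a time, and the light signal scales with how many bases were added in that flow. This is a minimal sketch with no noise; the function name, the T-A-C-G flow order, and the example sequence are assumptions for illustration.

```python
def flowgram(synthesized: str, flow_order: str = "TACG", max_cycles: int = 100):
    """Return (nucleotide, signal) pairs for an idealized, noise-free run.

    `synthesized` is the strand being built (the complement of the template).
    The signal for a flow is the number of bases incorporated in that flow.
    """
    position = 0
    flows = []
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            if position >= len(synthesized):
                return flows
            run_length = 0
            # A run of identical bases is incorporated within a single flow.
            while position < len(synthesized) and synthesized[position] == nucleotide:
                run_length += 1
                position += 1
            flows.append((nucleotide, run_length))
    return flows


for nucleotide, signal in flowgram("TTACCCG"):
    print(nucleotide, signal)
```

A signal of 0 means the flowed nucleotide was not the next base; a signal of 3 means three identical bases were added at once, which is why long runs of one base are hard to count precisely.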

15
Q

What is the rate limiting step of the 454 NGS sequencing phase?

A

The speed at which nucleotides are flowed into the picoliter wells.

16
Q

What is the biggest challenge in the 454 NGS sequencing phase?

A

Quickly washing the wells to ensure that only one nucleotide is present at a time for sequencing by synthesis.

17
Q

Why does the signal:noise ratio decrease with your position in 454 NGS?

A

Not every strand on the bead will incorporate every time, and so more and more strands will be synthesizing out of sequence with the rest as time goes on.

This mostly creates problems with runs of more than 5 identical serial nucleotides, since the variability in signal:noise ratio makes it difficult to precisely estimate the length of a long run.
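
A toy model of this loss of phase, assuming each incorporation succeeds independently with a fixed probability. The 0.995 efficiency is an illustrative assumption, not a measured value, and real chemistry also has carry-forward errors that this sketch ignores.

```python
def in_phase_fraction(per_step_efficiency: float, incorporations: int) -> float:
    """Fraction of strands on a bead still in step after n incorporations."""
    if not 0.0 <= per_step_efficiency <= 1.0:
        raise ValueError("per_step_efficiency must be between 0 and 1")
    # Each strand must succeed at every step so far to remain in phase.
    return per_step_efficiency ** incorporations


for n in (10, 100, 400):
    print(n, round(in_phase_fraction(0.995, n), 3))
```

The in-phase fraction shrinks geometrically with position, so the true signal falls while out-of-phase strands add background.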

18
Q

Illumina PCR setup

A

Rather than emulsion PCR, Illumina NGS performs its initial PCR amplification of library components by clustering.

A and B primers/adapters are covalently attached to tiles in an 8-lane glass microfluidics chamber. Library fragments with attached A and B’ sequences are flowed in and bind to the primers. They are then amplified by an in-situ PCR, and separation of library components is effectively achieved because each fragment's copies cluster at one site on the tile.

They are then prepared for sequencing in-situ.

19
Q

Illumina sequencing step

A

The PCR amplified library (still bound to the glass in clusters on the fluidics chamber) is subjected to sequencing by synthesis.

Fluorophore-conjugated nucleotides (each of the four with a different color) are incorporated one at a time, enabling identification of each cluster's sequence.

Key differences between Illumina and 454:
In Illumina, each nucleotide has its own color.
In Illumina, the 3’ OH is blocked and must be unblocked before each new nucleotide is added, so only one base is incorporated per cycle. This makes Illumina better at resolving runs of identical serial nucleotides.

20
Q

ABI SOLiD NGS Sequencing

A

ABI SOLiD uses emulsion PCR similar to 454, but unlike 454 the sequencing step is based on sequencing by ligation.

Sequencing by ligation relies on T4 ligase, and is based on colors associated with nucleotide pairs. The end result is that each base is read twice, giving you an idea of where errors have been made. This makes ABI SOLiD a much more accurate method; however, it takes longer and is more expensive. It also requires a different type of software for analysis.

Also limited by very short average read length compared to sequencing by synthesis – 50 nucleotides vs >100 nucleotides.

Due to being a later method which is cumbersome and requires distinct software, it has never really caught on.

21
Q

Ion Torrent (IT) NGS Sequencing

A

Very similar to 454; however, rather than detecting the pyrophosphate via ATP sulfurylase and luciferase, it measures the H+ released at incorporation with a pH sensor built into the well. Flow-based sequencing by synthesis. Referred to as “proton detection sequencing.”

Fast, but it has the same error problems with runs of identical nucleotides that 454 sequencing does. Output is also limited: at best ~1 gigabase.

22
Q

MinION (Nanopore) NGS Sequencing

A

A form of sequencing by current differential which takes one single molecule of DNA at a time through a molecular ratchet attached to a charged chamber.

Has distinct signals for 3-nucleotide combinations, so effectively 64 distinct signals, and each base location is read 3 times.

Errors come in the form of false indels created by imperfect ratcheting. Error rate ~4%.

Only costs $900, pen-sized, plugs into a laptop via USB.

23
Q

FASTA format

A

> name
ACTGATGACTGCC. . . .
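
A minimal sketch of reading this format, assuming a ">" header line followed by one or more sequence lines per record. The function name and example records are made up for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record name to its sequence, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()   # everything after ">" is the name line
            records[name] = ""
        elif name is not None:
            records[name] += line     # sequence may span several lines
    return records


example = ">seq1 demo\nACTGATGA\nCTGCC\n>seq2\nGGGTTT\n"
print(parse_fasta(example))
```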

24
Q

FASTQ format

A

FASTA + quality score
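
In practice a FASTQ record is four lines: an "@" name line, the sequence, a "+" separator, and a quality string with one character per base. A minimal sketch of a reader, assuming unwrapped four-line records; the function name and example read are illustrative.

```python
def parse_fastq(text: str) -> list[tuple[str, str, str]]:
    """Return (name, sequence, quality) for each four-line FASTQ record."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        records.append((header[1:], sequence, quality))
    return records


example = "@read1\nACTGATGA\n+\nIIIIHHG#\n"
print(parse_fastq(example))
```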

25
Q

Common start sites as a sign of bad NGS

A

If you have tons of reads starting at the same site, something has gone wrong. This is the result of artificial duplication of few reads due to PCR.

You were probably only targeting a small subset of your alleles due to some problem early on – maybe poor sample quality, poor DNA shearing, non-random shearing, etc.
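
One rough way to spot this is to count how many reads share each start site. A minimal sketch, assuming alignment starts are available as (chromosome, position, strand) tuples; the function name and example coordinates are hypothetical.

```python
from collections import Counter


def start_site_summary(starts):
    """Summarize how concentrated reads are on shared start sites."""
    counts = Counter(starts)
    total = len(starts)
    unique = len(counts)
    return {
        "reads": total,
        "unique_start_sites": unique,
        # Reads beyond the first at each start site are treated as duplicates.
        "duplicate_fraction": (total - unique) / total if total else 0.0,
    }


starts = [("chr7", 100, "+")] * 8 + [("chr7", 150, "+"), ("chr7", 161, "-")]
print(start_site_summary(starts))
```

A high duplicate fraction with few unique start sites suggests that many reads are PCR copies of a few original molecules.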

26
Q

What can you do to salvage a somewhat poor quality NGS?

A

The quality varies with position in the NGS read, and you can track it with the FASTQ quality score.

So you can just read a sequence until its quality score drops below the level that corresponds to a predetermined predicted error rate. This is called quality trimming.

This means that instead of 100 bases where the last 25 are gibberish, you get 75 quality bases. With the number of reads you usually have in NGS, that length is sufficient, so you can salvage the data into good quality with a slightly shorter average read length.
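
A minimal sketch of this kind of trimming, assuming Phred+33 encoded qualities and a cutoff of Q20 (both are assumptions here). It cuts at the first base below the cutoff; real trimming tools often use sliding windows instead.

```python
def quality_trim(sequence: str, quality: str, min_q: int = 20, offset: int = 33):
    """Keep bases up to, but not including, the first one below min_q."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    for i, char in enumerate(quality):
        if ord(char) - offset < min_q:
            return sequence[:i], quality[:i]
    return sequence, quality  # every base passed the cutoff


trimmed_seq, trimmed_qual = quality_trim("ACGTACGTAC", "IIIIIII#I#")
print(trimmed_seq, trimmed_qual)
```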

27
Q

Adaptor trimming

A

Sometimes the adaptor sequence is also read in shorter NGS fragments.

But, since you know your adaptor sequences, you can automatically filter these out.
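
A minimal sketch of adaptor trimming by exact string matching. The function name, the example adaptor sequence, and the 3-base minimum overlap are assumptions for illustration; real trimmers also tolerate mismatches.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove an adaptor, or a partial adaptor at the 3' end, from a read."""
    index = read.find(adaptor)
    if index != -1:
        return read[:index]  # full adaptor found: drop it and everything after
    # Otherwise look for the longest adaptor prefix hanging off the read's end.
    longest = min(len(adaptor), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


EXAMPLE_ADAPTOR = "AGATCGGAAGAGC"  # placeholder; use your kit's actual adaptor
print(trim_adaptor("ACGTACGTAGATCGGAAGAGCTT", EXAMPLE_ADAPTOR))
print(trim_adaptor("ACGTACGTAGATCG", EXAMPLE_ADAPTOR))
```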

28
Q

Size trimming

A

Sometimes when you have short NGS fragments, post-trimming your total usable length is only ~15 base pairs.

This makes mapping a problem, as this can map to many locations in the genome.

So, you can just exclude these fragments entirely.
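
A back-of-the-envelope check on why such short reads are ambiguous, followed by a simple length filter. It assumes a genome of roughly 3.1 billion bases with random sequence, and the 20-base cutoff is an illustrative choice, not a standard.

```python
ASSUMED_GENOME_SIZE = 3.1e9  # approximate human genome length in bases


def expected_chance_hits(read_length: int) -> float:
    """Expected random matches of one specific sequence of this length."""
    return ASSUMED_GENOME_SIZE / 4 ** read_length


def drop_short_reads(reads, min_length: int = 20):
    """Keep only reads long enough to map with reasonable confidence."""
    return [read for read in reads if len(read) >= min_length]


print(4 ** 15)                   # number of distinct 15-base sequences
print(expected_chance_hits(15))  # more than one chance match expected
print(expected_chance_hits(25))  # far fewer than one
print(drop_short_reads(["ACGT" * 4, "ACGT" * 6]))
```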

29
Q

Two approaches to mapping sequences

A

De novo: Purely piecing sequences together – no reference sequence.

Resequencing: True mapping onto a reference genome sequence.

30
Q

Multiplex NGS

A

Very commonly done in order to save money on NGS runs, and actually very simple.

Build a distinct bar code sequence into the adaptors for different patients. This will be read along with the sequence and can be used to identify which patient the fragment is coming from at the sacrifice of a small number of bp reads, say 10.

These sequences are read and then “trimmed” during the demultiplexing step, which should happen before any other trimming.
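
A minimal sketch of demultiplexing, assuming the barcode sits at the start of each read and must match exactly. The barcode sequences, patient labels, and 4-base barcode length are hypothetical and chosen only to keep the example short.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by patient barcode and trim the barcode off each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            by_sample["undetermined"].append(read)  # unknown barcode: keep whole read
        else:
            by_sample[sample].append(read[barcode_length:])
    return dict(by_sample)


barcodes = {"AAAA": "patient_1", "CCCC": "patient_2"}
reads = ["AAAAACGTACGT", "CCCCTTTTGGGG", "GGGGACGT"]
print(demultiplex(reads, barcodes, barcode_length=4))
```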

31
Q

What reference genome should you be using?

A

hg19

(And in rare cases you may need to convert old data from hg18 to hg19, but try to stick to hg19)

32
Q

dbSNP

A

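Database that catalogs known SNPs and other short variants. Many entries are common, presumably benign polymorphisms, but being listed does not by itself mean a variant is benign.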

Important reference tool

33
Q

When does NGS with RNA become less of an option?

A

A specimen age of 7-8 years is when it starts to get questionable, but some cases 10 years out can have decent quality reads.

34
Q

“RT check”

A

Checks that RNA quality is sufficient for reverse transcription.

The metric is a measure of GAPDH RNA.

35
Q

Coverage that we deem adequate (rather than inadequate/conditional)

A

100 reads

36
Q

Amplicon vs hybrid capture sequencing

A
37
Q

Interpreting FASTQ quality score

A

The Phred score is encoded as ASCII characters, one per base, which compresses the data and makes each character correspond to a position in the FASTQ sequence.
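
A minimal sketch of decoding, assuming the common Phred+33 encoding (ASCII code minus 33) and the standard relationship Q = -10 log10(P), where P is the predicted error probability. The function names are illustrative.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Convert each quality character to its Phred score."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q: int) -> float:
    """Predicted probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


for char, q in zip("I5+!", phred_scores("I5+!")):
    print(char, q, error_probability(q))
```

So a score of 20 corresponds to a predicted 1 in 100 chance of error, and 30 to 1 in 1,000.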

38
Q

Full, formal variant reporting format (genome, cDNA, protein)

A
39
Q

SAM/BAM

A

The standard output file of an alignment tool.

SAM = sequence alignment map
BAM = binary alignment map (binary equivalent of SAM)

Header: contains information about the data file, such as genome build version and reference sequence.

Sections are identified by @##, where ## is a two letter code to identify the type of data being entered.

Includes two quality metrics: The original FASTQ quality score AND the mapping quality score, both of which play different but important roles in variant calling.
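
A minimal sketch of pulling the header records and the two quality metrics out of SAM text, assuming tab-separated lines with the 11 mandatory alignment fields. The function name, dictionary keys, and example lines are made up for illustration.

```python
def parse_sam(text: str):
    """Split SAM text into header records and a few fields per alignment."""
    header, alignments = [], []
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            record_type = fields[0][1:]  # two-letter code, e.g. HD or SQ
            if record_type == "CO":      # free-text comment line
                header.append((record_type, {"text": "\t".join(fields[1:])}))
            else:
                tags = dict(field.split(":", 1) for field in fields[1:])
                header.append((record_type, tags))
        else:
            alignments.append({
                "read_name": fields[0],
                "reference": fields[2],
                "position": int(fields[3]),
                "mapping_quality": int(fields[4]),  # assigned by the aligner
                "sequence": fields[9],
                "base_qualities": fields[10],       # carried over from FASTQ
            })
    return header, alignments


example = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr7\tLN:159138663\n"
    "read1\t0\tchr7\t1000\t60\t8M\t*\t0\t0\tACTGATGA\tIIIIHHG#\n"
)
header, alignments = parse_sam(example)
print(header)
print(alignments[0]["mapping_quality"], alignments[0]["base_qualities"])
```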

40
Q

The central dogma of NGS analysis

A
41
Q

Variant reporting/tiering for somatic mutations

A
42
Q

Important questions to know before beginning NGS analysis

A
  1. DNA or RNA-based assay?
  2. Is the quality sufficient (FASTQ quality score, alignment quality score, mean start sites, etc)?
  3. What type of specimen was the material acquired from (precludes certain analyses)?
43
Q

High expression artifact

A

Seen in RNA-based fusion assays

If a certain gene is expressed very highly, you will see a lot of artifactual “fusions” involving that gene.

44
Q

Near-haploidization effect in oncocytic tumors

A

Multiple oncocytic tumors, including Hürthle cell carcinoma, frequently display haploidization followed by reduplication, resulting in homozygosity of many genes.

Chromosome 7 is often unaffected by this process.