NGS Flashcards

Question 1

Q

What five areas need to be considered when assessing quality of NGS

Answer

A

Error rates of technology. Read length. Base calling algorithms. Alignment. Read depth coverage

Question 2

Q

In NGS what can contribute to error rates

Answer

A

Signal-to-noise ratio. Cross talk from nearby clusters or beads. Homopolymer runs. Incomplete extension. Position in the read (worse at the beginning or end).
Error rates are typically 0.1% to several percent.

Question 3

Q

How does read length affect NGS quality

Answer

A

If reads are too short they might not align correctly. Longer reads provide more information about relative genomic location but cost more. Paired-end sequencing helps align short reads and helps detect rearrangements, but is more expensive and time consuming.

Question 4

Q

What do base calling algorithms do in NGS

Answer

A

Identify bases and give each one a quality score (Phred score) based on noise estimates from image analysis. Can help improve error rates. The higher the Phred score, the better the quality.

The quality score is important for rejecting low quality reads, trimming low quality bases, improving alignment accuracy and determining consensus sequences.
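
A Phred score Q relates to the probability P that a base call is wrong by Q = -10 x log10(P), so Q20 means a 1 in 100 chance of error and Q30 means 1 in 1000. The sketch below shows that conversion and a simple tail trim. It assumes the common Phred+33 ASCII encoding of quality strings; the function names and the Q20 cut-off are illustrative choices, not taken from any particular pipeline.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Turn an ASCII quality string into a list of Phred scores.

    offset=33 is an assumption (Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in qual]


def trim_low_quality_tail(seq, qual, min_q=20):
    """Remove bases from the 3' end while their score is below min_q."""
    scores = decode_quality_string(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


print(decode_quality_string("II5#"))           # [40, 40, 20, 2]
print(trim_low_quality_tail("ACGT", "II5#"))   # ('ACG', 'II5')
```

Real trimming tools usually use sliding windows or running sums rather than this single-base rule.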

Question 5

Q

What must you be careful of with base calling algorithms in NGS

Answer

A

They can remove real deletions, so special software designed for detecting deletions has to be used.

Question 6

Q

Why is alignment important and what are the issues in NGS

Answer

A

It’s important that alignment algorithms can cope with both sequencing errors and real differences from the reference. Alignment is more difficult than with Sanger sequencing because of the short reads. Paired-end sequencing increases the number of matched reads. There are issues in repetitive regions and regions of shared homology.
Must produce well calibrated alignment quality values.
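
A toy illustration of why short reads and repeats are a problem: the sketch below slides a read along a reference and reports every position where it matches within a mismatch allowance. It is a deliberately naive scan, not how production aligners work, and the sequences are made up for the example.

```python
def find_alignments(read, reference, max_mismatches=1):
    """Return (start, mismatches) for every placement within the allowance."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTTGCA"  # made-up sequence containing a repeat of ACGT

# A read from the repeated region places equally well in two locations.
print(find_alignments("ACGT", reference, max_mismatches=0))  # [(0, 0), (4, 0)]

# A read from unique sequence still places once despite one base error.
print(find_alignments("TTGA", reference, max_mismatches=1))  # [(8, 1)]
```

When a read has more than one equally good placement, an aligner should report a low alignment quality for it, which is why well calibrated alignment quality values matter.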

Question 7

Q

What is depth coverage in NGS

Answer

A

A measurement of the number of times a region has been sequenced during a run. The higher the number of reads, the higher the data quality. Coverage across regions is variable.

Inadequate coverage can result in a false negative result (missing a real SNV). >30-fold coverage is recommended. If this isn’t reached, the nucleotides concerned need repeating (by Sanger sequencing).
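
The coverage check can be sketched as follows, assuming aligned reads are supplied as half-open (start, end) intervals on one chromosome. The function names and coordinates are illustrative; the 30x threshold is the one quoted above.

```python
def per_base_depth(region_start, region_end, reads):
    """Count how many reads cover each base of [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for read_start, read_end in reads:
        lo = max(read_start, region_start)
        hi = min(read_end, region_end)
        for pos in range(lo, hi):  # empty if the read misses the region
            depth[pos - region_start] += 1
    return depth


def low_coverage_positions(region_start, depth, min_depth=30):
    """Positions whose depth falls below min_depth (candidates for Sanger fill-in)."""
    return [region_start + i for i, d in enumerate(depth) if d < min_depth]


# 30 reads cover 98-103 and 5 reads cover 101-110 (made-up coordinates).
reads = [(98, 103)] * 30 + [(101, 110)] * 5
depth = per_base_depth(100, 105, reads)
print(depth)                               # [30, 35, 35, 5, 5]
print(low_coverage_positions(100, depth))  # [103, 104]
```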

Question 8

Q

What’s a FASTQ file

Answer

A

A file that contains all base calls and quality scores.
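
Each FASTQ record is four lines: an @ header, the base calls, a + separator line and one quality character per base. A minimal reader, assuming well-formed unwrapped records held in a string (the read names and sequences are made up):

```python
def parse_fastq(text):
    """Parse FASTQ text into a list of dicts with id, seq and qual."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have 4 lines each")
    records = []
    # Fixed groups of four are used because a quality line may itself
    # begin with '@', so header lines cannot be found by prefix alone.
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append({"id": header[1:], "seq": seq, "qual": qual})
    return records


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII5#\n"
for record in parse_fastq(example):
    print(record["id"], record["seq"], record["qual"])
```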

Question 9

Q

What’s a BAM file

Answer

A

A binary alignment/map file (the compressed, binary form of a SAM file) that stores the reads aligned to the reference genome.

Question 10

Q

What’s a VCF file

Answer

A

A text file (Variant Call Format) that lists the variants found in the patient’s sample relative to the reference genome.
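
A VCF data line is tab-separated and starts with eight fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The sketch below splits one such line into a dictionary. The example line and its values are made up, and per-sample genotype columns are ignored to keep it short.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several alternate alleles are allowed
    info = {}
    for item in record["INFO"].split(";"):
        if item == ".":  # '.' means no INFO entries
            continue
        key, _, value = item.partition("=")
        info[key] = value if value else True  # keys without '=' are flags
    record["INFO"] = info
    return record


# Hypothetical variant line for illustration only.
line = "chr1\t12345\t.\tG\tA\t50\tPASS\tDP=42;AF=0.48"
variant = parse_vcf_line(line)
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"], variant["INFO"])
```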

Question 11

Q

Give a basic overview of what needs to be checked for accurate detection of SNV in NGS

Answer

A

1) Data must be aligned correctly. 2) Alignment quality needs to be checked. 3) Coverage of every base needs to be checked (>30x). 4) Variant detection is performed. 5) Check the quality (Phred) score of each base. 6) Check the % of reads in which the variant is seen, to distinguish a real variant from a sequencing error (a threshold must be established).

Question 12

Q

What must be considered when assessing if a variant call is real in NGS

Answer

A

% of times the SNV appears in the forward and reverse strands. % of times the SNV is called vs wild type. Nature of the SNV (eg in a homopolymer region).
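
These two percentages can be computed from four read counts. The sketch below is only an illustration: the thresholds (20% of reads, at least 10% of variant reads on the minor strand) are placeholder values, since each lab must establish and validate its own.

```python
def assess_variant(alt_fwd, alt_rev, ref_fwd, ref_rev,
                   min_fraction=0.2, min_strand_fraction=0.1):
    """Summarise support for a variant from forward/reverse read counts.

    Both thresholds are hypothetical defaults, not recommended values.
    """
    alt = alt_fwd + alt_rev
    total = alt + ref_fwd + ref_rev
    if total == 0 or alt == 0:
        return {"allele_fraction": 0.0, "strand_balanced": False, "passes": False}
    allele_fraction = alt / total          # % of reads calling the variant
    fwd_share = alt_fwd / alt              # share of variant reads on the forward strand
    strand_balanced = min(fwd_share, 1 - fwd_share) >= min_strand_fraction
    passes = allele_fraction >= min_fraction and strand_balanced
    return {
        "allele_fraction": allele_fraction,
        "strand_balanced": strand_balanced,
        "passes": passes,
    }


# Seen on both strands in about half the reads: consistent with a heterozygous SNV.
print(assess_variant(alt_fwd=12, alt_rev=10, ref_fwd=11, ref_rev=12))
# Seen on one strand only, in a minority of reads: more likely an artefact.
print(assess_variant(alt_fwd=15, alt_rev=0, ref_fwd=40, ref_rev=45))
```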

Question 13

Q

What’s a homopolymer

Answer

A

A stretch/run of the same base eg AAAAAA
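
Homopolymer runs can be located by grouping consecutive identical bases. In this sketch the minimum run length of 4 is an arbitrary choice for the example.

```python
from itertools import groupby


def find_homopolymers(seq, min_length=4):
    """Return (start, base, length) for each run of at least min_length bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_length:
            runs.append((pos, base, length))
        pos += length
    return runs


print(find_homopolymers("ACAAAAAAGTTTTC"))  # [(2, 'A', 6), (9, 'T', 4)]
```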

Question 14

Q

Name 3 causes of error in NGS

Answer

A

Base calling errors. Alignment errors. Low coverage.

Question 15

Q

For targeted NGS what has to happen before sequencing

Answer

A

An enrichment step. Either PCR based or hybridisation based

Question 16

Q

Discuss PCR based enrichment technologies for NGS

Answer

A

Requires only a small starting amount of DNA. It’s cheap. Products will contain unwanted introns. Originally long-range PCR was performed; now multiplexed enrichment kits are used.

Nextera (Illumina). 1) tagmentation (transposons simultaneously fragment and tag the DNA with adapters). 2) reduced cycle amplification (adds more motifs to fragments).

Fluidigm Access Array. 1) Hybridisation of a sequence-specific primer to the DNA (the primer contains a universal tag sequence which allows binding of …). 2) Annealing of the barcode primer (contains a capture sequence appropriate for the sequencing technology). 3) The final amplicon has the barcode seq (pt ID) and is tagged for capture.

Question 17

Q

What tags need to be added to the fragmented DNA in the library prep stage of NGS

Answer

A

An index sequence to ID the pt sample. A primer site for the sequencing primer to anneal to. A capture sequence, appropriate to the sequencing technology, for binding to the flow cell.
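
The index is what lets pooled samples be separated again after the run. A minimal sketch of that step, assuming each read arrives as an (index sequence, insert sequence) pair and that only exact index matches are accepted; the index sequences and sample names are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group insert sequences by the sample their index belongs to."""
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads with an unrecognised index are kept apart, not discarded.
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)


# Hypothetical index sequences and sample labels.
index_to_sample = {"ACGTAC": "pt_001", "TGCATG": "pt_002"}
reads = [
    ("ACGTAC", "GGCTTAGC"),
    ("TGCATG", "TTAGGCAA"),
    ("ACGTAC", "CCATGGTA"),
    ("NNNNNN", "GATTACAG"),
]
print(demultiplex(reads, index_to_sample))
```

Real demultiplexers usually tolerate a small number of index mismatches, which is one reason index sets are designed to differ at several positions.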

Question 18

Q

Discuss hybridisation enrichment methods for NGS

Answer

A

Based on capture of target regions: fragmentation of DNA, tagging of DNA, then capture of fragments using an RNA or DNA library.

Question 19

Q

Describe how SureSelect works

Answer

A

SureSelect (Agilent). 1) Shear DNA to produce sequence-ready DNA. 2) Prepare a biotinylated library of 120-mer RNA baits of the RoI with adapter MIDs. 3) Hybridise together. 4) Separate out hybridised regions using streptavidin beads and magnets. 5) Wash beads and digest RNA. Prep is ready for sequencing.

Question 20

Q

Describe how HaloPlex works

Answer

A

Agilent. Involves an initial restriction endonuclease step.

1) Digest and denature DNA (6 digests using different REs). 2) Prepare probe library (biotinylated probes consisting of a universal primer site, a sequencing primer motif, an index for pt ID and sequences corresponding to the RE sites). 3) Hybridise probe library to fragmented DNA (probes are designed to bind to both ends of the fragmented DNA, resulting in circular DNA). 4) Purify and ligate (purify using streptavidin and magnets, then close the circular DNA by ligation). 5) Amplify enriched fragments (PCR using a universal barcoded primer amplifies the circular DNA, producing linear tagged fragments ready for sequencing).

Question 21

Q

Describe bridge amplification in the Illumina NGS platform

Answer

A

Careful quantification of the library concentration is required.
1) The template hybridises to the immobilised adapter region on the flow cell (P7). 2) Initial extension results in a ds strand attached to the flow cell. 3) The dsDNA is denatured, removing the template DNA and leaving the copied sequence attached to the flow cell. 4) The sequence then folds over and anneals to the complementary adapter sequence (P5), forming a bridge. 5) 1st cycle extension results in a bridged dsDNA. 6) 2nd cycle denaturation results in two ssDNA strands (forward and reverse, one attached to P7 and one attached to P5). 7) The cycle is repeated x35 (folding, annealing, denaturing). 8) A cluster is now formed, ready for sequencing.

Question 22

Q

Describe NGS process for Illumina MiSeq, HiSeq, NextSeq

Answer

A

Reversible terminator sequencing is carried out:
a) forward strand is read, b) indexes are read, c) reverse strand is read.

1) The forward strand is sequenced first, so the reverse strands are cleaved and washed off. 2) The 3’ ends of the strands are chemically blocked (to prevent folding over) and primed. 3) All 4 fluorescently tagged nucleotides are added at once and are provided each cycle. Only a single nucleotide is added per cycle as there’s a blocking group at the 3’ OH of the deoxyribose. 4) All unincorporated nucleotides are washed away. 5) The flow cell is illuminated and each cluster’s fluorescent signal is recorded. 6) The fluorescent group is cleaved from the nucleotide. 7) The 3’ OH is unblocked, allowing a further nucleotide to be added. 8) The cycle is repeated for every nucleotide added.
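
The key property of this chemistry is one base per cycle, whatever the sequence. The toy model below shows that for a single error-free cluster. It assumes the template is written in the order the polymerase meets the bases; real signal processing (phasing, cross talk, quality scoring) is left out entirely.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Return the read produced by `cycles` reversible-terminator cycles."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # All four terminators are offered; only the complementary one is
        # incorporated, and its 3' block stops any further extension.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging step: the fluorescent colour identifies the base.
        read.append(incorporated)
        # The dye is then cleaved and the 3' OH unblocked for the next cycle.
    return "".join(read)


print(sequence_by_synthesis("TTTAC", cycles=5))  # AAATG
print(sequence_by_synthesis("TTTAC", cycles=3))  # AAA (a homopolymer still takes 3 cycles)
```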

Question 23

Q

Describe emulsion PCR required for the Ion Torrent NGS platform

Answer

A

The library molecules are clonally amplified onto beads inside spheres (droplets) produced using water and oil. Each sphere contains 1 bead, 1 library molecule and the reagents required for amplification.
Each bead has probes attached that are complementary to the adapters of the library molecule. The molecule is amplified and the copies are attached to the bead.

Question 24

Q

Describe sequencing using the Ion Torrent

Answer

A

The emulsion is broken and cleaned up, and the individual beads are loaded into the sensor wells by centrifugation.
Chip: high density array of micro wells. Beneath each well is an ion-sensitive layer and an ion sensor (pH meter).

1) The nucleotides (not fluorescently labelled) are added one at a time, in order. 2) Incorporation into the chain results in hydrolysis of the nucleotide triphosphate and net release of an H+ ion. 3) Release of the H+ ion results in a shift in the pH of the surrounding solution that’s PROPORTIONAL to the number of nucleotides incorporated (0.02 pH units/nt). 4) The pH change is detected by a semiconductor sensor, converted into a voltage and digitised.

After each flow of nucleotides, a wash step ensures nucleotides don’t stay in the wells. Due to the small size of the wells, diffusion into and out of the wells takes around 1/10 of a second, so there’s no need for enzymatic removal of reagents.
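
Because unblocked nucleotides are flowed one type at a time, a homopolymer is incorporated in a single flow and its length has to be inferred from the size of the pH shift. The toy model below illustrates this using the 0.02 pH units/nt figure above. The flow order "TACG" is an assumption for the example, the template is written in the order the polymerase meets it, and noise is ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PH_SHIFT_PER_BASE = 0.02  # pH units per incorporated nucleotide


def simulate_flows(template, flow_order="TACG", max_flows=40):
    """Return (nucleotide, bases incorporated, pH shift) for each flow."""
    pos = 0
    flows = []
    for i in range(max_flows):
        if pos >= len(template):
            break
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # With no terminator, every consecutive matching base is added at once.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            incorporated += 1
            pos += 1
        flows.append((nucleotide, incorporated, round(incorporated * PH_SHIFT_PER_BASE, 2)))
    return flows


def call_bases(flows):
    """Rebuild the read from the number of bases inferred in each flow."""
    return "".join(nucleotide * count for nucleotide, count, _ in flows)


flows = simulate_flows("TTTAC")
for flow in flows:
    print(flow)
print(call_bases(flows))  # AAATG
```

In this model the AAA run produces one 0.06 shift in a single flow. With real signal noise it becomes harder to tell, say, a 6-base run from a 7-base run, which is why homopolymer counts contribute to error rates.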

Question 25

Q

Name the advantages of whole exome sequencing

Answer

A

Targeted to the ~2% of the genome that’s coding, which contains ~85% of disease-causing mutations. Looking at less of the genome means it’s cheaper, produces less data for storage and needs less analysis time. Can analyse more samples (cheaper/quicker/multiplex). Don’t get data fatigue from looking at so much information.

Question 26

Q

What are some advantages of whole genome sequencing

Answer

A

Examination of the entire genome, including non-coding regions. Can examine for indels, CNVs and SNVs. Has more uniform coverage. PCR amplification step not required (no bias in GC-rich areas or at heterozygous sites that could cause a false result).

Question 27

Q

What are some of the diagnostic benefits of exome sequencing

Answer

A

Accurate diagnosis of pts with Mendelian disorders who have atypical manifestations, symptoms shared among several disorders, or a disorder with a long list of candidate genes, eg Charcot-Marie-Tooth.

Question 28

Q

Why carry out a targeted NGS approach to analysis

Answer

A

Cheaper. Coverage of the RoI is better (can fill gaps with Sanger). Can interpret and fully report findings as they are in known genes. Reduces the number of VUS findings.

Question 29

Q

What’s a virtual gene panel

Answer

A

Where the whole exome, or a larger number of genes, is sequenced but only a subset of genes is analysed per patient, depending on the referral reason. Uses bioinformatics to show only the relevant genes for that gene panel. Allows a greater degree of flexibility as additional genes can be ‘added’ to the panel without the need for re-validation/development of a new panel by the manufacturer. But the increase in breadth of genes reduces the depth at which each is covered.
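
In software terms a virtual panel is just a gene list applied as a filter after variant calling. A minimal sketch, assuming each variant has already been annotated with a gene symbol; the variant IDs are placeholders and the panel shown is a short illustrative list, not a validated panel.

```python
def apply_virtual_panel(variants, panel_genes):
    """Keep only variants that fall in genes on the referral's panel."""
    panel = {gene.upper() for gene in panel_genes}
    return [v for v in variants if v["gene"].upper() in panel]


# Placeholder variant records from one sequenced sample.
variants = [
    {"id": "var1", "gene": "PMP22"},
    {"id": "var2", "gene": "BRCA1"},
    {"id": "var3", "gene": "MFN2"},
]

# Illustrative panel for a neuropathy referral.
neuropathy_panel = ["PMP22", "MFN2", "GJB1"]
print(apply_virtual_panel(variants, neuropathy_panel))

# 'Adding' a gene only changes the list; nothing is re-sequenced.
print(apply_virtual_panel(variants, neuropathy_panel + ["BRCA1"]))
```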

Question 30

Q

Describe a clinical exome

Answer

A

Only genes with a known disease relationship are included (across the entire exome, not just genes associated with one disease).

Question 31

Q

What’s third generation sequencing

Answer

A

Sequencing of single molecules of DNA without the need to halt between read steps (whether enzymatic or otherwise). It removes the need for production of clusters and therefore there are no synchronisation problems.

Question 32

Q

Name two third generation platforms

Answer

A

Pacific Biosciences (PacBio) SMRT: single molecule real-time sequencing.

Oxford Nanopore

Question 33

Q

Describe PacBio SMRT sequencing

Answer

A

Sequencing by synthesis. Real-time imaging.
Uses a DNA polymerase anchored to the bottom surface of a well. Differently fluorescently labelled nucleotides enter the well via diffusion. During incorporation, the labelled nucleotide is ‘held’ within the detection volume by the polymerase for tens of milliseconds. As each nucleotide is incorporated, the label, located on the terminal phosphate, is cleaved off and diffuses out of the well.

Produces really long reads (30-200x longer than 2nd gen).

Question 34

Q

Describe the Oxford Nanopore process

Answer

A

The platform uses an exonuclease coupled to a modified α-hemolysin nanopore positioned within a lipid bilayer. As sequentially cleaved bases are directed through the nanopore, they are transiently bound by a cyclodextrin moiety. This disturbs the current through the nanopore in a manner characteristic for each base.

Question 35

Q

What is RNA-seq and what are its key aims

Answer

A

Transcriptome profiling using deep sequencing technology. Aims: 1) to catalogue all species of transcript (mRNA, ncRNA). 2) to determine the transcriptional structure of genes (start sites, 5’ and 3’ ends, splicing patterns, other post-transcriptional modifications). 3) to quantify the changing expression levels of each transcript during development and under different conditions.
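
For aim 3, raw read counts have to be normalised before transcripts or samples can be compared, because longer transcripts and deeper runs both produce more reads. One widely used measure is transcripts per million (TPM), which is brought in here as an illustration and is not something these notes specify. The counts and lengths below are invented.

```python
def transcripts_per_million(counts, lengths_kb):
    """Normalise read counts by transcript length, then scale to sum to 1e6."""
    rates = {t: counts[t] / lengths_kb[t] for t in counts}  # reads per kilobase
    total = sum(rates.values())
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Invented example: tx2 has 3x the reads of tx1 but is also 3x as long,
# so both come out with the same expression level.
counts = {"tx1": 100, "tx2": 300}
lengths_kb = {"tx1": 1.0, "tx2": 3.0}
print(transcripts_per_million(counts, lengths_kb))
```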

Question 36

Q

What’s a transcriptome

Answer

A

The complete set of transcripts, and their quantities, in a cell at a specific developmental stage or physiological condition.

Question 37

Q

From BPG what sources should you use to determine the clinical significance of a VUS

Answer

A

RNA studies, LOH studies, in silico predictions, functional studies, co-occurrence with a known deleterious variant in the same gene, co-segregation with disease in a family, species conservation, testing matched controls, sporadic, literature/databases.

Question 38

Q

What are the general considerations when using an external database

Answer

A

Accuracy of data (normal population studies: is everyone normal? Where is the data from? Has it been curated?). Patient consent. Intellectual property rights of the information. Adequate biostatistical support. Amount of data in the database. Whether it is being continually updated. Cost of obtaining a licence/access.

Ongoing large-scale projects: 1000 Genomes. DDD project. NHLBI Exome Sequencing Project.

Question 39

Q

Name two population (normal) databases (CNV and SNV)

Answer

A

DGV. dbSNP. dbVar. ExAC. 1000 Genomes Project. NHLBI Exome Sequencing Project.

Question 40

Q

Name some disease databases

Answer

A

DECIPHER. ClinVar. OMIM. DMuDB. HGMD

Question 41

Q

What are the general areas to consider when designing a NGS gene panel

Answer

A

Type of target enrichment. Gene/transcript selection. BED file. DNA quality. Barcoding of samples. Subpanels. IQC and EQA. Whether and how to confirm variants. BPG. Report structure. Cost. Validation (reproducibility, sensitivity and specificity). Bioinformatics pipeline (variant calling, filtering and annotation).

Question 42

Q

From BPG, in terms of data storage, what is it essential to keep

Answer

A

It is essential to store the output file from the variant annotation step, eg VCF. Some labs may also retain the FASTQ and BAM files in order to analyse the read data in the future.

Must also keep a log of what bioinformatics processing was applied to the raw data to make the files.
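
One possible shape for such a log is a list of entries recording the step, the tool and version, the parameters used and a checksum of the output file. This is a sketch of the idea only: the field names, tool names, versions and parameters below are all hypothetical, and the BPG does not prescribe a format.

```python
import hashlib
import json


def make_log_entry(step, tool, version, parameters, output_bytes):
    """Describe one processing step and fingerprint the file it produced."""
    return {
        "step": step,
        "tool": tool,
        "version": version,
        "parameters": parameters,
        # The checksum lets a stored file be matched to this entry later.
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
    }


# Hypothetical tools and placeholder file contents for illustration.
processing_log = [
    make_log_entry("alignment", "example-aligner", "0.0.0",
                   {"reference": "example_reference"}, b"placeholder BAM content"),
    make_log_entry("variant_calling", "example-caller", "0.0.0",
                   {"min_depth": 30}, b"placeholder VCF content"),
]
print(json.dumps(processing_log, indent=2))
```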