NGS Terms and definitions Flashcards by Mary Cook

What is DNA/RNA Extraction?

The process of isolating DNA or RNA from cells or tissues to prepare it for sequencing. This is the first step because you need clean genetic material to analyze.

What are some common extraction methods in NGS?

Spin-column kits: use filters to trap DNA.
Organic extraction: uses chemicals such as phenol-chloroform.

Fragmentation

Breaking DNA or RNA into smaller pieces, because most short-read NGS machines cannot sequence long strands of DNA. This step ensures the sample is the right size for sequencing.
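
As a rough illustration of the idea, the sketch below cuts a sequence into pieces of a fixed size. Real fragmentation (mechanical or enzymatic) produces a range of sizes; the fixed size and the function name `fragment` are assumptions made only for this example.

```python
def fragment(sequence, size):
    """Cut a sequence into consecutive pieces of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


pieces = fragment("ATGCGTACGTTAGC", 5)
print(pieces)  # ['ATGCG', 'TACGT', 'TAGC']
```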

Library Preparation

Modifying DNA fragments by attaching short synthetic DNA sequences (adapters) to both ends. This makes the DNA compatible with the sequencing platform and allows the machine to identify each fragment.

“Indexing” or “barcoding”

A specific kind of adapter that contains a unique sequence (barcode) for each sample. This allows multiple samples to be sequenced together without mixing up their data, a process called “multiplexing” (a term you will hear a lot).
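
A minimal sketch of the idea, assuming a simplified layout in which the barcode sits directly after the first adapter. The adapter and barcode sequences and the function name are hypothetical, and real library designs differ in where the index is placed.

```python
# Hypothetical adapter and barcode sequences, chosen only for illustration.
ADAPTER_START = "AAAA"
ADAPTER_END = "TTTT"
BARCODES = {"sample_1": "ACGT", "sample_2": "TGCA"}


def build_library_molecule(fragment, sample):
    """Attach adapters to both ends of a fragment, plus the sample's barcode."""
    return ADAPTER_START + BARCODES[sample] + fragment + ADAPTER_END


print(build_library_molecule("GGCC", "sample_1"))  # AAAAACGTGGCCTTTT
print(build_library_molecule("GGCC", "sample_2"))  # AAAATGCAGGCCTTTT
```

The same fragment ends up with a different tag depending on the sample it came from, which is what later makes it possible to sort pooled reads.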

Cluster generation (e.g., Illumina sequencing)

The process of amplifying (making copies of) DNA fragments on a special surface called a flow cell. Each cluster comes from a single DNA molecule and produces a stronger signal for reading.

In Illumina sequencing, DNA fragments bind to a flow cell and are copied many times through a process called “bridge amplification”, forming tiny clusters that the instrument can image.

Sequencing by Synthesis (SBS)

This method, used by Illumina, adds fluorescently labeled nucleotides (A, T, C, G) one at a time. After each addition, a camera captures the color signal to record the sequence as it is built.

Ex: Imagine reading a book letter by letter. The machine adds one nucleotide at a time, records its color (for illustration: green for A, red for T, blue for C, yellow for G; actual color assignments depend on the instrument chemistry), and creates a digital DNA sequence.
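
The letter-by-letter analogy can be sketched in a few lines. The color-to-base mapping below is the illustrative one from this card, not a statement about any particular instrument, and `read_cycles` is a hypothetical helper name.

```python
# Illustrative mapping taken from the example on this card.
COLOR_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def read_cycles(colors):
    """Translate one detected color per cycle into a base sequence."""
    return "".join(COLOR_TO_BASE[color] for color in colors)


print(read_cycles(["green", "red", "yellow", "blue"]))  # ATGC
```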

Paired-end Sequencing

A technique where both ends of each DNA fragment are sequenced. This provides more data and helps resolve complex areas like repetitive sequences.

Ex: If you have a 300 bp (base pair) fragment, the machine will read 150 bp from each end. This helps with better mapping to a reference genome and detecting structural changes.
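
A small sketch of the 300 bp example, assuming the second read is reported on the opposite strand (as a reverse complement). The function names and the toy fragment are made up for illustration.

```python
def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Read 1 comes from the start of the fragment, read 2 from the other end."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "ACGT" * 75  # a 300 bp toy fragment
read1, read2 = paired_end_reads(fragment, 150)
print(len(read1), len(read2))  # 150 150
```

Because both reads come from the same fragment, an aligner knows roughly how far apart they should map, which is what helps in repetitive regions.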

Single-end sequencing

Only one end of a DNA fragment is sequenced. It’s faster and cheaper but provides less detailed information.

Base Calling

The computational process of interpreting signals (fluorescent or electrical) from the sequencing machine and translating them into nucleotide sequences (A, T, C, G).
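
A toy sketch of the idea, assuming each sequencing cycle yields one signal intensity per base and the strongest signal wins. The intensity values and function names are hypothetical; real base callers also model noise and report a confidence for each call.

```python
def call_base(intensities):
    """Pick the base whose channel has the highest signal intensity."""
    return max(intensities, key=intensities.get)


def call_read(cycles):
    """Call one base per cycle and join the calls into a read."""
    return "".join(call_base(cycle) for cycle in cycles)


cycles = [
    {"A": 0.9, "C": 0.1, "G": 0.0, "T": 0.0},
    {"A": 0.1, "C": 0.2, "G": 0.6, "T": 0.1},
]
print(call_read(cycles))  # AG
```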

Demultiplexing

Sorting the sequencing data back into individual samples using the unique barcodes that were added during library preparation.

Ex: If you sequenced 10 different samples together, demultiplexing separates the data by matching reads to their unique barcodes.
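
The matching step can be sketched as follows, assuming (as a simplification) that the barcode is the first few bases of each read and must match exactly. The barcodes, sample names, and the `demultiplex` function are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by the sample whose barcode matches the start of the read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]  # the read with its barcode removed
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTGGG", "NNNNAAAAAA"]
print(demultiplex(reads, barcodes, 4))
```

Reads whose barcode matches no sample are kept under "undetermined" rather than dropped, so they can be counted during quality control.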

Quality control

Evaluating the quality of sequencing data to ensure it’s accurate and reliable. This step checks for errors like low read counts or bias.
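
Two of the checks mentioned above can be sketched very simply: counting reads and looking at GC content as a crude indicator of bias. The threshold, function names, and choice of GC content as the bias measure are assumptions for illustration; the sketch also assumes every read is non-empty.

```python
def gc_fraction(read):
    """Fraction of bases in a read that are G or C."""
    return sum(base in "GC" for base in read) / len(read)


def qc_summary(reads, min_reads):
    """Report the read count, mean GC fraction, and whether the count passes."""
    mean_gc = sum(gc_fraction(r) for r in reads) / len(reads) if reads else 0.0
    return {
        "read_count": len(reads),
        "mean_gc": mean_gc,
        "enough_reads": len(reads) >= min_reads,
    }


print(qc_summary(["GGCC", "ATAT"], min_reads=3))
```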

Alignment

Mapping the sequence reads to a reference genome to identify where each piece of DNA comes from. This helps detect mutations or other changes.
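
A deliberately naive sketch of mapping: find every position where a read matches the reference exactly. Real aligners tolerate mismatches and gaps and use an indexed reference for speed; the function name and toy sequences here are hypothetical.

```python
def align_exact(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


reference = "TTACGGATCCGATTACA"
print(align_exact("GATC", reference))  # [5]
print(align_exact("GAT", reference))   # [5, 10]
```

The shorter read maps to two places, which is a small-scale version of why repetitive regions are hard to align.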

Variant calling

Identifying genetic differences (variants) between the sample and the reference genome. This includes small changes (SNPs) or larger structural alterations.
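
A minimal sketch of SNP detection, assuming the reads are already aligned without gaps and fit inside the reference. It stacks the reads over the reference and reports positions where the most common base differs. The `min_fraction` threshold and all names are hypothetical; real variant callers use statistical models and base qualities.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_fraction=0.5):
    """aligned_reads is a list of (start_position, read_sequence) pairs."""
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        if not counts:
            continue  # no read covers this position
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / sum(counts.values()) > min_fraction:
            variants.append((pos, reference[pos], base))
    return variants


reference = "ACGTACGT"
reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC")]
print(call_snps(reference, reads))  # [(3, 'T', 'A')]
```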

Read depth (coverage)

How many times a specific DNA region is sequenced. Higher coverage increases accuracy and the chance of detecting rare mutations.

Ex: If a gene is sequenced 30 times (30x coverage), any errors are more likely to be corrected because the software compares multiple reads.
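
Depth can be computed by counting how many aligned reads overlap each position. The sketch below assumes reads are given as (start, length) pairs on a short toy reference; the names are hypothetical.

```python
def depth_per_position(reference_length, aligned_reads):
    """aligned_reads is a list of (start_position, read_length) pairs."""
    depth = [0] * reference_length
    for start, length in aligned_reads:
        for pos in range(start, min(start + length, reference_length)):
            depth[pos] += 1
    return depth


depth = depth_per_position(10, [(0, 5), (2, 5), (4, 6)])
print(depth)                    # [1, 1, 2, 2, 3, 2, 2, 1, 1, 1]
print(sum(depth) / len(depth))  # mean coverage: 1.6
```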

Structural Variants (SVs)

Large-scale changes in the genome, such as insertions, deletions, or inversions. These can have major effects on gene function.

Ex: A 5,000 bp deletion in a tumor suppressor gene might drive cancer growth.

Illumina

Uses sequencing by synthesis (SBS) for highly accurate, short-read sequencing. Ideal for whole-genome, exome, or transcriptome studies.

Ex: NovaSeq for large-scale sequencing, MiSeq for small targeted projects.

Oxford Nanopore

Uses electrical current changes to read DNA directly as it moves through a tiny pore. Provides ultra-long reads and real-time analysis.

Ex: Oxford Nanopore's MinION is a portable sequencer useful for fieldwork or rapid pathogen detection.

PacBio

Uses single-molecule, real-time (SMRT) sequencing to generate long, accurate reads. Useful for complex genomes and structural variant detection.

Ion Torrent

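Measures pH changes as nucleotides are incorporated during DNA synthesis. Runs are fast, but accuracy is slightly lower than with Illumina, particularly in runs of the same base (homopolymers). Suitable for targeted sequencing applications.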

Whole genome sequencing

A comprehensive method that involves sequencing the entire DNA of an organism. It includes both the coding regions (exons) and the non-coding regions (such as introns and the DNA between genes).

Purpose: provides the most complete picture of an individual's genome and is useful for studying rare genetic variations and mutations and for discovering new genes.

Whole exome sequencing

Focuses only on the EXONS of the genome: the parts of genes that remain in the mature RNA and are translated into proteins. Only the coding regions (exons) are sequenced, rather than the entire genome.

Whole transcriptome RNA-Seq

This is the analysis of all the RNA molecules (transcripts) present in a cell or tissue at a given time. This covers both coding RNA (mRNA) and non-coding RNA (e.g., rRNA). It’s used to understand gene expression, alternative splicing, and post-transcriptional modifications.

Key for each:

WGS: covers the entire genome, including both coding and non-coding regions.
WES: focuses only on the exons, the protein-coding regions of the genome.
RNA-Seq: examines the RNA molecules produced from the genome, focusing on gene expression.

Copy number variants (CNVs)

Genetic variations in which the number of copies of a particular gene or genomic region differs from the normal number. In NGS, CNVs are detected by analyzing the read depth (coverage) of sequencing data, as well as patterns across the whole genome.

Types of CNVs

1. Deletions - a region of the genome is missing or deleted.
2. Duplications - a region is duplicated, leading to extra copies of the gene or region.
3. Complex rearrangements - more complicated structural variations that may involve both duplications and deletions.
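
Read-depth-based CNV detection can be sketched as a ratio test: compare a region's mean depth with the depth expected for a normal copy number. The thresholds and names below are hypothetical defaults chosen for illustration; real CNV callers normalize for GC content and other biases first.

```python
def classify_copy_number(region_depth, expected_depth, loss=0.75, gain=1.25):
    """Compare a region's mean depth with the expected depth."""
    ratio = region_depth / expected_depth
    if ratio < loss:
        return "possible deletion"
    if ratio > gain:
        return "possible duplication"
    return "normal"


print(classify_copy_number(15, 30))  # possible deletion
print(classify_copy_number(45, 30))  # possible duplication
print(classify_copy_number(31, 30))  # normal
```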

What does "multiplexing" mean?

Multiplexing involves tagging multiple DNA samples with unique barcodes (or index sequences) during library preparation, allowing them to be pooled and sequenced together on a single flow cell and then separated for analysis based on these barcodes.

How does multiplexing work and what are the benefits?

How it works:

1. Barcoding/Indexing: During library preparation, each sample is assigned a unique barcode or index sequence.
2. Pooling: The DNA libraries from different samples, each with its unique barcode, are pooled together.
3. Sequencing: The pooled libraries are then sequenced on a single NGS flow cell.
4. Demultiplexing: After sequencing, the resulting reads are separated and assigned to their respective samples based on the barcodes using bioinformatics tools.

Benefits of multiplexing:

Reduced costs: Sequencing multiple samples in a single run lowers the cost per sample.
Increased throughput: Multiplexing allows for higher sample throughput in a single sequencing run.
Simplified analysis: Barcodes facilitate automatic sample identification and separation during data analysis.
Reduced lane effects: Multiplexing can help mitigate technical variations between sequencing lanes.
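
The cost benefit comes with a trade-off that simple arithmetic makes visible: pooling more samples lowers the cost per sample but also the number of reads each sample receives. The sketch assumes reads are shared evenly, and the run cost and read count are made-up numbers, not figures for any real instrument.

```python
def per_sample(run_cost, total_reads, n_samples):
    """Split a run's cost and read output evenly across pooled samples."""
    return run_cost / n_samples, total_reads // n_samples


for n in (1, 10, 100):
    cost, reads = per_sample(run_cost=1000.0, total_reads=1_000_000, n_samples=n)
    print(n, cost, reads)
```

In practice the number of samples per run is chosen so that each sample still reaches the read depth the experiment needs.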