NGS Terms and Definitions Flashcards
What is DNA/RNA Extraction?
The process of isolating DNA or RNA from cells or tissues to prepare it for sequencing. This is the first step because you need clean genetic material to analyze.
What are some common extraction methods in NGS?
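- Spin-column kits: use a silica membrane (filter) that binds DNA so contaminants can be washed away.
- Organic extraction: uses chemicals like phenol-chloroform to separate DNA from proteins and other cell components.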
Fragmentation
Breaking DNA or RNA into smaller pieces because most NGS machines cannot sequence long strands of DNA. This step ensures the sample is the right size for sequencing.
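A toy simulation can make this step concrete. This is a minimal sketch, assuming fragmentation simply cuts a sequence at random points into pieces within a target size range; real methods (mechanical shearing or enzymes) are less tidy. The `fragment` helper and the 200-400 bp range are hypothetical choices for illustration.

```python
import random


def fragment(sequence: str, min_len: int, max_len: int, seed: int = 0) -> list[str]:
    """Cut a sequence into consecutive pieces of random length (toy model of shearing)."""
    rng = random.Random(seed)  # seeded so the example gives the same result every run
    pieces = []
    start = 0
    while start < len(sequence):
        size = rng.randint(min_len, max_len)  # pick a length inside the target range
        pieces.append(sequence[start:start + size])  # the last piece may be shorter
        start += size
    return pieces


genome = "ACGT" * 250  # hypothetical 1,000 bp "long strand" of DNA
pieces = fragment(genome, 200, 400)
print([len(p) for p in pieces])
```

No DNA is lost: joining the pieces back together gives the original sequence, but each piece is now short enough for a short-read machine.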
Library Preparation
Modifying DNA fragments by attaching short synthetic DNA sequences, called adapters, to both ends. This makes the DNA compatible with the sequencing platform and allows the machine to identify each fragment.
“indexing” or “barcoding”
A specific kind of adapter that contains a unique sequence (barcode) for each sample. This allows multiple samples to be sequenced together without mixing up their data, a process called “multiplexing” (a term you will hear a lot).
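Library preparation and barcoding can be pictured as string concatenation. This is a minimal sketch, assuming a simplified layout where each fragment becomes adapter + barcode + fragment + adapter. The adapter and barcode sequences, the sample names, and the `prepare_library` helper are all hypothetical; real Illumina adapters and indexes are longer and arranged differently.

```python
# Hypothetical adapter and barcode sequences, for illustration only.
ADAPTER_5 = "AATGATAC"
ADAPTER_3 = "ATCTCGTA"
BARCODES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}


def prepare_library(fragments: list[str], sample: str) -> list[str]:
    """Attach a 5' adapter plus the sample's barcode, and a 3' adapter, to every fragment."""
    barcode = BARCODES[sample]
    return [ADAPTER_5 + barcode + frag + ADAPTER_3 for frag in fragments]


# Multiplexing: libraries from different samples are pooled and sequenced together.
pool = prepare_library(["GGCCTTAA", "CCAAGGTT"], "sample_A") + prepare_library(["TTTTCCCC"], "sample_B")
for molecule in pool:
    print(molecule)
```

Because every molecule in the pool carries its sample's barcode, the data can later be sorted back by sample.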
Cluster Generation (e.g., Illumina Sequencing)
The process of amplifying (making copies of) DNA fragments on a special surface called a flow cell. Each cluster comes from a single DNA molecule, and its many identical copies produce a stronger signal for reading.
In Illumina sequencing, DNA fragments bind to a flow cell and are copied many times through a process called “bridge amplification,” forming tiny clusters visible to the instrument's camera.
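Clusters give a strong signal because copying is exponential. This is a minimal sketch, assuming an idealized model in which every amplification cycle multiplies the copy number by (1 + efficiency); real bridge amplification is limited by space on the flow cell, and `cluster_size` is a hypothetical helper.

```python
def cluster_size(cycles: int, efficiency: float = 1.0) -> float:
    """Copies grown from one template molecule after `cycles` rounds of amplification.

    efficiency = 1.0 models perfect doubling every cycle;
    lower values model cycles where only some molecules get copied.
    """
    return (1 + efficiency) ** cycles


for n in (1, 5, 10):
    print(n, cluster_size(n))  # 2.0, 32.0, 1024.0 copies with perfect doubling
```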
Sequencing by Synthesis (SBS)
In this method, used by Illumina, fluorescently labeled nucleotides (A, T, C, G) are added one at a time. After each addition, a camera captures the color signal, so the sequence is recorded cycle by cycle.
Ex: Imagine reading a book letter by letter. The machine adds one nucleotide at a time, records its color (e.g., green for A, red for T, blue for C, yellow for G; the actual colors depend on the instrument's chemistry), and builds a digital DNA sequence.
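The book-reading analogy can be written as a lookup, one base per cycle. This is a minimal sketch that reuses the illustrative color scheme from the example; it is not a real instrument's dye mapping, and `read_colors` is a hypothetical helper.

```python
# Illustrative color-to-base scheme from the example (not a real instrument's dyes).
COLOR_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def read_colors(colors: list[str]) -> str:
    """Translate the color recorded at each cycle into one base of the read."""
    return "".join(COLOR_TO_BASE[color] for color in colors)


cycles = ["green", "yellow", "blue", "red", "green"]
print(read_colors(cycles))  # AGCTA
```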
Paired-end Sequencing
A technique where both ends of each DNA fragment are sequenced. This provides more data and helps resolve complex areas like repetitive sequences.
Ex: If you have a 300 base pair (bp) fragment, the machine reads 150 bp from each end. This improves mapping to a reference genome and helps detect structural changes.
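The 300 bp example can be sketched in code. This is a minimal model, assuming the common forward-reverse layout: read 1 comes from the start of the fragment, and read 2 is read from the other end on the opposite strand, so it is the reverse complement of the fragment's last bases. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the sequence of the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Return (read 1, read 2) for one fragment under the forward-reverse model."""
    read1 = fragment[:read_len]                       # first bases, forward strand
    read2 = reverse_complement(fragment[-read_len:])  # last bases, opposite strand
    return read1, read2


fragment = "ACGGT" * 60  # hypothetical 300 bp fragment
r1, r2 = paired_reads(fragment, 150)
print(len(r1), len(r2))  # 150 150
```

Single-end sequencing would keep only read 1. With 150 bp from each end of a 300 bp fragment, the two reads together span the whole fragment, and knowing they came from the same molecule helps place them on the reference.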
Single-end Sequencing
Only one end of a DNA fragment is sequenced. It’s faster and cheaper but provides less detailed information.
Base Calling
The computational process of interpreting signals (fluorescent or electrical) from the sequencing machine and translating them into nucleotide sequences (A, T, C, G).
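For a four-color instrument, the simplest form of base calling picks the brightest channel in each cycle. This is a minimal sketch, assuming one intensity value per base in a hypothetical A, C, G, T channel order; real base callers also correct for signal crosstalk and phasing and attach a quality score to each call.

```python
BASES = "ACGT"  # hypothetical channel order: A, C, G, T


def call_bases(intensities: list[tuple[float, float, float, float]]) -> str:
    """For each cycle, call the base whose fluorescence channel is brightest."""
    calls = []
    for cycle in intensities:
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        calls.append(BASES[brightest])
    return "".join(calls)


signals = [
    (0.90, 0.05, 0.03, 0.02),  # A channel brightest
    (0.10, 0.15, 0.70, 0.05),  # G channel brightest
    (0.05, 0.80, 0.10, 0.05),  # C channel brightest
]
print(call_bases(signals))  # AGC
```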
Demultiplexing
Sorting the sequencing data back into individual samples using the unique barcodes that were added during library preparation.
Ex: If you sequenced 10 different samples together, demultiplexing separates the data by matching reads to their unique barcodes.
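Demultiplexing amounts to looking up each read's barcode. This is a minimal sketch, assuming the barcode is the first few bases of each read and must match exactly; on real Illumina runs the index is usually read separately, and demultiplexing software can typically tolerate a mismatch. The sample names, barcodes, and the `undetermined` bin name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str], barcode_len: int) -> dict[str, list[str]]:
    """Sort reads into per-sample bins by their leading barcode."""
    sample_for = {bc: sample for sample, bc in barcodes.items()}  # barcode -> sample name
    bins = defaultdict(list)
    for read in reads:
        sample = sample_for.get(read[:barcode_len], "undetermined")  # unknown barcode
        bins[sample].append(read[barcode_len:])  # store the read without its barcode
    return dict(bins)


barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
reads = ["ACGTACGGCCTTAA", "TGCATGTTTTCCCC", "ACGTACCCAAGGTT", "NNNNNNAAAAAAAA"]
print(demultiplex(reads, barcodes, 6))
```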
Quality control
Evaluating the quality of sequencing data to ensure it’s accurate and reliable. This step checks for problems such as low base-quality scores, low read counts, or bias.
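One common quality check uses Phred quality scores, where Q = -10 log10(P) and P is the probability that a base call is wrong (so Q30 means a 1 in 1,000 chance of error). This is a minimal sketch, assuming FASTQ quality strings in the standard Phred+33 encoding; the mean-Q30 cutoff and the helper names are hypothetical, and real QC tools (e.g., FastQC) check many more things.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33) into one Q score per base."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the chance that the base call is wrong."""
    return 10 ** (-q / 10)


def passes_qc(quality: str, min_mean_q: float = 30.0) -> bool:
    """Keep a read only if its average base quality reaches the cutoff."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


print(phred_scores("II5+"))                  # [40, 40, 20, 10]
print(passes_qc("IIII"), passes_qc("II5+"))  # True False
```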
Alignment
Mapping sequencing reads to a reference genome to identify where each piece of DNA comes from. This helps detect mutations or other changes.
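The core idea of alignment is finding where a read fits on the reference. This is a minimal sketch using exact string search on a made-up reference; real aligners (e.g., BWA or Bowtie2) use indexed data structures and tolerate mismatches and gaps, which this toy version does not.

```python
def align_exact(read: str, reference: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)  # keep looking: the read may fit in several places
    return hits


reference = "AAGTCCATGCGTACCATGTT"
print(align_exact("GTCCA", reference))  # [2]      unique placement
print(align_exact("CCATG", reference))  # [4, 13]  repeated sequence, ambiguous placement
```

A read with several hits shows why repetitive regions are hard to map.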
Variant calling
Identifying genetic differences (variants) between the sample and the reference genome. This includes small changes, such as single nucleotide polymorphisms (SNPs), and larger structural alterations.
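A very simplified variant caller stacks the aligned reads at each position (a "pileup") and reports positions where enough reads disagree with the reference. This is a minimal sketch, assuming reads are already aligned without gaps and fit inside the reference; the depth and allele-fraction cutoffs are hypothetical, and real callers (e.g., GATK or bcftools) use statistical models that account for base quality.

```python
from collections import Counter


def call_snvs(reference: str, aligned_reads: list[tuple[int, str]],
              min_depth: int = 3, min_alt_fraction: float = 0.3) -> list[tuple[int, str, str]]:
    """Return (position, reference base, alternative base) for likely single-base variants."""
    pileup = [Counter() for _ in reference]  # base counts seen at each reference position
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        alts = [(base, n) for base, n in counts.items() if base != reference[pos]]
        if not alts:
            continue  # every read agrees with the reference here
        alt, alt_count = max(alts, key=lambda item: item[1])
        if alt_count / depth >= min_alt_fraction:
            variants.append((pos, reference[pos], alt))
    return variants


reference = "ACGTACGT"
reads = [(0, "ACGTACGT"), (0, "ACCTACGT"), (0, "ACCTACGT"), (2, "GTACGT")]
print(call_snvs(reference, reads))  # [(2, 'G', 'C')]
```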
Read depth (coverage)
How many times a specific DNA region is sequenced, that is, how many reads cover each base. Higher coverage increases accuracy and the chance of detecting rare mutations.
Ex: If each base of a gene is covered by 30 reads (30x coverage), random errors in individual reads are more likely to be corrected because the software compares multiple reads.
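Coverage can be counted per base from where reads land, and the expected average depth is total sequenced bases divided by genome size. This is a minimal sketch, assuming each aligned read is given as a hypothetical (start, length) pair; the read count and genome size in the example are round illustrative figures.

```python
def depth_per_position(ref_length: int, aligned_reads: list[tuple[int, int]]) -> list[int]:
    """Count how many reads cover each reference position, given (start, length) per read."""
    depth = [0] * ref_length
    for start, length in aligned_reads:
        for pos in range(start, min(start + length, ref_length)):
            depth[pos] += 1
    return depth


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


print(depth_per_position(10, [(0, 4), (2, 4), (3, 5)]))  # [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
print(mean_coverage(600_000_000, 150, 3_000_000_000))    # 30.0
```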
Structural Variants (SVs)
Large-scale changes in the genome, such as insertions, deletions, or inversions. These can have major effects on gene function.
Ex: A 5,000 bp deletion in a tumor suppressor gene might drive cancer growth.
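One way sequencing data can reveal a deletion like this is through paired-end reads: if the sample is missing a stretch of DNA, read pairs spanning the gap map much farther apart on the reference than the usual fragment size. This is a minimal sketch of that read-pair signal, assuming each pair's mapped distance is already known; the numbers, the mean + 3 standard deviation cutoff, and the helper name are hypothetical.

```python
from statistics import mean, stdev


def flag_possible_deletions(distances: list[int], normal: list[int], n_sd: float = 3.0) -> list[int]:
    """Return indexes of read pairs that map unusually far apart on the reference."""
    cutoff = mean(normal) + n_sd * stdev(normal)  # typical fragment size plus a safety margin
    return [i for i, d in enumerate(distances) if d > cutoff]


normal_pairs = [290, 300, 310, 295, 305]  # mapped distances from ordinary fragments
observed = [298, 5300, 305, 5310]         # two pairs span a roughly 5,000 bp gap
print(flag_possible_deletions(observed, normal_pairs))  # [1, 3]
```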
Illumina
Uses sequencing by synthesis (SBS) for highly accurate, short-read sequencing. Ideal for whole-genome, exome, or transcriptome studies.
Ex: NovaSeq for large-scale sequencing, MiSeq for small targeted projects.
Oxford Nanopore
Uses changes in electrical current to read DNA directly as it moves through a tiny pore (a nanopore). Provides ultra-long reads and real-time analysis.
Ex: Oxford Nanopore's MinION is a portable sequencer useful for fieldwork or rapid pathogen detection.
PacBio
Uses Single Molecule, Real-Time (SMRT) sequencing to generate long, accurate reads. Useful for complex genomes and structural variant detection.
Ion Torrent
Measures pH changes during DNA synthesis. Faster but slightly less accurate than Illumina, especially in stretches of repeated identical bases (homopolymers). Suitable for targeted sequencing applications.
Whole Genome Sequencing (WGS)
A comprehensive method that involves sequencing the entire DNA of an organism. It includes both the coding regions (exons) and the non-coding regions (such as introns and the DNA between genes).
Purpose: Provides the most complete picture of an individual’s genome and is useful for studying rare genetic variants and mutations and for discovering new genes.
Whole Exome Sequencing (WES)
Focuses only on the EXONS of the genome, the parts of genes that remain in mature mRNA after splicing and mostly code for proteins. Only the coding regions (exons), about 1-2% of the human genome, are sequenced instead of the entire genome.
Whole transcriptome RNA-Seq
This is the analysis of all the RNA molecules (transcripts) present in a cell or tissue at a given time. This covers both coding RNA (mRNA) and non-coding RNA (such as long non-coding RNAs). It’s used to understand gene expression, alternative splicing, and post-transcriptional modifications.
Key points for each:
WGS: Covers the entire genome, including both coding and non-coding regions.
WES: Focuses only on the exons, the protein-coding regions of the genome.
RNA-Seq: Examines the RNA molecules produced from the genome, focusing on gene expression.