UNIT 1 Flashcards
1
Q
Sequencing Accuracy in NGS
A
- What is Sequencing Accuracy in NGS?
- Ability of an NGS platform to correctly identify each base (A, T, C, or G) in a DNA sequence.
- Measured by base call accuracy and read accuracy.
- Phred quality scores (Q scores) quantify accuracy (e.g., Q30 = 99.9% accuracy).
- Why is Sequencing Accuracy Crucial?
- Accurate variant calling (avoiding false positives and negatives).
- Reliable genome assembly (preventing misassemblies and gaps).
- Accurate expression profiling in RNA-seq (avoiding misquantification and spurious alignments).
- Accurate microbiome and metagenomic studies (correct taxonomic assignments and functional profiling).
- Clinical and diagnostic applications (avoiding misdiagnoses, inappropriate treatments, and non-compliance with regulatory standards).
- Factors Affecting Accuracy:
- Sequencing platform and technology
- Library preparation and amplification biases
- Read length
- Depth of coverage
- Error types (substitution errors, indel errors)
- Strategies to Improve Accuracy:
- Paired-end reads
- Consensus sequencing
- Error correction algorithms
- Improved library preparation
- Increased read coverage
- Quality control and trimming
- Implications of Low Accuracy:
- Incorrect biological conclusions
- Compromised clinical decisions
- Increased cost and time
- Impact on publication and reproducibility
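The Phred relationship quoted above (Q30 = 99.9% accuracy) follows from Q = -10 * log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion; the function names are illustrative and not taken from any particular tool:

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def expected_errors(quals):
    """Expected number of miscalled bases in a read with these Q scores."""
    return sum(phred_to_error_prob(q) for q in quals)

for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.2%}")
```

Each step of 10 in Q is a tenfold drop in error probability, so a 150-base read at a uniform Q30 is expected to carry about 0.15 miscalled bases.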
2
Q
Why is Sequencing Accuracy Crucial?
A
- Accurate variant calling (avoiding false positives and negatives).
- Reliable genome assembly (preventing misassemblies and gaps).
- Accurate expression profiling in RNA-seq (avoiding misquantification and spurious alignments).
- Accurate microbiome and metagenomic studies (correct taxonomic assignments and functional profiling).
- Clinical and diagnostic applications (avoiding misdiagnoses, inappropriate treatments, and non-compliance with regulatory standards).
3
Q
Sequencing and Raw Sequence Data Quality Control in NGS
A
- Overview of NGS Sequencing
- NGS platforms (Illumina, Ion Torrent, PacBio, Oxford Nanopore)
- Sequencing workflow (library preparation, clustering/emulsion PCR, sequencing, data generation)
- Types of raw data (reads, quality scores, metadata)
- Data formats (FASTQ, BAM/SAM, CRAM)
- Importance of Quality Control
- Data integrity, error minimization, downstream analysis reliability, cost efficiency
- Steps in Raw Sequence Data Quality Control
- Initial assessment (quality score evaluation, read length distribution, GC content analysis)
- Trimming and filtering (adapter trimming, quality trimming, length filtering, contaminant removal)
- Duplicate removal
- Error correction
- Structural and content assessment (k-mer analysis, sequence duplication levels)
- Tools and Software
- FastQC, Trimmomatic, Cutadapt, PRINSEQ, BBDuk, MultiQC
- Best Practices
- Standardize QC pipelines
- Automate workflows
- Use multiple QC tools
- Establish quality thresholds
- Document and report QC results
- Continuously update methods
- Metrics and Quality Assessment Reports
- Per-base sequence quality
- Per-sequence quality scores
- Per-base sequence content
- Adapter content
- Sequence duplication levels
- K-mer content
- Challenges and Considerations
- Balancing trimming and data retention
- Handling diverse data types
- Scalability
- Interpretation of metrics
- Integration with downstream analysis
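The trimming and filtering step can be sketched in a few lines. This is a simplified illustration, not how Trimmomatic or Cutadapt work internally: it assumes 4-line FASTQ records with Phred+33 quality encoding, trims low-quality bases from the 3' end only, and drops reads that end up too short. All function names and thresholds are hypothetical.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def decode_quals(qual, offset=33):
    """Convert a quality string to Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]

def trim_3prime(seq, qual, min_q=20):
    """Remove trailing bases whose quality is below min_q."""
    quals = decode_quals(qual)
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def qc_filter(handle, min_q=20, min_len=5):
    """Quality-trim each read, then keep only reads of at least min_len bases."""
    for name, seq, qual in read_fastq(handle):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield name, seq, qual

# Two toy reads: 'I' encodes Q40, '#' encodes Q2.
EXAMPLE = """@read1
ACGTACGTAC
+
IIIIIIII##
@read2
ACGT
+
I###
"""

kept = list(qc_filter(StringIO(EXAMPLE)))
print(kept)
```

In this toy input, read1 loses its two low-quality 3' bases and read2 is discarded by the length filter, which illustrates the trade-off between trimming and data retention listed under Challenges.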
4
Q
NGS Quality Metrics and Control Measures
A
- Importance of Quality Control (QC) in NGS
- Ensures data accuracy and reliability
- Identifies potential issues early
- Reduces costs and time
- Facilitates reproducibility
- Key Quality Metrics
- Phred Quality Score (Q score)
- GC content
- Read length distribution
- Base composition
- Adapter content
- Duplicate reads
- Depth of coverage
- Mapping quality
- Error rate
- Tools for Assessing NGS Data Quality
- FastQC, MultiQC, SAMtools, Picard, Qualimap
- Control Measures
- Sample and library preparation controls (input quality check, control samples, PCR-free methods)
- Sequencing controls (spike-in controls, platform-specific quality metrics)
- Post-sequencing quality control (adapter trimming, read filtering)
- Alignment and post-alignment quality control (recalibration, duplication marking)
- Downstream analysis controls (variant validation, biological replicates, batch effect monitoring)
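Three of the metrics listed above reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, mean depth uses the usual approximation (reads x read length / genome size), and duplicates are counted by exact sequence identity, whereas alignment-based tools identify duplicates by mapping position.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def mean_depth(n_reads, read_len, genome_size):
    """Approximate mean depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size

def duplication_rate(reads):
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(reads)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(reads) if reads else 0.0

print(gc_content("GGCCAATT"))
print(mean_depth(1_000_000, 150, 5_000_000))
print(duplication_rate(["AAA", "AAA", "CCC", "GGG"]))
```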
5
Q
First-Generation Sequencing (Sanger Sequencing)
A
- Sanger Sequencing
- Developed by Frederick Sanger in 1977
- Chain-termination method (dideoxy sequencing)
- Used for determining nucleotide sequences
- Laid the foundation for modern genomic studies
- Methodology
- DNA fragmentation and amplification
- Chain-termination reaction (dNTPs and ddNTPs)
- Fragment separation and detection (capillary electrophoresis, fluorescent detection)
- Workflow
- DNA extraction
- PCR amplification
- Sequencing reaction
- Capillary electrophoresis
- Data analysis
- Advantages
- High accuracy
- Long read lengths
- Established technology
- Cost-effective for small projects
- Limitations
- Low throughput
- High cost for large projects
- Lower sensitivity for low-frequency variants
- Limited coverage
- Applications
- Small-scale sequencing projects (single gene sequencing, PCR product sequencing)
- Sequencing of low-complexity genomes (bacterial, viral genomes)
- Validation of NGS results (variant validation, gene editing validation)
- DNA barcoding (species identification)
6
Q
Sanger Sequencing
A
- Sanger Sequencing
- Developed by Frederick Sanger in 1977
- Chain-termination method
- Used for determining nucleotide sequences
- Principles
- Selective incorporation of ddNTPs
- DNA replication termination
- Fragment separation and detection
- Components
- Template DNA
- Primer
- DNA polymerase
- dNTPs
- ddNTPs
- Process
- Template and primer binding
- Chain elongation and termination
- Fragment separation by capillary electrophoresis
- Detection of fluorescent signals
- Sequence determination
- Applications
- Single gene or PCR product sequencing
- Validation of NGS results
- Mitochondrial and viral genomes
- DNA barcoding
- Advantages
- High accuracy
- Long read lengths
- Established and reliable
- Cost-effective for small projects
- Limitations
- Low throughput
- Time-consuming and labor-intensive
- High cost for large projects
- Limited detection of low-frequency variants
- Advances
- Cycle sequencing
- Fluorescent dye terminators
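The chain-termination principle can be illustrated with a toy simulation. It is a sketch under simplifying assumptions (no primer, a fixed termination probability at every position, error-free detection), and the names and parameters are hypothetical. Each simulated molecule is extended until a ddNTP is incorporated; sorting the resulting fragments by length, as capillary electrophoresis does, and reading the terminal dye at each length recovers the sequence.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def chain_termination(template, n_molecules=5000, p_ddntp=0.1, seed=1):
    """Simulate extension products as (fragment_length, terminal_base) pairs."""
    rng = random.Random(seed)
    # Strand built by the polymerase, written 5'->3'.
    new_strand = reverse_complement(template)
    fragments = []
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_ddntp:  # a ddNTP was incorporated: the chain stops
                fragments.append((length, base))
                break
    return fragments

def read_sequence(fragments):
    """Order fragments by length and read the terminal base at each length."""
    terminal_base_by_length = dict(fragments)
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))

template = "ATGCGTACCTGA"
called = read_sequence(chain_termination(template))
print(called, called == reverse_complement(template))
```

The read that comes out is the newly synthesized strand, that is, the reverse complement of the template.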
7
Q
Applications of Sanger Sequencing
A
- Sanger Sequencing Applications
- Clinical Diagnostics
* Mutation detection (cystic fibrosis, Huntington’s disease, BRCA1/2)
* Confirmatory testing
* Pharmacogenetics
- Molecular Biology Research
* Gene cloning and verification
* Targeted gene sequencing
* Study of small genomes
- Microbial Identification and Phylogenetics
* DNA barcoding
* Phylogenetic studies
- Validation of NGS Results
* Confirmation of variants
- Mitochondrial DNA Sequencing
* Characterization of mtDNA variations
- Forensic Science
* Personal identification, crime scene analysis, biological relationships
- Prenatal and Newborn Screening
* Screening for genetic conditions (phenylketonuria, sickle cell disease)
- Limitations
- Low throughput
- High cost for large-scale projects
- Limited detection of low-frequency variants
- Limited sequencing depth (one read per reaction)
- Not suitable for whole-genome or exome sequencing
- Time-consuming
- Sequencing errors in homopolymer regions
8
Q
Next-Generation Sequencing (NGS)
A
- NGS
- High-throughput sequencing technology
- Revolutionized genomics
- Simultaneous sequencing of millions of fragments
- Key Differences from Sanger Sequencing
- Throughput (higher in NGS)
- Read length (longer in Sanger)
- Cost and time per base (lower in NGS)
- Accuracy (per-read accuracy is higher in Sanger; NGS compensates with deeper coverage)
- Applications (wider range in NGS)
- Rare variant detection (better in NGS)
- Scalability (better in NGS)
- Advantages of NGS
- High throughput
- Cost-effectiveness
- Rare variant detection
- High sensitivity
- Broad range of applications
- Data density
- Customizable
- Applications
- Whole-genome sequencing (WGS)
- Exome sequencing
- RNA sequencing (RNA-Seq)
- Targeted gene panels
- Microbiome analysis
- Cancer genomics
- Epigenetic studies
- Limitations
- Short read lengths (for some platforms)
- Complex data analysis
- High upfront cost
- Coverage variability
- Conclusion
- NGS has revolutionized genomics
- Offers powerful and cost-effective sequencing
- Wide range of applications
- Limitations exist, but remains a valuable tool
9
Q
Roche 454 Sequencing
A
- Roche 454 Sequencing
- One of the first NGS platforms
- Pyrosequencing technology
- Discontinued in 2016
- Mechanism
- Library preparation (fragmentation, adaptor ligation)
- Emulsion PCR (bead attachment, amplification)
- Sequencing-by-synthesis (pyrosequencing)
- Data analysis
- Strengths
- Long read lengths
- High throughput
- Low substitution error rate (most errors are indels in homopolymers)
- Real-time detection
- Weaknesses
- Homopolymer errors
- Cost
- Lower throughput compared to later NGS technologies
- Platform discontinuation
- Library preparation complexity
- Applications
- Microbial genome sequencing
- Amplicon sequencing
- Metagenomics
- Targeted resequencing
- Ancient DNA sequencing
- Transcriptomics
- Conclusion
- Pioneering NGS technology
- Valuable for specific applications
- Discontinued due to limitations
- Contributed to advancements in genomics
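The homopolymer weakness follows from how flow-based chemistry reports signal. Nucleotides are flowed one type at a time, and the signal in each flow scales with the number of identical bases incorporated, so a run of five and a run of six differ only by a small relative change in signal. A minimal, noise-free sketch; the flow order "TACG" and the function names are assumptions made for illustration:

```python
from itertools import cycle

def flowgram(seq, flow_order="TACG"):
    """Ideal flow signals: (flowed base, homopolymer length incorporated)."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains a base not in the flow order")
    signals, pos = [], 0
    for base in cycle(flow_order):
        if pos >= len(seq):
            break
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

def call_bases(signals):
    """Call each flow by rounding its signal to the nearest whole number of bases."""
    return "".join(base * round(signal) for base, signal in signals)

ideal = flowgram("TTTAG")
print(ideal)
print(call_bases(ideal))

# A small amount of signal noise flips a 5-mer call to a 6-mer call.
print(call_bases([("A", 5.45)]), call_bases([("A", 5.55)]))
```

Because miscalls come from rounding a signal to the wrong run length, the characteristic errors are insertions and deletions in homopolymers rather than substitutions. The same reasoning applies to Ion Torrent, where the measured signal is a pH change instead of light.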
10
Q
Ion Torrent Sequencing
A
- Ion Torrent Sequencing
- Semiconductor sequencing technology
- Developed by Ion Torrent Systems
- Acquired by Thermo Fisher Scientific
- Mechanism
- Library preparation (fragmentation, adaptor ligation)
- Emulsion PCR (bead binding, amplification)
- Loading on semiconductor chip
- Sequencing-by-synthesis (pH detection)
- Data processing and analysis
- Strengths
- Cost-effective
- Fast turnaround
- Scalable
- Small, accessible instrumentation
- No need for fluorescent labels
- Ideal for targeted sequencing
- Weaknesses
- Homopolymer errors
- Shorter read lengths
- Lower throughput
- Higher error rate in long reads
- Limited use in large genomes
- Chip quality affects results
- Applications
- Clinical diagnostics
- Targeted sequencing panels
- Microbial genomics and metagenomics
- Inherited disease research
- Forensic genomics
- Conclusion
- Innovative and cost-effective NGS technology
- Suitable for various applications
- Limitations exist, but remains valuable for specific tasks
11
Q
AB SOLiD Sequencing Technology
A
- AB SOLiD Sequencing
- Next-Generation Sequencing (NGS) platform
- Sequencing by oligonucleotide ligation and detection
- Developed by Applied Biosystems (now part of Thermo Fisher Scientific)
- Mechanism
- Library preparation (fragmentation, adaptor ligation)
- Emulsion PCR (bead binding, amplification)
- Sequencing by ligation (probes, fluorescent labels, cleavage)
- Two-base encoding
- Data analysis (color-space data)
- Strengths
- High accuracy
- Low error rate
- Ability to detect variants
- Flexible read lengths
- Versatile applications
- Weaknesses
- Complex workflow
- Longer turnaround time
- Shorter read lengths
- Color-space complexity
- Reduced market presence
- Applications
- Whole-genome sequencing
- Exome sequencing
- RNA-Seq
- Cancer genomics
- Microbial and metagenomics
- Epigenetics
- Conclusion
- Highly accurate and reliable NGS platform
- Suitable for applications requiring precision
- Limitations in workflow and read length
- Reduced market presence compared to newer platforms
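Two-base encoding can be sketched as follows, assuming the commonly described mapping in which each color (0 to 3) is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names are illustrative.

```python
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}

def encode_colors(seq):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ color])
    return "".join(bases)

first, colors = encode_colors("ATGGC")
print(first, colors)
print(decode_colors(first, colors))

# A true SNP (ATGGC -> ATAGC) changes two adjacent colors ...
print(encode_colors("ATAGC"))
# ... while a single miscalled color corrupts every downstream base.
print(decode_colors("A", [3, 2, 0, 3]))
```

Since every base is interrogated by two overlapping colors, an isolated color change is more likely a measurement error than a variant. This is the basis of the platform's accuracy, and also the reason color-space data need dedicated analysis tools.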
12
Q
Third-Generation Sequencing (TGS)
A
- TGS
- Revolutionized sequencing technology
- Longer reads, real-time sequencing, direct modification detection
- Major platforms: PacBio SMRT, Oxford Nanopore
- Key Features
- Single-molecule sequencing
- Long reads
- Real-time sequencing
- Direct detection of modifications
- PacBio SMRT Sequencing
- Mechanism: Zero-mode waveguides (ZMWs), fluorescently labeled nucleotides
- Strengths: ultra-long reads, real-time sequencing, high accuracy, epigenetic detection
- Weaknesses: higher error rate (initially), cost, lower throughput
- Oxford Nanopore Technologies (ONT)
- Mechanism: Nanopore sequencing, electrical current detection
- Strengths: extremely long reads, portability, real-time sequencing, epigenetic detection
- Weaknesses: higher error rate (historically), lower accuracy for short reads, complex data analysis
- Key Differences from Second-Generation Sequencing
- Read length (longer in TGS)
- Single-molecule sequencing (TGS)
- Real-time sequencing (TGS)
- Direct modification detection (TGS)
- Fewer reads per run, but more bases per read (TGS)
- Advantages of TGS
- De novo genome assembly
- Structural variation detection
- Full-length transcript sequencing
- Epigenetics
- Field applications
- Limitations of TGS
- Higher error rates (historically)
- Cost
- Lower throughput
- Complex bioinformatics
- Data analysis challenges
- Conclusion
- Transformative technology
- Unique capabilities
- Valuable for specific applications
- Challenges to overcome
13
Q
PacBio SMRT Sequencing
A
- PacBio SMRT Sequencing
- Third-generation sequencing technology
- Single-molecule real-time sequencing
- Developed by Pacific Biosciences
- Mechanism
- Zero-mode waveguides (ZMWs)
- Fluorescently labeled nucleotides
- Real-time detection
- Circular consensus sequencing (CCS)
- Strengths
- Long read lengths
- High accuracy (HiFi reads)
- Epigenetic detection
- Real-time sequencing
- Limitations
- Higher cost
- Lower throughput
- Initial error rate (CLR)
- Applications
- De novo genome assembly
- Structural variant detection
- Transcriptome analysis
- Epigenetics
- Microbial genomics
- Clinical applications
- Agricultural and plant genomics
- Antibody and vaccine research
- Conclusion
- Powerful tool for genomics
- Unique advantages in long reads, accuracy, and epigenetic detection
- Suitable for complex applications
- Limitations in cost and throughput
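The idea behind circular consensus sequencing is that the same molecule is read several times and the passes are combined, so random errors in individual passes are outvoted. The sketch below is a deliberate simplification: it assumes the passes are already aligned and of equal length and uses a plain per-position majority vote, whereas real consensus callers align the passes and use statistical models. The function name is hypothetical.

```python
from collections import Counter

def majority_consensus(passes):
    """Per-position majority vote across equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Five noisy passes over the same molecule; each error appears in one pass only.
passes = [
    "ACGTTAGC",
    "ACGTAAGC",
    "ACCTTAGC",
    "ACGTTAGC",
    "ACGTTTGC",
]
print(majority_consensus(passes))
```

More passes give a more reliable vote, which is why consensus (HiFi) reads reach much higher accuracy than the individual passes they are built from.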
14
Q
Oxford Nanopore Technology (ONT)
A
- ONT
- Third-generation sequencing technology
- Nanopore-based sequencing
- Real-time, long-read sequencing
- Mechanism
- Nanopores
- Membrane and applied voltage
- Single-molecule sequencing
- Real-time electrical signal detection
- Read length (limited mainly by the length of the input fragment)
- Workflow
- Library preparation
- Loading onto flow cell
- Sequencing
- Base calling and data analysis
- Types of Platforms
- MinION
- GridION
- PromethION
- Flongle
- SmidgION (in development)
- Unique Features
- Long reads
- Direct RNA sequencing
- Real-time data generation
- Portability
- Low cost
- Applications
- De novo genome assembly
- Structural variant detection
- Pathogen detection
- Microbial genomics
- RNA sequencing
- Clinical genetics
- Epigenetics
- Agricultural and plant genomics
- Conclusion
- Revolutionary technology
- Wide range of applications
- Advantages in long reads, real-time sequencing, and portability
- Challenges in error rates and data analysis