UNIT 1 Flashcards

1
Q

Sequencing Accuracy in NGS

A
  • What is Sequencing Accuracy in NGS?
    • Ability of an NGS platform to correctly identify each base (A, T, C, or G) in a DNA sequence.
    • Measured by base call accuracy and read accuracy.
    • Phred quality scores (Q scores) quantify accuracy (e.g., Q30 = 99.9% accuracy).
  • Why is Sequencing Accuracy Crucial?
    • Accurate variant calling (avoiding false positives and negatives).
    • Reliable genome assembly (preventing misassemblies and gaps).
    • Accurate expression profiling in RNA-seq (avoiding misquantification and spurious alignments).
    • Accurate microbiome and metagenomic studies (correct taxonomic assignments and functional profiling).
    • Clinical and diagnostic applications (avoiding misdiagnoses, inappropriate treatments, and non-compliance with regulatory standards).
  • Factors Affecting Accuracy:
    • Sequencing platform and technology
    • Library preparation and amplification biases
    • Read length
    • Depth of coverage
    • Error types (substitution errors, indel errors)
  • Strategies to Improve Accuracy:
    • Paired-end reads
    • Consensus sequencing
    • Error correction algorithms
    • Improved library preparation
    • Increased read coverage
    • Quality control and trimming
  • Implications of Low Accuracy:
    • Incorrect biological conclusions
    • Compromised clinical decisions
    • Increased cost and time
    • Impact on publication and reproducibility
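The Phred relationship mentioned in this card (Q30 = 99.9% accuracy) can be worked through in a few lines. This is a minimal sketch, assuming the standard definition Q = -10 * log10(P) and the Phred+33 ASCII offset used by most current FASTQ files; the function names are illustrative, not taken from any particular tool.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the probability of a wrong base call."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality score."""
    return -10 * math.log10(p)

def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]

# Print the base-call accuracy implied by a few common Q scores.
for q in (10, 20, 30, 40):
    accuracy = 1 - phred_to_error_prob(q)
    print(f"Q{q}: accuracy = {accuracy:.4%}")
```

Each 10-point increase in Q corresponds to a tenfold drop in the expected error rate, which is why Q20 versus Q30 is a meaningful difference for variant calling.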
2
Q

Why is Sequencing Accuracy Crucial?

A
  • Accurate variant calling (avoiding false positives and negatives).
  • Reliable genome assembly (preventing misassemblies and gaps).
  • Accurate expression profiling in RNA-seq (avoiding misquantification and spurious alignments).
  • Accurate microbiome and metagenomic studies (correct taxonomic assignments and functional profiling).
  • Clinical and diagnostic applications (avoiding misdiagnoses, inappropriate treatments, and non-compliance with regulatory standards).
  • Factors Affecting Accuracy:
    • Sequencing platform and technology
    • Library preparation and amplification biases
    • Read length
    • Depth of coverage
    • Error types (substitution errors, indel errors)
  • Strategies to Improve Accuracy:
    • Paired-end reads
    • Consensus sequencing
    • Error correction algorithms
    • Improved library preparation
    • Increased read coverage
    • Quality control and trimming
  • Implications of Low Accuracy:
    • Incorrect biological conclusions
    • Compromised clinical decisions
    • Increased cost and time
    • Impact on publication and reproducibility
3
Q

Flashcard: Sequencing and Raw Sequence Data Quality Control in NGS

A
  • Overview of NGS Sequencing
    • NGS platforms (Illumina, Ion Torrent, PacBio, Oxford Nanopore)
    • Sequencing workflow (library preparation, clustering/emulsion PCR, sequencing, data generation)
    • Types of raw data (reads, quality scores, metadata)
    • Data formats (FASTQ, BAM/SAM, CRAM)
  • Importance of Quality Control
    • Data integrity, error minimization, downstream analysis reliability, cost efficiency
  • Steps in Raw Sequence Data Quality Control
    • Initial assessment (quality score evaluation, read length distribution, GC content analysis)
    • Trimming and filtering (adapter trimming, quality trimming, length filtering, contaminant removal)
    • Duplicate removal
    • Error correction
    • Structural and content assessment (k-mer analysis, duplication rate, sequence duplication levels)
  • Tools and Software
    • FastQC, Trimmomatic, Cutadapt, PRINSEQ, BBDuk, MultiQC
  • Best Practices
    • Standardize QC pipelines
    • Automate workflows
    • Use multiple QC tools
    • Establish quality thresholds
    • Document and report QC results
    • Continuously update methods
  • Metrics and Quality Assessment Reports
    • Per-base sequence quality
    • Per-sequence quality scores
    • Per-base sequence content
    • Adapter content
    • Sequence duplication levels
    • K-mer content
  • Challenges and Considerations
    • Balancing trimming and data retention
    • Handling diverse data types
    • Scalability
    • Interpretation of metrics
    • Integration with downstream analysis
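The trimming and filtering step in this card can be illustrated with a toy example. This is a simplified sketch, assuming reads are already decoded into a base string plus a list of Phred scores; it trims low-quality bases from the 3' end and then applies a length filter. It is not the algorithm used by Trimmomatic, Cutadapt, or any other tool named above, and the thresholds are hypothetical.

```python
def trim_low_quality_tail(seq, quals, min_q=20):
    """Remove bases from the 3' end until a base with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def passes_length_filter(seq, min_len=30):
    """Keep only reads that are still long enough after trimming."""
    return len(seq) >= min_len

# Hypothetical read whose quality degrades toward the 3' end.
read = "ACGTACGTAC"
scores = [35, 36, 34, 33, 30, 28, 25, 12, 8, 2]

trimmed_seq, trimmed_scores = trim_low_quality_tail(read, scores, min_q=20)
print(trimmed_seq, passes_length_filter(trimmed_seq, min_len=5))
```

Raising `min_q` or `min_len` discards more data, which is the trade-off listed under "Balancing trimming and data retention".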
4
Q

NGS Quality Metrics and Control Measures

A
  • Importance of Quality Control (QC) in NGS
    • Ensures data accuracy and reliability
    • Identifies potential issues early
    • Reduces costs and time
    • Facilitates reproducibility
  • Key Quality Metrics
    • Phred Quality Score (Q score)
    • GC content
    • Read length distribution
    • Base composition
    • Adapter content
    • Duplicate reads
    • Depth of coverage
    • Mapping quality
    • Error rate
  • Tools for Assessing NGS Data Quality
    • FastQC, MultiQC, SAMtools, Picard, Qualimap
  • Control Measures
    • Sample and library preparation controls (input quality check, control samples, PCR-free methods)
    • Sequencing controls (spike-in controls, platform-specific quality metrics)
    • Post-sequencing quality control (adapter trimming, read filtering)
    • Alignment and post-alignment quality control (recalibration, duplication marking)
    • Downstream analysis controls (variant validation, biological replicates, batch effect monitoring)
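Two of the key metrics in this card, GC content and duplicate reads, are simple enough to compute by hand. The sketch below assumes reads are plain strings and treats two reads as duplicates only when their sequences are identical; alignment-based tools such as Picard identify duplicates from mapping positions instead, so the numbers are not directly comparable. Function names are illustrative.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases in a sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def duplication_rate(reads):
    """Fraction of reads that are exact copies of another read in the set."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)

reads = ["GGCCAATT", "GGCCAATT", "ATATATAT", "GGCCAATT"]
print([gc_content(r) for r in reads])
print(duplication_rate(reads))
```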
5
Q

Flashcard: First-Generation Sequencing (Sanger Sequencing)

A
  • Sanger Sequencing
    • Developed by Frederick Sanger in 1977
    • Chain-termination method (dideoxy sequencing)
    • Used for determining nucleotide sequences
    • Laid the foundation for modern genomic studies
  • Methodology
    • DNA fragmentation and amplification
    • Chain-termination reaction (dNTPs and ddNTPs)
    • Fragment separation and detection (capillary electrophoresis, fluorescent detection)
  • Workflow
    • DNA extraction
    • PCR amplification
    • Sequencing reaction
    • Capillary electrophoresis
    • Data analysis
  • Advantages
    • High accuracy
    • Long read lengths
    • Established technology
    • Cost-effective for small projects
  • Limitations
    • Low throughput
    • High cost for large projects
    • Lower sensitivity for low-frequency variants
    • Limited coverage
  • Applications
    • Small-scale sequencing projects (single gene sequencing, PCR product sequencing)
    • Sequencing of small genomes (bacterial, viral)
    • Validation of NGS results (variant validation, gene editing validation)
    • DNA barcoding (species identification)
6
Q

Flashcard: Sanger Sequencing

A
  • Sanger Sequencing
    • Developed by Frederick Sanger in 1977
    • Chain-termination method
    • Used for determining nucleotide sequences
  • Principles
    • Selective incorporation of ddNTPs
    • DNA replication termination
    • Fragment separation and detection
  • Components
    • Template DNA
    • Primer
    • DNA polymerase
    • dNTPs
    • ddNTPs
  • Process
    • Template and primer binding
    • Chain elongation and termination
    • Fragment separation by capillary electrophoresis
    • Detection of fluorescent signals
    • Sequence determination
  • Applications
    • Single gene or PCR product sequencing
    • Validation of NGS results
    • Mitochondrial and viral genomes
    • DNA barcoding
  • Advantages
    • High accuracy
    • Long read lengths
    • Established and reliable
    • Cost-effective for small projects
  • Limitations
    • Low throughput
    • Time-consuming and labor-intensive
    • High cost for large projects
    • Limited detection of low-frequency variants
  • Advances
    • Cycle sequencing
    • Fluorescent dye terminators
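The Process steps in this card (elongation, termination at a ddNTP, separation by size, dye detection) can be mimicked with a toy simulation. This sketch assumes an idealized reaction in which every position of the newly synthesized strand yields exactly one terminated fragment; real reactions are stochastic and the function names here are hypothetical.

```python
import random

def chain_termination_fragments(synthesized_strand):
    """Return (length, terminal_base) for each ddNTP-terminated fragment."""
    return [(position + 1, base) for position, base in enumerate(synthesized_strand)]

def read_sequence(fragments):
    """Sort fragments by length, as electrophoresis does, and read the terminal dyes."""
    return "".join(base for _, base in sorted(fragments))

strand = "ATGCCGTA"
fragments = chain_termination_fragments(strand)

# Fragments are mixed together in the reaction tube; shuffle to reflect that.
random.Random(0).shuffle(fragments)

print(read_sequence(fragments))
```

Sorting by length recovers the original order because each fragment length occurs once and its labeled terminal base identifies the nucleotide at that position.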
7
Q

Applications of Sanger Sequencing

A
  • Sanger Sequencing Applications
    1. Clinical Diagnostics
      * Mutation detection (cystic fibrosis, Huntington’s disease, BRCA1/2)
      * Confirmatory testing
      * Pharmacogenetics
    2. Molecular Biology Research
      * Gene cloning and verification
      * Targeted gene sequencing
      * Study of small genomes
    3. Microbial Identification and Phylogenetics
      * DNA barcoding
      * Phylogenetic studies
    4. Validation of NGS Results
      * Confirmation of variants
    5. Mitochondrial DNA Sequencing
      * Characterization of mtDNA variations
    6. Forensic Science
      * Personal identification, crime scene analysis, biological relationships
    7. Prenatal and Newborn Screening
      * Screening for genetic conditions (phenylketonuria, sickle cell disease)
  • Limitations
    • Low throughput
    • High cost for large-scale projects
    • Limited detection of low-frequency variants
    • Low sequencing depth
    • Not suitable for whole-genome or exome sequencing
    • Time-consuming
    • Sequencing errors in homopolymer regions
8
Q

Next-Generation Sequencing (NGS)

A
  • NGS
    • High-throughput sequencing technology
    • Revolutionized genomics
    • Simultaneous sequencing of millions of fragments
  • Key Differences from Sanger Sequencing
    • Throughput (higher in NGS)
    • Read length (longer in Sanger)
    • Cost and time per base (lower in NGS)
    • Accuracy (per-read accuracy is higher in Sanger; NGS compensates with deeper coverage)
    • Applications (wider range in NGS)
    • Rare variant detection (better in NGS)
    • Scalability (better in NGS)
  • Advantages of NGS
    • High throughput
    • Cost-effectiveness
    • Rare variant detection
    • High sensitivity
    • Broad range of applications
    • Data density
    • Customizable
  • Applications
    • Whole-genome sequencing (WGS)
    • Exome sequencing
    • RNA sequencing (RNA-Seq)
    • Targeted gene panels
    • Microbiome analysis
    • Cancer genomics
    • Epigenetic studies
  • Limitations
    • Short read lengths (for some platforms)
    • Complex data analysis
    • High upfront cost
    • Coverage variability
  • Conclusion
    • NGS has revolutionized genomics
    • Offers powerful and cost-effective sequencing
    • Wide range of applications
    • Limitations exist, but remains a valuable tool
9
Q

Roche 454 Sequencing

A
  • Roche 454 Sequencing
    • One of the first NGS platforms
    • Pyrosequencing technology
    • Discontinued in 2016
  • Mechanism
    • Library preparation (fragmentation, adaptor ligation)
    • Emulsion PCR (bead attachment, amplification)
    • Sequencing-by-synthesis (pyrosequencing)
    • Data analysis
  • Strengths
    • Long read lengths
    • High throughput (relative to Sanger sequencing)
    • Low substitution error rate (most errors are indels in homopolymers)
    • Real-time detection
  • Weaknesses
    • Homopolymer errors
    • Cost
    • Lower throughput compared to later NGS technologies
    • Platform discontinuation
    • Library preparation complexity
  • Applications
    • Microbial genome sequencing
    • Amplicon sequencing
    • Metagenomics
    • Targeted resequencing
    • Ancient DNA sequencing
    • Transcriptomics
  • Conclusion
    • Pioneering NGS technology
    • Valuable for specific applications
    • Discontinued due to limitations
    • Contributed to advancements in genomics
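The homopolymer weakness listed in this card follows from how pyrosequencing signals are read: the light emitted in each nucleotide flow is roughly proportional to the number of bases incorporated, so long runs must be inferred from signal intensity. The sketch below is a toy decoder, assuming a cyclic flow order of T, A, C, G and simple rounding of each signal; it is not the vendor's base caller, and the signal values are made up for illustration.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Turn per-flow signal intensities into bases by rounding to a run length."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = int(signal + 0.5)  # round non-negative signal to nearest integer
        bases.append(nucleotide * run_length)
    return "".join(bases)

# Short runs are easy to call: signals sit close to 0, 1, or 2.
print(decode_flowgram([1.02, 0.05, 2.10, 0.98]))

# Long runs are ambiguous: a small shift in signal changes the called length.
print(decode_flowgram([0.0, 5.4]))
print(decode_flowgram([0.0, 5.6]))
```

A difference of 0.2 signal units flips the call between five and six A's, which shows up as an insertion or deletion rather than a substitution.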
10
Q

Ion Torrent Sequencing

A
  • Ion Torrent Sequencing
    • Semiconductor sequencing technology
    • Developed by Ion Torrent Systems
    • Acquired by Life Technologies in 2010; now part of Thermo Fisher Scientific
  • Mechanism
    • Library preparation (fragmentation, adaptor ligation)
    • Emulsion PCR (bead binding, amplification)
    • Loading on semiconductor chip
    • Sequencing-by-synthesis (pH detection)
    • Data processing and analysis
  • Strengths
    • Cost-effective
    • Fast turnaround
    • Scalable
    • Small, accessible instrumentation
    • No need for fluorescent labels
    • Ideal for targeted sequencing
  • Weaknesses
    • Homopolymer errors
    • Shorter read lengths
    • Lower throughput
    • Higher error rate in long reads
    • Limited use in large genomes
    • Chip quality affects results
  • Applications
    • Clinical diagnostics
    • Targeted sequencing panels
    • Microbial genomics and metagenomics
    • Inherited disease research
    • Forensic genomics
  • Conclusion
    • Innovative and cost-effective NGS technology
    • Suitable for various applications
    • Limitations exist, but remains valuable for specific tasks
11
Q

AB SOLiD Sequencing Technology

A
  • AB SOLiD Sequencing
    • Next-Generation Sequencing (NGS) platform
    • Sequencing by oligonucleotide ligation and detection
    • Developed by Applied Biosystems (now part of Thermo Fisher Scientific)
  • Mechanism
    • Library preparation (fragmentation, adaptor ligation)
    • Emulsion PCR (bead binding, amplification)
    • Sequencing by ligation (probes, fluorescent labels, cleavage)
    • Two-base encoding
    • Data analysis (color-space data)
  • Strengths
    • High accuracy
    • Low error rate
    • Ability to detect variants
    • Flexible read lengths
    • Versatile applications
  • Weaknesses
    • Complex workflow
    • Longer turnaround time
    • Shorter read lengths
    • Color-space complexity
    • Reduced market presence
  • Applications
    • Whole-genome sequencing
    • Exome sequencing
    • RNA-Seq
    • Cancer genomics
    • Microbial genomics and metagenomics
    • Epigenetics
  • Conclusion
    • Highly accurate and reliable NGS platform
    • Suitable for applications requiring precision
    • Limitations in workflow and read length
    • Reduced market presence compared to newer platforms
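Two-base encoding and color-space data, listed under Mechanism in this card, can be shown with a short example. The sketch assumes the commonly described four-color table, in which identical adjacent bases give color 0; with A=0, C=1, G=2, T=3, that table is reproduced by XOR-ing the two base codes. The function names are illustrative.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode_color_space(seq):
    """Return the first base plus one color (0-3) per pair of adjacent bases."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode_color_space(first_base, colors):
    """Rebuild the base sequence from the known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ color])
    return "".join(bases)

first, colors = encode_color_space("ATGGC")
print(first, colors)
print(decode_color_space(first, colors))

# A single miscalled color changes every base after it when decoded naively.
print(decode_color_space("A", [3, 1, 1, 3]))
```

Because each base contributes to two colors, a true single-base variant changes two adjacent colors, while an isolated color change is more likely a measurement error. That distinction is the source of the platform's accuracy, and also of the color-space complexity listed under Weaknesses.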
12
Q

Third-Generation Sequencing (TGS)

A
  • TGS
    • Revolutionized sequencing technology
    • Longer reads, real-time sequencing, direct modification detection
    • Major platforms: PacBio SMRT, Oxford Nanopore
  • Key Features
    • Single-molecule sequencing
    • Long reads
    • Real-time sequencing
    • Direct detection of modifications
  • PacBio SMRT Sequencing
    • Mechanism: Zero-mode waveguides (ZMWs), fluorescently labeled nucleotides
    • Strengths: ultra-long reads, real-time sequencing, high accuracy, epigenetic detection
    • Weaknesses: higher error rate (initially), cost, lower throughput
  • Oxford Nanopore Technologies (ONT)
    • Mechanism: Nanopore sequencing, electrical current detection
    • Strengths: extremely long reads, portability, real-time sequencing, epigenetic detection
    • Weaknesses: higher error rate (historically), lower accuracy for short reads, complex data analysis
  • Key Differences from Second-Generation Sequencing
    • Read length (longer in TGS)
    • Single-molecule sequencing (TGS)
    • Real-time sequencing (TGS)
    • Direct modification detection (TGS)
    • Fewer reads per run, but far more bases per read (TGS)
  • Advantages of TGS
    • De novo genome assembly
    • Structural variation detection
    • Full-length transcript sequencing
    • Epigenetics
    • Field applications
  • Limitations of TGS
    • Higher error rates (historically)
    • Cost
    • Lower throughput
    • Complex bioinformatics
    • Data analysis challenges
  • Conclusion
    • Transformative technology
    • Unique capabilities
    • Valuable for specific applications
    • Challenges to overcome
13
Q

PacBio SMRT Sequencing

A
  • PacBio SMRT Sequencing
    • Third-generation sequencing technology
    • Single-molecule real-time sequencing
    • Developed by Pacific Biosciences
  • Mechanism
    • Zero-mode waveguides (ZMWs)
    • Fluorescently labeled nucleotides
    • Real-time detection
    • Circular consensus sequencing (CCS)
  • Strengths
    • Long read lengths
    • High accuracy (HiFi reads)
    • Epigenetic detection
    • Real-time sequencing
  • Limitations
    • Higher cost
    • Lower throughput
    • Initial error rate (CLR)
  • Applications
    • De novo genome assembly
    • Structural variant detection
    • Transcriptome analysis
    • Epigenetics
    • Microbial genomics
    • Clinical applications
    • Agricultural and plant genomics
    • Antibody and vaccine research
  • Conclusion
    • Powerful tool for genomics
    • Unique advantages in long reads, accuracy, and epigenetic detection
    • Suitable for complex applications
    • Limitations in cost and throughput
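Circular consensus sequencing, listed under Mechanism in this card, reads the same molecule several times so that random errors in individual passes can be outvoted. The sketch below is a deliberately simplified illustration, assuming the passes are already aligned, equal in length, and contain only substitution errors; the actual CCS algorithm handles insertions and deletions and uses a probabilistic model, so this is not PacBio's method.

```python
from collections import Counter

def majority_consensus(passes):
    """Call each position by majority vote across aligned, equal-length passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )

# Five hypothetical passes over the same molecule, two with one error each.
passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTAC"]
print(majority_consensus(passes))
```

Errors that occur independently at different positions rarely win the vote, which is how a raw read with a high error rate can yield a high-accuracy HiFi read.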
14
Q

Oxford Nanopore Technology (ONT)

A
  • ONT
    • Third-generation sequencing technology
    • Nanopore-based sequencing
    • Real-time, long-read sequencing
  • Mechanism
    • Nanopores
    • Membrane and applied voltage
    • Single-molecule sequencing
    • Real-time electrical signal detection
    • Read length determined mainly by the length of the DNA or RNA fragment
  • Workflow
    • Library preparation
    • Loading onto flow cell
    • Sequencing
    • Base calling and data analysis
  • Types of Platforms
    • MinION
    • GridION
    • PromethION
    • Flongle
    • SmidgION (in development)
  • Unique Features
    • Long reads
    • Direct RNA sequencing
    • Real-time data generation
    • Portability
    • Low cost
  • Applications
    • De novo genome assembly
    • Structural variant detection
    • Pathogen detection
    • Microbial genomics
    • RNA sequencing
    • Clinical genetics
    • Epigenetics
    • Agricultural and plant genomics
  • Conclusion
    • Revolutionary technology
    • Wide range of applications
    • Advantages in long reads, real-time sequencing, and portability
    • Challenges in error rates and data analysis
16
Q

Sequencing Depth and Read Quality

A
  • Sequencing Depth and Read Quality
    • Crucial concepts in sequencing experiments
    • Determine accuracy and reliability of results
  • Sequencing Depth (Coverage)
    • Number of times a given nucleotide position is covered by sequenced reads (e.g., 30x)
    • Calculated as total bases sequenced / length of target region
    • Types: shallow coverage, deep coverage
  • Read Quality
    • Accuracy and integrity of individual reads
    • Measured using Phred quality scores (Q-scores)
  • Importance of Coverage
    • Accurate variant calling
    • Error correction
    • Resolving complex regions
    • Required depth varies by application
  • Strategies for High-Quality Reads and Sufficient Coverage
    • Optimize library preparation
    • Careful sample handling
    • Platform-specific optimizations
    • Adequate depth of sequencing
    • Control for biases
    • Quality control steps
    • Use of internal controls
    • Post-sequencing analysis
    • Redundancy and replicates
  • Summary
    • High sequencing depth ensures accurate variant calling and complex region assembly.
    • High-quality reads reduce errors and false positives.
    • Tailoring depth and quality to the application is crucial for reliable results.
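The coverage formula in this card (total bases sequenced / length of target region) can be applied directly when planning a run. This is a minimal sketch, assuming all reads have the same length and every base maps to the target; the genome size and read counts below are hypothetical.

```python
import math

def mean_coverage(num_reads, read_length, target_length):
    """Average depth = total bases sequenced / length of the target region."""
    return num_reads * read_length / target_length

def reads_needed(target_coverage, read_length, target_length):
    """Number of reads required to reach a desired average depth."""
    return math.ceil(target_coverage * target_length / read_length)

# Hypothetical bacterial genome of 5 Mb sequenced with 150 bp reads.
print(mean_coverage(num_reads=1_000_000, read_length=150, target_length=5_000_000))
print(reads_needed(target_coverage=30, read_length=150, target_length=5_000_000))
```

The result is an average: actual depth varies from position to position, so low-coverage regions can remain even when the mean looks sufficient.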