Sequencing techniques Flashcards
What is the principle of Sanger sequencing?
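Sanger sequencing is a DNA sequencing method that relies on chain termination by dideoxynucleotides (ddNTPs): because a ddNTP lacks the 3'-OH group needed to add the next base, its incorporation stops synthesis. This generates DNA fragments of varying lengths, which are then separated by size to determine the sequence.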
What are the steps of Sanger sequencing?
- Template preparation: PCR amplification (or cloning) of the DNA fragment.
- Sequencing reaction: a primer, DNA polymerase, dNTPs, and a small amount of fluorescently labeled ddNTPs (a different dye for each base).
- Chain termination: each new strand stops at a different length, wherever a ddNTP is incorporated.
- Capillary electrophoresis separates the fragments by size.
- Laser detection of the fluorescent ddNTPs reads the sequence.
Expected Result: A chromatogram where each peak corresponds to a nucleotide in the sequence.
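The chain-termination logic above can be sketched as a toy simulation. This is a minimal sketch, assuming a hypothetical template written 3'->5', exactly one fragment for every possible termination position, and no primer sequence; it only illustrates why sorting fragments by length and reading each terminal dye recovers the sequence.

```python
import random

# Watson-Crick base pairing used by the polymerase
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template_3_to_5: str) -> str:
    """Full-length new strand built on a template written 3'->5' (new strand reads 5'->3')."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per possible ddNTP incorporation site: (length, terminal base = dye colour)."""
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_by_electrophoresis(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length (shortest migrates fastest) and read each terminal base."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGATC"  # hypothetical template, written 3'->5'
    fragments = terminated_fragments(synthesize(template))
    random.shuffle(fragments)  # the reaction holds an unordered mixture of fragments
    print(read_by_electrophoresis(fragments))  # ATGCCTAG
```

The shuffle stands in for the mixed reaction tube; sorting by length plays the role of capillary electrophoresis.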
What are the advantages and limitations of Sanger sequencing?
Pros: High accuracy, ideal for short DNA fragments
Cons: Low throughput, expensive for large genomes
Best Use Case: Validating mutations found in NGS experiments or sequencing individual genes.
How does NGS differ from Sanger sequencing?
- Massively parallel sequencing: Reads millions of fragments simultaneously.
- Uses short reads (~50–300 bp, depending on the platform) but generates huge amounts of data.
- Unlike Sanger, NGS is high-throughput and cost-effective for whole-genome sequencing (WGS) and RNA-Seq.
What are the key steps in an NGS experiment?
1) Library Preparation:
- DNA is fragmented into short pieces (~200–500 bp).
- Adapters (short, known sequences) are ligated to both ends of each fragment.
- These adapters allow binding to the flow cell and provide priming sites for sequencing.
2) Amplification (Cluster Generation):
- DNA fragments are loaded onto a flow cell coated with complementary adapters.
- Bridge PCR (clonal amplification) creates clusters of identical DNA fragments, each originating from a single template molecule.
3) Sequencing by Synthesis (SBS):
- DNA polymerase incorporates fluorescently labeled, reversibly terminated nucleotides (A, T, C, G), one base per cycle.
- Each base carries a distinct dye whose color is imaged after each cycle; the dye and terminator are then removed so the next base can be added.
- The process repeats base by base, generating short reads (~50–300 bp each, depending on the platform).
4) Data Analysis:
- Raw reads (FASTQ format) are quality-checked.
- Reads are aligned to a reference genome (if available) or assembled de novo if no reference exists.
- Variants (SNPs, indels) are identified using bioinformatics tools.
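Step 4 starts from the per-base quality scores stored in FASTQ files. A minimal sketch, assuming the Phred+33 encoding used by current Illumina instruments and a hypothetical mean-quality cutoff of Q20; real QC tools such as FastQC report many more metrics, and trimmers usually work base by base rather than rejecting whole reads.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode Phred+33 ASCII quality characters into integer Q scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so Q20 -> 1% and Q30 -> 0.1%."""
    return 10 ** (-q / 10)


def parse_fastq(text: str):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def passes_qc(quality: str, min_mean_q: float = 20.0) -> bool:
    """Keep a read if its mean Phred score reaches the (assumed) threshold."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    fastq = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTACGT\n+\n########\n"
    kept = [rid for rid, _, qual in parse_fastq(fastq) if passes_qc(qual)]
    print(kept)  # ['read1']
```

Here 'I' decodes to Q40 and '#' to Q2, so only the first read is kept.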
What are the major applications of NGS?
- Whole-genome sequencing (WGS): Identify mutations, structural variants.
- Whole-exome sequencing (WES): Sequence only protein-coding regions (~1.5% of genome).
- RNA-Seq: Analyze gene expression profiles.
Best Use Case: Studying complex diseases, detecting rare mutations, transcriptome analysis.
How does Oxford Nanopore sequencing work?
- Uses nanopores (tiny protein channels) in a membrane.
- A single DNA strand is threaded through the pore, and the change in ionic current is measured as nucleotides pass through.
- The current signal is translated into a sequence in real time.
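The idea of translating a current signal into bases can be illustrated with a deliberately simplified toy. It assumes each base produces its own hypothetical mean current level; in a real pore the current depends on several bases at once, and production basecallers use neural networks. The sketch splits the trace into events at large current jumps and assigns each event to the nearest level.

```python
# Hypothetical mean current (pA) per base. A real pore's current depends on a k-mer of
# several bases, so this one-level-per-base table is purely illustrative.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 65.0}


def segment_events(signal: list[float], jump: float = 7.5) -> list[float]:
    """Split the raw trace wherever consecutive samples differ by more than `jump` pA;
    return the mean current of each segment (event)."""
    events: list[float] = []
    segment = [signal[0]]
    for prev, x in zip(signal, signal[1:]):
        if abs(x - prev) > jump:
            events.append(sum(segment) / len(segment))
            segment = []
        segment.append(x)
    events.append(sum(segment) / len(segment))
    return events


def call_bases(events: list[float]) -> str:
    """Assign each event to the base whose reference level is closest."""
    return "".join(min(LEVELS, key=lambda base: abs(LEVELS[base] - level)) for level in events)


if __name__ == "__main__":
    trace = [109.5, 110.8, 110.1, 79.2, 80.6, 64.8, 65.9, 65.1, 94.7, 95.4]  # made-up samples
    print(call_bases(segment_events(trace)))  # GATC
```

Because two identical bases in a row produce no current jump, this toy merges them into one event, which mirrors why homopolymer runs are a known source of nanopore errors.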
What are the advantages and disadvantages of Oxford Nanopore sequencing?
Pros: Real-time sequencing, portable (MinION device), long reads (up to 1Mb)
Cons: Higher error rate compared to Illumina NGS
Best Use Case: Structural variants, metagenomics, long-read sequencing applications.
What is RNA-Seq and how does it work?
RNA-Seq extracts RNA (typically mRNA), converts it into cDNA, and sequences it using NGS to measure which genes are expressed and at what levels.
Unlike microarrays, RNA-Seq can detect novel transcripts and alternative splicing.
What are the key steps in RNA-Seq?
- RNA extraction and poly-A selection (for mRNA-Seq) or rRNA depletion (for total RNA-Seq).
- Reverse transcription into cDNA using reverse transcriptase.
- Fragmentation and adapter ligation.
- NGS sequencing.
- Data analysis: Read mapping, quantification, differential expression analysis.
Expected Result: A list of genes with expression levels under different conditions.
What is the difference between poly-A selection and rRNA depletion?
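- Poly-A selection: Captures only polyadenylated RNAs (mostly mRNAs) using oligo(dT). Standard for gene expression analysis.
- rRNA depletion: Removes ribosomal RNA and keeps other RNAs, including non-polyadenylated transcripts and lncRNAs (small RNAs such as miRNAs usually need a dedicated small-RNA library protocol). Ideal for total RNA sequencing.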
Best Use Case: Poly-A selection for coding genes, rRNA depletion for studying all RNA types.
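The two strategies can be contrasted as filters over an RNA pool. A minimal sketch with hypothetical molecule names: poly-A selection keeps only molecules with a tail, while rRNA depletion removes only rRNA, so non-polyadenylated transcripts survive the second filter but not the first.

```python
from dataclasses import dataclass


@dataclass
class RNA:
    name: str
    biotype: str          # e.g. "mRNA", "rRNA", "lncRNA"
    has_polya_tail: bool


def polya_selection(pool: list[RNA]) -> list[RNA]:
    """Oligo(dT) capture: only polyadenylated molecules are retained."""
    return [rna for rna in pool if rna.has_polya_tail]


def rrna_depletion(pool: list[RNA]) -> list[RNA]:
    """Probe-based removal of rRNA: everything else is retained."""
    return [rna for rna in pool if rna.biotype != "rRNA"]


if __name__ == "__main__":
    pool = [
        RNA("coding_mRNA", "mRNA", True),
        RNA("histone_mRNA", "mRNA", False),  # replication-dependent histone mRNAs lack a poly-A tail
        RNA("nonpolyA_lncRNA", "lncRNA", False),
        RNA("18S_rRNA", "rRNA", False),
    ]
    print([r.name for r in polya_selection(pool)])
    print([r.name for r in rrna_depletion(pool)])
```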
What is the difference between paired-end and single-read sequencing?
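- Single-read: Sequences each DNA fragment from one end only. Cheaper and faster, but alignment is less reliable and repeats are hard to resolve.
- Paired-end: Sequences each fragment from both ends. Higher accuracy and better alignment, because the known distance between the two reads helps place them in repetitive regions. More expensive.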
Paired-end is better for genome assembly and transcriptomics.
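The advantage of paired-end reads in repeats comes from the known fragment size. A minimal sketch, assuming a hypothetical library with ~300 bp fragments and a ±100 bp tolerance: when read 2 matches two repeat copies, only the copy that gives a plausible insert size relative to the uniquely mapped read 1 is kept.

```python
from typing import Optional


def insert_size(r1_start: int, r2_end: int) -> int:
    """Fragment length spanned by a pair (read 1 forward, read 2 reverse; 0-based, end-exclusive)."""
    return r2_end - r1_start


def is_proper_pair(r1_start: int, r2_end: int, expected: int = 300, tolerance: int = 100) -> bool:
    """A pair is 'proper' if its insert size is close to the library's expected fragment size."""
    return abs(insert_size(r1_start, r2_end) - expected) <= tolerance


def place_mate(r1_start: int, r2_candidate_ends: list[int],
               expected: int = 300, tolerance: int = 100) -> Optional[int]:
    """Read 2 hits several repeat copies; return the single copy consistent with read 1, else None."""
    consistent = [end for end in r2_candidate_ends
                  if is_proper_pair(r1_start, end, expected, tolerance)]
    return consistent[0] if len(consistent) == 1 else None


if __name__ == "__main__":
    # Hypothetical coordinates: read 1 maps uniquely; read 2 matches two repeat copies.
    print(place_mate(10_000, [10_320, 52_150]))  # 10320
```

A single-read experiment has no mate to anchor it, so a read that matches two repeat copies stays ambiguous.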
You are studying mutations in a rare genetic disease. Which sequencing approach should you use?
- Whole-exome sequencing (WES): Best for finding protein-coding mutations.
- If non-coding regions are important, use whole-genome sequencing (WGS).
What is Whole-Exome Sequencing (WES)?
Whole-Exome Sequencing (WES) is a targeted sequencing approach that captures and sequences only the protein-coding regions (exons) of the genome, which make up ~1–2% of the human genome but are estimated to harbor ~85% of known disease-causing mutations.
Key Feature:
- Uses hybridization capture (probes complementary to exons) to enrich exonic regions before sequencing.
- Faster and cheaper than Whole-Genome Sequencing (WGS) while still detecting most disease-causing mutations.
How does the WES workflow work?
1️⃣ DNA Extraction – Isolate DNA from blood, saliva, or tissue samples.
2️⃣ Fragmentation – DNA is sheared into small pieces (~150–200 bp).
3️⃣ Library Preparation – Adapters are ligated to DNA fragments.
4️⃣ Exome Capture (Hybridization-based Enrichment)
Uses biotin-labeled probes complementary to exon sequences.
Target fragments are pulled down with streptavidin-coated beads.
5️⃣ NGS Sequencing (Illumina, etc.) – Only the captured exonic sequences are sequenced.
6️⃣ Bioinformatics Analysis – Align reads to a reference genome, then identify variants such as SNVs and small indels.
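A common check after exome capture is the on-target rate: the fraction of reads that overlap the targeted exons. A minimal sketch using hypothetical intervals (0-based, half-open) and a brute-force overlap test; real pipelines compute this with indexed interval tools over target files.

```python
Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def on_target_fraction(reads: list[Interval], targets: list[Interval]) -> float:
    """Fraction of aligned reads overlapping at least one capture target (exon)."""
    if not reads:
        return 0.0
    hits = sum(any(overlaps(r, t) for t in targets) for r in reads)
    return hits / len(reads)


if __name__ == "__main__":
    exons = [("chr1", 100, 250), ("chr1", 400, 520)]          # hypothetical targets
    reads = [("chr1", 90, 190), ("chr1", 300, 400),           # 2nd read ends right where exon 2 starts
             ("chr1", 450, 600), ("chr2", 100, 200)]
    print(on_target_fraction(reads, exons))  # 0.5
```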
Expected Results:
A VCF file (Variant Call Format) containing detected mutations within exons.
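A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), with POS 1-based. The sketch below parses one line and classifies each REF/ALT pair by comparing allele lengths. The coordinates are hypothetical, and symbolic structural-variant alleles are not handled.

```python
def parse_vcf_line(line: str) -> dict:
    """Split the eight fixed VCF columns of one data line (POS is 1-based)."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_fields: dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_fields[key] = value if value else True  # flag keys carry no value
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alts": alt.split(","), "qual": qual, "filter": filt, "info": info_fields}


def variant_type(ref: str, alt: str) -> str:
    """Classify a simple REF/ALT pair by allele length."""
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


if __name__ == "__main__":
    rec = parse_vcf_line("chr1\t20000\t.\tCAG\tC\t60\tPASS\tDP=35")  # hypothetical record
    print(rec["pos"], [variant_type(rec["ref"], a) for a in rec["alts"]])  # 20000 ['deletion']
```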
What are the advantages and disadvantages of WES?
- WES (Whole-Exome Sequencing): Covers exons (~1.5% of genome). Cheaper; less data and easier analysis. Best for Mendelian diseases and coding mutations. Limitation: misses non-coding regions.
- WGS (Whole-Genome Sequencing): Covers the entire genome. More expensive; large data volumes and complex analysis. Best for regulatory and structural variants.
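The data-volume difference can be estimated with simple arithmetic: bases to sequence ≈ target size × mean depth ÷ on-target rate. A sketch, assuming a ~3.1 Gb genome, the ~1.5% exome fraction stated above, and illustrative settings of 100× depth with 70% on-target reads for WES and 30× for WGS (these settings are assumptions, not fixed standards).

```python
def raw_bases_needed(target_bp: float, mean_depth: float, on_target_rate: float = 1.0) -> float:
    """Bases to sequence so the target region reaches the requested mean depth."""
    return target_bp * mean_depth / on_target_rate


GENOME_BP = 3.1e9              # approximate haploid human genome size
EXOME_BP = 0.015 * GENOME_BP   # ~1.5% of the genome

if __name__ == "__main__":
    wes = raw_bases_needed(EXOME_BP, mean_depth=100, on_target_rate=0.7)  # assumed WES settings
    wgs = raw_bases_needed(GENOME_BP, mean_depth=30)                      # assumed WGS settings
    print(f"WES ~{wes / 1e9:.1f} Gb, WGS ~{wgs / 1e9:.0f} Gb, ratio ~{wgs / wes:.0f}x")
```

With these assumptions it prints `WES ~6.6 Gb, WGS ~93 Gb, ratio ~14x`, i.e., roughly an order of magnitude more raw data for WGS.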
When should you use WES instead of WGS?
- When the disease is suspected to be caused by protein-coding mutations.
- When cost or data storage is a limitation.
- For inherited disorders where the causal mutations are likely exonic.
If you are diagnosing a rare Mendelian disease (e.g., cystic fibrosis, Marfan syndrome), WES is preferred because most known pathogenic mutations occur in exons.
What types of genetic variations can WES detect?
✅ Single nucleotide variants (SNVs) – e.g., TP53 R175H mutation in cancer.
✅ Small insertions/deletions (indels) – e.g., BRCA1 185delAG in breast cancer.
✅ Splice site mutations at exon–intron boundaries – e.g., CFTR 621+1G>T in cystic fibrosis.
❌ What WES CANNOT Detect Well:
Large structural variants (e.g., translocations, inversions).
Variants in introns or intergenic regions (non-coding regions).
Epigenetic modifications (e.g., DNA methylation).
A patient has a suspected inherited disorder, but WES comes back negative. What are possible explanations?
1️⃣ The causal mutation is in a non-coding region (e.g., enhancer, promoter).
2️⃣ The patient has structural variations (deletions, duplications, translocations) not detected by WES.
3️⃣ The disease is not genetic or has an epigenetic component.
4️⃣ Low read depth in the sequencing run might have caused missed variants.
📝 Next Steps: Perform Whole-Genome Sequencing (WGS) or use long-read sequencing (PacBio, Oxford Nanopore).
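Explanation 4️⃣ can be quantified with a binomial model. A minimal sketch, assuming a heterozygous variant (each read carries it with probability 0.5), independent reads, no sequencing or mapping errors, and a hypothetical variant caller that needs at least 3 supporting reads.

```python
from math import comb


def detection_probability(depth: int, min_alt_reads: int, allele_fraction: float = 0.5) -> float:
    """P(at least `min_alt_reads` of `depth` reads carry the variant), with each read
    independently sampling the variant allele with probability `allele_fraction`."""
    return sum(comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
               for k in range(min_alt_reads, depth + 1))


if __name__ == "__main__":
    for depth in (6, 10, 20, 30):
        print(depth, round(detection_probability(depth, min_alt_reads=3), 4))
```

At 6× depth, the probability of seeing at least 3 variant reads is 42/64 ≈ 0.66, so a true heterozygous variant would be missed about one time in three; at 30× it is essentially always seen.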
How do you analyze and interpret WES data?
1️⃣ Read Alignment – Raw sequencing reads are mapped to a reference genome.
2️⃣ Variant Calling – Detect SNPs, insertions, deletions using bioinformatics tools like GATK.
3️⃣ Variant Annotation – Predict functional impact (e.g., using SIFT, PolyPhen).
4️⃣ Filtering & Prioritization – Remove common variants using population databases (e.g., gnomAD) and flag known pathogenic variants using clinical databases (e.g., ClinVar).
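Step 4️⃣ can be sketched as a filter over annotated variants. The field names, example records, the 0.1% allele-frequency cutoff, and the set of "damaging" consequence terms are all hypothetical; real annotation tools have their own output schemas.

```python
# Hypothetical annotated variants (fields as an annotation step might produce them)
VARIANTS = [
    {"id": "var1", "consequence": "missense", "gnomad_af": 1e-5, "clinvar": "Uncertain significance"},
    {"id": "var2", "consequence": "synonymous", "gnomad_af": 1e-6, "clinvar": None},
    {"id": "var3", "consequence": "stop_gained", "gnomad_af": None, "clinvar": "Pathogenic"},
    {"id": "var4", "consequence": "missense", "gnomad_af": 0.12, "clinvar": "Benign"},
]

DAMAGING = {"missense", "stop_gained", "frameshift", "splice_donor", "splice_acceptor"}


def prioritize(variants, max_af=1e-3, damaging=DAMAGING):
    """Keep rare, potentially damaging variants; list ClinVar 'Pathogenic' ones first."""
    kept = [v for v in variants
            if (v["gnomad_af"] or 0.0) <= max_af   # absent from gnomAD -> treated as AF 0
            and v["consequence"] in damaging]
    # False sorts before True, so Pathogenic entries come first; order is otherwise preserved
    return sorted(kept, key=lambda v: v["clinvar"] != "Pathogenic")


if __name__ == "__main__":
    print([v["id"] for v in prioritize(VARIANTS)])  # ['var3', 'var1']
```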
What are the key steps in RNA-Seq data analysis?
1️⃣ Quality Control (QC):
Check raw sequencing reads using FastQC to identify sequencing errors, adapter contamination, and low-quality reads.
Trim adapter sequences and low-quality bases, then discard reads that become too short.
2️⃣ Alignment to a Reference Genome:
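Map reads to a reference genome using splice-aware aligners (e.g., STAR, HISAT2), since RNA reads can span exon–exon junctions.
If working with novel transcripts or an organism without a reference genome, use de novo assembly instead of reference mapping.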
3️⃣ Quantification of Gene Expression:
Count how many reads align to each gene.
Normalize read counts for sequencing depth and gene length using TPM (Transcripts Per Million) or FPKM/RPKM (Fragments/Reads Per Kilobase per Million mapped reads).
4️⃣ Differential Gene Expression Analysis:
Compare gene expression levels between conditions (e.g., cancer vs. normal cells).
Use statistical tools (e.g., DESeq2, edgeR), which model raw counts, to identify significantly differentially expressed genes.
5️⃣ Functional Enrichment Analysis:
Identify pathways and biological processes affected by differentially expressed genes using enrichment analysis against databases such as KEGG or Gene Ontology (GO).
What results do you get from RNA-Seq analysis?
1) Raw FASTQ files containing sequencing reads.
2) Expression matrices (counts per gene/transcript) showing how many reads mapped to each gene.
3) Differentially expressed genes (DEGs) between conditions (e.g., genes upregulated in a disease state).
4) Volcano plots & heatmaps showing gene expression differences.
5) Pathway analysis results identifying biological processes affected by differentially expressed genes.
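Items 3) and 4) can be connected with a small sketch: each gene becomes a volcano-plot point (log2 fold change vs. -log10 adjusted p-value) and is labeled up, down, or not significant. It assumes adjusted p-values come from an upstream statistical test, uses a ratio of mean counts with a pseudocount of 1 as a stand-in for a model-based fold change, and applies common but arbitrary cutoffs (|log2FC| ≥ 1, padj < 0.05). Gene names and values are hypothetical.

```python
import math


def volcano_points(results, lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05):
    """results: iterable of (gene, mean_count_A, mean_count_B, padj).
    Returns (gene, log2FC of B vs A, -log10 padj, status) for each gene."""
    points = []
    for gene, mean_a, mean_b, padj in results:
        lfc = math.log2((mean_b + 1) / (mean_a + 1))  # pseudocount of 1 avoids log(0)
        neg_log10_padj = -math.log10(padj)            # padj must be > 0
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            status = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"
        points.append((gene, lfc, neg_log10_padj, status))
    return points


if __name__ == "__main__":
    table = [("GENE_A", 99, 399, 0.001), ("GENE_B", 199, 49, 0.01), ("GENE_C", 100, 110, 0.6)]
    for point in volcano_points(table):
        print(point)
```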