UNIT 4 Flashcards
1
Q
De Novo Genome Assembly
A
- De Novo Genome Assembly
- Reconstructing a genome from scratch
- Process
- Sequencing
- Preprocessing
- Read overlap or mapping
- Assembly
- Scaffolding
- Gap filling
- Polishing
- Quality assessment
- Challenges
- Repetitive sequences
- Heterozygosity
- Computational cost
- Applications
- Genome sequencing of new species
- Evolutionary studies
- Agricultural and medical research
- Key Concepts
- OLC
- De Bruijn graphs
- Contigs
- Scaffolds
- Gap filling
- Polishing
- Quality assessment
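The De Bruijn graph and contig concepts listed above can be illustrated with a small sketch. This is a toy version under stated assumptions: error-free reads, a single strand, and no repeats longer than k-1. The function names `build_de_bruijn` and `walk_unambiguous_path` and the example reads are hypothetical, not taken from any assembler.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def walk_unambiguous_path(graph, start):
    """Extend a contig while the current node has exactly one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: the contig stops here
        node = successors.pop()
        if node in visited:
            break  # cycle (repeat): stop rather than loop forever
        visited.add(node)
        contig += node[-1]
    return contig

# Hypothetical reads sampled from the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)
```

Stopping at branches is what splits a real assembly into contigs: a repeat creates a node with several successors, and the walk cannot tell which one to follow.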
2
Q
Greedy Algorithm in Genome Assembly
A
- Greedy Algorithm
- Simple method for de novo assembly
- Steps
- Input: short reads
- Find overlaps
- Select best overlap
- Iterative merging
- Challenges
- Repetitive regions
- Locally best merges are not always globally optimal
- Computational cost
- Error handling
- Advantages
- Simplicity
- Speed for small datasets
- Limitations
- Inaccuracy with complex genomes
- Inefficient for large datasets
- Suboptimal assembly
- Alternatives
- De Bruijn Graph-based Assembly
- Overlap-Layout-Consensus (OLC)
- Conclusion
- Historical significance
- Limited use in modern genome assembly
- Understanding for foundational knowledge
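The greedy steps on this card (find overlaps, select the best overlap, merge, repeat) can be sketched as follows. This is a minimal illustration, assuming error-free reads from one strand and exact suffix-prefix matches; `overlap_length`, `greedy_assemble`, and the `min_overlap` parameter are names chosen for this example.

```python
def overlap_length(a, b, min_overlap):
    """Longest suffix of a that equals a prefix of b; 0 if shorter than min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        # All-pairs comparison: this quadratic step is the main cost.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                length = overlap_length(a, b, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no remaining overlaps: the rest stay separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

contigs = greedy_assemble(["ATGGCG", "GGCGTG", "CGTGCA"])
print(contigs)
```

Because each merge is chosen only by overlap length, a repeat that produces a long but wrong overlap gets merged first and cannot be undone later, which is the local versus global optimality problem listed above.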
3
Q
Overlap-Layout-Consensus (OLC) Genome Assembly
A
- OLC
- Method for genome assembly from long reads
- Steps
- Overlap: identify overlaps between reads
- Layout: arrange reads in a graph
- Consensus: generate error-corrected sequence
- Advantages
- Handles long reads effectively
- Accurate layout
- Contig continuity
- Challenges
- Computational cost
- Repetitive sequences
- Large datasets
- Applications
- De novo genome assembly
- Large, complex genomes
- Metagenomics
- Comparison to De Bruijn Graphs
- OLC: better for long reads, handles repetitive regions
- De Bruijn Graphs: efficient for short reads, struggle with repetitive regions
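The three OLC steps can be sketched on a toy example. This is a simplified illustration, not how any production assembler works: it assumes substitution errors only (no insertions or deletions), a single strand, and a linear layout with one clear starting read. All function names, the `min_len` and `max_mismatches` parameters, and the reads are hypothetical.

```python
from collections import Counter

def find_overlap(a, b, min_len, max_mismatches=1):
    """Longest suffix of a matching a prefix of b with few substitutions."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatches:
            return length
    return 0

def build_overlaps(reads, min_len):
    """Overlap step: map (i, j) to the overlap length of read i followed by read j."""
    overlaps = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                length = find_overlap(a, b, min_len)
                if length:
                    overlaps[(i, j)] = length
    return overlaps

def layout(reads, overlaps):
    """Layout step: chain reads from one with no incoming overlap, tracking offsets."""
    has_incoming = {j for (_, j) in overlaps}
    # Assumes such a start read exists (fails on circular layouts).
    start = next(i for i in range(len(reads)) if i not in has_incoming)
    placed, offset, current, visited = [(start, 0)], 0, start, {start}
    while True:
        candidates = [(length, j) for (i, j), length in overlaps.items()
                      if i == current and j not in visited]
        if not candidates:
            return placed
        length, nxt = max(candidates)
        offset += len(reads[current]) - length
        placed.append((nxt, offset))
        visited.add(nxt)
        current = nxt

def consensus(reads, placed):
    """Consensus step: majority vote over the bases stacked in each column."""
    columns = {}
    for idx, offset in placed:
        for pos, base in enumerate(reads[idx]):
            columns.setdefault(offset + pos, Counter())[base] += 1
    return "".join(columns[p].most_common(1)[0][0] for p in sorted(columns))

# Reads from ACGTTGCAAGGCTA; the middle read carries one substitution error (C to T).
reads = ["ACGTTGCA", "TTGTAAGG", "CAAGGCTA"]
overlaps = build_overlaps(reads, min_len=4)
placed = layout(reads, overlaps)
result = consensus(reads, placed)
print(result)
```

The erroneous base sits in a column covered by all three reads, so the vote restores the original base. This is why the consensus step depends on coverage depth.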
4
Q
De Novo Genome Assembly: Assembler Types and Tools
A
- De Novo Genome Assembly
- Reconstructing a genome from scratch
- Types of Assemblers
- Overlap-Layout-Consensus (OLC): suited to long reads (PacBio, Oxford Nanopore)
- De Bruijn Graph: suited to short reads (Illumina)
- Greedy: simpler, less efficient
- Key Assemblers
- SPAdes: short-read, versatile
- Canu: long-read, error correction
- Velvet: small genomes
- ABySS: large genomes, scalable
- Flye: long-read, efficient
- Trinity: transcriptome assembly
- MaSuRCA: hybrid (long + short)
- Challenges
- Repetitive sequences
- Sequencing errors
- Computational resources
- Contig gaps
- Conclusion
- Essential for studying new organisms
- Various tools available
- Challenges remain to be addressed
5
Q
Genome Assembly
A
- Genome Assembly
- Reconstructing a genome from sequencing reads
- Types
- De novo: no reference
- Reference-based: uses a reference
- Steps
- Sequencing
- Preprocessing
- Read overlap detection
- Contig assembly
- Scaffolding
- Gap filling
- Annotation
- Algorithmic Approaches
- Overlap-Layout-Consensus (OLC): long reads
- De Bruijn Graphs: short reads
- Greedy: simpler, less efficient
- Challenges
- Repetitive sequences
- Sequencing errors
- Computational resources
- Popular Assemblers
- SPAdes, Canu, Flye, Velvet, ABySS, Trinity, MaSuRCA
- Applications
- Genomics research
- Evolutionary biology
- Metagenomics
- Personalized medicine
- Agriculture
- Conclusion
- Essential for understanding genomes
- Continuous advancements in tools and methods
6
Q
Genome Assembly Quality Assessment
A
- Genome Assembly Quality Assessment
- Evaluating accuracy, completeness, and contiguity
- Quality Metrics
- Contiguity: N50, L50, NG50
- Accuracy: base-level errors, error correction, coverage
- Repetitive regions
- BUSCO score
- Statistical Assessment
- Genome size estimation
- GC content
- Repeat content
- Synteny and structural integrity
- Tools
- QUAST, Pilon, BUSCO, REAPR, ALE, FRCbam, KAT
- Challenges
- Repetitive regions
- Polishing errors
- Incomplete or fragmented assemblies
- Conclusion
- Essential for understanding genome quality
- Multiple metrics and tools available
- Challenges require careful evaluation
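The contiguity metrics and GC content named on this card can be computed directly from contig lengths and sequences. The sketch below uses the common definitions: N50 is the length of the shortest contig in the smallest set of longest contigs covering at least half the assembly, L50 is the number of contigs in that set, and NG50 uses an estimated genome size in place of the assembly size. The function names and example values are hypothetical.

```python
def n50_l50(contig_lengths, genome_size=None):
    """Return (N50, L50); pass genome_size to get (NG50, LG50) instead."""
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= target:
            return length, count
    # The contigs cover less than half of genome_size.
    return None, None

def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    acgt = sum(sequence.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / acgt

lengths = [100, 80, 60, 40, 20]          # hypothetical contig lengths
n50, l50 = n50_l50(lengths)
ng50, lg50 = n50_l50(lengths, genome_size=400)
gc = gc_content("ATGGCGTGCA")
print(n50, l50, ng50, lg50, gc)
```

N50 can rise if short contigs are simply discarded, so it is usually read alongside NG50 and a completeness measure such as the BUSCO score.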
7
Q
Evolutionary Assessment of De Novo Genome Assembly
A
- Evolutionary Assessment
- Analyzing genome assemblies for evolutionary insights
- Key Concepts
- Phylogenetic analysis
- Comparative genomics
- Orthologs and paralogs
- SNPs, InDels, SVs
- Selection signatures
- Methods and Tools
- Phylogenetic tools (RAxML, MrBayes, PhyML)
- Comparative genomics tools (MUMmer, MAVID, Mauve)
- Orthology/paralogy tools (OrthoFinder, OrthoMCL, InParanoid)
- Variant calling tools (GATK, SAMtools)
- Selection analysis tools (PAML, HyPhy)
- Structural variation tools (Delly, LUMPY, Manta)
- Applications
- Genome evolution and speciation
- Domestication and adaptation
- Conservation genomics
- Conclusion
- Essential for understanding evolutionary relationships
- Provides insights into genetic diversity, adaptation, and speciation