UNIT 4 Flashcards

1
Q

De Novo Genome Assembly

A
  • De Novo Genome Assembly
    • Reconstructing a genome from scratch
  • Process
    • Sequencing
    • Preprocessing
    • Read overlap detection
    • Assembly
    • Scaffolding
    • Gap filling
    • Polishing
    • Quality assessment
  • Challenges
    • Repetitive sequences
    • Heterozygosity
    • Computational cost
  • Applications
    • Genome sequencing of new species
    • Evolutionary studies
    • Agricultural and medical research
  • Key Concepts
    • OLC
    • De Bruijn graphs
    • Contigs
    • Scaffolds
    • Gap filling
    • Polishing
    • Quality assessment
2
Q

Greedy Algorithm in Genome Assembly

A
  • Greedy Algorithm
    • Simple method for de novo assembly
  • Steps
    • Input: short reads
    • Find overlaps
    • Select best overlap
    • Iterative merging
  • Challenges
    • Repetitive regions
    • Global vs. local optimality
    • Computational cost
    • Error handling
  • Advantages
    • Simplicity
    • Speed for small datasets
  • Limitations
    • Inaccuracy with complex genomes
    • Inefficient for large datasets
    • Suboptimal assembly
  • Alternatives
    • De Bruijn Graph-based Assembly
    • Overlap-Layout-Consensus (OLC)
  • Conclusion
    • Historical significance
    • Limited use in modern genome assembly
    • Understanding for foundational knowledge
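The steps on this card (find overlaps, select the best overlap, merge iteratively) can be sketched in Python. This is a minimal illustration, not a production assembler: the function names `overlap_len` and `greedy_assemble`, the `min_len` threshold, and the toy reads are all assumptions made for the example, and only exact suffix-prefix matches are considered.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The overlap is valid only if the rest of a matches the start of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # no overlaps left: these are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


contigs = greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"])
print(contigs)
```

Each round compares every pair of reads, which is why the approach becomes inefficient for large datasets. Picking the locally best overlap can also join the wrong copies of a repeat, which is the global vs. local optimality problem listed above.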
3
Q

Overlap-Layout-Consensus (OLC) Genome Assembly

A
  • OLC
    • Method for genome assembly from long reads
  • Steps
    • Overlap: identify overlaps between reads
    • Layout: order and orient reads using the overlap graph
    • Consensus: derive an error-corrected sequence from the laid-out reads
  • Advantages
    • Handles long reads effectively
    • Accurate layout
    • Contig continuity
  • Challenges
    • Computational cost
    • Repetitive sequences
    • Large datasets
  • Applications
    • De novo genome assembly
    • Large, complex genomes
    • Metagenomics
  • Comparison to De Bruijn Graphs
    • OLC: better for long reads, which can span repetitive regions
    • De Bruijn Graphs: efficient for short reads, struggle with repeats longer than the k-mer size
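The three OLC steps can be sketched in Python on toy reads. This is a simplified illustration under stated assumptions: the function names, the `min_len` threshold, and the reads are hypothetical; overlaps are exact suffix-prefix matches (real long-read assemblers use approximate alignment); and the layout assumes the overlap graph forms a single simple path with a clear start.

```python
from collections import Counter


def suffix_prefix_overlap(a, b, min_len):
    """Longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Overlap step: edge i -> j weighted by the overlap length."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = suffix_prefix_overlap(a, b, min_len)
                if n:
                    graph.setdefault(i, {})[j] = n
    return graph


def layout(reads, graph):
    """Layout step: walk from the read with no incoming edge, recording offsets."""
    has_incoming = {j for edges in graph.values() for j in edges}
    node = next(i for i in range(len(reads)) if i not in has_incoming)
    path, offset, seen = [(node, 0)], 0, {node}
    while node in graph:
        candidates = {j: n for j, n in graph[node].items() if j not in seen}
        if not candidates:
            break
        nxt = max(candidates, key=candidates.get)  # follow the longest overlap
        offset += len(reads[node]) - candidates[nxt]
        path.append((nxt, offset))
        seen.add(nxt)
        node = nxt
    return path


def consensus(reads, path):
    """Consensus step: majority vote over the bases stacked at each position."""
    columns = {}
    for idx, off in path:
        for k, base in enumerate(reads[idx]):
            columns.setdefault(off + k, Counter())[base] += 1
    return "".join(columns[p].most_common(1)[0][0] for p in sorted(columns))


reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
graph = overlap_graph(reads)
path = layout(reads, graph)
print(consensus(reads, path))
```

Because the overlaps here are exact, the vote in `consensus` never has to resolve a disagreement. With approximate overlaps on noisy long reads, the same column-wise vote is what removes random sequencing errors.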
4
Q

De Novo Genome Assembly

A
  • De Novo Genome Assembly
    • Reconstructing a genome from scratch
  • Types of Assemblers
    • Overlap-Layout-Consensus (OLC): long reads (PacBio, Oxford Nanopore)
    • De Bruijn Graph: short reads (Illumina)
    • Greedy: simpler, but less accurate on repeats and inefficient for large datasets
  • Key Assemblers
    • SPAdes: short-read, versatile
    • Canu: long-read, error correction
    • Velvet: small genomes
    • ABySS: large genomes, scalable
    • Flye: long-read, efficient
    • Trinity: transcriptome assembly
    • MaSuRCA: hybrid (long + short)
  • Challenges
    • Repetitive sequences
    • Sequencing errors
    • Computational resources
    • Contig gaps
  • Conclusion
    • Essential for studying new organisms
    • Various tools available
    • Challenges remain to be addressed
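For the De Bruijn graph approach named on this card, a minimal Python sketch follows. It is an illustration under assumptions, not how SPAdes, Velvet, or ABySS are implemented: the function names, the choice of k = 4, and the toy reads are hypothetical, the reads are error-free, and the graph is assumed to contain a single Eulerian path.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # count each distinct k-mer once
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    nodes = sorted(set(graph) | set(in_deg))
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next(
        (n for n in nodes if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        nodes[0],
    )
    edges = {n: list(v) for n, v in graph.items()}  # working copy
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Turn a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
contig = spell(eulerian_path(de_bruijn(reads, k=4)))
print(contig)
```

No pairwise read comparison is needed, which is why this approach scales to large short-read datasets. The cost is that any repeat longer than k - 1 bases collapses into shared nodes, so the path through the graph is no longer unique.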
5
Q

Genome Assembly

A
  • Genome Assembly
    • Reconstructing a genome from sequencing reads
  • Types
    • De novo: no reference
    • Reference-based: uses a reference
  • Steps
    • Sequencing
    • Preprocessing
    • Read overlap detection
    • Contig assembly
    • Scaffolding
    • Gap filling
    • Annotation
  • Algorithmic Approaches
    • Overlap-Layout-Consensus (OLC): long reads
    • De Bruijn Graphs: short reads
    • Greedy: simpler, but less accurate on repeats and inefficient for large datasets
  • Challenges
    • Repetitive sequences
    • Sequencing errors
    • Computational resources
  • Popular Assemblers
    • SPAdes, Canu, Flye, Velvet, ABySS, Trinity, MaSuRCA
  • Applications
    • Genomics research
    • Evolutionary biology
    • Metagenomics
    • Personalized medicine
    • Agriculture
  • Conclusion
    • Essential for understanding genomes
    • Continuous advancements in tools and methods
6
Q

Genome Assembly Quality Assessment

A
  • Genome Assembly Quality Assessment
    • Evaluating accuracy, completeness, and contiguity
  • Quality Metrics
    • Contiguity: N50, L50, NG50
    • Accuracy: base-level errors, error correction, coverage
    • Repetitive regions
    • BUSCO score
  • Statistical Assessment
    • Genome size estimation
    • GC content
    • Repeat content
    • Synteny and structural integrity
  • Tools
    • QUAST, Pilon, BUSCO, REAPR, ALE, FRCbam, KAT
  • Challenges
    • Repetitive regions
    • Polishing errors
    • Incomplete or fragmented assemblies
  • Conclusion
    • Essential for understanding genome quality
    • Multiple metrics and tools available
    • Challenges require careful evaluation
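The contiguity metrics on this card can be computed directly from contig lengths. The sketch below is illustrative: the function name `nx_stats`, its parameters, and the contig lengths are made up for the example. N50 is the length of the contig at which the cumulative sum of lengths, sorted longest first, reaches half the assembly size; L50 is the number of contigs needed to get there; NG50 uses an estimated genome size in place of the assembly size.

```python
def nx_stats(lengths, fraction=0.5, total=None):
    """Return (Nx, Lx) for the given fraction.

    total defaults to the assembly size (giving N50/L50 at fraction 0.5);
    pass an estimated genome size instead to get NG50/LG50.
    """
    ordered = sorted(lengths, reverse=True)
    target = (total if total is not None else sum(ordered)) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return None, None  # assembly too small to reach the target


contig_lengths = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths
n50, l50 = nx_stats(contig_lengths)
ng50, lg50 = nx_stats(contig_lengths, total=400)  # assumed genome size of 400
print(n50, l50, ng50, lg50)
```

A higher N50 with a lower L50 indicates a more contiguous assembly, but neither says anything about correctness, which is why the card pairs them with accuracy and completeness metrics such as BUSCO.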
7
Q

Evolutionary Assessment of De Novo Genome Assembly

A
  • Evolutionary Assessment
    • Analyzing genome assemblies for evolutionary insights
  • Key Concepts
    • Phylogenetic analysis
    • Comparative genomics
    • Orthologs and paralogs
    • SNPs, InDels, SVs
    • Selection signatures
  • Methods and Tools
    • Phylogenetic tools (RAxML, MrBayes, PhyML)
    • Comparative genomics tools (MUMmer, MAVID, MAUVE)
    • Orthology/paralogy tools (OrthoFinder, OrthoMCL, InParanoid)
    • Variant calling tools (GATK, SAMtools)
    • Selection analysis tools (PAML, HyPhy)
    • Structural variation tools (Delly, LUMPY, Manta)
  • Applications
    • Genome evolution and speciation
    • Domestication and adaptation
    • Conservation genomics
  • Conclusion
    • Essential for understanding evolutionary relationships
    • Provides insights into genetic diversity, adaptation, and speciation