UNIT 2 Flashcards

Question 1

Q

Mapping of Sequence Reads to Reference Genomes

Answer

A

Mapping of Sequence Reads
- Fundamental technique in genomics and bioinformatics
- Aligning reads to a reference genome
- Underpins many downstream analyses
Key Concepts
- Sequence reads
- Reference genome
- Alignment
- Alignment algorithms and tools
The Mapping Process
- Preprocessing of reads
- Indexing the reference genome
- Alignment
- Post-alignment processing
Alignment Properties
- Mapping quality (MAPQ)
- Unique vs. multi-mapping reads
- Paired-end vs. single-end reads
- Gapped vs. ungapped alignment
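
As a rough illustration of mapping quality: the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong, rounded to the nearest integer. The sketch below assumes a cap of 60, which is a common but aligner-specific choice; the function names are hypothetical.

```python
import math


def mapq_from_error_prob(p_wrong: float, cap: int = 60) -> int:
    """Phred-scale the probability that the reported position is wrong.

    The cap of 60 is an assumption; real aligners choose their own maximum.
    """
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_prob_from_mapq(mapq: int) -> float:
    """Invert the Phred scaling: MAPQ 20 corresponds to a 1 in 100 error chance."""
    return 10 ** (-mapq / 10)
```

A read that fits two locations equally well has an error probability of about 0.5, which gives a MAPQ near 3. This is why multi-mapping reads tend to carry very low mapping qualities.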
Importance of Read Mapping
- Variant detection
- Gene expression analysis
- Reference-guided genome assembly and structural variant detection
- Comparative genomics
- Clinical diagnostics
Challenges in Read Mapping
- Repetitive regions
- Structural variations
- Sequencing errors
- Computational demands
- Genomic diversity
Tools and Algorithms
- Aligners: BWA, Bowtie, STAR, HISAT2, Minimap2, GMAP/GSNAP
- Techniques: hash-based indexing, Burrows-Wheeler Transform, suffix arrays, seed-and-extend
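
To make the Burrows-Wheeler Transform and suffix array ideas concrete, here is a minimal sketch of exact-match search by backward search over the BWT. It is a teaching toy: it sorts full suffixes and recounts characters on every step, whereas production aligners use compressed, sampled indexes. The function names are hypothetical.

```python
def bwt_via_suffix_array(text: str):
    """Return (suffix_array, bwt) for text with a '$' sentinel appended."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for each sorted suffix is the one preceding it.
    bwt = "".join(text[i - 1] for i in sa)
    return sa, bwt


def find_exact(text: str, pattern: str):
    """Return sorted 0-based start positions of pattern in text."""
    sa, bwt = bwt_via_suffix_array(text)

    # first_row[c]: number of characters in the text smaller than c
    first_row, total = {}, 0
    for ch in sorted(set(bwt)):
        first_row[ch] = total
        total += bwt.count(ch)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt[:i]; naive recount for clarity.
        return bwt[:i].count(ch)

    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first_row:
            return []
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

The pattern is consumed from its last character to its first, and each step narrows the interval of suffixes that begin with the part matched so far.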
Recent Advances
- Long-read mapping
- Graph-based references
- Machine learning approaches
- Cloud computing and parallelization
- Real-time mapping
Practical Considerations
- Choice of reference genome
- Aligner selection
- Parameter optimization
- Quality control and validation
- Handling multi-mapping reads
Conclusion
- Essential technique in genomics
- Enables accurate interpretation of sequencing data
- Continuous advancements in tools and algorithms

Question 2

Q

Sequence Mapping and Read Mapping

Answer

A

Sequence Mapping
- Computational process of aligning reads to a reference genome
- Essential for understanding sequencing data
Key Concepts
- Sequence reads
- Reference genome
- Alignment
- Hashing-based methods (MAQ, Novoalign)
- BWT-based methods (BWA, Bowtie, Bowtie2)
- Seed-and-extend algorithms
- Splice-aware mapping (STAR, HISAT2)
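
The hashing and seed-and-extend ideas above can be sketched in a few lines. This toy mapper indexes every k-mer of the reference in a hash table, uses exact k-mer matches as seeds, and extends each seed without gaps while counting mismatches. It is an illustration under simplifying assumptions (no indels, no reverse strand, no quality values), not how any particular aligner is implemented; all names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer to the list of reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int,
             max_mismatches: int = 2):
    """Return [(ref_start, mismatches), ...], best candidates first."""
    hits = {}
    seen = set()
    for offset in range(len(read) - k + 1):
        for seed_pos in index.get(read[offset:offset + k], []):
            start = seed_pos - offset  # implied start of the whole read
            if start in seen:
                continue
            seen.add(start)
            if start < 0 or start + len(read) > len(reference):
                continue
            # Ungapped extension: compare the read to the reference window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items(), key=lambda h: (h[1], h[0]))
```

More than one returned candidate with the same mismatch count corresponds to a multi-mapping read.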
Challenges
- Sequencing errors
- Repetitive regions
- Insertion/deletion (indel) handling
- Large structural variations
- Computational demands
- Genetic diversity
Importance
- Variant calling
- RNA-Seq analysis
- Functional genomics
- Personalized medicine
- Comparative genomics
Practical Considerations
- Choice of reference genome
- Aligner selection
- Parameter optimization
- Quality control
- Handling multi-mapping reads
Conclusion
- Fundamental technique in genomics
- Essential for interpreting sequencing data
- Continuous advancements in tools and algorithms

Question 3

Q

Read Sequence Alignment and Aligners

Answer

A

Read Sequence Alignment
- Fundamental step in bioinformatics
- Aligning reads to a reference genome
- Essential for downstream analysis
Process
- Preprocessing
- Mapping
- Scoring
- Storing results (SAM/BAM)
SAM and BAM Formats
- SAM: Text-based, human-readable
- BAM: Binary, compressed, efficient
- SAM records: 11 mandatory fields plus optional TAG:TYPE:VALUE fields
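
To show what a SAM alignment line contains, the sketch below splits one tab-separated record into the 11 mandatory fields defined by the SAM specification, collects optional TAG:TYPE:VALUE fields, and decodes the bitwise FLAG. The example record and the short flag labels are made up for illustration; in practice a library such as pysam or the samtools command line would handle this.

```python
MANDATORY = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

# Bit values follow the SAM specification; the labels are informal.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse", 0x20: "mate_reverse",
    0x40: "first_in_pair", 0x80: "second_in_pair", 0x100: "secondary",
    0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def parse_sam_line(line: str) -> dict:
    """Parse one alignment line (not a header line starting with '@')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY):
        raise ValueError("a SAM record needs 11 mandatory fields")
    record = {name: (int(value) if name in INT_FIELDS else value)
              for name, value in zip(MANDATORY, cols)}
    tags = {}
    for field in cols[len(MANDATORY):]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    record["tags"] = tags
    return record


def decode_flag(flag: int) -> list:
    """List the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Hypothetical record for illustration only.
example = "read1\t99\tchr1\t100\t60\t10M\t=\t250\t160\tACGTACGTAC\tIIIIIIIIII\tNM:i:0"
record = parse_sam_line(example)
```

Here FLAG 99 is 1 + 2 + 32 + 64, so decode_flag(99) reports a paired read in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.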
Popular Alignment Tools
- BWA (Burrows-Wheeler Aligner)
- Bowtie/Bowtie2
- STAR (Spliced Transcripts Alignment to a Reference)
- HISAT2
- Minimap2
Key Concepts
- Mapping quality
- Gapped vs. ungapped alignment
- Paired-end vs. single-end reads
- Structural variations
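
Gapped alignments are recorded in the SAM CIGAR string, so a short sketch of reading one may help. Per the SAM specification, the operations M, D, N, = and X consume reference bases, while M, I, S, = and X consume read bases. The code assumes a well-formed CIGAR string and uses hypothetical function names.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar: str):
    """Return (reference_span, read_length) implied by a CIGAR string."""
    ref_len = read_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MDN=X":   # operations that advance along the reference
            ref_len += n
        if op in "MIS=X":   # operations that advance along the read
            read_len += n
    return ref_len, read_len


def is_gapped(cigar: str) -> bool:
    """True if the alignment contains an insertion or a deletion."""
    return any(op in "ID" for _, op in CIGAR_OP.findall(cigar))
```

For example, "5M2I3M1D4M" describes a 14-base read that spans 13 reference bases, with one 2-base insertion and one 1-base deletion.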
Challenges
- Sequencing errors
- Repetitive regions
- Insertion/deletion (indel) handling
- Large structural variations
- Computational demands
- Genetic diversity
Applications
- Variant calling
- Gene expression analysis
- Genome assembly
- Comparative genomics
- Clinical diagnostics
Conclusion
- Essential for understanding sequencing data
- Variety of tools and algorithms available
- Continuous advancements in the field

Question 4

Q

Manipulating Alignments in SAM/BAM Files

Answer

A

SAM/BAM Manipulation
- Essential for preparing aligned reads for analysis
Common Operations
- Conversion between SAM and BAM
- Sorting alignments
- Indexing BAM files
- Filtering alignments
- Marking and removing duplicates
- Adding or modifying read groups
- Merging and splitting BAM files
- Indel realignment and base quality score recalibration (BQSR)
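
Three of the operations above (filtering, coordinate sorting, and duplicate marking) can be sketched on in-memory records to show the logic. This is a simplified illustration, not a replacement for SAMtools or Picard: records are plain dictionaries with hypothetical keys, and duplicates are defined only by contig, position and strand, keeping the record with the highest MAPQ, whereas real tools use more careful criteria.

```python
UNMAPPED = 0x4
REVERSE = 0x10
DUPLICATE = 0x400


def filter_alignments(records, min_mapq=20):
    """Drop unmapped reads and reads below a MAPQ threshold."""
    return [r for r in records
            if not r["flag"] & UNMAPPED and r["mapq"] >= min_mapq]


def coordinate_sort(records, contig_order):
    """Sort by contig (in header order), then by leftmost position."""
    rank = {name: i for i, name in enumerate(contig_order)}
    return sorted(records, key=lambda r: (rank[r["rname"]], r["pos"]))


def mark_duplicates(records):
    """Set the duplicate bit on all but the best record per position/strand."""
    def key(r):
        return (r["rname"], r["pos"], bool(r["flag"] & REVERSE))

    best = {}
    for r in records:
        k = key(r)
        if k not in best or r["mapq"] > best[k]["mapq"]:
            best[k] = r
    return [r if best[key(r)] is r else {**r, "flag": r["flag"] | DUPLICATE}
            for r in records]
```

Marking rather than deleting duplicates keeps the file complete, so downstream tools can decide whether to ignore records with the duplicate bit set.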
Tools
- SAMtools
- Picard Tools
- BEDTools
- Sambamba
- GATK
Best Practices
- Start with high-quality alignments
- Sort and index BAM files
- Handle duplicates
- Maintain read groups
- Filter carefully
- Document steps and parameters
- Optimize computational resources
- Validate manipulated files
Advanced Topics
- Graph-based references
- Handling multi-mapping reads
- Integration with workflow managers
Conclusion
- Critical for genomic data analysis
- Mastery of tools and techniques essential
- Adhere to best practices for data integrity and reproducibility
