Genome sequencing analysis Flashcards
Describe
The repeated sequences of the human genome
Almost 60% of the genome is repeated
* Transposons
* Repeat satellites that contract and expand
* Large segmental duplications
Explain
Mapping vs alignment
Mapping quickly finds the most likely location(s) of a read in the reference genome. Alignment then computes the optimal base-level correspondence between the read and the reference at those locations.
What happens after the genome is indexed?
- Exact matches are first found in the reference genome and used as seeds (regions where the read is likely to align)
- Seeds that are co-linear (in the same order in both the read and the reference genome) and clustered (for higher confidence) are kept, and the alignment is extended from them.
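The seed-and-cluster idea above can be sketched in a few lines. This is a toy illustration, not how any particular mapper is implemented: the function names, the k-mer size and the sequences are all made up for the example, seeds are grouped by diagonal (reference position minus read offset) as a simple stand-in for co-linear chaining, and a mismatch count stands in for the real extension step, which normally uses dynamic programming.

```python
from collections import defaultdict

def build_index(reference, k):
    """Index step: map every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def find_seeds(read, index, k):
    """Seeding step: exact k-mer matches as (read_offset, ref_position) pairs."""
    seeds = []
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, ref_pos))
    return seeds

def best_cluster(seeds):
    """Keep the largest group of co-linear seeds.

    Simplifying assumption: seeds on the same diagonal (ref_pos - read_offset)
    are treated as co-linear, which ignores indels inside the read.
    Returns (diagonal, seed_count), or None if there are no seeds.
    """
    diagonals = defaultdict(int)
    for offset, ref_pos in seeds:
        diagonals[ref_pos - offset] += 1
    if not diagonals:
        return None
    return max(diagonals.items(), key=lambda item: item[1])

def count_mismatches(read, reference, ref_start):
    """Stand-in for extension: compare the read base by base, without gaps."""
    window = reference[ref_start:ref_start + len(read)]
    return sum(a != b for a, b in zip(read, window))

# Hypothetical toy data: the read comes from position 5 with one mismatch.
reference = "ACGTTGCATGCCGATAGGCTA"
read = "GCATGCCGTTAGG"
k = 4

index = build_index(reference, k)
seeds = find_seeds(read, index, k)
diagonal, support = best_cluster(seeds)
print(diagonal, support)                            # 5 6
print(count_mismatches(read, reference, diagonal))  # 1
```

One seed in this example, (6, 1), is a chance match elsewhere in the reference. It sits alone on its own diagonal, so the clustering step discards it.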
What are the different types of genomic variants?
- Single-nucleotide polymorphisms
- Insertion-deletion polymorphisms
- Structural variants
What are the genomic variations in a typical human genome?
- 4.5M SNVs
- 650k indels
- 17k structural variants
What is the point of studying genomic variants?
- Looking for pathogenic variants in undiagnosed rare disease patients
- Looking for common variants associated with a complex disease
- Understand human evolution and migration
How are small variants identified?
Small variants are identified as recurrent differences between reads and the reference genome
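A minimal sketch of "recurrent differences" follows, assuming gapless alignments given as (start, sequence) pairs. The thresholds min_depth and min_fraction, the function name and the reads are hypothetical choices for illustration; real callers use statistical models and base and mapping qualities rather than fixed cutoffs.

```python
from collections import Counter, defaultdict

def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.3):
    """Report positions where a non-reference base recurs across reads.

    aligned_reads: list of (start, sequence) gapless alignments (0-based).
    Returns tuples (position, ref_base, alt_base, alt_count, depth).
    """
    # Pile up the bases observed at every reference position.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pileup[start + i][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls

# Hypothetical toy data: three reads support A>T at position 4,
# and one read carries an isolated error at position 6.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT"),
    (2, "GTTCGTAC"),
    (1, "CGTACGTA"),
    (3, "TTCGTAC"),
    (0, "ACGTACCT"),
]
print(call_snvs(reference, reads))  # [(4, 'A', 'T', 3, 5)]
```

The single mismatching base at position 6 is seen in only one of five reads, so it falls below the fraction threshold and is treated as a sequencing error rather than a variant.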
What technique is used to improve indel detection?
Read realignment, performed by tools such as ABRA2
Explain
VCF format
Text-based format that records the position of each variant, its alleles, the call confidence, miscellaneous information and the genotype
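To make those fields concrete, here is a small sketch that splits one VCF data line into its standard tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then FORMAT and one column per sample). The function name, the sample name and the example record are invented for illustration. For real files a dedicated library is a better choice than hand-written parsing.

```python
def parse_vcf_line(line, sample_names):
    """Parse a single VCF data line (not a header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    record = {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based position of the variant
        "id": variant_id,
        "ref": ref,                      # reference allele
        "alts": alt.split(","),          # one or more alternate alleles
        "qual": None if qual == "." else float(qual),  # call confidence
        "filter": filt,
        "info": {},                      # miscellaneous annotations
        "genotypes": {},                 # per-sample fields such as GT
    }

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    for item in info.split(";"):
        if item == ".":
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            record["info"][key] = value
        else:
            record["info"][item] = True

    # FORMAT names the ':'-separated fields found in each sample column.
    if len(fields) > 9:
        keys = fields[8].split(":")
        for name, column in zip(sample_names, fields[9:]):
            record["genotypes"][name] = dict(zip(keys, column.split(":")))
    return record

# Hypothetical record: a heterozygous A>T variant in one sample.
line = "chr1\t12345\t.\tA\tT\t50\tPASS\tDP=30;DB\tGT:DP\t0/1:30"
record = parse_vcf_line(line, ["sample1"])
print(record["pos"], record["ref"], record["alts"], record["genotypes"])
```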
What are the advantages of using longer reads?
- Better structural variant detection
- Enable de novo genome assembly, which helps identify complex variants
Define
Pangenome
Collection of genomes and the genetic variants among them
Define
vg
Variation Graph Toolkit
Open-source program for graph construction, read mapping and variant calling
How do pangenome-based methods fare against traditional approaches?
Pangenome-based approaches detect variants more accurately, especially insertions, whose sequence is absent from the linear reference
Give an example of a deletion
- On the pangenome, the reads are correctly aligned through the deletion
- On the linear reference, many reads are aligned with one end left unaligned (soft-clipped)
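In SAM/BAM alignments, soft-clipping appears in the CIGAR string as the S operation, while a deletion that the aligner resolved appears as D. The sketch below counts soft-clipped bases at each end of a read. The function name and the CIGAR strings are hypothetical examples, not taken from a real dataset.

```python
import re

# CIGAR operations defined by the SAM format: M I D N S H P = X
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (left, right): number of soft-clipped bases at each read end."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them here.
    ops = [(length, op) for length, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

# Read ending inside a deletion on the linear reference: its tail is clipped.
print(soft_clipped_bases("70M30S"))     # (0, 30)
# Read aligned through the deletion: no clipping, deletion recorded as 40D.
print(soft_clipped_bases("60M40D40M"))  # (0, 0)
```

A pile of reads whose soft-clips all start at the same reference coordinate is a common hint that a structural variant breakpoint lies there.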
Describe
Building a human pangenome reference
- Use the latest sequencing technologies to sequence 350 diverse individuals
- Pangenome containing a comprehensive catalog of structural variants