Genome sequencing analysis Flashcards

1
Q

Describe

The repeated sequences of the human genome

A

Almost 60% of the genome is repetitive sequence
* Transposons
* Satellite repeats that expand and contract
* Large segmental duplications

2
Q

Explain

Mapping vs alignment

A

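Mapping quickly finds the best candidate location(s) for a read in the reference genome. Alignment then works out the optimal base-by-base correspondence between the read and the reference at each of those locations.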
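
A minimal sketch of the two stages on a toy example. The function names (`map_read`, `align_at`, `best_placement`) are hypothetical, the alignment step allows substitutions only (no gaps), and real mappers use far more elaborate indexes and scoring.

```python
def map_read(read, reference, k=4):
    """Mapping: use the read's first k-mer to find candidate start positions quickly."""
    seed = read[:k]
    candidates = []
    start = reference.find(seed)
    while start != -1:
        candidates.append(start)
        start = reference.find(seed, start + 1)
    return candidates


def align_at(read, reference, start):
    """Alignment: compare base by base at one candidate (substitutions only, no gaps)."""
    window = reference[start:start + len(read)]
    mismatches = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
    # Bases that run off the end of the reference count as mismatches.
    score = len(mismatches) + (len(read) - len(window))
    return score, mismatches


def best_placement(read, reference, k=4):
    """Map first, then align only at the candidates; return (score, start) or None."""
    scored = [(align_at(read, reference, s)[0], s) for s in map_read(read, reference, k)]
    return min(scored) if scored else None
```

The cheap lookup narrows the search to a handful of positions, so the more expensive base-level comparison runs only where it is likely to succeed.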

3
Q

What happens after the genome is indexed?

A
  1. Exact matches between the read and the reference genome are found first and used as seeds (regions where the read is likely to align).
  2. Seeds that are co-linear (in the same order in both the read and the reference genome) and clustered (for higher confidence) are kept, and the alignment is extended from them.
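
The two steps above can be sketched as follows. This is an illustration under simplifying assumptions, not how any particular mapper works: the index is a plain k-mer dictionary, and "co-linear and clustered" is approximated by grouping seeds that share the same diagonal (reference position minus read position). All function names are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Index every k-mer of the reference by its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(read, index, k):
    """Step 1: exact k-mer matches, returned as (read_pos, ref_pos) pairs."""
    seeds = []
    for read_pos in range(len(read) - k + 1):
        for ref_pos in index.get(read[read_pos:read_pos + k], []):
            seeds.append((read_pos, ref_pos))
    return seeds


def best_cluster(seeds):
    """Step 2: keep the largest group of seeds lying on one diagonal.

    Seeds on the same diagonal (ref_pos - read_pos) occur in the same order
    in the read and the reference, so they are co-linear by construction.
    A gapped alignment would shift the diagonal; that case is ignored here.
    """
    clusters = defaultdict(list)
    for read_pos, ref_pos in seeds:
        clusters[ref_pos - read_pos].append((read_pos, ref_pos))
    if not clusters:
        return None, []
    diagonal = max(clusters, key=lambda d: len(clusters[d]))
    return diagonal, sorted(clusters[diagonal])
```

The returned diagonal is the implied start of the read on the reference, which is where the extension step would begin.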
4
Q

What are the different types of genomic variants?

A
  • Single-nucleotide polymorphisms
  • Insertion-deletion polymorphisms
  • Structural variants
5
Q

What are the genomic variations in a typical human genome?

A
  • 4.5M SNVs
  • 650k indels
  • 17k structural variants
6
Q

What is the point of studying genomic variants?

A
  • Looking for pathogenic variants in undiagnosed rare disease patients
  • Looking for common variants associated with a complex disease
  • Understanding human evolution and migration
7
Q

How are small variants identified?

A

Small variants are identified as recurrent differences between reads and the reference genome
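
A toy illustration of "recurrent differences": pile the aligned reads up over the reference and report positions where a non-reference base is seen repeatedly. The function name and the thresholds `min_depth` and `min_fraction` are assumptions chosen for the example; real variant callers use statistical models and also handle indels and base qualities.

```python
from collections import Counter


def call_small_variants(reference, aligned_reads, min_depth=3, min_fraction=0.3):
    """Report positions where a non-reference base recurs across reads.

    aligned_reads: list of (start, sequence) pairs, gap-free alignments only.
    Returns tuples of (position, ref_base, alt_base, alt_count, depth).
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < len(reference):
                pileup[start + offset][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust any difference
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                calls.append((pos, reference[pos], base, n, depth))
    return calls
```

A difference seen in a single read is more likely a sequencing error than a variant, which is why the sketch requires both a minimum depth and a minimum supporting fraction.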

8
Q

What technique is used to improve indel detection?

A

Read realignment, performed by tools such as ABRA2

9
Q

Explain

VCF format

A

Text-based format that records the position of a variant, its alleles, confidence, miscellaneous information, and genotype
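
To make the fields concrete, here is a small sketch that splits one VCF data line into its standard tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then FORMAT and one column per sample). The function name is hypothetical and the record in the usage note is made up; for real files a dedicated library is the safer choice.

```python
def parse_vcf_record(line, sample_names):
    """Split one tab-separated VCF data line into its standard columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO holds the miscellaneous annotations as key=value pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value if value else True

    # FORMAT names the per-sample sub-fields (GT is the genotype).
    genotypes = {}
    if len(fields) > 9:
        keys = fields[8].split(":")
        for name, sample in zip(sample_names, fields[9:]):
            genotypes[name] = dict(zip(keys, sample.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
        "genotypes": genotypes,
    }
```

For example, the invented line `"chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\tGT:DP\t0/1:30"` with `sample_names=["sample1"]` gives position 12345, reference allele A, alternate allele G, and a heterozygous genotype `0/1`.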

10
Q

What are the advantages of using longer reads?

A
  • Better structural variant detection
  • Enable de novo gene assembly, to better identify complex variants
11
Q

Define

Pangenome

A

Collection of genomes and the genetic variants among them
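
One common way to picture a pangenome is as a sequence graph: nodes carry sequence, and each genome is a path through the nodes. The sketch below is only an illustrative data structure with made-up sequences and hypothetical names, not the format used by any specific tool.

```python
# Nodes carry sequence; each genome (haplotype) is a path of node ids.
nodes = {1: "ACGT", 2: "TTGA", 3: "CCAG"}
paths = {
    "reference": [1, 2, 3],
    "haplotype_with_deletion": [1, 3],  # skips node 2, i.e. a deletion
}


def spell_path(nodes, path):
    """Reconstruct the linear sequence of one genome from its path."""
    return "".join(nodes[node_id] for node_id in path)


def graph_edges(paths):
    """Collect the edges implied by consecutive nodes in every path."""
    edges = set()
    for path in paths.values():
        edges.update(zip(path, path[1:]))
    return edges
```

Here the variant between the two genomes is visible in the graph itself: the edge from node 1 to node 3 exists only because one path skips node 2.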

12
Q

Define

vg

A

Variation Graph Toolkit
Open-source program for graph construction, read mapping and variant calling

13
Q

How do pangenome-based methods fare against traditional approaches?

A

Pangenome-based approaches generally outperform approaches based on a linear reference, especially for insertions

14
Q

Give an example of a deletion

A
  • On the pangenome, the reads are correctly aligned through the deletion
  • On the linear reference, many reads are aligned with one end left unaligned (soft-clipped)
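
Soft-clipping shows up in an alignment's CIGAR string as an `S` operation at either end. A small sketch for measuring it, assuming SAM-style CIGAR strings; the function names and the `min_clip` threshold are hypothetical choices for the example.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR string."""
    # Hard clips (H) sit outside soft clips, so drop them before looking at the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def count_clipped_reads(cigars, min_clip=10):
    """Count reads with at least min_clip soft-clipped bases at either end."""
    return sum(1 for c in cigars if max(soft_clips(c)) >= min_clip)
```

Many reads clipped at the same reference position are a hint that the linear reference is missing or contains sequence that the sample does not share, as in the deletion example above.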
15
Q

Describe

Building a human pangenome reference

A
  • Sequence 350 diverse individuals with the latest sequencing technologies
  • Build a pangenome containing a comprehensive catalog of structural variants
16
Q

What are the six types of structural variants?

A
  • Tandem duplications
  • Insertions/deletions
  • Short tandem repeat contraction/expansion
  • Inversions
  • Mobile element insertions
  • Translocations