Human Genome Sequencing Flashcards

Question 1

Q

Shotgun Sequencing

Answer

A

DNA is randomly fragmented and each fragment is sequenced individually; the short, overlapping reads are then reassembled to reconstitute the genome
success depends on coverage
first genome sequenced this way: Haemophilus influenzae (1995)
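
Why coverage matters can be illustrated with a small calculation. The sketch below assumes the standard Lander-Waterman (Poisson) model, in which fold coverage is c = N x L / G and the expected fraction of bases hit by no read is e^(-c). The genome size, read length and read counts are illustrative values, not figures from these notes.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases covered by no read."""
    return math.exp(-coverage)


# Illustrative numbers only (assumed): a 1.8 Mb genome and 500 bp reads.
genome_size = 1_800_000
read_length = 500
for num_reads in (3_600, 18_000, 36_000):
    c = fold_coverage(num_reads, read_length, genome_size)
    print(f"{num_reads} reads -> {c:.0f}x coverage, "
          f"expected unsequenced fraction {expected_uncovered_fraction(c):.4%}")
```

Under this model, going from 1x to 10x coverage shrinks the expected unsequenced fraction from about a third of the genome to a few thousandths of a percent, which is why shotgun projects aim for high fold coverage.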

Celera Genomics 1998 (3 years)
Drosophila genome (~180 Mb)
- 120 Mb of euchromatic genome
- many errors; assembly supported by a BAC map

Human genome (Feb 2001) - 2,696 Mb
- used IHGSC data
- Chr16 smaller
- under-represents repeats
- not good for long alignments (1-4 kb)
- effectiveness depends on the number, type and size of repeats
- difficult to assemble highly similar repeats (of recent origin)
- oversimplifies duplicated regions
- cost: ~$300 million

Question 2

Q

Hierarchical Shotgun (BAC library)

Answer

A

The genome is decomposed into overlapping BAC clones; each clone is shotgun sequenced and reassembled, then merged with the sequences of adjacent clones (clone contig map)
IHGSC (1990)
Human BAC libraries
- 20x coverage
- 2,865 Mb
- RPCI-11 male library (543,797 clones, 32.2x)
cost: ~$3 billion
BAC fingerprinting: digest each BAC with 1 or 2 restriction enzymes, separate the fragments by electrophoresis, and identify BACs that share bands (complete or partial overlap)
clone end sequencing (map as you go): sequence the ends of all BAC clones and one BAC entirely; query the database of end sequences using the finished BAC sequence as a seed; identify overlapping BACs and sequence them next
build 34
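
The BAC fingerprinting step above can be sketched in code. This is a minimal illustration, not the method used by the IHGSC: the function names, the band-size tolerance and the 50% threshold are all assumptions, and the fragment sizes are made up.

```python
def shared_bands(bands_a, bands_b, tolerance=5):
    """Count bands (fragment sizes in bp) in clone A that match a band in clone B.

    Two bands match if their sizes differ by at most `tolerance` bp (assumed value).
    """
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band may be matched only once
                break
    return shared


def likely_overlap(bands_a, bands_b, min_fraction=0.5, tolerance=5):
    """Call two clones overlapping if enough of the smaller fingerprint is shared."""
    smaller = min(len(bands_a), len(bands_b))
    if smaller == 0:
        return False
    return shared_bands(bands_a, bands_b, tolerance) / smaller >= min_fraction


# Hypothetical restriction fingerprints (fragment sizes in bp).
bac1 = [1200, 2300, 3100, 4500, 5200, 7800]
bac2 = [2302, 3098, 4500, 6100, 9000]
bac3 = [800, 1500, 2750, 6600]

print(likely_overlap(bac1, bac2))  # True: 3 of 5 bands shared
print(likely_overlap(bac1, bac3))  # False: no bands shared
```

Clones that share many bands are placed next to each other in the contig map; clones sharing none are assumed not to overlap.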

Question 3

Q

Assembly algorithm: approach, problem and solution

Answer

A

overlap: compare all sequence reads pairwise and find overlaps (graph)
layout: determine the shortest path through the graph
consensus: where overlapping reads differ in sequence, use the consensus base
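
The three steps above can be sketched on toy reads. This is a simplified illustration: the layout step here uses greedy merging of the best-overlapping pair as a stand-in for finding the shortest path through the overlap graph, overlaps must match exactly, and the consensus function assumes the reads are already aligned column by column. All names and sequences are hypothetical.

```python
from collections import Counter
from itertools import permutations


def overlap_length(a, b, min_len=3):
    """Overlap step: longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_layout(reads, min_len=3):
    """Layout step (greedy simplification): repeatedly merge the pair with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_a, best_b = 0, None, None
        for a, b in permutations(range(len(reads)), 2):
            n = overlap_length(reads[a], reads[b], min_len)
            if n > best_n:
                best_n, best_a, best_b = n, a, b
        if best_n == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_a] + reads[best_b][best_n:]
        reads = [r for i, r in enumerate(reads) if i not in (best_a, best_b)] + [merged]
    return reads


def consensus(aligned_reads):
    """Consensus step: majority base at each column of equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_reads))


# Toy reads sampled from the hypothetical sequence ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "CGTACGTT", "ACGTTAGC"]
print(greedy_layout(reads))  # ['ATGCGTACGTTAGC']

# Three reads covering the same region, one with a sequencing error at position 5.
print(consensus(["ACGTTAGC", "ACGTAAGC", "ACGTTAGC"]))  # ACGTTAGC
```

Real assemblers allow mismatches in overlaps and use more careful graph algorithms; the greedy merge is chosen here only because it fits in a few lines.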

Problem - repeats cause many identical overlaps (misassembly, gaps, missing data)
- tandem repeat: collapsed so that only one copy of the repeat remains
- 2 genome-wide repeats with unique sequence in the middle: the sequence between them is lost

Solution: sequence both ends of large inserts (2, 10, 50 kb) and use the paired ends to check that nothing has been lost (known distance between ends, corresponding ends found together)
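
The paired-end check can be sketched as a distance test: if the two ends of an insert of known size land much closer together in the assembly than expected, sequence between them may have been lost. The function name, the 20% tolerance and the example coordinates below are assumptions for illustration only.

```python
def check_mate_pairs(pairs, tolerance=0.2):
    """Compare observed and expected distances between paired insert ends.

    pairs: iterable of (name, left_pos, right_pos, expected_insert_size), all in bp,
           with both ends placed on the same contig.
    Returns a list of (name, observed_distance, verdict).
    """
    report = []
    for name, left, right, expected in pairs:
        observed = abs(right - left)
        if observed < expected * (1 - tolerance):
            verdict = "too_close"   # sequence may have been lost, e.g. a collapsed repeat
        elif observed > expected * (1 + tolerance):
            verdict = "too_far"     # possible misjoin or spurious insertion
        else:
            verdict = "consistent"
        report.append((name, observed, verdict))
    return report


# Hypothetical placements of paired ends from 2 kb, 10 kb and 50 kb inserts.
pairs = [
    ("insert_2kb", 1_000, 3_050, 2_000),
    ("insert_10kb", 5_000, 9_000, 10_000),
    ("insert_50kb", 20_000, 85_000, 50_000),
]
for name, observed, verdict in check_mate_pairs(pairs):
    print(name, observed, verdict)
```

Here the 10 kb insert spans only 4 kb in the assembly, which is the signature expected when a repeat has been collapsed between its two ends.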

Question 4

Q

2001 assemblies and 3 problems

Answer

A

both were missing ~10% of the euchromatic genome and ~30% of the overall genome
many gaps, especially in the Celera assembly
misassembled regions
some apparent pseudogenes were actually sequencing errors

1- HEXA (chr 15): misassembly - exons 6, 7, 8 also present on chr 3
2- IL12RB2: inversion of exons 9-15 and duplication of exon 15
3- ITGB3: pseudogene - frameshift, missing terminal exon 15

Question 5

Q

IHGSC revision (build 35)

Answer

A

Oct 2004
2.85 billion nt
gaps: 341
~99% euchromatic
error rate: ~1 event per 100,000 bases
protein-coding genes: ~20,000 (lower than expected)

Question 6

Q

nowadays

Answer

A

problem: gaps in coverage, low fold coverage, unstable/unclonable sequences
solution: higher fold coverage of those regions, specialised cloning strategies
Human Genome (Mar 2019)
- 3,272,116,950 bp
- gaps: 349
- accuracy: <1 error per 10,000 bp
- anomalies: PAR of Y is a copy of the X PAR; centromeres are modelled sequences, not gaps

Apr 2022 - T2T (telomere-to-telomere) assembly: most complete to date and gap-free, derived from a fertilised egg (CHM13 hydatidiform mole cell line), with good representation of repetitive sequences

1000 Genomes Project (2008-2015): ~1000 individuals from around the world, sequenced with 454/Illumina

100,000 Genomes Project - UK (2019): 62,000 genomic analyses of NHS patients and their families; 1 in 5 rare-disease patients received a diagnosis; ~50% of cancer cases yielded additional data