Genome assembly Flashcards
Why is genome assembly necessary?
no seq technology can produce reads long enough to cover a whole chromosome, so the genome has to be rebuilt from many shorter reads
it is the rate limiting step in genomics
De Novo vs Re-Sequencing
De novo: determination of a full-genome sequence, NO known reference sequence. Needs a lot of sequence coverage and computing power.
Re-sequencing: a reference genome sequence is known. The assembly process is replaced by mapping the raw sequence reads onto the reference genome. Less sequence coverage and computing power needed.
what is resequencing good and bad at detecting?
Good - SNPs
Bad - limited in the detection of structural rearrangements (insertions, deletions, inversions).
How to calculate coverage
total bases sequenced / bases in the genome (i.e. number of reads x read length / genome size)
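A minimal sketch of the coverage ratio, written as reads x read length / genome size. The function name and the run figures are made up for illustration.

```python
def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / bases in the genome."""
    return num_reads * read_length / genome_size


# Hypothetical run: 1,000,000 reads of 250 bp over a 5 Mb genome
print(coverage(1_000_000, 250, 5_000_000))  # 50.0
```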
How do assembly algorithms work and what are 2 difficulties?
search for overlaps between sequence reads
- data volume requires high comp power
- repeat regions which are larger than the read
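The overlap search can be sketched as a suffix/prefix comparison between two reads. This is a toy version that assumes error-free reads and a hypothetical minimum overlap of 3 bases. Real assemblers use indexed data structures because comparing all read pairs this way is too slow.

```python
from typing import Optional


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3) -> Optional[str]:
    """Join a and b on their overlap, or return None if they do not overlap enough."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


# Toy reads, invented for illustration
print(overlap("ATGCGTAC", "GTACCTGA"))  # 4
print(merge("ATGCGTAC", "GTACCTGA"))    # ATGCGTACCTGA
```

If the shared "GTAC" were part of a repeat longer than the read, the same overlap would match several places in the genome, which is the second difficulty listed above.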
relationship btw seq coverage and probability of detection
sigmoidal (S curve)
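One way to see where an S curve can come from, assuming read depth at a site follows a Poisson distribution (a common simplification, not something the card states): the chance that a site is seen in at least k reads stays near zero at low coverage, rises steeply, then flattens near 1. The threshold k = 4 below is an arbitrary illustrative choice.

```python
import math


def prob_at_least(k: int, mean_coverage: float) -> float:
    """P(site is covered by >= k reads), assuming Poisson-distributed depth."""
    p_fewer = sum(
        math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Print the curve for a few mean coverage values (k = 4 is an assumption)
for c in (1, 2, 5, 10, 20):
    print(c, round(prob_at_least(4, c), 3))
```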
what is a contig?
partial assembly of data from overlapping fragments into a contiguous region of sequences. The order of the contigs is NOT known.
why are repetitive regions problematic in assembly
regions of repetitive seqs can mean that contigs cannot join up.
describe paired end sequencing
- create library from sample DNA.
- isolate fragments which are about 800bp long.
- sequence 250bp from each end of the fragments using Illumina.
- now, the sequence of both ends is known and we know the ends are about 800 bp apart. in-between is unknown.
- can map fragments to genome.
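The arithmetic behind the unknown middle of a fragment, as a small sketch using the figures from the steps above (800 bp fragment, 250 bp read from each end). The function name is hypothetical.

```python
def unsequenced_gap(fragment_len: int, read_len: int) -> int:
    """Bases in the middle of a fragment covered by neither end read.

    A negative value would mean the two reads overlap each other.
    """
    return fragment_len - 2 * read_len


print(unsequenced_gap(800, 250))  # 300
```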
when is paired end seq particularly useful?
sequencing of fragments that contain short repeat regions, because paired end reads have relatively small inserts.
what is the difference btw paired end fragments and mate pair fragments?
mate pair frags have larger inserts (3kb-15kb), paired end has about 800bp.
Mate pair enables coverage of regions with large structural rearrangements.
example of 2 long read seq techs
PacBio and Oxford Nanopore
how can long read seq techs be used
can make v long reads but high error rates
so, make initial assembly with long read, then additional illumina short read seq for error correction.
New genome assemblers directly incorporate PacBio and Illumina.
What is Bionano
optical mapping system
How does optical mapping work?
- non-destructive restriction map (no fragmenting)
- single-stranded nick is inserted into double-stranded DNA at positions of a seven-base recognition site (uses a 7bp nicking enzyme). cuts on average once every 4^7 (16,384) bases.
- nick is ligated by insertion of a fluorescent marker, dispersing visible signals along the genome
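A rough sketch of what an optical map records: the positions of a 7 bp recognition site along a molecule and the distances between neighbouring labels. The motif and the toy sequence are illustrative assumptions, and only the forward strand is scanned.

```python
from typing import List


def label_positions(seq: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of site in seq (forward strand only)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


SITE = "GCTCTTC"  # illustrative 7 bp motif (assumption)

# Expected spacing for a 7 bp site in random sequence with equal base frequencies
print(4 ** len(SITE))  # 16384

toy = "AAGCTCTTCAAAAAGCTCTTCTT"  # invented sequence
pos = label_positions(toy, SITE)
print(pos)                                      # [2, 14]
print([b - a for a, b in zip(pos, pos[1:])])    # [12]
```

The list of label-to-label distances is the pattern that gets compared with a reference or with contigs.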
How is optical mapping used?
By visualizing large-scale segments of the genome, and comparing with a reference, it is possible to detect translocations, repeats and deletions.
In conjunction with sequencing of fragments it is possible to apply the map to assemble a complete genome sequence.
Bionano Irys system
used to check contig placement.
can detect large structural misassemblies
what is a scaffold?
ordered stretch of contigs
contains sequence gaps NNNNN
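A scaffold can be pictured as ordered contigs joined by runs of N. A minimal sketch, assuming the contig order and estimated gap sizes are already known. The contigs and gap lengths are invented.

```python
from typing import List


def build_scaffold(contigs: List[str], gaps: List[int]) -> str:
    """Join ordered contigs, filling each estimated gap with N characters."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append("N" * gap)
        parts.append(contig)
    return "".join(parts)


print(build_scaffold(["ACGT", "TTGA", "CCA"], [5, 3]))  # ACGTNNNNNTTGANNNCCA
```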
What approaches are used to build scaffolds?
using approaches such as optical maps
chromosome contact maps
10X genomics
which machine does 10X genomics?
10X genomics chromium
How does 10x genomics work?
- partitions large DNA molecules (50-100 kb) into small droplets. Each droplet is assigned a unique barcode.
- each molecule is sequenced using Illumina and each read can be confidently assigned to a large DNA molecule via its barcode.
- works with tiny amount of DNA 1ng, so good for single cell analysis.
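The barcode idea can be sketched as grouping short reads by the barcode they carry, so that reads from the same droplet end up together. The barcodes and reads are made up, and the sketch assumes the barcode has already been split from each read.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_barcode(reads: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect read sequences under the droplet barcode they were tagged with."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return dict(groups)


# Hypothetical (barcode, read) pairs
reads = [("BC01", "ACGT"), ("BC02", "TTGA"), ("BC01", "GGCA")]
print(group_by_barcode(reads))  # {'BC01': ['ACGT', 'GGCA'], 'BC02': ['TTGA']}
```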
2 approaches to whole genome seq projects
- Hierarchical approach - whole genome fragmented and cloned into BACs. order of fragments established before seq
- whole genome shotgun method - large numbers of smaller fragments. harder assembly
how much DNA can a BAC carry
100-200kbp
can store foreign DNA 10x larger than normal plasmids
BAC to BAC
- DNA is cut into fragments of about 150 kb. Fragments are cloned into BACs.
- enough clones are made to get good coverage of the genome.
- BAC clones are ordered by fingerprinting, e.g. based on overlapping restriction fragment size patterns.
- individual BAC clones are sequenced, sequences of each BAC clone assembled independently.
- since order is known, can infer order of sequence.
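The fingerprinting step can be sketched by counting restriction fragment sizes shared between clones: clones that share many sizes probably overlap. The clone names and fragment sizes are invented, and exact size matching is a simplification (real gel measurements need a tolerance).

```python
from itertools import combinations
from typing import Set


def shared_fragments(fp_a: Set[int], fp_b: Set[int]) -> int:
    """Number of restriction fragment sizes two clone fingerprints have in common."""
    return len(fp_a & fp_b)


# Hypothetical fingerprints: fragment sizes in bp for each BAC clone
clones = {
    "BAC_A": {1200, 3400, 5100, 800},
    "BAC_B": {5100, 800, 2300, 950},
    "BAC_C": {2300, 950, 4100, 670},
}

for a, b in combinations(sorted(clones), 2):
    print(a, b, shared_fragments(clones[a], clones[b]))
# BAC_A BAC_B 2
# BAC_A BAC_C 0
# BAC_B BAC_C 2
```

Here A overlaps B and B overlaps C, but A and C share nothing, which suggests the order A, B, C.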