Genome assembly Flashcards
Why is genome assembly necessary?
no seq technology can read a whole chromosome in one go; reads are much shorter than the genome
it is the rate limiting step in genomics
De Novo vs Re-Sequencing
De novo: determination of a full-genome sequence, NO known reference sequence. Needs a lot of sequence coverage and computing power.
Re-sequencing: a reference genome sequence is known. The assembly process is replaced by mapping the raw sequence reads onto the reference genome. Less sequence coverage and computing power needed.
what is resequencing good and bad at detecting?
Good - SNPs
Bad - limited in the detection of structural rearrangements (insertions, deletions, inversions).
How to calculate coverage
total bases sequenced / bases in the genome (i.e. number of reads x read length / genome length)
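A minimal sketch of the coverage calculation, assuming the usual definition (total sequenced bases divided by genome length). The function name and the example numbers are illustrative and not from the cards.

```python
def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average depth of coverage: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Illustrative numbers: 2 million reads of 250 bp over a 5 Mb genome.
print(coverage(2_000_000, 250, 5_000_000))  # 100.0
```

This is an average: individual positions will be covered more or less deeply than this figure.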
How do assembly algorithms work and what are 2 difficulties?
search for overlaps between sequence reads
- data volume requires high comp power
- repeat regions which are larger than the read
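The overlap search can be sketched as a toy suffix-prefix comparison. This is a naive illustration under simplifying assumptions (error-free reads, same strand, exact matches), not how any particular assembler is implemented; the function names and reads are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the best one.
    for k in range(max_len, max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


reads = ["ATGGCGT", "GCGTACA", "ACATTGC"]
contig = reads[0]
for read in reads[1:]:
    contig = merge(contig, read)
print(contig)  # ATGGCGTACATTGC
```

Comparing every read against every other read scales roughly with the square of the number of reads, which is where the computing cost comes from. If a repeat is longer than a read, several different reads give equally good overlaps and the correct join cannot be chosen.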
relationship btw seq coverage and probability of detection
sigmoidal (S curve)
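One way to see where an S-shaped curve can come from, assuming read depth at a position follows a Poisson distribution with mean equal to the coverage, and that detection needs at least k reads over the position. Both the model and the threshold k = 10 are assumptions for illustration, not something the cards state.

```python
import math


def prob_at_least(k: int, mean_coverage: float) -> float:
    """P(depth >= k) when depth ~ Poisson(mean_coverage)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


# Probability of seeing a position at least 10 times, for increasing coverage.
for c in (1, 5, 10, 20, 30):
    print(c, round(prob_at_least(10, c), 3))
```

Under this model the probability stays near 0 at low coverage, rises steeply around coverage close to k, and flattens out near 1 at high coverage.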
what is a contig?
partial assembly of data from overlapping fragments into a contiguous region of sequences. The order of the contigs is NOT known.
why are repetitive regions problematic in assembly
regions of repetitive seqs can mean that contigs cannot be joined up.
describe paired end sequencing
- create library from sample DNA.
- isolate fragments which are about 800bp long.
- sequence 250bp from each end of the fragments using illumina.
- now, the sequence of both ends is known and we know the ends are about 800 bp apart. in-between is unknown.
- can map fragments to genome.
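The steps above can be sketched as a small simulation, assuming an error-free 800 bp fragment and 250 bp reads as in the card. The second read is taken as the reverse complement of the far end, because it is read from the opposite strand. The random fragment and function names are illustrative.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_length: int = 250):
    """Return (read1, read2, unsequenced_bases) for one fragment."""
    if len(fragment) < read_length:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_length]
    # Read 2 starts at the other end and runs inward on the opposite strand.
    read2 = reverse_complement(fragment[-read_length:])
    unsequenced = max(len(fragment) - 2 * read_length, 0)
    return read1, read2, unsequenced


random.seed(0)
fragment = "".join(random.choice("ACGT") for _ in range(800))
r1, r2, gap = paired_end_reads(fragment)
print(len(r1), len(r2), gap)  # 250 250 300
```

The 300 unsequenced bases in the middle are the "in-between" part: their sequence is unknown, but the approximate distance between the two reads is known and constrains where the pair can map.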
when is paired end seq particularly useful?
sequencing of fragments that contain short repeat regions, because paired end reads have relatively small inserts.
what is the difference btw paired end fragments and mate pair fragments?
mate pair frags have larger inserts (3kb-15kb), paired end has about 800bp.
Mate pair enables coverage of regions with large structural rearrangements.
example of 2 long read seq techs
PacBio and Oxford Nanopore
how can long read seq techs be used
can make very long reads but with high error rates
so, make initial assembly with long read, then additional illumina short read seq for error correction.
New genome assemblers directly combine PacBio and Illumina data.
What is Bionano
optical mapping system
How does optical mapping work?
- non-destructive restriction map (no fragmenting)
- single-stranded nick is introduced into double-stranded DNA at positions of a seven-base recognition site. (uses 7bp cutter enzyme) cuts on average every 4^7 (16,384) bases.
- nick is ligated after insertion of a fluorescent marker, giving a pattern of visible signals along the genome
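A toy sketch of the idea: find every occurrence of a 7 bp recognition site in a sequence and report the distances between neighbouring labels, which is the pattern an optical map records. The motif below is only an illustrative 7-base site, the toy sequence is made up, and only one strand is scanned.

```python
import re

MOTIF = "GCTCTTC"  # illustrative 7-base recognition site (assumption)


def label_positions(sequence: str, motif: str = MOTIF) -> list:
    """0-based start positions of every motif occurrence (overlaps included)."""
    pattern = f"(?={re.escape(motif)})"  # lookahead so overlapping hits count
    return [m.start() for m in re.finditer(pattern, sequence)]


def label_spacings(positions: list) -> list:
    """Distances between consecutive labels."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Expected average spacing for a 7-base site in random sequence
# with equal base frequencies.
print(4 ** 7)  # 16384

toy = "AA" + MOTIF + "AAAAA" + MOTIF + "AA" + MOTIF
positions = label_positions(toy)
print(positions)                  # [2, 14, 23]
print(label_spacings(positions))  # [12, 9]
```

Matching the observed spacing pattern against the spacings predicted from contigs is what lets an optical map order and orient them.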