Week 2.4 Genomic Technologies Flashcards
How does ‘Illumina’ sequencing work?
- Randomly fragment genomic DNA and ligate known adaptors to both ends of the fragments
- Bind single-stranded fragments randomly to the inside surface of the flow cell channels
- Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification
- The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate
- The first sequencing cycle begins by adding four labeled reversible terminators, primers and DNA polymerase
- After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified
- The sequencing cycles are repeated to determine the sequence of bases in a fragment one base at a time
- The data are aligned and compared to a reference, and sequence differences are identified
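The last step above (align the reads, compare to the reference, report differences) can be sketched in a few lines. This is only a toy under stated assumptions: the function name `best_alignment` and both sequences are made up for illustration, there are no gaps, and every offset is tried by brute force, which real aligners do not do.

```python
def best_alignment(read, reference):
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches.

    Each mismatch is (reference position, reference base, read base).
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        # Keep the first placement that has strictly fewer mismatches.
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


# Hypothetical reference and read, invented for this example.
reference = "ACGTTGCATGCCGATAGCTA"
read = "GCATGACGAT"
offset, differences = best_alignment(read, reference)
print(offset, differences)
```

The read sits at the offset with the fewest disagreements, and whatever still disagrees there is reported as a sequence difference.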
Illumina:
Paired-ends and Mate pairs
How long are paired ends?
How long are mate pairs?
Paired-ends and Mate pairs Illumina:
Paired ends: normally have a short sequence separating the two reads
Mate-pairs: have longer gaps
Paired ends: short gaps < 1kbp
Mate pairs: long gaps > 1kbp
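A small sketch of the 1 kbp rule above. The function names and coordinates are hypothetical, and treating a gap of exactly 1 kbp as a mate pair is an assumption, since the notes only give "< 1kbp" and "> 1kbp".

```python
def gap_between_reads(read1_end, read2_start):
    """Unsequenced gap (bp) between the end of read 1 and the start of read 2."""
    return read2_start - read1_end


def classify_pair(gap_bp, cutoff_bp=1000):
    """Label a read pair using the 1 kbp cut-off from the notes."""
    return "paired end" if gap_bp < cutoff_bp else "mate pair"


# Made-up mapped coordinates for two read pairs.
print(classify_pair(gap_between_reads(100, 400)))
print(classify_pair(gap_between_reads(100, 5100)))
```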
Most sequencing is done using Illumina sequencing
3rd generation: new technologies coming in
How do the following sequence:
PacBio
Ion Torrent
Oxford nanopore
- **PacBio (Pacific Biosciences)** sequences a single molecule, and you watch each individual nucleotide as it is incorporated. This avoids the need to amplify identical fragments of DNA and can achieve long read lengths.
- **Ion Torrent** uses pH changes to detect nucleotide incorporations
- **Oxford Nanopore** runs a single molecule through a nanopore
PacBio (Pacific Biosciences)
SMRT –> Single molecule real time
How does it work?
How long can the read be?
What is the problem with it? how could this be overcome?
PacBio (Pacific Biosciences) SMRT: single molecule real time.
DNA polymerase duplicates DNA, and SMRT exploits the polymerase by using phospholinked nucleotides: a fluorescent label is attached to the terminal phosphate. When a base is incorporated the enzyme cleaves the label away, leaving a normal DNA strand. Every time a base is added you get a flash from a single nucleotide, so the reactions need a very special well and a very powerful microscope.
Reads can be very long, up to 20,000 bases. It is brilliant in terms of read length.
Problem: it is only 70% accurate, so you get many mistakes. The mistakes appear to be random, so as long as you repeat the sequencing many times you can remove them. You could either do high coverage (expensive) or use Illumina as well as PacBio to identify the mistakes.
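Why repetition removes random mistakes can be shown with a small simulation. This is a sketch under simplifying assumptions: errors are modelled as random substitutions only (no insertions or deletions), every read covers the whole template, and the names `noisy_read` and `consensus` are made up here. The 70% figure is taken from the notes.

```python
import random
from collections import Counter


def noisy_read(template, accuracy, rng):
    """Copy the template, swapping each base for a different one with probability 1 - accuracy."""
    out = []
    for base in template:
        if rng.random() < accuracy:
            out.append(base)
        else:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)


def consensus(reads):
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


rng = random.Random(0)  # fixed seed so the run is repeatable
template = "".join(rng.choice("ACGT") for _ in range(200))
reads = [noisy_read(template, 0.70, rng) for _ in range(30)]

single_read_errors = sum(a != b for a, b in zip(reads[0], template))
errors = sum(a != b for a, b in zip(consensus(reads), template))
print("errors in one read:", single_read_errors)
print("errors in consensus of 30 reads:", errors)
```

Because the wrong bases are spread across three alternatives while the right base keeps getting the same vote, the majority call is far more reliable than any single read.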
Ion Torrent
Has not proved to be very good – skipped
Oxford Nanopore
How does it work?
What are they trying to do now?
Everything revolves around a nanopore: a specially designed protein that allows only a single strand of DNA to pass through it at any one time.
The pore sits in a man-made membrane that is resistant to electricity. A charge is applied across the membrane so that ions move through the pore; if anything else is in the pore it impedes the flow, and you can measure that with sensitive detectors.
They run a strand of DNA through the pore. According to the 5 bases, the flow of ions is impeded in a characteristic way, so they can read off what is going through the pore. A lot of effort is being put into slowing down the passage of DNA through the pores.
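The idea of "reading off" bases from a characteristic drop in ion flow can be sketched as a nearest-level lookup. Everything numeric here is invented for illustration: the current levels in `LEVELS` are hypothetical, not real device values, and the toy assumes one measurement per base, whereas in a real pore several bases shape the signal at once.

```python
# Hypothetical current levels (arbitrary units), invented for illustration only.
LEVELS = {"A": 50.0, "C": 40.0, "G": 60.0, "T": 30.0}


def call_base(current):
    """Pick the base whose characteristic level is closest to the measured current."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - current))


def call_sequence(trace):
    """Call one base per measurement in the trace."""
    return "".join(call_base(current) for current in trace)


# A made-up noisy trace, one measurement per base.
trace = [51.2, 38.9, 61.0, 29.5, 30.8]
print(call_sequence(trace))
```

It also hints at why slowing the DNA down matters: the slower the strand moves, the more measurements there are per base to average over.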
Nanopore DNA sequencing
https://nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing
MinION
https://nanoporetech.com/news/movies#movie-28-minion as big as a DNA
What are the costs involved?
What decisions need to be made?
Decisions based on:
Read length
Accuracy
Cost per Mb
Minimum spend
Sanger: minimum spend $6 (£4), so there is still a place for Sanger sequencing.
Latest 454: has a similar read length but only costs $10; the minimum spend is $3000 because you have to fill up the wells.
Illumina: the cheapest, because reads are only 50-100 bp.
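How cost per Mb and minimum spend trade off can be sketched as below. The minimum spends ($6 and $3000) come from the notes; reading the $10 as a per-Mb price is an assumption, and the $500 per Mb for the low-minimum platform is a pure placeholder, not a real price.

```python
def run_cost(megabases, cost_per_mb, minimum_spend):
    """You pay per Mb, but never less than the platform's minimum spend."""
    return max(minimum_spend, megabases * cost_per_mb)


# (cost per Mb, minimum spend) in dollars; per-Mb figures are assumptions, see above.
low_minimum = (500, 6)      # placeholder per-Mb price, $6 minimum spend
high_minimum = (10, 3000)   # assumed $10 per Mb, $3000 minimum spend

for megabases in (0.001, 1000):
    print(megabases, "Mb:",
          run_cost(megabases, *low_minimum),
          "vs",
          run_cost(megabases, *high_minimum))
```

For a tiny job the minimum spend dominates, so the low-minimum platform wins; for a large job the per-Mb price dominates.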
Bionano Irys
Why use this?
How does it work?
https://www.youtube.com/watch?v=XSH5ushqARo
Because the details look similar, you need the bigger picture to understand.
Extract long DNA. Enzymes nick the DNA at the target sites, and another enzyme comes along and labels particular base pairs. The DNA is run through a special plate so that each strand is held out as a long strand, and this is done in a massively parallel way, giving almost a barcode. By looking at the gaps between labels we can pick up similar gaps, build a map, and identify variation with reference to the map.
Within contigs assembled from Illumina reads we can find contigs that have the same spacing between those sites, and they can be lined up.
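Lining a contig up against the map by label spacing can be sketched as follows. The motif `GCTCTTC`, the sequences, the map gaps and all function names are assumptions chosen for illustration; a real map has measurement error, which the `tolerance` argument only crudely stands in for.

```python
def label_positions(sequence, motif):
    """Start positions of every occurrence of the motif (overlaps allowed)."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def gaps(positions):
    """Distances between neighbouring labels: the 'barcode'."""
    return [b - a for a, b in zip(positions, positions[1:])]


def find_in_map(contig_gaps, map_gaps, tolerance=0):
    """Index in map_gaps where the contig's gap pattern fits, or None if it fits nowhere."""
    n = len(contig_gaps)
    for i in range(len(map_gaps) - n + 1):
        window = map_gaps[i:i + n]
        if all(abs(c - m) <= tolerance for c, m in zip(contig_gaps, window)):
            return i
    return None


motif = "GCTCTTC"  # assumed label site, for illustration
contig = "A" * 5 + motif + "A" * 10 + motif + "A" * 3 + motif
contig_gaps = gaps(label_positions(contig, motif))
map_gaps = [40, 17, 10, 25]  # made-up gaps measured on the long-molecule map
print(contig_gaps, find_in_map(contig_gaps, map_gaps))
```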
Re-sequencing
Why do this?
Just one human genome doesn’t tell us very much
We need to look at more than one individual, but we can sequence them at low coverage and use the human reference genome.
We can make an assembly based on the human reference genome; this is called re-sequencing
As we have a good quality human reference genome, we do not need to make de novo assemblies of human WGS data for new individuals
We can assemble reads by alignment to the human reference
This is called re-sequencing
Less coverage of the genome is required
However, we may miss structural variation
What is this mainly important in?
Structural rearrangement is mainly important in cancer. We can do low-coverage sequencing with an Illumina sequencer and map the reads against the human genome. To look for structural variation, programmes can be used to look for broken reads.
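"Coverage" in the cards above is the average number of times each base of the genome is read: number of reads × read length ÷ genome size. A minimal sketch, assuming a human genome of roughly 3.2 billion bp and 100 bp reads; the target of 5× is just an example figure, not a recommendation.

```python
import math


def coverage(num_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


HUMAN_GENOME_BP = 3_200_000_000  # approximate size, assumed for this example
n = reads_needed(5, 100, HUMAN_GENOME_BP)
print(n, coverage(n, 100, HUMAN_GENOME_BP))
```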
Target enrichment
What does it involve?
Allows us to sequence specific parts of the genome in multiple individuals.
Generally involves hybridising whole genomic DNA to an oligonucleotide:
as a PCR primer
as a bait attached to a plate (microarray)
as a bait attached to a bead
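The bait idea can be sketched as pulling out only the fragments that can base-pair with the oligonucleotide. This is a toy: it requires a perfect match to the bait's reverse complement, whereas real hybridisation tolerates some mismatches, and the bait and fragments are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the strand that would base-pair with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragments, bait):
    """Keep single-stranded fragments containing a stretch complementary to the bait."""
    target = reverse_complement(bait)
    return [fragment for fragment in fragments if target in fragment]


bait = "TTGACC"  # hypothetical bait oligonucleotide
fragments = ["ACGGTCAATT", "CCCCCCCCCC", "TTGACCGGGG"]
print(capture(fragments, bait))
```

The third fragment contains the bait sequence itself, not its complement, so as a single strand it does not hybridise and is washed away.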
What is a SNP?
SNP genotyping
Single nucleotide polymorphism: the easiest type of variation to look at, just single bases. Commonly varying bases can be inferred from the 1000s of genome sequences carried out already.
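Inferring varying positions from many sequenced individuals can be sketched as a column-by-column comparison. This assumes the sequences are already aligned and of equal length; the four short sequences and the function name `find_snps` are made up for illustration.

```python
def find_snps(sequences):
    """Positions where aligned, equal-length sequences do not all share one base.

    Returns (position, sorted list of bases seen at that position).
    """
    return [
        (pos, sorted(set(column)))
        for pos, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


# Hypothetical aligned sequences from four individuals.
individuals = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACCT"]
print(find_snps(individuals))
```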