Human genome sequencing. Flashcards
What is the definition of whole genome sequencing?
Determining the complete genome sequence at one time, including nuclear, mitochondrial and chloroplast DNA (where applicable).
Whole genome sequencing is different from DNA profiling. What is DNA profiling?
DNA profiling determines the likelihood that genetic information comes from an individual or a group.
How long did it take to sequence the first genome?
13 years.
How much did it cost to sequence the first genome?
$3.8 billion.
The first genome took 13 years to sequence; it is now possible to sequence hundreds in a matter of weeks. How much would this now cost?
$300,000.
What would the perfect sequencer result in?
Instantaneous results being produced from unprocessed samples.
What are 6 challenges that genome sequencing still faces (both NGS and first generation)?
- Nucleic acid extraction.
- Sub fractional size selection.
- Separation of molecules into individual positions.
- Amplification of the signal.
- Reading the signal.
- Data analysis.
Genome sequencing still faces multiple problems. Which one of these problems has almost been resolved?
Extraction of the nucleic acid.
What was the first method used to sequence the genome?
Sanger sequencing.
Sanger sequencing used to be able to sequence 300bp at a time, how many can it sequence now?
1000bp. This however still isn’t a whole chromosome.
Most aspects of genome sequencing are considerably cheaper now, what aspect is still expensive?
Sanger sequencing.
In what year was the human genome project started?
1990.
What did the first phase of the human genome project involve?
The creation of genetic and physical maps of the human and mouse genomes.
What two organisms were sequenced at the start of the human genome project and what sizes were their genomes?
The worm (100 Mb) and yeast (12 Mb).
When was the first draft human genome sequence created?
1997-2000.
The first draft of the human genome was mostly correct, however what did it contain?
Many gaps and errors.
When a genome has a certain amount of errors it is classed as complete. True or false?
False, there is no distinct limit. Most genomes are not as complete as the human genome, however there are some which are more complete.
How many countries and US labs were involved in the human genome project?
18 countries and 200 US labs.
Where was most of the human genome sequenced?
At the Wellcome Trust in the UK.
What is a genetic map?
The order of genetic mapping markers and the genetic distance between them, based on recombination frequency. Distance is measured in centimorgans.
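As a rough sketch of how recombination frequency becomes map distance: for closely linked markers, 1% recombinant offspring corresponds to about 1 centimorgan. The function name and the offspring counts below are made up for illustration and are not from the flashcards.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate map distance in centimorgans for closely linked markers.

    Assumes 1% recombination frequency is about 1 cM, which only holds
    for short distances where double crossovers are rare.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinants / total_offspring


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
distance = map_distance_cm(12, 200)
print(distance)
```

Markers that are further apart need a mapping function to correct for double crossovers, which this sketch does not attempt.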
What do genetic maps rely on?
Sequence variation between parents and individuals of 300bp.
What are genetic maps mostly based on?
PCR to determine restriction fragment length polymorphisms, minisatellites and microsatellites.
What is a physical map?
The actual location of DNA sequences in a genome.
Is a physical or genetic map more useful?
Physical.
What are the four steps of genome sequencing through the clone-by-clone method?
- Extract DNA.
- Fragmentation of DNA.
- Size selection of DNA.
- Cloning of 100-200 kbp fragments into BACs, YACs or PACs to create a genome library.
What are the three main methods of fragmenting DNA?
Physical, enzymatic and chemical.
What are two examples of physical methods to fragment DNA?
Sonication and nebulisation (hydrodynamic shearing).
What do enzymatic methods of DNA fragmentation involve?
Restriction enzymes and transposases.
What are two examples of chemical methods used for DNA fragmentation?
Heat and divalent cations such as Zn2+ and Mg2+.
What methods of fragmentation are most often used with mRNA?
Chemical.
What do YACs include?
Yeast centromere, telomere and linear insert.
What was the first method used for the creation of a gene library in human genome sequencing and what was the disadvantage of this method?
YACs, which allowed recombination with other parts of the yeast genome.
Which method of genome fragmentation creates the preferred random fragments?
Physical.
What does fragmentation with restriction enzymes result in?
An incomplete digest with unevenly distributed fragments.
Although restriction enzymes result in unevenly distributed fragments, what beneficial feature do they also result in?
Fragments with undamaged sticky ends.
Inserts can be up to 2000bp. How many base pairs can Sanger sequencing accurately sequence?
500-1000bp.
What does a clone contig stand for?
A continuous set of clones.
What can a super contig also be called?
A scaffold.
Once you have seen if your fragments contain the markers what can you do?
Design primers to sequence the DNA.
If a BAC contains the end sequence of another BAC what can you assume?
That they are found next to each other.
Can super contigs include gaps?
Yes.
What are the main steps of shotgun sequencing BAC clones?
- Retrieve 40-1000 kb of DNA from the BAC clone.
- Break up the DNA into 5-10 kb fragments.
- Use universal primers to sequence the inserts.
Why did the human genome project not ‘walk along’ the DNA with primers when shotgun sequencing the BAC clones?
Because it is a very expensive and time-consuming method that can only cover about 1000bp a day.
What do you need to sequence to assemble large fragments in shotgun sequencing of BACs?
Lots of paired sequences. You cannot sequence the middle section, however.
When the human genome project was originally completed what coverage of the human genome was originally desired?
5 times coverage.
When the human genome was first sequenced 5 times coverage was aimed for. What coverage is now aimed for?
30 times coverage.
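Fold coverage can be estimated as (number of reads × read length) / genome size. The sketch below works this through; the 3.2 Gb human genome size and the 150 bp read length are assumptions chosen for illustration, not figures from the flashcards.

```python
import math


def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is expected to be sequenced."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: int, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


HUMAN_GENOME_BP = 3_200_000_000  # assumed haploid genome size (about 3.2 Gb)
READ_LENGTH_BP = 150             # assumed short-read length

print(reads_needed(5, READ_LENGTH_BP, HUMAN_GENOME_BP))   # reads for 5x coverage
print(reads_needed(30, READ_LENGTH_BP, HUMAN_GENOME_BP))  # reads for 30x coverage
```

This is an average only: individual regions can end up with much more or much less coverage than the target.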
What does BAC end sequencing by Sanger sequencing produce?
Super contigs and scaffolds.
What order is correct?
BACs, Sanger, contigs, shotgun.
BACs, shotgun, contigs, Sanger.
BACs, Sanger, contigs, shotgun.
What for-profit organisation wanted to patent the human genome?
Celera Genomics.
Who was responsible for wanting to patent the human genome?
Craig Venter.
What percentage of the human genome was completed before Celera Genomics decided to also try and complete the human genome sequence?
5%-10%.
What was Celera Genomics’ approach to completing the human genome?
Size select the DNA and then clone 10, 20 and 50 kbp fragments to create a library. These were assembled into a consensus sequence and into contigs, with sequencing carried out automatically on the ABI 3700.
What fold coverage did Celera Genomics aim for?
5-fold.
Why were gaps found in the human genome when it was originally sequenced?
Cloning bias from restriction sites not being evenly distributed. Fragments were also not evenly inserted.
What techniques were used to minimise gaps in the human genome?
The use of different restriction enzymes and different physical and chemical fragmentation methods.
Why, on some rare occasions, were DNA inserts unstable in E. coli?
The insert could contain a gene lethal to E. coli.
For an unknown reason two types of vectors worked better in sequencing the human genome. What were these two?
BACs and PACs. YACs did not work as well.
What were the two types of gaps that were present in the draft human genome sequence and what was the difference between them?
In sequencing gaps the sequence is present in the gene libraries. In physical gaps the sequence is absent from the gene libraries.
How would you close a sequencing gap?
Design a new sequencing primer based on end sequences and Sanger sequence from both ends. Fragment can be larger than 1kb but the whole process is very slow.
What do you know regarding physical gaps?
The order of the scaffolds but not the sequence in-between.
How would you close a physical gap?
- Amplify the DNA between the gap by PCR.
- This DNA can be further amplified through cloning, as PCR can only amplify 3kb.
- Sequence the PCR product or clone into a plasmid vector and end sequence these. Will be able to determine what fragments are by the end sequences.
If you do not know the order of the scaffolds how can you determine what primers to use to fill any gaps?
Try EVERY possibility of primers and look for a PCR reaction product using genomic DNA as the template. An algorithm can then be used to determine the minimum combination in which primers are adjacent.
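A minimal sketch of the "try every possibility" step: list every pair of scaffold-end primers that could flank a gap, skipping pairs taken from the two ends of the same scaffold. The scaffold names and the function are hypothetical, and the later step of choosing a minimal set of adjacent primers is not shown.

```python
from itertools import combinations


def candidate_primer_pairs(scaffold_ends):
    """Return all pairs of end primers that come from different scaffolds.

    Each end is a (scaffold_name, side) tuple. A PCR product from a pair
    would suggest that those two scaffold ends are adjacent in the genome.
    """
    return [
        (a, b)
        for a, b in combinations(scaffold_ends, 2)
        if a[0] != b[0]  # the two ends of one scaffold cannot flank a gap
    ]


# Hypothetical example: three scaffolds, one primer at each end.
ends = [(name, side) for name in ("S1", "S2", "S3") for side in ("left", "right")]
pairs = candidate_primer_pairs(ends)
print(len(pairs))
for pair in pairs:
    print(pair)
```

The number of reactions grows roughly with the square of the number of scaffolds, which is why this approach is only practical once few gaps remain.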
How much of the human genome is made of repetitive DNA sequences?
45%-50% of the human genome.
What are 6 examples of repetitive DNA sequences found in the human genome?
- Minisatellites.
- Microsatellites.
- Centromeres.
- Telomeres.
- Transposons.
- Duplicated genes.
What type of repetitive DNA sequence can be a problem in genome sequencing?
Duplicated genes.
What part of the genome is hard to assemble meaning it is the last part to be resolved?
The repetitive parts.
What do many poor-quality genomes never have resolved, and why?
Repetitive parts as they are expensive to resolve and rarely contain genes.
What two things can repetitive sequences cause?
Truncations and rearrangements.
What types of repeats often cause truncations?
Tandem repeats.
What three factors influenced the species chosen to have their genomes sequenced?
- If they were genetic models.
- Their commercial and medical relevance.
- Their genome density and genome size.
What is the scientific name for a mouse?
M. musculus.
What four organisms were sequenced due to their commercial and medical relevance?
- H. sapiens.
- Oryza sativa (rice).
- P. falciparum.
- Haemophilus influenzae (representative bacterial disease).
Arabidopsis thaliana has the smallest plant genome at 100 Mb. What is it?
Thale cress.
What two organisms have not been sequenced despite their massive commercial use due to the fact that their genomes are just too big?
Maize (4.8 Gb) and wheat (17 Gb).
Wheat has not had its genome sequenced because it is far too big. What other reason has prevented its genome being sequenced?
It contains multiple repeats.
The human genome project used the clone by clone approach to sequence the human genome. What method would have been a better choice?
Whole genome shotgun sequencing.
What were some of the reasons that the human genome project chose to use the clone by clone method to sequence the genome rather than whole genome shotgun sequencing like Celera did?
- Easier to prove it was feasible to sceptics.
- Lower risk, which was important as it was government funded.
- Assembly was easier.
- Could target gaps, allowing it to be finished.
- Better suited to a diverse international consortium.
What department donated some of their labs to allow the human genome project to be completed?
The Department of Energy.
In what year was the first 24bp sequence published?
1973.
When was the first Sanger sequence method published?
1977.
When was Genbank started?
1982.
When was PCR developed?
1983.
When was the C. elegans genome sequenced?
1998.
What was the problem with the 454 Life Sciences GS20 NGS machine?
Could not scale.
Why did the human genome project need to develop next generation sequencing (5 reasons)?
Sanger sequencing was not an ideal method, for the following reasons:
- Too expensive.
- Too slow.
- Cloning bias.
- Low coverage.
- Only one sequence at a time.
What are the two alternative names for Next Generation Sequencing?
2nd generation sequencing, massively parallel sequencing.
What are three main advantages of NGS?
- Rapid.
- Cheap.
- No plasmid needed.
First generation sequencing needed to be replaced as the data was not accurate enough to sequence a whole genome. True or false?
False, first generation sequencing was very accurate. It was however too slow as only one lane could be processed at a time and it required expensive chemicals.
What is the main benefit of Illumina sequencing?
More than one sample/ genome can be run at a time.
What are indexes?
Unique 6 base codes that allow the identification of each sample.
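A minimal sketch of how indexes let pooled samples be separated again after a run (demultiplexing). The index sequences, sample names and reads below are made up for illustration, and real pipelines usually also tolerate a mismatch in the index, which this sketch does not.

```python
def demultiplex(reads, index_to_sample):
    """Group insert sequences by sample using each read's 6-base index.

    reads is a list of (index, sequence) tuples, because the index is read
    separately from the insert. Unknown indexes go to "undetermined".
    """
    by_sample = {}
    for index, sequence in reads:
        sample = index_to_sample.get(index, "undetermined")
        by_sample.setdefault(sample, []).append(sequence)
    return by_sample


# Hypothetical indexes and reads.
index_to_sample = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
reads = [
    ("ATCACG", "TTGACC"),
    ("CGATGT", "GGCATA"),
    ("ATCACG", "ACCGTT"),
    ("NNNNNN", "TTTTTT"),  # index could not be read
]

result = demultiplex(reads, index_to_sample)
print(result)
```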
What NGS technique allows two-times coverage of the genome?
MiSeq.
What are the steps for Illumina library preparation?
- Fragmentation.
- Size select (200-500bp).
- End repair with DNAP and exonucleases.
- A-tailing.
- Ligation of adaptor.
- PCR amplification.
What is A-tailing?
A process used in Illumina sequencing, carried out by Taq polymerase, that adds a single A to the 3’ ends of the fragments. This makes the fragments complementary to the adaptors and stops the fragments ligating to each other.
What do adaptors have?
A T overhang.
What happens during the PCR amplification step in Illumina library preparation?
Low sequences are picked up. Adaptors are extended and index sequences are added so that samples can be told apart.
What can be used to amplify the signal produced by Illumina Library preparation other than PCR?
Fluorescent nucleotides.
How many copies are present per cluster in Illumina bridge amplification?
1000.
How many clusters can be present per cm² in Illumina bridge amplification?
10 million.
Clustering in Illumina sequencing is a process in which each fragment is ________. The flow cell is made of a glass slide coated with __________. One _____ is complementary to the ______ region. Polymerase makes this strand double stranded and then ___________. The adaptor, now complementary to the other oligo, binds and _______ to create two single stranded DNA molecules. The ____ strand is then cleaved and washed off.
Isothermally amplified; a lawn of different oligos; oligo; adaptor; the original template is removed; bridge amplification occurs; reverse.
Why are the 3’ ends blocked in Illumina sequencing?
Prevents unwanted binding.
What type of sequencing is used with Illumina sequencing?
Sequencing by synthesis.
What determines the length of the read in sequencing by synthesis?
The number of cycles.
What determines the base incorporated in sequencing by synthesis?
Emission wavelength and signal intensity.
What is the role of the index 1 primer in sequencing by synthesis?
Allows the index read product to be obtained. This read is completed once the adapter is reached.
What has to happen before Index 2 can be sequenced in sequencing by synthesis?
The 3’ end has to be unprotected.
What happens once the index 2 read has been washed away in sequencing by synthesis?
DNA polymerase makes a double stranded molecule, which is then linearised with the 3’ ends blocked. The original forward strand is cleaved off and washed away. Read two then occurs.
What are the 7 main steps of Illumina sequencing by synthesis?
- Sequencing primer hybridised.
- Polymerase and nucleotides added.
- Fluorophores on each cluster read by lasers.
- Cleave fluorophores and unblock nucleotides.
- Wash.
- Repeat.
- Index sequencing- primer hybridised.
How much output is given from Illumina sequencing?
An enormous output.
Is Sanger or Illumina sequencing more accurate?
Sanger.
Why is Illumina sequencing relatively slow?
As it has a stop-start nature.
Does the sample have to be amplified with Illumina?
Yes.
How long is the read length for Illumina sequencing?
Short.
Why would solid state electronics be better than Illumina sequencing by synthesis?
Illumina sequencing by synthesis needs expensive optics and chemicals.
What does the solid state chip measure in Ion Torrent?
pH changes.
What are the 8 steps with Ion Torrent?
- DNA fragmentation.
- Size selection.
- End repair.
- Adaptor ligation.
- PCR amplification.
- Emulsion PCR.
- Ion torrent.
- Sequencing.
What causes the change in pH that allows Ion Torrent to work?
Each time a nucleotide is added, an H+ ion is released.
What does the micro reactor contain in emulsion PCR?
Ideally one DNA strand, primers, one bead and PCR mix.
What are the stages of emulsion PCR (7 steps)?
- Denaturation of the library fragment.
- Annealing of one reverse strand to the adaptor site on the bead.
- Polymerase amplifies the forward strand starting from the bead towards the primer site.
- Denaturation of original reverse strand from the bead.
- Annealing of the reverse strand to the adaptor site of the bead. Primer anneals to the forward strand.
- Polymerase amplifies the forward strand starting at the bead and heading towards the primer site. Reverse strand opposite.
- Repeat for up to 48 cycles.
How does the forward DNA strand connect to the bead in emulsion PCR?
Sugar phosphate backbone of the DNA.
What is the main problem with homopolymers and Ion Torrent?
It is hard to tell the difference between 10, 11 etc. of the same base in a row.
Does Illumina or Ion Torrent need expensive optics?
Illumina.
What does the ion sensitive layer below the well do in Ion Torrent?
Detect changes in pH and convert these changes to voltage, indicating incorporation of that specific nucleotide.
How often are different nucleotides washed over the chips in Ion Torrent?
Every 15 seconds.
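A toy sketch of how a sequence could be read from the voltage signals: one nucleotide type is washed over the chip per flow, and the signal is roughly proportional to how many bases were incorporated (0 for none, about 2 for a run of two identical bases). The flow order and signal values are assumptions for illustration, and simple rounding stands in for real base calling.

```python
def call_bases(flow_order: str, signals: list) -> str:
    """Convert per-flow signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    bases; the nucleotide for flow i is flow_order[i % len(flow_order)].
    """
    sequence = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # noise can push a signal slightly below zero
        sequence.append(base * count)
    return "".join(sequence)


# Hypothetical flows: T, A, C, G, T, A with made-up normalised signals.
read = call_bases("TACG", [1.02, 0.05, 2.1, 0.9, 0.0, 1.1])
print(read)
```

The homopolymer problem shows up here directly: a signal of 10.4 versus 10.6 rounds to different run lengths, so small amounts of noise change the call for long runs of one base.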
How much has the output of Ion Torrent increased in the last 2 years?
1000X.
What 6 things would a better sequencing machine do/include?
- Allow for single molecule incorporation of a sequence without amplification.
- Continuous reads.
- Long reads.
- Solid state electronics.
- Cheap.
- Small fragments.
What are three examples of first generation sequencing?
- Fragmented ladders.
- Sanger sequencing.
- Maxam-Gilbert.
What first generation sequencing method is no longer used?
Maxam-Gilbert.
What sequencing method had 74% of the market in 2014?
Illumina.
What generation of sequencing is amplified DNA libraries, clonal arrays and cycling enzymatic reactions an example of?
2nd Generation sequencing.
SOLiD is a method of second generation sequencing that has become obsolete. What is an example of a second generation sequencing method that is almost obsolete?
454.
What third generation sequencing method has become obsolete?
Helicos.
What generation of sequencing are PacBio and Oxford Nanopore?
Third.
Name an example of fourth generation sequencing?
Experimental, cellular resolution and positional sequencing.
Why must second generation sequencing be amplified on beads or on a plate?
Ensures that the signal level is above the background noise.
Does second generation sequencing create large libraries?
Yes.
Why is the reaction paused after the addition of a base in second generation sequencing?
Allows the signal to be read.
What are four disadvantages of second generation sequencing?
- Expensive library preparation.
- Slow library preparation.
- Relatively short read length at 100-200bp.
- Bias introduced by PCR.
Why can PCR introduce bias?
GC-rich sequences are not amplified as efficiently, and ligases prefer certain sequences for ligation.
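To make "GC rich" concrete, GC content is the fraction of bases in a sequence that are G or C. The sketch below computes it; the example sequences are made up, and no threshold for "rich" is implied.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)


# Hypothetical fragments.
print(gc_content("GGCCGCGC"))  # all G or C
print(gc_content("ATATGCAT"))  # mostly A or T
```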
What sequencing techniques produce very long reads with half the data being over 14,000 bp and the longest reads being 40,000 bp long?
PacBio (Pacific Biosciences RS II).
What is the accuracy of PacBio?
99.999%.
What is the shortest run time of PacBio?
10 bases per second.
Does GC bias affect low or high GC regions?
Both.
What sequencing method has least GC bias?
PacBio.
Why is there no amplification bias with PacBio?
As the sequences do not need to be amplified.
What can PacBio do which other sequencing methods cannot?
Discover a broad spectrum of DNA base modifications.
What adaptors are used with PacBio library preparations?
SMRTbell adaptors.
What is the purpose of SMRTbell adaptors in PacBio?
Ligate blunt hairpins and repair fragment ends.
What are the 6 main steps of the PacBio method?
- Fragment DNA.
- Repair DNA and damaged ends.
- Ligate adaptors.
- Anneal sequencing primer to SMRTbell templates.
What is the 5’ SMRTbell adaptor sequence?
TCTCTCTC.
What is the 3’ SMRTbell adaptor sequence?
GAGAGAGAT.
What does SMRT sequencing stand for?
Single Molecule Real Time sequencing.
Normally when nucleotides are made fluorescent the base is modified. What is modified to make the nucleotides fluoresce in SMRT sequencing?
The terminal phosphate.
What is measured in SMRT sequencing?
The fluorescence emitted when each base is added.
‘Zero-mode waveguide chambers’ are used in SMRT sequencing to improve detection, as the signals produced are tiny. What are these chambers coated in?
Aluminium and silicon dioxide.
How wide are zero-mode waveguide (ZMW) chambers?
70nm.
What are the read lengths produced in SMRT sequencing?
500-3200 bases.
How does SMRT sequencing work?
Nucleotides diffuse in and out of the ZMW chambers every microsecond, but when one is incorporated by DNA polymerase it takes several milliseconds. This means the fluorescent label has time to be excited, emitting light that can be detected.
What method in addition to PacBio allows the incorporation of 10bp per second?
SMRT sequencing.
What method of sequencing is cheap and quick enough to potentially improve health care?
SMRT sequencing.
What does a CCS read, produced by SMRT sequencing stand for?
Circular consensus sequence.
What release step is the Oxford Nanopore MinION/GridION on?
Beta.
What was sequenced in 2014 by Oxford Nanopore sequencing?
E. coli and Scardovia.
What is the average read length of Oxford Nanopore?
5.4 kb.
Are some reads with Oxford Nanopore bigger than 5 kb, 10 kb, 15 kb or 20 kb?
10 kb.
What is the current error rate of Oxford Nanopore?
30-40%.
What are the steps involved in Nanopore library generation?
- Fragment by nebulisation.
- End repair.
- A tailing.
- Ligation of a 1D or a 2D adaptor.
- Conditioning attaches a motor protein.
Is the 1D or the 2D adaptor used in Oxford Nanopore a hairpin shape?
2D.
What protein is used as the pore in nanopore technology?
The heptameric protein α-hemolysin.
Why is α-hemolysin an ideal protein to use in nanopore technology?
It is secreted from bacteria meaning it is low cost and robust.
What does nanopore technology involve?
A synthetic polymer membrane. Current is measured across the pore.
What happens in 1D Oxford Nanopore technology?
One strand is sequenced and the other is discarded.
What happens in 2D Oxford Nanopore technology?
The first strand is sequenced, and then the hairpin adaptor is unwound and sequenced. The opposite strand (complementary to the first) is then sequenced and used to correct any errors, giving a corrected two-direction read.
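A toy sketch of why reading both strands helps with error correction: the second strand should be the reverse complement of the first, so flipping it back and comparing positions flags likely errors. This assumes both reads have the same length and differ only by substitutions; real nanopore reads also contain insertions and deletions and need proper alignment. The sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def disagreements(template_read: str, complement_read: str) -> list:
    """Positions where the two strand reads do not support the same base."""
    flipped = reverse_complement(complement_read)
    return [
        i
        for i, (a, b) in enumerate(zip(template_read, flipped))
        if a != b
    ]


# Hypothetical reads: the template read has one wrong base at position 3.
template_read = "ACGATAGC"
complement_read = "GCTAACGT"
print(reverse_complement(complement_read))
print(disagreements(template_read, complement_read))
```

A disagreement only shows that one of the two reads is wrong at that position; deciding which one needs extra information such as signal quality.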
What is the role of the motor protein in nanopore technology?
Ensures that only one base enters the pore at a time.
How does nanopore technology allow for the identification of molecules?
The membrane has a very high electrical resistance, and each molecule causes a distinctive disruption in the current, allowing for its identification.
What is the role of the GridION node in nanopore technology?
Allows for data collection.
Is there a deterioration of accuracy with nanopore?
No.
How does nanopore technology need to be improved?
Read length.
What are 7 advantages of nanopore technology?
- No amplification.
- Rapid.
- Long reads.
- Electronic data in real time.
- Solid state electronics.
- Portable.
- Versatile.
What is an advantage of having electronic data collected in real time in nanopore sequencing?
It means that sequencing can be stopped once the required sample is obtained.
Nanopore technology is currently the fastest sequencing method available. True or false?
False, it sequences 10bp a minute like Illumina and Ion Torrent.
Solid state electronics make sequencing easier as they do not require expensive optics. What else do they do?
Impart more reliability.
Why are nanopores versatile?
They can potentially be changed to measure RNA, proteins and other compounds.
What are 6 research related applications that NGS can be used for?
- De novo genome sequencing.
- Resequencing the genome and comparing it to the reference genome.
- Sequencing transcripts (RNA-seq).
- Studying methylation of DNA.
- Sequencing small RNAs (sRNA-seq).
- Studying protein binding sites (ChIP-seq).
What are three clinical applications of NGS?
- Diagnostics.
- Biomarkers.
- Prenatal testing.
When did Illumina’s MiSeqDx get FDA approval for diagnostics, assays and biomarkers?
19/11/2013.
What gene has ‘Molecular Health’ been testing for with NGS?
HER2 gene.