Human genome sequencing. Flashcards
What is the definition of whole genome sequencing?
Determining the complete genome sequence at one time, including nuclear, mitochondrial and chloroplast DNA (where applicable).
Whole genome sequencing is different from DNA profiling. What is DNA profiling?
DNA profiling determines the likelihood that genetic information comes from an individual or a group.
How long did it take to sequence the first genome?
13 years.
How much did it cost to sequence the first genome?
$3.8 billion.
The first genome took 13 years to sequence. It is now possible to sequence hundreds in a matter of weeks. How much would this now cost?
$300,000.
What would the perfect sequencer result in?
Instantaneous results being produced from unprocessed samples.
What are 6 challenges that genome sequencing still faces (both NGS and first generation)?
- Nucleic acid extraction.
- Sub fractional size selection.
- Separation of molecules into individual positions.
- Amplification of the signal.
- Reading the signal.
- Data analysis.
Genome sequencing still faces multiple problems. Which one of these problems has almost been resolved?
Extraction of the nucleic acid.
What was the first method used to sequence the genome?
Sanger sequencing.
Sanger sequencing used to be able to sequence 300 bp at a time. How many can it sequence now?
1000 bp. However, this is still not a whole chromosome.
Most aspects of genome sequencing are considerably cheaper now, what aspect is still expensive?
Sanger sequencing.
In what year was the human genome project started?
1990.
What did the first phase of the human genome project involve?
The creation of genetic and physical maps of humans and mice.
What two organisms were sequenced at the start of the human genome project and what sizes were their genomes?
The worm (100 Mb) and yeast (12 Mb).
When was the first draft human genome sequence created?
1997-2000.
The first draft of the human genome was mostly correct. However, what did it contain?
Many gaps and errors.
When a genome has fewer than a certain number of errors it is classed as complete. True or false?
False, there is no distinct limit. Most genomes are not as complete as the human genome, however there are some which are more complete.
How many countries and US labs were involved in the human genome project?
18 countries and 200 US labs.
Where was most of the human genome sequenced?
At the Wellcome Trust in the UK.
What is a genetic map?
The order of genetic mapping markers and the genetic distance between them, based on recombination frequency. Distance is measured in centimorgans.
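To make the link between recombination frequency and centimorgans concrete, here is a minimal sketch. It assumes the common approximation that 1% recombination between two markers corresponds to roughly 1 centimorgan, which only holds for closely linked markers. The function name and the example counts are hypothetical and are not taken from the flashcards.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate genetic distance in centimorgans between two markers.

    Assumption: 1% recombination frequency is treated as 1 cM, which is
    only a reasonable approximation for closely linked markers.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    # Recombination frequency expressed as a percentage.
    return 100 * recombinants / total_offspring


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
distance = map_distance_cm(12, 200)
print(f"Approximate map distance: {distance} cM")
```

Larger distances are underestimated by this simple percentage because double crossovers go undetected, so it is best read as an illustration of the unit rather than a mapping tool.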
What do genetic maps rely on?
Sequence variation between parents and individuals of 300bp.
What are genetic maps mostly based on?
PCR to determine restriction fragment length polymorphisms, minisatellites and microsatellites.
What is a physical map?
The actual location of DNA sequences in a genome.
Is a physical or genetic map more useful?
Physical.
What are the four steps of genome sequencing through the clone-by-clone method?
- Extract DNA.
- Fragmentation of DNA.
- Size selection of DNA.
- Cloning of 100-200 kbp fragments into BACs, YACs or PACs to create a genome library.
What are the three main methods of fragmenting DNA?
Physical, enzymatic and chemical.
What are two examples of physical methods to fragment DNA?
Sonication and nebulisation (hydrodynamic shearing).
What do enzymatic methods of DNA fragmentation involve?
Restriction enzymes and transposases.
What are two examples of chemical methods used for DNA fragmentation?
Heat and divalent cations such as Zn2+ and Mg2+.
What methods of fragmentation are most often used with mRNA?
Chemical.
What do YACs include?
Yeast centromere, telomere and linear insert.
What was the first method used for the creation of a gene library in human genome sequencing and what was the disadvantage of this method?
YACs, which allowed for recombination with other parts of the yeast genome.
Which method of genome fragmentation creates the preferred random fragments?
Physical.
What does fragmentation with restriction enzymes result in?
An incomplete digest with unevenly distributed fragments.
Although restriction enzymes result in unevenly distributed fragments, what beneficial feature do they also provide?
Fragments with undamaged sticky ends.
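The uneven fragment sizes produced by a restriction digest can be illustrated with a small sketch. It assumes the well-known EcoRI site GAATTC, cut after the first G, and works on one strand of a made-up sequence. The function name and the example sequence are hypothetical, and a real digest may also be incomplete, which this sketch does not model.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut a single-stranded sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the cut falls
    (1 for EcoRI, which cuts G^AATTC). This models a complete digest.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        # Look for the next site to the right of the current one.
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence with two EcoRI sites at irregular spacing.
example = "AAGAATTCTTTTTTTTGAATTCA"
pieces = digest(example, "GAATTC", 1)
print(pieces)                      # ['AAG', 'AATTCTTTTTTTTG', 'AATTCA']
print([len(p) for p in pieces])    # [3, 14, 6]
```

Fragment lengths depend entirely on where the sites happen to fall, which is why restriction digests give an uneven size distribution. The internal fragments begin with AATTC, which on this strand corresponds to the sticky end left by the enzyme.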
Inserts can be up to 2000bp. How many base pairs can Sanger sequencing accurately sequence?
500-1000bp.
What does a clone contig stand for?
A continuous set of clones.
What can a super contig also be called?
A scaffold.
Once you have seen if your fragments contain the markers what can you do?
Design primers to sequence the DNA.
If a BAC contains the end sequence of another BAC what can you assume?
That they are found next to each other.
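The adjacency test described above can be sketched as a simple string check. This is a toy illustration that assumes exact matches on one strand, whereas real clone sequences contain errors and may be in either orientation. The function name, the sequences and the end length are all hypothetical.

```python
def contains_end_of(clone_a: str, clone_b: str, end_length: int) -> bool:
    """Return True if clone_a contains either end sequence of clone_b.

    end_length is the number of bases read from each end of clone_b.
    Assumption: exact, same-strand matching only.
    """
    if end_length <= 0 or end_length > len(clone_b):
        raise ValueError("end_length must be between 1 and len(clone_b)")
    left_end = clone_b[:end_length]
    right_end = clone_b[-end_length:]
    return left_end in clone_a or right_end in clone_a


# Hypothetical clones: the left end of clone_b (TGCA) lies inside clone_a.
clone_a = "ACGTTGCA"
clone_b = "TGCAGGTC"
print(contains_end_of(clone_a, clone_b, 4))  # True
```

A positive result suggests that the two clones overlap and can be placed next to each other in a contig. Repetitive sequence can give false matches, so in practice longer end reads are needed.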
Can super contigs include gaps?
Yes.
What are the main steps of shotgun sequencing BAC clones?
- Retrieve 40-1000 kb of DNA from BAC clone.
- Break up DNA into 5-10 kb fragments.
- Use universal primers to sequence insert.
Why did the human genome project not ‘walk along’ the DNA with primers when shotgun sequencing the BAC clones?
It is a very expensive and time-consuming method. It can only do 1000 bp a day.
What do you need to sequence to assemble large fragments in shotgun sequencing of BACs?
Lots of paired sequences. However, you cannot sequence the middle section.
When the human genome project was originally carried out, what coverage of the human genome was desired?
5 times coverage.
When the human genome was first sequenced 5 times coverage was aimed for. What coverage is now aimed for?
30 times coverage.
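The difference between 5 times and 30 times coverage is easier to appreciate as a read count. The sketch below uses the standard relation coverage = (number of reads x read length) / genome size. The genome size of 3.2 Gb and the 500 bp read length are assumptions chosen for illustration, and the function name is hypothetical.

```python
def reads_needed(coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Number of reads required to reach a given average fold coverage.

    Uses coverage = reads * read_length / genome_size, rounded up.
    """
    if coverage <= 0 or genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = coverage * genome_size_bp
    # Integer ceiling division avoids floating point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


HUMAN_GENOME_BP = 3_200_000_000  # assumed approximate haploid genome size
READ_LENGTH_BP = 500             # assumed read length

for fold in (5, 30):
    print(fold, reads_needed(fold, HUMAN_GENOME_BP, READ_LENGTH_BP))
```

Under these assumptions, 5 times coverage needs 32 million reads and 30 times coverage needs 192 million. This is an average only, since random sampling leaves some regions covered less deeply than others.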
What does BAC end sequencing by Sanger sequencing produce?
Super contigs and scaffolds.
What order is correct?
BACs, Sanger, contigs, shotgun.
BACs, shotgun, contigs, Sanger.
BACs, Sanger, contigs, shotgun.
What for-profit organisation wanted to patent the human genome?
Celera Genomics.
Who was responsible for wanting to patent the human genome?
Craig Venter.
What percentage of the human genome was completed before Celera Genomics decided to also try and complete the human genome sequence?
5%-10%.
What was Celera Genomics' approach to completing the human genome?
Size select the DNA and then clone 10, 20 and 50 kbp fragments to create a library. These were sequenced automatically by the ABI 3700 and assembled into contigs and a consensus sequence.
What fold coverage did Celera Genomics aim for?
5-fold.
Why were gaps found in the human genome when it was originally sequenced?
Cloning bias from restriction sites not being evenly distributed. Fragments were also not evenly inserted.
What techniques were used to minimise gaps in the human genome?
The use of different restriction enzymes and different physical and chemical fragmentation methods.
Why, on some rare occasions, were DNA inserts unstable in E. coli?
The insert could contain a gene lethal to E. coli.
For an unknown reason two types of vectors worked better in sequencing the human genome. What were these two?
BACs and PACs. YACs did not work as well.
What were the two types of gaps present in the draft human genome sequence, and what was the difference between them?
In sequencing gaps, the sequence is present in the gene libraries but has not yet been sequenced. In physical gaps, the sequence is absent from the gene libraries.
How would you close a sequencing gap?
Design a new sequencing primer based on the end sequences and Sanger sequence from both ends. The fragment can be larger than 1 kb, but the whole process is very slow.
What do you know regarding physical gaps?
The order of the scaffolds but not the sequence in between.
How would you close a physical gap?
- Amplify the DNA between the gap by PCR.
- This DNA can be further amplified through clones, as PCR can only amplify 3 kb.
- Sequence the PCR product or clone it into a plasmid vector and end sequence the clones. The end sequences will show which fragments they are.
If you do not know the order of the scaffolds how can you determine what primers to use to fill any gaps?
Try EVERY possibility of primers and look for a PCR reaction product using genomic DNA as the template. An algorithm can then be used to determine the minimum combination in which primers are adjacent.
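The scale of "try every possibility" can be sketched by listing the candidate primer pairs. This is a minimal illustration that assumes one outward-facing primer at each end of every scaffold and that pairing two primers from the same scaffold is not useful. The function name and scaffold names are hypothetical, and the flashcard's follow-up algorithm for choosing the minimum combination is not implemented here.

```python
from itertools import combinations


def candidate_primer_pairs(scaffold_names):
    """List every pair of scaffold-end primers worth testing by PCR.

    Each scaffold contributes a 'left' and a 'right' end primer. Pairs
    drawn from the same scaffold are skipped, since they cannot span a
    gap between two different scaffolds.
    """
    ends = [(name, side) for name in scaffold_names for side in ("left", "right")]
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


# Three hypothetical scaffolds of unknown order.
pairs = candidate_primer_pairs(["S1", "S2", "S3"])
print(len(pairs))  # 12
for first, second in pairs[:3]:
    print(first, second)
```

For n scaffolds this gives 2n(n-1) reactions, so the number of PCRs grows quickly as the number of unordered scaffolds increases. Pairs that yield a product indicate which scaffold ends are adjacent.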
How much of the human genome is made of repetitive DNA sequences?
45%-50% of the human genome.
What are 6 examples of repetitive DNA sequences found in the human genome?
- Minisatellites.
- Microsatellites.
- Centromeres.
- Telomeres.
- Transposons.
- Duplicated genes.
What type of repetitive DNA sequence can be a problem in genome sequencing?
Duplicated genes.
What part of the genome is hard to assemble meaning it is the last part to be resolved?
The repetitive parts.
What part of many poor-quality genomes never gets resolved, and why?
The repetitive parts, as they are expensive to resolve and rarely contain genes.
What two things can repetitive sequences cause?
Truncations and rearrangements.
What types of repeats often cause truncations?
Tandem repeats.
What three factors influenced the species chosen to have their genomes sequenced?
- If they were genetic models.
- Their commercial and medical relevance.
- Their gene density and genome size.
What is the scientific name for a mouse?
M. musculus.
What four organisms were sequenced due to their commercial and medical relevance?
- H. sapiens.
- Oryza sativa (rice).
- P. falciparum.
- Haemophilus influenzae (representative bacterial disease).
Arabidopsis thaliana has the smallest plant genome at 100 Mb. What is it?
Thale cress.
What two organisms have not been sequenced despite their massive commercial use due to the fact that their genomes are just too big?
Maize (4.8 Gb) and wheat (17 Gb).