High Throughput methods Flashcards
What is the major challenge in sequencing a genome
Assembling thousands of sequenced segments into contiguous blocks (contigs) and assigning them to their proper positions in the genome
What are two efficient methods used to sequence a genome
Map-based genome sequencing and the whole genome shotgun assembly strategy
What is a genomic library
A collection of clones that together contain an organism's entire genome as DNA fragments, which researchers can screen to identify the clone(s) containing the sequence(s) of interest. A similar library (a cDNA library) can also be made from mRNA by reverse transcribing it into cDNA.
Describe map-based genome sequencing
This strategy begins by preparing low-resolution physical maps of each chromosome: shared landmarks are identified on overlapping 250-kb inserts that are cloned in YACs (a YAC library). These landmarks are 200-300 bp segments called sequence-tagged sites (STSs), whose sequences occur nowhere else in the genome, so two clones that contain the same STS must overlap. The STS-containing inserts are then randomly fragmented into 40-kb segments and subcloned into cosmid vectors (a cosmid library) so that their landmark overlaps can be identified to create a high-resolution map. The cosmid inserts are in turn randomly fragmented into overlapping 5-10 kb or 1-kb inserts and cloned in plasmids or M13 vectors (shotgun cloning), creating a plasmid or M13 library. These inserts (around 800 M13 clones per cosmid) are sequenced, and the resulting reads are assembled computationally into contigs to yield the sequence of their parent cosmid insert. Finally, the cosmid inserts are assembled through cosmid walking, using their landmark overlaps, to yield the sequences of the YAC inserts, which are then assembled using their STSs to yield the chromosome sequence. The genomes of complex eukaryotes contain many repetitive sequences, which makes assembly difficult and leaves gaps in the sequence.
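The rule that two clones sharing an STS must overlap can be sketched in a few lines of Python. This is only an illustration: the clone names, the STS names, and the `overlapping_pairs` helper are all hypothetical and do not come from any real mapping dataset or tool.

```python
from itertools import combinations

# Hypothetical table: which STS landmarks were detected in each YAC clone.
clone_sts = {
    "YAC_A": {"STS1", "STS2"},
    "YAC_B": {"STS2", "STS3"},
    "YAC_C": {"STS3", "STS4"},
    "YAC_D": {"STS5"},
}

def overlapping_pairs(clone_sts):
    """Return (clone, clone, shared STSs) for every pair sharing an STS."""
    pairs = []
    for a, b in combinations(sorted(clone_sts), 2):
        shared = clone_sts[a] & clone_sts[b]
        if shared:  # a shared unique landmark implies the inserts overlap
            pairs.append((a, b, sorted(shared)))
    return pairs

pairs = overlapping_pairs(clone_sts)
for a, b, shared in pairs:
    print(a, "overlaps", b, "via", ", ".join(shared))
```

Chaining the reported pairs (A with B, B with C) gives the clone order along the chromosome. A clone with no shared STS, such as YAC_D here, cannot be placed yet.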
What are the differences between map-based sequencing and WGSA
WGSA is a more straightforward procedure that eliminates the need for the low-resolution and high-resolution maps, so it is faster and less expensive.
Explain whole genome shotgun assembly strategy
A genome is randomly fragmented, a large number of cloned fragments are sequenced and the genome is assembled by identifying overlaps between pairs of segments. Without a genetic map of the genome being sequenced, the order of the contigs and their orientations would be unknown.
For bacterial genomes, tens of thousands of fragments are sequenced and assembled computationally. A task known as finishing then fills in the gaps between contigs using several techniques, such as synthesizing PCR primers complementary to the ends of the contigs and using them to isolate the missing segments (chromosome walking). Bacterial genomes have few repetitive sequences.
Eukaryotic genomes are much larger than bacterial genomes, so WGSA is carried out in stages. A BAC library of ~150-kb inserts is generated. The insert in each BAC clone is identified by sequencing ~500 bp in from each end to yield segments called sequence-tagged connectors (STCs or BAC-ends). A BAC insert is then fragmented and shotgun cloned into plasmid or M13 vectors, and these fragments are sequenced and assembled into contigs. The sequence of the BAC is compared with the database of STCs to identify the ~30 overlapping BAC clones. The two with minimal overlap at either end are then selected and sequenced, and the operation is repeated until the entire chromosome is sequenced (BAC walking); for the human genome this takes about 27 million sequencing reads. This process is also confounded by repeating sequences.
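The idea of assembling a sequence by identifying overlaps between pairs of segments can be shown with a toy greedy assembler. This is a minimal sketch and not how production assemblers work: the reads are made up, the function names and the `min_len` threshold are assumptions, and it handles neither sequencing errors, reverse-complement reads, nor repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical overlapping reads
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGGCGTACCTGA']
```

If two reads share no overlap of at least `min_len` bases, the function returns more than one contig, which mirrors the gaps that finishing has to close.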
How are mistakes in WGSA eliminated
By finishing it through the use of some of the techniques of the map-based strategy.
List the major observations that have been made about the human genome
> 45% of the human genome is repeating sequences of various lengths
28% is transcribed to RNA
1.2% encodes protein
Contains ~23,000 protein-encoding genes (open reading frames)
Only a small fraction of human protein families are unique to vertebrates, most occur in other life forms
Two randomly selected human genomes differ by only 1 nucleotide per 1000, which is 99.9% similarity between individuals
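The "1 nucleotide per 1000" figure can be turned into an absolute number with a quick calculation. The genome size used below (about 3.2 billion base pairs) is an assumption that is not stated in these cards, so treat the result as a rough order of magnitude.

```python
GENOME_SIZE_BP = 3_200_000_000  # assumed approximate haploid human genome size
DIFFS_PER_KB = 1                # from the card: 1 difference per 1000 nucleotides

# Integer arithmetic avoids floating-point rounding in the count.
expected_differences = GENOME_SIZE_BP // 1000 * DIFFS_PER_KB
percent_identity = 100 - 100 * DIFFS_PER_KB / 1000

print(f"{expected_differences:,} differing nucleotides")
print(f"{percent_identity:.1f}% identical")
```

Under that assumption, two individuals still differ at roughly three million positions even though they are 99.9% identical.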
Explain the 454 sequencing system( next generation DNA sequencing technology)
The genomic DNA is randomly sheared into small (300-500 bp) fragments and ligated to adaptors, which are bound by 30-µm-diameter “DNA capture” beads under dilution conditions such that only one DNA fragment is bound to each bead. This creates the genomic library. The beads are suspended in a PCR mixture containing dNTPs, primers complementary to the adaptors, and Taq DNA polymerase. The suspension is emulsified with oil such that each droplet contains one bead, so each bead sits in its own microreactor that keeps out competing or contaminating sequences. The PCR is carried out by thermocycling until around 10 million identical fragments are bound to each DNA capture bead. This is clonal amplification. Isopropanol is then added to break the emulsion, the DNA is denatured, and the single-stranded DNA-bearing beads are deposited into 75-picoliter wells on a fiber-optic slide, with one bead per well. The slide contains 1.6 million wells. The DNA on each bead is then sequenced using a series of coupled enzymatic reactions known as pyrosequencing. This is the DNA sequencing step.
Explain the applications of next generation sequencing
It sequences:
> the whole genome: identifies structural variants, point mutations and copy number variation
> the whole exome: identifies point mutations and copy number variation
> PCR amplicons: identifies point mutations and deletions
> transcriptome RNA: identifies gene expression, gene fusions and splice variants
> exome capture transcriptome: identifies gene expression, gene fusions and splice variants
Name some advantages of NGS
It makes human genome sequencing more affordable, allows many human genome sequences to be compared, reveals correlations between specific sequences and susceptibility to diseases, and can be used to develop personalized medicine.
What is personalized medicine
It is a type of medical care in which treatment is customized for an individual patient. It uses information from an individual's genes, proteins and environment to prevent, diagnose and treat disease.
Why do researchers want to move toward personalized medicine for all
It moves away from the ‘one size fits all’ approach and focuses on the individual patient. It uses new approaches to target therapies to achieve the best outcome in the management of the individual's disease or predisposition to a certain disease.
Describe Southern blotting
Southern blotting, or the Southern transfer technique, is used to identify specific DNA sequences. It exploits the fact that nitrocellulose binds ssDNA. dsDNA is separated by gel electrophoresis, and the gel is soaked in 0.5 M NaOH solution to convert the dsDNA to ssDNA. The gel is then overlaid with nitrocellulose paper, a thick layer of paper towels, and a heavy plate. This forces (blots) the liquid in the gel through the nitrocellulose so that the ssDNA binds to the paper. The transfer of ssDNA to nitrocellulose paper can also be done by electroblotting. The nitrocellulose paper is then dried at 80 °C to permanently fix the ssDNA to the paper. The paper is then moistened with a solution containing 32P-labeled ssDNA or RNA (the probe) that is complementary to the DNA sequence of interest. The moistened sheet is kept at a suitable renaturation temperature for several hours, which allows the probe to anneal to the specific DNA sequence(s). The sheet is then washed to remove any unbound radioactive probe, dried, and autoradiographed by placing it for a time on a sheet of X-ray film. The positions of the molecules that are complementary to the radioactive probe are indicated by the blackening of the developed film. This is how a specific DNA sequence may be detected and isolated.
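The probe step relies on base-pair complementarity: the probe anneals wherever a fragment contains the reverse complement of the probe sequence. The sketch below illustrates only that logic, using made-up band names and sequences. It assumes perfect matches and ignores hybridization temperature, mismatches, and the other strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def probe_binds(fragment, probe):
    """True if the probe can base-pair with some stretch of the fragment."""
    return reverse_complement(probe) in fragment

# Hypothetical single-stranded fragments fixed to the membrane, one per band.
bands = {
    "band_1": "GGATCCATGCAAGT",
    "band_2": "TTTTGACCGTAACC",
    "band_3": "CCATGCAAGGGTTT",
}
probe = "TTGCATGG"  # hypothetical labeled probe; pairs with CCATGCAA

labeled = [name for name, frag in bands.items() if probe_binds(frag, probe)]
print(labeled)  # ['band_1', 'band_3']
```

Only the bands listed in `labeled` would blacken the X-ray film in this toy example.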
What are the different types of probes that can be used in Southern blotting
DNA, RNA or mRNA (if it is produced in sufficient quantity to be isolated)