Genetic Engineering 2 Flashcards
Sequencing Whole Genomes
- allows you to predict the sequences of the gene products that programme the cell
- but just knowing the sequence of a gene or protein does not mean that we can predict its biological function
- genome sequencing is only of use when combined with biochemical and genetic evidence of gene functions/interactions/regulation
What are the two basic approaches to whole genome sequencing?
i) minimum tiling path - determining the smallest number of overlapping cloned sequences that cover the entire genome
ii) random shotgun sequencing - sequence clones at random and assemble the sequences by aligning them with each other
Minimum Tiling Path
1) construct libraries of cloned genomic DNA by shotgun cloning: cut total genomic DNA with restriction enzymes and ligate the fragments into vectors
2) create several different libraries, all using BAC vectors, which carry large genomic inserts
3) map the restriction enzyme sites in each cloned sequence using restriction digests
4) this gives you a map of the fragments that make up each cloned sequence allowing you to see where they overlap with the other sequences and how they fit together in the genome
5) this allows you to construct a CONTIG, a contiguous stretch of the genome covered by the minimum number of overlapping cloned fragments
6) digest the long-insert clones again using restriction enzymes and sequence each fragment by synthesis
7) from the RE mapping you already know how the fragments fit together in the genome so you can just put the fragment sequences together to make up the whole genome sequence
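Once restriction mapping has placed the clones relative to one another, choosing the minimum tiling path is an interval-cover problem. The Python sketch below is a minimal illustration under assumptions: each BAC clone has already been reduced to a hypothetical (name, start, end) tuple in kb, and at each step the overlapping clone that reaches furthest along the region is chosen, a greedy rule that gives the fewest clones in this idealised case.

```python
from typing import List, Tuple

# Hypothetical representation: (clone name, start kb, end kb), half-open interval
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone], region_length: int) -> List[Clone]:
    """Greedily choose the fewest overlapping clones covering [0, region_length)."""
    path: List[Clone] = []
    covered_to = 0  # everything in [0, covered_to) is already covered
    while covered_to < region_length:
        # A candidate must extend coverage and, after the first pick,
        # overlap the region already covered so the contig stays continuous.
        candidates = [
            c for c in clones
            if c[1] <= covered_to and c[2] > covered_to and (not path or c[1] < covered_to)
        ]
        if not candidates:
            raise ValueError(f"no overlapping clone extends coverage past {covered_to} kb")
        best = max(candidates, key=lambda c: c[2])  # reaches furthest to the right
        path.append(best)
        covered_to = best[2]
    return path


clones = [
    ("BAC1", 0, 120), ("BAC2", 50, 150), ("BAC3", 100, 260),
    ("BAC4", 140, 230), ("BAC5", 240, 300), ("BAC6", 250, 340),
]
print([name for name, _, _ in minimum_tiling_path(clones, 320)])  # ['BAC1', 'BAC3', 'BAC6']
```

BAC2, BAC4 and BAC5 are redundant here: every base they carry is already covered by the three clones chosen, so sequencing them would add reactions without adding sequence.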
BAC Plasmid
- bacterial artificial chromosome
- used as a vector for larger DNA inserts
- can carry inserts of ~100kb vs 10-25kb in normal plasmids
Random Shotgun Sequencing
1) construct libraries of cloned genomic DNA by shotgun cloning
2) create different libraries containing different size classes: BAC clones (100kb), plasmids with 20kb inserts and plasmids with 8kb inserts
3) sequence 800 to 1000 bp from each end of every clone in each size class
4) put all of these short sequences into a computer; CONTIGs assembled from the smaller size class sequences are linked together using the end sequences from the larger insert libraries
5) genome sequence is then assembled using sequence analysis software to overlap all of the individual sequences
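The overlap-based assembly in steps 4 and 5 can be illustrated with a toy greedy assembler that repeatedly merges the pair of sequences with the longest suffix-prefix overlap. This is a sketch under simplifying assumptions: error-free reads from one strand, no repeats, and a hypothetical minimum overlap of 3 bases. Real assembly software, and the linking of CONTIGs with end sequences from larger insert libraries, is far more involved.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap; return the contigs."""
    seqs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]  # drop contained reads
    while True:
        best_k, best_pair = 0, None
        for a, b in permutations(seqs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            return seqs  # no overlaps left: each remaining sequence is a separate contig
        a, b = best_pair
        seqs.remove(a)
        seqs.remove(b)
        seqs.append(a + b[best_k:])


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

If a repeat were longer than the reads, the greedy choice could join the wrong pieces, which is one reason large repetitive genomes are difficult to assemble this way.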
Minimum Tiling Path vs. Random Shotgun Sequencing
Minimum Tiling Path
- minimises misassembly
- reduces number of sequencing reactions
- very labour intensive as it requires RE mapping and genetic mapping to assign CONTIGs to physical chromosomes
Minimum Tiling Path vs. Random Shotgun Sequencing
Random Shotgun Sequencing
- requires massive sequencing depth
- difficult to assemble CONTIGs and align them correctly in a large genome
Next Generation Sequencing
1) DNA randomly sheared into ~400bp fragments
2) ligate “adapter” oligonucleotides to each end; these contain a sequence to which a primer can anneal
3) each molecule is denatured and bound to a solid surface by annealing to primers that are covalently immobilised on it
4) the fragments are PCR amplified in situ and DNA polymerisation is initiated by adding DNA polymerase with a labelled dNTP (either G, C, A or T); the dNTP has a fluor blocking the 3’ -OH, so once it has been added to the DNA molecule, no further nucleotides can be attached
5) a bright light causes the fluor to fluoresce and the position of each signal is recorded using a CCD camera
6) the bond between the incorporated base and the fluor is broken and the fluor is washed off; this restores the 3’ -OH on the newly incorporated base, so the next nucleotide can be added to it
7) another cycle of polymerisation is initiated by adding DNA polymerase and another fluorescent base, the signal is recorded
8) this cycle of single-base polymerisation/fluorescence detection/regeneration is repeated until the sequence of each fragment has been built up
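The cyclic logic of steps 4 to 8 can be mimicked in a few lines of Python: in each cycle one blocked, labelled nucleotide is incorporated opposite the next template base, its colour is recorded, and the block is removed before the next cycle. This is a toy sketch for a single cluster; the colour assigned to each base is a hypothetical placeholder, and real instruments must also cope with noisy signals and base-calling errors.

```python
# Watson-Crick pairing used by the polymerase
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical colour of the fluor attached to each labelled dNTP
FLUOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in FLUOR.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Simulate one cluster: one base per cycle, read back from the recorded colours.

    `template` is written in the order the polymerase copies it (3' to 5').
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Polymerisation: only one labelled, 3'-blocked dNTP can be added per cycle
        incorporated = COMPLEMENT[template[position]]
        # Imaging: the camera records the colour of the fluor at this cluster
        colour = FLUOR[incorporated]
        read.append(COLOUR_TO_BASE[colour])
        # Regeneration: the fluor is cleaved off, restoring the 3'-OH for the next cycle
    return "".join(read)


print(sequence_by_synthesis("TACGGA", cycles=6))  # ATGCCT
```

The number of cycles sets the read length, which is why the reads are short compared with a genome.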
Can you sequence an entire genome with NGS?
- because reads are very short (~150bp), it is very difficult to assemble de novo genome sequences of complex eukaryotes that have large amounts of repetitive DNA between coding regions
- but NGS is very useful for sequencing smaller genomes e.g. bacteria
- sequencing is massively parallel
- NGS reads can be aligned with a draft genome sequence for more complicated genomes if one already exists
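At its simplest, aligning reads to an existing draft genome means finding where each read occurs in the reference. The Python sketch below is deliberately naive and assumes exact matches on one strand only; real aligners use indexed data structures and tolerate mismatches and small insertions/deletions. The reference and reads are invented for illustration.

```python
from typing import Dict, List


def map_reads(reference: str, reads: List[str]) -> Dict[str, List[int]]:
    """Return every 0-based position where each read matches the reference exactly."""
    hits: Dict[str, List[int]] = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)  # keep searching, overlaps allowed
        hits[read] = positions
    return hits


reference = "GATTACAGATTACACCGT"  # hypothetical draft sequence
print(map_reads(reference, ["ACAGAT", "GATTACA", "TTTT"]))
# {'ACAGAT': [4], 'GATTACA': [0, 7], 'TTTT': []}
```

The read GATTACA maps to two places because it falls within a repeated sequence, which illustrates why short reads cannot be placed unambiguously in repetitive DNA.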
What does NGS allow us to do?
- assembly of sequences of new organisms (up to a point)
- comprehensive analysis of the transcriptome (sequences of all RNAs expressed in a cell/tissue/organ/organism)
- analysis of changes in gene expression (by observing changes in the transcriptome in response to environmental changes)
- identifying genetic differences between individuals in a population
- analysis of evolutionary change
Genome Browsers
Definition
- allow users to scan genome sequence data for gene coding sequences
Eukaryotic Gene Structure
Exons = regions present in the mRNA, made up of translated and non-translated regions
Introns = regions not present in the mRNA, removed by splicing
Eukaryotic Genes
Primary Transcript
exons, introns and a poly-A tail added to the 3’ end of the molecule by polyadenylation
Eukaryotic Genes
mRNA
exons and poly-A tail (introns removed by splicing)
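The relationship between the primary transcript and the mRNA can be expressed as a small Python function: keep the exon intervals, splice out everything between them, and append a poly-A tail. This is a sketch with a made-up transcript and hypothetical exon coordinates; the tail length is an arbitrary parameter, since real poly-A tails vary in length.

```python
from typing import List, Tuple


def process_transcript(primary: str, exons: List[Tuple[int, int]], tail_length: int = 20) -> str:
    """Splice out introns (keep exons, 0-based half-open) and add a poly-A tail."""
    mature = "".join(primary[start:end] for start, end in sorted(exons))
    return mature + "A" * tail_length


# Hypothetical primary transcript: exon 1 + intron (GU...AG) + exon 2
primary = "AUGGCC" + "GUAAGUUUCAG" + "GAUUAA"
print(process_transcript(primary, exons=[(0, 6), (17, 23)], tail_length=5))
# AUGGCCGAUUAAAAAAA
```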
cDNA Synthesis
1) start with mRNA
2) short poly-T primer annealed to the poly-A tail on the mRNA
3) reverse transcriptase (an RNA dependent DNA polymerase) copies mRNA as a single stranded cDNA
4) DNA ligase ligates a linker primer to the single stranded cDNA
5) DNA polymerase (a DNA dependent DNA polymerase) can synthesise a second strand to form double stranded cDNA
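At the sequence level, first-strand synthesis produces the reverse complement of the mRNA in DNA letters (U becomes T), and second-strand synthesis gives back a DNA copy of the mRNA sequence. The Python sketch below assumes error-free enzymes and ignores the linker; the mRNA is invented for illustration.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(DNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str) -> str:
    """Reverse transcriptase product, written 5' to 3': reverse complement of the mRNA as DNA."""
    return reverse_complement(mrna.replace("U", "T"))


def second_strand_cdna(first_strand: str) -> str:
    """DNA polymerase copy of the first strand: the mRNA sequence in DNA letters."""
    return reverse_complement(first_strand)


mrna = "AUGGCCGAUUAAAAAAA"  # hypothetical mRNA ending in a short poly-A tail
first = first_strand_cdna(mrna)
print(first)                      # TTTTTTTAATCGGCCAT
print(second_strand_cdna(first))  # ATGGCCGATTAAAAAAA
```

The first strand begins with a run of T, matching the poly-T primer annealed to the poly-A tail in step 2.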
What does sequencing cDNA allow us to do?
- align RNA sequences to the genomic sequence to identify exons and introns, and so correct the predicted gene sequences in genomes
- estimate the level of gene expression
- directly estimate RNA abundance for all genes in a tissue sample
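Estimating expression from cDNA sequencing usually starts by counting the reads that align to each gene and then normalising for how many reads the sample produced in total. The Python sketch below shows one simple normalisation, counts per million (CPM), with hypothetical gene names and counts; it ignores gene length and the other corrections used in practice.

```python
from typing import Dict


def counts_per_million(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-gene read counts by library size so samples can be compared."""
    total = sum(read_counts.values())
    return {gene: count / total * 1_000_000 for gene, count in read_counts.items()}


# Hypothetical counts of cDNA reads aligned to each gene in one sample
print(counts_per_million({"geneA": 500, "geneB": 1500, "geneC": 0}))
# {'geneA': 250000.0, 'geneB': 750000.0, 'geneC': 0.0}
```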
The Polymerase Chain Reaction
Definition
- allows selective amplification of any gene from a genome without the need for cut-paste-screen cloning provided that sufficient sequence information is available
- based on properties of DNA polymerase - copying of template strand by extension of a primer sequence annealed to it
The Polymerase Chain Reaction
Typical PCR Cycle
1) denature DNA at 95C for 30s so that it is single stranded
2) anneal the gene-specific primers at 45-60C for 30s
3) extend the primers with DNA polymerase for 1 to 2 mins
4) repeat steps 1 to 3 for a total of 35 cycles
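The cycle above can be written down as a simple thermocycler program. The Python sketch stores each step's temperature and hold time and totals the time spent cycling. The 55C annealing temperature (chosen from the 45-60C range) and the 72C extension temperature (the Taq optimum) are assumed values, and the time taken to change temperature between steps is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def cycling_time_minutes(steps: List[Step], cycles: int) -> float:
    """Total hold time for the repeated steps, ignoring temperature ramping."""
    return cycles * sum(step.seconds for step in steps) / 60


program = [
    Step("denature", 95, 30),
    Step("anneal", 55, 30),   # assumed value within the 45-60C range
    Step("extend", 72, 60),   # 1 min, the lower end of the 1-2 min range
]
print(cycling_time_minutes(program, 35))  # 70.0
```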
PCR Cycles
- each cycle of DNA replication doubles the quantity of the target DNA (assuming every molecule is copied)
- typically around 30 cycles will give you amplification of microgram quantities of the desired fragment from nanogram quantities of total genomic DNA
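A worked version of this arithmetic shows why about 30 cycles is enough. The Python sketch assumes perfect doubling every cycle (real reactions fall short and eventually plateau), a single-copy target, a haploid human genome of about 3.3 pg and an average of about 650 g/mol per base pair; these constants are approximations, not values from these notes.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS = 650                # approximate g/mol for one base pair of double-stranded DNA
HAPLOID_GENOME_PG = 3.3      # approximate mass of one haploid human genome, in pg


def ideal_pcr_yield_ug(input_ng: float, product_bp: int, cycles: int) -> float:
    """Micrograms of a single-copy target after perfect doubling every cycle."""
    starting_copies = input_ng * 1000 / HAPLOID_GENOME_PG   # ng -> pg -> genome copies
    final_copies = starting_copies * 2 ** cycles
    grams = final_copies * product_bp * BP_MASS / AVOGADRO
    return grams * 1e6


print(2 ** 30)                                     # 1073741824, roughly a billion-fold
print(round(ideal_pcr_yield_ug(10, 1000, 30), 1))  # 3.5
```

So 10 ng of genomic DNA contains only about 3,000 copies of a single-copy gene, yet 30 doublings of a 1 kb target would in principle give a few micrograms of product.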
Duration of Polymerase Reaction
- the length of DNA copied from the primer depends on how long you leave the polymerase reaction to run
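Because the length copied depends on how long the polymerase is allowed to extend, the extension time is usually scaled to the expected product size. The Python sketch uses a commonly quoted rule of thumb of about 1 kb per minute for Taq polymerase; that rate, and the 30 s minimum, are assumptions rather than values from these notes.

```python
import math


def extension_seconds(product_bp: int, bp_per_min: int = 1000, minimum_s: int = 30) -> int:
    """Extension time needed to copy product_bp at an assumed rate, rounded up to whole seconds."""
    return max(minimum_s, math.ceil(product_bp * 60 / bp_per_min))


print(extension_seconds(1500))  # 90
print(extension_seconds(200))   # 30 (the assumed minimum)
```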
Thermostable DNA Polymerase for PCR
- E.coli DNA polymerase works optimally at 37C and is rapidly inactivated at higher temperatures
- bacteria that live in high temperature environments have thermostable enzymes
- the most common DNA polymerase used for PCR is Taq polymerase from the bacterium Thermus aquaticus
- Taq polymerase survives the 95C denaturation stage and works optimally at 72C in the extension of the primers
Uses of PCR - DNA Fingerprinting
PCR Cycle
- the human genome contains regions of DNA sequence that are hypervariable in length between individuals at particular loci but are very close to/embedded in sequences that are highly conserved in everyone
- primers are used that anneal to the highly conserved sequences flanking the hypervariable regions
- this allows the variable regions to be amplified for analysis
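The flanking-primer idea can be illustrated with a toy "in silico PCR": find the forward primer, find the reverse primer's binding site downstream, and return the product between them. Because the primers sit in conserved sequence, the product length changes only with the length of the variable region. The Python sketch uses made-up flank and repeat sequences and requires exact primer matches, whereas real primers can tolerate some mismatches.

```python
from typing import Optional

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(DNA_COMPLEMENT)[::-1]


def amplify(template: str, forward: str, reverse: str) -> Optional[str]:
    """Exact-match in silico PCR: the product spans from the forward primer to the reverse primer site.

    Both primers are written 5' to 3'; the reverse primer anneals where its
    reverse complement appears on the template strand given here.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse)]


# Hypothetical conserved flanks around a variable CA repeat
LEFT, RIGHT = "GCTAGCTTAC", "TGGACCTAAG"
FORWARD, REVERSE = LEFT, reverse_complement(RIGHT)
for repeats in (5, 8):
    allele = "AAA" + LEFT + "CA" * repeats + RIGHT + "TTT"
    print(repeats, len(amplify(allele, FORWARD, REVERSE)))  # prints 5 30, then 8 36
```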
Uses of PCR - DNA Fingerprinting
DNA Sequences
- many genes contain short elements in their non-coding regions made up of simple sequence repeats e.g. ATATAT or GCGCGC
- these sequences are highly susceptible to insertion/deletion of bases due to DNA polymerase slippage during DNA replication
- where there is no selection for sequence conservation (as in non-protein coding regions) these sequences may exhibit high levels of polymorphism between individuals in a species
- these tandem repeats of short sequences are known as
VNTRs - variable number tandem repeats
SSRs - simple sequence repeats
or microsatellites
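Simple sequence repeats are easy to spot computationally by looking for a short unit repeated head-to-tail. The Python sketch uses a regular expression to find runs of a 1 to 6 bp unit present at least 4 times; both thresholds are illustrative choices, not a standard definition, and the sequence is invented.

```python
import re
from typing import List, Tuple

# Illustrative threshold: a 1-6 bp unit followed by at least 3 more copies (4+ in total)
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{3,}")


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, repeat unit, number of copies) for each simple sequence repeat."""
    found = []
    for match in SSR_PATTERN.finditer(seq):
        unit = match.group(1)
        found.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return found


seq = "TTGAC" + "AT" * 5 + "TT" + "CAG" * 4 + "TT"  # hypothetical non-coding stretch
print(find_ssrs(seq))  # [(5, 'AT', 5), (17, 'CAG', 4)]
```

Comparing the number of copies found at the same locus in different individuals is, in effect, what the PCR fingerprint measures as a difference in band size.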
Uses of PCR - DNA Fingerprinting
Identification
- in the human population no single hypervariable sequence is unique enough to identify an individual
- but by multiplexing several primer pairs specific for different parts of the genome, we can generate a pattern of bands from independent loci that is specific to an individual
- e.g. the FBI uses 13 different loci for identification
- but related individuals are more likely to share alleles at these loci, so their profiles are more similar
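Multiplexing works because, for unrelated individuals and independently inherited loci, the probabilities of matching at each locus multiply. The Python sketch applies this product rule with hypothetical allele frequencies and Hardy-Weinberg genotype frequencies (p² for a homozygote, 2pq for a heterozygote). It deliberately ignores relatedness, which makes a chance match more likely.

```python
from math import prod
from typing import Dict, List, Tuple


def genotype_frequency(a: str, b: str, allele_freqs: Dict[str, float]) -> float:
    """Hardy-Weinberg genotype frequency at one locus: p^2 if homozygous, 2pq if heterozygous."""
    if a == b:
        return allele_freqs[a] ** 2
    return 2 * allele_freqs[a] * allele_freqs[b]


def random_match_probability(profile: List[Tuple[str, str]], loci: List[Dict[str, float]]) -> float:
    """Product rule: multiply per-locus genotype frequencies (independent loci, unrelated people)."""
    return prod(genotype_frequency(a, b, freqs) for (a, b), freqs in zip(profile, loci))


# Hypothetical allele frequencies (alleles named by repeat number) at three loci
loci = [{"12": 0.2, "14": 0.1}, {"8": 0.3}, {"15": 0.25, "17": 0.1}]
profile = [("12", "14"), ("8", "8"), ("15", "17")]
p = random_match_probability(profile, loci)
print(f"{p:.2e}", round(1 / p))  # 1.80e-04 5556
```

Each extra locus multiplies in another small factor, which is why a panel of around a dozen loci can make a chance match between unrelated people extremely unlikely.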
What are the uses of PCR based DNA fingerprinting?
- paternity testing
- forensic testing
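For paternity testing, the basic check is whether, at every locus, the child's two alleles can be explained as one from the mother and one from the alleged father. The Python sketch uses hypothetical genotypes (alleles given as repeat numbers) and only tests for exclusion; real testing is statistical and allows for occasional mutations at these loci.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # two alleles at one locus, e.g. repeat numbers


def consistent_with_paternity(child: List[Genotype], mother: List[Genotype],
                              alleged_father: List[Genotype]) -> bool:
    """True if, at every locus, one child allele can come from the mother and the other from the father."""
    for c, m, f in zip(child, mother, alleged_father):
        a, b = c
        if not ((a in m and b in f) or (b in m and a in f)):
            return False  # exclusion at this locus
    return True


child = [(12, 14), (8, 9)]
mother = [(12, 13), (8, 8)]
alleged_father = [(14, 15), (9, 10)]
other_man = [(15, 16), (9, 10)]
print(consistent_with_paternity(child, mother, alleged_father))  # True
print(consistent_with_paternity(child, mother, other_man))       # False
```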
Multiple Uses of PCR
- direct gene isolation without cloning
- comparison of gene sequence between individuals in a population
- identification of different alleles of a gene in a population
- identification of contaminants in ‘pure’ preparations
- tracing descent of individuals in a population
- creation of specific mutations in genes
PCR and Haemoglobin Genes
- sequence analysis of PCR-amplified haemoglobin genes from different species shows how the species have diverged during evolution
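A simple way to quantify divergence between PCR-amplified sequences is the proportion of aligned sites that differ (the p-distance). The Python sketch assumes the sequences have already been aligned to equal length; the three short fragments are made up for illustration and are not real haemoglobin sequences.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


# Made-up aligned fragments standing in for the same gene in three species
species_a = "ACGTTAGCCA"
species_b = "ACGTTAGTCA"
species_c = "ACCTTAGTCA"
print(p_distance(species_a, species_b), p_distance(species_a, species_c))  # 0.1 0.2
```

A larger distance is consistent with a longer time since the sequences shared a common ancestor, although real analyses usually correct for several changes having occurred at the same site.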