Genetic Engineering 2 Flashcards
Sequencing Whole Genomes
- allows you to predict the sequences of gene products necessary for programming
- but just knowing the sequence of a gene or protein does not mean that we can predict its biological function
- genome sequencing is only of use when combined with biochemical and genetic evidence of gene functions/interactions/regulation
What are the two basic approaches to whole genome sequencing?
i) minimum tiling path - determining the smallest number of overlapping cloned sequences that cover the entire genome
ii) random shotgun sequencing - sequence clones at random and assemble the sequences by alignment with each other
Minimum Tiling Path
1) construct libraries of cloned genomic DNA by shotgun cloning total genomic DNA by restriction/ligation
2) create several different libraries, all made up of BAC clones carrying large genomic inserts
3) map the restriction enzyme sites in each cloned sequence using restriction digests
4) this gives you a map of the fragments that make up each cloned sequence allowing you to see where they overlap with the other sequences and how they fit together in the genome
5) this allows you to construct a CONTIG, a continuous sequence of the genome made up using the minimum number of overlapping fragments
6) digest the long sequence clones again using restriction enzymes and sequence each fragment by synthesis
7) from the RE mapping you already know how the fragments fit together in the genome so you can just put the fragment sequences together to make up the whole genome sequence
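Choosing the smallest set of overlapping clones can be treated as an interval-covering problem. The sketch below is only an illustration, assuming each clone has already been placed on the map as a (name, start, end) interval; the function name, clone names and coordinates are hypothetical, not part of the original notes.

```python
def minimum_tiling_path(clones, genome_length):
    """Greedy interval cover: pick few clones that together span 0..genome_length.

    clones: list of (name, start, end) tuples, end exclusive. Coordinates are
    hypothetical map positions (e.g. in kb) derived from restriction mapping.
    """
    path = []
    covered = 0  # everything to the left of this position is already tiled
    ordered = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < genome_length:
        best = None
        # Among clones starting at or before the covered edge,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical BAC clones on a 300 kb region
clones = [("A", 0, 100), ("B", 50, 120), ("C", 90, 200),
          ("D", 150, 260), ("E", 190, 300)]
print([name for name, _, _ in minimum_tiling_path(clones, 300)])  # ['A', 'C', 'E']
```

Clones B and D are left out because the region they cover is already covered by their neighbours.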
BAC Plasmid
- bacterial artificial chromosome
- used as a vector for larger DNA inserts
- can insert sequences of ~100 kb vs 10-25 kb in normal plasmids
Random Shotgun Sequencing
1) construct libraries of cloned genomic DNA by shotgun cloning
2) create different libraries containing different size classes: BAC clones (100kb), plasmids with 20kb inserts and plasmids with 8kb inserts
3) sequence 800 to 1000 bp from each end of every clone in each size class
4) put all of these short sequences into a computer; sets of CONTIGs built from the smaller size class sequences are linked using the end sequences from the larger insert libraries
5) genome sequence is then assembled using sequence analysis software to overlap all of the individual sequences
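The overlap step can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix/prefix overlap. This is a toy illustration only, assuming error-free reads from one strand and no repeats; it ignores the paired-end linking described in step 4, and the function names and reads are hypothetical. Real assemblers are far more elaborate.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads taken from the sequence ATGCGTACGTTAGC
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))  # ['ATGCGTACGTTAGC']
```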
Minimum Tiling Path vs. Random Shotgun Sequencing
Minimum Tiling Path
- minimises misassembly
- reduces number of sequencing reactions
- very labour intensive as it requires RE mapping and genetic mapping to assign CONTIGs to physical chromosomes
Minimum Tiling Path vs. Random Shotgun Sequencing
Random Shotgun Sequencing
- requires massive sequencing depth
- difficult to assemble CONTIGs and align them correctly in a large genome
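Sequencing depth (coverage) is usually estimated as number of reads x read length / genome size. As a rough sketch, assuming reads land uniformly at random (a Poisson model), the fraction of the genome left unsequenced at coverage c is about e^-c. The genome size and read numbers below are hypothetical.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def expected_fraction_uncovered(c):
    """Poisson estimate of the fraction of bases hit by no read at coverage c."""
    return math.exp(-c)


# Hypothetical example: 50,000 reads of 900 bp from a 5 Mb genome
c = coverage(50_000, 900, 5_000_000)
print(c, expected_fraction_uncovered(c))
```

Under this model even 1x coverage would leave roughly a third of the genome unread, which is why random shotgun sequencing needs many-fold depth.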
Next Generation Sequencing
1) DNA randomly sheared into ~400bp fragments
2) ligate “adapter” oligonucleotides to each end, these contain a sequence to which a primer can anneal
3) each molecule is denatured and bound to a solid surface via primers immobilised by covalent bonding
4) the fragments are PCR amplified in situ and DNA polymerisation is initiated by adding DNA polymerase with a labelled dNTP (either G, C, A or T); the dNTP has a fluor blocking the 3’ -OH so once it is added to the DNA molecule, no further nucleotides can be attached
5) a bright light causes the fluor to fluoresce and the position of each signal is recorded using a CCD camera
6) the bond between the incorporated base and the fluor is broken and the fluor is washed off; this restores the 3’ -OH on the newly incorporated base, so the next nucleotide can be added to it
7) another cycle of polymerisation is initiated by adding DNA polymerase and another fluorescent base, the signal is recorded
8) this cycle of single-base polymerisation/fluorescence detection/regeneration is repeated until a sequence of each fragment has been built up
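The polymerisation/detection/regeneration cycle can be pictured with a very simplified simulation of a single immobilised fragment. This is a sketch only, assuming perfect chemistry (exactly one base added per cycle, no errors) and a template written in the order the polymerase reads it; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(template, cycles):
    """Simulate reading one fragment, one base per cycle.

    template: bases of the bound strand in the order the polymerase copies them.
    Returns the bases recorded by the camera, i.e. the newly synthesised strand.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Polymerisation: only the complementary labelled nucleotide is added,
        # and its blocked 3' end stops the polymerase after this single base.
        incorporated = COMPLEMENT[template[position]]
        # Detection: the camera records which base fluoresced at this spot.
        read.append(incorporated)
        # Regeneration: the fluor is cleaved and washed off, freeing the 3' -OH;
        # modelled here simply by moving on to the next template position.
    return "".join(read)


print(sequence_by_synthesis("TACGGA", 4))  # ATGC
```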
Can you sequence an entire genome with NGS?
- because read lengths are very small, 150bp, it is very difficult to assemble de novo genome sequences of complex eukaryotes that have large amounts of repetitive DNA between coding regions
- but NGS is very useful for sequencing smaller genomes e.g. bacteria
- sequencing is massively parallel
- NGS reads can be aligned with a draft genome sequence for more complicated genomes if one already exists
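Aligning short reads to an existing draft sequence can be sketched as a search for each read in the reference. This toy version assumes exact matches on the forward strand only (real aligners tolerate mismatches and use indexed references); the function name and sequences are hypothetical.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact forward-strand matches."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)  # keep looking for further copies
        hits[read] = positions
    return hits


reference = "ATGCGTACGTTAGCGTAC"  # hypothetical draft sequence
print(map_reads(reference, ["GTAC", "TTAG", "CCCC"]))
# {'GTAC': [4, 14], 'TTAG': [9], 'CCCC': []}
```

The read GTAC matches two places, which illustrates why short reads from repetitive DNA are hard to place unambiguously.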
What does NGS allow us to do?
- assembly of sequences of new organisms (up to a point)
- comprehensive analysis of the transcriptome (sequences of all RNAs expressed in a cell/tissue/organ/organism)
- analysis of changes in gene expression (by observing changes in the transcriptome in response to environmental changes)
- identifying genetic differences between individuals in a population
- analysis of evolutionary change
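Changes in gene expression are typically read off by comparing how many transcriptome reads map to each gene under two conditions. The sketch below assumes read counts per gene are already available; it normalises to counts per million and reports a log2 fold change. Gene names, counts and the pseudocount are hypothetical, and real analyses use replicates and statistical testing.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
            for gene in control}


# Hypothetical read counts before and after an environmental change
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}
print(log2_fold_change(control, treated))
```

Here geneA comes out at about +2 (roughly four-fold up), geneB at about -2 and geneC at 0 (unchanged).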
Genome Browsers
Definition
- allow users to scan genome sequence data for gene coding sequences
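One simple way to scan a sequence for possible coding regions is to look for open reading frames (ORFs): an ATG followed in the same frame by a stop codon. This is a minimal sketch, not how any particular genome browser works; it assumes the forward strand only and ignores introns, so it suits bacterial-style genes better than eukaryotic ones. The function name and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # closed; look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```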
Eukaryotic Genes Structure
Exons = regions present in the mRNA, made up of translated and untranslated regions
Introns = regions not present in the mRNA, removed by splicing
Eukaryotic Genes
Primary Transcript
exons, introns and a poly-A tail added by polyadenylation of the 3’ end of the molecule
Eukaryotic Genes
mRNA
exons and poly-A tail
cDNA Synthesis
1) start with mRNA
2) short poly-T primer annealed to the poly-A tail on the mRNA
3) reverse transcriptase (an RNA dependent DNA polymerase) copies mRNA as a single stranded cDNA
4) DNA ligase ligates a linker primer to the single stranded cDNA
5) DNA polymerase (a DNA dependent DNA polymerase) can synthesise a second strand to form double stranded cDNA
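The base-pairing logic of these steps can be sketched in a few lines. This is an illustration only: it assumes full-length copying of a hypothetical mRNA, writes every strand 5' to 3', and leaves out the primer and linker sequences themselves.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Reverse transcription: DNA strand complementary to the mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand_cdna(first_strand):
    """Second-strand synthesis: DNA complementary to the first cDNA strand, 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


mrna = "AUGGCCUAAAAAA"  # hypothetical mRNA ending in a short poly-A tail
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # TTTTTTAGGCCAT
print(second)  # ATGGCCTAAAAAA
```

The first strand begins with the run of Ts where the poly-T primer annealed, and the second strand carries the same sequence as the mRNA with T in place of U.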