GEN 3: Defining the Genome II - DNA Flashcards
List some methods used to analyse DNA and when they were first developed
- Sanger Sequencing: 1970s
- Restriction enzymes: 1970s
- DNA cloning with Restriction Enzymes: 1972
- Southern Blotting: 1975
- Polymerase Chain Reaction: 1985
- Next generation sequencing (NGS): 2006
Describe how Sanger sequencing works
- Sanger sequencing requires:
- the target DNA
- an oligonucleotide primer of ~20nt complementary to part of that DNA
- a DNA polymerase
- this extends the primer, using the target DNA as a template, until a ddNTP is incorporated and terminates extension
- a mixture of deoxyribonucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs)
- the ddNTPs lack the 3’-OH group required for nucleotide chain extension
- fragments are separated by polyacrylamide gel electrophoresis or capillary electrophoresis, which distinguishes fragments differing in length by only one nucleotide
- labelling ddATP, ddCTP, ddGTP and ddTTP each with a different fluorophore produces coloured peaks that provide a direct read of nucleotide sequences up to ~1000 nucleotides long
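The read-out logic above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the real chemistry: it assumes exactly one terminated fragment for every position after the primer, and the function names `terminated_fragments` and `read_sequence` are hypothetical.

```python
# Toy model of Sanger read-out: every possible chain-terminated fragment is
# generated, sorted by length (as electrophoresis would), and the labelled
# terminal ddNTP of each fragment is read in order.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5, primer_len):
    """Return the new-strand fragments ending at each position after the primer.

    The template is written 3'->5' so the new strand can be built left to right.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [new_strand[:end] for end in range(primer_len + 1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Order fragments by length and read the final (ddNTP) base of each."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


fragments = terminated_fragments("TACGGATC", primer_len=3)
print(read_sequence(fragments))
```

Because each fragment differs from the next by one nucleotide, sorting by length is enough to recover the order of the terminal bases.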
Describe how restriction enzymes work to analyse DNA
- restriction enzymes are endonuclease enzymes that cut double-stranded DNA at specific sequences
- they cleave phosphodiester bonds to leave free terminal 3’-OH and 5’-phosphate groups
- they are able to cleave internal bonds and circular DNA, unlike exonucleases, which only cleave bonds at DNA ends
- these enzymes usually recognise short target sequences of 4 to 8 base pairs
- they can cut DNA into smaller fragments by targeting their specific restriction sites
- there are two types of restriction digestion
- restriction enzymes come from bacteria, and each is named after the species of bacteria from which it derives
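As a rough illustration of cutting at a specific recognition sequence, the sketch below digests a linear sequence on one strand only. EcoRI (recognition site GAATTC, cut between G and A) is used as an example enzyme that is not named in these cards, and the `digest` helper is hypothetical.

```python
def digest(seq, site, cut_offset):
    """Cut a linear, single-strand sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the cut falls.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


# Example enzyme (assumption, not from the cards): EcoRI cuts G^AATTC.
print(digest("AAGAATTCTTGAATTCGG", site="GAATTC", cut_offset=1))
```

Two sites in a linear molecule give three fragments; the sketch ignores the complementary strand, so it does not show the resulting overhangs.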
Describe how DNA cloning with restriction enzymes works
- DNA is inserted into a plasmid
- to do this, both the DNA and plasmid are digested with the same restriction enzymes
- this produces complementary sticky ends
- DNA ligase then ligates the two together
- the resulting recombinant plasmid is then introduced into bacteria to generate a single colony (or clone) of bacterial cells, each carrying the same recombinant plasmid
- this is called recombinant DNA cloning or molecular cloning
What was DNA cloning with restriction enzymes used to create?
- it was used to create a genomic DNA library
- genomic DNA was fragmented into millions of small pieces
- these were ligated into a plasmid vector and introduced into bacteria, so that each individual clone carried a different genomic DNA fragment
- Sanger sequencing of each member was then used to assemble the sequence of the whole genome
Describe how Southern blotting is used to analyse DNA
- mixtures of DNA fragments are separated by electrophoresis through an agarose gel and blotted onto a nylon membrane
- a specific sequence in the mixture can then be detected using a DNA probe that is radioactively or fluorescently labelled
Using sickle cell disease, describe how Southern blotting is used to help determine the status of the beta-globin gene
Describe how PCR works as a way to analyse DNA
- a pair of oligonucleotide primers is designed that flank the region to be amplified and are complementary to opposite strands
- reactions contain template DNA, the chosen primer pair, dNTPs and a thermostable DNA polymerase (Taq polymerase)
- using a programmable temperature block, the PCR reaction is taken through multiple cycles of temperature incubations
- see image for why PCR is used
What’s the amplification factor in PCR?
- in principle, the target sequence is duplicated during each PCR cycle
- if the PCR runs for n cycles, this results in an amplification factor of 2^n
- this doesn’t happen in practice, but it is still powerful
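The doubling arithmetic can be checked with a short Python sketch. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling each cycle, lower values mean only a fraction of templates are copied per cycle); it is one common way to express why real reactions fall short of the ideal.

```python
def amplification_factor(cycles, efficiency=1.0):
    """Fold-amplification after `cycles` PCR cycles.

    efficiency=1.0 is ideal doubling (2 to the power of cycles); values below
    1.0 are an assumed per-cycle efficiency, used here only for illustration.
    """
    return (1 + efficiency) ** cycles


for n in (10, 20, 30):
    ideal = amplification_factor(n)
    reduced = amplification_factor(n, efficiency=0.9)
    print(f"{n} cycles: ideal {ideal:.3g}, at 90% efficiency {reduced:.3g}")
```

Even with reduced efficiency the growth stays exponential, which is why PCR remains powerful in practice.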
What is Next Generation Sequencing (NGS)?
- NGS methods can sequence millions of short DNA fragments simultaneously
- known as massively parallel sequencing
- without the need for individual fragment isolation
Describe how Illumina, an NGS method, works
- the DNA is first broken into short (<250nt) fragments that are tagged and hybridised onto oligonucleotides attached to a solid support called a flow cell
- the bound fragments are PCR amplified in situ (bridge amplification)
- generating millions of distinct clusters, each derived from a single fragment
- clusters are then sequenced in parallel (at the same time) by primer-extension, one nucleotide at a time, using dNTPs that are reversibly modified with 3’-end blocks and fluorescence tags
- after the addition of the first nucleotide, the flow cell is laser-scanned to measure the position and colour of each cluster
- the information is then stored digitally
- the flow cell is then treated to remove the fluorescent tags and 3’-end blocking groups on the newly extended primers
- this process is then repeated enough times to generate sequence reads for each cluster
- bioinformatics software is then used to compare sequence reads, to identify any overlaps and so assemble the sequence of the starting DNA
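The cycle-by-cycle read-out described above can be sketched as a toy simulation. It assumes each cluster is represented by a single template string written 3'->5', that every cycle adds exactly one base per cluster, and that imaging is error-free; `sequence_clusters` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_clusters(cluster_templates, cycles):
    """Simulate sequencing-by-synthesis across all clusters in parallel.

    Each cycle: one blocked, tagged nucleotide is added per cluster, the
    'colour' (base) is recorded, then the tag and 3' block are removed.
    """
    reads = ["" for _ in cluster_templates]
    for cycle in range(cycles):
        for i, template in enumerate(cluster_templates):
            if cycle < len(template):
                # The 3' block means only one base is added before imaging.
                reads[i] += COMPLEMENT[template[cycle]]
        # Tags and 3' blocks would be removed here, before the next cycle.
    return reads


print(sequence_clusters(["TACG", "GGCA"], cycles=3))
```

The read length equals the number of cycles run, which matches the idea that the process is repeated until each cluster has a full read.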
What are some applications of NGS?
- whole genome sequencing (WGS):
- allows sequence variation between individuals to be compared
- transcriptome sequencing (RNA-seq):
- sequencing of DNA reverse transcribed from RNA transcripts
- the most highly expressed genes give the greatest number of sequence ‘reads’
- targeted sequencing:
- a small region of the genome is sequenced in samples where there may be many variants
- e.g. exome sequencing may reveal protein-coding variations
- ChIPseq:
- an antibody against a protein of interest is used to purify chromatin containing that protein, prior to WGS
- reveals protein-genome interactions
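For the RNA-seq point that the most highly expressed genes give the most reads, a minimal counting sketch is shown below. It assumes each read has already been assigned to a gene; the gene names and the `rank_genes_by_reads` helper are hypothetical, and real analyses also normalise for gene length and sequencing depth.

```python
from collections import Counter


def rank_genes_by_reads(read_gene_assignments):
    """Count reads per gene and return (gene, count) pairs, highest first."""
    counts = Counter(read_gene_assignments)
    return counts.most_common()


# Hypothetical read assignments, one entry per sequenced read.
reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
print(rank_genes_by_reads(reads))
```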
Referring to the illustration for Sanger sequencing, why does the incorporation of the first ddNTP (ddGTP) not prevent any further primer extension?
- the reaction contains many primed templates as well as a mixture of dNTPs and ddNTPs (the former being in excess)
- at each position in the sequence, therefore, only a small proportion of the primers have their 3’ ends blocked
- the majority will have unblocked 3’ ends and so will be extended
It is important to optimise the temperature at step 2 of PCR (annealing the primer)
Can you predict the consequences of step 2 temperatures that are
a) too high
b) too low ?
a) too high:
- primer hybridisation would be impaired so no PCR products would be obtained
b) too low:
- primers would bind with less specificity and may give rise to spurious PCR products
At each sequencing step during Illumina NGS, what must happen in between laser scanning of the flow cell and addition of the next nucleotide?
- the 3’-blocks and fluorescent tags must be removed
How many nucleotides is the human haploid genome composed of?
- 3 billion nucleotides
What percentage of the human haploid genome are genes and gene-related DNA?
How many protein-coding genes are there?
What are some gene-related DNA examples?
- 37.5%
- there are about 21,000 protein-coding genes
- most of their DNA is non-coding
- introns, UTRs
- non-protein-coding genes make RNAs with known and unknown functions
- rRNA, tRNA, miRNA, some lncRNAs
- other non-coding DNA is known to be gene-related but lacking function
- pseudogenes
- gene fragments
How much of the human genome is made up of highly repeated DNA?
What is it made up of?
- 54%
- 1740Mb
- it is made up of dispersed transposable elements and tandemly repeated DNA
Observe this diagram for the breakdown of the human genome constituents
Why is most of the DNA in protein-coding genes non-coding?
- there are non-coding regions such as:
- 5’ and 3’ untranslated regions (UTRs)
- enhancer sequences
- promoter sequences
- long introns