history of molecular methodology Flashcards

1
Q

Before the 1960’s

A

Different and better methods have developed over time.

Before the 1960’s there was no way to study species at the molecular level, so scientists had to rely on phenotype and morphology alone.

2
Q

Gel electrophoresis of Allozymes (1966)

A

Allozymes = different forms of the same enzyme, encoded by different alleles at one locus

This method was applied to natural populations by Lewontin & Hubby in 1966

Allowed for genetic screening across populations

It led to extensive screening for diversity and the observation that there was much more natural variation than had been expected

The process was carried out using starch gel or (originally) cellulose acetate

Gel electrophoresis diagram: Samples are crushed in a buffer and loaded into wells in the gel. An electrical current is then applied, causing the proteins to migrate through the gel according to their net charge and size.

To screen for a specific allozyme, the enzyme’s substrate is added to the buffer mixture so that the enzyme can carry out its reaction. A reactive dye in the mix shows whether the enzyme is active: wherever it is working, a visible coloured product appears.

3
Q

Reading gel electrophoresis of allozymes and issues with this method

A

Reading a one locus gel
- Monomer = a protein made up of one polypeptide.
- If one blob (band) is visible the individual is homozygous.
- If the individual is heterozygous there will be two blobs.
- Enzymes can be made up of one or more polypeptides, and this affects the profile of the heterozygote in the gel.
- Dimers and tetramers = proteins made of two or four polypeptides.
It is still quite easy to tell a homozygote from a heterozygote for dimers and tetramers when only a single locus is involved.

Reading gels with two or more loci:
- When there are two or more loci involved various heterozygote forms can be quite hard to identify even though the homozygote remains simple.

Issues with gel electrophoresis of allozymes:

The redundancy of the triplet codon: the first two bases of a codon tend to define the amino acid, and the third base can often change without affecting the amino acid that is coded for. Because of this, a lot of the underlying DNA variation is not visible on an allozyme gel, as it is not expressed in the protein.
Also, working gene by gene, researchers could not collect very much information.
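
A minimal sketch of the codon redundancy point, assuming a deliberately partial codon table and made-up 9 bp sequences (the names `CODON_TABLE`, `translate` and `classify_change` are illustrative only). It shows a third-position change that leaves the protein unchanged, so it would be invisible on an allozyme gel, next to a first-position change that alters an amino acid.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAA": "Glu", "GAG": "Glu",
    "AAA": "Lys", "AAG": "Lys",
}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def classify_change(original, mutant):
    """Label a mutation by whether it changes the encoded protein."""
    if translate(original) == translate(mutant):
        return "synonymous"      # protein unchanged: hidden from a protein gel
    return "non-synonymous"      # protein changed: potentially visible


reference = "GCTGAAAAA"          # Ala-Glu-Lys (hypothetical sequence)
third_position = "GCCGAAAAA"     # GCT -> GCC, still Ala
first_position = "GCTAAAAAA"     # GAA -> AAA, Glu becomes Lys

print(classify_change(reference, third_position))
print(classify_change(reference, first_position))
```

Even a non-synonymous change is only seen on the gel if it alters how the protein migrates, so the gel underestimates variation in both ways.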

4
Q

New idea: 2D gel electrophoresis (~1970’s)

A

People began running multiple proteins through a single gel and, when the run was finished, rotating the gel and running it again in a second dimension

This resulted in a complex series of points which people would attempt to identify

This method was given up quickly – it was just too complicated

Finding a way to extract and analyse DNA by itself was essential

5
Q

Steps to working with DNA by itself
1) extract
2) electrophoresis
3) restrict
4) Southern blot
5) probe

A

1) Extract the DNA (traditional method):

  • Digest away all the proteins/cellular material
    SDS is a detergent that breaks open the cell
  • Phenol won’t mix with the aqueous solution; it takes up the other materials (proteins, carbs etc.)
  • DNA remains in the aqueous phase
  • Centrifuge to separate the aqueous and phenol phases
  • Precipitate the DNA using salt and 100% ethanol

2) Electrophoresis of DNA fragments
- All DNA molecules carry the same negative charge per unit length
- Mobility in the gel is therefore related to the size of the fragment
- DNA travels towards the positive electrode – smaller fragments travel faster
- The percentage of agarose in the gel determines the largest fragment size you can resolve. If all fragments are that maximum size or bigger, you will see a single band

3) Restriction
- When a fragment is cut once by a restriction enzyme, two bands will be visible on a gel of the same agarose percentage
- Restriction enzymes recognise a particular sequence of bases, 4, 6 or 8 bp long, wherever it occurs in the genome and cut there – always in the same way
- e.g. EcoRI is derived from E. coli. These enzymes are naturally present in bacteria and act as protection from viruses: when the bacterium is infected it can cut up the viral DNA
- EcoRI has a 6 bp recognition site that is a ‘palindrome’: it reads the same in each direction (on opposite strands). This restriction enzyme cuts to leave ‘sticky ends’.
- Restriction enzymes always cut in the same place, which makes them reliable and predictable
- Any mutation within the recognition site of a particular restriction enzyme would prevent the cut at that site, so the change in banding pattern makes it possible to identify mutations in individuals
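
A small sketch of the restriction step, assuming a made-up linear sequence and treating only the top strand. The EcoRI site (GAATTC) and its cut position after the G are standard; the sequences `wild_type` and `mutant` and the helper names are hypothetical. It shows that one site gives two fragments and that a single base change in the site removes the cut.

```python
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1     # EcoRI cuts between G and AATTC on each strand


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def digest(dna, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Cut a linear sequence at every site and return the fragment lengths."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = dna.find(site, start + 1)
    edges = [0] + cuts + [len(dna)]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


# Hypothetical 56 bp sequence with a single EcoRI site.
wild_type = "ATGC" * 5 + "GAATTC" + "TTAGC" * 6
# One base change inside the recognition site (GAATTC -> GAATTA).
mutant = wild_type.replace("GAATTC", "GAATTA")

print(reverse_complement(ECORI_SITE) == ECORI_SITE)  # the site is a palindrome
print(digest(wild_type))   # two fragments, so two bands
print(digest(mutant))      # uncut, so one band
```

Comparing the fragment lengths of two individuals in this way is the basis of the restriction fragment comparison described in the next card.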

4) Southern Blotting
- Invented by Edwin Southern
- Transfers the DNA from the gel onto a filter, where it can be made visible with a probe
- You take the gel that you have run your DNA through and place a filter on top to absorb the DNA
- Absorbent paper goes on top, with buffer at the bottom and a wick underneath to maintain moisture
- A weight on top brings all these materials into close contact so the process works
- This pulls the DNA up, out of the gel and onto the filter

^ From this a ‘compass’ of blotting methods was developed (e.g. northern and western blotting) – nomenclature inspired by Southern’s name

5) Probes
- Finally, probes are used to view specific pieces of DNA
- Bacterial plasmids are opened with the same restriction enzyme that was used to cut the DNA, so that the DNA fragment can be inserted into the plasmid at that point
- The fragment is inserted by ligation: the cut ends are stuck together using ligase
- The edited plasmid is then inserted into a bacterial cell, where it is replicated as the bacterium divides, producing multiple copies of the section of DNA that is being studied
- The DNA can then be extracted from these bacteria and labelled

6
Q

Restriction Fragment Length Polymorphism (RFLP) comparison

A

Labelling DNA fragments used to be done with radioactive tags. This process took advantage of the fact that DNA polymerase (which copies DNA) only works at a single-strand/double-strand boundary. Double-stranded DNA is heated so that it separates into single strands, and a short section of DNA (an oligonucleotide primer) is hybridised onto the single strand. The polymerase then makes a copy from the primer, and radioactivity is introduced into that copy.

Nowadays we use biotin labels or other non-radioactive tags for probes. The filter is then exposed to the labelled probe to reveal banding patterns.

7
Q

PCR – Polymerase Chain Reaction (1985)

A

Invented by Kary Mullis et al. (1985)

  • Process depends on oligonucleotides (primers)
  • Also depends on thermocycling: the DNA is denatured into single strands so that the primers can attach and be extended by the polymerase
  • But you need a heat-resistant DNA polymerase, as it takes temperatures of ~95 °C to denature DNA
  • Most enzymes are denatured at ~65 °C

A suitable enzyme was found in Thermus aquaticus, a bacterium that lives in American hot springs
^Enzyme known as Taq polymerase
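
A rough in-silico sketch of what a primer pair selects, assuming exact primer matches on a made-up template (the function `amplicon` and all sequences are hypothetical, and real primer design involves much more than string matching). The forward primer matches the top strand directly; the reverse primer anneals to the top strand, so its reverse complement is what appears in the top-strand sequence.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def amplicon(template, forward_primer, reverse_primer):
    """Return the top-strand region bounded by the two primers, or None."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer binds the top strand, so search for its
    # reverse complement downstream of the forward primer.
    target = reverse_complement(reverse_primer)
    end = template.find(target, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(target)]


# Hypothetical template: flank + forward site + insert + reverse site + flank.
template = "TTTT" + "ACGTAC" + "GGGGCCCC" + "TTGACA" + "AAAA"

product = amplicon(template, "ACGTAC", "TGTCAA")
print(product, len(product))
```

Only the region between the two primers is copied in each cycle, which is why PCR can target a specific gene without cloning.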

Key paper:
Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia. Randall K. Saiki, Stephen Scharf, Fred Faloona, Kary B. Mullis, Glenn T. Horn, Henry A. Erlich, Norman Arnheim. Science 1985 230: 1350–1354

See year 1 Genetics: Lecture 10 PCR

> > A real game changer in DNA study

8
Q

Microsatellite DNA Loci analysis

A
  • Depends on PCR
  • Amplifying a microsatellite locus shows whether the individual is a homozygote or a heterozygote
  • Will only ever show one or two bands (single locus), so much easier to read than multi-locus minisatellite data

^This replaced DNA fingerprinting for identifying an individual
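
A minimal sketch of reading a single microsatellite locus, assuming the band sizes (in bp) have already been measured for one diploid individual. The function name `call_genotype` and the example sizes are made up for illustration.

```python
def call_genotype(band_sizes):
    """Classify a single-locus diploid genotype from PCR product sizes (bp)."""
    alleles = sorted(set(band_sizes))
    if len(alleles) == 1:
        return "homozygote", alleles
    if len(alleles) == 2:
        return "heterozygote", alleles
    # More than two distinct bands cannot come from one diploid locus.
    raise ValueError("expected one or two bands for a single locus")


print(call_genotype([152, 152]))   # one band: both alleles the same length
print(call_genotype([152, 160]))   # two bands: alleles differ in repeat number
```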

9
Q

DNA sequencing by chain termination (1970’s)
AKA Sanger sequencing

A

Invented by Fred Sanger (1970’s)

Required four reactions (one for each base)

1) All four nucleotides (dNTPs) go into each reaction
2) Each tube, run in a separate lane, also contains a proportion of the dideoxynucleotide (ddNTP) for one base, so that when the polymerase incorporates it at that base the copy terminates
3) Because only a proportion is dideoxynucleotide, fragments of multiple lengths are created
– e.g. a fragment terminating at every position where a T is present
(see diagram in notes)
Every band in the T lane marks a T, every band in the G lane marks a G, and so on.
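
An idealised sketch of the four-lane logic, assuming every possible termination product is present and resolved on the gel (the helper names and the example strand "GATTACA" are illustrative). Each lane holds the fragment lengths that end at one base; reading all bands from shortest to longest recovers the sequence of the newly synthesised strand.

```python
BASES = "ACGT"


def termination_lengths(new_strand, base):
    """Fragment lengths produced when the ddNTP for `base` stops the copy."""
    return [i + 1 for i, b in enumerate(new_strand) if b == base]


def read_gel(new_strand):
    """Build the four lanes, then read the bands from shortest to longest."""
    lanes = {base: termination_lengths(new_strand, base) for base in BASES}
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return lanes, "".join(base for _, base in bands)


lanes, sequence = read_gel("GATTACA")
print(lanes)
print(sequence)
```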

Technology has since been developed that detects a different dye for each base as the fragments pass a detector and records them as peaks

From this method you could read off a sequence of up to roughly 1000 bp at a time

This method is still used as it is accurate and often provides enough bp to work with

By aligning the DNA sequences of different individuals, mutations and variation can be observed
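
A short sketch of spotting variation in aligned sequences, assuming the sequences are already aligned to the same length with no gaps (the function `variant_sites` and the three example individuals are made up).

```python
def variant_sites(aligned):
    """Return {position: {name: base}} for columns where the sequences differ."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos in range(length):
        column = {name: aligned[name][pos] for name in names}
        if len(set(column.values())) > 1:
            sites[pos + 1] = column   # report 1-based positions
    return sites


individuals = {
    "ind1": "ACGTACGT",
    "ind2": "ACGTTCGT",
    "ind3": "ACGAACGT",
}

for position, bases in variant_sites(individuals).items():
    print(position, bases)
```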

10
Q

Progress towards sequencing a whole genome: AFLP

A

Initially attempted by amplified fragment length polymorphism (AFLP)
- Combined restriction fragment length polymorphism (RFLP) comparison with PCR
- Used primers and adaptors (with primer sites)
- DNA was cut with a particular restriction enzyme, adaptors were attached, and the fragments were then amplified using these primers
- Many different fragments created by the restriction of the DNA could be amplified and compared in this way

AFLP has now been replaced by automated sequencers that use dyes to identify the different bases

Although differences between individuals are observable, you cannot see each gene distinctly

11
Q

Progress towards sequencing a whole genome: DNA Microarrays

A

DNA Microarrays

Genomic tool to monitor thousands of loci at once

Microscopic spots of single-stranded DNA, attached to a solid surface called a ‘chip’

Microarray assays are based on hybridization of a single-stranded DNA labeled with a fluorescent tag to the complementary molecule attached to the chip

When each spot in a microarray holds a unique DNA molecule, the array can be used to detect the presence/absence or even the concentration of a particular type of DNA molecule

From the pattern of bound DNA you could observe variation across the whole genome

BUT STILL COULD NOT SHOW A COMPLETE GENOME

12
Q

Progress towards sequencing a whole genome:
Massively parallel pyrosequencing

A

Massively parallel pyrosequencing

Massively parallel - very many fragments can be sequenced at the same time

Pyrosequencing - a chemiluminescent reaction marks the polymerase addition of specific nucleotides – was used by the ‘454’ platform

Next Generation Sequencing
In pyrosequencing, DNA fragments were bound to a substratum, and the incorporation of nucleotides by the polymerase was marked by the emission of light (generated by luciferase). The technology has now been discontinued.

See YouTube Video: http://www.youtube.com/watch?v=nFfgWGFeOaA

13
Q

Progress towards sequencing a whole genome:
Illumina (sequencing by synthesis)

A

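– a modification of the massively parallel approach used in pyrosequencing, with fluorescently labelled nucleotides instead of light from luciferase
– DNA is broken up into small fragments, usually by sound (sonication), and ligase is used to attach a known sequence (adapter) onto them
- provides short reads (125-250 bp), but a single run can now produce a terabase of data.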

Illumina Sequencing by Synthesis:

Step 1: Build a library by fragmenting DNA, ligating on ‘adapters’ and carefully checking the quality and concentration

Step 2: Anneal fragments onto the Illumina ‘flow cell’. This has matching adaptor sequences already attached.

Step 3: Amplify individual fragments into clusters of copies of the same fragment.

Step 4: Sequence using fluorescently labelled nucleotides on all copies all at the same time (‘massively parallel’).

Step 5: Separate out individual sample sequences bioinformatically using the known sequences (‘barcodes’) built into the adapters.

From this, short sequence reads (short segments) are assembled on the computer where they overlap. Assembly is not very precise, but it can be used to create continuous segments of DNA of 20,000-30,000 bp.
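
A toy sketch of assembling reads where they overlap, assuming error-free reads from one strand and a simple greedy merge (real assemblers use graph-based methods and handle errors and repeats; `overlap`, `greedy_assemble` and the reads here are illustrative only).

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break   # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


short_reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(short_reads))
```

Reads that share no overlap stay as separate contigs, which is one reason short-read assemblies end up as many continuous segments rather than whole chromosomes.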

bioinformatics workshop:
http://www.slideshare.net/LutzFr/bioinformatics-workshop-sept-2014

useful 4 minute summary:
https://www.youtube.com/watch?v=HMyCqWhwB8E

14
Q

Progress towards sequencing a whole genome:
Restriction site associated DNA sequencing (RADseq)

A

Restriction site associated DNA sequencing (RADseq)

DNA is cut up with restriction enzymes and a library is built from the resulting fragments

but only fragments that have different restriction sites at each end are used

So the sequence data come from a subset of fragments of the genome

Up to 20,000 loci can be sequenced in this way

Relatively fast and simple

15
Q

Long reads

A

Linked read method - DNA is hybridised onto beads and collected, and the short reads are associated by cluster – up to 100,000 bp this way.

16
Q

PacBio (single molecule real-time)

A

PacBio (single molecule real-time)

provides long reads (up to about 10 kb or so), but relatively low accuracy per read.

PacBio uses SMRT (single molecule real-time) sequencing to generate long reads.

Sequencing takes place on an immobilised polymerase in a ‘waveguide’, incorporating fluorescently labelled nucleotides which generate detectable light of different wavelengths as they are incorporated.

See: http://www.clpmag.com/2017/04/smrt-sequencing/

see notes for Figure: A step-wise illustration of single-molecule, real-time (SMRT) sequencing: (1) fluorescent phospholinked labeled nucleotides are introduced into the zero-mode waveguide; (2) the base being incorporated is held in the detection volume for tens of milliseconds, producing a bright flash of light; (3) the phosphate chain is cleaved, releasing the attached dye molecule; (4–5) the process repeats.

17
Q

Oxford nanopore sequencing (by ‘threading’ very long reads through nanopores)

A

Each time a base passes through the nanopore it generates an electric pulse and the amplitude identifies the base

Used in hospitals for diagnostics

Initially quite inaccurate; now, with new technology, down to ~1% error (an update to the diagram below – now similar to Illumina)

18
Q

DNA sequencing methods in order of accuracy

A

Sanger sequencing: a single run has an error rate as low as 0.1%

Illumina sequencing: ~1-2% error for a single read; gains accuracy through replication

PacBio sequencing: ~11% error for a single read; gains accuracy through replication

Early Oxford Nanopore sequencing: ~10-15% error for a single read; accuracy greatly improved very recently with new chemistry and programming.
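
A simple sketch of how replication improves accuracy, assuming several equal-length reads of the same region that are already aligned, each with an independent error (the reads and the `consensus` helper are made up; real pipelines also weigh base quality scores). Taking the majority base at each position outvotes the single-read errors.

```python
from collections import Counter


def consensus(reads):
    """Majority base at each position across aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


true_sequence = "ACGTACGT"
noisy_reads = [
    "ACGTACGT",
    "ACGTACGA",   # error at the last base
    "ACCTACGT",   # error at the third base
    "ACGTACGT",
    "TCGTACGT",   # error at the first base
]

print(consensus(noisy_reads) == true_sequence)
```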

19
Q

In 2001 a human genome was sequenced

A

Genome sequencing projects are expanding rapidly in number, including a very broad range of species

see:
https://www.ncbi.nlm.nih.gov/genome

See: https://www.earthbiogenome.org/ and www.pnas.org/cgi/doi/10.1073/pnas.1720115115

20
Q

Programmes used to analyse genomes on Durham’s supercomputer ‘Hamilton’:

A

Samtools
STACKS
GATK
Durham HPC ‘Hamilton’
Linux
Plink
Bamtools
Bcftools
VCFtools
Bowtie2

21
Q

Summary

A

1) Gel electrophoresis of proteins – this was the first method to permit large-scale screening of natural populations – led to the unexpected discovery that there was substantial variation in natural populations at the molecular level. (1960’s – present).

2) DNA extraction and electrophoresis – permitted the analysis of DNA directly for the first time, through the use of restriction enzymes and DNA probes. (1970’s – present).

3) DNA sequencing – The chain termination method is the most direct and efficient method and is the standard today, but sequencing by other methods began in the early 1970’s. This became much easier with PCR.

4) PCR – this changed everything, because it facilitated the analysis of specific genes without the need for cloning. This method allowed the analysis of microsatellite DNA markers, which became a standard in population genetic studies. (1985 – present).

5) AFLP, microarrays, next generation sequencing – Methods to access information across the genome have been developed, and this is the future – the challenge is to work with all those data (bioinformatics).

22
Q

Implications for the study of evolution

A

1) Gel electrophoresis of proteins reflected diversity at coding genes, and provided the first indication of how much natural variation there is – the raw material for evolution by natural selection.

2) DNA extraction and electrophoresis allowed higher resolution – no longer limited to changes reflected in the electrophoretic properties of proteins, but limited to variation revealed at restriction enzyme cutting sites.

3) DNA sequencing revealed how much variation there really is – at ‘third position synonymous’ sites within coding regions, in non-coding regions, in the genomes themselves, and also permitted the investigation of how genes function.

4) PCR made all aspects of the work easier, because specific regions of the DNA could be targeted, amplified, and analysed by DNA sequencing or other means.

5) AFLP, microarrays, and especially next generation sequencing have permitted variation across the genome to be explored, and especially facilitated the investigation of evolution at both ‘neutral’ and functional components of the genome.

23
Q

The future

A

The future:
Telomere to Telomere sequencing of complete genomes

Excellent coverage of long read sequence data.

Also, methods like ‘Hi-C’, which capture and link chromosomes prior to sequencing and thereby assess chromosome structure.

(see diagrams in notes)

Pangenomes and understanding the importance of structural variation in genomes.