Lecture 22 - Genomics and Bacterial Evolution Flashcards
Functional genomics
Working out how a gene's protein product works, from the genetic code and experimental data.
Study of a single genome
Genomic analysis
Study of several genomes
Comparative genomics
What did Fred Sanger initially sequence?
PhiX bacteriophage
Size of PhiX
~5.4 kb (5,386 nucleotides)
Sanger sequencing method 1) 2) 3) 4)
1) Break sequence of interest into fragments
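2) Set up the reaction with a primer, DNA polymerase, normal dNTPs and a small amount of dideoxynucleotides (ddNTPs), each ddNTP carrying its own dye. ddNTPs lack a 3’ OH group, so incorporating one terminates chain elongation
3) Run the fragments on a polyacrylamide gel, which separates them by size down to a single base pair
4) The dye colour at the end of each fragment is read in order of fragment size, giving the sequence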
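The four steps can be sketched as a toy model: a dye-labelled ddNTP can stop synthesis at any position, so the tube holds one fragment ending at every position, and sorting those fragments by size and reading their dye colours recovers the sequence. The dye colours and function names are illustrative assumptions, not a real dye set or instrument software.

```python
import random

# Illustrative dye colours, one per ddNTP (an assumption, not a real dye set)
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template_3to5):
    """Model the reaction tube for a template written 3'->5'.

    The new strand is complementary to the template. Because a dye-labelled
    ddNTP can be incorporated instead of a dNTP at any position, the tube
    ends up with one fragment stopping at every position. Each fragment is
    stored as (length, dye colour of its terminating ddNTP).
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    fragments = [(i + 1, DYE[new_strand[i]]) for i in range(len(new_strand))]
    random.shuffle(fragments)  # the tube holds an unordered mixture
    return fragments


def read_gel(fragments):
    """Separate fragments by size (single-base resolution), then read the dyes."""
    by_size = sorted(fragments)  # shortest fragment first
    return "".join(BASE_FOR_DYE[colour] for _, colour in by_size)


# New strand, read 5'->3'
print(read_gel(terminated_fragments("TACGGT")))
```

For the template TACGGT this prints ATGCCA, the complementary strand, whatever order the shuffle leaves the fragments in.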
Read size of sanger sequencing
~600bp
Read
The sequence obtained from a single DNA fragment by a particular sequencing method; its length is the read length
Read assembly
Reads are aligned and joined wherever their sequences overlap, giving a consensus sequence.
This forms a contig, a continuous sequence built from overlapping reads
Contig
A continuous consensus sequence built from reads whose sequences overlap
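Read assembly can be sketched with a simple greedy method: repeatedly merge the two pieces with the longest suffix/prefix overlap until no overlaps remain. This is a simplified stand-in chosen for illustration; real assemblers use more elaborate approaches (e.g. overlap or de Bruijn graphs), and the function names and minimum overlap of 3 bp are assumptions.

```python
def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of pieces with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            # No overlaps left: the remaining contigs are separated by gaps
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads: three overlap, the fourth matches nothing
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA", "TTTTTT"]))
```

The first three reads merge into one contig, ATGGCGTACCGGA; TTTTTT overlaps nothing, so it stays a separate contig with a gap between them.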
Gap
A stretch between contigs where the computer can’t find overlapping reads to join them
Why can gaps occur? 1) 2)
1) DNA polymerase can’t extend the sequence, e.g. through strong secondary structure or GC-rich regions
2) A repeated region is longer than the read length, so reads from inside it can’t be placed unambiguously
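The repeat problem can be shown by checking every place a read matches a genome. The 12 bp "genome" and its 8 bp repeat below are invented for illustration. A read no longer than the repeat matches both copies, while a read that also covers unique flanking sequence matches only once.

```python
def read_positions(genome, read):
    """Every start position where `read` matches `genome` exactly."""
    n = len(read)
    return [i for i in range(len(genome) - n + 1) if genome[i:i + n] == read]


# Hypothetical genome: the 8 bp repeat GATCCTAG occurs twice,
# flanked by different unique sequence each time.
REPEAT = "GATCCTAG"
genome = "AAACG" + REPEAT + "TTGCA" + REPEAT + "CCGTT"

# A read no longer than the repeat fits in two places, so its origin is ambiguous.
print(read_positions(genome, "GATCCTAG"))      # [5, 18]
# A longer read that spans the repeat plus unique flanks can be placed uniquely.
print(read_positions(genome, "CGGATCCTAGTT"))  # [3]
```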
Automated Sanger sequencing method
Capillary electrophoresis
Illumina sequencing 1) 2) 3) 4) 5) 6) 7) 8) 9)
1) Break DNA of interest into fragments
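2) Adaptors of known sequence are ligated to both ends of the dsDNA fragments
3) A glass flow cell is prepared, with oligonucleotides complementary to the adaptors attached to its surface
4) Single-stranded fragments hybridise to the complementary oligonucleotides on the surface
5) Unlabelled nucleotides and DNA polymerase are added. Each fragment bends over and hybridises to a nearby oligonucleotide, forming a bridge (bridge amplification)
6) DNA synthesis makes the bridges double stranded
7) Denaturation returns them to ssDNA
8) Repeated rounds of bridge amplification (a solid-phase PCR) build high-density clusters of identical DNA
9) Fluorescently tagged nucleotides are added, one base per cycle. Each incorporated base emits fluorescence, which is imaged and identifies the base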
Key difference between Sanger and Illumina
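In Sanger sequencing the dye-labelled ddNTP permanently terminates the chain; in Illumina sequencing synthesis continues on the same strand after a dye-tagged base is added.
The fluorescent dye (and the group blocking further extension) is cleaved off after the base is imaged, so it doesn’t interfere with further elongation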
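The cycle that makes this difference possible can be sketched for a single cluster: incorporate one labelled, blocked nucleotide, image it, then cleave the dye and block so the same strand can be extended next cycle. The "blocking group" is standard reversible-terminator chemistry; the dye colours and function names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Illustrative dye colour per base (an assumption, not a real instrument's scheme)
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}


def cluster_read(template_3to5, cycles):
    """Toy model of sequencing by synthesis on one cluster.

    Every cycle extends the SAME growing strand by exactly one base.
    """
    images = []  # one colour recorded per cycle
    for cycle in range(min(cycles, len(template_3to5))):
        # 1) The polymerase incorporates one dye-labelled nucleotide whose
        #    blocking group stops any further addition in this cycle.
        base = COMPLEMENT[template_3to5[cycle]]
        # 2) The flow cell is imaged; the dye colour identifies the base.
        images.append(DYE[base])
        # 3) Dye and block are cleaved off, so the next cycle extends this
        #    strand (a Sanger ddNTP would instead end synthesis for good).
    return "".join(BASE_FOR_DYE[colour] for colour in images)


print(cluster_read("TACGGT", cycles=4))  # ATGC
```

The number of cycles sets the read length, which is why Illumina read lengths are fixed per run (e.g. 2x150bp).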
MiSeq output per run
15Gb
NextSeq500 output per run
120Gb
HiSeq2500 output per run
1000Gb
MiSeq read number
25 million
NextSeq500 read number
400 million
HiSeq2500 read number
4000 million
MiSeq read length
2x300bp
NextSeq500 read length
2x150bp
HiSeq2500 read length
2x125bp
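The output figures above are consistent with output = number of reads x bases per read, if each "read" is counted as one read pair (cluster) giving 2 x the read length. A quick check using only the numbers on these cards; the function name is hypothetical.

```python
def run_output_gb(reads, read_length_bp):
    """Output per run in Gb = number of reads x bases per read / 1e9."""
    return reads * read_length_bp / 1e9


# Figures from the cards: reads (counted as read pairs) and 2 x read length
instruments = {
    "MiSeq": (25e6, 2 * 300),
    "NextSeq500": (400e6, 2 * 150),
    "HiSeq2500": (4000e6, 2 * 125),
}
for name, (reads, bases) in instruments.items():
    print(f"{name}: {run_output_gb(reads, bases):.0f} Gb")
```

This prints 15, 120 and 1000 Gb, matching the output-per-run cards.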
MiSeq time for run
~4 hours (shortest runs; 2x300bp runs take considerably longer)
Cheapest sequencing method per Mb
Illumina
PacificBio RS output per run
375Mb
PacificBio RS read number
~45,000
PacificBio RS read length
Over 20 kb
What is PacificBio RS? 1) 2) 3) 4) 5)
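1) Single-molecule real-time (SMRT) sequencing
2) DNA synthesis by an immobilised DNA polymerase
3) Phospholinked nucleotides: the dye is attached to the terminal phosphate, so light is emitted as the nucleotide is incorporated and the dye is released with the pyrophosphate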
4) No amplification
5) Under 180 minutes per run
PacificBio RS method 1) 2) 3) 4) 5)
1) Don’t fragment the DNA of interest too much (shorter fragments mean shorter reads)
2) Repair the fragment ends
3) Ligate adaptors to the DNA ends
4) The DNA is copied by a DNA polymerase fixed at the bottom of a zero-mode waveguide (ZMW) well
5) When a phospholinked nucleotide is incorporated, a light pulse is emitted and detected. Each base carries a different dye, which emits a different wavelength of light
Size of wells used in PacificBio RS
Zeptolitre volumes (1 zL = 10^-21 L)
Why is it better to not have an amplification stage in sequencing?
Not all DNA fragments are amplified equally (e.g. GC-rich regions often amplify poorly).
This biases coverage, so some regions are under-represented in the results
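Why small differences matter: copy number grows as N = N0 x (1 + E)^n, where E is the per-cycle efficiency and n the number of cycles, so a small difference in E compounds. The efficiencies and cycle number below are invented for illustration, not measured values.

```python
def pcr_copies(start_copies, efficiency, cycles):
    """Copies after n cycles: N = N0 * (1 + E) ** n, where 0 <= E <= 1."""
    return start_copies * (1 + efficiency) ** cycles


# Two hypothetical fragments, equally abundant before amplification
easy = pcr_copies(1000, efficiency=0.95, cycles=25)  # amplifies well
hard = pcr_copies(1000, efficiency=0.80, cycles=25)  # e.g. GC-rich, amplifies poorly
print(f"Over-representation after 25 cycles: {easy / hard:.1f}-fold")
```

Under these assumed values the well-amplified fragment ends up roughly 7.4-fold over-represented, even though both started at equal amounts.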
What are long read lengths useful for?
Sequencing complex DNA such as repeat regions: a read longer than the repeat spans it completely, helping to close gaps in the assembly.
Sanger output per run
9600bp
Read number of sanger
96
Sanger run time
3 hours
PacificBio RS run time
30 minutes - 3 hours
Sanger cost per Mb
$2400
Illumina cost per Mb
$0.15
PacificBio RS cost per Mb
$1
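The cost-per-Mb figures translate into project costs as genome size x coverage x cost per Mb. The 5 Mb genome and 30x coverage below are hypothetical, and the sketch ignores library preparation and other fixed costs.

```python
# Cost per Mb taken from the cards above
COST_PER_MB = {"Sanger": 2400.0, "Illumina": 0.15, "PacificBio RS": 1.0}


def sequencing_cost(genome_mb, coverage, cost_per_mb):
    """Total cost = genome size (Mb) x coverage (fold) x cost per Mb sequenced."""
    return genome_mb * coverage * cost_per_mb


# Hypothetical example: a 5 Mb bacterial genome sequenced to 30x coverage
for method, price in COST_PER_MB.items():
    print(f"{method}: ${sequencing_cost(5, 30, price):,.2f}")
```

Under these assumptions Sanger comes to $360,000, Illumina to $22.50 and PacificBio RS to $150.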
Genome annotation
The process of locating genes in a genome sequence
How to annotate a genome 1) 2)
1) Identify open reading frames
2) Assign gene function experimentally, or by comparison with known genes
Open reading frame
Over 100 codons, starting with a start codon (usually ATG) and uninterrupted by a stop codon.
Check for an obvious ribosome-binding site upstream (5’ end) and a terminator sequence downstream (3’ end)
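A minimal ORF scan following this definition: in each of the three forward reading frames, find an ATG and the next in-frame stop codon, and keep the stretch if it has at least 100 codons. Assumptions: only the forward strand and only ATG starts are checked (real annotation also scans the reverse strand, and bacteria also use GTG and TTG), and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs from ATG to the next in-frame stop codon.

    Returns (start, end) as 0-based, end-exclusive coordinates that include
    the stop codon; only ORFs with at least min_codons codons are kept.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons from ATG up to the stop
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical sequence with a 6-codon ORF (ATG + 5 x GCT, then TAA)
seq = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(seq, min_codons=5))  # [(2, 23)]
print(find_orfs(seq))                # [] - too short for the 100-codon cut-off
```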
Bioinformatics 1) 2) 3)
1) Analysis of genomes using computers
2) Generates information on genome structure, content and arrangement
3) Uses annotation to locate genes in a newly sequenced genome
Significance of an open reading frame
Presumed to encode a protein
BLAST
Basic Local Alignment Search Tool
A tool used in bioinformatics
BLAST
BLAST role
Compares primary sequences (DNA or protein) from different genomes to find regions of local similarity
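The idea of a local alignment score can be sketched with the Smith-Waterman dynamic programming method. This is a swapped-in technique for illustration: BLAST itself is a faster heuristic that finds short word matches and extends them, approximating this kind of score. The scoring values (match +2, mismatch -1, gap -2) are assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, so an alignment can start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical sequences sharing the local region ACGTACG
print(local_alignment_score("TTTTACGTACGTTTT", "GGGACGTACGGGG"))
```

The shared 7 bp region scores 7 x 2 = 14, while the unrelated flanks contribute nothing, because a local alignment is free to ignore them.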
Type of sequencing used by both Illumina and PacificBio RS
Sequencing by synthesis