Bacterial Genomics Flashcards
How do you generate a chromosomal library?
- Extract chromosomal DNA from the strain
- Fragment DNA by restriction digestion (each fragment may contain a single gene, multiple genes, or part of a gene)
- Each fragment cloned into a vector (plasmid) [pSUM36 in example]
- Each plasmid transformed into bacteria, which are plated on medium (generally agar) so that each transformant grows into a separate colony
- Library = collection of colonies, each carrying a different piece of that strain's genome
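The library-building steps above can be sketched as a toy Python simulation. It assumes digestion with EcoRI (which cuts G^AATTC); the enzyme, the test sequence and the vector name `pVECTOR` are hypothetical illustrations, not the ones used in the example.

```python
# Toy model of a chromosomal library: cut a "chromosome" at every EcoRI site
# and treat each fragment as the insert of one clone (one colony).
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts between G and AATTC


def digest(dna: str, site: str = ECORI_SITE, offset: int = CUT_OFFSET) -> list[str]:
    """Return the fragments produced by cutting at every occurrence of `site`."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return [f for f in fragments if f]  # drop empty pieces


def build_library(dna: str, vector: str = "pVECTOR") -> list[dict]:
    """Each entry represents one colony: a (hypothetical) vector carrying one fragment."""
    return [{"colony": i, "vector": vector, "insert": frag}
            for i, frag in enumerate(digest(dna))]
```

Joining all inserts back together recovers the original sequence, which reflects the idea that the library as a whole represents the entire genome.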
How was a chromosomal library used in the Mycobacterium fortuitum study?
- A fragment carrying the mutated (resistance) gene transfers resistance to the vector it is cloned into
- The plasmid then transfers resistance to the colony formed after transformation
- Library could then be plated on strep+ selective media with a lethal conc. of strep
o Only resistant clones (carrying the resistance gene) are able to grow
- Select the colonies that grow and isolate the plasmids from their cells [pAC5, pAC6]
o At this point you have isolated a few DNA fragments that contain a gene conferring Strep resistance, but each fragment may carry several genes, so you need to determine which one is responsible
- Sequence each fragment (Sanger Sequencing) and find the overlapping sequence shared by the fragments (= the position of the resistance gene)
- Found a 2.5kb region with 3 complete (full-length) genes [orfB, orfC, orfD]
- Researchers cut out 1kb fragment (orfC) and cloned into plasmid pSAN26
- Original 2.5kb fragment (all 3 genes) cloned into plasmid pSAN19
- Transformed both into bacteria on strep+ medium and measured the conc. of STR at which bacterial growth was stopped.
- Both transformants survived up to the same conc., showing that orfC was the gene responsible (orfC alone gave the same result as all 3 genes together)
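The "find overlapping sequences" step can be illustrated with a longest-common-substring search between two insert sequences. This sketch assumes the inserts (e.g. from pAC5 and pAC6) are already sequenced and that the shared region is an exact match; the function name and test sequences are hypothetical, and real analyses use alignment tools that tolerate sequencing errors.

```python
def longest_shared_region(a: str, b: str) -> str:
    """Longest common substring of a and b (dynamic programming, O(len(a) * len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j] = length of common run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run along the diagonal
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

The region returned is the stretch present in both resistant clones, i.e. the candidate location of the resistance gene.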
Steps in Sanger Sequencing
- DNA is digested and cloned into plasmids
- Extracted plasmid DNA is incubated with mixture of DNA Primers, Free bases, DNA Polymerase and ‘Terminator Bases’ (di-deoxynucleotides which are fluorescently labelled).
a. Each di-deoxynucleotide base is labelled with a different color (e.g. A = Green etc.)
- Mixture heated to 96 °C – causing the DNA strands to separate (denature)
- Cooled to 50 °C – allows DNA primer to bind to plasmid DNA @ start of insert DNA
- Temp increased to 60 °C – DNA Polymerase binds to primer and adds bases until terminator base is added.
- Everything then reheated to 96 °C to separate new strand from original strand
- Rinse and Repeat – Forms fragments of every size, each ending with a (fluorescently labelled) terminator base.
- Fragments separated by size (length) via electrophoresis
a. Capillary tube lowered into each well of plate and charge is applied which draws DNA through the porous gel in the tube (smaller go faster)
b. At the end of the capillary is a laser which causes the terminator bases to fluoresce
c. Color of each base is detected by a camera and recorded.
- Because (assuming enough chain-termination events occurred) fragments ending at every position exist, the order in which terminator colors reach the end of the capillary tube matches the order of bases downstream of the DNA primer [i.e. the sequence of the insert DNA].
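The Sanger steps can be mimicked with a small simulation under simplifying assumptions: the template is written 3'-to-5' from the primer, so the new strand is its base-by-base complement; each added base has a fixed chance (`p_ddntp`, a hypothetical parameter) of being a terminator; and sorting fragments by length stands in for capillary electrophoresis.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_once(template_3to5: str, p_ddntp: float, rng: random.Random) -> str:
    """One polymerase extension from the primer: add complementary bases until a
    di-deoxy terminator is incorporated (or the template runs out)."""
    new_strand = []
    for base in template_3to5:
        new_strand.append(COMPLEMENT[base])
        if rng.random() < p_ddntp:  # a ddNTP was added: no 3'-OH, so the chain stops
            break
    return "".join(new_strand)


def sanger_read(template_3to5: str, cycles: int = 5000,
                p_ddntp: float = 0.05, seed: int = 0) -> str:
    """Collect many terminated fragments, then read terminal bases shortest-first."""
    rng = random.Random(seed)
    terminal_by_length = {}  # fragment length -> its (labelled) terminal base
    for _ in range(cycles):
        frag = extend_once(template_3to5, p_ddntp, rng)
        terminal_by_length[len(frag)] = frag[-1]
    # Electrophoresis: shorter fragments reach the detector first.
    return "".join(terminal_by_length[n] for n in sorted(terminal_by_length))
```

If some fragment length never occurs (too few cycles, or a terminator probability that is too high for a long template), that base is missing from the read, which mirrors the caveat above about needing fragments of every size.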
Biochemistry of Sanger Sequencing
Relies on the presence of chain-terminating di-deoxynucleotides: once one is incorporated, DNA synthesis stops because no phosphodiester linkage can form between it and the next incoming base.
o The 3’ hydroxyl group of the nucleotide is replaced with H (di-deoxy = lacking the OH at both the 2’ and 3’ positions).
o Hydroxyl group required for phosphodiester linkage, so synthesis is stopped.
What are the limitations of Sanger Sequencing?
- Each read only covers about 300-1000 bp:
o Quality isn’t good in first 15-40bp (Primer binding)
o Sequence quality degrades after 700-900bp
- Relatively expensive
- Labor-intensive
- Bias against genes toxic to host
o [because of large insert size, full length genes could be included which would be expressed and kill host]
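Read quality is usually expressed as a Phred score, Q = -10·log10(P_error), so Q20 means a 1% chance that the base call is wrong. A minimal sketch of trimming the poor-quality start and end of a read, assuming per-base Phred scores are available and using an arbitrary Q20 cut-off (real trimming tools use more elaborate rules such as sliding windows):

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred score to an error probability: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Drop bases from both ends until a base with quality >= min_q is reached."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]
```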
What are the steps in Illumina Sequencing? (Just name)
- Library Preparation
- Cluster Generation
- Sequencing
- Data Analysis
Illumina Sample Prep/Library preparation
a. Genome fragmented
i. Done because the instruments can only deal with shorter fragments
b. Adaptors/linkers attached to ends of DNA fragments (2 different oligonucleotide adaptors, one on each end)
i. Are just small fragments of DNA with a known sequence
ii. Added so we can manipulate the fragments of DNA directly (attaching to flow cell, annealing of primers etc.)
c. Reduced-cycle amplification (a few PCR cycles) introduces additional motifs (e.g. sequencing binding site)
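Library preparation can be sketched as random fragmentation followed by attaching a different adaptor to each end. The adaptor sequences below are hypothetical placeholders, not real Illumina oligonucleotides, and cutting at random lengths is only a stand-in for physical or enzymatic fragmentation.

```python
import random

# Hypothetical adaptors: two different known sequences, one for each end.
ADAPTOR_A = "AAAAAA"
ADAPTOR_B = "CCCCCC"


def fragment(genome: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Break the genome into consecutive random-length pieces (the last may be shorter)."""
    pieces, i = [], 0
    while i < len(genome):
        n = rng.randint(min_len, max_len)  # inclusive bounds
        pieces.append(genome[i:i + n])
        i += n
    return pieces


def add_adaptors(pieces: list[str]) -> list[str]:
    """Attach adaptor A to the start and adaptor B to the end of every fragment."""
    return [ADAPTOR_A + p + ADAPTOR_B for p in pieces]
```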
Illumina Cluster Generation
a. Fragments added to flow cell (glass slide with lanes). Each lane is a channel coated with 2 types of oligonucleotides (complementary to the adaptors)
i. Called flow cell because different reagents are flowed over the surface during the reaction
b. Hybridization occurs by complementary binding between an oligo on the surface and the matching adaptor on the fragment.
i. Primer on flow cell is designed to be complementary to the adaptors
c. Bridge Amplification (Clonal amplification of fragment):
i. Strand bends over and hybridizes to the second type of oligo in the flow cell (complementary to the second adaptor on the fragment)
ii. Polymerases generate the complementary strand -> dsDNA bridge
iii. Bridge is denatured -> 2 single-stranded copies of the DNA molecule, tethered to 2 different primers on the flow cell (in opposite orientations)
d. Reverse strands are cleaved and washed away -> leaving only the forward facing strands
i. Done by cutting at a cleavable site present in only one type of surface oligo, so only the reverse strands are released and washed away
e. Results in many copies of the same piece of DNA grouped in one spot (a cluster) on the flow cell, which appears as one much stronger signal when fluorescently labelled bases are added
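The strand bookkeeping of bridge amplification can be modelled as repeated copying into the reverse complement, followed by removal of the reverse strands. This is an idealised sketch: it assumes every strand is copied in every cycle (real clusters stop growing at a limited size), and it removes reverse strands by comparing sequences, whereas on the instrument they are removed by cleavage at the surface.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA strand."""
    return s.translate(COMP)[::-1]


def bridge_amplify(fragment: str, cycles: int) -> list[str]:
    """Each cycle every tethered strand bridges to a neighbouring oligo and is copied,
    so the number of strands doubles (idealised exponential growth)."""
    strands = [fragment]
    for _ in range(cycles):
        strands = strands + [revcomp(s) for s in strands]
    return strands


def remove_reverse_strands(strands: list[str], forward: str) -> list[str]:
    """Keep only copies in the forward orientation."""
    return [s for s in strands if s == forward]
```

After n cycles this model gives 2^n strands, half forward and half reverse, so removing the reverse strands leaves 2^(n-1) identical forward copies in the cluster.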
Illumina Sequencing Step
a. Primer binds to oligonucleotide (adaptor) at sequencing binding site (Primer made to be complementary to adaptor)
b. Sequencing occurs with fluorescently labelled bases -> forms complementary fluorescent strand
i. Bases also modified so only one base can be added at a time. One fluorescent base is added to every strand in a cluster (it should be the same base for all copies) and that color is read; then the cycle is repeated.
ii. Then a chemical reaction step modifies the bases to allow extension of the chain so the next base can be added. {REVERSIBLE TERMINATION}
iii. i.e. after each base is added, the reaction is halted while the base in every cluster on the flow cell is read; the fluorescent signal is then removed and the termination reversed so the next base can be added.
c. Strand excited with laser, and color sequence is captured.
d. Called ‘Sequencing by synthesis’
e. Read product (fluorescent strand) is washed away
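Sequencing by synthesis with reversible terminators amounts to a loop of one incorporation, one image and one deblocking step per cycle. The sketch below assumes a simple four-colour scheme with hypothetical dye colours (only A = green comes from the notes), templates written 3'-to-5' from the sequencing primer, and no phasing or base-calling errors.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical four-colour scheme: one dye per base.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FROM_DYE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(cluster_templates: dict[str, str], n_cycles: int) -> dict[str, str]:
    """cluster_templates maps a cluster id to its template (3'->5' from the primer).
    Returns the base calls made for each cluster."""
    calls = {cid: [] for cid in cluster_templates}
    for cycle in range(n_cycles):
        # 1. Incorporate ONE labelled, reversibly terminated base in every cluster.
        image = {}
        for cid, template in cluster_templates.items():
            if cycle < len(template):
                incorporated = COMPLEMENT[template[cycle]]
                image[cid] = DYE[incorporated]  # 2. laser excitation + camera image
        # 3. Decode each cluster's colour into a base call.
        for cid, colour in image.items():
            calls[cid].append(BASE_FROM_DYE[colour])
        # 4. Dye cleavage and terminator removal are implicit: the next
        #    iteration simply extends by one more base.
    return {cid: "".join(bases) for cid, bases in calls.items()}
```

Because every copy in a cluster receives the same base each cycle, one colour per cluster per cycle is enough to build up the read.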
Illumina Data Analysis Step
a. Step 1: Genome assembly
b. Step 2: Alignment of reads to reference genome
c. De novo assembly: Assembling reads with no reference using only the overlapping sequences.
i. If you don’t have the reference genome to align fragments to
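De novo assembly from overlapping sequences can be illustrated with a greedy algorithm that keeps merging the two reads with the longest suffix-prefix overlap. This is a toy sketch assuming error-free reads and no repeats; real short-read assemblers typically use overlap graphs or de Bruijn graphs instead, and `min_len` is a hypothetical parameter.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no overlaps left: remaining contigs are the assembly
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```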
Illumina error rate and coverage
i. Higher error rate compared to Sanger sequencing
ii. Confidence depends on coverage [the number of different reads that overlap a given position]
iii. The higher the coverage in a particular region, the more confident you can be in the base call made there.
e. Doesn’t deal well with repeat regions:
i. Fragments are quite small -> if a sequence is repeated in multiple places in the genome, you can't tell which copy a read came from, because it maps equally well to every copy
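Coverage and the repeat problem can both be seen in a toy exact-match read mapper: each uniquely placed read adds 1 to the coverage of every base it spans, while a read that matches several places cannot be assigned. Exact matching is an assumption made for clarity; real aligners tolerate mismatches and gaps, and the function names are hypothetical.

```python
def find_all(reference: str, read: str) -> list[int]:
    """All start positions where read matches reference exactly (overlaps allowed)."""
    hits, pos = [], reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


def map_reads(reference: str, reads: list[str]) -> tuple[list[int], list[tuple[str, list[int]]]]:
    """Return per-base coverage from uniquely mapped reads, plus reads that map to >1 place."""
    coverage = [0] * len(reference)
    multi_mapped = []
    for read in reads:
        hits = find_all(reference, read)
        if len(hits) == 1:
            start = hits[0]
            for k in range(start, start + len(read)):
                coverage[k] += 1
        elif len(hits) > 1:
            multi_mapped.append((read, hits))  # repeat: origin is ambiguous
    return coverage, multi_mapped
```

In the test reference, the sequence GATTACA occurs twice, so a read of exactly that sequence is reported as multi-mapped instead of adding to coverage at either copy.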
What is genome annotation?
‘Determining the structural and functional properties of the genome’
Structural – genes, promoters, pseudogenes, untranslated regions etc.
Functional – What role do the structural features play?
Additional elements – Origin of replication, mobile elements, pathogenicity islands etc.
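A first step in structural annotation is locating open reading frames (ORFs), i.e. candidate genes. This sketch scans only the forward strand, treats only ATG as a start codon and uses a hypothetical minimum length; real bacterial gene finders also scan the reverse strand, accept alternative starts such as GTG and TTG, and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.
    `end` is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next start codon
    return orfs
```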
What methods of genome annotation are there?
- Manual curation
a. Most accurate but very slow
b. Person using their own knowledge to curate a genome
- Automated computational pipelines
a. Large amount of data available
b. Relies on the accurate functional annotation of genomes in databases or could result in the propagation of errors
Steps in Genome Annotation
- Structural annotation:
o Using info from outside the genome, from other organisms that are related and from the properties of the genome itself to determine the structural features.
- Functional Annotation:
o Underlying assumption: similar/conserved sequences share the same function because they are related by ancestry
o Homologue – 2 genes/proteins share the same ancestry
• Identified by looking for similar/conserved sequences
• Orthologue – homologues occurring in different species
• Paralogue – homologues arising from a gene duplication event (usually within the same genome)
What is global alignment?
Method of pairwise sequence alignment
- Matching as many positions as possible over the entire length
- Comparing annotated sequence from database to the query sequence (“new”)
- Best suited to similar sequences of roughly equal length
- Compares every position in one sequence with the corresponding position in the query sequence (inserting gaps where needed) and scores the matches.
- Not always useful: if the query sequence differs greatly in length from the annotated sequence (e.g. they share only one region), forcing an end-to-end alignment gives a poor result.
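Global alignment is classically computed with the Needleman-Wunsch dynamic-programming algorithm. The scores used here (+1 match, -1 mismatch, -1 per gap) are arbitrary assumptions; protein annotation would normally use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment score and one optimal alignment (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # match/mismatch
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a
    # Traceback from the bottom-right corner, so both full sequences are aligned.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Starting the traceback at the last cell of the matrix is what makes the alignment global: every position of both sequences must appear in the result, which is why very different lengths are penalised by long runs of gaps.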