Bacterial Genomics Flashcards by Matthew Clothier

How do you generate a chromosomal library?

Extract chromosomal DNA from the strain.
Fragment the DNA by digestion (each fragment can contain a single gene, multiple genes, or part of a gene).
Clone each fragment into a vector (plasmid) [pSUM36 in the example].
Transform each plasmid into bacteria and plate on medium (generally agar) so that each transformant grows into a separate colony.
Library = collection of colonies, each carrying a different piece of the genome of that strain.

How was a chromosomal library used in the Mycobacterium fortuitum study?

A gene carrying a resistance mutation confers resistance on the plasmid (vector) that carries it.
That plasmid confers resistance on the colony formed after transformation.
The library can then be plated on strep+ selective medium containing a lethal conc. of streptomycin.
o Only resistant clones (carrying the resistance gene) are able to grow
Select the colonies that grow and isolate the plasmids from their cells. [pAC5, pAC6]
o At this point you have isolated a few fragments of DNA that contain a gene conferring streptomycin resistance, but each fragment may carry multiple genes, so you need to determine which one is responsible
Sequence each fragment (Sanger sequencing) and find the region where the fragments overlap ( = the position of the resistance gene).
Found a 2.5 kb region with 3 complete (full-length) genes [orfB, orfC, orfD].
Researchers cut out a 1 kb fragment (orfC) and cloned it into plasmid pSAN26.
The original 2.5 kb fragment (all 3 genes) was cloned into plasmid pSAN19.
Both plasmids were transformed into bacteria, which were grown on strep+ medium, and the conc. of streptomycin at which growth stopped was measured.
Both transformants survived up to the same conc., showing that orfC was the gene responsible (orfC alone gave the same result as all 3 genes together).
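
The overlap step can be illustrated with a toy example. This is only a sketch: the two insert sequences are made up, and a simple longest-common-substring search stands in for whatever comparison the researchers actually used.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest stretch of bases present in both sequences."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical inserts from two resistant clones (not real pAC5/pAC6 data)
insert_a = "TTGACCGATGGCTAAGCTTCGA"
insert_b = "GGATGGCTAAGCTTCGATCCAT"
shared = longest_common_substring(insert_a, insert_b)
print(shared)
```

The shared region is where the resistance gene must lie, since it is the only DNA both resistant clones have in common.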

Steps in Sanger Sequencing

DNA is digested and cloned into plasmids.
Extracted plasmid DNA is incubated with a mixture of DNA primers, free nucleotides (dNTPs), DNA polymerase and ‘terminator bases’ (di-deoxynucleotides, which are fluorescently labelled).
a. Each di-deoxynucleotide base is labelled with a different color (e.g. A = green etc.)
Mixture heated to 96 °C – causes the DNA to denature (the strands separate).
Cooled to 50 °C – allows the DNA primer to bind to the plasmid DNA at the start of the insert DNA.
Temp increased to 60 °C – DNA polymerase binds at the primer and adds bases until a terminator base is incorporated.
Everything then reheated to 96 °C to separate the new strand from the original strand.
Rinse and repeat – forms fragments of every length, each ending with a terminal (fluorescently labelled) base.
Fragments separated by size (length) via electrophoresis
a. A capillary tube is lowered into each well of the plate and a voltage is applied, which draws DNA through the porous gel in the tube (smaller fragments travel faster)
b. At the end of the capillary is a laser, which causes the terminator bases to fluoresce
c. The color of each base is detected by a camera and recorded.
Because fragments of every length exist (assuming enough cycles ran for termination to occur at every position), the order in which terminal bases reach the end of the capillary tube is the same as the order of bases extending from the DNA primer [i.e. the sequence of the insert DNA].
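
Why reading terminal bases in order of fragment length gives the sequence can be shown with a small simulation. This is a minimal sketch, not a model of real chemistry: the template, the number of molecules and the termination probability are all illustrative assumptions.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_read(template: str, n_molecules: int = 5000,
                p_terminate: float = 0.1, seed: int = 0) -> str:
    """Simulate chain termination, then read terminal bases by fragment length.

    `template` is written in the order the polymerase copies it.
    """
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(n_molecules):
        for position, template_base in enumerate(template):
            new_base = COMPLEMENT[template_base]
            if rng.random() < p_terminate:
                # A labelled terminator was incorporated: extension stops here
                terminal_base_by_length[position + 1] = new_base
                break
    # Electrophoresis: the shortest fragments reach the detector first.
    # "N" marks any length for which no terminated fragment was produced.
    return "".join(terminal_base_by_length.get(length, "N")
                   for length in range(1, len(template) + 1))


print(sanger_read("TACGGATCCGTA"))
```

With too few molecules some fragment lengths would be missing, which is the toy equivalent of a gap in the read.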

Biochemistry of Sanger Sequencing

Relies on presence of chain terminating di-deoxynucleotides which interrupt DNA synthesis by blocking formation of phosphodiester linkages between incoming bases and the new strand.
o Replaces 3’ hydroxyl group of nucleotide with H (di-deoxy…).
o Hydroxyl group required for phosphodiester linkage, so synthesis is stopped.

What are the limitations of Sanger Sequencing?

Can only sequence between 300-1000 bp:
o Quality isn’t good in first 15-40bp (Primer binding)
o Sequence quality degrades after 700-900bp.
Relatively expensive.
Labor-intensive
Bias against genes toxic to host
o [because of large insert size, full length genes could be included which would be expressed and kill host]

What are the steps in Illumina Sequencing? (Just name)

Library Preparation
Cluster Generation
Sequencing
Data Analysis

Illumina Sample Prep/Library preparation

a. Genome fragmented
i. Done because the instruments can only deal with shorter fragments

b. Adaptors/linkers attached to ends of DNA fragments (2 different oligonucleotide adaptors, one on each end)
i. Are just small fragments of DNA with a known sequence
ii. Added so we can manipulate the fragments of DNA directly (attaching to flow cell, annealing of primers etc.)

c. Reduced cycle amplification introduces additional motifs (e.g. the sequencing primer binding site)

Illumina Cluster Generation

a. Fragments added to flow cell (glass slide with lanes). Each lane is a channel coated with 2 types of oligonucleotides (complementary to the adaptors)
i. Called flow cell because different reagents are flowed over the surface during the reaction

b. Hybridization occurs by complementary binding between an oligo on the surface and the complementary adaptor on the fragment.
i. Primer on flow cell is designed to be complementary to the adaptors

c. Bridge Amplification (clonal amplification of fragment):
i. Strand bends over and hybridizes to the second type of oligo on the flow cell (complementary to the second adaptor on the fragment)
ii. Polymerases generate complementary strand -> dsDNA bridge
iii. Bridge is denatured -> 2 single-stranded copies of the DNA molecule, tethered to 2 different primers on the flow cell (in opposite orientations)

d. Reverse strands are cleaved and washed away -> leaving only the forward-facing strands
i. Cleavage is selective: only one of the two flow-cell oligo types carries a cleavable site, so only the strands attached through that oligo are released

e. Results in many copies of the same piece of DNA sitting together in one cluster on the flow cell – appears as one much stronger signal (when fluorescing)

Illumina Sequencing Step

a. Primer binds to the adaptor at the sequencing primer binding site (primer made to be complementary to the adaptor)

b. Sequencing occurs with fluorescently labelled bases -> forms complementary fluorescent strand
i. Bases are also modified so only one base can be added at a time. One fluorescent base is added to each strand in the same cluster (should be the same base) and then that color is read; then repeated.
ii. A chemical reaction step then removes the block on the base so the chain can be extended by the next base. {REVERSIBLE TERMINATION}
iii. i.e. after each base is added, extension is terminated while every cluster on the flow cell is read; then the fluorescent signal is removed, termination is reversed, and the next base is added.

c. Clusters excited with laser, and color sequence is captured.

d. Called ‘Sequencing by synthesis’
e. Read product (fluorescent strand) is washed away
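
The one-base-per-cycle logic can be sketched as a loop. This is a toy model under stated assumptions: a cluster is represented as a list of template strings, and the strongest color in each cycle is taken as the base call.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(cluster_templates: list[str], n_cycles: int) -> str:
    """Call one base per cycle from the strongest color in the cluster."""
    read = []
    for cycle in range(n_cycles):
        # Reversible termination: every strand is extended by exactly one
        # labelled base in this cycle, so all strands stay in step.
        colors = Counter(COMPLEMENT[t[cycle]] for t in cluster_templates)
        base, _count = colors.most_common(1)[0]
        read.append(base)
        # In the instrument, dye and block are now removed before the next cycle.
    return "".join(read)


# Hypothetical cluster: nine correct copies and one copy with an error
cluster = ["ACGTTG"] * 9 + ["ACCTTG"]
print(sequence_by_synthesis(cluster, n_cycles=6))
```

Because the cluster is clonal, a single faulty copy is outvoted by the rest of the signal.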

Illumina Data Analysis Step

a. Step 1: Genome assembly
b. Step 2: Alignment of reads to reference genome
c. De novo assembly: assembling reads with no reference, using only the overlapping sequences between reads.
i. Used when there is no reference genome to align reads to
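
De novo assembly from overlaps can be sketched with a greedy merge. This is a minimal illustration on made-up reads, assuming error-free reads and a minimum overlap of 3 bases; real assemblers use far more robust graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]  # hypothetical reads
print(greedy_assemble(reads))
```

If the reads do not overlap enough, the function returns several contigs instead of one.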

Illumina error rate and coverage

i. Higher error rate compared to Sanger sequencing
ii. Confidence depends on coverage [the number of different reads that overlap the same position]
iii. The higher the coverage in a particular region, the more confident you can be in the base assignment that is given.

e. Doesn’t deal well with repeat regions:
i. Reads are quite short -> if a sequence is repeated in multiple places in the genome you can’t tell which copy a read came from, because reads from every copy map equally well to the same place
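
How coverage supports a base call can be sketched as a pileup. The reads and their alignment start positions below are invented for illustration, and a simple majority vote stands in for a real variant caller.

```python
from collections import Counter


def pileup(reference_length: int, aligned_reads: list[tuple[int, str]]) -> list[Counter]:
    """Count the bases seen at each reference position."""
    columns = [Counter() for _ in range(reference_length)]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    return columns


def consensus_with_depth(columns: list[Counter]) -> list[tuple[str, int]]:
    """Majority base and coverage (depth) at each position; 'N' if uncovered."""
    calls = []
    for column in columns:
        depth = sum(column.values())
        if depth == 0:
            calls.append(("N", 0))
        else:
            base, _count = column.most_common(1)[0]
            calls.append((base, depth))
    return calls


# (start position, read); the third read carries an error at position 3
aligned = [(0, "ACGTAC"), (2, "GTACGA"), (2, "GAACGA"), (4, "ACGATT")]
calls = consensus_with_depth(pileup(10, aligned))
print("".join(base for base, _ in calls))
print([depth for _, depth in calls])
```

At position 3 the error is outvoted because three reads cover it; at the ends, where depth is 1, an error would go undetected.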

What is genome annotation?

‘Determining the structural and functional properties of the genome’

Structural – genes, promoters, pseudogenes, untranslated regions etc.
Functional – What role do the structural features play?
Additional elements – Origin of replication, mobile elements, pathogenicity islands etc.

What methods of genome annotation are there?

Manual curation
a. Most accurate but very slow
b. Person using knowledge they have to curate a genome
Automated computational pipelines
a. Large amount of data available
b. Relies on the accurate functional annotation of genomes in databases or could result in the propagation of errors

Steps in Genome Annotation

Structural annotation:
o Uses information from outside the genome, from related organisms, and from the properties of the genome itself to determine the structural features.
Functional Annotation:
o Underlying assumption: similar/conserved sequences share the same function because they are related by ancestry
o Homologue – 2 genes/proteins that share the same ancestry
• Identified by looking for similar/conserved sequences
• Orthologue – homologues occurring in different species (separated by a speciation event)
• Paralogue – homologues arising from a duplication event

What is global alignment?

Method of pairwise sequence alignment

Matching as many positions as possible over the entire length
Comparing annotated sequence from database to the query sequence (“new”)
Looks for similar sequences of roughly equal length
Compares every position in one sequence to the corresponding position in the query sequence and finds matches.
Not always useful because the query sequence might differ greatly in length from the annotated sequence, in which case an end-to-end alignment is not meaningful.
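
End-to-end alignment can be sketched with the classic Needleman-Wunsch dynamic programme. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, and only the score is computed, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score for aligning a and b over their full length."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per base
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATCACA"))
```

Every base of both sequences has to be accounted for, so unequal lengths are paid for in gap penalties.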

What is local alignment?

Method of pairwise sequence alignment

Focus on the best matching regions of the sequence
Regions of similarity, need not be the same length
Match region to region rather than end to end
Useful for looking for similar genes in different organisms
o Genomes may have very little similarity but a specific region may still be highly similar and related.
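
Region-to-region matching can be sketched with the Smith-Waterman dynamic programme. The scoring values (match +2, mismatch -1, gap -2) and the two sequences are illustrative assumptions; only the best local score is returned.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score of the best-matching region between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start anywhere in either sequence
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two otherwise unrelated sequences sharing one region (GATTACA)
print(local_alignment_score("TTTTGATTACATTTT", "CCCGATTACACCC"))
```

The flanking sequence that does not match is simply ignored rather than penalised.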

What is BLAST?

Basic Local Alignment Search Tool
Common tool used for genome annotation
Compares a ‘Query sequence’ against a database and finds similarity
Web based interface
Output is numerical value – measure of how similar your query sequence/region of query sequence is to a sequence in the database
Not only used for genome annotation – often used to identify regions of similarity in query sequences.

Shuttle Plasmids Genomic Information

One of the 2 ways to transform mycobacterial cells with DNA (CaCl2 doesn’t work)
o The other is using an electric pulse
Plasmids that have been modified by incorporating elements from mycobacterial viruses (which specifically infect mycobacteria)
The portion of the mycobacteriophage genome required for replication and packaging is cloned into an E. coli vector with an E. coli ORI
o Allows phagemid to be manipulated in E. coli
o In E. coli viral particles cannot be made – allows researcher to make lots of phagemids with this viral DNA
The phagemid is then introduced into a mycobacterial host, where it can produce viral particles that contain the phagemid DNA.

Conditionally Replicating Mycobacteriophages Genomic Information

DNA modified so we can stall replication at non-permissive temps.
Phagemids incubated with mycobacterial strain at different temps:
o 30 °C allows replication of the phagemid, so viral particles are produced
o The bacteria lyse and viral particles are released
Cultured at 42 °C
o Prevents replication of the phagemid

Phage Transposon Mutagenesis

Transposon inserted into the phagemid
Himar1: eukaryotic transposon – most used in these experiments
o Has transposase and inverted repeats
o When the transposase gene is expressed, it produces an enzyme which binds to the inverted repeats and introduces ds breaks
o Following excision, the transposon can then be inserted at a recognition site (TA bases for this transposon).
Process:
o Phagemid with transposon introduced into mycobacterial host (M. smegmatis) and cultured at 42 °C -> no phagemid replication, no viral particles produced. Because the cell isn’t lysed we can see the effect of transposon insertion
o Expression of transposase causes transposon to jump from phagemid into chromosome
• Transposon can jump into any TA site within the chromosome – disrupting whichever gene it jumps into
o Results in each mycobacterial colony on the plate representing a different mutant (a different gene is interrupted by transposon insertion)
o Forms a transposon mutant library
• Selected using an antibiotic resistance marker carried on the transposon.
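
The set of possible insertion sites can be sketched by scanning for TA dinucleotides. The chromosome string is a toy example, and picking one site uniformly at random per mutant is a simplifying assumption.

```python
import random


def ta_sites(chromosome: str) -> list[int]:
    """0-based positions of every TA dinucleotide (candidate insertion sites)."""
    return [i for i in range(len(chromosome) - 1) if chromosome[i:i + 2] == "TA"]


def simulate_library(chromosome: str, n_mutants: int, seed: int = 0) -> list[int]:
    """Give each mutant one insertion at a randomly chosen TA site."""
    rng = random.Random(seed)
    sites = ta_sites(chromosome)
    return [rng.choice(sites) for _ in range(n_mutants)]


toy_chromosome = "GTACCTAATA"
print(ta_sites(toy_chromosome))
print(simulate_library(toy_chromosome, n_mutants=5))
```

With enough mutants every TA site is hit at least once, which is what makes missing insertions informative later on.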

What modifications does Himar1 have that make it useful for transposon mutagenesis?

1. Kanamycin resistance gene incorporated for selection

2. T7 promoters incorporated to allow for mapping of insertion sites

Process of insertion site mapping for a single transposon mutant using Himar1

In Himar1 transposons there are known internal sequences on the flanks of the transposon
Chromosomal DNA digested via restriction enzymes
Adapters ligated to ends of fragments
o Allows addition of a specific sequence
PCR amplification performed using the internal sequence in the transposon and the sequence of the adapter
o Allows specific amplification of the DNA directly flanking the transposon (part of the transposon is also amplified in the fragment).
Generates ‘reads’
When reads are plotted along the genome, the height of each line represents the number of reads at that TA site (Himar1 insertion site)
o i.e. how many different mutants in the library contain an insertion at each TA site
Areas with few/no insertions predict genes that are essential under the selection conditions
o Insertion into those genes left the mutant with a nonfunctional gene, so it died under the selection conditions and is absent from the mutant library
Could also be used to identify genes whose disruption results in a growth defect or advantage
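
Flagging genes with no insertions can be sketched from per-site read counts. Everything here is hypothetical: the gene names, coordinates and counts are invented, and a plain "no TA site hit" rule replaces the statistical models used in real analyses.

```python
def insertion_summary(genes: dict[str, tuple[int, int]],
                      ta_read_counts: dict[int, int]) -> dict[str, tuple[int, int]]:
    """For each gene return (TA sites with reads, total TA sites in the gene)."""
    summary = {}
    for name, (start, end) in genes.items():
        counts = [n for pos, n in ta_read_counts.items() if start <= pos < end]
        summary[name] = (sum(1 for n in counts if n > 0), len(counts))
    return summary


def candidate_essential(summary: dict[str, tuple[int, int]]) -> list[str]:
    """Genes that contain TA sites but show no insertion at any of them."""
    return [name for name, (hit, total) in summary.items() if total > 0 and hit == 0]


# Hypothetical gene coordinates (start, end) and read counts per TA position
genes = {"geneA": (0, 100), "geneB": (100, 200), "geneC": (200, 300)}
ta_read_counts = {10: 12, 40: 7, 90: 3, 120: 0, 150: 0, 180: 0, 210: 5, 260: 0}

summary = insertion_summary(genes, ta_read_counts)
print(summary)
print(candidate_essential(summary))
```

A gene with very few TA sites can look essential by chance, which is one reason real pipelines do not rely on a hard threshold.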

What are Non-Tuberculous Mycobacteria (NTM)?

o Those species that don’t cause TB or leprosy

o Opportunistic pathogens – cause infections of lungs, soft tissue and bones

What is NTM Pulmonary Disease?

o Occurs mainly in people who are immunocompromised or have underlying lung conditions
o Organisms commonly associated with NTM-PD = M. avium & M. abscessus.

M. abscessus

o Rapid growth – colonies in <7 days
o Evidence of transmission between patients
o Traditional molecular epidemiological tools:
• Pulsed-field gel electrophoresis (PFGE)
• Multi-locus sequence typing (PCR and sequencing of selected genes)
o NGS gives higher resolution because it looks at changes over the entire genome

What is Pulsed-field Gel Electrophoresis (PFGE)?

- Genomic DNA extracted from strain
- Digested with restriction endonuclease (usually one that cuts infrequently)
- Fragments are too large to be separated by traditional electrophoresis
- Applies electric current at an angle to the sample, and applies it in pulses
o Causes DNA to migrate in zigzag manner – increases path that DNA needs to migrate through
- Relies on bp changes occurring at restriction endonuclease cut sites
o All strain typing (differentiating strains from each other) relies on accumulation of random mutations within genome over time
o This method requires those changes to occur at specific endonuclease cut sites
- This means that banding patterns can be used to make phylogenetic trees
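
Why only mutations at cut sites change the banding pattern can be shown with an in-silico digest. This sketch makes several simplifying assumptions: the genomes are short made-up linear strings (a real chromosome is circular and megabases long), the XbaI recognition sequence TCTAGA is used as the example site, and the cut is placed at the start of the site.

```python
def band_sizes(genome: str, site: str) -> list[int]:
    """Fragment lengths, largest first, after cutting at the start of each site."""
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = genome.find(site, pos + 1)
    edges = [0] + cuts + [len(genome)]
    return sorted((b - a for a, b in zip(edges, edges[1:]) if b > a), reverse=True)


SITE = "TCTAGA"  # XbaI recognition sequence, used here as an example
strain_a = "AAAAA" + SITE + "CCCCCCCC" + SITE + "GGG"
# Strain B differs by a single base inside the second cut site
strain_b = "AAAAA" + SITE + "CCCCCCCC" + "TCTAGC" + "GGG"

print(band_sizes(strain_a, SITE))
print(band_sizes(strain_b, SITE))
```

The single base change merges two bands into one larger band, while a change anywhere outside a cut site would leave the pattern untouched.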

Study: Transmission of M. abscessus - Results

1. The different subspecies separate on the tree (as expected)
2. Strains from a single individual often clustered together
a. Indicates that these individuals most likely acquired strains from their environment and that they were quite genetically diverse
3. BUT there is also grouping of strains from different individuals
4. Antibiotic Resistance:
a. 2 mechanisms of Azithromycin resistance:
i. Inducible mechanism – elevated expression of Erm gene (deleted in massiliense subspecies) in response to Azithromycin results in resistance
ii. Change in bp from A at position 2058 in 23S rRNA gene to a C
b. Found 3 individuals with Azithromycin resistance who had never been exposed to the antibiotic
c. Amikacin resistance: G/C bp change in 1408/1409 position of 16S rRNA
5. Plotted timelines for each individual of where they could have contracted the infection – hospital visits or admissions etc.
6. Could compare those timelines to see when strains could have been transmitted between patients.
a. Found 4 patients with no opportunities for transmission, yet their strains had very high similarity
b. Decided their similarity was not due to transmission but to a dominating circulating strain.
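
Strain similarity of the kind compared above is commonly expressed as a count of differing positions between aligned genomes. The sketch below assumes this measure; the isolate names and sequences are invented and are not data from the study.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every pair of isolates (names in sorted order)."""
    return {(p, q): snp_distance(isolates[p], isolates[q])
            for p, q in combinations(sorted(isolates), 2)}


# Hypothetical aligned sequences from three isolates
isolates = {
    "patient1_a": "ACGTACGTAC",
    "patient1_b": "ACGTACGTAT",
    "patient2_a": "ACGAACGTTT",
}
distances = pairwise_distances(isolates)
print(distances)
```

Small distances between isolates from different patients are what then need explaining, either by transmission or by a shared circulating strain.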

Study: Transmission of M. abscessus - Conclusion

- 3 modes of infection in cystic fibrosis patients:
o Independent acquisition of genetically diverse strains
o Independent acquisition of dominant strain (related)
o Transmission
- Mechanism of transmission is unknown

Bacterial Genomics Flashcards

(28 cards)