Bacterial Genomics Flashcards
Microbial genomes
small
high gene density, with small intergenic regions, few or no introns, and very little repetitive or non-coding DNA
short protein coding genes
operons with promoters just upstream
Bacterial Genome organisation
Chromosomes
Generally a single circular chromosome (always DNA)
Some species have linear chromosome(s) - Borrelia, Streptomyces
Some species have two chromosomes - Vibrio cholerae
Both circular and linear - Agrobacterium tumefaciens, Borrelia burgdorferi
Plasmids
Independent autonomous replicon - circular or linear
may integrate into chromosome
copy number varies from one to tens per cell
often carry non-essential genes that confer an adaptive advantage in certain conditions
Sanger method history
The Sanger chain-termination method was developed in the late 1970s and earned Fred Sanger his second Nobel Prize.
Technical improvements in the 1980s led to increased sequence read length from 400 to over 800 bp
The principal limitation of Sanger sequencing is the requirement for a cloning step in library construction: non-clonable regions are absent from the library
How the Sanger method is done
A DNA primer is radiolabeled
The primer is annealed to the template DNA
The primer is extended by DNA polymerase
Incorporation of a deoxynucleotide - further extension possible
Incorporation of a dideoxynucleotide – chain termination
Four reactions are set up, each with one dideoxynucleotide plus all four deoxynucleotides:
- ddATP, dATP, dCTP, dGTP, dTTP
- ddCTP, dATP, dCTP, dGTP, dTTP
- ddGTP, dATP, dCTP, dGTP, dTTP
- ddTTP, dATP, dCTP, dGTP, dTTP
The results are then visualised on an autoradiograph (read from the bottom)
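As a rough illustration of the chain-termination logic above, the following Python sketch models the four reactions for a short made-up template and reads the sequence back from the fragment lengths, shortest first (the bottom of the gel). The function names and the template are hypothetical, and the model ignores primer length and real reaction kinetics.

```python
# Toy model of Sanger sequencing; names and template are illustrative only.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template):
    """Return the strand a polymerase would synthesize from the template.

    The template is written 3'->5' so the new strand reads 5'->3' left to right.
    """
    return "".join(COMPLEMENT[base] for base in template)


def termination_lengths(new_strand):
    """For each ddNTP reaction, list the fragment lengths that end in that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the gel from the bottom: shortest fragment first."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"  # hypothetical template, written 3'->5'
lanes = termination_lengths(synthesized_strand(template))
print(lanes)            # one list of band positions per ddNTP lane
print(read_gel(lanes))  # ATGCCAGT
```

Each lane only contains fragments ending in its own dideoxynucleotide, so sorting all bands by length across the four lanes recovers the order of bases in the newly synthesized strand.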
The development of automated Sanger sequencing
Replaced radioisotopes with fluorescent dyes
- Safer
- Each of the four DNA bases assigned a different colour
- All four reactions could be run on a single lane rather than four as previously
- The migration of the dye could be read because of the fluorescence
- This information allowed automatic gel reading
Further improvements were made
- Improved dye chemistry using fluorescent dideoxy-terminators
- Replacing slab gels with capillaries
Whole-genome shotgun Sanger sequencing
- random shearing of genome (cutting)
- size selection and cloning into a plasmid vector
- clone and pick colonies to make shotgun library
- sequence each insert with two primers
Era of high-throughput sequencing
100x faster, 100x cheaper!
Different technologies available
- 454 (Roche)
- Illumina
- Ion Torrent
- PacBio
Fundamentally different from Sanger sequencing
- Solid-phase amplification of clonal templates
- New chemistries for sequence reading
-> 454: pyrophosphate detection on base addition
-> Illumina: reversible terminators (fluorescent bases with a removable blocking group)
High throughput shotgun sequencing
- shear randomly
- size selection
- add adapters
- amplify
- sequence
Stages in a bacterial genome sequencing project
Sequencing
- Shotgun sequencing of randomly generated fragments – restriction enzymes or more commonly physical fragmentation
- many-fold coverage to ensure completeness and few ambiguities – very quick with modern approaches
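"Many-fold coverage" can be made concrete with a small calculation. The sketch below assumes reads land uniformly at random on the genome, which is an idealisation; under that assumption the mean coverage is reads x read length / genome size, and a Poisson estimate gives the fraction of bases missed entirely. The project numbers are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by no read at all."""
    return math.exp(-coverage)


# Hypothetical project: 5 Mb genome, 800 bp reads, 50,000 reads.
c = mean_coverage(num_reads=50_000, read_length=800, genome_size=5_000_000)
print(c)  # 8.0
print(expected_uncovered_fraction(c))  # well under 0.1% of bases
```

In practice cloning and amplification biases make coverage uneven, so real projects tend to leave more gaps than this estimate suggests.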
Assembly
- Ordering of sequences by identifying overlaps; computationally intensive. May yield a complete, continuous genome sequence, but often leaves “gaps” that need filling manually – time intensive
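The idea of ordering reads by their overlaps can be sketched with a greedy merge: repeatedly join the two sequences with the longest suffix-prefix overlap. This is a teaching toy under simplifying assumptions (error-free reads, one strand, no repeats), not how production assemblers work; the function names and reads are hypothetical.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_length)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


reads = ["ATGCGTAC", "GTACGGAT", "GGATTCCA"]  # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['ATGCGTACGGATTCCA']
```

The all-against-all comparison inside the loop is what makes assembly computationally intensive, and reads with no overlap are left as separate contigs, which corresponds to the gaps that need filling.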
Annotation
- From sequence data to gene list, automated or manual – very time intensive
- Allows the genome to be displayed in a more visually appealing way
Genome annotation
Annotation is the addition of information about the predicted sequence features to the flat file of DNA code
Identification of potential coding sequences - CDS
Homology searches to predict function
Other features can be annotated as well (rRNAs/tRNAs, Promoters, Small non-coding RNAs, Repeat sequences, Insertion sequences (ISs), transposons, gene fragments)
Location of the origin of replication
Determination of the number of bases, genes and G+C%
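Two of the annotation steps above, finding potential coding sequences and computing G+C%, can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple ATG-to-stop open reading frame on the forward strand only; real gene predictors also scan the reverse strand and use statistical models. The sequence and function names are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def gc_percent(sequence):
    """G+C content as a percentage of all bases."""
    gc = sum(1 for base in sequence if base in "GC")
    return 100 * gc / len(sequence)


def find_orfs(sequence, min_codons=3):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon in this frame opens an ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAACGCTAAGG"  # hypothetical sequence
print(len(dna))         # 16 bases
print(gc_percent(dna))  # 50.0
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAACGCTAA
```

Each predicted ORF would then be passed to a homology search to suggest a function.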
What have we found out about bacterial genomes?
- Variety of genome size
- Some genomes are shrinking
- A single genome is not representative of a species
- Genome sequence data as the ultimate epidemiological tool
Massive gene decay in the leprosy bacillus
- pseudogenes make up 27% of the genome (inactive reading frames)
- 50% is protein coding
- the remaining 23% is non-coding (regulatory sequences and genes mutated beyond recognition)
Beyond the genome to the pan-genome
“one bacterial species - one genome sequence” is no longer the paradigm
Bacterial genomes comprise:
- core sequences — genes that encode proteins involved in essential functions, such as replication, transcription and translation
- dispensable sequences that encode proteins which facilitate organismal adaptation
Dispensable sequences are characterized by:
- a variable pattern of presence or absence in different bacterial isolates
- high rates of nucleotide sequence variability
- association with pathogenicity islands - virulence and resistance to antimicrobials
What is a genome?
The entire hereditary information of an organism that is encoded by its DNA (or RNA for some viruses).
What is the pan-genome?
The global gene repertoire of a bacterial species that comprises the sum of the core and the dispensable genome (from the Greek ‘pan’, meaning whole).
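The relationship between the pan-genome, the core genome and the dispensable genome maps directly onto set operations. The sketch below assumes each isolate has already been reduced to a set of gene names; the isolates and their gene content are made up for illustration.

```python
def pan_genome(isolates):
    """Split gene content into pan, core and dispensable sets.

    `isolates` maps an isolate name to the set of gene names it carries.
    """
    gene_sets = list(isolates.values())
    pan = set().union(*gene_sets)            # every gene seen in any isolate
    core = set.intersection(*gene_sets)      # genes present in all isolates
    return pan, core, pan - core


# Hypothetical gene content for three isolates of one species.
isolates = {
    "isolate_1": {"dnaA", "rpoB", "gyrA", "toxA"},
    "isolate_2": {"dnaA", "rpoB", "gyrA", "blaZ"},
    "isolate_3": {"dnaA", "rpoB", "gyrA"},
}
pan, core, dispensable = pan_genome(isolates)
print(sorted(core))         # ['dnaA', 'gyrA', 'rpoB']
print(sorted(dispensable))  # ['blaZ', 'toxA']
```

Adding more isolates can only shrink or preserve the core set and grow or preserve the pan set, which is why a single genome sequence cannot describe a species.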