Bacterial Genomes Flashcards
bacterial strain basics
change one nucleotide in a bacterium
becomes a new strain
strains differ by at least 1 nucleotide
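A minimal sketch of "differ by at least 1 nucleotide": count point differences between two genome sequences, assuming they are already aligned to equal length with "-" for gaps. The function name is hypothetical; real comparisons use whole-genome aligners first.

```python
def point_differences(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Assumes a and b are pre-aligned to the same length; gap columns ("-")
    are skipped rather than counted as point differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")


# A single differing base is enough to call the second sequence a new strain.
print(point_differences("ACGT", "ACGA"))
```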
Replicore
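half of the chromosome replicated by one replication fork, running from oriC to the terminus; defines the direction of replication in that half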
E. coli K12 strain
GI tract commensal
has very few gaps in its genome - most regions contain an ORF
intergenic regions small - 1.5kb-ish
most of the genome is protein- or RNA-coding
largest gene product (by amount) is ribosomes (rRNA + ribosomal proteins)
E. coli: common features of intergenic regions
promoters
regulatory regions
properties of K12's most transcribed regions:
most are quite close to oriC
anything close to oriC is duplicated earlier in the cell cycle - so there are more copies of it in the cell, for longer, than of other regions (gene dosage)
eg ribosomal RNA/protein genes
allows for production of more ribosomes
the ORFs are oriented in the same direction as replication
because their promoters are occupied by RNA polymerase almost all the time
if RNA pol moves in the opposite direction to the replication fork (DNA pol), a head-on collision can disrupt replication and lead to chromosome defects
if DNA and RNA pol are going in the same direction then there is a lot less chance of collisions (like trains on a track)
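The co-orientation rule can be checked for any gene with a small sketch. Assumptions (not from the notes): coordinates run 0..genome_len-1 round a circular chromosome, the right fork moves towards increasing coordinates from oriC to ter, and "+" genes are transcribed towards increasing coordinates. The function names are hypothetical.

```python
def replichore(pos: int, ori: int, ter: int, genome_len: int) -> str:
    """'right' if pos lies on the arc from ori to ter moving in the + direction
    (wrapping round the circular chromosome), otherwise 'left'."""
    d_pos = (pos - ori) % genome_len
    d_ter = (ter - ori) % genome_len
    return "right" if d_pos < d_ter else "left"


def co_directional(gene_start: int, strand: str, ori: int, ter: int,
                   genome_len: int) -> bool:
    """True if transcription runs the same way as the fork replicating the gene."""
    # Right fork travels in the + direction, left fork in the - direction.
    fork_direction = "+" if replichore(gene_start, ori, ter, genome_len) == "right" else "-"
    return strand == fork_direction


# Toy 1000 bp chromosome, oriC at 0, ter at 500
print(co_directional(100, "+", 0, 500, 1000))  # right replichore, + strand gene
print(co_directional(700, "+", 0, 500, 1000))  # left replichore: head-on
```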
terminus regions
where the two forks (replisomes) of bidirectional replication meet
REP sequences
repetitive extragenic palindromic sequences
found in intergenic regions at the 3' ends of genes
highly conserved
structured 35-40 nt elements
are targets for transposable elements (TEs)
transcribed together with upstream genes
their stable stem-loop structures protect mRNA against exoribonuclease digestion
alter stability of mRNA
can also otherwise modulate translation in response to environmental stress
genes with these may be linked to the stress response?
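Because REPs are palindromic, each one can fold back on itself into a stem-loop. A minimal sketch of the underlying check, assuming a simple model where the first stem_len bases must be the exact reverse complement of the last stem_len bases. This is an illustration only; real stem-loop prediction uses RNA folding energies, and the function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_terminal_inverted_repeat(seq: str, stem_len: int) -> bool:
    """True if the two ends of seq could pair into a perfect stem of stem_len bp,
    leaving the middle as the loop."""
    if stem_len <= 0 or 2 * stem_len > len(seq):
        return False
    return seq[:stem_len].upper() == revcomp(seq[-stem_len:])


stem = "ACGGT"
hairpin = stem + "TTTT" + revcomp(stem)  # stem + loop + complementary stem
print(hairpin, has_terminal_inverted_repeat(hairpin, 5))
```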
codon adaptation index
compares the codon usage of a gene with that of a reference set of highly expressed genes in the genome
if a gene has aberrant codon usage
it can come from an outside source - eg horizontally acquired DNA
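A minimal sketch of how a CAI can be computed. Each codon gets a weight w = (its count in the highly expressed reference set) / (count of the most used synonymous codon); a gene's CAI is the geometric mean of w over its codons, so values near 1 mean "codon usage like highly expressed genes" and low values flag aberrant usage. Assumptions: the synonymous-codon table below is a hypothetical three-amino-acid subset, and unseen reference codons get a count of 0.5 so their weight is not zero (a common convention, not from the notes).

```python
import math
from collections import Counter

# Hypothetical subset of the genetic code, for illustration only
SYNONYMS = {
    "Leu": ["CTG", "CTC", "CTT", "CTA", "TTA", "TTG"],
    "Gly": ["GGC", "GGT", "GGA", "GGG"],
    "Lys": ["AAA", "AAG"],
}


def codons(seq: str) -> list:
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs: list) -> dict:
    """w for every codon in SYNONYMS, from the reference (highly expressed) genes."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    w = {}
    for family in SYNONYMS.values():
        # 0.5 pseudocount for codons never seen in the reference set
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in family}
        best = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / best
    return w


def cai(seq: str, w: dict) -> float:
    """Geometric mean of w over the gene's codons; codons not in w are skipped."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGGGCGGCAAA"])
print(cai("CTGGGCAAA", w))  # uses only the preferred codons
print(cai("TTAGGAAAG", w))  # rare codons: low CAI, candidate foreign gene
```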
cryptic prophages
genes in the genome with homology to phage genes
phage genomes that have integrated into the bacterial chromosome and become defective (can no longer produce infectious phage)
ways of describing prokaryote genomes
composition and regions that have aberrant composition
biological/biochemical inventory
evolutionary considerations
prokaryote genome composition descriptions
base composition (%C %G %A %T)
eg high G+C in some Gram-positive bacteria (Actinobacteria; others such as Firmicutes are low G+C)
GC content (% of the genome sequence that is G or C)
GC skew or other compositional skews
underrepresentation of certain bases on a single strand
words (eg underrepresentation of a certain word, eg CTAG)
repeated (short) sequences
eg REP
their abundance and possible function
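Base composition and GC content are simple counts; a minimal sketch, assuming non-ACGT characters (eg N for unknown bases) should be ignored rather than counted. Function names are hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """%A, %C, %G, %T of a sequence, ignoring anything that is not ACGT."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: 100 * counts[b] / total for b in "ACGT"}


def gc_content(seq: str) -> float:
    """% of the (ACGT) sequence that is G or C."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


print(base_composition("GGCA"))
print(gc_content("GGNNCCAATT"))  # Ns ignored
```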
GC skew
GC skew = (#G - #C) / (#G + #C)
fewer G than C => negative value
more G than C => positive value
a random DNA sequence would give roughly equal proportions of G and C (skew close to 0)
so skew is a sign of certain properties of the sequence
see linear representation for E. coli genome
GC skew is correlated with the direction of the replication fork in many bacteria
the leading strand tends to have a positive GC skew, while the lagging strand has a negative one
-there is more G than C in the leading/top strand
(the leading strand is the top strand in the right replichore; the other way around in the left)
if there is a negative skew in a section of the top strand
this can indicate differences, eg that the section lies in the left replichore, where the top strand is the lagging strand
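Because the skew flips sign at oriC and at the terminus, summing window skews along the top strand gives a curve whose minimum marks the predicted origin and whose maximum marks the predicted terminus. A minimal sketch, assuming non-overlapping windows, a circular chromosome, and the sign convention above (top strand is leading in the right replichore). The window size and function names are hypothetical choices.

```python
def gc_skew(seq: str) -> float:
    """GC skew = (#G - #C) / (#G + #C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_skew(seq: str, window: int) -> list:
    """Skew of consecutive, non-overlapping windows along the top strand."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq), window)]


def cumulative_skew(seq: str, window: int) -> list:
    """Running sum of the windowed skew values."""
    total, out = 0.0, []
    for s in windowed_skew(seq, window):
        total += s
        out.append(total)
    return out


def predict_ori_ter(seq: str, window: int) -> tuple:
    """(origin, terminus) = positions of the cumulative-skew minimum and maximum.

    Each position is taken at the end of the extreme window and wrapped round
    the circular chromosome, so the sequence end maps back to position 0.
    """
    cum = cumulative_skew(seq, window)
    n = len(seq)
    ori_i, ter_i = cum.index(min(cum)), cum.index(max(cum))
    return (min((ori_i + 1) * window, n) % n, min((ter_i + 1) * window, n) % n)


# Toy 1000 bp chromosome: G-rich top strand from 0-500, C-rich from 500-1000
toy = "GGGA" * 125 + "CCCA" * 125
print(predict_ori_ter(toy, 100))
```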
what drives GC skew
Cytosine deamination on the lagging-strand template
look at the RIGHT replisome
E. coli genome replication begins at oriC
two forks move bidirectionally
DNA pol on both strands at each fork
-leading strand synthesis is continuous
-lagging strand fragmented
look at the template for each of these
the template for the leading strand was originally synthesised as a lagging strand
vice versa for new lagging
-discontinuous replication means more ssDNA is present
-so the template for the lagging strand spends more time single-stranded
-and ssDNA is more chemically unstable than dsDNA
-causing cytosine deamination in the lagging strand's template
-turning C into uracil
-which DNA pol reads as T (pairs it with A)
-meaning that there will be less G in the lagging strand (A is put opposite the U instead of G)
this is why the top- and bottom-strand skew values are negative on one side of the chromosome and positive on the other - the leading and lagging strands switch between top and bottom strand at oriC and the terminus
this mutation is selected against where it affects coding sequences
but where C loss can be tolerated, C is lost - skewing the G/C balance
recognisable DNA sequence motifs and repeat sequences
CTAG: underrepresented word - occurs at only 5% of the expected frequency - but found more frequently in intergenic regions
skewed words:
Chi sequence (5'-GCTGGTGG-3')
8-mer GCAGGGCG
both overrepresented
Rhs elements
REP sequences
why is CTAG underrepresented
the sequence can cause a bend in DNA during TRANSCRIPTION
so it is selected against in ORFs
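"Underrepresented" means observed count divided by expected count is well below 1. A minimal sketch, assuming the simplest (zero-order) expectation where each base occurs independently at its genome-wide frequency; published word analyses usually condition on shorter words (Markov models) instead. CTAG is its own reverse complement, so counting one strand is enough. Function names are hypothetical.

```python
from collections import Counter


def count_word(seq: str, word: str) -> int:
    """Overlapping occurrences of word on the given (top) strand."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count(seq: str, word: str) -> float:
    """Expected occurrences if bases were independent (zero-order model)."""
    n = len(seq)
    freqs = Counter(seq)
    p = 1.0
    for base in word:
        p *= freqs[base] / n
    return (n - len(word) + 1) * p


def obs_exp_ratio(seq: str, word: str) -> float:
    """Observed / expected; 0.05 would correspond to the 5% quoted for CTAG."""
    exp = expected_count(seq, word)
    return count_word(seq, word) / exp if exp else float("nan")


toy = "ACGT" * 250  # equal base composition, but CTAG never appears
print(count_word(toy, "CTAG"), expected_count(toy, "CTAG"), obs_exp_ratio(toy, "CTAG"))
```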
Rhs elements
rearrangement hotspots
regions with multiple copies of 6-10 kb repeats
act as sites of (ectopic?) homologous recombination
-can cause inversions and deletions
contain a central common (core) ORF with shorter, more variable downstream ORFs - potentially a contact-dependent growth inhibitor used in interbacterial competition
Chi sites as recombinational hotspots / RecBCD
double-strand breaks (DSBs) can be lethal
Chi site recognised by RecBCD
-RecB and RecD are helicases
-RecD is slightly faster
-so the ssDNA unwound by the slower RecB bunches up into a ssDNA loop in front of the complex
-3'→5' exonuclease activity (degrades the 3'-ended strand until Chi)
-leaves a free 3' ssDNA end at the Chi site where recombination can occur
followed by DNA synthesis to repair the break
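Chi is not palindromic, so each site has an orientation, and RecBCD only recognises it in one orientation relative to the direction it is travelling. A minimal sketch of locating Chi sites on both strands of a top-strand sequence, assuming exact matches to the E. coli Chi octamer. Function names are hypothetical.

```python
CHI = "GCTGGTGG"  # E. coli Chi site, written 5'->3'


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_chi(top: str) -> list:
    """(position, strand) of each Chi site found in the top-strand sequence.

    '+' = Chi reads 5'->3' on the top strand; '-' = Chi is on the bottom strand,
    so it appears as its reverse complement (CCACCAGC) when reading the top strand.
    """
    k = len(CHI)
    rc = revcomp(CHI)
    hits = []
    for i in range(len(top) - k + 1):
        window = top[i:i + k]
        if window == CHI:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


print(find_chi("AAGCTGGTGGAA"))
print(find_chi("TTCCACCAGCTT"))
```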
investigating unknown gene function
use biochemical or phenotypic assays
more and more elaborate ones to catch function o
orthologues v paralogues
-orthologues:
genes descended from a common ancestral gene present before two host organisms diverged (ie homologous between organisms)
come from the species themselves diverging
due to shared ancestry between different organisms
-paralogues:
Evolutionarily related (ie homologous) genes within the same organism
from a gene duplication event
duplication and new function within the same genome of the same organism
causes groups/families of genes
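One common heuristic (not from these notes) for telling orthologues from paralogues between two genomes is reciprocal best hits: gene a in genome A and gene b in genome B are called orthologues only if each is the other's best match. A paralogue of a (eg a duplicate a2) may also hit b, but b's best hit is still a. A minimal sketch, assuming similarity scores (eg from a sequence search) are already available as nested dicts; the gene names and scores are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def best_hit(hits: Dict[str, float]) -> Optional[str]:
    """Highest-scoring target, or None if there are no hits."""
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(a_vs_b: Dict[str, Dict[str, float]],
                         b_vs_a: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    pairs = []
    for a, hits in a_vs_b.items():
        b = best_hit(hits)
        if b is not None and best_hit(b_vs_a.get(b, {})) == a:
            pairs.append((a, b))
    return pairs


# a2 is a hypothetical paralogue of a1: it hits b1, but b1's best hit is a1
a_vs_b = {"a1": {"b1": 90, "b2": 40}, "a2": {"b1": 70}}
b_vs_a = {"b1": {"a1": 90, "a2": 70}, "b2": {"a1": 40}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```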
EHEC E. coli (O157)
found in the terminal rectum of cows
can contaminate meat during processing
causes haemorrhagic colitis
EHEC (O157) E. coli compared to K12 E. coli
share a common genomic backbone
O157 genome larger than K12
1387 genes that K12 does not have
75168 point mutations - according to the molecular clock, diverged ~4.5 mya
each strain has different horizontally acquired sequences
BUT these sit at the same positions in each genome
indicating common insertion points for the different phage DNA in the two genomes
(O islands and K islands, eg LEE in O157)
O157 contains lysogenic phages (prophages); K12 also carries (cryptic) prophages
LEE
locus of enterocyte effacement
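encodes a type III secretion system, effectors (eg Tir) and the adhesin intimin, producing attaching and effacing lesions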
important for virulence
present in O157 but not K12