Bacterial Genomes Flashcards
bacterial strain basics
change one nucleotide in a bacterium
becomes a new strain
strains differ by at least 1 nucleotide
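A minimal Python sketch of the strain definition above. It assumes the two genome sequences are already aligned to the same length, with no insertions or deletions. The function names are hypothetical.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def are_different_strains(seq_a: str, seq_b: str) -> bool:
    # By the definition above, a single nucleotide difference is enough.
    return count_differences(seq_a, seq_b) >= 1


print(count_differences("ATGCGT", "ATGAGT"))      # 1
print(are_different_strains("ATGCGT", "ATGCGT"))  # False
```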
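Replichore
each half of the circular chromosome, replicated by one fork from oriC to the terminus (sets the direction of replication for that half)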
E. coli K12 strain
GI tract commensal
has very few gaps in its genome: most positions lie within an ORF
intergenic regions are small (usually a few hundred bp or less)
most of the genome (roughly 88%) is protein- or RNA-coding
most abundant gene products are ribosomal (rRNA and ribosomal proteins)
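How "coding-dense" a genome is can be measured from a gene annotation. The sketch below assumes genes are given as hypothetical (start, end) pairs, 1-based and inclusive. It merges overlapping genes so no base is counted twice, and for simplicity it ignores the wrap-around gap of a circular chromosome.

```python
def merge_intervals(genes: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching (start, end) gene intervals (1-based, inclusive)."""
    merged: list[list[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of the genome covered by at least one gene (overlaps counted once)."""
    return sum(end - start + 1 for start, end in merge_intervals(genes)) / genome_length


def intergenic_lengths(genes: list[tuple[int, int]]) -> list[int]:
    """Gap lengths between consecutive merged genes (wrap-around gap not included)."""
    merged = merge_intervals(genes)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(merged, merged[1:])]


toy_genes = [(1, 40), (46, 80), (70, 90)]   # hypothetical annotation
print(coding_fraction(100, toy_genes))       # 0.85
print(intergenic_lengths(toy_genes))         # [5]
```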
E. coli common intergenic regions
promoters
regulatory regions
properties of K12’s most transcribed regions:
most are quite close to oriC
anything close to oriC is duplicated earlier in the cell cycle, so there are more copies of it in the cell, for longer, than of other regions (a gene dosage effect)
eg ribosomal RNA and ribosomal protein genes
allows for production of more ribosomes
these ORFs are oriented in the same direction as replication (co-directional)
because their promoters are occupied by RNA polymerase almost all the time
if RNA pol moves in the opposite direction to the DNA pol/replication fork, a head-on collision with the fork can disrupt replication and lead to chromosome defects
if DNA and RNA pol move in the same direction there is much less chance of collisions (like trains on a track)
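The dosage effect described above can be sketched with the Cooper-Helmstetter model of replication in exponentially growing cells. A locus at fractional position m along a replichore (0 at oriC, 1 at the terminus) is present, on average, at 2^(C(1 - m)/τ) copies relative to the terminus. Here C is the time needed to replicate the chromosome and τ is the doubling time. The default values C = 40 min and τ = 20 min are illustrative assumptions for fast growth, not measurements from these notes.

```python
def relative_dosage(m: float, c_period: float = 40.0, doubling_time: float = 20.0) -> float:
    """Average copy number of a locus relative to the terminus.

    m: position along the replichore, 0 = oriC, 1 = terminus.
    c_period, doubling_time: assumed values in minutes (illustrative only).
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be between 0 (oriC) and 1 (terminus)")
    return 2 ** (c_period * (1 - m) / doubling_time)


for m in (0.0, 0.5, 1.0):
    print(m, relative_dosage(m))
# 0.0 4.0
# 0.5 2.0
# 1.0 1.0
```

Under these assumptions, a gene next to oriC has about four times the copies of a gene at the terminus. This is why placing highly expressed genes such as the ribosomal operons near oriC increases their output.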
terminus regions
where the two forks of bidirectional replication (one replisome each) meet
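Whether a gene is co-directional with replication depends on which replichore it lies on, and oriC and the terminus define the replichores. A minimal sketch, assuming coordinates on a circular chromosome and genes given as hypothetical (midpoint, strand) pairs. The fork on the right replichore is taken to move towards increasing coordinates.

```python
def replichore(position: int, ori: int, ter: int, genome_length: int) -> str:
    """'right' if position lies on the arc from ori to ter in increasing coordinates
    (wrapping around the circular chromosome), otherwise 'left'."""
    dist_to_pos = (position - ori) % genome_length
    dist_to_ter = (ter - ori) % genome_length
    return "right" if dist_to_pos <= dist_to_ter else "left"


def is_codirectional(position: int, strand: str, ori: int, ter: int, genome_length: int) -> bool:
    # Right replichore: fork moves towards increasing coordinates, so '+' genes are co-directional.
    # Left replichore: fork moves the other way, so '-' genes are co-directional.
    on_right = replichore(position, ori, ter, genome_length) == "right"
    return on_right == (strand == "+")


def codirectional_fraction(genes: list[tuple[int, str]], ori: int, ter: int, genome_length: int) -> float:
    """genes: (midpoint, strand) pairs with strand '+' or '-'."""
    hits = sum(is_codirectional(pos, strand, ori, ter, genome_length) for pos, strand in genes)
    return hits / len(genes)


toy_genes = [(100, "+"), (300, "+"), (600, "-"), (900, "+")]  # hypothetical
print(codirectional_fraction(toy_genes, ori=0, ter=500, genome_length=1000))  # 0.75
```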
REP sequences
repetitive extragenic palindromic sequences
found in intergenic regions, at the 3' ends of genes
highly conserved
structured 35-40 nt elements
are targets for transposable elements (TEs)
transcribed together with upstream genes
their stable stem-loop structures protect mRNA against exoribonuclease digestion
alter stability of mRNA
can also modulate translation in response to environmental stress
genes with these have link to stress response?
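The palindromic structure that lets REP sequences form stem-loops can be searched for as inverted repeats: a stem, a short loop, then the reverse complement of the stem. This brute-force sketch only finds perfect stems, and its stem and loop lengths are hypothetical thresholds, not REP consensus parameters.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8) -> list[tuple[int, str, int]]:
    """Return (start, stem_seq, loop_len) where seq[start:start+stem] is followed,
    after a loop of min_loop..max_loop bases, by its exact reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:  # ran off the end; longer loops will too
                break
            if right == reverse_complement(left):
                hits.append((i, left, loop))
    return hits


# Hypothetical stem-loop: stem GCCGCG, loop TTTT, then CGCGGC (reverse complement of the stem)
print(find_hairpins("AAGCCGCGTTTTCGCGGCAA"))  # [(2, 'GCCGCG', 4)]
```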
codon adaptation index
compares the codon usage of each gene with that of a reference set of highly expressed genes (eg ribosomal protein genes)
if a gene has aberrant codon usage
it may come from an outside source, eg horizontally acquired DNA
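A minimal sketch of the codon adaptation index, following the usual definition. Each codon's weight w is its count in the reference set divided by the count of the most-used synonymous codon. A gene's CAI is the geometric mean of w over its codons, skipping Met, Trp and stop codons, which have no synonyms. The reference genes are hypothetical. The pseudo-weight of 0.01 for codons missing from the reference is an assumption, since implementations differ on this point.

```python
from math import exp, log

# Standard genetic code in TCAG order (stop = '*')
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons (trailing partial codon dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes: list[str]) -> dict[str, float]:
    """w = codon count / count of the most-used synonymous codon in the reference set."""
    counts: dict[str, int] = {}
    for gene in reference_genes:
        for codon in codons(gene):
            if codon in CODON_TABLE:
                counts[codon] = counts.get(codon, 0) + 1
    best: dict[str, int] = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def cai(gene: str, weights: dict[str, float], missing_weight: float = 0.01) -> float:
    """Geometric mean of w over the gene's codons (Met, Trp, stops and non-ACGT codons skipped)."""
    logs = []
    for codon in codons(gene):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "MW*":
            continue
        logs.append(log(weights.get(codon, missing_weight)))  # assumed pseudo-weight if absent
    return exp(sum(logs) / len(logs)) if logs else float("nan")


reference = ["CTGCTGCTGAAA"]       # hypothetical highly expressed genes
weights = relative_adaptiveness(reference)
print(cai("CTGAAA", weights))      # 1.0: only preferred codons
print(cai("TTAAAA", weights))      # ~0.1: TTA never seen in the reference
```

A gene whose CAI is far below the genome's typical range is a candidate for horizontally acquired DNA. This is only a heuristic, because lowly expressed native genes can also score low.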
cryptic prophages
genes in the genome with homology to phage genes
phage genomes that have been integrated into the bacterial one and have decayed, so they can no longer produce phage
ways of describing prokaryote genomes
composition, and regions that have aberrant composition
biological/biochemical inventory
evolutionary considerations
prokaryote genome composition descriptions
base composition (%C %G %A %T)
eg high G+C in some Gram-positive bacteria (Actinobacteria)
GC content (% of the genome sequence that is G or C)
GC skew or other compositional skews
underrepresentation of certain bases on a single strand
words (eg underrepresentation of a certain word, such as CTAG)
repeated (short) sequences
eg REP
their abundance and possible function
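Base composition and GC content from the list above can be computed directly. A minimal sketch that ignores ambiguous symbols such as N. The function names are hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of each base among A, C, G, T (other symbols such as N are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100 * counts[b] / total for b in "ACGT"}


def gc_content(seq: str) -> float:
    """%G + %C of the sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


print(base_composition("GGCCATAT"))  # {'A': 25.0, 'C': 25.0, 'G': 25.0, 'T': 25.0}
print(gc_content("GGCCATATNN"))      # 50.0 (N ignored)
```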
GC skew
GC skew = (#G - #C) / (#G + #C), counted on one strand within a window
fewer G than C => negative value
more G than C => positive value
a random DNA sequence would contain roughly equal proportions of G and C (skew close to 0)
so a consistent skew signals a non-random, strand-specific process acting on the sequence
see linear representation for E. coli genome
GC skew correlates with the direction of replication fork movement in many bacteria
the leading strand tends to have a positive GC skew, while the lagging strand has a negative one
-there is more G than C in the leading/top strand
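(the leading strand is the top strand for the right replisome, and the other way around for the left)
so a section of negative skew on the top strand indicates that it is replicated differently from a positive-skew region (eg it lies on the left replichore, or has been inverted)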
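The skew formula can be applied in windows along the top strand. Summing the window skews (cumulative skew) is a common way to locate the switch points. Walking along the top strand, the sum falls through the left replichore and rises through the right one, so its minimum lies near oriC and its maximum near the terminus. This sketch uses non-overlapping windows and ignores the circularity of the chromosome. The toy sequence is hypothetical.

```python
def gc_skew(seq: str) -> float:
    """GC skew = (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_skew(seq: str, window: int = 1000) -> list[float]:
    """Skew of consecutive, non-overlapping windows along the top strand."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq) - window + 1, window)]


def cumulative_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of window skews; minimum near oriC, maximum near the terminus."""
    total, out = 0.0, []
    for s in windowed_skew(seq, window):
        total += s
        out.append(total)
    return out


# Toy sequence: C-rich stretch (top strand lagging) followed by G-rich stretch (top strand leading)
toy = "CCCG" * 5 + "GGGC" * 5
cum = cumulative_skew(toy, window=4)
print(windowed_skew(toy, window=4))
print(cum.index(min(cum)))  # 4: last window before the skew turns positive
```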
what drives GC skew
Cytosine deamination on the lagging-strand template (a strand originally made as a leading strand)
look at the RIGHT replisome
E. coli genome begins replication at oriC
two forks move out bidirectionally
DNA pol works on both strands at each fork
-leading strand synthesis is continuous
-lagging strand fragmented
look at the template for each of these
the template for the leading strand was originally synthesised as a lagging strand
vice versa for the new lagging strand: its template was originally synthesised as a leading strand
-discontinuous replication means there is more ssDNA present
-so the lagging-strand template spends more time as ssDNA
-and ssDNA is more prone to chemical damage than dsDNA
-causing cytosine deamination in the lagging strand’s template
-turning it into uracil
-which DNA pol reads as T (it pairs with A), fixing a C to T change
-meaning that there will be less G in the lagging strand
this is why top- and bottom-strand skew values are negative on one side of the chromosome and positive on the other: the leading and lagging strands swap sides at oriC and the terminus
this mutation is selected against when it affects coding sequences
but where C loss can be tolerated, C is lost, skewing GC content
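A toy simulation of the mechanism above, not a quantitative model. Start from a random strand with equal G and C, treat it as the lagging-strand template, and let each C turn into T with some probability. The mutated strand has the same sequence orientation as the new leading strand and ends up with fewer C, so its skew becomes positive. The complementary lagging strand loses G, so its skew becomes negative. The deamination rate of 0.1 and the sequence length are arbitrary assumptions chosen to make the effect easy to see.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_skew(seq: str) -> float:
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def deaminate(strand: str, rate: float, rng: random.Random) -> str:
    """Each C becomes T with probability `rate` (C -> U, then fixed as T by replication)."""
    return "".join("T" if base == "C" and rng.random() < rate else base for base in strand)


rng = random.Random(1)
# Random starting strand: G and C equally likely, so skew is close to 0.
template = "".join(rng.choice("ACGT") for _ in range(100_000))
# This strand plays the lagging-strand template, which stays single-stranded for longer.
mutated = deaminate(template, rate=0.1, rng=rng)
# The new lagging strand is complementary to its template.
lagging = reverse_complement(mutated)

print(f"before:          {gc_skew(template):+.3f}")
print(f"mutated strand:  {gc_skew(mutated):+.3f}")   # expected positive (fewer C)
print(f"lagging strand:  {gc_skew(lagging):+.3f}")   # expected negative (fewer G)
```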
recognisable DNA sequence motifs and repetitive sequences
CTAG underrepresented word - occurs only 5% of expected amount - but found more frequently in intergenic regions
skewed words:
Chi sequence (GCTGGTGG in E. coli)
8-mer GCAGGGCG
both overrepresented
Rhs elements
REP sequences
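Over- and underrepresentation of words such as CTAG or Chi is usually expressed as observed/expected counts. A minimal sketch using a zero-order model, which treats bases as independent at the genome's own base frequencies. Published analyses often use Markov models conditioned on shorter word frequencies instead, so this version is a simplification. CTAG is its own reverse complement, so it reads the same on both strands. Chi is not, so both orientations are counted separately.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_word(seq: str, word: str) -> int:
    """Count occurrences of word in seq, including overlapping ones."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_both_strands(seq: str, word: str) -> int:
    """Occurrences on either strand; a palindromic word such as CTAG is counted once."""
    word = word.upper()
    rc = word.translate(COMPLEMENT)[::-1]
    n = count_word(seq, word)
    return n if rc == word else n + count_word(seq, rc)


def expected_count(seq: str, word: str) -> float:
    """Expected occurrences on one strand if bases were independent (zero-order model)."""
    seq = seq.upper()
    n = len(seq)
    p = 1.0
    for base in word.upper():
        p *= seq.count(base) / n
    return p * (n - len(word) + 1)


def obs_exp_ratio(seq: str, word: str) -> float:
    """Observed/expected; values well below 1 mean the word is underrepresented."""
    expected = expected_count(seq, word)
    return count_word(seq, word) / expected if expected else float("nan")


# Hypothetical sequence with one Chi site on each strand
print(count_both_strands("GCTGGTGGAAACCACCAGC", "GCTGGTGG"))  # 2
```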
why is CTAG underrepresented
the sequence can cause a bend in DNA during TRANSCRIPTION
so it is selected against in ORFs