Lecture 23 (Part 2) – Base Composition Evolution Flashcards
What do we mean by “base composition”?
The proportions of the four bases (A, C, G, and T/U) present in DNA or RNA.
How is base composition usually expressed?
As the percentage of bases that are G or C, known as the GC content.
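A minimal sketch of how GC content could be computed from a sequence string. The function name gc_content and the example sequence are made up for illustration and are not from the lecture.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of bases in seq that are G or C.

    Hypothetical helper for illustration. Only A, C, G and T/U are
    counted; any other symbol (e.g. N) is ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / total


# Made-up sequence: 4 of its 8 bases are G or C.
print(gc_content("ATGCGCTA"))  # 50.0
```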
What would a GC content that is greater than 50% indicate?
A bias towards G and C bases, i.e. a GC-rich sequence (a value below 50% would indicate an AT-rich sequence).
What is the typical range for GC content in vertebrates?
40-45%.
What is the approximate GC content in humans?
Approximately 41%.
Where do we see the most variation in GC content?
In unicellular organisms.
- Single-celled fungi
- Bacteria
- Archaea
What are the two main hypotheses for variation in GC content?
- Mutationist - GC biases are just a reflection of mutation patterns.
- Selectionist - GC biases are an adaptation to selective pressures.
- In other words, GC content can affect fitness and should therefore be shaped by selection.
- “Optimal GC content”
What does the mutationist hypothesis argue?
That GC biases are just a reflection of mutational patterns.
1. Involves the propensity of different bases to spontaneously mutate.
- There is a universal mutational bias towards AT pairs.
- This is related to the inherent chemical properties of the bases (e.g., cytosine is prone to deamination, which leads to GC to AT mutations).
2. Involves the fidelity and bias of the DNA replication and repair machinery.
- E.g., biased DNA mismatch repair systems
- Can differ between species (e.g., GC-biased gene conversion in diploids leads to an increase of GC alleles relative to AT alleles).
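To illustrate how a mutational bias alone could set GC content, here is a rough sketch under a simple two-rate model. The model is an assumption that the lecture does not specify: u is the rate of GC to AT mutation per GC site, v is the rate of AT to GC mutation per AT site, and selection is ignored. Under those assumptions the GC fraction settles at v / (u + v). The function name equilibrium_gc is hypothetical.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium GC fraction under an assumed two-rate mutation model.

    u: GC -> AT mutation rate (per GC site)
    v: AT -> GC mutation rate (per AT site)
    At equilibrium the flows balance: u * GC = v * (1 - GC),
    so GC = v / (u + v).
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


# Illustrative rates only: GC -> AT mutations twice as frequent as the reverse.
print(equilibrium_gc(u=2.0, v=1.0))  # about 0.33, i.e. an AT-rich genome
```

With equal rates the model gives 50% GC, so any AT-biased mutation pattern pushes the expected GC content below 50% unless something else (such as selection or biased gene conversion) counteracts it.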
What does the selectionist hypothesis argue?
That GC biases are an adaptation to selective pressures.
- GC-rich genomes are more thermodynamically stable (due to stacking interactions)
- UV radiation produces thymine dimers, so T-rich sequences are more susceptible to damage and GC-rich sequences are better protected
- Lifestyle influences may lead to selection on GC content (e.g., nitrogen-fixing and aerobic bacteria tend to have higher GC content)
What is a GC skew?
When freq(G) differs markedly from freq(C) on a single strand.
- GC skew = (G - C)/(G + C)
What is an AT skew?
When freq(A) differs markedly from freq(T) on a single strand.
- AT skew = (A - T)/(A + T)
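The two skew formulas above can be sketched in code as follows. The function names and the example strand are hypothetical and chosen only for illustration.

```python
def skew(x: int, y: int) -> float:
    """Return (x - y) / (x + y), the normalised difference of two counts."""
    if x + y == 0:
        raise ValueError("both counts are zero; skew is undefined")
    return (x - y) / (x + y)


def gc_skew(strand: str) -> float:
    """GC skew = (G - C) / (G + C) for a single strand."""
    s = strand.upper()
    return skew(s.count("G"), s.count("C"))


def at_skew(strand: str) -> float:
    """AT skew = (A - T) / (A + T) for a single strand."""
    s = strand.upper()
    return skew(s.count("A"), s.count("T"))


strand = "GGGTTTCA"  # made-up strand: 3 G, 1 C, 1 A, 3 T
print(gc_skew(strand))  # (3 - 1) / (3 + 1) = 0.5
print(at_skew(strand))  # (1 - 3) / (1 + 3) = -0.5
```

Both values range from -1 to 1, and a value of 0 means the two bases are equally frequent on that strand.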
Why do we observe an abrupt shift in the frequency of G and C at the origin of replication?
It reflects whether the strand is replicated as the leading strand or the lagging strand, which switches at the origin.
(See lecture 23 @16:30)
What bases do leading strands tend to be enriched for?
G and T.
What bases do lagging strands tend to be enriched for?
C and A.
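A toy sketch of how the shift at the origin might show up when GC skew is computed in windows along one strand. The sequence is synthetic (not real genome data), the window size is arbitrary, and windowed_gc_skew is a hypothetical helper; real analyses typically use much larger windows on whole genomes.

```python
def windowed_gc_skew(seq: str, window: int) -> list:
    """GC skew, (G - C) / (G + C), in consecutive non-overlapping windows."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has an undefined skew; report 0.0 here.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


# Synthetic strand: a C/A-rich (lagging-like) half followed by a
# G/T-rich (leading-like) half, so the toy "origin" sits in the middle.
lagging_like = "CCCAACGA" * 4  # each repeat: 4 C, 1 G -> skew -0.6
leading_like = "GGGTTGCT" * 4  # each repeat: 4 G, 1 C -> skew +0.6
skews = windowed_gc_skew(lagging_like + leading_like, window=8)
print(skews)  # four windows at -0.6, then four windows at 0.6
```

The sign of the skew flips between the fourth and fifth windows, which is where the toy origin was placed.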
How would the mutationist hypothesis explain the skew?
Asymmetrical mutation pressures for the leading and lagging strand.
- Difference in patterns of replication error
- E.g., the leading strand is more prone to cytosine deamination during replication because it is left single-stranded for longer (the resulting C to T changes would shift both the GC and AT skews)
Evidence:
- The boundaries coincide with origin and terminus of replication (suggests a link with the replication process)
- Bias is stronger at third codon positions and in intergenic DNA (characteristic of mutation bias)