Gene duplication & Exon shuffling Flashcards
How can a genome acquire a new gene?
- Horizontal gene transfer
- Exon shuffling
- Duplication and divergence
o 1% chance for 1 gene to duplicate in 1 million years
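A rough sketch of what the 1% figure implies, assuming each gene duplicates independently with a constant probability of 0.01 per million years. That is a simplification; the function names and the 20,000-gene genome size are illustrative assumptions, not from the cards.

```python
def p_at_least_one_duplication(p_per_myr: float, myr: int) -> float:
    """Chance that one gene duplicates at least once over `myr` million years."""
    # Complement of "no duplication in any of the million-year intervals".
    return 1.0 - (1.0 - p_per_myr) ** myr


def expected_duplications(n_genes: int, p_per_myr: float) -> float:
    """Expected number of genes duplicated per million years in a genome."""
    return n_genes * p_per_myr


# One gene followed for 100 million years
print(round(p_at_least_one_duplication(0.01, 100), 3))  # 0.634

# Hypothetical genome of 20,000 genes, per million years
print(expected_duplications(20_000, 0.01))
```

So a per-gene rate that looks tiny still gives a steady supply of duplicates across a whole genome.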
Function of genes
- Promiscuous = Side reaction has no biological function
- Bifunctional = both activities have a biological function
- Over evolutionary time, the 2 functions diverge → enzymes pick up different mutations → specialise → each becomes better at catalysing either the original reaction or what was originally a side reaction
How is DNA duplicated by recombination?
- Unequal crossing over (meiosis)
o Only requires certain lengths of similar sequences
o Can get recombination between sets of repeats that are inappropriately lined up
o One chromosome has duplication; other has deletion → have different daughter gametes → if have selective advantage will survive through evolution
- Unequal sister chromatid exchange (mitosis)
o Involves exchange between two chromatids
o Paired up on repeat sequence → one chromatid duplication, one deletion
o Depending on species, will not be passed on to progeny
- DNA amplification during replication
o In haploid organisms (e.g. bacteria)
o Unequal recombination during replication → ‘replication bubble’: DNA splits up in replication forks → homologous DNA but inappropriate lining up so one strand has duplication of region, other gets deletion
- Replication slippage
o For short DNA sequences e.g. microsatellites, CAG triplet, poly-Q Huntington’s disease
o Not common for genes
o DNA loops out one repeat and starts to re-pair-up downstream → added DNA repeat as part of replication cycle
o Other end has looped out → priming in wrong place → deleting the sequence
o Can get insertions or deletions
o Partial duplication of genetic material that codes for protein
- Retrotransposition
o Retrotransposons can reverse transcribe RNA copies back into DNA and spread across genomes over evolutionary time
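A toy sketch of unequal crossing over from the list above: a chromosome is modelled as a list of segments, with gene B flanked by two copies of a repeat R. The segment names, the list model and the function name are illustrative assumptions.

```python
def crossover(chrom1, chrom2, i, j):
    """Exchange arms at position i on chrom1 and position j on chrom2.

    i and j are the indices of the repeat copies that paired up.
    Returns the two recombinant products.
    """
    product1 = chrom1[:i] + chrom2[j:]
    product2 = chrom2[:j] + chrom1[i:]
    return product1, product2


# Gene B sits between two copies of repeat R
chromosome = ["A", "R", "B", "R", "C"]

# Correct alignment: first repeat pairs with first repeat, nothing changes
equal = crossover(chromosome, chromosome, 1, 1)

# Misalignment: second repeat on one chromosome pairs with first repeat on the other
duplicated, deleted = crossover(chromosome, chromosome, 3, 1)

print(equal)
print(duplicated)  # carries two copies of B
print(deleted)     # carries no copy of B
```

The same exchange that gives one product a second copy of B removes B from the other, matching "one duplication, one deletion".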
Successful gene duplication
- Successful = gene survives
- Successful outcome #1 → gene originally w/one copy duplicated → hypothesis: 2 copies should double synthesis rate if everything else is equal
o If beneficial → retain that
o If second copy does not provide a dosage advantage → can pick up random mutations → eventually picks up an inactivating mutation → over evolutionary time accumulates more mutations → becomes a pseudogene (no longer a fully functional gene)
- Successful outcome #2 → getting new function
o “Neofunctionalization” (one copy gains a new function) or “subfunctionalization” (copies divide up the sub-functions of the parental gene)
- If selection pressure just for dosage → genes stay similar
- If no selection pressure for second copy → one copy either degrades entirely (pseudogene) or gets a new function if it provides advantage
Gene neofunctionalization example
- Trypsin vs chymotrypsin
o Evolved to be different proteases
o Trypsin → cuts at Arg & Lys
o Chymotrypsin → cuts at Phe, Trp & Tyr
o Not structurally identical but similarities in proportion of strand/helices and nature of active site
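A minimal sketch of how the two specificities give different fragments from the same peptide. It assumes both proteases cut on the C-terminal side of the listed residues and ignores exceptions such as a following proline; the peptide `AGRFLKWSYG` and the function name are made up for illustration.

```python
TRYPSIN_SITES = frozenset("RK")        # Arg, Lys
CHYMOTRYPSIN_SITES = frozenset("FWY")  # Phe, Trp, Tyr


def digest(peptide, sites):
    """Split a one-letter peptide string after every residue found in `sites`."""
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        if residue in sites:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    if start < len(peptide):
        fragments.append(peptide[start:])  # trailing piece with no cut site
    return fragments


peptide = "AGRFLKWSYG"  # hypothetical sequence
print(digest(peptide, TRYPSIN_SITES))       # ['AGR', 'FLK', 'WSYG']
print(digest(peptide, CHYMOTRYPSIN_SITES))  # ['AGRF', 'LKW', 'SY', 'G']
```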
Pseudogenes
- Copies of functional genes → altered/missing regions
- Often have stop codons/frameshifts/missense mutations → disrupt reading frame or function of protein
- May have regulatory role → often producing RNA
- Increase genome size (cost/benefit)
Types of pseudogenes
- “non-processed” pseudogenes:
o Tandem duplication of genomic region
o Inactivating mutations/incomplete duplications
o Part of genome missing regulatory regions → no promoter, enhancers in correct place but does have original intron/exon structure
- “processed” pseudogenes:
o Undergoes reverse transcriptase activity (LINE, retrovirus) → mRNA to cDNA → genome integration to make second duplicated gene copy
o Lacks introns and upstream regulatory regions
o Can have different combinations of exons
o Loses most of promoter region except 5’ untranslated region at front of gene
o Could contain poly(A) tail
o Can integrate into same or different chromosome
Examples of pseudogenes
- Ribosomal proteins
o Highly duplicated across different species and highly conserved (essential for protein synthesis machinery)
o Associated w/ L1 retrotransposon
o May have functional role as have high expression rate
- Humans have ~20,000 pseudogenes → ribosomal protein pseudogenes are the largest family
o 2/3 of the ribosomal protein pseudogenes also in chimpanzee genome
o Fewer than 12 shared w/mouse genome
o Not clear what these genes are doing
Multigene families
- If duplication is beneficial, multigene family can be formed.
- E.g. rRNA (v. important so highly conserved)
- Tandem gene families = clustered on same chromosome
- Dispersed gene families = on different chromosomes
Globin superfamily
Example of duplication & divergence
Carry out different functions in different tissues
Mixture of co-localised genes in clusters and dispersal of these across the whole genome on different chromosomes → tandem & dispersed
Can trace evolution over different organisms → compare genes within/between species
Globins are v. common → present in all 3 domains of life
Haem-containing protein domain → v. diverse
Used for oxygen transport, storage, sensing & detoxification
Haemoglobin: tetramer (2α, 2β)
Myoglobin: monomer
Different structures because structure changes how readily they load/unload oxygen
Others include: neuroglobin, androglobin, cytoglobin, globin E, globin X, globin Y
Haemoglobin
- Cooperativity in binding:
o First oxygen binds haem with difficulty at low oxygen concentration
o Each oxygen that binds makes the next one easier to bind within the tetramer → non-linear binding curve → sigmoidal curve (little binding until oxygen levels are high enough)
Myoglobin
- Found in muscles
- Has simpler binding curve → no cooperativity
- Higher affinity for oxygen
- Having different proteins for oxygen storage and transport w/different binding affinities is useful
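The two binding curves can be sketched with the Hill equation, Y = pO2^n / (P50^n + pO2^n), where n = 1 gives the simple myoglobin curve and n > 1 gives a sigmoidal one. The parameter values below (n ≈ 2.8 and P50 ≈ 26 mmHg for haemoglobin, n = 1 and P50 ≈ 2.8 mmHg for myoglobin) are approximate textbook figures assumed for illustration; they are not from the cards.

```python
def saturation(p_o2, p50, n):
    """Fractional oxygen saturation from the Hill equation."""
    return p_o2 ** n / (p50 ** n + p_o2 ** n)


# Assumed approximate parameters: (P50 in mmHg, Hill coefficient)
HAEMOGLOBIN = (26.0, 2.8)
MYOGLOBIN = (2.8, 1.0)

for p_o2 in (5, 20, 40, 100):  # partial pressures in mmHg
    hb = saturation(p_o2, *HAEMOGLOBIN)
    mb = saturation(p_o2, *MYOGLOBIN)
    print(f"pO2={p_o2:>3} mmHg  Hb={hb:.2f}  Mb={mb:.2f}")
```

With these values myoglobin stays nearly saturated at oxygen pressures where haemoglobin has already released much of its oxygen, which is why separate storage and transport proteins are useful.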
Genome duplication
- Larger duplication than genes/segments is possible → can affect genome structure
- Whole chromosome duplication → trisomy 21 → ‘Down syndrome’
o Gene product imbalance
o Reduced life expectancy
- Genome sequencing suggested major metazoan lineages have undergone whole genome duplications (WGD)
Polyploidy
- Multiple complete sets of chromosomes
- Useful in agriculture to make bigger cells → bigger fruit
- ~80% of flowering plants: oats, cotton, potatoes, bananas, coffee, etc
- Common in invertebrates, fish & amphibians; rare in mammals
Autopolyploid
- Multiplication of identical chromosome sets within single species
- Meiosis error within single species
- Fertilization of unreduced gametes
- Accidental production of diploid gametes not v. rare (1-40%)
- Can induce disease symptoms:
o ‘Genomic shock’ → widespread activation of transposons, gene expression, recombination (short-term effect)
o These can then stabilise over time → produce fertile gametes and pass down duplications
- Need to have even/paired up number of chromosomes to align properly during metaphase
- Autopolyploids can reproduce successfully but cannot breed with parent species → introduces speciation
Allopolyploidy
- Hybridisation between 2 reproductively compatible species
- One-step model:
o Fertilization of unreduced gametes from 2 diploid species
- Two-step model:
o Hybridisation between haploid gametes followed by somatic doubling of chromosomes in zygote
o In plants, pollen from 1 species germinates on stigma of 2nd → endoreduplication in zygote
- Triploids:
o Tetraploid + diploid parents → triploid zygotes
o Triploid is viable but makes unbalanced gametes (odd number of chromosome sets), because chromosomes cannot pair up and segregate evenly in meiosis
Triploid example: wheat-rye hybrid
- Cross good traits: high yield of wheat + disease tolerance of rye
- Wheat (2n = 28) × rye (2n = 14) → Triticale hybrid with 14 + 7 = 21 chromosomes → not fertile
- To overcome this:
o Treatment w/colchicine (chemical) interferes w/spindle machinery of cells → doubles chromosomes in germ cells
o Now have 42 chromosomes → fertile Triticale
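The chromosome arithmetic for the wheat-rye cross can be checked with a short sketch. It reads 28 and 14 as somatic (2n) counts, so each gamete carries half. The function names are illustrative, and treating "even count" as a stand-in for "every chromosome has a pairing partner" is a simplification.

```python
def hybrid_count(parent_a_2n, parent_b_2n):
    """Chromosomes in a hybrid zygote: one gamete (half the 2n count) from each parent."""
    return parent_a_2n // 2 + parent_b_2n // 2


def colchicine_doubling(count):
    """Colchicine blocks the spindle, so replicated chromosomes stay in one cell."""
    return 2 * count


def can_pair_evenly(count):
    """Crude proxy for fertility: an odd count cannot be split into pairs."""
    return count % 2 == 0


hybrid = hybrid_count(28, 14)
doubled = colchicine_doubling(hybrid)
print(hybrid, can_pair_evenly(hybrid))    # 21 False
print(doubled, can_pair_evenly(doubled))  # 42 True
```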
The effects of WGD
- Cytogenetics = chromosome counts
o Use dyes; do karyograms
- Detect multivalent formation = chromosomes line up and undergo homologous recombination
o Can undergo more diversity and local gene duplication
o Genome size comparison, etc
o Difficult to discern ‘auto’ vs ‘allo’-ploidy
- Saccharomyces cerevisiae → brewer’s yeast
o Compare every gene to every other gene
o Duplicated sets of genes → can compare to ancestors, related yeast species, etc
o WGD happened a long time ago so much of the evidence is lost, but an estimated 10% of genes derive from WGD
Genome duplication in multicellular organisms
- Genome duplication drives metazoan expansion
- Increase in organisational complexity
- Main controller of body plan = homeobox (Hox) genes
o Encode a ‘homeodomain’ (~60 amino acids long) → DNA-binding domain of transcription factors that regulate genes
- Studied a lot in fruit flies
o E.g. a single homeotic mutation doubles number of wings in Drosophila (bithorax)
Hox gene family
- V. well organised
o Spatial and temporal collinearity
o Order of genes on chromosome reflects expression order
- Expressed in different regions of developing embryo
- Blueprint same across many different species
o Insects have only 1 Hox cluster
o Vertebrates e.g. mouse have several Hox clusters (usually 4)
o Number of segments corresponds to number of clusters and components within them
2R/3R hypothesis of WGD
- “Complexity in fish and vertebrate formation probably driven by WGD”
- Evidence: looking at Hox clusters
- B. lanceolatum (amphioxus) → 1 cluster w/15 genes
o Thought to resemble the last common ancestor of all vertebrates
- Sea lamprey (fish-like parasite) → 4 clusters before increase in body plan complexity
- Hagfish (has spinal cord) → 4 clusters
- Sharks → even more clusters
WGD benefits
- Raw material for evolutionary diversification
- Potential for neofunctionalization, divergence, pseudogene formation, etc for single genes → WGD provides a large amount of substrate for these
- Debate how beneficial it is in short-term → can get genomic shock from too much DNA
- Extra copies of genes provide some protection against environment and extinction
- Defence against mutation because have spare copy of every gene
o Allows organism to do new things e.g. colonise new environments
- Fitness consequences:
o Increased cell size (polyploidy)
o Increased organ size
o Faster growth (more metabolic components)
o Have to evolve dosage-regulated gene expression
- In allopolyploidy get heterosis (hybrid vigour) → unrelated sets of genes coming together give healthier, longer-lived, more robust offspring than highly-inbred species → provides larger combinations of wild-type and non-specialised genes
Eukaryotic gene structure
- Evolution = Increasing complexity → gene number; protein number; functions
- Genes are split
- Idea developed by Walter Gilbert in 1978 → coined the terms intron/exon
- Predicted existence of:
o Alternative splicing = RNA inside the cell is processed in different ways that include different exons, so the same gene can make different proteins within the same cell
o Exon shuffling = evolutionary process that increases/decreases the number of introns/exons and swaps them around
Exon shuffling theory
- Introns/exons often border particular subfunctions within proteins
- Eukaryotic proteins → ‘mosaic of motifs’
o Domains 40-100aa = small motif building blocks for stabilization, binding, catalysis, etc
o Discrete and modular → amenable for evolution
- Primordial exons correspond to domains:
o Duplication, permutation, rearrangement when in new genome positions could generate new genes and proteins w/diverse functions
- Repetition in original gene has different outcomes:
o Affect (increase?) stability, catalysis and modifies functions
Illegitimate non-homologous recombination
Can get unequal crossing over between repeat sequences
For short motifs can get replication slippage
Microhomology can drive this illegitimate N-HR
E.g. αA-crystallin gene (hamster) transfected into mouse
Heat shock protein → topoisomerase I nicks DNA and ligates non-homologous ends → end result: shuffled domain with duplicated gene
Over evolutionary time, this process could happen w/v. low probability but non-zero chance; if domain provide selective advantage, it would become fixed in population
Domain shuffling:
Structural domains from different genes joined together
Mechanisms include illegitimate N-HR → rare process but higher rate of shuffling in some organisms suggests retrotransposition
LINEs
- In exon shuffling:
1. Gene w/domains I, II and III → downstream have a LINE between 2 exons
2. When LINE is transcribed might take bit of adjacent exon w/ it
3. During retrotransposition the transcript is converted to cDNA and jumps into the genome, disrupting a gene
4. Get chimeric transcript → potentially makes new protein with different exons (adding one, replacing one, etc)
5. Over time LINE is deleted; might undergo retrotransposition elsewhere in genome
6. In evolution → get new gene
- LINEs also induce ds breaks → can lead to domain shuffling and DNA repair
Transposons
- Transposon carries exon w/it
- Mutator-like transposable elements (MULEs) → longer and can carry more things; contain flanking exons/introns
o Found in plants (e.g. rice has 3000 MULEs in genome)
- As MULEs move around the genome they collect sequence → can form internal hybrid genes
o Exons lack translation initiation/termination signals; not full genes, just coding DNA w/o stop codon
Phases
- If exons are shuffled in incompatible reading frames → adding an extra domain in the middle of a protein disrupts the reading frame of all exons downstream
o Perfect system needs exact multiples of 3 and needs to be in perfect frame to maintain protein shuffling compatibility during evolution
o Need intron phases/classes to be the same
- Phase 0 = intron lies between 2 codons; perfect set of codons in exons 1 and 2; 0 phase shift
- Phase 1 = intron located after 1st nucleotide of a codon
- Phase 2 = intron located after 2nd nucleotide of a codon
o Have extra 1 or 2 bases in exon so need corresponding number of bases on other exon to restore reading frame
- Not all intron/exon classes are compatible w/each other → problem for shuffling
- Not every shuffling event is successful; further evolution is required to remove frameshifts
Splice frame rule
- ‘Following a successful shuffling event a newly acquired exon will be flanked by 2 introns of the same phase, otherwise it will produce a frameshift in the resulting coding sequence’
- Incompatible splicing → negative selection acts on gene to mutate further
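A small sketch of the phase bookkeeping behind the splice frame rule. It assumes the usual convention that an intron's phase is the number of codon nucleotides sitting upstream of it (0, 1 or 2), so the phase after an exon is the phase before it plus the exon length, modulo 3. The function names and exon lengths are illustrative.

```python
def downstream_phase(upstream_phase, exon_length):
    """Phase of the intron that follows an exon of `exon_length` nucleotides."""
    return (upstream_phase + exon_length) % 3


def exon_class(upstream_phase, exon_length):
    """Exon class as (phase of upstream intron, phase of downstream intron)."""
    return upstream_phase, downstream_phase(upstream_phase, exon_length)


def keeps_reading_frame(cls, target_intron_phase):
    """Splice frame rule: an exon inserted into an intron keeps the frame only
    if both of its flanking intron phases equal the target intron's phase."""
    upstream, downstream = cls
    return upstream == downstream == target_intron_phase


symmetric = exon_class(1, 120)    # length divisible by 3 -> class 1-1
asymmetric = exon_class(0, 100)   # length not divisible by 3 -> class 0-1

print(symmetric, keeps_reading_frame(symmetric, 1))    # (1, 1) True
print(symmetric, keeps_reading_frame(symmetric, 0))    # (1, 1) False
print(asymmetric, keeps_reading_frame(asymmetric, 0))  # (0, 1) False
```

Only symmetric exons (length a multiple of 3) can be flanked by introns of the same phase, and even then the phase has to match the intron they land in.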
Evidence for exon shuffling
- Multicellular structure:
o Extracellular matrix
o Cell adhesion
o Cellular receptors
- Shuffling essential for multicellular metazoans
- 6.4% of human genes show evidence for exon shuffling
- Phase 0 → most common intron phase
- Excess of symmetric exons in simpler organisms
- Over evolution, increase in 1-1 exon shuffling since the first animals
o Could be because Gly codons (often interrupted by phase 1 introns) are necessary for multi-domain linkage
Evidence of 1-1 exon shuffling
- Phase 0 introns (most common)→ more ancient → higher frequency in earlier stages of eukaryotic evolution → explains prevalence in non-metazoan lineage
- 1-1 associated w/emergence of necessary features of multi-cellularity
o After divergence of metazoan, shuffling began
o Acquisition of protein domain must overcome structural limitations
o Small & flexible domains fold independently → must be linked
o Phase 1 introns v. frequently interrupt glycine codons → common in linker regions
Shuffling evidence
- Exon duplication: protein example α2 type I collagen
o Highly repetitive sequence
o Tripeptide → Gly – X (often Pro) – Y (often hydroxyproline)
o Gly-Pro typical in linker regions → destroys α/β structures
o Chicken gene has 52 exons → 42 of them have Gly-X-Y repeats
- Tissue plasminogen activator (TPA)
o Found in vertebrate blood → part of the blood clotting system (helps break down clots)
o 4 exons
o Upstream exon encodes ‘finger module’ → fibronectin on top of cascade interacts with plasminogen activator → plasminogen and EGF (epidermal growth factor)
o Each subdomain is coded by an individual exon, which enables 1 protein to interact via partial dimerization between modules → easy way to build complicated protein-protein interaction network
- Blood clotting cascade:
o Lots of clotting factors
o Exons spread through entire family → some for function (e.g. proteases); some for interaction between members (e.g. kringle domain)
o Evolution has duplicated exons, shuffled them together for right compatibility, with correct introns between them to get spliced and form different variants → get large/complicated split genes
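The Gly-X-Y repeat described for collagen above is easy to spot computationally. The sketch below scores what fraction of consecutive triplets start with glycine. It assumes the repeat is in register from the first residue and uses `O` for hydroxyproline (a common one-letter convention); both sequences are made-up toy examples, not real collagen or control sequences.

```python
def gly_xy_fraction(protein):
    """Fraction of consecutive residue triplets that begin with Gly (G)."""
    usable = len(protein) - len(protein) % 3  # drop an incomplete final triplet
    triplets = [protein[i:i + 3] for i in range(0, usable, 3)]
    if not triplets:
        return 0.0
    return sum(t[0] == "G" for t in triplets) / len(triplets)


collagen_like = "GPOGPAGEOGPO"  # toy Gly-X-Y repeat
unrelated = "MKTAYIAKQ"         # toy sequence with no Gly

print(gly_xy_fraction(collagen_like))  # 1.0
print(gly_xy_fraction(unrelated))      # 0.0
```

Running a check like this exon by exon is one hedged way to see the kind of repetition behind the "42 of 52 exons" observation.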
Protein-protein interaction
- Evolution of new protein has domains of other proteins; interacts with itself and other proteins and becomes hub of PPI networks
- Shuffling promotes self-interaction capacity
o Many human PPI networks self-interact between components in network
o Allow formation of dimeric proteins
o Positive natural selection fixing this
o Some domain types have v. specific type of interactions or v. promiscuous interactions → increase diversity in network → e.g. polyglutamines found in everything (sticky and unstructured)
PPI example: Amyloid precursor protein (APP)
- Implicated in Alzheimer’s disease
- Has different isoforms w/different exons from alternative splicing
- APP undergoes protease processing:
- α-secretase cuts immature APP → soluble APP
- Or β/γ-secretase gives mixture of soluble APP and β-amyloid protein (bad because it forms v. stable β-sheet structure; intermediates are toxic to cell → drives neurodegeneration)
- Amyloid plaque formation and different proteases due to different alternative splicing and expression of different variants of ‘Kunitz-type protease inhibitor’ (KPI) domain and different secretase forms
- Processing determined by:
- KPI domain presence → inhibits α-secretase (binds to trypsin domain)
- If inherited → more likely to get one type of familial related Alzheimer’s
- Evidence for KPI domain being gained by shuffling:
- Flanked by introns
- Has homology w/other related proteins in genome also embedded in exons