Gene duplication & Exon shuffling Flashcards
How can a genome acquire a new gene?
- Horizontal gene transfer
- Exon shuffling
- Duplication and divergence
o 1% chance for 1 gene to duplicate in 1 million years
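To get a feel for what the 1% figure implies, here is a rough sketch. The genome size of 20,000 genes and the assumption that each million-year window is an independent trial are illustrative assumptions, not values from these notes.

```python
# Rough arithmetic on the "1% per gene per million years" duplication rate.
# ASSUMPTIONS: 20,000 genes per genome (hypothetical) and independent
# million-year windows; both are simplifications for illustration only.

RATE_PER_GENE_PER_MYR = 0.01  # 1% chance per gene per million years (from the notes)


def expected_duplications(n_genes, myr=1.0, rate=RATE_PER_GENE_PER_MYR):
    """Expected number of duplication events across a genome in `myr` million years."""
    return n_genes * rate * myr


def prob_at_least_once(myr, rate=RATE_PER_GENE_PER_MYR):
    """Chance that one particular gene duplicates at least once in `myr` million years."""
    return 1 - (1 - rate) ** myr


print(expected_duplications(20_000))          # about 200 events per million years
print(round(prob_at_least_once(100), 3))      # chance for one gene over 100 million years
```

So a low per-gene rate still adds up to many duplication events once it is multiplied across a whole genome and long time spans.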
Function of genes
- Promiscuous = Side reaction has no biological function
- Bifunctional = both activities have a biological function
- Over evolutionary time, the 2 functions diverge → enzyme copies pick up different mutations → specialise → each becomes better at catalysing either the original reaction or what was originally a side reaction
How is DNA duplicated by recombination?
- Unequal crossing over (meiosis)
o Only requires certain lengths of similar sequences
o Can get recombination between sets of repeats that are inappropriately lined up
o One chromosome has duplication; other has deletion → have different daughter gametes → if have selective advantage will survive through evolution
- Unequal sister chromatid exchange (mitosis)
o Involves exchange between two chromatids
o Paired up on repeat sequence → one chromatid duplication, one deletion
o Depending on species, will not be passed on to progeny
- DNA amplification during replication
o In haploid organisms (e.g. bacteria)
o Unequal recombination during replication → ‘replication bubble’: DNA splits up at replication forks → homologous DNA lines up inappropriately, so one strand gets a duplication of the region and the other gets a deletion
- Replication slippage
o For short DNA sequences e.g. microsatellites; CAG triplet repeats (poly-Q) in Huntington’s disease
o Not common for genes
o DNA loops out one repeat and starts to re-pair-up downstream → added DNA repeat as part of replication cycle
o Other end has looped out → priming in wrong place → deleting the sequence
o Can get insertions or deletions
o Partial duplication of genetic material that codes for protein
- Retrotransposition
o Retrotransposons can reverse transcribe RNA copies back into DNA and spread across genomes over evolutionary time
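The unequal crossing over described above can be sketched with chromosomes written as lists of segments. The segment labels (`x`, `R`, `G`, `y`) and the function name are hypothetical, chosen only to show how a misaligned repeat gives one duplication product and one deletion product.

```python
# Toy model of unequal crossing over between misaligned repeats.
# ASSUMPTION: a chromosome is a list of labelled segments; "R" is a repeat
# and "G" is a gene sitting between two copies of the repeat.

def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the right-hand ends of two chromosomes at different break points.

    Returns (left of A + right of B, left of B + right of A).
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


chromosome = ["x", "R", "G", "R", "y"]

# The downstream repeat of one copy pairs with the upstream repeat of the
# other, so the break falls after index 3 in A but after index 1 in B.
duplication, deletion = unequal_crossover(chromosome, chromosome, break_a=4, break_b=2)

print(duplication)  # ['x', 'R', 'G', 'R', 'G', 'R', 'y'] -> two copies of G
print(deletion)     # ['x', 'R', 'y']                     -> G is lost
```

With equal break points (`break_a == break_b`) both products are identical to the parent, which is the normal, properly aligned case.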
Successful gene duplication
- Successful = gene survives
- Successful outcome #1 → gene originally w/one copy duplicated → hypothesis: 2 copies should double synthesis rate if everything else is equal
o If beneficial → retain that
o If second copy does not provide dosage advantage → can pick up random mutations → will eventually be inactivated by a random mutation → over evolutionary time accumulates mutations → becomes a pseudogene (no longer fully functional gene)
- Successful outcome #2 → getting new function
o “neofunctionalization” (a copy gains a new function) or subfunctionalization (copies split the functions of the parental gene)
- If selection pressure just for dosage → genes stay similar
- If no selection pressure for second copy → one copy either degrades entirely (pseudogene) or gets a new function if it provides advantage
Gene neofunctionalization example
- Trypsin vs chymotrypsin
o Evolved to be different proteases
o Trypsin → cuts at Arg & Lys
o Chymotrypsin → cuts at Phe, Trp & Tyr
o Not structurally identical but similarities in proportion of strand/helices and nature of active site
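The different specificities can be illustrated with a small in-silico digest. This is a simplified sketch: it assumes both enzymes cut on the C-terminal side of their target residues and ignores real-world exceptions (such as reduced cleavage before proline). The peptide sequence is made up.

```python
# Simplified in-silico digest comparing trypsin and chymotrypsin.
# ASSUMPTIONS: cleavage happens directly after each target residue and no
# sequence-context exceptions apply; the peptide below is invented.

TRYPSIN_SITES = set("RK")        # Arg, Lys
CHYMOTRYPSIN_SITES = set("FWY")  # Phe, Trp, Tyr


def digest(peptide, cut_after):
    """Split a peptide (one-letter codes) after every residue in `cut_after`."""
    fragments, current = [], ""
    for residue in peptide:
        current += residue
        if residue in cut_after:
            fragments.append(current)
            current = ""
    if current:  # keep any trailing fragment with no cleavage site
        fragments.append(current)
    return fragments


peptide = "AGRFLKWTYS"
print(digest(peptide, TRYPSIN_SITES))       # ['AGR', 'FLK', 'WTYS']
print(digest(peptide, CHYMOTRYPSIN_SITES))  # ['AGRF', 'LKW', 'TY', 'S']
```

The same substrate gives different fragment sets, which is the practical consequence of the two proteases having diverged in specificity.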
Pseudogenes
- Copies of functional genes → altered/missing regions
- Often have premature stop codons/frameshifts/missense mutations → disrupt reading frame or function of protein
- May have regulatory role → often producing RNA
- Increase genome size (cost/benefit)
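A minimal sketch of how a premature stop codon or a frameshift shows up when a coding sequence is read codon by codon. The three sequences are invented for illustration, and only the forward reading frame starting at position 0 is checked.

```python
# Scan a coding sequence for its first in-frame stop codon.
# ASSUMPTIONS: sequences are made up; reading starts at position 0 and only
# the forward frame is considered.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(dna):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    usable_length = len(dna) - len(dna) % 3  # ignore an incomplete final codon
    for i in range(0, usable_length, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


functional = "ATGGCTTGGAAACGTTAA"  # ATG GCT TGG AAA CGT TAA
nonsense = "ATGGCTTGAAAACGTTAA"    # TGG -> TGA: stop arrives early
frameshift = "ATGGCTGGAAACGTTAA"   # one T deleted: frame shifts, normal stop is lost

print(first_stop_codon(functional))  # 5 (the normal stop)
print(first_stop_codon(nonsense))    # 2 (premature stop, truncated protein)
print(first_stop_codon(frameshift))  # None (original stop no longer in frame)
```

Either outcome, an early stop or a lost stop, means the copy no longer encodes the original protein.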
Types of pseudogenes
- “non-processed” pseudogenes:
o Tandem duplication of genomic region
o Inactivating mutations/incomplete duplications
o Part of genome missing regulatory regions → no promoter, enhancers in correct place but does have original intron/exon structure
- “processed” pseudogenes:
o Undergoes reverse transcriptase activity (LINE, retrovirus) → mRNA to cDNA → genome integration to make second duplicated gene copy
o Lacks introns and regulatory regions
o Can have different combinations of exons
o Loses most of promoter region except 5’ untranslated region at front of gene
o Could contain poly(A) tail
o Can integrate into same or different chromosome
Examples of pseudogenes
- Ribosomal proteins
o Highly duplicated across different species and highly conserved (essential for protein synthesis machinery)
o Associated w/ L1 retrotransposon
o May have functional role as have high expression rate
- Humans have 20,000 pseudogenes → most are ribosomal
o 2/3 of these also in chimpanzee genome
o Less than 12 shared w/mouse genome
o Not clear what these genes are doing
Multigene families
- If duplication is beneficial, multigene family can be formed.
- E.g. rRNA (v. important so highly conserved)
- Tandem gene families = clustered on same chromosome
- Dispersed gene families = on different chromosome
Globin superfamily
Example of duplication & divergence
Carry out different functions in different tissues
Mixture of co-localised genes in clusters and dispersal of these across the whole genome on different chromosomes → tandem & dispersed
Can trace evolution over different organisms → compare genes within/between species
Globins are v. common → present in all 3 domains of life
Haem-containing protein domain → v. diverse
Used for oxygen transport, storage, sensing & detoxification
Haemoglobin: tetramer (2α, 2β)
Myoglobin: monomer
Different structures because structure changes how readily they load/unload oxygen
Others include: neuroglobin, androglobin, cytoglobin, globin E, globin X, globin Y
Haemoglobin
- Cooperativity in binding:
o First oxygen has difficulty binding haem at low oxygen concentration
o Each oxygen bound helps the next one bind within the tetramer → get non-linearity in binding curve → sigmoidal curve, as haemoglobin needs relatively high oxygen levels before it binds oxygen readily
Myoglobin
- Found in muscles
- Has simpler binding curve → no cooperativity
- Higher affinity for oxygen
- Having different proteins for oxygen storage and transport w/different binding affinities is useful
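The contrast between the two binding curves can be sketched with the Hill equation, Y = pO2^n / (P50^n + pO2^n). The parameter values below (haemoglobin n ≈ 2.8, P50 ≈ 26 mmHg; myoglobin n = 1, P50 ≈ 2.8 mmHg) are approximate textbook-style figures used as assumptions here, not values taken from these notes.

```python
# Hill-equation sketch of oxygen binding by haemoglobin vs myoglobin.
# ASSUMPTIONS: the Hill coefficients and P50 values are approximate,
# illustrative numbers; the Hill equation itself is a simplification of
# real cooperative binding.

def fractional_saturation(p_o2, p50, hill_n):
    """Fraction of binding sites occupied at oxygen partial pressure `p_o2` (mmHg)."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


HAEMOGLOBIN = {"p50": 26.0, "hill_n": 2.8}  # cooperative -> sigmoidal curve
MYOGLOBIN = {"p50": 2.8, "hill_n": 1.0}     # no cooperativity -> hyperbolic curve

for p_o2 in (5, 20, 40, 100):
    hb = fractional_saturation(p_o2, **HAEMOGLOBIN)
    mb = fractional_saturation(p_o2, **MYOGLOBIN)
    print(f"pO2 = {p_o2:>3} mmHg   haemoglobin = {hb:.2f}   myoglobin = {mb:.2f}")
```

Under these assumptions myoglobin stays highly saturated at oxygen levels where haemoglobin has already released much of its oxygen, which fits a storage role for myoglobin and a transport role for haemoglobin.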
Genome duplication
- Larger duplication than genes/segments is possible → can affect genome structure
- Whole chromosome duplication → trisomy 21 → Down syndrome
o Gene product imbalance
o Reduced life expectancy
- Genome sequencing suggested major metazoan lineages have undergone whole genome duplications (WGD)
Polyploidy
- Multiple complete sets of chromosomes
- Useful in agriculture to make bigger cells → bigger fruit
- ~80% of flowering plants: oats, cotton, potatoes, bananas, coffee, etc
- Common in invertebrates, fish & amphibians; rare in mammals
Autopolyploid
- Multiplication of identical chromosome sets within single species
- Meiosis error within single species
- Fertilization of unreduced gametes
- Accidental production of diploid gametes not v. rare (1-40%)
- Can induce disease symptoms:
o ‘Genomic shock’ → widespread activation of transposons, gene expression, recombination (short-term effect)
o These can then stabilise over time → produce fertile gametes and pass down duplications
- Need to have even/paired up number of chromosomes to align properly during metaphase
- Autopolyploids can reproduce successfully but cannot breed with parent species → introduces speciation
Allopolyploidy
- Hybridisation between 2 reproductively compatible species
- One-step model:
o Fertilization of unreduced gametes from 2 diploid species
- Two-step model:
o Hybridisation between haploid gametes followed by somatic doubling of chromosomes in zygote
o In plants, pollen from 1 species germinates on stigma of 2nd → endoreduplication in zygote
- Triploids:
o Tetraploid + diploid parents → triploid paired up zygotes
o Triploid is viable but has an odd number of chromosome sets, so chromosomes cannot pair and segregate evenly in meiosis → unbalanced gametes
Triploid example: wheat-rye hybrid
- Cross good traits: high yield of wheat + disease tolerance of rye
- Wheat (2n = 28) + rye (2n = 14) = Triticale hybrid (21 chromosomes: 14 from wheat gamete + 7 from rye gamete) → not fertile
- To overcome this:
o Treatment w/colchicine (chemical) interferes w/spindle machinery of cells → doubles chromosomes in germ cells
o Now have 42 chromosomes → fertile Triticale
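The chromosome arithmetic behind the wheat-rye cross can be written out as below. It assumes that 28 and 14 are the somatic (2n) chromosome counts of the parents, and it uses "even total" as a crude stand-in for "every chromosome has a pairing partner"; both are simplifying assumptions.

```python
# Chromosome counts in the wheat x rye cross and after colchicine doubling.
# ASSUMPTIONS: 28 and 14 are treated as somatic (2n) counts; an even total is
# used as a crude proxy for being able to pair chromosomes in meiosis.

def hybrid_count(somatic_a, somatic_b):
    """Each parent contributes one gamete carrying half its somatic count."""
    return somatic_a // 2 + somatic_b // 2


def can_pair(chromosome_count):
    """Crude check: an odd total leaves at least one chromosome unpaired."""
    return chromosome_count % 2 == 0


WHEAT, RYE = 28, 14

f1_hybrid = hybrid_count(WHEAT, RYE)  # 14 + 7
doubled = 2 * f1_hybrid               # after colchicine-induced doubling

print(f1_hybrid, can_pair(f1_hybrid))  # 21 False -> sterile hybrid
print(doubled, can_pair(doubled))      # 42 True  -> fertile Triticale
```

Doubling gives every chromosome an identical partner, which is why the 42-chromosome plant can complete meiosis.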
The effects of WGD
- Cytogenetics = chromosome counts
o Use dyes; do karyograms
- Detect multivalent formation = chromosomes line up and undergo homologous recombination
o Can undergo more diversity and local gene duplication
o Genome size comparison, etc
o Difficult to discern ‘auto’ vs ‘allo’-ploidy
- Saccharomyces cerevisiae → brewer’s yeast
o Compare every gene to every other gene
o Duplicated sets of genes → can compare to ancestors, related yeast species, etc
o Long time ago so evidence lost but estimate 10% of genes derive from WGD
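The "compare every gene to every other gene" step can be sketched as an all-against-all comparison. This toy version scores equal-length sequences by simple percent identity; real analyses would use proper alignment tools. The gene names, sequences, and the 0.8 threshold are all invented for illustration.

```python
# Toy all-against-all comparison to flag candidate duplicated gene pairs.
# ASSUMPTIONS: sequences are the same length and already aligned, identity is
# a plain match count, and the 0.8 cut-off is arbitrary.

from itertools import combinations


def identity(seq_a, seq_b):
    """Fraction of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def candidate_duplicates(genes, threshold=0.8):
    """Return name pairs whose sequences are at least `threshold` identical."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(genes.items(), 2):
        if identity(seq_a, seq_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


genes = {
    "geneA": "ATGGCTAAAGGT",
    "geneA_copy": "ATGGCTAAGGGT",  # differs from geneA at one position
    "geneB": "ATGCCCTTTACG",
}

print(candidate_duplicates(genes))  # [('geneA', 'geneA_copy')]
```

Older duplicates accumulate more differences and drop below any fixed threshold, which is one reason evidence for an ancient WGD is hard to recover.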
Genome duplication in multicellular organisms
- Genome duplication drives metazoan expansion
- Increase in organisational complexity
- Main controllers of body plan = homeobox genes (Hox genes)
o Encode ‘homeodomain’ → DNA binding proteins (~60 amino acids long) → transcription factors that regulate genes
- Studied a lot in fruit flies
o E.g. a single homeotic mutation doubles number of wings in Drosophila (bithorax)
Hox gene family
- V. well organised
o Spatial and temporal collinearity
o Order of genes on chromosome reflects expression order
- Expressed in different regions of developing embryo
- Blueprint same across many different species
o Insects have only 1 Hox cluster
o Vertebrates e.g. mouse have several Hox clusters (usually 4)
o Number of segments corresponds to number of clusters and components within them
2R/3R hypothesis of WGD
- “Complexity in fish and vertebrate formation probably driven by WGD”
- Evidence: looking at Hox clusters
- B. lanceolatum (amphioxus) → 1 cluster w/15 genes
o Thought to resemble the last common ancestor of all vertebrates
- Sea lamprey (fish-like parasite) → 4 clusters before increase in body plan complexity
- Hagfish (has spinal cord) → 4 clusters
- Sharks → even more clusters
WGD benefits
- Raw material for evolutionary diversification
- Potential for neofunctionalization, divergence, pseudogene formation, etc for single genes → large amount of substrate for WGD
- Debate how beneficial it is in short-term → can get genomic shock from too much DNA
- Extra copies of genes provides some protection against environment and extinction
- Defence against mutation because have spare copy of every gene
o Allows organisms to do new things e.g. colonise new environments
- Fitness consequences:
o Increased cell size (polyploidy)
o Increased organ size
o Faster growth (more metabolic components)
o Have to evolve dosage regulated gene expression
- In allopolyploidy get heterosis (hybrid vigour) → unrelated sets of genes coming together give healthier, longer-lived, more robust offspring than highly-inbred species → provides larger combinations of wild-type and non-specialised genes
Eukaryotic gene structure
- Evolution = Increasing complexity → gene number; protein number; functions
- Genes are split
- Discussed by Walter Gilbert in 1978
- He invented the terms intron/exon
- Predicted existence of:
o Alternative splicing = RNA inside the cell is matured in different ways that include different exons, so the same gene can make different proteins within the same cell
o Exon shuffling = evolutionary process that increases/decreases the number of introns/exons and swaps them around
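Both predicted processes can be sketched by treating a gene as an ordered list of named exons. The exon names, sequences, and helper functions below are hypothetical and only illustrate the idea: splicing chooses which exons of one gene reach the mature RNA, while shuffling moves an exon into a different gene.

```python
# Toy model of alternative splicing and exon shuffling.
# ASSUMPTIONS: a gene is a list of (exon_name, sequence) pairs; introns are
# simply left out; all names and sequences are invented.

def splice(gene, keep):
    """Join the exons named in `keep`, in gene order, into a mature transcript."""
    return "".join(seq for name, seq in gene if name in keep)


def shuffle_exon(recipient, donor, exon_name, position):
    """Return a new gene with one exon from `donor` inserted into `recipient`."""
    exon = next(e for e in donor if e[0] == exon_name)
    return recipient[:position] + [exon] + recipient[position:]


gene_1 = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGUAA")]
gene_2 = [("B1", "AUGAAA"), ("B2", "GGGUAG")]

# Alternative splicing: same gene, two different mature transcripts.
full_length = splice(gene_1, {"E1", "E2", "E3"})
exon_skipped = splice(gene_1, {"E1", "E3"})
print(full_length)   # AUGGCUGAAUUCCCGUAA
print(exon_skipped)  # AUGGCUCCGUAA

# Exon shuffling: exon E2 of gene_1 ends up inside gene_2.
mosaic_gene = shuffle_exon(gene_2, gene_1, "E2", position=1)
print([name for name, _ in mosaic_gene])  # ['B1', 'E2', 'B2']
```

Splicing changes the product without changing the genome, whereas shuffling changes the gene itself and is inherited.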
Exon shuffling theory
- Introns/exons often border particular subfunctions within proteins
- Eukaryotic proteins → ‘mosaic of motifs’
o Domains 40-100aa = small motif building blocks for stabilization, binding, catalysis, etc
o Discrete and modular → amenable for evolution
- Primordial exons correspond to domains:
o Duplication, permutation, rearrangement when in new genome positions could generate new genes and proteins w/diverse functions
- Repetition in original gene has different outcomes:
o Affect (increase?) stability, catalysis and modifies functions