Recombination Flashcards
What is a haplotype?
A set of DNA variations that tend to be inherited together
- Chromosomes have an enormous number of variable sites
- So different chromosomes will have different combinations of alleles on those sites (loci)
How do haplotypes work in diploids?
- Genotyping gives the genotype at each locus, but does not reveal which alleles are carried together on each of the two homologous chromosomes
- E.g., an individual heterozygous at locus A (A1/A2) and at locus B (B1/B2) could carry the haplotypes A1B1 + A2B2 or A1B2 + A2B1
What are ‘phases’ of haplotypes?
The specific combinations of alleles carried together on each homologous chromosome (e.g., A1B1/A2B2 vs A1B2/A2B1)
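With more than two loci the number of possible phases grows quickly. The sketch below enumerates every haplotype pair consistent with an unphased genotype; `possible_phasings` is a hypothetical helper, and alleles are written as in the A1/A2, B1/B2 example.

```python
from itertools import product

def possible_phasings(genotypes):
    """Return every unordered pair of haplotypes consistent with an unphased genotype.

    genotypes: one (allele, allele) tuple per locus, e.g. [("A1", "A2"), ("B1", "B2")].
    """
    phasings = set()
    # At each locus, decide which of the two alleles sits on "chromosome 1"
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = tuple(alleles[c] for alleles, c in zip(genotypes, choice))
        hap2 = tuple(alleles[1 - c] for alleles, c in zip(genotypes, choice))
        # The pair is unordered, so sort it to merge mirror-image choices
        phasings.add(tuple(sorted((hap1, hap2))))
    return sorted(phasings)

for hap1, hap2 in possible_phasings([("A1", "A2"), ("B1", "B2")]):
    print("".join(hap1), "+", "".join(hap2))
```

For the two-locus example it prints the two phases, A1B1 + A2B2 and A1B2 + A2B1. In general, k heterozygous loci give 2^(k-1) possible phases, which is why the experimental and computational methods below are needed.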
What methods can we use to find out the haplotype phases?
It is challenging:
1. Allele specific PCR and Next Gen Sequencing
- Can get sequences for whole chromosome arms - ‘long read’ NGS
- Alleles yield PCR products of different sizes
- The correct combination of allele-specific primers amplifies each haplotype
- Algorithmic analysis of sequences
2. Somatic cell hybrids
- More experimental approach
- Fusion of mouse and human cells
- As the hybrid cell lines are propagated, chromosomes are ejected from the cells because the cells carry chromosomes from mixed species
- Progressive loss of chromosomes during cell division
- End up with selected cell lines that contain only a single human chromosome (one homologue), so its alleles can be typed directly as a haplotype
What is recombination?
The shuffling of chromosome segments to generate new haplotype combinations
What is the generation of new haplotypes caused by?
- Most often - Recombination
- Mutation - less common
When / how does recombination occur?
During meiosis, when homologous maternal and paternal chromosomes pair up and exchange segments (crossing over)
- Humans - recombination rate ~ 1 to 10 events per chromosome per meiosis
- Occurs throughout genome - but there are ‘recombination hotspots’ as well as ‘cold regions’
- Can result in gene conversion (non-reciprocal exchange) or conventional crossing over (reciprocal exchange)
- The further apart two sequences are, the higher the probability of recombination between them
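The distance effect in the last bullet is often modelled with Haldane's map function, r = 0.5 * (1 - e^(-2d)), where d is the map distance in Morgans and r is the recombination fraction. A minimal sketch, assuming crossovers occur independently of each other (no interference), which is a simplification for real chromosomes; `haldane_r` is a hypothetical helper name.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction under Haldane's map function (assumes no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

# Recombination fraction for increasingly distant pairs of sites
for d in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"{d:>5} M -> r = {haldane_r(d):.3f}")
```

For small distances r is roughly equal to d, but it levels off at 0.5, the value for loci that assort independently.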
Why is recombination important?
Modulates / influences each of the 4 evolutionary forces discussed in these lectures
- Fundamental part of sexual reproduction
- Creates novel combinations of genes
- Helps purge the genome of deleterious mutations
- Increases efficiency of natural selection - reduces interference between loci under different selection regimes
- Responsible for different sequences having different ancestral histories - increases the information available from the past but also increases the complexity of its analyses
- Can be exploited to infer population history - (e.g., new selection tests, admixture)
- Can be exploited to help locate genes of interest - (e.g., disease loci in humans)
Why is there variation in recombination in different parts of the chromosomes?
- Due to recombinogenic motifs that are found across the human genome
- Regions containing these motifs are recombination hotspots
- Motifs often found in transposable element sequences
- Rate of recombination drops off rapidly away from motif
How do population processes and recombination interact with the strength of linkage on chromosomes?
- On one hand, population processes (e.g., demographic changes and positive selection) increase linkage / linkage disequilibrium
- On the other hand, recombination reduces linkage, so a high recombination rate decreases linkage
- Genetic drift - increases linkage between sites on a chromosome
What is linkage disequilibrium?
LD is the non-random association of alleles at different loci (polymorphic sites) in a population
- E.g., when loci are in LD, we can predict the variant at one site if we know the variant at another site
- So sites close together are less likely to have recombination between them, and so are more likely to be in LD
- E.g., if there is no LD (linkage equilibrium), the frequency of each haplotype equals the product of the frequencies of its alleles
- Recombination decreases LD in each generation
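The last bullet can be made quantitative with the standard result that each generation of random mating shrinks D by a factor of (1 - r), where r is the recombination fraction between the two loci. A minimal sketch, assuming an idealised infinite population with no drift or selection; `ld_after` is a hypothetical helper name.

```python
def ld_after(d0, r, generations):
    """LD remaining after `generations` of random mating: D_t = D_0 * (1 - r) ** t.

    Assumes an infinitely large, randomly mating population with no selection,
    so recombination is the only force acting on D.
    """
    return d0 * (1.0 - r) ** generations

# Start from the maximum D of 0.25 and follow it for 50 generations
for r in (0.001, 0.01, 0.1, 0.5):
    print(f"r = {r}: D after 50 generations = {ld_after(0.25, r, 50):.4f}")
```

Tightly linked sites (r = 0.001) keep most of their LD over 50 generations, while unlinked sites (r = 0.5) lose essentially all of it.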
What is linkage disequilibrium important in studying?
- Important in studying selection, demographic scenarios (like admixture) and in association studies
- LD can tell us that a demographic event may have occurred
What processes increase LD?
- Positive selection (selective sweeps)
- Drift in small populations
- Population growth
- Population structure and admixture
How can you measure LD - quantify strength of LD?
- Lewontin’s D and D’ - based on frequencies of alleles and haplotypes in populations
- r^2 - the squared correlation between alleles at two sites; often examined as a function of chromosomal distance
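A minimal sketch of all three measures for two biallelic loci, using the A1/A2, B1/B2 notation from earlier. D' divides D by the largest value D could take given the allele frequencies, and r^2 divides D^2 by the product of the four allele frequencies. The function name `ld_stats` and the haplotype counts are hypothetical.

```python
def ld_stats(f11, f12, f21, f22):
    """Lewontin's D, D' and r^2 for two biallelic loci.

    Arguments are haplotype frequencies (or counts) of A1B1, A1B2, A2B1, A2B2.
    """
    total = f11 + f12 + f21 + f22
    f11, f12, f21, f22 = (f / total for f in (f11, f12, f21, f22))  # convert to frequencies
    p1 = f11 + f12            # frequency of allele A1
    q1 = f11 + f21            # frequency of allele B1
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = f11 - p1 * q1         # observed minus expected A1B1 frequency under linkage equilibrium
    # D' divides D by the largest magnitude D could reach given these allele frequencies
    d_max = min(p1 * q2, p2 * q1) if d > 0 else min(p1 * q1, p2 * q2)
    d_prime = d / d_max if d_max > 0 else 0.0   # signed; |D'| is what is usually reported
    # r^2 is the squared correlation between the alleles at the two loci
    denom = p1 * p2 * q1 * q2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

d, d_prime, r2 = ld_stats(40, 10, 10, 40)   # hypothetical haplotype counts
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D' reaches 1 whenever one of the four haplotypes is missing, whereas r^2 reaches 1 only when two haplotypes are missing and the alleles predict each other perfectly, which is why the two measures can tell different stories.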
How do linkage equilibrium/disequilibrium differ?
- LE: alleles at different loci occur together in proportion to the product of their allele frequencies
- LD: alleles at different loci occur together more often (association) or less often (dissociation) than expected from their allele frequencies
For Lewontin’s D - what do the D values indicate about LD?
D measures how far the observed haplotype frequencies deviate from those expected under random assortment and free recombination
- D = 0 - linkage equilibrium
- D > 0 - Association between alleles - they occur together more often than expected - coupling phase
- D < 0 - Dissociation between alleles - they occur together less often than expected - repulsion phase
- D ranges from -0.25 to 0.25 (for two biallelic loci)
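Where the ±0.25 limits come from, for two biallelic loci in the A1/A2, B1/B2 notation: D is largest in magnitude when both loci have allele frequencies of 0.5 and only one pair of haplotypes is present.

```latex
\begin{aligned}
D &= p_{A_1B_1} - p_{A_1}\,p_{B_1} \\
\text{coupling only } \left(p_{A_1} = p_{B_1} = \tfrac{1}{2},\; p_{A_1B_1} = \tfrac{1}{2}\right):\quad
D &= \tfrac{1}{2} - \tfrac{1}{4} = +\tfrac{1}{4} \\
\text{repulsion only } \left(p_{A_1} = p_{B_1} = \tfrac{1}{2},\; p_{A_1B_1} = 0\right):\quad
D &= 0 - \tfrac{1}{4} = -\tfrac{1}{4}
\end{aligned}
```

With any other allele frequencies the attainable range of D is narrower, which is why D' rescales D by its maximum possible value.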
For linked markers, how does LD change with chromosomal distance?
- LD decreases as a function of chromosomal distance due to recombination
- The size of the chromosomal blocks over which LD extends can be informative and forms part of hypothesis testing about demographic/selection processes
What is admixture?
Interbreeding between two previously isolated, genetically distinct populations
How does LD work in admixture?
- At the time of admixture, alleles that differ in frequency between the two source populations start off in LD
- Since LD decreases over time in random-mating populations, recently admixed populations have long-range LD, whereas anciently admixed populations have short-range or little LD
- This information can be used to date the admixture (determine when the two populations mixed) and in association studies
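A minimal sketch of the dating idea, assuming a single pulse of admixture t generations ago: admixture LD between markers d Morgans apart then decays roughly as e^(-t*d), so the slope of log(LD) against genetic distance estimates -t. This illustrates the general principle behind LD-based admixture dating, not the algorithm of any particular published tool; `estimate_admixture_age` and the noise-free curve are hypothetical.

```python
import math

def estimate_admixture_age(distances_m, ld_values):
    """Least-squares slope of log(LD) against genetic distance, returned as generations (-slope).

    Assumes a single admixture pulse, so LD(d) ~ A * exp(-t * d).
    """
    xs = distances_m
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic, noise-free curve for a hypothetical pulse 20 generations ago
true_t = 20
distances = [0.005 * i for i in range(1, 21)]   # 0.5 cM to 10 cM, in Morgans
ld_curve = [0.1 * math.exp(-true_t * d) for d in distances]
print(round(estimate_admixture_age(distances, ld_curve), 3))
```

A recent pulse (small t) gives a shallow slope and LD that stretches over long distances, whereas an old pulse gives a steep decay. Real admixture LD curves are noisy and sit on top of background LD, so in practice the fit needs more care than this toy suggests.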
What does the amount of LD in a population depend on?
The amount of LD in a population depends on the recombination rate and on when an admixture event occurred in the past
- High recombination rate = LD decays rapidly
- Recent admixture = high LD, ancient admixture = lower LD
Give an example of an admixture event in humans
- Lemba - Bantu-speaking Africans who claim Jewish ancestry - confirmed by Y-chromosome studies
- Can compare observed Lemba LD with predictions from computer simulations of mixing Bantu with Jewish (Ashkenazi) populations
- LD persists over much longer distances in the Lemba, which is what the simulations predict for an admixed population
How can you think about the coalescent with recombination?
- Recombination in a coalescent tree (a) means that different parts of the sequence have different trees (b)
- If we combine them (c) - we no longer have a strictly branching tree - but a network
- This is the ‘ancestral recombination graph’ : the basis for modelling the coalescent with recombination
How do selection forces affect diversity at linked loci and give an example?
Forces at one position affect the surrounding areas of the genome:
- Purifying selection: reduction in diversity at linked loci (background selection)
- Positive selection: hitchhiking effect leads to selective sweep
- e.g., selection on the tb1 gene during maize domestication - loss of nucleotide diversity (π) in the 5’ upstream regulatory region (caused by selective breeding) - a localized loss of diversity is an important indicator of where positive selection may have occurred
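A minimal sketch of how π, the statistic in the tb1 example, can be computed as the mean proportion of differing sites over all pairs of aligned sequences, and scanned along a region in windows. The toy alignment is invented: its first window is invariant, mimicking a local loss of diversity.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """pi: mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)

def windowed_pi(seqs, window):
    """pi in consecutive, non-overlapping windows of `window` sites."""
    length = len(seqs[0])
    return [nucleotide_diversity([s[start:start + window] for s in seqs])
            for start in range(0, length - window + 1, window)]

# Hypothetical alignment: sites 1-5 invariant (swept-like), sites 6-10 variable
alignment = [
    "AAAAAAAAGT",
    "AAAAAAAACT",
    "AAAAAAAAGA",
    "AAAAAAAACA",
]
print([round(pi, 3) for pi in windowed_pi(alignment, 5)])
```

A window whose π is far below that of its neighbours is the kind of localized dip the tb1 bullet describes, although drift or low mutation rates can also produce dips.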
Explain the process of hitchhiking and selective sweeps?
- Target allele is pushed up to high frequency due to positive selection (selective sweeps)
- But other alleles at different loci that are linked on same genetic background are also pushed up to high frequency - hitchhiking
- Meanwhile, recombination breaks up the LD generated by the sweep (and by other processes, e.g., drift)
How is the amount of LD determined?
Depends on recombination rate and intensity of selection pressure:
- Mild selection + high recombination rate = low LD, and the selective sweep will be minor and confined to a small area
- Strong selection + low recombination rate = high LD, and the selective sweep is intense and spans a much larger area of the chromosome, so even very distant alleles are swept up to high frequency
How would you use hitchhiking and selective sweeps for testing?
- The effect of a selective sweep is strongest in sequences adjacent to the selected site and decreases with distance, as does LD
- Can test for selection
- But need a good estimate of recombination rate parameter
Give an example of a test that can be used to detect and quantify LD
Long-Range Haplotype (LRH) test:
- Measures relationship between allele frequency and extent of LD
- Long-range LD develops when an advantageous allele rises in frequency faster than recombination can break down LD on its haplotype
- Extended Haplotype Homozygosity (EHH) - probability that two randomly chosen sequences carrying the core haplotype/SNPs are identical by descent (similar to homozygosity)
- EHH decreases to 0 with increasing distance from the core
- +ve selection is detected as unusually high EHH
- Generate EHH scores for alleles at different loci along the genome: lower score = more recombinants - look for sites where EHH is higher than normal
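A minimal sketch of the EHH definition, computed among chromosomes that all carry the same core allele, using the common formulation sum of C(n_i, 2) / C(n, 2) over the distinct extended haplotypes. Positions are array indices rather than genetic or physical distances, identity by state stands in for identity by descent, and the haplotypes are invented.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, stop):
    """Extended haplotype homozygosity between index `core` and index `stop` (inclusive).

    haplotypes: equal-length strings, all assumed to carry the same core allele.
    EHH = sum over distinct extended haplotypes of C(n_i, 2) / C(n, 2), i.e. the
    probability that two randomly chosen chromosomes are identical over the whole stretch.
    """
    lo, hi = sorted((core, stop))
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)

# Hypothetical chromosomes carrying allele "A" at the core (index 0)
carriers = ["A000", "A001", "A011", "A111"]
print([round(ehh(carriers, 0, stop), 3) for stop in range(4)])
```

EHH starts at 1 at the core and falls as recombinant haplotypes split the carriers into smaller groups; under +ve selection the fall would be unusually slow for the allele's frequency.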
Give an example of this LRH/EHH process?
The lactase (LCT) gene haplotype on chromosome 2 is longer in Europeans, suggesting strong selection
- More than 250 candidate genes under selection were identified, relating to pathogen resistance, metabolism (diet) and brain development
- Indicates recent selective sweeps in European populations - due to development of farming/agriculture - less strong selection in African populations
Give an example of how you can do genome-wide detection of positive selection
Using XP-EHH, which detects selective sweeps where the selected allele has risen to high frequency in one population but remains polymorphic in the human population as a whole
- e.g., SLC24A5 - in natural skin color variation - Europeans vs West Africans
- Derived Ala111Thr allele at SLC24A5 gene influences light skin tone of Europeans - polymorphism may account for 25-40% of difference between Europeans and West Africans
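A simplified sketch of the XP-EHH idea: integrate each population's EHH curve around the same core SNP (iHH) and take the log ratio, so positive values suggest unusually long haplotypes in population A. Published implementations also standardise the scores genome-wide and make further choices (e.g., where to stop integrating) that are omitted here; the function names and EHH values are hypothetical.

```python
import math

def integrated_ehh(positions, ehh_values):
    """Area under an EHH curve (iHH) by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2.0
    return area

def xp_ehh_unstandardised(positions, ehh_pop_a, ehh_pop_b):
    """ln(iHH_A / iHH_B): > 0 means haplotypes extend further in population A."""
    return math.log(integrated_ehh(positions, ehh_pop_a) / integrated_ehh(positions, ehh_pop_b))

# Hypothetical EHH decay away from a core SNP (distances in kb)
positions = [0, 10, 20, 30, 40]
ehh_a = [1.0, 0.9, 0.8, 0.6, 0.4]   # slow decay: candidate sweep in population A
ehh_b = [1.0, 0.5, 0.2, 0.1, 0.0]   # fast decay in population B
print(round(xp_ehh_unstandardised(positions, ehh_a, ehh_b), 3))
```

Comparing two populations at the same site is what lets XP-EHH pick up sweeps near fixation in one population, where a single-population EHH comparison has little variation left to work with.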
What patterns would be expected for deleterious background selection?
- Deleterious alleles (that reduce fitness) are constantly being generated in population
- Purifying selection will remove deleterious alleles
- Linked neutral variation will be removed along with deleterious alleles
- However, this results in the loss of only the specific haplotype(s) on which the deleterious mutation arose - leaving other variation unaffected
- Loss of neutral variation: Expect a slight reduction in amount of diversity relative to the expectation for a case with no selection at all - but less strong than a strong selective sweep
- It is the dominant form of selection in the human genome
What might this deleterious background selection look like on a gene genealogy?
It appears as the loss of just a few terminal twigs from the gene genealogy
How does this deleterious background selection compare with the positive selection scenario and what happens to the gene genealogy?
- +ve selection scenario drives a haplotype to high frequency, substantially reducing linked variation - lots of branches and twigs are lost from the gene genealogy
- Deleterious background selection - only the specific haplotypes on which the deleterious mutation arose are lost - so only a few terminal twigs are lost from the gene genealogy
Directly compare features of selective sweep vs background selection
(type of selection, effect on Ne, excess of alleles and genealogy tree effect)
Selective sweep:
- Caused by positive selection (rare)
- Reduction in neutral diversity (Ne)
- Excess of low frequency alleles - singletons
- Short internal branches in coalescent tree
Background Selection:
- Caused by purifying selection (common and continual)
- Reduction in neutral diversity (Ne) - removes individual haplotypes from tree
- Frequency spectrum similar to the neutral expectation, because only individual haplotypes are removed
- The gene genealogy will look like a neutral coalescent tree
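The frequency-spectrum contrast can be checked with a short sketch: tally the unfolded site frequency spectrum and compare it with the standard neutral expectation, under which the number of sites with i derived copies is proportional to 1/i. This assumes ancestral states are known, and the 0/1 data are invented. A sweep would show up as an excess in the singleton class, whereas background selection is expected to stay close to the neutral shares.

```python
from collections import Counter

def site_frequency_spectrum(alignment):
    """Unfolded SFS from 0/1 sequences (0 = ancestral, 1 = derived; ancestral state assumed known).

    Returns {i: number of segregating sites where the derived allele is in i of n sequences}.
    """
    n = len(alignment)
    counts = Counter()
    for site in zip(*alignment):            # iterate over columns (sites)
        derived = sum(int(allele) for allele in site)
        if 0 < derived < n:                  # keep segregating sites only
            counts[derived] += 1
    return {i: counts[i] for i in range(1, n)}

def neutral_sfs_proportions(n):
    """Standard neutral expectation: the share of sites with i derived copies is proportional to 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return {i: (1.0 / i) / a_n for i in range(1, n)}

# Hypothetical data: 5 sequences x 5 sites
alignment = ["10000", "01000", "00100", "00011", "00011"]
observed = site_frequency_spectrum(alignment)
segregating = sum(observed.values())
for i, expected_share in neutral_sfs_proportions(len(alignment)).items():
    print(i, observed[i] / segregating, round(expected_share, 3))
```

With only a handful of sites the observed shares are very noisy; formal tests summarise the whole spectrum rather than comparing single classes by eye.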
How does linkage disequilibrium differ depending on population size?
- Small population: high drift = lineages share a common ancestor in the recent past - so sites will be tightly linked (lots of linkage) = high LD
- Large population: lower drift = common ancestors of lineages in distant past = lots of time for recombination to reduce LD even between adjacent sites
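A classic approximation for drift-recombination equilibrium (Sved, 1971) makes the population-size contrast concrete: the expected r^2 between two sites is roughly 1 / (1 + 4 * Ne * c), where c is the recombination fraction between them. A minimal sketch that ignores mutation and sampling error; `expected_r2` is a hypothetical helper name.

```python
def expected_r2(ne, c):
    """Approximate expected r^2 at drift-recombination equilibrium: 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)

# Same recombination fractions, three effective population sizes
for ne in (100, 1_000, 10_000):
    print(ne, [round(expected_r2(ne, c), 3) for c in (1e-4, 1e-3, 1e-2)])
```

A small Ne keeps r^2 high even for moderately distant sites, whereas in a large population r^2 falls towards 0 quickly, in line with the two bullets above.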