Lecture 5 Flashcards
Simple genetic disorders: Autosomal dominant
- Only one copy/allele required for the disease
- Most affected people only have 1 disease allele
- Equally common in both sexes
- Offspring of affected people have 50% probability of inheriting the disease
Simple genetic disorders: Autosomal recessive
- Two alleles required for the disease
- Equally common in both sexes
- Offspring of two carriers have a 25% chance of inheriting the disease
- Disease alleles are ‘masked’ in heterozygous carriers
- The disease can appear to skip the parents: typically the only way to get it is if both parents are carriers.
Simple genetic disorders: X-linked recessive
- Females require 2 disease alleles, males only 1
- More common in males
- Sons of carrier females have 50% chance of disease
- Sons of affected males are unaffected (fathers pass their Y chromosome, not their X, to sons)
- Hardest of these inheritance patterns to distinguish
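The offspring probabilities on the three cards above can be checked by enumerating a Punnett square. This is a minimal Python sketch, not part of the lecture: the allele letters (D/d, A/a, X/x/Y) and the function name are illustrative choices, and each parent is assumed to pass on either allele with equal chance.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Punnett square: each parent passes on one of its two alleles with equal chance."""
    counts = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "dD" and "Dd" are the same genotype
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

# Autosomal dominant: affected heterozygote (Dd) x unaffected (dd); D = disease allele
dominant = offspring_genotypes("Dd", "dd")
p_dominant = sum(p for g, p in dominant.items() if "D" in g)

# Autosomal recessive: two carriers (Aa x Aa); a = disease allele
recessive = offspring_genotypes("Aa", "Aa")
p_recessive = recessive.get("aa", Fraction(0))

# X-linked recessive: carrier mother (Xx) x unaffected father (XY); x = disease allele
x_linked = offspring_genotypes("Xx", "XY")
sons = {g: p for g, p in x_linked.items() if "Y" in g}
p_son_affected = sum(p for g, p in sons.items() if "x" in g) / sum(sons.values())

# Affected father (xY) x non-carrier mother (XX): sons receive the father's Y
sons_of_affected_father = [g for g in offspring_genotypes("XX", "xY") if "Y" in g]

print(p_dominant, p_recessive, p_son_affected, sons_of_affected_father)
```

The same function covers all three patterns because only the parental genotypes change; the rule for who is affected is applied afterwards.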
Mapping Mendelian Traits
Non-recombinants - NR (8/10): Offspring of affected people who inherit allele 2 tend to get the disease, and offspring who don’t inherit allele 2 are unaffected
Recombinants - R (2/10): Offspring with allele 2 but not the disease, or the disease but not allele 2
Linkage mapping to find traits
- Mendelian Traits are typically rare
- Usually mapped by following the co-segregation of markers and phenotypes in affected families
- Results often expressed as a LOD score (Logarithm of Odds)
- Markers with the highest LOD scores are closest to the gene.
- A LOD of 3 means the odds of linkage between a marker and a gene are 10^3:1, i.e. linkage is 1000 times more likely than non-linkage
- Online Mendelian Inheritance in Man (OMIM) is an online database with descriptions of genes, literature, phenotypes etc related to each disease
What is a LOD score?
- The log (base 10) of the odds that the marker is linked to the causal gene, relative to the marker not being linked to it. Each step of 1 in the LOD score (1, 2, 3, 4 etc.) is a tenfold increase in those odds.
- The odds are 10 raised to the power of the LOD score. A LOD score of 3 would be 10^3, i.e. 1000:1
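As a worked illustration of the LOD idea, the sketch below computes a two-point LOD score from counts of recombinants and non-recombinants, using the 8 NR / 2 R family from the mapping card. It assumes the simplest textbook case (phase known, so recombinants can be counted directly) and compares the likelihood at recombination fraction theta against free recombination (theta = 0.5). The function names are illustrative, not from the lecture.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """log10 of: likelihood of the data at recombination fraction theta
    divided by the likelihood under no linkage (theta = 0.5)."""
    n = recombinants + non_recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)

def odds_from_lod(lod):
    """Convert a LOD score back to odds in favour of linkage."""
    return 10 ** lod

# Family from the mapping card: 8 non-recombinants, 2 recombinants,
# so the estimated recombination fraction is 2/10 = 0.2.
family_lod = lod_score(recombinants=2, non_recombinants=8, theta=0.2)
print(round(family_lod, 3))   # LOD for this single family
print(odds_from_lod(3))       # odds corresponding to a LOD of 3
```

A single small family gives a LOD well below 3, which is why scores are usually added up across many families before linkage is accepted.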
Examples
Most of these disease causing alleles are very rare.
A lot of these things vary in their frequencies between populations because of the effects of genetic drift. Even when these things are rare, because they have a big effect on the phenotype we can find the genes responsible
Mapping complex diseases and the two approaches
Common diseases e.g. heart disease, cancers, dementia, susceptibility to malaria etc are typically complex and involve a mixture of genetic and environmental causes
Two popular approaches trying to find genes responsible for these traits:
- Exome capture - just sequence the coding bits of the genome (Lecture 2)
- Genome-wide association studies (GWAS, more common) - typically use SNP chips
Concept behind GWAS
- A new mutation arises that causes or contributes to a disease
- Initially most of the linked SNPs will be in linkage disequilibrium i.e. statistically associated with it
- But over time, recombination will break up these associations. Only the most closely linked loci will remain in LD
- The new mutation arises on an ancestral chromosome
- Chromosomes in modern-day descendants who inherit the mutation will have allele 2 at the marker locus significantly more often than in the general population
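The way recombination erodes these associations can be sketched with the standard population-genetics result D_t = D_0 (1 - r)^t, where D is the LD coefficient, r the recombination fraction between the two loci and t the number of generations. This assumes random mating in a large population; the starting value and the recombination fractions below are made-up illustrative numbers, not from the lecture.

```python
def ld_after(d0, r, generations):
    """LD coefficient remaining after a number of generations of recombination."""
    return d0 * (1 - r) ** generations

d0 = 0.25  # illustrative starting LD between the new mutation and a marker
for r in (0.5, 0.01, 0.0001):  # unlinked, loosely linked, tightly linked
    decay = [round(ld_after(d0, r, t), 4) for t in (0, 10, 100, 1000)]
    print(f"r = {r}: {decay}")
```

Only the marker with the smallest r keeps most of its LD after many generations, which is the signal a GWAS relies on.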
GWAS – plotting the results
- A GWAS typically involves typing a million SNPs in cases and controls
- Every SNP is tested for an association with the trait/ phenotype of interest
- The results are shown as a Manhattan plot, so called because you’re looking for skyscrapers
- A P value is produced for each SNP. Taking the log (base 10) of it and reversing the sign gives the statistic -log10 P; the higher it is, the stronger the evidence of association
- Usually the expected and observed test statistics (results from chi squared tests) are plotted against each other
- If the observed values are higher than expected, there could be a risk of false positives due to population structure
- Anything above the significance line on the Manhattan plot is strongly suggestive of those SNPs being associated with your trait. The line is at a P value of roughly 0.00007.
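The per-SNP test can be sketched as a chi-squared test on a 2x2 table of allele counts in cases and controls, followed by the -log10 P transform. This is one simple form of association test, assumed here for illustration; the counts are invented, and the P value uses the identity that for 1 degree of freedom P = erfc(sqrt(chi2 / 2)).

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-squared statistic for a 2x2 table of allele counts (cases vs controls)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail P value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def neg_log10_p(p):
    """The statistic plotted on the Y axis of a Manhattan plot."""
    return -math.log10(p)

# Hypothetical SNP: allele 2 seen 600 times in 2000 case alleles
# and 500 times in 2000 control alleles.
chi2 = allelic_chi2(600, 1400, 500, 1500)
p = p_value_1df(chi2)
print(round(chi2, 2), p, round(neg_log10_p(p), 2))
```

In a real GWAS this calculation is repeated for every SNP and each -log10 P is plotted against the SNP's chromosomal position.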
QQ plots – detecting structure
- Each point is a SNP
- X axis is expected -log10 P values and Y axis is observed -log10 P values
- If the points lie above the X=Y line, P values across the whole genome are more significant than expected under the null hypothesis. This suggests we could get false positives in a GWAS.
- Most likely cause is population structure- allele frequencies differ between different populations and they can cause false positives in a GWAS
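The points of a QQ plot can be sketched as follows. Under the null hypothesis P values are uniform between 0 and 1, so the expected value for the i-th smallest of n P values is taken here as (i - 0.5) / n; that plotting position is one common convention and is an assumption of this sketch, as are the function name and the toy P values.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs, one per SNP, sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

n = 1000
# Toy data: P values that follow the null exactly, and an "inflated" set
# in which every P value is smaller than it should be.
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
inflated_p = [p ** 2 for p in null_p]

on_line = qq_points(null_p)
above_line = qq_points(inflated_p)
print(on_line[-1], above_line[-1])  # the most significant SNP in each set
```

With the null P values every point sits on X=Y; with the inflated set the whole cloud lifts above the line, which is the pattern that hints at population structure.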
False positives in GWAS studies
If there is genetic structure in a population, then false associations between a marker and a phenotype can arise by chance
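A small numerical sketch of how structure creates a false association: two hypothetical populations differ both in the frequency of allele 2 and in how often the disease occurs, yet within each population the allele has no relationship with the disease. All counts are invented for illustration.

```python
from fractions import Fraction

# (population, status) -> (copies of allele 2, total allele copies sampled)
# Population 1: allele 2 frequency 0.8, disease common (80 cases, 20 controls).
# Population 2: allele 2 frequency 0.2, disease rare (20 cases, 80 controls).
counts = {
    ("pop1", "case"): (128, 160),
    ("pop1", "control"): (32, 40),
    ("pop2", "case"): (8, 40),
    ("pop2", "control"): (32, 160),
}

def allele_frequency(groups):
    """Frequency of allele 2 after pooling the listed (population, status) groups."""
    alt = sum(counts[g][0] for g in groups)
    total = sum(counts[g][1] for g in groups)
    return Fraction(alt, total)

# Within each population, cases and controls have the same allele frequency.
for pop in ("pop1", "pop2"):
    print(pop, allele_frequency([(pop, "case")]), allele_frequency([(pop, "control")]))

# Pooling the populations makes allele 2 look enriched in cases.
pooled_cases = allele_frequency([("pop1", "case"), ("pop2", "case")])
pooled_controls = allele_frequency([("pop1", "control"), ("pop2", "control")])
print("pooled", pooled_cases, pooled_controls)
```

The pooled difference comes entirely from the mix of populations among cases and controls, not from any effect of the allele.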
Studies of genetic structure
- Observation that genetic structure influences GWAS results is important
- It means that SNP chips that are good for finding disease variants in one population might not be so good in another population – motivation for HapMap projects (Lecture 2)
- We need to understand human population genetic structure... and this can tell us about our history
Estimating and displaying human structure: two main approaches
Clustering: Idea is to group individuals into K different clusters, where individuals within a cluster are more similar to each other than individuals outside of it. Can use an a priori number for K or K can be estimated from the data. Best known program/method is Jonathan Pritchard’s STRUCTURE. Each individual is given a membership coefficient which tells us how well it fits its cluster and whether it contains genes from >1 cluster.
The aim is to work out how many distinct genetic clusters there are.
Multivariate approaches: Best known approach is Principal Component Analysis (PCA) which uses allele frequencies from many markers
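To make the multivariate approach concrete, here is a minimal pure-Python sketch of PCA on a toy genotype matrix (rows are individuals, columns are markers, entries are counts of one allele). It finds the first principal component by power iteration, which is a simple stand-in for the eigen-decomposition a real analysis would use; the six genotypes are invented, and real PCA of human data uses many thousands of markers.

```python
import math

# Toy genotypes (0, 1 or 2 copies of an allele at 4 markers).
# The first three individuals are from one hypothetical population,
# the last three from another.
genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 1, 2, 1],
    [0, 0, 1, 2],
]

def centre_columns(matrix):
    """Subtract each marker's mean so that every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def first_principal_component(centred, iterations=200):
    """Leading eigenvector of the marker-by-marker cross-product matrix (power iteration)."""
    m = len(centred[0])
    cov = [[sum(row[j] * row[k] for row in centred) for k in range(m)] for j in range(m)]
    v = [1.0] + [0.0] * (m - 1)  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

centred = centre_columns(genotypes)
pc1 = first_principal_component(centred)
# Each individual's position along PC1
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centred]
print([round(s, 2) for s in scores])
```

Individuals from the same population end up on the same side of PC1, which is how PCA plots reveal structure without being told who belongs where.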
Human Population Structure
- Used microsatellite markers (forerunners of SNPs) - didn’t have as much genetic variation.
- 93-95% of genetic variation is found within populations, not between populations.
- However, there are still some subtle differences, so it is possible to identify different genetic clusters
- A K value of 2 splits East Asia, Oceania and the Americas from Eurasia and Africa
- Plots like these known as ‘Structure plots’ (after the program first written to identify clusters).
- Each colour is a distinct genetic cluster
- The number of colours represents the value of K, i.e. how many genetic clusters we are trying to identify. This begins to pick out slightly different genetic structures.
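The "most variation is within populations" point can be illustrated with a simple heterozygosity calculation for a single biallelic marker: compare the average expected heterozygosity within populations with the expected heterozygosity of the pooled population. This is a simplified sketch that assumes equal-sized populations; the allele frequencies are invented and it is not the analysis used in the study described above.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic marker with allele frequency p."""
    return 2 * p * (1 - p)

def within_population_share(freqs):
    """Fraction of total heterozygosity that is found within populations."""
    n = len(freqs)
    h_within = sum(expected_heterozygosity(p) for p in freqs) / n
    h_total = expected_heterozygosity(sum(freqs) / n)
    return h_within / h_total

# Hypothetical allele frequencies of one marker in three populations
share = within_population_share([0.4, 0.5, 0.6])
print(round(share, 3))
```

Even with allele frequencies that differ noticeably between populations, almost all of the variation sits within them, which is why clustering methods need many markers to pick up structure.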