week 10 - genetic analysis Flashcards
Genetic Variation Across Species
chatgpt
Genetic variation exists in all species and includes SNPs, indels, CNVs, structural variants, and novel mutations. These variations can be common or rare, and their frequency and effects can vary by species and population.
Plants: often show high structural variation (e.g. polyploidy, TE activity)
Animals: domesticated species have artificial selection-driven variation
Humans: millions of SNPs, rare variants, and de novo mutations per individual
Understanding Complex Traits
chatgpt
Complex traits:
Controlled by many genes (polygenic)
Influenced by environment and gene-environment interactions
Show continuous variation (e.g. height, yield)
Often have non-Mendelian inheritance and no clear genotype-phenotype map
QTL Mapping Approaches
linkage analysis
GWAS
pedigree analysis
QTL Mapping Approaches
linkage analysis
Population Type: Controlled crosses or pedigrees
Resolution: Low (Mb–cM)
Pros: Powerful for rare variants, family-based
Cons: Low resolution, limited by recombination
QTL Mapping Approaches
GWAS
Population Type: Natural populations
Resolution: High (kb–100 kb)
Pros: High resolution, no prior knowledge needed
Cons: Sensitive to confounding, missing heritability
QTL Mapping Approaches
Pedigree Analysis
Population Type: Human/animal pedigrees
Resolution: Medium
Pros: Tracks inheritance of traits
Cons: Relies on known family structure
Design and Challenges of GWAS
chatgpt
GWAS design principles include using large, diverse, well-matched populations; high-density SNP genotyping; and statistical thresholds to detect associations.
Challenges:
Missing heritability
Linkage disequilibrium confounds
Rare variant detection
Causal variant identification
Population stratification
From QTL to Causal Variant
chatgpt
Moving from QTL to causal variants involves:
Fine-mapping within associated regions (e.g., higher resolution studies)
Functional validation via expression data, reporter assays, or gene editing
Integration with other data (e.g., eQTLs, chromatin state, transcriptomics)
It's difficult because QTL peaks often span many genes, and some signals lie in non-coding or regulatory regions far from known genes.
Genetic variation and polymorphism
- Variation: the existence of two or more forms (alleles) of a section of DNA
- If a variant occurs at a frequency >0.5%, it is a polymorphism
- Genetic variation could lead to observable effects, but the majority do not
Genetic variation and polymorphism
example
- Example: an Arabidopsis dwarf mutant in which a single SNP converts a normal-looking plant into a dwarf plant.
o This is a simple case where there is a 1:1 mapping between genotype and phenotype.
o The dwarf phenotype is caused by mutation in a single gene
Classes of genetic variants:
Single Nucleotide Variant (SNV/SNP)
Change of one base (A→G, T→C); most common variant; may be silent or impactful
Classes of genetic variants:
- Insertion-deletion variant
o Indels occur where one or more bases are present in some genomes and absent in others.
o Generally only a few bases long but can be up to 80kb in length!
Addition or loss of one or more bases; can shift reading frames in coding regions
Classes of genetic variants:
- Block substitution
o a string of adjacent nucleotides varies between individuals
Classes of genetic variants:
- Inversion variant
o the order of the bases is reversed in a defined section of the genome.
A segment of DNA is reversed within the chromosome
Classes of genetic variants:
- Copy number variant
Large DNA segments are duplicated or deleted; affect gene dosage
o identical or nearly identical sequences are repeated in some genomes and not others.
Genetic variation and polymorphism
frequency?
- Human genetic variations are referred to as either COMMON or RARE, to denote the frequency of the minor allele – the less frequent allele in the population
o Rare variants are population-specific
Genetic variation and polymorphism
common variant
- Common variants have minor allele frequency (MAF)
o >1%
o E.g. a C/T SNP with 5% frequency of the T allele
Genetic variation and polymorphism
rare variant
- Rare variants have minor allele frequency
o <0.5%
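Worked example (a minimal sketch, not from the lecture): computing MAF from genotype counts and applying the >1% and <0.5% cut-offs from these cards. The function names, the genotype counts and the "intermediate" label for the gap between the two cut-offs are assumptions made for illustration.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from diploid genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq_a, 1.0 - freq_a)


def classify_variant(maf: float,
                     common_cutoff: float = 0.01,
                     rare_cutoff: float = 0.005) -> str:
    """Apply the cut-offs used on these cards (common >1%, rare <0.5%)."""
    if maf > common_cutoff:
        return "common"
    if maf < rare_cutoff:
        return "rare"
    # Assumption: the cards do not name the 0.5%-1% range.
    return "intermediate"


# Hypothetical C/T SNP: 905 CC, 90 CT and 5 TT individuals.
# T alleles = 90 + 2 * 5 = 100 out of 2000, i.e. 5%.
maf = minor_allele_frequency(905, 90, 5)
print(round(maf, 4), classify_variant(maf))
```

This matches the card's example of a C/T SNP with a 5% T allele, which falls in the common class.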
Genetic variation and polymorphism
- Novel/de novo variant
- Novel/de novo variants occur only in a single family/individual
o E.g. a variant that we do not share with either parent
Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP)
- Single base pair substitutions
- Arise through mistakes in DNA replication or caused by mutagens
o E.g. mutation rate in Arabidopsis is 7x10^-9 base substitutions per site per generation
- Biallelic – 2 alleles (in diploids)
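Worked example (a rough sketch): what the Arabidopsis per-site mutation rate means per genome. The genome size of about 135 Mb is an assumed round figure, not a number from these notes.

```python
# Per-site mutation rate quoted on the card.
MUTATION_RATE = 7e-9  # base substitutions per site per generation

# Assumption: haploid genome size of roughly 135 Mb (illustrative only).
GENOME_SIZE_BP = 135_000_000


def expected_new_substitutions(rate_per_site: float, n_sites: int) -> float:
    """Expected number of new base substitutions across n_sites per generation."""
    return rate_per_site * n_sites


per_haploid_genome = expected_new_substitutions(MUTATION_RATE, GENOME_SIZE_BP)
print(f"{per_haploid_genome:.3f} expected substitutions per haploid genome per generation")
```

Under that assumption the rate works out to roughly one new substitution per haploid genome per generation.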
Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
frequency
- Minor allele frequency can range from <1% to 50%
Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
methods for detecting
- Many methods for detecting SNPs
o SNP microarrays
Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
common?
- SNPs are the most common type of variant
o Which is why they are widely used as markers
Deletions, duplications and insertions
- Expand or contract the length of non-repetitive DNA
- Small deletions and duplications arise by unequal crossing over
- Small insertions can arise through the activity of transposable elements
Deletions, duplications and insertions
types
deletion
novel sequence insertion
mobile element insertion
tandem duplication
interspersed duplication
inversion
translocation
Human genetic variation
- 4-5 million differences between any 2 humans
o 1 in 1000 bases
- Most differences occur at common locations
o 4-5 million common SNPs (>0.5%)
o 50K rare mutations (<0.5%)
o 40-80 de novo mutations
Human genetic variation
QTL mapping
So what we will attempt to do with QTL mapping is to relate these differences to differences in the phenotype of a trait of interest.
PHYSICAL VS GENETIC MAP
physical map
- Physical distance in nucleotide bases (kb)
- The actual distance in bp between two variants
PHYSICAL VS GENETIC MAP
genetic map
- RF between two markers
- Based on the number of recombination events occurring in a region
- RF is related to genetic distance in cM via a mapping function
- Physical distance is usually correlated with genetic distance
- Markers are closer together on the genetic map in regions with low RF
- Markers are further apart on the genetic map in regions of high RF
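Worked example (a sketch under assumptions): converting a recombination fraction (RF) to map distance in cM. Haldane's and Kosambi's functions are two standard mapping functions; these notes do not say which one the course uses, so the choice here is an assumption.

```python
import math


def haldane_cm(rf: float) -> float:
    """Haldane mapping function (assumes no crossover interference)."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


def kosambi_cm(rf: float) -> float:
    """Kosambi mapping function (allows for some crossover interference)."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * rf) / (1.0 - 2.0 * rf))


for rf in (0.01, 0.10, 0.30):
    print(f"RF={rf:.2f}  Haldane={haldane_cm(rf):.2f} cM  Kosambi={kosambi_cm(rf):.2f} cM")
```

For small RF both functions give roughly RF x 100 cM; they diverge as RF approaches 0.5.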
PHYSICAL VS GENETIC MAP
genetic map: accurate?
- Not always accurate
o Hotspots
o Regions where you would expect to find more recombination than in others
PHYSICAL VS GENETIC MAP
why can two SNPs be physically close together but genetically far apart?
- Reason:
o Because there might be a recombination hotspot between them, so recombination between them is frequent
o So genetically they look far away from each other (even though they are physically close)
PHYSICAL VS GENETIC MAP
Principles of genetic mapping
- Find regions of the genome that are variable (markers)
- Map and order these regions to produce a genetic marker map
- Map traits of interest to these markers
SUMMARY
- There are many types of DNA sequence variation in populations of plant, animal or microbial species
- By assembling a map of variation we can link this map to variation in phenotypic traits of interest
QUANTITATIVE GENETICS
how is variation divided?
- Variation in humans, plants and animals is broadly divided into 2 types:
qualitative
quantitative
QUANTITATIVE GENETICS
Qualitative
Blood groups, eye colour, flower colour
Only a few genotypes
QUANTITATIVE GENETICS
Quantitative
Height, weight
Many genotypes
Quantitative or biometrical genetics
- Deals with the study of inheritance of the quantitatively varying characters (complex or quantitative traits) that are controlled by many genes and also to a considerable extent by the environment
examples of quantitative traits
Plants
morphology
- Yield
- Quality
- Maturity
- Size (height, girth, biomass)
examples of quantitative traits
Plants
physiology
- Abiotic stress responses (e.g. drought tolerance)
- Biotic stress responses (e.g. disease resistance, photosynthetic capability)
examples of quantitative traits
animals
morphology
- Size (e.g. weight and height)
- Productivity (e.g. milk and egg production)
- Quality (e.g. meat/wool)
- Fecundity
examples of quantitative traits
animals
physiology
- Growth rate
- Abiotic stress responses (e.g. heat tolerance)
- Biotic stress responses (e.g. disease resistance)
- Strength
examples of quantitative traits
animals
behaviour
- Intelligence
- personality
examples of quantitative traits
humans
morphology
- size (weight, height, etc.)
- colour
examples of quantitative traits
humans
physiology
- metabolic rates
- diabetes
- hypertension
examples of quantitative traits
humans
behaviour
- intelligence
- personality
Threshold traits
- trait which has complex/polygenic inheritance, but only two obvious phenotypes
o e.g. affected or not affected by disease
- E.g. type II diabetes
o Individuals who exceed a certain number of risk factors (genetic and/or environmental) will develop the disease and others will not.
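Worked example (a minimal sketch): the threshold idea as a liability model. Treating liability as the sum of a genetic and an environmental score, and assuming it is normally distributed in the population, are modelling assumptions added here; the function names and numbers are illustrative.

```python
import math


def is_affected(genetic_liability: float,
                environmental_liability: float,
                threshold: float) -> bool:
    """An individual is affected when total liability exceeds the threshold."""
    return genetic_liability + environmental_liability > threshold


def prevalence(threshold: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Fraction of a normally distributed population above the threshold."""
    z = (threshold - mean) / sd
    # Upper tail of the normal distribution via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(is_affected(1.0, 1.0, threshold=1.5))   # many risk factors
print(is_affected(0.2, 0.3, threshold=1.5))   # few risk factors
print(f"{prevalence(1.645):.3f}")             # threshold 1.645 SD above the mean
```

The underlying liability is continuous and polygenic, but only two phenotypes are observed, which is what makes it a threshold trait.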
Central dogma of molecular biology
- Flow of information
DNA –> RNA –> protein –> produces trait
- Goal in genetics is to relate genetic variation to phenotypic variation in a trait
Genetic basis of a quantitative trait
debate
- Debate between Mendelians and biometricians
William Bateson - Quantitative traits do not follow discrete patterns and cannot be inherited
Francis Galton - Quantitative traits can be inherited and the degree of inheritance can be estimated by pure stats
Genetic basis of a quantitative trait
debate: ronald fisher
Ronald Fisher came up with the polygenic model
- To reconcile these two opposing views on quantitative traits
Genetic basis of a quantitative trait
the polygenic model
- Fisher first highlighted the polygenic nature of quantitative traits
o The random sampling of alleles at each gene produces a continuous normally distributed phenotype in the population
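Worked example (a simulation sketch, not Fisher's derivation): summing randomly sampled alleles over many loci gives a near-continuous, bell-shaped phenotype, while a single locus gives only three classes. Equal additive effects, an allele frequency of 0.5 and no environmental noise are simplifying assumptions.

```python
import random
import statistics


def simulate_phenotypes(n_individuals: int, n_loci: int,
                        allele_freq: float = 0.5, effect: float = 1.0,
                        seed: int = 0) -> list[float]:
    """Phenotype = effect x number of 'plus' alleles across all loci."""
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_individuals):
        # Diploid: two independent allele draws per locus.
        n_plus = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        phenotypes.append(n_plus * effect)
    return phenotypes


one_locus = simulate_phenotypes(2000, n_loci=1)
many_loci = simulate_phenotypes(2000, n_loci=50)

print("distinct phenotypes, 1 locus:", len(set(one_locus)))
print("distinct phenotypes, 50 loci:", len(set(many_loci)))
print("mean and SD, 50 loci:",
      round(statistics.mean(many_loci), 2),
      round(statistics.stdev(many_loci), 2))
```

With 50 loci the expected mean is 2 x 50 x 0.5 = 50 plus alleles, and the many possible totals approximate a normal distribution.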
Genetic basis of a quantitative trait
the polygenic model:
- Quantitative traits are mostly controlled by …
several genes
o Polygenes or QTL (quantitative trait loci)
Genetic basis of a quantitative trait
the polygenic model:
each gene behaves like…
a mendelian gene
o Each gene can segregate independently
Genetic basis of a quantitative trait
the polygenic model:
- There are effects arising from…
environmental variance
Genetic basis of a quantitative trait
the polygenic model:
interactions
- There is also interaction within each gene (dominance and co-dominance)
- And interaction between genes (linkage and epistasis)
- And interaction between the gene and the environment
Modes of gene action (Interaction between alleles at a locus)
additive effects
o Measure the quasi-independent effects of alleles on a trait
Modes of gene action (Interaction between alleles at a locus)
dominance effects
o Measure the interactions between alleles at a single locus
o E.g. complete dominance (one allele can mask the effect of the other)
Modes of gene action (Interaction between alleles at a locus):
Alleles can interact with each other in a number of different ways to produce…
variability of the phenotype.
Modes of gene action (Interaction between alleles at a locus):
additive gene action
When the heterozygote phenotypic value is half way between that of the two homozygotes, gene action is defined as additive.
o Can see in the graph that each A2 allele contributes an increase of i to the phenotype value, in this case +1.
Modes of gene action (Interaction between alleles at a locus):
complete dominance
2) Complete dominance – the phenotype is the same whether you have 1 or 2 A2 alleles.
o Can see in the graph how there is an underlying additive genetic component as shown by the slope of the line – but the values deviate from additivity due to dominance effects. In this case we can see that A1A2 and A2A2 phenotypes are quite similar.
Modes of gene action (Interaction between alleles at a locus):
within dominance:
- Complete dominance
o e.g. Mendel's crosses between pea plants with purple flowers or white flowers – all F1 progeny are purple as the purple P allele is dominant.
Modes of gene action (Interaction between alleles at a locus):
within dominance: incomplete dominance
o heterozygote value is over half way but not quite as high as the A2 homozygote.
o E.g. snapdragons – cross red to white and see pink as neither allele is dominant – a blending of the phenotypes
Modes of gene action (Interaction between alleles at a locus):
within dominance: overdominance
o Rare: the phenotype of the heterozygote is beyond the range of either homozygote.
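Worked example (a minimal sketch): classifying the mode of gene action from the three genotypic values. The symbols a (half the difference between homozygotes) and d (heterozygote deviation from the midpoint) follow common quantitative genetics notation, which is an assumption here since the cards describe the cases only in words.

```python
import math


def classify_gene_action(g11: float, g12: float, g22: float,
                         tol: float = 1e-9) -> str:
    """Classify gene action from values of A1A1, A1A2 and A2A2."""
    midpoint = (g11 + g22) / 2.0
    a = abs(g22 - g11) / 2.0      # additive effect
    d = g12 - midpoint            # dominance deviation
    if abs(d) <= tol:
        # Heterozygote exactly half way between the homozygotes.
        return "additive"
    if math.isclose(abs(d), a, abs_tol=tol):
        # Heterozygote equals one of the homozygotes.
        return "complete dominance"
    if abs(d) < a:
        # Heterozygote between the midpoint and one homozygote.
        return "incomplete dominance"
    # Heterozygote outside the range of both homozygotes.
    return "overdominance"


for values in [(0, 1, 2), (0, 2, 2), (0, 1.5, 2), (0, 3, 2)]:
    print(values, "->", classify_gene_action(*values))
```

The first case matches the additive card, where each A2 allele adds +1 to the phenotype value.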
Modes of gene action (Interaction between alleles at a locus):
within dominance heterozygote advantage
- Or the “heterozygote advantage” where the heterozygote has better fitness than either of the homozygotes.
o Sickle cell anaemia – where the heterozygote has partial resistance to malaria.
Will genetic architecture be the same in all populations?
- Genetic architecture will differ in different populations.
- Different populations will have different environmental exposures and different alleles segregating.
Challenges for studying quantitative traits
- Genotype not identifiable from phenotype
- Epistasis (gene-gene interaction)
- Genotype x environment interaction (G x E)
Contribution of QTL alleles to a complex trait
- Suppose we could see which individuals had which genotype at QTL locus B
- The distributions for the 3 genotypes overlap so we cannot determine genotype just by looking at phenotype.
- An individual with average height could be any of the 3 genotypes.
- Let’s say height = 30cm – could be any of the three genotypes!
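Worked example (a sketch with made-up numbers): how likely each genotype is for a plant of height 30cm when the three genotype distributions overlap. The genotype means (28, 30, 32cm), the shared SD of 3cm and the 1:2:1 genotype frequencies are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_posteriors(height: float, means: dict[str, float],
                        freqs: dict[str, float], sd: float) -> dict[str, float]:
    """Probability of each genotype given the observed height (Bayes' rule)."""
    weighted = {g: freqs[g] * normal_pdf(height, means[g], sd) for g in means}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


# Hypothetical values for QTL locus B.
means = {"B1B1": 28.0, "B1B2": 30.0, "B2B2": 32.0}
freqs = {"B1B1": 0.25, "B1B2": 0.50, "B2B2": 0.25}

posteriors = genotype_posteriors(30.0, means, freqs, sd=3.0)
for genotype, prob in posteriors.items():
    print(genotype, round(prob, 3))
```

No genotype can be ruled out at 30cm, which is the point of the card: phenotype alone does not identify genotype.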
Genotype x environment interaction
- Different genotypes responding in different ways to changes in the environment
e.g.
a. Trait is not sensitive to environment
b. Trait value higher in environment 2
c. Some genotypes have higher trait value in environment 2, others have lower trait value in environment 2.
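Worked example (a minimal sketch): telling the three cases above apart from trait values measured in two environments. Reading case (b) as "every genotype shifts by the same amount" is an interpretation added here, and the genotype names and values are hypothetical.

```python
def classify_reaction_norms(env1: dict[str, float], env2: dict[str, float],
                            tol: float = 1e-9) -> str:
    """Compare each genotype's change in trait value between two environments."""
    changes = [env2[g] - env1[g] for g in env1]
    if all(abs(c) <= tol for c in changes):
        return "a: trait not sensitive to environment"
    if max(changes) - min(changes) <= tol:
        return "b: same shift in every genotype (no G x E)"
    return "c: genotypes respond differently (G x E)"


baseline = {"G1": 10.0, "G2": 12.0}
print(classify_reaction_norms(baseline, {"G1": 10.0, "G2": 12.0}))
print(classify_reaction_norms(baseline, {"G1": 13.0, "G2": 15.0}))
print(classify_reaction_norms(baseline, {"G1": 14.0, "G2": 9.0}))
```

Only the third case is a genotype x environment interaction: the lines joining each genotype's two values are not parallel.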
Basic elements of mapping QTL
- Phenotype of the trait
- Marker genotype
- Genetic structure of mapping populations
Basic elements of mapping QTL
statistical methods
- Use statistical methods to bring together the 3 sources of data and identify regions of the chromosome, which we call QTL, that are associated with variation in the trait.
SUMMARY
Complex traits are influenced by
several genes and by the environment
o Rare combinations of polygene effects can lead to unexpected phenotypes
SUMMARY
- Since genotype cannot be determined by phenotype, we cannot use..
Mendelian phenotype ratios to work out what is going on and will need new methods of modelling these traits
GWAS design and principles
Ways to genetically dissect complex traits
1) linkage analysis
2) genome wide association analysis
GWAS design and principles
Ways to genetically dissect complex traits:
linkage analysis
o Uses defined population either created using crosses or with familial relationships known (pedigrees)
GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
o Uses naturally occurring populations of individuals
o Used in human, plant and animal populations
GWAS design and principles
Ways to genetically dissect complex traits:
linkage analysis
how
use a segregating population
large recombinant blocks (recent recombination)
F2 mosaic
- due to recombination
makes use of recombination to define where gene of interest might be
GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
how
take natural population as they are and use for analysis
make use of historic recombination over thousands of years
- lots of small recombination blocks
- narrows down region of interest to a smaller interval
can find commonalities between genotype and phenotype
- looking for shared phenotypes between individuals
GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
associations
- Association – e.g. human populations are the result of many generations of recombination in meiosis, producing genomes with short blocks from ancestral individuals.
GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
statistical methods
- Collected genotype and phenotype data as shown in the table.
- Use statistical methods to generate graphs like these in which we have the locations of the markers on the genetic map (x-axis) and on the y-axis, the evidence for a QTL.
- As with all statistical tests we define a threshold for deciding whether or not a result is significant (link to minitab workshops).
- Can see a broad peak. This is a QTL.
- The results of association analysis are presented in a slightly different way. Individual dots = markers.
- See narrow band of markers with significant evidence for QTL.
- Note the band is narrower than with linkage analysis.
GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
next step
- So then the next step is to look at which genes are in the region of the significant markers and perform experiments to confirm that these genetic markers really affect the trait.
Advantages of GWAS approach
- Previous knowledge of the genetics of the trait not required
- Can fine map QTL to 10-100kb because many recombination events have occurred in the history of the population
- Can reveal causal genes in an unbiased way
GWAS design in humans
- A population that segregates for a trait of interest
- Mostly case-control groups
o Case: individuals with the trait (e.g. patients with disease, personality etc.)
Have the thing you are trying to find the genetic basis of
o Controls: individuals without the trait (healthy individuals)
o Want case and control groups to be similar to each other (avoids confounding factors)
Want a mix of populations in both groups
- Genotype data
o Mostly SNP arrays that genotype a pre-selected subset of SNPs
o Whole genome or exome sequencing
- Statistical analysis to pick up association signal
GWAS design in humans
Population of interest: case-control group
- Compares prevalence of a polymorphism between subjects who have the condition (cases) and subjects who do not have the condition (controls)
- In theory, the case-control study can be described simply.
o First, identify the cases (a group known to have the outcome) and the controls (a group known to be free of the outcome).
o Then, look back in time to learn which subjects in each group had the exposure(s), comparing the frequency of the exposure in the case group to the control group.
Statistics of GWAS studies
- Null hypothesis: there is no association between the marker (e.g. SNP) genotype and the trait
- Alternative hypothesis: there is association between the marker (e.g. SNP) genotype and the trait
- For categorical traits
o Can use statistics similar to chi-square to compare the frequency of genetic variants in the two groups (case/control)
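Worked example (a sketch with invented counts): a chi-square test on a 2x2 table of allele counts in cases versus controls. Testing allele counts rather than genotype counts is one common choice and an assumption here; the p-value uses the exact upper tail of a chi-square distribution with 1 degree of freedom, and the function assumes no row or column total is zero.

```python
import math


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int) -> tuple[float, float]:
    """Chi-square statistic and p-value (1 df) for a 2x2 allele count table."""
    observed = [[case_a, case_b], [control_a, control_b]]
    row_totals = [case_a + case_b, control_a + control_b]
    col_totals = [case_a + control_a, case_b + control_b]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and case/control status were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical SNP: T allele seen 300/1000 times in cases, 200/1000 in controls.
chi2, p_value = allelic_chi_square(300, 700, 200, 800)
print(f"chi2={chi2:.2f}  p={p_value:.2e}")
```

A small p-value rejects the null hypothesis of no association between the marker and case/control status.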
Statistics of GWAS studies
- For continuous/quantitative traits
o Can use parametric analyses like ANOVA and regression to compare normally distributed traits between the genotype groups
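Worked example (a sketch with toy data): regressing a quantitative trait on genotype coded as the number of copies of one allele (0, 1 or 2). The additive coding and the data are assumptions for illustration; a real analysis would also compute a p-value and adjust for covariates.

```python
def dosage_regression(dosages: list[int],
                      trait: list[float]) -> tuple[float, float, float]:
    """Least-squares slope, intercept and R^2 of trait on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    slope = sxy / sxx                 # estimated effect per allele copy
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of trait variance explained
    return slope, intercept, r_squared


# Hypothetical data: two individuals per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]

slope, intercept, r_squared = dosage_regression(dosages, trait)
print(f"effect per allele={slope:.2f}  intercept={intercept:.2f}  R^2={r_squared:.3f}")
```

The slope estimates the additive effect of each extra allele copy on the trait.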
Statistics of GWAS studies
GWAS tests for…
association of thousands to millions of markers (polymorphisms, e.g. SNPs; individual points) with trait status in hundreds to tens of thousands of individuals
Statistics of GWAS studies
manhattan plot
- Looking for the peaks that stand out
o Tells us that there is a specific association
- Each dot represents a variant (a SNP)
o Have many SNPs spread across the chromosome
- Significant markers (red points) with statistical support (-log10(p)) above the statistical threshold (red line) are either causal for the trait variation or in LD with the unknown causal variants
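Worked example (a sketch with toy p-values): how the threshold line on a Manhattan plot can be set and which markers rise above it. A Bonferroni correction is one common way to set the threshold and is an assumption here, since the cards do not name a method; the SNP names and p-values are made up.

```python
import math


def bonferroni_cutoff(n_tests: int, alpha: float = 0.05) -> float:
    """Per-marker p-value cut-off after correcting for n_tests markers."""
    return alpha / n_tests


def manhattan_hits(p_values: dict[str, float],
                   alpha: float = 0.05) -> tuple[dict[str, float], float]:
    """Return markers above the threshold line and the line's height."""
    cutoff = bonferroni_cutoff(len(p_values), alpha)
    threshold_line = -math.log10(cutoff)
    # The y-axis of a Manhattan plot is -log10(p), so small p means a tall point.
    hits = {name: -math.log10(p) for name, p in p_values.items() if p < cutoff}
    return hits, threshold_line


toy_p_values = {"snp1": 0.2, "snp2": 1e-9, "snp3": 0.03, "snp4": 0.001}
hits, line = manhattan_hits(toy_p_values)
print("threshold line at -log10(p) =", round(line, 2))
print("markers above the line:", sorted(hits))

# With one million markers the same rule gives the familiar 5e-8 cut-off.
print(bonferroni_cutoff(1_000_000))
```

A marker above the line is associated with the trait, but as the card says it may only be in LD with the causal variant.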
Linkage disequilibrium and GWAS
- Linkage disequilibrium (LD) refers to correlation between alleles at different SNPs
- At equilibrium, alleles at the two SNPs would be randomly associated
look at notes for table
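Worked example (a minimal sketch): measuring LD between two SNPs from haplotype and allele frequencies. D and r^2 are standard LD statistics; the frequencies below are invented and are not the values from the table in the notes. The function assumes both SNPs are polymorphic (allele frequencies strictly between 0 and 1).

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return D and r^2 for alleles A (SNP 1) and B (SNP 2).

    p_ab is the frequency of haplotypes carrying both A and B.
    """
    # Under equilibrium (random association) p_ab would equal p_a * p_b.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared


print(linkage_disequilibrium(0.25, 0.5, 0.5))  # random association
print(linkage_disequilibrium(0.50, 0.5, 0.5))  # A and B always travel together
```

An r^2 near 1 means one SNP almost perfectly predicts the other, which is why a genotyped marker can tag an ungenotyped causal variant.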
GWAS performed to date represent the tip of the iceberg
- Many associations are still hidden and can be uncovered by:
o Studying a larger sample population (> 1 million)
o Studying more diverse populations (non-European populations)
o Incorporating gene x gene and gene x environment interaction
o Considering non-additive models of gene actions
So what is the benefit to all this?
Impact of GWAS findings in medicine
- GWAS have revealed “molecular sub-phenotypes” or ‘heterogeneity’ of disease
- Increased understanding of disease pathways will provide new drug targets and promote personalised medicine
o i.e. target treatment to underlying subtype
- E.g. many genes can be responsible for cancer
o These can differ between individuals
o So understanding which genetic pathway is affected in a particular individual will allow for targeted drug treatment
GWAS
Difficulty in finding causal variant
- The associated SNP may be within or close to a gene that is relevant to the trait of interest
- Goal is to identify this gene and its variants (alleles) that confer different disease risks:
o Quantitative trait nucleotide
- Some associations are nowhere near a functional gene
o E.g. a variant on chr9p21 associated with heart attack is 150kb from the nearest gene
GWAS
Much trait variation remains unexplained
- The bulk of the genetic variance underlying the trait heritability has still not been explained
- E.g. > 30 markers associated with Crohn’s disease explain < 10% of genetic variance
- The so-called “missing heritability” problem
GWAS
Pitfall and criticism of GWAS
- Not very useful for very rare diseases
o Small number of affected individuals
o Statistics do not really work on small populations
- Disease prediction
- True signals
- Population stratification
- Ultra-rare mutations
- Epistasis
- Causal variants or genes
- Missing heritability
GWAS
good for
- Identification of novel SNV-trait associations
- Discovery of novel biological mechanisms
- Diverse clinical applications
- Insight into ethnic variation of complex traits
- Relevant to low frequency rare variants
- Identification of novel monogenic and oligogenic disease genes
- Relevant to the study of structural variation
- Multiple applications beyond gene identification
- Straightforward GWAS data generation, management and analysis
- Easy to share and publicly available data
GWAS SUMMARY
- Genetic architecture of complex traits is…
Complex
o Highly polygenic
GWAS SUMMARY
- GWAS is a common way of…
mapping complex traits in natural populations
GWAS SUMMARY
- GWAS can be…
direct or indirect
o Typical genetic effect sizes are small
o GWAS has helped in understanding disease pathways but only accounts for a small proportion of variation
GWAS SUMMARY
- The major challenge will be to…
discover the mechanisms for how specific genetic variants contribute to disease risk
GWAS SUMMARY
may see the consensus of opinion shifting in favour of the…
rare variant hypothesis
Rare Variant Hypothesis
The Rare Variant Hypothesis proposes that much of the unexplained genetic variation in complex traits is due to many rare variants, each with larger individual effects, rather than common variants with small effects.
Rare Variant Hypothesis
Why is this important?
GWAS studies have identified many common variants (minor allele frequency >1%), but they explain only a small portion of heritability for most traits — a problem known as “missing heritability.”
The rare variant hypothesis suggests that this missing heritability could be due to rare variants that are:
Not well captured by standard SNP arrays
Specific to families or populations
Functionally important, often in coding or regulatory regions
Key Characteristics of Rare Variants
Frequency: <0.5% (often much lower)
Effect Size: Medium to large
Detection: Requires whole genome/exome sequencing
Origin: Often recent, can be de novo
Population-specific: Yes — rarely shared across populations
Examples of Rare Variant Contributions
BRCA1/2 mutations in breast cancer — rare but high risk
Lipid metabolism disorders — single-gene rare variants
Autism spectrum disorders — multiple rare, high-impact mutations in neuronal genes
Rare Variant Hypothesis - summary
“The rare variant hypothesis suggests that complex traits and diseases may be driven by many rare genetic variants with moderate-to-large effects. These are not well captured by GWAS, contributing to the problem of missing heritability and requiring sequencing-based approaches for discovery.”