week 10 - genetic analysis Flashcards

1
Q

Genetic Variation Across Species

chatgpt

A

Genetic variation exists in all species and includes SNPs, indels, CNVs, structural variants, and novel mutations. These variations can be common or rare, and their frequency and effects can vary by species and population.

Plants: often show high structural variation (e.g. polyploidy, TE activity)

Animals: domesticated species have artificial selection-driven variation

Humans: millions of SNPs, rare variants, and de novo mutations per individual

2
Q

Understanding Complex Traits

chatgpt

A

Complex traits:

Controlled by many genes (polygenic)

Influenced by environment and gene-environment interactions

Show continuous variation (e.g. height, yield)

Often have non-Mendelian inheritance and no clear genotype-phenotype map

3
Q

QTL Mapping Approaches

A

linkage analysis

GWAS

pedigree analysis

4
Q

QTL Mapping Approaches
linkage analysis

A

Population Type: Controlled crosses or pedigrees
Resolution: Low (Mb–cM)
Pros: Powerful for rare variants, family-based
Cons: Low resolution, limited by recombination

5
Q

QTL Mapping Approaches
GWAS

A

Population Type: Natural populations
Resolution: High (kb–100 kb)
Pros: High resolution, no prior knowledge needed
Cons: Sensitive to confounding, missing heritability

6
Q

QTL Mapping Approaches
Pedigree Analysis

A

Population Type: Human/animal pedigrees
Resolution: Medium
Pros: Tracks inheritance of traits
Cons: Relies on known family structure

7
Q

Design and Challenges of GWAS

chatgpt

A

GWAS design principles include using large, diverse, well-matched populations; high-density SNP genotyping; and statistical thresholds to detect associations.
Challenges:

Missing heritability

Linkage disequilibrium confounds

Rare variant detection

Causal variant identification

Population stratification

8
Q

From QTL to Causal Variant

chatgpt

A

Moving from QTL to causal variants involves:

Fine-mapping within associated regions (e.g., higher resolution studies)

Functional validation via expression data, reporter assays, or gene editing

Integration with other data (e.g., eQTLs, chromatin state, transcriptomics)

It’s difficult because QTL peaks often span many genes, and some signals lie in non-coding or regulatory regions far from known genes.

9
Q

Genetic variation and polymorphism

A
  • Variation: the existence of two or more forms (alleles) of a section of DNA
  • If a variant occurs at a frequency >0.5%, it is a polymorphism
  • Genetic variation could lead to observable effects, but the majority do not
10
Q

Genetic variation and polymorphism
example

A
  • Example: an Arabidopsis dwarf mutant in which a single SNP converts a normal-looking plant into a dwarf plant.
    o This is a simple case where there is a 1:1 mapping between genotype and phenotype.
    o The dwarf phenotype is caused by mutation in a single gene
11
Q

Classes of genetic variants:
Single Nucleotide Variant (SNV/SNP)

A

Change of one base (A→G, T→C); most common variant; may be silent or impactful

12
Q

Classes of genetic variants:
- Insertion-deletion variant

A

o Indels occur where one or more bases are present in some genomes and absent in others.
o Generally only a few bases long, but can be up to 80 kb in length.

Addition or loss of one or more bases; can shift reading frames in coding regions

13
Q

Classes of genetic variants:
- Block substitution

A

o a string of adjacent nucleotides varies between individuals

14
Q

Classes of genetic variants:
- Inversion variant

A

o the order of the bases is reversed in a defined section of the genome.

A segment of DNA is reversed within the chromosome

15
Q

Classes of genetic variants:
- Copy number variant

A

Large DNA segments are duplicated or deleted; affect gene dosage

o identical or nearly identical sequences are repeated in some genomes and not others.

16
Q

Genetic variation and polymorphism
frequency?

A
  • Human genetic variations are referred to as either COMMON or RARE, to denote the frequency of the minor allele – the less frequent allele in the population
    o Rare variants are population-specific
17
Q

Genetic variation and polymorphism
common variant

A
  • Common variants have minor allele frequency (MAF)
    o >1%
    o E.g. a C/T SNP with 5% frequency of the T allele
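
A minimal sketch of how the minor allele frequency (MAF) could be computed from genotype counts at a biallelic C/T SNP, assuming diploid individuals. The function names and the example counts are made up for illustration; the cutoffs are the ones used on these cards (common >1%, rare <0.5%), and the "unclassified" label for the gap between them is an assumption.

```python
def minor_allele_frequency(n_cc, n_ct, n_tt):
    """MAF at a biallelic C/T SNP from diploid genotype counts."""
    total_alleles = 2 * (n_cc + n_ct + n_tt)
    freq_t = (2 * n_tt + n_ct) / total_alleles  # each TT carries two T alleles
    return min(freq_t, 1 - freq_t)              # the minor allele is the rarer one


def classify_variant(maf, common_cutoff=0.01, rare_cutoff=0.005):
    """Label a variant using the cutoffs from the cards (hypothetical helper)."""
    if maf > common_cutoff:
        return "common"
    if maf < rare_cutoff:
        return "rare"
    return "unclassified"  # the cards do not name the 0.5%-1% range


# Illustrative counts: 905 CC, 90 CT and 5 TT individuals.
maf = minor_allele_frequency(n_cc=905, n_ct=90, n_tt=5)
print(maf, classify_variant(maf))
```

With these counts the T allele makes up 100 of 2000 alleles, which matches the 5% example on the card.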
18
Q

Genetic variation and polymorphism
rare variant

A
  • Rare variants have minor allele frequency
    o <0.5%
19
Q

Genetic variation and polymorphism
- Novel/de novo variant

A
  • Novel/de novo variants occur only in a single family/individual
    o E.g. a variant that we do not share with either parent
20
Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP)

A
  • Single base pair substitutions
  • Arise through mistakes in DNA replication or caused by mutagens
    o E.g. mutation rate in Arabidopsis is 7x10^-9 base substitutions per site per generation
  • Biallelic – 2 alleles (in diploids)
21
Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
frequency

A
  • Minor allele frequency can range from <1% to 50%
22
Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
methods for detecting

A
  • Many methods for detecting SNPs
    o SNP microarrays
23
Q

Genetic variation and polymorphism:
Single nucleotide polymorphism (SNP):
common?

A
  • SNPs are the most common
    o Which is why they are used a lot
24
Q

Deletions, duplications and insertions

A
  • Expand or contract the length of non-repetitive DNA
  • Small deletions and duplications arise by unequal crossing over
  • Small insertions can arise through the activity of transposable elements
25
Q

Deletions, duplications and insertions
types

A

deletion
novel sequence insertion
mobile element insertion
tandem duplication
interspersed duplication
inversion
translocation

26
Q

Human genetic variation

A
  • 4-5 million differences between any 2 humans
    o 1 in 1000 bases
  • Most differences occur at common locations
    o 4-5 million common SNPs (>0.5%)
    o 50K rare mutations (<0.5%)
    o 40-80 de novo mutations
27
Q

Human genetic variation
QTL mapping

A

So what we will attempt to do with QTL mapping is to relate these differences to differences in the phenotype of a trait of interest.

28
Q

PHYSICAL VS GENETIC MAP
physical map

A
  • Physical distance in nucleotide bases (kb)
  • The actual distance in bp between two variants
29
Q

PHYSICAL VS GENETIC MAP
genetic map

A
  • Recombination frequency (RF) between two markers
  • Based on the number of recombination events occurring in a region
  • RF is related to genetic distance in cM via a mapping function
  • Physical distance is usually correlated with genetic distance
  • Markers are closer together in regions with low RF
  • Markers are further apart in regions of high RF
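
The cards do not say which mapping function the course uses. As a hedged illustration, the sketch below implements two standard choices, the Haldane and Kosambi functions, which convert a recombination frequency (RF) into a map distance in cM. The function names are chosen here for illustration.

```python
import math


def haldane_cm(rf):
    """Haldane mapping function: assumes crossovers occur without interference."""
    if not 0 <= rf < 0.5:
        raise ValueError("RF must be in the range [0, 0.5)")
    return -50.0 * math.log(1 - 2 * rf)


def kosambi_cm(rf):
    """Kosambi mapping function: allows for some crossover interference."""
    if not 0 <= rf < 0.5:
        raise ValueError("RF must be in the range [0, 0.5)")
    return 25.0 * math.log((1 + 2 * rf) / (1 - 2 * rf))


for rf in (0.01, 0.10, 0.30):
    print(rf, round(haldane_cm(rf), 2), round(kosambi_cm(rf), 2))
```

For small RF both functions give roughly 1 cM per 1% recombination; they diverge as RF approaches 50%, where observed recombination underestimates the true number of crossovers.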
30
Q

PHYSICAL VS GENETIC MAP
genetic map: accurate?

A
  • Not always accurate
    o Hotspots
    o Regions where you would expect to find more recombination than others
31
Q

PHYSICAL VS GENETIC MAP
why can Two SNPs be physically close together but genetically far apart

A
  • Reason:
    o Because a recombination hotspot may lie between them, so crossovers separate them often
    o So genetically they look far away from each other (even though they are physically close)
32
Q

PHYSICAL VS GENETIC MAP
Principles of genetic mapping

A
  • Find regions of the genome that are variable (markers)
  • Map and order these regions to produce a genetic marker map
  • Map traits of interest to these markers
33
Q

SUMMARY

A
  • There are many types of DNA sequence variation in populations of plant, animal or microbial species
  • By assembling a map of variation we can link this map to variation in phenotypic traits of interest
34
Q

QUANTITATIVE GENETICS
how is variation divided?

A
  • Variation in humans, plants and animals is broadly divided into 2 types:

qualitative
quantitative

35
Q

QUANTITATIVE GENETICS
Qualitative

A

o Blood groups, eye colour, flower colour
o Only a few genotypes

36
Q

QUANTITATIVE GENETICS
Quantitative

A

o Height, weight
o Many genotypes

37
Q

Quantitative or biometrical data

A
  • Deals with the study of inheritance of the quantitatively varying characters (complex or quantitative traits) that are controlled by many genes and also to a considerable extent by the environment
38
Q

examples of quantitative traits
Plants
morphology

A
  • Yield
  • Quality
  • Maturity
  • Size (height, girth, biomass)
39
Q

examples of quantitative traits
Plants
physiology

A
  • Abiotic stress responses (e.g. drought tolerance)
  • Biotic stress responses (e.g. disease resistance)
  • Photosynthetic capability
40
Q

examples of quantitative traits
animals
morphology

A
  • Size (e.g. weight and height)
  • Productivity (e.g. milk and egg production)
  • Quality (e.g. meat/wool)
  • Fecundity
41
Q

examples of quantitative traits
animals
physiology

A
  • Growth rate
  • Abiotic stress responses (e.g. heat tolerance)
  • Biotic stress responses (e.g. disease resistance)
  • Strength
42
Q

examples of quantitative traits
animals
behaviour

A
  • Intelligence
  • personality
43
Q

examples of quantitative traits
humans
morphology

A
  • size (weight, height, etc.)
  • colour
44
Q

examples of quantitative traits
humans
physiology

A
  • metabolic rates
  • diabetes
  • hypertension
45
Q

examples of quantitative traits
humans
behaviour

A
  • intelligence
  • personality
46
Q

Threshold traits

A
  • trait which has complex/polygenic inheritance, but only two obvious phenotypes
    o e.g. affected or not affected by disease
  • E.g. type II diabetes
    o Individuals who exceed a certain number of risk factors (genetic and/or environmental) will develop the disease and others will not.
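
A small simulation can illustrate the threshold idea: many risk factors add up to a continuous underlying liability, but only two phenotypes are observed. Everything numeric here (50 loci, risk allele frequency 0.3, environmental spread, affecting the top 10%) is an arbitrary assumption chosen for illustration, not a value from the course.

```python
import random


def simulate_liability(n_loci=50, risk_allele_freq=0.3, env_sd=2.0, rng=None):
    """Liability = number of risk alleles carried + a random environmental term."""
    rng = rng or random.Random()
    risk_alleles = sum(rng.random() < risk_allele_freq for _ in range(2 * n_loci))
    return risk_alleles + rng.gauss(0.0, env_sd)


rng = random.Random(1)  # fixed seed so the run is repeatable
liabilities = [simulate_liability(rng=rng) for _ in range(2000)]

# Arbitrary threshold: individuals above the 90th percentile are "affected".
threshold = sorted(liabilities)[int(0.9 * len(liabilities))]
affected = [value > threshold for value in liabilities]

print("affected:", sum(affected), "unaffected:", len(affected) - sum(affected))
```

The liability itself varies continuously, yet the observed trait has only two classes, which is why a polygenic trait can look qualitative.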
47
Q

Central dogma of molecular biology

A
  • Flow of information
    DNA –> RNA –> protein –> produces trait
  • The goal in genetics is to relate genetic variation to phenotypic variation in a trait
48
Q

Genetic basis of a quantitative trait
debate

A
  • Debate between the Mendelians and the biometricians
    William Bateson (Mendelian)
  • Quantitative traits do not follow discrete patterns and cannot be inherited
    Francis Galton (biometrician)
  • Quantitative traits can be inherited and the degree of inheritance can be estimated by pure statistics
49
Q

Genetic basis of a quantitative trait
debate: ronald fisher

A

Ronald Fisher came up with the polygenic model
- To reconcile these two opposing views on quantitative traits

50
Q

Genetic basis of a quantitative trait
the polygenic model

A
  • Fisher first highlighted the polygenic nature of quantitative traits
    o The random sampling of alleles at each gene produces a continuous normally distributed phenotype in the population
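
The claim that random sampling of alleles at many genes gives a continuous, roughly normal distribution can be checked with a short simulation. This is a sketch under simplifying assumptions that are not from the cards: every locus has two alleles at frequency 0.5, each "+" allele adds one unit, effects are purely additive and there is no environmental term.

```python
import random
import statistics


def simulate_trait(n_individuals, n_loci, allele_freq=0.5, effect=1.0, rng=None):
    """Trait value = effect x number of '+' alleles summed over all loci."""
    rng = rng or random.Random()
    values = []
    for _ in range(n_individuals):
        # A diploid individual draws two alleles per locus.
        plus_alleles = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        values.append(effect * plus_alleles)
    return values


rng = random.Random(42)  # fixed seed so the run is repeatable
one_locus = simulate_trait(2000, n_loci=1, rng=rng)
many_loci = simulate_trait(2000, n_loci=100, rng=rng)

print("classes with 1 locus:", sorted(set(one_locus)))
print("classes with 100 loci:", len(set(many_loci)))
print("mean and SD with 100 loci:",
      round(statistics.mean(many_loci), 1), round(statistics.stdev(many_loci), 1))
```

One locus gives at most three phenotype classes, as in a Mendelian trait, while 100 loci give dozens of classes clustered around the mean, which approximates a bell-shaped curve.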
51
Q

Genetic basis of a quantitative trait
the polygenic model:
- Quantitative traits are mostly controlled by …

A

several genes

o Polygenes or QTL (quantitative trait loci)

52
Q

Genetic basis of a quantitative trait
the polygenic model:
each gene behaves like…

A

a mendelian gene

o Each gene can segregate independently

53
Q

Genetic basis of a quantitative trait
the polygenic model:
- There are effects arising from…

A

environmental variance

54
Q

Genetic basis of a quantitative trait
the polygenic model:
interactions

A
  • There is also interaction within each gene (dominance and co-dominance)
  • And interaction between genes (linkage and epistasis)
  • And interaction between the gene and the environment
55
Q

Modes of gene action (Interaction between alleles at a locus)
additive effects

A

o Measure the quasi-independent effects of alleles on a trait

56
Q

Modes of gene action (Interaction between alleles at a locus)
dominance effects

A

o Measure the interactions between alleles at a single locus
o E.g. complete dominance (one allele can mask the effect of the other)

57
Q

Modes of gene action (Interaction between alleles at a locus):
Alleles can interact with each other in a number of different ways to produce…

A

variability in the phenotype.

58
Q

Modes of gene action (Interaction between alleles at a locus):
additive gene action

A

When the heterozygote phenotypic value is halfway between those of the two homozygotes, gene action is defined as additive.
o In the graph, each A2 allele contributes an increase of i to the phenotype value, in this case +1.

59
Q

Modes of gene action (Interaction between alleles at a locus):
complete dominance

A

Complete dominance: the phenotype is the same whether you have 1 or 2 A2 alleles.
o The graph shows an underlying additive genetic component (the slope of the line), but the values deviate from additivity due to dominance effects. In this case the A1A2 and A2A2 phenotypes are quite similar.

60
Q

Modes of gene action (Interaction between alleles at a locus):
within dominance:
- Complete dominance

A

o E.g. Mendel's crosses between pea plants with purple flowers and white flowers: all F1 progeny are purple because the purple P allele is dominant.

61
Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: incomplete dominance

A

o The heterozygote value is over halfway but not quite as high as the A2 homozygote.
o E.g. snapdragons: cross red with white and see pink, as neither allele is dominant (a blending of the phenotypes)

62
Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: overdominance

A

o Rare: the phenotype of the heterozygote is beyond the range of either homozygote.
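
The modes of gene action on the preceding cards can be summarised from the three genotype values at one locus. The sketch below uses the common quantitative-genetics convention (an assumption here, since the cards do not define symbols beyond the per-allele increase) of an additive effect a, half the difference between the homozygotes, and a dominance deviation d, the heterozygote's distance from the homozygote midpoint. The function name and the tolerance are illustrative.

```python
def gene_action(g11, g12, g22, tol=1e-9):
    """Classify gene action from genotype values for A1A1, A1A2 and A2A2.

    Assumes A2A2 has the higher value, so the additive effect a is positive.
    """
    if g22 <= g11:
        raise ValueError("expected g22 > g11 (label A2 as the increasing allele)")
    midpoint = (g11 + g22) / 2
    a = (g22 - g11) / 2   # additive effect
    d = g12 - midpoint    # dominance deviation of the heterozygote
    if abs(d) < tol:
        label = "additive"
    elif abs(abs(d) - a) < tol:
        label = "complete dominance"
    elif abs(d) < a:
        label = "incomplete dominance"
    else:
        label = "overdominance"
    return a, d, label


print(gene_action(0, 1, 2))    # heterozygote exactly halfway
print(gene_action(0, 2, 2))    # heterozygote equals the A2A2 homozygote
print(gene_action(0, 1.5, 2))  # heterozygote over halfway, below A2A2
print(gene_action(0, 3, 2))    # heterozygote beyond both homozygotes
```

The first call mirrors the additive card, where each A2 allele adds +1 to the phenotype value.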

63
Q

Modes of gene action (Interaction between alleles at a locus):
within dominance: heterozygote advantage

A
  • Or the “heterozygote advantage” where the heterozygote has better fitness than either of the homozygotes.
    o Sickle cell anaemia – where the heterozygote has partial resistance to malaria.
64
Q

Will genetic architecture be the same in all populations?

A
  • Genetic architecture will differ in different populations.
  • Different populations will have different environmental exposures and different alleles segregating.
65
Q

Challenges for studying quantitative traits

A
  • Genotype not identifiable from phenotype
  • Epistasis (gene-gene interaction)
  • Genotype x environment interaction (G x E)
66
Q

Contribution of QTL alleles to a complex trait

A
  • Suppose we could see which genotype each individual carries at QTL locus B
  • The distributions for the 3 genotypes overlap, so we cannot determine genotype just by looking at phenotype.
  • An individual with average height could be any of the 3 genotypes.
  • Let’s say height = 30cm – could be any of the three genotypes!
67
Q

Genotype x environment interaction

A
  • Different genotypes responding in different ways to changes in the environment

e.g.
a. Trait is not sensitive to environment

b. Trait value higher in environment 2

c. Some genotypes have higher trait value in environment 2, others have lower trait value in environment 2.

68
Q

Basic elements of mapping QTL

A
  • Phenotype of the trait
  • Marker genotype
  • Genetic structure of mapping populations
69
Q

Basic elements of mapping QTL
statistical methods

A
  • Use statistical methods to bring together the 3 sources of data and identify regions of the chromosome, which we call QTL, that are associated with variation in the trait.
70
Q

SUMMARY
Complex traits are influenced by

A

several genes and by the environment
o Rare combinations of polygene effects can lead to unexpected phenotypes

71
Q

SUMMARY
- Since genotype cannot be determined by phenotype, we cannot use..

A

Mendelian phenotype ratios to work out what is going on and will need new methods of modelling these traits

72
Q

GWAS design and principles
Ways to genetically dissect complex traits

A

1) linkage analysis

2) genome wide association analysis

73
Q

GWAS design and principles
Ways to genetically dissect complex traits:
linkage analysis

A

o Uses defined population either created using crosses or with familial relationships known (pedigrees)

74
Q

GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis

A

o Uses naturally occurring populations of individuals
o Used in human, plant and animal populations

75
Q

GWAS design and principles
Ways to genetically dissect complex traits:
linkage analysis
how

A

use a segregating population

large recombinant blocks (recent recombination)

F2 mosaic
- due to recombination

makes use of recombination to define where gene of interest might be

76
Q

GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
how

A

take natural population as they are and use for analysis

make use of historic recombination over thousands of years
- lots of small recombination blocks
- narrows down region of interest to a smaller interval

can find commonalities between genotype and phenotype
- looking for shared phenotypes between individuals

77
Q

GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
associations

A
  • Association – e.g. human populations are the result of many generations of recombination in meiosis, producing genomes with short blocks from ancestral individuals.
78
Q

GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
statistical methods

A
  • Collected genotype and phenotype data as shown in the table.
  • Use statistical methods to generate graphs like these in which we have the locations of the markers on the genetic map (x-axis) and on the y-axis, the evidence for a QTL.
  • As with all statistical tests we define a threshold for deciding whether or not a result is significant (link to minitab workshops).
  • Can see a broad peak. This is a QTL.
  • The results of association analysis are presented in a slightly different way. Individual dots = markers.
  • See narrow band of markers with significant evidence for QTL.
  • Note the band is narrower than with linkage analysis.
79
Q

GWAS design and principles
Ways to genetically dissect complex traits:
genome wide association analysis
next step

A
  • So then the next step is to look at which genes are in the region of the significant markers and perform experiments to confirm that these genetic markers really affect the trait.
80
Q

Advantages of GWAS approach

A
  • Previous knowledge of the genetics of the trait not required
  • Can fine map QTL to 10-100kb because many recombination events have occurred in the history of the population
  • Can reveal causal genes in an unbiased way
81
Q

GWAS design in humans

A
  • A population that segregates for a trait of interest
  • Mostly case-control groups
    o Case: individuals with the trait (e.g. patients with disease, personality etc.)
      - Have the thing you are trying to find the genetic basis of
    o Controls: individuals without the trait (healthy individuals)
    o Want case and control groups to be similar to each other (avoids confounding factors)
      - Want a mix of populations in both groups
  • Genotype data
    o Mostly SNP arrays that genotype a pre-selected subset of SNPs
    o Whole genome or exome sequencing
  • Statistical analysis to pick up association signal
82
Q

GWAS design in humans
Population of interest: case-control group

A
  • Compares prevalence of polymorphism between subjects who have that condition (cases) with patients who do not have the condition (control)
  • In theory, the case-control study can be described simply.
    o First, identify the cases (a group known to have the outcome) and the controls (a group known to be free of the outcome).
    o Then, look back in time to learn which subjects in each group had the exposure(s), comparing the frequency of the exposure in the case group to the control group.
83
Q

Statistics of GWAS studies

A
  • Null hypothesis: there is no association between the marker (e.g. SNP) genotype and the trait
  • Alternative hypothesis: there is association between the marker (e.g. SNP) genotype and the trait
  • For categorical traits
    o Can use statistics similar to chi-square to compare the frequency of genetic variants in the two groups (case/control)
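
As a hedged illustration of a chi-square style test for a categorical trait, the sketch below compares allele counts between cases and controls in a 2x2 table. This is the basic allelic test with one degree of freedom; real GWAS software typically adds covariates and other corrections. The function name and the counts are made up for illustration.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts: 100 cases and 100 controls, so 200 alleles per group.
chi2, p_value = allelic_chi_square(case_minor=60, case_major=140,
                                   control_minor=40, control_major=160)
print(round(chi2, 2), round(p_value, 4))
```

A small p-value here means the minor allele frequency differs between cases and controls by more than expected under the null hypothesis of no association.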
84
Q

Statistics of GWAS studies
- For continuous/quantitative traits

A

o Can use parametric analyses like ANOVA and regression to compare normally distributed traits between the genotype groups
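
For a quantitative trait, one common form of the regression approach is to code each genotype as the number of copies of one allele (0, 1 or 2) and regress the phenotype on that dosage. The sketch below fits only the least-squares line; the additive coding, the function name and the toy data are assumptions for illustration, and a real analysis would also compute a p-value for the slope.

```python
def dosage_regression(dosages, phenotypes):
    """Least-squares fit of phenotype = intercept + slope * allele dosage."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need matching lists with at least two individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("the marker is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx                  # estimated effect of one extra allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data: two individuals per genotype class (dosage 0, 1, 2).
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [10, 12, 13, 15, 16, 18]
slope, intercept = dosage_regression(dosages, phenotypes)
print(slope, intercept)
```

The slope estimates how much the trait changes per extra copy of the allele; a slope of zero corresponds to the null hypothesis of no association.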

85
Q

Statistics of GWAS studies
GWAS tests for…

A

association of thousands to millions of markers (polymorphisms, e.g. SNPs; the individual points) with trait status in hundreds to tens of thousands of individuals

86
Q

Statistics of GWAS studies
manhattan plot

A
  • Looking for the peaks that stand out
    o Tells us that there is a specific association
  • Each dot represents a variant (a SNP)
    o Have many SNPs spread across the chromosome
  • Significant markers (red points) with statistical support (-log10(p)) above the statistical threshold (red line) are either causal for the trait variation or in LD with the unknown causal variants
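
The cards do not state how the significance line is set. One widely used convention, assumed here, is a Bonferroni correction: divide the desired error rate by the number of markers tested. The sketch converts p-values to the -log10 scale used on the y-axis of a Manhattan plot; the marker names and p-values are invented for illustration.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-marker p-value cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests


def neg_log10(p_value):
    """Transform a p-value to the y-axis scale of a Manhattan plot."""
    return -math.log10(p_value)


def significant_markers(p_values, n_tests, alpha=0.05):
    """Names of markers whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(n_tests, alpha)
    return [name for name, p in p_values.items() if p < cutoff]


# Hypothetical results for three markers out of one million tested.
p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.4}
cutoff = bonferroni_threshold(1_000_000)
print(cutoff, round(neg_log10(cutoff), 2))
print(significant_markers(p_values, n_tests=1_000_000))
```

With one million tests the cutoff is 5e-8, which sits at about 7.3 on the -log10 scale; only markers plotted above that line count as significant.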
87
Q

Linkage disequilibrium and GWAS

A
  • Linkage disequilibrium (LD) refers to correlation between SNPs
  • At equilibrium, the association between alleles would be random

look at notes for table
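
As a rough sketch of what "correlation between SNPs" means numerically, two standard LD measures can be computed from haplotype and allele frequencies: D, the difference between the observed haplotype frequency and the frequency expected under random association, and r squared, its scaled version. The function name and the example frequencies are illustrative, not values from the notes.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # zero when the alleles associate at random
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))  # random association
print(ld_stats(p_ab=0.50, p_a=0.5, p_b=0.5))  # A always travels with B
```

When r squared is high, a genotyped marker can stand in for an ungenotyped causal variant nearby, which is why a significant SNP may only be in LD with the true cause.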

88
Q

GWAS performed to date represent the tip of the iceberg

A
  • Many associations are still hidden and can be uncovered by:
    o Studying a larger sample population (> 1 million)
    o Studying more diverse populations (non-European populations)
    o Incorporating gene x gene and gene x environment interaction
    o Considering non-additive models of gene actions
89
Q

So what is the benefit to all this?
Impact of GWAS findings in medicine

A
  • GWAS have revealed “molecular sub-phenotypes” or ‘heterogeneity’ of disease
  • Increased understanding of disease pathways will provide new drug targets and promote personalised medicine
    o i.e. target treatment to underlying subtype
  • e.g. Can be many genes responsible for e.g. cancer
    o These can differ between individuals
    o So understanding which genetic pathway in a particular individual will allow for targeted drug treatment
90
Q

GWAS
Difficulty in finding causal variant

A
  • The associated SNP may be within or close to a gene that is relevant to the trait of interest
  • Goal is to identify this gene and its variants (alleles) that confer different disease risks:
    o Quantitative trait nucleotide
  • Some associations are nowhere near a functional gene
    o E.g. a variant on chr9p21 associated with heart attack is 150kb from the nearest gene
91
Q

GWAS
Much trait variation remains unexplained

A
  • The bulk of the genetic variance underlying the trait heritability has still not been explained
  • E.g. > 30 markers associated with Crohn’s disease explains < 10% of genetic variance
  • The so-called “missing heritability” problem
92
Q

GWAS
Pitfall and criticism of GWAS

A
  • Not very useful for very rare disease
    o Small amount of population
    o Statistics do not really work on small populations
  • Disease prediction
  • True signals
  • Population stratification
  • Ultra-rare mutations
  • Epistasis
  • Causal variants or genes
  • Missing heritability
93
Q

GWAS
good for

A
  • Identification of novel SNV-trait associations
  • Discovery of novel biological mechanisms
  • Diverse clinical applications
  • Insight into ethnic variation of complex traits
  • Relevant to low frequency rare variants
  • Identification of novel monogenic and oligogenic disease genes
  • Relevant to the study of structural variation
  • Multiple applications beyond gene identification
  • Straightforward GWAS data generation, management and analysis
  • Easy to share and publicly available data
94
Q

GWAS SUMMARY
- Genetic architecture of complex traits is…

A

Complex
o Highly polygenic

95
Q

GWAS SUMMARY
- GWAS is a common way of…

A

mapping complex traits in natural populations

96
Q

GWAS SUMMARY
- GWAS can be…

A

direct or indirect
o Typical genetic effect sizes are small
o GWAS has helped in understanding disease pathways but only account for small proportion of variation

97
Q

GWAS SUMMARY
- The major challenge will be to…

A

discover the mechanisms for how specific genetic variants contribute to disease risk

98
Q

GWAS SUMMARY
may see the consensus of opinion shifting in favour of the…

A

rare variant hypothesis

99
Q

Rare Variant Hypothesis

A

The Rare Variant Hypothesis proposes that much of the unexplained genetic variation in complex traits is due to many rare variants, each with larger individual effects, rather than common variants with small effects.

100
Q

Rare Variant Hypothesis
Why is this important?

A

GWAS studies have identified many common variants (minor allele frequency >1%), but they explain only a small portion of heritability for most traits — a problem known as “missing heritability.”

The rare variant hypothesis suggests that this missing heritability could be due to rare variants that are:

Not well captured by standard SNP arrays

Specific to families or populations

Functionally important, often in coding or regulatory regions

101
Q

🧪 Key Characteristics of Rare Variants

A

Frequency: <0.5% (often much lower)
Effect Size: Medium to large
Detection: Requires whole genome/exome sequencing
Origin: Often recent, can be de novo
Population-specific: Yes — rarely shared across populations

102
Q

Examples of Rare Variant Contributions

A

BRCA1/2 mutations in breast cancer — rare but high risk

Lipid metabolism disorders — single-gene rare variants

Autism spectrum disorders — multiple rare, high-impact mutations in neuronal genes

103
Q

Rare Variant Hypothesis - summary

A

“The rare variant hypothesis suggests that complex traits and diseases may be driven by many rare genetic variants with moderate-to-large effects. These are not well captured by GWAS, contributing to the problem of missing heritability and requiring sequencing-based approaches for discovery.”