Lecture 3: Linkage Analysis Flashcards
Why do we care about variations?
- Allow tracking ancestral human history
- Underlie phenotypic differences
- Cause inherited diseases
If you are interested in STUDYING A HUMAN DISEASE, HOW do
you FIND out WHICH GENE, when MUTATED, CAUSES THAT DISEASE?
You can find that POSITION by GENETIC MAPPING.
What is the EVIDENCE that a DISEASE OR TRAIT is GENETIC?
- TWIN STUDIES
- FAMILY SEGREGATION
What is the EVIDENCE that a DISEASE OR TRAIT is GENETIC?
twin studies…
MS EXAMPLE
TWIN STUDIES:
COMPARE MONOZYGOTIC AND DIZYGOTIC TWINS
- COMPARE THE CONCORDANCE RATES OF MZ AND DZ TWINS.
- Monozygotic Twins : Genetically identical
- Dizygotic Twins : Like siblings (on average 1/2 of the GENOME shared)
MS Example:
- Monozygotic Twins : Concordance rate 25-30%
- Dizygotic Twins : Concordance rate 2-5%
What is the EVIDENCE that a DISEASE OR TRAIT is GENETIC?
family segregation …
MS EXAMPLE
Family segregation: INCREASED RISK for DISEASE AMONG FAMILY MEMBERS OF AN AFFECTED INDIVIDUAL.
HOW?
COMPARE FREQUENCY OF DISEASE AMONG FIRST DEGREE RELATIVES OF AFFECTED INDIVIDUALS WITH THE FREQUENCY OF THE DISEASE IN THE GENERAL POPULATION.
MS Example:
- Risk to 1st-degree relatives (Parents, siblings, children): 2-5%
- General Population: 0.1-0.2%
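The comparison on this card can be written as a simple ratio. The sketch below is only an illustration: the function name `relative_risk` is hypothetical, and the inputs are the MS ranges quoted above converted to proportions. It suggests that first-degree relatives carry roughly 10 to 50 times the population risk.

```python
def relative_risk(risk_in_relatives: float, risk_in_population: float) -> float:
    """Ratio of disease risk in relatives of an affected person to the population risk."""
    if risk_in_population <= 0:
        raise ValueError("population risk must be positive")
    return risk_in_relatives / risk_in_population


# MS figures from the card, as proportions: relatives 2-5%, population 0.1-0.2%.
lowest = relative_risk(0.02, 0.002)   # lowest relative risk vs highest population risk
highest = relative_risk(0.05, 0.001)  # highest relative risk vs lowest population risk
print(f"First-degree relatives: about {lowest:.0f}x to {highest:.0f}x the population risk")
```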
What is the goal of a genetic study?
IDENTIFY GENETIC RISK FACTORS (genetic locations) causing human diseases.
Types of Genetic (INHERITABLE) Disease:
- SINGLE GENE (MENDELIAN) DISORDERS
- COMPLEX DISEASES/COMPLEX GENETIC DISORDERS
Explain single gene mendelian disorders: 3
- rare
- gene segregation within pedigrees (families)
- obvious they are genetic
explain complex disease/complex genetic disorders: 4
- may be NO RECOGNISABLE PATTERN OF INHERITANCE
- likely due to ACTION OF MULTIPLE GENES
- GENES may be INTERACTING WITH EACH OTHER TO RESULT IN DISEASE PHENOTYPES
- ENVIRONMENTAL FACTORS/GENE-ENVIRONMENTAL INTERACTIONS may affect disease
What is a complex disease?
complex disease: phenotypes, symptoms, endotypes
- disease features and nomenclature may change over time
What are some factors which induce or affect complex diseases: 5
- Environmental drivers
- Epigenetics
- Social ‘determinants of health’
- Genetics
- other HOST factors
Explain/list Environmental Drivers: 6
- chemicals
- smoking
- sun exposure
- infection/infectious diseases
- diet
- alcohol
List/explain Genetics: 4
- molecular mechanisms of disease
- susceptibility (risk) or modifier gene/variants
- genomic and somatic variants
- SNPs, CNVs, indels, mode of inheritance, monogenic or polygenic
Host factors : 9
- time
- age
- microbiome
- obesity
- nutrition
- inflammation
- immune system
- stage of development at exposure
- immune response
Epigenetics: 6
- methylation
- miRNA
- retroviruses
- regulatory events
- gene rearrangements
- gene expression
social determinants of health: 5
- social setting
- communication barriers
- access to healthcare
- cultural and or economic status
- adverse childhood experiences
Explain the MONOGENIC DISORDER PATHWAY (SLIDE 10): 4
- Gene; MUTATION present
- Inheritance pattern (in family members) – DOMINANT OR RECESSIVE
- IMPACT of MUTATIONS in a SINGLE GENE on a DISEASE PHENOTYPE
- Genetic risk in different families (in percent) (genetic risk in population… in percentage)
Explain the COMPLEX DISORDER PATHWAY (SLIDE 10): 4
- GENE VARIATIONS (e.g. gene A, gene B, gene C, gene D)
- Inheritance pattern is COMPLEX
- IMPACT OF VARIATIONS IN DIFFERENT GENES ON DISEASE PHENOTYPE
- Genetic risk in population (epidemiological evidence… in percent); affecting families (in percent)
What are the 3 GENE IDENTIFICATION APPROACHES
- POSITIONAL CLONING
- CANDIDATE GENE APPROACH
- WHOLE GENOME SCREENING
GENE IDENTIFICATION APPROACH: POSITIONAL CLONING
WHAT?
EXAMPLE?
HOW?
- POSITIONAL CLONING (REVERSE GENETICS)
- DISEASE -> MAP -> GENE -> FUNCTION (not DISEASE -> FUNCTION -> GENE -> MAP)
EXAMPLE: Cystic Fibrosis (CFTR)
HOW: identification of a gene BASED SOLELY on its POSITION IN THE GENOME
MOST WIDESPREAD STRATEGY IN HUMAN GENETICS IN THE LAST 20-35 YEARS
STRENGTHS AND WEAKNESSES OF POSITIONAL CLONING: 4
STRENGTHS:
- No knowledge of gene product required
- Very strong track record in single-gene disorders
WEAKNESSES:
- Understanding of FUNCTION not a CERTAIN OUTCOME
- POOR TRACK RECORD WITH MULTIFACTORIAL TRAITS
Gene Identification Approaches
WHAT IS “CANDIDATE GENE APPROACH”: 3
- CANDIDATE GENES are genes LOCATED in a CHROMOSOME REGION SUSPECTED OF BEING INVOLVED IN THE EXPRESSION OF DISEASE TRAITS.
- LIMITED to genes whose KNOWN BIOLOGICAL FUNCTION is relevant to the particular disease.
- Can be IDENTIFIED BY ASSOCIATION AND LINKAGE WITH PHENOTYPES
Gene Identification Approaches
WHAT IS “WHOLE GENOME SCREEN APPROACH”: 4
LINKAGE VS ASSOCIATION
- SCAN the WHOLE GENOME WITHOUT ANY PRIOR INFORMATION
- CAN DISCOVER POTENTIAL GENES PLAYING ROLES IN DISEASES
- LINKAGE: 6K or 10K SNPs are enough
- ASSOCIATION: 500K or 1 MILLION SNPs (arrays of this size have recently become available)
SUMMARY OF GENE IDENTIFICATION APPROACHES
FOR “POSITIONAL CLONING”
- STARTING POINT
- KEY METHOD
- USE CASE
- STARTING POINT: genetic marker/location
- KEY METHOD: linkage analysis, chromosomal mapping
- USE CASE: identifying genes linked to specific chromosomal regions
SUMMARY OF GENE IDENTIFICATION APPROACHES
FOR “CANDIDATE GENE APPROACH”
- STARTING POINT
- KEY METHOD
- USE CASE
- STARTING POINT: suspected gene based on function
- KEY METHOD: genetic association studies
- USE CASE: investigating specific genes based on prior knowledge
SUMMARY OF GENE IDENTIFICATION APPROACHES
FOR “WHOLE GENOME SCREENING”
- STARTING POINT
- KEY METHOD
- USE CASE
- STARTING POINT: entire genome
- KEY METHOD: GWAS, whole genome sequencing
- USE CASE: unbiased discovery of genetic factors for complex traits
Gene Mapping technique: 3
Disease genotype <-> Disease phenotype = BIOLOGY
Disease genotype <-> Marker genotype = PROXIMITY
Marker genotype <-> Disease phenotype = GENE MAPPING; LINKAGE/ASSOCIATION ANALYSIS
GENE MAPPING TECHNIQUE PRINCIPLE:
“People who have similar phenotypic values (ie DISEASE) should have
higher chance of SHARING OF GENETIC MATERIAL near the genes that influence those traits.”
How to identify genes contributing to disease? = 2
- LINKAGE MAPPING
- LINKAGE DISEQUILIBRIUM (ASSOCIATION STUDY)
Explain LINKAGE MAPPING: 5
- MEASURES THE SEGREGATION OF ALLELES AND A PHENOTYPE WITHIN A FAMILY
- USES CROSSOVERS OCCURRING DURING PROPHASE OF MEIOSIS I
- Genes that are PHYSICALLY CLOSE TOGETHER ARE MORE LIKELY TO BE CO-INHERITED
- Genes that are PHYSICALLY FAR APART ARE LESS LIKELY TO BE CO-INHERITED
- DETECTS LINKAGE OVER BROAD CHROMOSOMAL REGIONS ON THE GENOME
EXPLAIN Linkage disequilibrium (Association Study): 2
- EVALUATE the EVIDENCE of a DIRECT CORRELATION BETWEEN A MARKER ALLELE and a DISEASE RISK ALLELE
- SHARING of GENETIC MATERIAL: ACTUAL SHARING OF THE SAME ALLELE (LINKAGE DISEQUILIBRIUM = LD)
What is a GENETIC MARKER?
TYPES?
A genetic marker is a POLYMORPHIC DNA SEQUENCE with a KNOWN LOCATION ON A CHROMOSOME THAT CAN BE USED TO IDENTIFY INDIVIDUALS
TYPES:
1. SNP
2. SSR (simple sequence repeat / microsatellite)
SNP:
No of Loci:
Advantage:
Disadvantage:
- NO. OF LOCI: High, occur ~1 in every 100-300 bp
- ADVANTAGE:
…1. Abundant in the genome
…2. Provide high-resolution mapping
…3. Low mutation rate (stable)
…4. Many technologies available
- DISADVANTAGE: Often bi-allelic (two possible alleles), which LIMITS INFORMATION FROM ONE LOCATION
SSR (simple sequence repeat / microsatellite):
No of Loci:
Advantage:
Disadvantage:
- NO. OF LOCI: High, occurs ~ every 2-30 kb
- ADVANTAGE: Multi-allelic: each SSR can have multiple alleles
- DISADVANTAGE:
…1. Lower abundance
…2. More labour-intensive
…3. Higher mutation rate
Meiosis and Recombination: 3
- During MEIOSIS, the CHROMOSOMES DUPLICATE, then CROSS OVER (“RECOMBINE”) to PRODUCE A HAPLOID GAMETE (sperm/egg)
- The gamete DERIVES GENETIC VARIANTS FROM BOTH PARENTS
- MEIOSIS IS THE “BASIS” FOR “HEREDITY”
BIVALENTS IN PROPHASE OF MEIOSIS I:
(N = non-recombinant chromatid, R = recombinant chromatid)
SINGLE CROSSOVER: A1 and A2 cross over -> CHROMATIDS IN GAMETES: NRRN
DOUBLE CROSSOVER: A1 and A2 cross over, B1 and B2 cross over -> CHROMATIDS IN GAMETES:
1. TWO-STRAND: NNNN
2. THREE-STRAND: RNRN
3. FOUR-STRAND: RRRR
NEED TO LOOK AT SLIDE 18
Understanding Markers and Inheritance = 3
- Polymorphic loci whose locations are known
- Most often SNPs or micro-satellites
- Inherited within the Chromosomes
For linkage analysis, we need informative meiosis
NEED NOT A WANT
UNDERSTAND SLIDE 20 PEDIGREE
Markers and Inheritance
NEED NOT A WANT
UNDERSTAND SLIDE 21 PEDIGREE 1 And 2
What is LINKAGE? = 4
“HOW DO WE MEASURE IT?”
- Only ~1 recombination per chromosome per meiosis
“→ Loci that are close together on the same chromosome tend to be inherited together (‘linked’ or ‘in LD’ = linkage disequilibrium)”
- The closer the loci, the more linkage
“→ Degree of linkage is a measure of genetic distance”
- Linkage is measured by the recombination fraction, θ = proportion of recombinants
“θ = 0: complete linkage
θ = 0.5: no linkage”
- For two loci on the same chromosome, only a cross-over event will separate them.
What is recombination fraction? = 5
- θ is a measure of genetic distance
- The further apart two loci are, the more likely a crossover event will occur between them (θ value will increase)
- Centimorgan (cM) is a genetic distance unit
- 1 cM = 1% chance of recombination
- 1 cM approx. 1000 kb = 1 Mb
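The unit conversions on this card can be sketched in a few lines. This is a rough illustration only: the function names are hypothetical, the straight-line conversion from θ to cM is reasonable only for small θ (over longer distances, multiple crossovers make θ underestimate map distance), and 1 cM per Mb is the approximate average quoted on the card, not a constant that holds everywhere in the genome.

```python
MB_PER_CM = 1.0  # card's rule of thumb: 1 cM is roughly 1000 kb = 1 Mb


def theta_to_cm(theta: float) -> float:
    """Approximate map distance in cM for a small recombination fraction."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return theta * 100.0  # 1% recombination = 1 cM


def cm_to_mb(cm: float) -> float:
    """Very rough physical distance in Mb for a genetic distance in cM."""
    return cm * MB_PER_CM


theta = 0.05  # hypothetical value: 5% recombinants between two loci
print(f"theta = {theta} -> about {theta_to_cm(theta):.0f} cM, "
      f"about {cm_to_mb(theta_to_cm(theta)):.0f} Mb")
```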
Obtaining enough family material to test multiple
meioses is difficult for rare diseases
- The higher the RF (RECOMBINATION FRACTION) between two loci, the MORE MEIOSES are needed to OBTAIN EVIDENCE that THEY ARE LINKED.
- Scoring recombinants in human pedigrees is not always simple
Linkage mapping: is a marker “linked” to the disease gene?
- Collect families with affected individuals
- Genome scan: test markers evenly spaced across the entire genome (~every 10 cM, ~400 markers)
- Lod score (“log of the odds”)
WHAT IS LOD SCORE?
what are the odds of observing the family
marker data if the marker is linked to the disease (less recombination than
expected) compared to if the marker is not linked to the disease
Test to estimate whether
the likelihood that TWO LOCI ARE LINKED is greater than
the likelihood that THE TWO LOCI ARE UNLINKED
Z = LOG10 FORMULA
Z = log10( LIKELIHOOD OF LINKAGE (θ < 0.5) / LIKELIHOOD LOCI ARE UNLINKED (θ = 0.5) )
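For the simplest textbook case the formula can be evaluated directly. The sketch below assumes phase-known, fully informative meioses, where the likelihood of R recombinants in N meioses is θ^R (1 − θ)^(N − R) and the unlinked likelihood is 0.5^N. Real pedigrees need dedicated linkage software; the function name `lod_score` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Z(theta) = log10[ theta^R * (1 - theta)^(N - R) / 0.5^N ] for phase-known meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and the number of meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    non_recombinants = meioses - recombinants
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants:
        log_linked += non_recombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data: 10 informative meioses with no recombinants, tested at theta = 0.
print(round(lod_score(0, 10, 0.0), 2))
```

With no recombinants, each informative meiosis adds log10(2), about 0.3, to the score at θ = 0, which is why roughly ten such meioses are needed to pass Z = 3.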
linkage mapping
- parametric method
- non-parametric method
parametric method
Estimate recombination fraction between a marker locus and an unobserved trait locus
Out of 4 informative meioses,
2 are recombinants => θ = 2/4 = 1/2
slide 26
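The counting step in this example can be written out as below. It is a minimal sketch: the function name is hypothetical, and capping the estimate at 0.5 is an added assumption, made because θ = 0.5 already corresponds to no linkage.

```python
def estimate_theta(recombinants: int, informative_meioses: int) -> float:
    """Estimate the recombination fraction as recombinants / informative meioses."""
    if informative_meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    if not 0 <= recombinants <= informative_meioses:
        raise ValueError("recombinants must be between 0 and the number of meioses")
    # Assumption: values above 0.5 are truncated, since 0.5 already means unlinked.
    return min(recombinants / informative_meioses, 0.5)


# Card example: 2 recombinants out of 4 informative meioses.
print(estimate_theta(2, 4))
```

An estimate of 1/2 is what unlinked loci would give, so these four meioses offer no support for linkage.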
NON PARAMETRIC METHOD
Count the number of alleles
two affected sibs share
identical by descent (IBD).
“If the marker is linked to the disease locus, the affected sibs will tend to share the disease allele more often than they would at a marker unlinked to the disease locus.”
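The allele-sharing idea can be illustrated with a small count-based summary. The sketch assumes that IBD status is known for every affected sib pair, and uses the standard expectation that, at an unlinked marker, sib pairs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4. The function name and the counts are hypothetical; the chi-square statistic would be compared against a distribution with 2 degrees of freedom.

```python
def ibd_sharing_summary(n_share0: int, n_share1: int, n_share2: int):
    """Return (mean proportion of alleles shared IBD, chi-square vs the 1:2:1 expectation)."""
    observed = (n_share0, n_share1, n_share2)
    total = sum(observed)
    if total == 0:
        raise ValueError("need at least one affected sib pair")
    # Each pair can share at most 2 alleles, so divide by 2 * number of pairs.
    mean_sharing = (n_share1 + 2 * n_share2) / (2 * total)
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return mean_sharing, chi_square


# Hypothetical counts for 100 affected sib pairs at one marker.
sharing, chi2 = ibd_sharing_summary(10, 50, 40)
print(f"mean IBD sharing = {sharing:.2f} (0.50 expected if unlinked), chi-square = {chi2:.1f}")
```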
Statistical significance of Lod Scores:
- Z > 3.0: EVIDENCE OF LINKAGE (with 5% chance of error)
- 2.0 < Z < 3.0: SUGGESTIVE EVIDENCE OF LINKAGE
- -2.0 < Z < 2.0: UNINFORMATIVE LINKAGE ANALYSIS
- Z < -2.0: EXCLUSION OF LINKAGE
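These thresholds amount to a small lookup. In the sketch below the function name is hypothetical, and because the card uses strict inequalities, the handling of the exact boundary values (3.0, 2.0 and -2.0) is an assumption: each boundary falls into the lower category.

```python
def interpret_lod(z: float) -> str:
    """Map a lod score to the card's categories (boundary handling is an assumption)."""
    if z > 3.0:
        return "evidence of linkage"
    if z > 2.0:
        return "suggestive evidence of linkage"
    if z > -2.0:
        return "uninformative linkage analysis"
    return "exclusion of linkage"


for z in (3.4, 2.5, 0.7, -2.6):  # hypothetical lod scores
    print(z, "->", interpret_lod(z))
```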
Lod Scores: 4
- Human families only produce small numbers of children.
- To get statistically significant evidence for linkage, combine evidence from many families
- A complex mathematical procedure, implemented by computer software, is used to generate “Lod scores”.
- Lod score is a statistic that describes the STRENGTH OF EVIDENCE for linkage, at any chosen value of the RF, given the family data available.
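Combining families works because lod scores from independent families can be added at the same value of θ. The sketch below shows this for the simplified phase-known case. The family counts, function names and the 0.01-step grid of θ values are all hypothetical, and real analyses rely on dedicated linkage software.

```python
import math


def family_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Lod score for one family with phase-known meioses, for 0 < theta <= 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be greater than 0 and at most 0.5")
    non_recombinants = meioses - recombinants
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
            - meioses * math.log10(0.5))


def combined_lod(families, theta: float) -> float:
    """Sum the per-family lod scores at one value of theta."""
    return sum(family_lod(r, n, theta) for r, n in families)


def best_theta(families, grid=None) -> float:
    """Scan a grid of theta values and return the one with the highest combined lod."""
    if grid is None:
        grid = [i / 100 for i in range(1, 51)]  # 0.01, 0.02, ..., 0.50
    return max(grid, key=lambda t: combined_lod(families, t))


# Hypothetical families as (recombinants, informative meioses).
families = [(0, 4), (1, 6), (0, 5)]
theta_hat = best_theta(families)
print(f"best theta = {theta_hat:.2f}, combined lod = {combined_lod(families, theta_hat):.2f}")
for r, n in families:
    print(f"  family with {r}/{n} recombinants: lod = {family_lod(r, n, theta_hat):.2f}")
```

Each small family contributes only a modest score on its own, which is the card's point about needing many families.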
The task of linkage analysis is to:
find markers that are linked to the hypothetical disease locus
The task of linkage analysis is to find markers that are linked to the hypothetical disease locus. HOW? = 7
- Determine the approximate location of disease-predisposing genes
- Linkage is caused by loci (e.g. the risk gene and a genetic marker) being close to each other on a chromosome.
- The recombination fraction is the probability that, in any meiosis, there will be a recombination between two loci.
- θ = 0.5: independent Mendelian segregation, no linkage (free recombination).
- θ = 0.0: very tight linkage, no recombination.
- Great success in identifying genes for simple Mendelian diseases
- Few successes in identifying genes contributing to complex diseases