Gene Identification Flashcards
General goal of linkage analysis
Identify marker alleles or haplotypes that co-segregate with a disease phenotype
How linkage is detected
Determining the amount of recombination between two markers
The rarer recombination between two loci is, the closer together they are on a chromosome
Do linked loci segregate independently during meiosis?
No- they are inherited together
Genetic markers definition
DNA variants in a given population or pedigree that are linked to a given disease or trait
Do genetic markers associated with a trait generally cause that trait?
No- they generally are close to the causal genes, thus inherited together with them
Two major types of genetic markers
Variable length polymorphisms and SNPs
Variable length polymorphisms
Type of genetic marker
Characterized by long stretches of repeats
Different individuals have varying lengths of these repeats
Single nucleotide polymorphisms (SNPs)
Type of genetic marker
Characterized by 1 base difference between individuals (2 different allele possibilities at a locus: some individuals carry one, some carry another)
How SNP genotyping works
Used for already discovered SNPs
Different colored probes hybridize to different nucleotides (ex- homozygous A allele is one color, homozygous B allele is a second color, and heterozygous is a third color)
Hybridized DNA is shown on a plate: every SNP shows up as a dot on the plate
Can show up to several million SNPs on one plate
Each plate is one individual: can have multiple plates on a large plate
Genetic distance
How often two markers are separated by recombination
Physical distance
Related to genetic distance
How many bases there are in between two markers
What “1 centiMorgan” refers to
1% chance of recombination between two loci per generation
Is recombination uniformly distributed across the genome?
No- recombination resides primarily in “hot spots” (areas of higher recombination)
Recombination fraction
Designated as Greek letter theta
Measures degree of linkage: fraction of times two markers are separated by recombination in a pedigree
Ranges from 0 to 0.5 (in meiosis, alleles at two unlinked loci are still inherited together half the time by chance)
What recombination fraction value (theta) is associated with linkage of two traits?
Two loci are said to be linked when theta is less than 0.5
What does a recombination fraction value (theta) of 0.5 mean?
Loci are segregating independently- no linkage
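Worked example (a minimal sketch, not a full pedigree analysis): theta can be estimated as the share of informative meioses in which the two markers were separated. The function names and the counts below are hypothetical, chosen only for illustration.

```python
def recombination_fraction(recombinants: int, total_meioses: int) -> float:
    """Estimate theta as recombinant meioses divided by all informative meioses."""
    if total_meioses <= 0:
        raise ValueError("total_meioses must be positive")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must be between 0 and total_meioses")
    # Note: a small sample can give an estimate above 0.5 by chance,
    # even though the true value never exceeds 0.5.
    return recombinants / total_meioses


def is_linked(theta: float) -> bool:
    """Loci are considered linked when theta is below 0.5."""
    return theta < 0.5


# Hypothetical counts: 2 recombinant meioses out of 20 observed.
theta = recombination_fraction(2, 20)
print(theta, is_linked(theta))
```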
Log of the odds (LOD) score
Used to measure statistical significance of linkage
Higher score: linkage more likely
What LOD score is considered to be significant?
Greater than or equal to 3
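Worked example (a sketch under simplifying assumptions): the LOD score is the base-10 log of the likelihood of the data at a given theta divided by the likelihood under no linkage (theta = 0.5). The version below assumes every meiosis can be scored unambiguously as recombinant or non-recombinant, which real pedigrees often do not allow. The function name and counts are hypothetical.

```python
import math


def lod_score(recombinants: int, total_meioses: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses (assumed)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must be between 0 and total_meioses")
    non_recombinants = total_meioses - recombinants
    # Log-likelihood if the loci recombine with probability theta.
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    # Log-likelihood under independent segregation (theta = 0.5).
    log_l_null = total_meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants in 20 meioses, evaluated at theta = 0.1.
score = lod_score(2, 20, 0.1)
print(round(score, 2), score >= 3)
```

A LOD of 3 corresponds to odds of 1000 to 1 in favor of linkage, since log10(1000) = 3.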
Weaknesses of linkage analysis
Requires studying relatives (cannot use unrelated individuals)
Works best for rare, dominant traits (not useful for common traits)
Linkage regions are large and finding “the gene” is hard
Do rare variants with large effects (as identified from pedigrees) underlie common genetic diseases?
No
Common disease/common variant hypothesis
There are many variants with modest effects at higher frequency in the population that lead to common disease
Significance- association analysis
Measures whether or not an association between a given SNP and disease is real
Effect size- association analysis
Measures the strength of variant effect
Power- association analysis
Measures the likelihood of finding a real association
What are p values used to measure?
Statistical significance
P value definition
Probability of observing an association at least as strong as the one seen, if there is no real association (null hypothesis)
What p value indicates that an association is statistically significant?
p < 0.05
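Worked example (one possible test, assumed here for illustration): a p value for a single SNP can be obtained from a 2x2 table with a Pearson chi-square test on 1 degree of freedom, without continuity correction. Here a = cases with allele, b = controls with allele, c = cases without allele, d = controls without allele. The function name and counts are hypothetical.

```python
import math


def chi_square_p_value(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (chi-square statistic, p value) for a 2x2 table, 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts.
chi2, p = chi_square_p_value(60, 40, 40, 60)
print(chi2, p, p < 0.05)
```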
Genome-wide association studies (GWAS)
Testing millions of SNPs across the human genome to discover which genetic variations are associated with a given trait/disease
Which populations GWAS tests
Population with given trait/disease
Population without given trait/disease
Bonferroni correction: what it does and how it is calculated
Used to decrease the number of false positives when running a large number of statistical tests (e.g., one test per SNP)
Calculated by dividing 0.05 by number of tests
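Worked example (a minimal sketch): the corrected threshold is just the starting significance level divided by the number of tests. The function name and the figure of one million SNPs are illustrative assumptions.

```python
def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if num_tests <= 0:
        raise ValueError("num_tests must be positive")
    return alpha / num_tests


# Hypothetical GWAS testing one million SNPs: 0.05 / 1,000,000.
threshold = bonferroni_threshold(1_000_000)
print(threshold)
```

A SNP would then count as significant only if its p value falls below this stricter threshold rather than below 0.05.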
2 ways to quantify effect size
Odds ratio
Relative risk
When is odds ratio used?
Case-control designs
When is relative risk used?
Cohorts and general population samples
Odds ratio formula
(a/b) divided by (c/d)
a: cases with allele
b: controls with allele
c: cases without allele
d: controls without allele
Relative risk formula
a/(a+b) divided by c/(c+d)
a: cases with allele
b: controls with allele
c: cases without allele
d: controls without allele
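Worked example (a minimal sketch): both formulas applied to the same 2x2 table, using the a, b, c, d labels from the cards above. The function names and counts are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """(a/b) divided by (c/d): odds of disease with the allele vs. without it."""
    return (a / b) / (c / d)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """a/(a+b) divided by c/(c+d): risk with the allele vs. without it."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: a=60, b=40, c=40, d=60.
print(odds_ratio(60, 40, 40, 60))
print(relative_risk(60, 40, 40, 60))
```

The two measures differ on the same table, which is one reason the study design determines which one is reported.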
Interpretation of odds ratio
Used to determine how likely an individual with the risk allele is to have the specific phenotype
Bigger odds ratio: greater likelihood of having disease
Odds ratio of 2 means that the odds of disease are twice as high for a person who inherits the risk allele (roughly twice as likely to develop disease when the disease is rare)
Interpretation of relative risk
Relative risk = 1: no difference in risk between experimental and control group
Relative risk <1: event is less likely to occur in experimental group
Relative risk >1: event is more likely to occur in experimental group
How to calculate absolute risk
Multiply population risk by relative risk
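Worked example (a minimal sketch): absolute risk as the product described above. The function name and the numbers are hypothetical.

```python
def absolute_risk(population_risk: float, relative_risk: float) -> float:
    """Absolute risk = population (baseline) risk multiplied by relative risk."""
    if not 0 <= population_risk <= 1:
        raise ValueError("population_risk must be a probability between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk cannot be negative")
    return population_risk * relative_risk


# Hypothetical: 1% population risk and a relative risk of 2.
print(absolute_risk(0.01, 2.0))
```

A doubled relative risk can still mean a small absolute risk when the population risk is low.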
What is GWAS useful for finding?
Genetic variants contributing to common complex diseases, where many common SNPs each have a small effect size; it can also find rarer SNPs with large effect sizes in less common conditions
How it is possible to see the effects of all the SNPs that were not directly genotyped in a GWAS
Not all theoretically possible haplotypes (combinations of alleles along a chromosome) actually exist, so the haplotype of a given individual, including alleles at SNPs that were not genotyped, can be inferred from the SNPs that were
Linkage disequilibrium
Non-random association of two or more alleles (in GWAS, marker allele and disease-causing allele)
How linkage disequilibrium is measured
Correlation coefficient (r^2 value): value ranges between 0 and 1
0: complete equilibrium (random segregation) of two alleles
1: complete linkage disequilibrium
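Worked example (a sketch using the standard two-locus definition): r^2 can be computed from the frequency of the haplotype carrying both alleles and the two individual allele frequencies. The function name, parameter names, and frequencies are hypothetical.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between allele A at one locus and allele B at another.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    # D is how far the observed haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    return d * d / denom


# Alleles combine at random: haplotype frequency equals p_a * p_b.
print(ld_r_squared(0.25, 0.5, 0.5))
# A and B always travel together on the same haplotype.
print(ld_r_squared(0.5, 0.5, 0.5))
```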
Do larger or smaller sample sizes yield higher power?
Larger sample sizes
Do larger or smaller effects yield higher power?
Larger effects
Is power higher for SNPs with higher or lower allele frequency?
Higher allele frequency
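Worked example (a rough approximation, not a full power calculator): the three trends above can be checked with a normal-approximation power formula for comparing allele frequencies between cases and controls. The assumptions here are a two-sided test, two alleles counted per person, and a per-allele odds ratio. The function name and all inputs are hypothetical.

```python
from statistics import NormalDist


def approx_power(n_cases: int, n_controls: int, control_freq: float,
                 odds_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of an allele-frequency comparison (normal approximation)."""
    # Convert the control allele frequency and odds ratio into a case allele frequency.
    control_odds = control_freq / (1 - control_freq)
    case_odds = odds_ratio * control_odds
    case_freq = case_odds / (1 + case_odds)
    # Standard error of the frequency difference; each person contributes 2 alleles.
    se = (case_freq * (1 - case_freq) / (2 * n_cases)
          + control_freq * (1 - control_freq) / (2 * n_controls)) ** 0.5
    z_effect = abs(case_freq - control_freq) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical comparisons, each changing one input at a time.
print(approx_power(1000, 1000, 0.30, 1.3))   # baseline
print(approx_power(5000, 5000, 0.30, 1.3))   # larger sample
print(approx_power(1000, 1000, 0.30, 1.5))   # larger effect
print(approx_power(1000, 1000, 0.05, 1.3))   # lower allele frequency
```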
3 outcomes of low powered studies (i.e. small sample size)
True effects can be missed
Effect estimates can be less precise (even in the wrong direction)
Some “detected” effects can be false positives