How to handle the data from studies of complex disease Flashcards
What does parametric linkage analysis determine?
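The approximate genomic location of genetic determinants of disease, inferred from how marker alleles co-segregate with disease in families.
How is a parametric linkage analysis set up?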
Ascertain (a small set of) large families (pedigrees) each containing a number of affected individuals
Use a genotyping technique to measure the alleles (genotype) at one or more loci, in as many individuals as are available
Examine the co-segregation (co-transmission) of disease phenotype and alleles at the genetic marker loci
What is genetic distance measured in?
Morgans (M) or centimorgans (cM)
What is the connection between Morgans and recombination?
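Genetic distance is defined through recombination: for short distances, 1 cM corresponds to about a 1% chance of recombination between two loci per meiosis. The recombination rate between alleles at two loci is in turn closely related to the physical distance between them.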
What symbol represents the probability of recombination between loci?
θ (Theta)
What are the ranges of θ?
0 to 0.5
What is the value of θ when the loci lie close together?
θ is small (≈0) and the loci are said to be tightly linked (completely linked when θ = 0).
What is the value of θ when the loci are further apart?
θ approaches 0.5
Loci are said to be unlinked (alleles at the two loci are transmitted independently)
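A minimal sketch of why θ never exceeds 0.5, however far apart the loci are. It assumes Haldane's map function (no crossover interference), which is not named in these cards; the function name `haldane_theta` is made up for illustration.

```python
import math

def haldane_theta(distance_cm: float) -> float:
    """Recombination fraction for a genetic distance given in centimorgans.

    Assumption: Haldane's map function, theta = 0.5 * (1 - exp(-2d)),
    with d measured in Morgans.
    """
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

# Short distances: theta is close to d. Long distances: theta approaches 0.5.
for cm in (1, 10, 50, 200):
    print(f"{cm:>4} cM -> theta = {haldane_theta(cm):.4f}")
```

For 1 cM this gives θ ≈ 0.0099, which is the "1 cM ≈ 1% recombination" rule of thumb; at 200 cM θ is already very close to 0.5 (unlinked).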
What is the Likelihood ratio test?
A test that compares the likelihood of the observed genotype and phenotype data in a set of families under two hypotheses (linkage versus no linkage). The likelihoods are calculated by a computer program.
What does the likelihood ratio depend on?
How well the observations match the assumed model
What is a LOD score?
The log base 10 of the likelihood ratio from a likelihood ratio test for linkage (LOD stands for "logarithm of the odds").
What does the LOD score test for?
Tests the null hypothesis that the disease locus lies far away from the genotyped marker locus.
What is the null hypothesis in a LOD score test?
θ = 0.5 (unlinked)
How is the likelihood ratio calculated in parametric linkage analysis?
LRmax = L(θ̂) / L(0.5)
What is L(θ̂)?
The likelihood evaluated at θ̂, where θ̂ is the value of θ that maximises the likelihood (makes the data ‘most likely’ to have occurred).
How is the LOD score calculated from the likelihood ratio?
It is the log base 10 of the likelihood ratio: LOD = log10(LRmax).
What is considered a “Convincing” LOD score as evidence for linkage?
3
Why is 3 a “Convincing” LOD score?
Corresponds to a likelihood ratio of 1000
The data are 1000 times more likely under the alternative hypothesis (linkage) than under the null hypothesis (no linkage).
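A small worked sketch of the likelihood ratio and LOD score. It assumes the simplest textbook case, which these cards do not spell out: phase-known, fully informative meioses, so that L(θ) = θ^k (1 − θ)^(n − k) for k recombinants in n meioses. Real pedigree likelihoods are more involved and are computed by linkage software; the function names are illustrative.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD(theta) = log10( L(theta) / L(0.5) ) for phase-known meioses."""
    k, n = recombinants, meioses
    if k > 0 and theta == 0.0:
        return float("-inf")  # recombinants are impossible when theta = 0
    log_l_theta = 0.0
    if k > 0:
        log_l_theta += k * math.log10(theta)
    if n - k > 0:
        log_l_theta += (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # likelihood under theta = 0.5 (unlinked)
    return log_l_theta - log_l_null

def max_lod(recombinants: int, meioses: int):
    """Return (theta_hat, LOD at theta_hat); theta_hat is capped at 0.5."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)

theta_hat, lod = max_lod(recombinants=0, meioses=10)
print(f"theta_hat = {theta_hat}, LOD = {lod:.2f}")  # theta_hat = 0.0, LOD = 3.01
```

Under this simplified model, ten meioses with no recombinants are just enough to reach the conventional threshold of 3, because 2^10 = 1024 ≈ 1000.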
How do we find the max LOD score?
Multipoint analysis.
We calculate the likelihood (or likelihood ratio) at different values of θ.
How is multipoint analysis carried out in theory?
Use a set of marker loci whose genetic map positions are known, and assess the evidence for the disease locus lying at different positions along the genetic map.
What does the LOD score at each position in a multipoint analysis correspond to?
The log base 10 of the ratio: the likelihood of the data assuming the disease locus lies at that position, divided by the likelihood of the data assuming the disease locus lies far away (unlinked).
How is multipoint analysis carried out in practice?
Computer program.
Which programs carry out multipoint analysis?
Merlin (smallish pedigrees, exact calculation)
SIMWALK or MORGAN (larger pedigrees, approximate calculation)
What happens once you have your LOD score graph?
You narrow the candidate region around the LOD score peak, step by step, until you can pinpoint the disease locus.
What happens when a disease is genetically heterogeneous?
Only a proportion (α) of families is assumed to show linkage.
What is the HLOD score?
The heterogeneity LOD score, used when a disease is genetically heterogeneous: α is estimated along with θ by maximum likelihood.
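A hedged sketch of how an HLOD can be computed from per-family LOD scores. It assumes the standard admixture form, HLOD(α, θ) = Σ log10(α · 10^LOD_i(θ) + 1 − α), which these cards do not state, and it maximises over α only, at one fixed θ (a full analysis would also vary θ). The per-family LOD values are invented for illustration.

```python
import math

def hlod(family_lods, alpha: float) -> float:
    """Heterogeneity LOD at one theta, given each family's LOD at that theta.

    Assumes finite per-family LOD scores.
    """
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
               for lod in family_lods)

def max_hlod(family_lods, steps: int = 100):
    """Grid search over alpha in [0, 1]; returns (alpha_hat, HLOD)."""
    score, alpha_hat = max((hlod(family_lods, i / steps), i / steps)
                           for i in range(steps + 1))
    return alpha_hat, score

# Hypothetical data: two families support linkage, two argue against it.
lods = [1.2, 0.8, -0.9, -1.1]
alpha_hat, score = max_hlod(lods)
print(f"alpha_hat = {alpha_hat:.2f}, HLOD = {score:.2f}")
```

With α = 1 the HLOD reduces to the ordinary summed LOD (here 0), and with α = 0 it is exactly 0, so the maximised HLOD is never negative.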
How successful have parametric linkage analysis studies been for monogenic disease?
Highly successful
How successful have parametric linkage analysis studies been for complex disease?
Less successful
What is the purpose of non-parametric linkage analysis?
Tries to determine whether members of a family with “similar” trait values tend to share genetic material in common from their common ancestors.
What are the aims of association studies?
Directly examine the association (correlation) between alleles present at a genetic locus and a phenotype of interest.
What is the most popular type of association study?
Case/control study (unrelated individuals)
How are association studies set up?
Collect sample of affected individuals (cases) and unaffected individuals (controls)
Examine the correlation between alleles present at a genetic locus and presence/absence of disease by comparing the distribution of genotypes in affected individuals with that seen in controls.
Why is parametric linkage analysis more difficult for genetically heterogeneous diseases?
We cannot assume that all families share the same genetic cause, and therefore the same disease locus.
How to test for association (correlation) between genotype and presence/absence of disease when doing case/control studies?
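Using a standard χ² test for independence on 2 df (a 2 × 3 table: case/control status by the three genotypes).
What is the χ² test for independence?
χ² = Σ (Observed − Expected)² / Expected, summed over all cells of the table; the statistic is then converted to a p value.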
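The χ² calculation can be sketched on a 2 × 3 genotype table as follows. The counts are invented, the function names are illustrative, and the p value uses the fact that for exactly 2 df the χ² tail probability equals exp(−χ²/2). The sketch assumes no row or column total is zero.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and disease status were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_2df(stat: float) -> float:
    """Upper-tail p value of a chi-square statistic on exactly 2 df."""
    return math.exp(-stat / 2.0)

# Hypothetical genotype counts (AA, AB, BB)
cases = [30, 50, 20]
controls = [50, 40, 10]
stat, df = chi_square_independence([cases, controls])
print(f"chi2 = {stat:.2f} on {df} df, p = {p_value_2df(stat):.4f}")
# chi2 = 9.44 on 2 df, p = 0.0089
```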
What are more sophisticated ways to perform an association test?
Rearrange your data to test specifically for dominant or recessive effects.
Use linear regression for quantitative outcomes
Use an x variable defined according to genotype
What is the null hypothesis for linear regression of an association test?
Slope = 0
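A minimal sketch of the regression approach for a quantitative outcome. It assumes an additive coding in which x is the number of copies of one allele (0, 1 or 2); the cards only say that x is "defined according to genotype". The data and function name are invented for illustration.

```python
import math

def genotype_regression(genotypes, trait):
    """Least-squares fit of trait = intercept + slope * genotype.

    Returns (slope, intercept, t statistic for H0: slope = 0).
    Assumes at least 3 observations, some variation in genotype,
    and a fit that is not perfect (non-zero residuals).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, trait))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

# Hypothetical data: allele count per person and a quantitative trait
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 1.9, 2.1, 3.1, 2.9]
slope, intercept, t_stat = genotype_regression(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, t = {t_stat:.1f}")
```

A slope far from 0 relative to its standard error (a large |t|, compared against a t distribution on n − 2 df) is evidence against the null hypothesis of no association.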
What are FBATs?
Family-based association tests
What is the TDT?
The transmission disequilibrium test.
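The cards do not describe how the TDT is computed, so the following is only a sketch of the standard form of the test: among heterozygous parents of affected children, count how often the allele of interest is transmitted (b) versus not transmitted (c), and compare (b − c)² / (b + c) with a χ² distribution on 1 df. The counts and the function name are invented.

```python
import math

def tdt(transmitted: int, untransmitted: int):
    """Return (TDT chi-square statistic, p value on 1 df).

    Counts come from heterozygous parents of affected offspring.
    """
    b, c = transmitted, untransmitted
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square on 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi2 = {stat:.1f}, p = {p:.4f}")  # TDT chi2 = 4.0, p = 0.0455
```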
What are LMMs?
Linear mixed models
How do you analyse family-based data?
Use family-based association tests (FBATs).
What kinds of family-based association tests (FBATs) are there?
The transmission disequilibrium test (TDT)
Linear mixed models (LMMs)
What software is used to analyse GWAS data?
PLINK, SNPTEST, GCTA
Why are stringent significance levels required in GWAS?
To overcome the multiple testing problem incurred when we test many SNPs throughout the genome.
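A short illustration of the multiple testing problem. It assumes a Bonferroni correction and one million independent tests; neither figure comes from these cards, although the resulting 5 × 10⁻⁸ is a commonly quoted genome-wide significance threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

n_snps = 1_000_000  # assumed number of independent tests
print(f"Expected false positives at p < 0.05: {0.05 * n_snps:.0f}")
print(f"Bonferroni threshold: {bonferroni_threshold(0.05, n_snps):.0e}")  # 5e-08
```

Without any correction, about 50,000 SNPs would pass p < 0.05 by chance alone in this setting.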
What quality control is required when using GWAS?
Discard samples (people) deemed unreliable
Discard data from SNPs deemed unreliable
What could make a sample be deemed unreliable?
Low genotype call rates (unsuccessful genotyping)
Excess heterozygosity (suggesting a mix of samples, i.e. contamination)
Gender or ethnicity that is inconsistent with the recorded information
What could make a SNP be deemed unreliable?
Low genotype call rates
Mendelian misinheritances (in family data)
Hardy-Weinberg disequilibrium
Low minor allele frequency (MAF): such SNPs are usually excluded because too few carriers are observed to compare cases with controls reliably.
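The SNP-level checks can be sketched from genotype counts alone. The thresholds below (call rate 0.95, MAF 0.01, Hardy-Weinberg p value 1e-6) are illustrative assumptions, not values from these cards, and the function names are made up. The Hardy-Weinberg check uses a simple 1 df χ² test; in practice such tests are usually run on controls and via GWAS software.

```python
import math

def snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> dict:
    """Call rate, minor allele frequency and HWE chi-square p value for one SNP."""
    called = n_aa + n_ab + n_bb
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)  # frequency of allele A
    maf = min(p, 1.0 - p)
    expected = (called * p * p, 2 * called * p * (1 - p), called * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expected count (monomorphic SNP)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 df
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}

def passes_qc(stats: dict, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6) -> bool:
    """Apply the (assumed) thresholds to the statistics from snp_qc."""
    return (stats["call_rate"] >= min_call_rate
            and stats["maf"] >= min_maf
            and stats["hwe_p"] >= min_hwe_p)

good = snp_qc(n_aa=360, n_ab=480, n_bb=160, n_missing=0)   # matches HWE proportions
bad = snp_qc(n_aa=500, n_ab=0, n_bb=500, n_missing=0)      # no heterozygotes at all
print(passes_qc(good), passes_qc(bad))  # True False
```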