association and linkage Flashcards
identifying human disease genes
locate the genetic variants presumed to be biologically causal for a disease
genetic variants in dna sequences (insertions, deletions, rflps + snps) may have a direct impact on disease and phenotypic differences (direct association)
genetic variations in dna sequences may be indirectly associated- the allele itself is not involved but a nearby correlated variant affects the phenotype
how genetic variation is maintained and studied
mutations/ independent assortment and recombination/crossing over cause variation and evolution
agents of evolutionary change- mutation, non random mating, gene flow, finite population size (genetic drift) + natural selection
genetic variation
can be direct or indirect
direct means that the associated genetic variation is functional, thought to be affecting a biological mechanism and causing the phenotype
indirect associations occur when the allele itself is not involved but a nearby correlated variant affects the phenotype
allele frequency, populations and gene pools
reproduction and evolution
population= localised group of interbreeding individuals which produce fertile offspring
gene pool is collection of alleles in the population
allele frequency = how common that allele is in a population (copies of the allele of interest / total no. of copies of all alleles at that locus in the population)
a locus is fixed if all individuals in a population are homozygous for the same allele
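a minimal python sketch of the allele frequency calculation above, assuming a biallelic locus (alleles A and a) in diploid individuals. the function names and example counts are illustrative and not from the notes.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A from diploid genotype counts at one locus."""
    total_copies = 2 * (n_AA + n_Aa + n_aa)  # each individual carries two copies
    if total_copies == 0:
        raise ValueError("no individuals genotyped")
    copies_of_A = 2 * n_AA + n_Aa  # homozygotes carry two A, heterozygotes one
    return copies_of_A / total_copies


def is_fixed(n_AA, n_Aa, n_aa):
    """A locus is fixed if every individual is homozygous for the same allele."""
    freq = allele_frequency(n_AA, n_Aa, n_aa)
    return freq == 0.0 or freq == 1.0


# hypothetical sample: 30 AA, 40 Aa, 30 aa
print(allele_frequency(30, 40, 30))  # 0.5
print(is_fixed(50, 0, 0))            # True
```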
common dna polymorphisms
SNPs (1bp)- allele example A/G
repeating elements- STR- 2-13bp
interspersed polymorphisms (insertion + deletions, indels eg Alu)- allele example I/D , +/-
dna polymorphisms are analysed by changes in the nucleotide sequence or size- allelic identity
spectrum of disease allele effects
disease associations are often conceptualised in two dimensions: allele frequency and effect size
highly penetrant alleles for mendelian disorders are extremely rare with large effect size, but most gwas findings are associations of common SNPs with small effect sizes
gene mapping methods
linkage analysis- follows meiotic events through families for co-segregation of disease and particular genetic variants- based on recombinant frequency
association analysis - detects association between genetic variants and disease across families - based on linkage disequilibrium
linkage studies
aim to identify a marker that co-segregates with the gene of interest so can be used to track gene within a family without actually knowing the mutation
use the inheritance of markers within families to identify chromosomal regions where disease genes may lie
pedigree analysis
can look for mutations within pedigrees so can see which allele it is linked to
mendels 2nd law
segregation of alleles for one gene occurs independently of that of any other gene
alleles of different genes are inherited independently of each other
but this does not always hold: it is violated when genes are linked (close together on the same chromosome)
linkage analysis
key to linkage analysis: the smaller the amount of recombination observed between genes, ie the more tightly linked they are, the closer together we can infer that they lie on a chromosome
goal is to place genetic markers along chromosomes, order them and assign genetic map distance
genetic markers are sequences of dna whose function may be unknown but which are easily recognisable as landmarks
recombination fraction
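recom fraction = recombinants / total offspring (x100 gives the recombination frequency as a %)
recom fraction θ (theta) between 2 loci is the proportion of offspring (gametes) that are recombinant for the 2 loci
θ is a non-linear function of the physical distance separating the loci on the chromosome
θ = 0: complete linkage (no recombination observed)
θ < 0.5: linkage, with some recombination
θ = 0.5: no linkage (independent assortment)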
2 loci are linked if the RF is less than 0.5, loci are not inherited independently
recombination is useful as it can be used to build linkage maps, chromosomal maps
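a small python sketch of the recombination fraction calculation, assuming recombinant and non-recombinant offspring can be counted directly (phase known). the function names and the example cross are illustrative only.

```python
def recombination_fraction(recombinants, total_offspring):
    """Recombination fraction (theta) as a proportion between 0 and 1."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must lie between 0 and total_offspring")
    return recombinants / total_offspring


def describe_linkage(theta):
    """Interpret theta: 0 is complete linkage, 0.5 or more is no linkage."""
    if theta == 0:
        return "complete linkage"
    if theta < 0.5:
        return "linked"
    return "unlinked"


# hypothetical cross: 12 recombinant offspring out of 100
theta = recombination_fraction(12, 100)
print(theta, describe_linkage(theta))
```

in practice an observed fraction near 0.5 needs a statistical test (such as a lod score) before linkage is accepted or rejected; the cut-offs here only restate the definitions.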
features of linkage analysis
must have family data with multiple affected individuals
uses relatively few markers(400-800) for whole genome analysis
successful for mendelian disorders, less so for complex
can find potential disease loci located far from marker
gene mapping
these methods use recombinant frequencies between alleles to determine relative distances between them
recombinant frequencies between genes are approximately proportional to their distance apart (over short distances)
gene mapping determines the order of genes and relative distances between them in map units (centimorgans, cM); 1 cM indicates a 1% chance that 2 genes are separated by crossing over
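the 1 cM = 1% rule only holds over short distances, because double crossovers hide some recombination. one common way to convert between map distance and recombination fraction is Haldane's map function; it is not named in these notes and assumes no crossover interference, so treat this as an illustrative sketch.

```python
import math


def haldane_theta(distance_cM):
    """Recombination fraction expected for a map distance (Haldane, no interference)."""
    d = distance_cM / 100.0  # convert centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_distance_cM(theta):
    """Map distance in cM for a recombination fraction below 0.5."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * theta)


# short distances: theta is close to cM / 100
print(haldane_theta(1))
# long distances: theta levels off towards 0.5
print(haldane_theta(200))
```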
lod score calculation
lod score is a statistical estimate of whether 2 genes, or a gene and a disease gene, are likely to be located near each other on a chromosome and so will be inherited together
computes values of likelihood function under null and alternative hypotheses
lod scores are the log10 of the ratio between 2 odds
ratio of odds (z) = likelihood of data given linkage / likelihood of data given no linkage
lod score of +3 or more indicates linkage (odds of 1000:1 in favour)
lod score of -2 or less indicates no linkage
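a hedged python sketch of the lod score for the simplest case: phase-known, fully informative meioses, where the likelihood under linkage is θ^r (1-θ)^(n-r) and under no linkage is 0.5^n. real pedigree software handles unknown phase, missing data and penetrance; the function name and example counts are illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(data | theta) / L(data | theta = 0.5) for phase-known meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    log_linked = 0.0
    if recombinants > 0:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants > 0:
        log_linked += non_recombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# hypothetical family data: 10 informative meioses, no recombinants
print(lod_score(0, 10, 0.0))  # about 3.01, just over the +3 threshold
# scan a few theta values for 2 recombinants in 20 meioses
for theta in (0.05, 0.1, 0.2, 0.3):
    print(theta, round(lod_score(2, 20, theta), 2))
```

the θ giving the highest lod score is taken as the best estimate of the recombination fraction.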
linkage procedure
decide if linkage analysis is reasonable
collect appropriate families
measure phenotypes and demographic data - family relationships to build pedigree
genotype markers- at strategic intervals across genome, at locus containing a candidate region
run computer analysis for lod score calculation
linkage results
approximate location of disease gene
placement of disease gene relative to multiple other loci
exclusion of a genome region
many diseases mapped using this- CF,HD, DMD etc
complication factors
- reduced penetrance - not all with risk allele will develop disease
- phenocopies - some without risk allele will develop disease
- gene - gene interaction - homozygosity at another gene may be required
- gene - environment interaction
genotypes
the use of genotype information can be limited
in large sequencing projects, genotypes rather than haplotypes are collected due to cost considerations
genotypes only tell us the alleles at each individual locus; we don't know how alleles at different loci are connected (phase)
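to illustrate why unphased genotypes are ambiguous, this python sketch lists every pair of haplotypes compatible with a multi-locus genotype. the function name and the two-SNP example are hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """All haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele1, allele2) tuples, one per locus.
    """
    pairs = set()
    # at each locus, choose which allele sits on the first chromosome
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(locus[c] for locus, c in zip(genotype, choice))
        hap2 = tuple(locus[1 - c] for locus, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))  # order of the two chromosomes is irrelevant
    return pairs


# heterozygous at both SNPs: A/G at locus 1, C/T at locus 2
for pair in sorted(compatible_haplotype_pairs([("A", "G"), ("C", "T")])):
    print(pair)
```

a double heterozygote has two possible phasings (AC + GT or AT + GC), and the genotype alone cannot tell them apart.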
haplotype
set of dna variations that are usually inherited together
group of genes, genetic regions or markers within an organism that were inherited together from a single parent - combinations of alleles at different loci which segregate together
is only one set of chromosomes rather than the entire genetic makeup (genotype)
used for association analysis, can tell association of different loci
deducing haplotype- molecular haplotype, genetic analysis, population inference
molecular haplotyping
begins by isolating single molecules or populations of identical molecules of DNA by cloning, molecular biology or physical manipulation
each molecule is then partially or completely sequenced
genetic analysis
infers haplotypes by applying the principles of genetic inheritance to data in the context of a pedigree
population inference
assigns haplotypes from a database to an individual's genome and then might infer haplotypes on the homologous chromosomes by exclusion
snps as markers
snps make good markers for haplotype analysis and disease association
due to LD (non-random association between alleles at different loci), it is not necessary to sequence and type all these SNPs
within high LD regions, allelic dependence yields redundancy among markers and improves the chances of establishing the approximate location of the disease mutation
genetic association studies
studies test for a correlation between disease status and genetic variation
look for an altered frequency of a SNP allele or haplotype in a series of individuals affected with a disease
SNPs are most widely used test markers
association studies are a major tool for identifying genes conferring susceptibility to complex disorders
association studies
detect association between genetic variants and disease- exploits linkage disequilibrium
a wide range of family-based association tests have been proposed in genetic studies; they require genotypes from affected individuals + their parents
linkage disequilibrium
aka allelic association
refers to the statistical association between pairs of genetic loci, used routinely in localising disease genes, detecting natural selection and studying population history
LD exists when 2 loci are linked and associations between variants exist
when a mutation arises in a population, high LD between the mutation and other variants on the same chromosome may occur
over generations, LD dissipates between mutations and loci far away via recombination
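a short python sketch of how LD between two biallelic loci can be quantified from haplotype and allele frequencies. the measures D, D' and r² are standard but are not named in these notes, and the function name and example frequencies are illustrative.

```python
def ld_measures(p_AB, p_A, p_B):
    """Return (D, D_prime, r_squared) for alleles A and B at two loci.

    p_AB: frequency of the haplotype carrying both A and B
    p_A, p_B: frequencies of allele A and allele B
    """
    d = p_AB - p_A * p_B  # zero when the alleles are independent
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# complete LD: allele A is always found with allele B
print(ld_measures(0.5, 0.5, 0.5))
# linkage equilibrium: haplotype frequency equals the product of allele frequencies
print(ld_measures(0.25, 0.5, 0.5))
```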
association analysis
LD is the basis of association analysis (AA)
two loci are associated if the alleles at one locus are not independent of the alleles at another locus
for AA we observe a trait and a marker locus (usually not the disease susceptibility locus, DSL, itself)
test association between marker and trait
null hyp is no association between marker and trait - rejection implies DSL is in LD with the marker
spurious association occurs when the 2 loci are not linked, eg an apparent association between 2 loci that are on different chromosomes
types of association
direct- mutant or susceptible polymorphism, allele of interest is involved in phenotype
indirect- allele itself is not involved but a nearby correlated marker changes phenotype
spurious- apparent association not related to genetic aetiology
causes of linkage disequilibrium
linkage
mutation
selection
inbreeding
genetic drift
gene flow
GWAS
genome wide association studies
way for scientists to identify inherited genetic variants associated with risk of disease or a particular trait
surveys the entire genome for genetic polymorphisms, typically SNPs, that occur more frequently in cases ie disease than in controls
conducting gwas
data collected
genotyping - using microarrays to capture common variants
quality control
imputation- genotypes can be phased and untyped genotypes imputed
association testing - run for each genetic variant, null hyp made
meta analysis - results from multiple smaller cohorts are combined using standardised statistical pipelines
replication - results can be replicated using internal or external replication
post gwas analysis - snp to gene mapping
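the association testing step can be sketched for a single SNP as a chi-square test on allele counts in cases vs controls. this is one simple choice of test (real GWAS usually use regression with covariates); the function name and counts are hypothetical.

```python
import math


def allelic_association_test(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table needs at least one allele")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # survival function of chi-square with 1 df, via the normal distribution
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return chi2, p_value, odds_ratio


# hypothetical SNP: allele counts in 50 cases and 50 controls
chi2, p, odds = allelic_association_test(60, 40, 40, 60)
print(chi2, p, odds)
```

in a genome-wide scan this test is repeated for every variant, so the p value threshold has to be far stricter than 0.05 to allow for multiple testing.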
manhattan plot
shows significance of each variant’s association with a phenotype
each dot represents a SNP, with SNPs ordered on the x axis according to their genomic position, y axis represents strength of their association
quantile-quantile (QQ) plot shows the distribution of expected p values under a null model of no association vs observed p values
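a sketch of the values behind both plots, assuming the y axis is -log10(p), which is the usual convention but is not stated in these notes. the expected p values use (rank - 0.5) / n, one common plotting position; the function names and p values are made up for illustration.

```python
import math


def manhattan_y(p_value):
    """Height of a SNP on a Manhattan plot: larger means stronger association."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


def qq_points(p_values):
    """(expected, observed) pairs of -log10(p), sorted from least to most significant."""
    observed = sorted(p_values, reverse=True)
    n = len(observed)
    # under the null of no association, p values are uniform on (0, 1)
    expected = sorted(((rank - 0.5) / n for rank in range(1, n + 1)), reverse=True)
    return [(manhattan_y(e), manhattan_y(o)) for e, o in zip(expected, observed)]


# hypothetical p values for five SNPs
for expected, observed in qq_points([0.8, 0.3, 0.04, 0.6, 1e-9]):
    print(round(expected, 2), round(observed, 2))
```

points that rise above the diagonal (observed greater than expected) suggest true associations, or inflation if nearly all points do so.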
association analysis and haplotypes
association methods based on LD offer a promising approach for detecting genetic variations that are responsible for complex human diseases
individual SNPs may lead to significant findings
methods based on haplotypes comprising multiple snps on the same inherited chromosome may provide additional power for mapping disease genes