Genetic Mapping Flashcards
what is mapping
to map a trait is to ID the sequence variation at a unique chromosomal location that causes the trait (single-gene disorder) or contributes to the trait (complex trait, ie multi-gene disorder)
why map
- improve health and well-being of animals (for heritable dx the genetic mutation is the ultimate cause; treat individual or population at the genetic level to reduce suffering/ death)
- cattle industry: increase meat/ milk production
- implications for humans (dogs and humans share ~400 dxs)
- in theory, if you know what the mutation is you can test for it and breed it out
when to map
mapping lets you test for the mutation before the phenotype manifests; you can design a genetic test for the mutation, which enables preventative (pre-breeding) strategies; you can use marker tests to select the best breeders; if it is a complex trait you can’t get rid of the problem but you can improve it
marker assisted selection
don’t have to wait for the phenotype to appear in the animal (dx may be late onset); lets you pick the best breeders
complicating factors of mapping
genetic complexity, age of onset, phenocopy, population structure
why is genetic complexity complicating factor of mapping
simple Mendelian trait is easy: one gene causes the variation; in a complex trait, variation in phenotype is determined by alleles at more than one gene plus environment
why is age of onset complicating factor of mapping
- if phenotype appears in young animals then it is easy to score, but if it shows up late in life or with variable expression then it is hard to assign the correct phenotype; late onset also makes it hard to collect samples from multi-generational families
why is phenocopy complicating factor of mapping
phenocopy is when the environment causes a mimic of a phenotype that is usually determined genetically, making it hard to correctly assign phenotype (ex smoking -> cancer)
why is population structure a complicating factor for mapping
a trait that is genetically complex in a large diverse population may behave more simply in a smaller, more isolated population
easiest traits to map
simple and selected; cancer and hip dysplasia are complex unselected traits and are v hard to map, whereas something simple like narcolepsy has essentially been removed from the population
phenotype
trait, inherited component that we see
morphology
what we see: floppy ears, coat color, etc.
disease
what happens secondarily (cancer status)
behavior
ex loves running in snow; this is hard to measure
what do you need to map a trait
- population with variable heritable trait
- Access to samples from population (DNA, Phenotype, Pedigree info, tissues for later study of gene expression with mRNA)
- Genetic map or sequenced genome from member of that species
- Set of known sequence variations within genome of that species (position in genome, a means of genotyping each variation; these variations are usually SNPs)
- Money
types of traits
meristic, factor, continuous, discrete/ binary
Meristic (cardinal)
discrete count of something (number of kiwi eggs)
factor
set of discrete categories we assign (colors)
continuous (metric)
takes any value within a range (height of corn plant)
discrete/ binary
two phenotype categories (presence or absence of something)
“Size” traits
difficult to capture; two dogs can weigh the same with v diff bodies and heights
principal component analysis
reduce many measurements into independent non-correlated components
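A minimal sketch of the idea for the smallest possible case, two correlated "size" measurements per dog. The data, names, and the choice of exactly two measurements are assumptions for illustration; a real analysis would use many measurements, usually standardized first, and a linear algebra library.

```python
import math

def pca_two_measurements(xs, ys):
    """PCA for exactly two measurements per animal, using the closed form
    for a 2x2 covariance matrix.
    Returns (variances, pc1_axis, pc1_scores, pc2_scores)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least 2 animals")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]  # centered measurements
    cy = [y - mean_y for y in ys]
    sxx = sum(a * a for a in cx) / (n - 1)
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Direction of greatest variance (PC1) for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Eigenvalues = variance captured by PC1 and PC2.
    mid = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    variances = (mid + half_gap, mid - half_gap)
    # Project each animal onto the two perpendicular axes.
    pc1_scores = [a * c + b * s for a, b in zip(cx, cy)]
    pc2_scores = [-a * s + b * c for a, b in zip(cx, cy)]
    return variances, (c, s), pc1_scores, pc2_scores

# Hypothetical dogs: height (cm) and weight (kg).
heights_cm = [40.0, 55.0, 60.0, 70.0, 45.0]
weights_kg = [12.0, 25.0, 30.0, 38.0, 20.0]
variances, axis, pc1, pc2 = pca_two_measurements(heights_cm, weights_kg)
share = variances[0] / sum(variances)
print(f"PC1 captures {share:.1%} of the total variance")
```

The PC1 score can then serve as a single "overall size" phenotype, and PC2 is uncorrelated with it by construction.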
subjective traits
such as body condition score (two individuals score same dog -> inter-observer reliability)
4 ways to map
candidate gene screen, linkage analysis, genome wide association studies, whole genome sequencing
candidate gene screen
cheap, fast, easy but high risk of failure bc it fails if you sequence the wrong gene
- collect sequence variation from a single gene and test for association with trait; this requires a prior hypothesis (eg same dx mapped in a related species, or biology of gene and dx is well known); even if you find an association btwn a tested gene and trait there is still the possibility that other genes also associate with the trait
linkage analysis
expensive, slow and laborious; hard to collect family-based samples; moderate risk of failure; powerful when it can be done; the linked marker is not necessarily close to the underlying gene (and may not be the mutation)
- requires no prior hypothesis
- samples taken from families segregating for trait
- genotypes collected and analyzed to detect linkage btwn marker and trait (linkage= tendency of alleles or states to be inherited together)
genome wide association study
not cheap, faster than linkage, hard, moderate risk of failure, current favorite method, get closer to mutation
- no prior hypothesis required
- samples collected from population segregating for trait
- uses population sample in conceptually same way that linkage mapping uses a family (population is just a v extended family)
whole genome sequencing
this is likely the way of the future as it is becoming less expensive
- conceptually v similar to GWAS but the coverage (marker density) of genome is much better
- look through every chromosome at every variant
- do WGS of one or a couple cases and one or a couple controls and compare variants to find those unique to the cases (look for what is conserved and what is changed)
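The case vs control comparison above can be sketched as a set operation. This is only an illustration under assumptions: the variant calls are hypothetical, each variant is represented as a (chromosome, position, ref, alt) tuple, and real pipelines also filter on call quality, genotype, and predicted effect.

```python
def case_specific_variants(case_calls, control_calls):
    """Variants present in every case and absent from every control.
    Each element of case_calls / control_calls is one animal's set of calls."""
    if not case_calls:
        return set()
    shared_by_cases = set.intersection(*(set(calls) for calls in case_calls))
    seen_in_controls = set().union(*(set(calls) for calls in control_calls))
    return shared_by_cases - seen_in_controls

# Hypothetical calls: (chromosome, position, reference allele, alternate allele)
case_1 = {("chr1", 1000, "A", "C"), ("chr2", 500, "G", "T"), ("chr3", 42, "C", "T")}
case_2 = {("chr1", 1000, "A", "C"), ("chr2", 500, "G", "T")}
control_1 = {("chr2", 500, "G", "T"), ("chr4", 7, "T", "A")}

candidates = case_specific_variants([case_1, case_2], [control_1])
print(sorted(candidates))
```

Only the chr1 variant survives here: the chr2 variant is also in a control, and the chr3 variant is missing from one case.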
Candidate gene screen
- collect DNA samples and phenotypes from individuals in the population that are not close relatives
- design PCR primers spanning the exons of the candidate gene
- sequence each exon in case and control sample and look at sequence variations
- test for statistical association btwn variant genotype and phenotype
- (contrast: linkage analysis detects linkage over megabase regions of chromosomes, so it only needs ~100s of markers, but this makes it difficult to ID the causal mutation)
the further apart two markers are, the ___ of recombination between them
higher the chance
markers near mutation are
linked to it so they are likely to be inherited together
recombination over time
shuffles alleles found on the same chromosome; alleles in tight linkage are least likely to be shuffled
linkage mapping
uses recombination over several generations; uses hundreds of markers; finds large regions of linkage
association mapping
uses recombination over ~20 to ~100 generations; uses thousands of markers; finds small regions of association
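A small sketch of why more generations give smaller regions. It uses the standard population-genetics result that the association between a marker and a mutation decays by a factor of (1 - r) each generation, where r is the recombination fraction between them. The specific r values and generation counts below are assumptions chosen for illustration.

```python
def remaining_association(recomb_fraction, generations):
    """Fraction of the original marker-mutation association left after the
    given number of generations: (1 - r) ** t."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - recomb_fraction) ** generations

# A few generations (family-based linkage) vs many (population-based association).
for r in (0.001, 0.01, 0.1):
    few = remaining_association(r, 3)
    many = remaining_association(r, 50)
    print(f"r={r}: after 3 generations {few:.3f}, after 50 generations {many:.3f}")
```

After a few generations even fairly distant markers (r = 0.1) still travel with the mutation, so the linked region is large; after ~50 generations only v close markers (r = 0.001) still do.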
examples of linkage mapping in dogs
PRA, PRCD (progressive rod-cone degeneration; linkage analysis btwn each pair of markers and btwn PRCD and each marker), copper toxicosis, renal cancer, narcolepsy
Genome wide association mapping vs linkage analysis
- GWAS maps a much smaller region of association than linkage does bc only marker alleles v close to the mutation remain unshuffled over many generations of recombination; GWAS requires ~100-1000x more markers than linkage analysis to cover the same genome
genotyping on commercial SNP assay
- extract DNA
- DNA amplified and fragmented
- Array with allele-specific oligo probes attached to beads (unique oligo for each bead)
- Add fragmented DNA (hybridization)
- Single base extension with labeled nucleotides (read in scanner, labels = colors; intensity of binding gives us genotype)
How to do genome wide association study
- Sample the population segregating for the trait and assign each dog as case or control (affected or unaffected)
- Genotype each dog at 173k SNPs (commercial array) that span genome
- Statistically test for association between marker state and trait value (get a marker, then look at the genome around it for possible causes of the variation you are studying; marker can be in the gene or close to it)
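The association step above can be sketched for a single SNP as a 2x2 allele-count chi-square test. This is one common choice and an assumption here, not necessarily the test used in the course; the genotypes are hypothetical and coded as copies of the B allele (0, 1 or 2). In a real GWAS the same test is repeated for every SNP on the array.

```python
import math

def allelic_chi2(case_genotypes, control_genotypes):
    """Chi-square test (1 df) comparing allele counts in cases vs controls
    for one biallelic SNP. Returns (chi2, p_value)."""
    def allele_counts(genotypes):
        b = sum(genotypes)              # copies of allele B
        a = 2 * len(genotypes) - b      # copies of allele A
        return a, b

    observed = (allele_counts(case_genotypes), allele_counts(control_genotypes))
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0                 # empty group or monomorphic SNP
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical SNP: B allele common in cases, rare in controls.
cases = [2, 2, 2, 1, 2, 2, 1, 2]
controls = [0, 0, 1, 0, 0, 1, 0, 0]
chi2, p = allelic_chi2(cases, controls)
print(f"chi2={chi2:.2f}, p={p:.2e}")
```

The p value for each SNP is then compared with the genome-wide significance threshold rather than with 0.05.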
GWAS if phenotype data isn’t binary
use linear regression of trait value on alleles or genotypes rather than a t-test; if you do this over and over (once per marker) you have to adjust for random association (genome-wide p value)
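A minimal sketch of the regression for one SNP, assuming genotypes are coded as allele dosage (0, 1 or 2 copies of one allele) and using hypothetical trait values. It returns the slope (effect per allele copy) and its t statistic; turning t into a p value needs a t distribution with n - 2 df, which is left out here to keep the sketch dependency-free.

```python
import math

def dosage_regression(dosages, trait_values):
    """Ordinary least squares of a quantitative trait on allele dosage.
    Returns (slope, intercept, t_statistic)."""
    n = len(dosages)
    if n < 3 or n != len(trait_values):
        raise ValueError("need at least 3 animals and equal-length inputs")
    mean_x = sum(dosages) / n
    mean_y = sum(trait_values) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, trait_values))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_statistic = slope / std_err if std_err > 0 else math.inf
    return slope, intercept, t_statistic

# Hypothetical data: six animals, trait rises with each extra allele copy.
dosages = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
slope, intercept, t_stat = dosage_regression(dosages, trait)
print(f"effect per allele = {slope:.2f}, t = {t_stat:.2f}")
```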
genome-wide significance threshold
as more tests are done there is a higher chance of a false positive (identifying at least one significant result due to chance); to handle this use Bonferroni correction, alpha = 0.05
calculate:
0.05 / (number of markers in analysis, ie the number of independent tests done)
this gives the cutoff line above which results are significant
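A worked example of the calculation, using the 173k SNP array size mentioned in these notes as the number of tests. Treating every SNP as an independent test is an assumption (it makes the correction conservative), and the function name is just for illustration.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall false-positive rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

n_markers = 173_000                       # SNPs on the array
threshold = bonferroni_threshold(0.05, n_markers)
neg_log10 = -math.log10(threshold)        # where the cutoff line sits on a -log10(p) plot
print(f"significant if p < {threshold:.2e} (-log10 p > {neg_log10:.2f})")
```

So a SNP needs a p value of roughly 3 in 10 million or smaller to pass, which is why small sample sizes rarely reach genome-wide significance.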
many of the complex traits we study are caused by mutations
in the coding regions, but most are in non-coding regions (introns/ regulatory regions)
candidate genes
need to see if these have mutations and if they could be causing the phenotype in question
GWAS vs whole genome sequencing
- use WGS to find variants for use following GWAS; use haplotypes to accurately estimate (impute) variants that aren’t included on the genotyping array
- w/ GWAS we genotype SNPs that were discovered in a certain population, which introduces ascertainment bias (ie if SNPs were discovered by looking at Thoroughbred chromosomes then the SNPs will be good at characterizing Thoroughbreds but not as good for other horse breeds); if we sequence all our samples we will discover all the sequence variations
WGS example
storage dx in cat; cat was sequenced at 30x depth; predicted missense mutation identified in NPC1, caused by an adenine to cytosine transversion; cat was homozygous for the variant and no other cats in dataset had it; autosomal recessive neurovisceral lysosomal storage disorder that results in defective intracellular transport of cholesterol; in humans affects same gene?
linkage mapping has been replaced by
GWAS mapping which uses population-based samples and offers high resolution mapping
in future
individuals will be sequenced rather than just genotyped
mapping still limited by
- access to samples, pedigree, and phenotypes for the same animal at the same time
- population hx and size may limit mapping power
- mapping complex traits will require excellent phenotypic