Lecture 7 - Exploiting natural and induced genetic variation in crops part 2 Flashcards
Give examples of techniques used to identify molecular markers linked to the genes controlling a trait
- Quantitative trait locus (QTL) mapping - can be used to track the loci controlling trait variation
- Genome-wide association scan (GWAS)
Compare GWAS to QTL mapping
- QTL mapping uses bi-parental mapping populations, whereas GWAS uses diversity panels; QTL mapping is particularly good for additive effects
- Additive effects: multiple regions of the genome affecting a particular trait
- If effects are additive, the effects of the different loci simply add together
- Some ability to analyse epistatic (interacting) effects, but this requires complex statistical analysis
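A toy two-locus model makes the additive/epistatic distinction concrete. This is a minimal sketch with hypothetical effect sizes and a hypothetical function name, not a QTL-mapping method: with no interaction term the locus effects simply sum, while a non-zero interaction term (epistasis) makes the value depend on the combination of genotypes.

```python
def genotypic_value(alleles_a, alleles_b, effect_a, effect_b, interaction=0.0):
    """Toy two-locus genotypic value (hypothetical effect sizes).

    alleles_a, alleles_b: number of trait-increasing alleles (0, 1 or 2) at loci A and B.
    With interaction == 0 the model is purely additive: locus contributions just sum.
    A non-zero interaction adds an epistatic term that depends on both genotypes at once.
    """
    additive = effect_a * alleles_a + effect_b * alleles_b
    epistatic = interaction * alleles_a * alleles_b
    return additive + epistatic


# Additive: 2 x 1.5 + 1 x 0.5 = 3.5; with epistasis the same genotype scores 5.5
print(genotypic_value(2, 1, 1.5, 0.5))
print(genotypic_value(2, 1, 1.5, 0.5, interaction=1.0))
```

Under pure additivity, the value of genotype (2, 1) equals the sum of the single-locus contributions, which is why additive effects are the easiest to detect and map.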
What are the features of GWAS?
- Do not require development of DH/RI mapping populations
- Exploits linkage disequilibrium analysis
- Exploits historical recombination to identify molecular markers tightly linked to loci controlling traits - could be up to millions of years' worth of recombination within the germplasm
- Large panels with broad genetic diversity exhibit very short linkage disequilibrium
- Requires at least an approximate marker order in the genome of the species being studied, so that genes can be used as a source of molecular markers (MM)
- Requires very high marker densities
- Requires correction for population structure to reduce false positive associations
- Candidate genes in associated genomic regions need to be analysed further to identify the causative gene
Why does GWAS not require the development of DH/RI mapping populations?
Existing crop varieties are ideal material, and one genotyped panel can be analysed for many traits
What is linkage disequilibrium analysis?
The tendency for marker alleles to coincide with particular trait values (non-random association between alleles at different loci)
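The card gives an informal definition. Two standard ways to quantify LD between a marker locus and a trait locus are D and r². The sketch below assumes the haplotype and allele frequencies are already known (in a real panel they would have to be estimated from genotype data); the function name is hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying marker allele A together with trait allele B
    p_a, p_b: frequencies of allele A and allele B in the panel
    """
    # D measures departure from independent (random) co-occurrence of the two alleles
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 rescales D to 0..1; 1 means the marker allele perfectly predicts the trait allele
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


print(linkage_disequilibrium(0.25, 0.5, 0.5))  # alleles independent: no LD
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # alleles always co-inherited: complete LD
```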
What are GWAS looking for?
A correlation between marker alleles and trait values, indicating that the marker lies close to one of the genes at a QTL controlling the trait
Why is GWAS particularly powerful?
It is not limited by the small number of generations (and hence recombination events) that a newly developed RI mapping population has been put through
Why is it beneficial in GWAS that large panels with broad genetic diversity exhibit very short linkage disequilibrium?
Only marker alleles very close to the gene controlling the trait are consistently inherited together with it, so an associated marker is tightly linked to that gene
But
Need to have large numbers of molecular markers to have a marker close enough to the gene that controls the trait
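The trade-off can be put as rough arithmetic: the shorter the distance over which LD persists, the more evenly spaced markers are needed. This is a back-of-envelope sketch assuming a uniform LD extent and uniform marker spacing across the genome (real LD varies along chromosomes); the function name and the numbers below are hypothetical.

```python
import math


def markers_needed(genome_length_bp, ld_extent_bp):
    """Rough count of evenly spaced markers so that adjacent markers are
    no further apart than the distance over which LD persists in the panel."""
    return math.ceil(genome_length_bp / ld_extent_bp)


# Hypothetical 1 Gb genome: LD over 100 kb vs LD over only 10 kb
print(markers_needed(1_000_000_000, 100_000))
print(markers_needed(1_000_000_000, 10_000))
```

A ten-fold shorter LD extent needs ten-fold more markers, which is the cost of the higher mapping resolution.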
What are the main steps of a GWAS, in short?
Need a diversity panel over which to score genotypes and trait data (e.g. plant height, seed yield, any aspect of composition)
Marker data are scored across that panel
The more markers the better
Run an analysis of the markers that allows statistical corrections to be made for population structure
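A minimal sketch of a single-marker test with structure correction. It swaps in a simpler fixed-effect regression for the mixed linear model: the trait is regressed on the marker with structure covariates (e.g. subpopulation membership or principal components) included, and the p-value uses a normal approximation to the t distribution. The function name and inputs are hypothetical.

```python
import math

import numpy as np


def marker_test(trait, genotype, covariates):
    """Regress the trait on one marker while adjusting for population structure.

    trait: trait scores for each line in the panel
    genotype: marker allele scores (0, 1, 2) for the same lines
    covariates: structure covariates, shape (n,) or (n, k); pass shape (n, 0) for none
    Returns (marker_effect, t_statistic, approximate two-sided p-value).
    """
    y = np.asarray(trait, dtype=float)
    g = np.asarray(genotype, dtype=float)
    c = np.asarray(covariates, dtype=float)
    if c.ndim == 1:
        c = c.reshape(-1, 1)  # a single covariate becomes one column
    n = y.shape[0]
    # Design matrix: intercept, structure covariates, then the marker (last column)
    x = np.column_stack([np.ones(n), c, g])
    beta, _, rank, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    sigma2 = resid @ resid / (n - rank)  # residual variance
    se = math.sqrt(sigma2 * np.linalg.pinv(x.T @ x)[-1, -1])
    t = beta[-1] / se
    # Normal approximation to the t distribution; reasonable for panels of hundreds of lines
    p = math.erfc(abs(t) / math.sqrt(2))
    return float(beta[-1]), float(t), p
```

Run this across every marker. For a marker whose allele frequency differs between subpopulations that also differ in the trait, the unadjusted test can look highly significant even though the marker has no effect; adding the structure covariate removes that false positive.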
Why is genetic variation important?
Contributes to the productivity and quality of crops
What makes up the genetic variation of a crop?
- gene sequence variation
- gene expression variation
How is a mixed linear association model generated?
- Trait scores from phenotype data
- Marker allele scores from marker data
- Population clustering, pairwise relatedness estimates from population structure and relatedness data
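The card lists the inputs but not the model itself. One widely used formulation that takes exactly these inputs is the "Q + K" mixed model, shown here as an assumption about what such a model looks like rather than as the lecture's exact model:

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0},\, 2\mathbf{K}\sigma_g^2),
\qquad \mathbf{e} \sim N(\mathbf{0},\, \mathbf{I}\sigma_e^2)
```

Here y holds the trait scores, S the marker allele scores with effect α (the quantity being tested), Q the population clustering with effects v, and K the pairwise relatedness matrix that sets the covariance of the random polygenic effects u; X holds any other fixed effects, Z links lines to u, and e is residual error.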
What does a mixed linear association model allow the identification of?
Applied to gene sequence variation and gene expression variation, it allows identification of sequence polymorphisms in the transcriptome that are associated with the trait
How can marker allele scores be replaced in a mixed linear association model?
Marker allele scores from marker data can be replaced by transcriptome SNPs and transcript quantification from transcriptome data
What is the genomics platform for association genetics?
Use genome sequence scaffolds of ancestral species (B. rapa and B. oleracea)
Rearrange based on transcriptome SNP-based linkage mapping
Map crop unigenes based on sequence similarity
To determine the gene order in the crop genome and any polymorphisms
For B. napus: ~21 000 SNPs mapped ~9000 transcript assemblies, and ~126 000 transcript assemblies were anchored to the genome
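A hypothetical sketch of the last two steps: each crop unigene takes the position of its best-scoring similarity hit on the (rearranged) ancestral scaffolds, and sorting those positions gives the inferred gene order. The hit format, IDs and scores are assumptions; chromosome names are assumed zero-padded so that string sorting follows chromosome order.

```python
def order_unigenes(best_hits):
    """Return unigene IDs in inferred genome order.

    best_hits: dict mapping unigene ID -> list of (chromosome, position, score)
    similarity hits against the rearranged ancestral-genome scaffolds.
    """
    placed = []
    for unigene, hits in best_hits.items():
        if not hits:
            continue  # no similarity hit, so no position can be inferred
        chrom, pos, _score = max(hits, key=lambda h: h[2])  # keep the highest-scoring hit
        placed.append((chrom, pos, unigene))
    placed.sort()  # order by chromosome, then by position along it
    return [unigene for _chrom, _pos, unigene in placed]


hits = {
    "u1": [("A01", 500, 90.0), ("C01", 100, 70.0)],
    "u2": [("A01", 200, 95.0)],
    "u3": [],
    "u4": [("C01", 50, 88.0)],
}
print(order_unigenes(hits))  # u3 is left unanchored
```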
How do you use functional genomics for association mapping?
Start with a genetic diversity panel and make mRNA from the panel
Sequence (usually on an Illumina platform)
Analyse computationally, using the transcriptome-based genome resources, to produce functional genotypes
Score functional genotypes for gene sequence-based markers (e.g. SNPs) or gene expression-based markers (e.g. gene expression markers, GEMs)
Quantify transcripts
Identify how many sequence tags are present for each of the genes in the genome and find the correlation between these and the trait data scored across the diversity panel with statistics that account for the population structure
Visualise outcome
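A minimal sketch of the quantification and correlation steps, assuming read counts (sequence tags) per transcript are already available for each line. RPKM (reads per kilobase of transcript per million mapped reads) is used here as one common normalisation; the lecture does not say which was used. The plain Pearson correlation ignores population structure, which a real GEM scan would account for. Function names are hypothetical.

```python
import numpy as np


def rpkm(read_counts, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.
    Works on scalars or on numpy arrays of counts across the panel."""
    return read_counts * 1e9 / (gene_length_bp * total_mapped_reads)


def gem_correlation(expression, trait):
    """Pearson correlation between one transcript's expression and the trait
    across the diversity panel (no correction for population structure)."""
    return float(np.corrcoef(expression, trait)[0, 1])


print(rpkm(1000, 2000, 10_000_000))  # 1000 reads on a 2 kb transcript in a 10 M-read library
```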
What is achieved by using functional genotypes for association mapping?
The genome order of the markers, and the correlation of the trait with particular marker alleles or expression levels
How can marker-trait associations be visualised?
Manhattan plots
Y axis: significance of association, plotted as -log10(P)
X axis: hypothetical (inferred) position of the marker in the genome
What does a Manhattan plot show?
Shows data from GWAS
The significance of the association of a particular marker in its hypothetical position in a genome
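The plot coordinates can be computed directly from the association results. A sketch assuming results are given as (chromosome, position, P) and that chromosome lengths are listed in genome order; the function name is hypothetical, and the drawing itself is left to any charting library.

```python
import math


def manhattan_points(results, chrom_lengths):
    """Convert per-marker association results into Manhattan-plot coordinates.

    results: list of (chromosome, position, p_value) for each marker
    chrom_lengths: ordered list of (chromosome, length) giving the genome order
    Returns a list of (x, y): x is the cumulative genome position and
    y = -log10(p), so stronger associations stand higher.
    """
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running  # chromosomes are laid end to end along the x axis
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


print(manhattan_points([("A01", 100, 0.01), ("A02", 50, 1e-8)], [("A01", 1000), ("A02", 500)]))
```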
What did the linkage disequilibrium analysis for erucic acid content of seed oil (B. napus) show?
- Validation of the approach (Manhattan plot)
- 53 B. napus accessions, using transcriptome SNP markers
- Genome wide association scan using 63 000 transcriptome SNP markers
- Found clusters of high significance associations that coincided precisely with the positions of the two genes known to control the erucic acid content of seeds (association peaks coincide)
- Erucic acid not required in edible oil (breeding target to reduce) but required industrially (breeding target to increase)
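The lecture does not say how "high significance" was defined. One conservative choice, assumed here, is a Bonferroni threshold: divide the overall error rate by the number of markers tested. Because nearby markers are in LD and so are not independent tests, this overcorrects somewhat. The function name is hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Per-marker P cut-off and its -log10 value (the Manhattan-plot y cut-off)."""
    p_cut = alpha / n_markers
    return p_cut, -math.log10(p_cut)


p_cut, y_cut = bonferroni_threshold(0.05, 63_000)
print(p_cut, y_cut)
```

For 63 000 SNP markers at alpha = 0.05 the per-marker cut-off is about 7.9 × 10^-7, i.e. a height of about 6.1 on the -log10(P) axis.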
What did the association genetics study using transcriptome SNP markers show for seed glucosinolate content (B. napus)?
- Glucosinolates accumulate in the meal, and a high content has an adverse effect on the quality of the meal for feed
- Genome-wide association scan using around 63 000 transcriptome SNP markers
- Also a genome-wide regression analysis for gene expression markers (GEMs) in around 115 000 informative, hypothetically ordered transcript assemblies
- Coinciding SNP and GEM associations provided further evidence for the importance of the three regions identified by QTL analysis in controlling trait variation
- Superimposing SNP and GEM associations makes it possible to zoom in on a small part of a chromosome: where high-significance SNP and GEM associations coincide, they define small intervals in which the candidate genes can be found
- Identified a positive correlation: lower expression of these genes was associated with lower glucosinolate accumulation (indicating a deletion in part of the genome)
- Identified several loci in the genome; two regions show a deletion profile
- The two deleted regions overlap in gene content at orthologs of a single Arabidopsis gene, HAG1, a transcription factor (TF) controlling glucosinolate biosynthesis in Arabidopsis
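Superimposing the two scans can be sketched as finding places where a significant SNP association and a significant GEM association lie close together on the same chromosome. This is a hypothetical sketch: the window size, input format and function name are assumptions, and overlapping hit pairs are merged into candidate intervals.

```python
def coincident_intervals(snp_hits, gem_hits, window_bp):
    """Find candidate intervals where significant SNP and GEM associations coincide.

    snp_hits, gem_hits: lists of (chromosome, position) for markers passing the threshold.
    A SNP hit and a GEM hit within window_bp on the same chromosome define an interval;
    overlapping intervals are merged. Returns a sorted list of (chromosome, start, end).
    """
    pairs = sorted(
        (s_chrom, min(s_pos, g_pos), max(s_pos, g_pos))
        for s_chrom, s_pos in snp_hits
        for g_chrom, g_pos in gem_hits
        if s_chrom == g_chrom and abs(s_pos - g_pos) <= window_bp
    )
    merged = []
    for chrom, start, end in pairs:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # overlaps the previous interval on the same chromosome: extend it
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


snps = [("A01", 1000), ("A01", 1500), ("C01", 8000), ("C02", 300)]
gems = [("A01", 1200), ("C01", 8100), ("C02", 9000)]
print(coincident_intervals(snps, gems, window_bp=500))
```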
What is the size of diversity panels?
400-600 plants
In full-sized panels, even in B. napus (which is relatively short of diversity), 100-1000 SNP markers can still be found
What are GEM associations?
Low expression correlated with low trait value
Interpreted as segmental deletion or homeologous exchange
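One way to spot the deletion signature in a single accession is to look for runs of consecutive genes, in inferred genome order, whose expression is near zero. This is a hypothetical sketch: the threshold, minimum run length and function name are assumptions, and an expression pattern alone does not establish the underlying mechanism.

```python
def low_expression_runs(expression_in_order, threshold, min_genes):
    """Find runs of consecutive genes (in inferred genome order) whose expression
    in one accession falls below threshold; long runs suggest a segmental deletion.

    Returns a list of (start_index, end_index), both inclusive.
    """
    runs = []
    start = None
    for i, value in enumerate(expression_in_order):
        if value < threshold:
            if start is None:
                start = i  # a new low-expression run begins
        else:
            if start is not None and i - start >= min_genes:
                runs.append((start, i - 1))
            start = None
    # close a run that extends to the last gene
    if start is not None and len(expression_in_order) - start >= min_genes:
        runs.append((start, len(expression_in_order) - 1))
    return runs


expr = [10, 0.1, 0.2, 0.0, 12, 0.3, 9, 0.1, 0.2, 0.1]
print(low_expression_runs(expr, threshold=1, min_genes=3))
```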
What was identified by superimposing SNP and GEM associations for glucosinolate content in B. napus?
Causative genes identified: separate deletions underlying the two major QTLs overlap in gene content at orthologs of HAG1, a transcription factor (TF) controlling glucosinolate biosynthesis in Arabidopsis