Lecture 7 - Exploiting natural and induced genetic variation in crops part 2 Flashcards
Give an example of a technique used to identify genes for molecular markers
- Quantitative trait mapping - can be used to track the loci controlling trait variation
- Genome wide association scan (GWAS)
Compare GWAS to QTL mapping
- QTL mapping uses bi-parental crosses, whereas GWAS uses a panel of diverse lines - both are particularly good for additive effects
- Additive effects: multiple regions of the genome affecting a particular trait
- If the effect is additive the effect of different loci add to each other
- Some ability to analyse epistatic effects, but this requires complex statistical analysis
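A minimal sketch of what "additive" means here, assuming three hypothetical loci with made-up per-allele effects on plant height. The locus names, effect sizes and baseline are illustrative assumptions, not values from the lecture.

```python
# Hypothetical per-allele effects (cm of plant height) at three loci.
ALLELE_EFFECTS = {"locus_A": 4.0, "locus_B": 2.5, "locus_C": 1.0}
BASELINE = 80.0  # assumed trait value with no beneficial alleles


def additive_trait_value(genotype):
    """genotype maps locus name -> copies (0, 1 or 2) of the beneficial allele."""
    # Under a purely additive model the contributions of the loci simply sum.
    return BASELINE + sum(ALLELE_EFFECTS[locus] * copies
                          for locus, copies in genotype.items())


value = additive_trait_value({"locus_A": 2, "locus_B": 0, "locus_C": 1})
```

An epistatic effect would need an extra term that depends on the combination of alleles at two loci, which is why it is harder to analyse.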
What are the features of GWAS?
- Do not require development of DH/RI mapping populations
- Exploits linkage disequilibrium analysis
- Exploits historical recombination to identify molecular markers tightly linked to loci controlling traits - could be up to millions of years' worth of recombination within the germplasm
- Large panels with broad genetic diversity exhibit very short linkage disequilibrium
- Requires at least an approximation of the marker order in the genome of the species being studied so genes can be used as a source of molecular markers
- Requires very high marker densities
- Requires correction for population structure to reduce false positive associations
- Candidate genes in associated genomic regions need to be analysed further to identify the causative gene
Why do GWAS not require the development of DH/RI mapping populations?
Existing crop varieties are ideal, and one genotyped panel can be analysed for many traits
What is linkage disequilibrium analysis?
The tendency for coincidence of marker alleles with trait values
What are GWAS looking for?
A correlation between marker alleles and trait values, which points to the position of a gene controlling the trait (one of the genes sitting at a QTL)
Why is GWAS particularly powerful?
Not limited by the small number of generations that new RI mapping populations will have been put through.
Why is it beneficial in GWAS that large panels with broad genetic diversity exhibit very short linkage disequilibrium?
Only markers very close to the gene controlling the trait tend to be inherited together with it, so an associated marker pinpoints the gene more precisely
But
Need to have large numbers of molecular markers to have a marker close enough to the gene that controls the trait
How can GWAS be short cut?
Need a diversity panel across which to score genotype and trait data, e.g. plant height, seed yield or any aspect of composition.
Marker data is scored across that panel
The more markers the better
Do an analysis based on the markers that allows statistical corrections to be made for the population structure
Why is genetic variation important?
Contributes to the productivity and quality of crops
What makes up the genetic variation of a crop?
- gene sequence variation
- gene expression variation
How is a mixed linear association model generated?
- Trait scores from phenotype data
- Marker allele scores from marker data
- Population clustering, pairwise relatedness estimates from population structure and relatedness data
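One common way of writing such a model is sketched below. It is given as an illustration, not necessarily the exact formulation used in the lecture, and all symbols are assumptions introduced here: y is the vector of trait scores, S holds the marker allele scores (effect \alpha), Q holds the population clustering (effect v), K is a matrix built from the pairwise relatedness estimates, X\beta covers any other fixed effects, Z links plants to their genetic background effect u, and e is the residual.

```latex
y = X\beta + S\alpha + Qv + Zu + e,
\qquad u \sim N(0, \sigma_g^2 K),
\qquad e \sim N(0, \sigma_e^2 I)
```

A marker is declared associated with the trait when its effect \alpha differs significantly from zero after Q and K have absorbed the variation due to population structure and relatedness.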
What does a mixed linear association model allow the identification of?
Looking at gene sequence variation and gene expression variation allows the identification of sequence polymorphisms in the transcriptome
How can marker allele scores be replaced in a mixed linear association model?
Marker allele scores from marker data can be replaced by transcriptome SNPs and transcriptome quantification from marker data
What is the genomics platform for association genetics?
Use genome sequence scaffolds of ancestral species (B. rapa and B. oleracea)
Rearrange based on transcriptome SNP-based linkage mapping
Map crop unigenes based on sequence similarity
To determine the gene order in the crop genome and any polymorphisms
For B. napus: around 21 000 SNPs map 9000 transcript assemblies, and 126 000 transcript assemblies are anchored to the genome
How do you use functional genomics for association mapping?
Start with a genetic diversity panel and make mRNA from the panel
Sequence (usually done with the Illumina platform)
Analyse computationally using genome resources identified in transcriptome to produce functional genotypes
Score functional genotypes for gene sequence based markers e.g. SNPs or for gene expression based markers e.g. GEMs
Quantify transcripts
Identify how many sequence tags are present for each of the genes in the genome and find the correlation between these and the trait data scored across the diversity panel with statistics that account for the population structure
Visualise outcome
What is achieved by using functional genotypes for association mapping?
Get the genome order of markers and the correlation of the trait with marker alleles or particular expression levels
How can marker trait associations be visualised?
Manhattan plots
Distance up the Y axis: Significance of association (P) (log scale)
Distance across the X axis: hypothetical position in the genome
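A minimal sketch of how the two axes could be computed before plotting. The marker results and chromosome lengths are invented for illustration, and the function name is hypothetical.

```python
import math

# Hypothetical marker results: (chromosome, position within chromosome, P value).
results = [
    ("A1", 10, 0.5),
    ("A1", 250, 1e-8),
    ("A2", 40, 0.01),
]
# Assumed chromosome lengths, in the same units as the positions.
chrom_lengths = {"A1": 300, "A2": 200}


def manhattan_coordinates(results, chrom_lengths):
    """Return (x, y) points: x is cumulative genome position, y is -log10(P)."""
    offsets, running = {}, 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running  # where this chromosome starts on the X axis
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


points = manhattan_coordinates(results, chrom_lengths)
```

Taking -log10 of P means that the most significant associations sit highest on the Y axis, which is what produces the skyline-like peaks.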
What does a Manhattan plot show?
Shows data from GWAS
The significance of the association of a particular marker in its hypothetical position in a genome
What did the linkage disequilibrium analysis for erucic acid content of seed oil (B. napus) show?
- validation of Manhattan plot
- 53 B. napus accessions, used transcriptome SNP markers
- Genome wide association scan using 63 000 transcriptome SNP markers
- Found clusters of high significance associations that coincided precisely with the positions of the two genes known to control the erucic acid content of seeds (association peaks coincide)
- Erucic acid not required in edible oil (breeding target to reduce) but required industrially (breeding target to increase)
What did the association genetics study using transcriptome SNP markers show for seed glucosinolate content (B. napus)?
- Glucosinolate accumulates in the meal and a high content has an adverse effect on the quality of meal for feed
- genome-wide association scan using around 63 000 transcriptome SNP markers
- also a genome wide regression analysis for expression polymorphism markers in around 115 000 informative hypothetically ordered transcript assemblies
- Coinciding SNP and GEM associations provided further evidence for the importance of the three regions identified by QTL analysis for controlling trait variation
- Superimposition of SNP and GEM associations makes it possible to zoom in on a small part of the chromosome; high significance associations shared between SNP markers and GEMs define small intervals where the candidate genes can be found
- Identified a correlation: lower expression of these genes was associated with lower accumulation of glucosinolate (indicating deletion of part of the genome)
- Identified several loci in the genome, two regions show a deletion profile
- These regions overlap in gene content by orthologs of a single Arabidopsis gene called HAG1, which is a TF controlling glucosinolate synthesis in Arabidopsis
What is the size of diversity panels?
400-600 plants
In full-sized panels, even in B. napus (relatively short of diversity), still find 100-1000 SNP markers
What are GEM associations?
low expression correlated with low trait value
interpret as segmental deletion or homeologous exchange
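A minimal sketch of a GEM-style test for one gene, assuming invented expression and trait values across five lines. It uses a plain Pearson correlation and deliberately omits the correction for population structure that a real analysis would need.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical transcript abundance of one gene in five lines of a panel.
expression = [2.0, 3.0, 10.0, 12.0, 15.0]
# Hypothetical seed glucosinolate content of the same five lines.
glucosinolate = [10.0, 14.0, 60.0, 75.0, 90.0]

r = pearson_r(expression, glucosinolate)
```

A value of r close to 1 corresponds to the pattern described above: lines with low expression also have low trait values.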
What was identified by superimposing SNP and GEM associations to identify genes controlling glucosinolate content in B. napus?
Causative genes identified: separate deletions controlling two major QTLs overlap in gene content by orthologs of HAG1, which is a TF controlling glucosinolate biosynthesis in Arabidopsis
What are the benefits of producing larger diversity panels?
Produce greater resolution
What do narrow peaks in Manhattan plots indicate?
No breeding selection
What are the important components of rape seed?
tocopherol - vitamin E
Four different isoforms accumulate which have different commercial values
Want more gamma tocopherol
How can GWAS increase the level of gamma tocopherol in rape seeds?
GWAS across diversity panel looking for the proportions of these two chemicals accumulated in the seeds
Found sharp association peak at one position (indicates no breeding selection, just random recombination across the genomes)
Identified as an ortholog of VTE4, a gene studied in Arabidopsis
Once molecular markers have been identified for traits in crops how can these be used?
- genetic complexity is understood, allowing an informed design of experiments
- germplasm used for breeding can be scanned for beneficial alleles
- crossing programs can be conducted using material with the best alleles
- after crosses and selfing of progeny very large numbers of individual seedlings are available for screening
- a molecular marker screen is conducted using DNA or RNA isolated from seedlings; only those seedlings with beneficial alleles present at the tracked loci are retained for further crossing or phenotypic analysis
- improves the efficiency of breeding by focusing the expensive phenotyping activities only on material that is likely to have the required characteristics
- enables alien introgression following inter-specific crosses with wild relatives, as marker assisted introgression of target alleles into elite genetic backgrounds can be undertaken without phenotypic assessment
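The seedling screen described above can be sketched as a simple filter. The seedling names, marker names and allele calls are hypothetical, as is the choice to require the beneficial allele at every tracked locus.

```python
# Hypothetical seedling genotypes: marker locus -> allele called from DNA or RNA.
seedlings = {
    "seedling_1": {"marker_1": "T", "marker_2": "G"},
    "seedling_2": {"marker_1": "C", "marker_2": "G"},
    "seedling_3": {"marker_1": "T", "marker_2": "A"},
}
# Assumed beneficial allele at each tracked locus.
beneficial = {"marker_1": "T", "marker_2": "G"}


def select_seedlings(seedlings, beneficial):
    """Keep seedlings carrying the beneficial allele at every tracked locus."""
    return [name for name, calls in seedlings.items()
            if all(calls.get(locus) == allele
                   for locus, allele in beneficial.items())]


retained = select_seedlings(seedlings, beneficial)
```

Only the retained seedlings would go forward to the expensive phenotyping stage.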
What is mutation breeding?
Traditional breeding method for broadening genetic diversity with over 3000 varieties released
What does mutation breeding involve?
Involves the treatment of seeds with a mutagen:
- Radiation
- Chemicals such as ethyl methanesulphonate (EMS)
Select for phenotypes in the M2 generation
Backcross to untreated material to remove background mutations and select for phenotype
Useful where loss of function is required e.g. to block a metabolic pathway
What are the limitations of mutation breeding?
- small effects are difficult to identify as the material is ‘sick’ whilst carrying a high mutation load
- genetic redundancy in polyploid species means that only loci with large additive effects will be identified
- phenotype based selection makes breeding inefficient
What is predictive mutation breeding?
Reverse genetics approach - if the target gene is known, a population can be screened for mutations of that gene
What is the process of predictive mutation breeding?
TILLING method applied to crops
Requires knowledge of the genes causative for trait variation
Mutagen treated population is produced and DNA made from M2 generation plants
Target genes are screened for sequence changes
Both K/O and quantitative effects (allelic series) can be produced
Mutagen commonly EMS
What are the key advantages of predictive mutation breeding?
- No requirement for additivity of effect or early phenotyping
- can be used in polyploid species but requires locus specific PCR
- mutation causes a DNA sequence change that can be used as an SNP marker in marker assisted breeding
How is a predictive mutation breeding population produced?
- Treat seeds with EMS - imbibe with a variety of concentrations and wash
- Sow plants at high density
- Grow and self plants from treatments with 50% kill
- Collect seed from M1 plants
- Sow in field and collect selfed seed from one M2 plant per M1 individual
- Make DNA from a leaf of each M2 plant from which seeds will be collected
- EMS alkylates guanine to form O6-ethylguanine, which can pair with thymine, resulting in a transition mutation from C/G to T/A
Why would you want to produce low polyunsaturate rapeseed oil?
- Rapeseed oil has a relatively high content of polyunsaturated fatty acids
- High polyunsaturate content renders vegetable oil thermally unstable
- In Arabidopsis the FAD2 locus (encoding oleate desaturase) controls polyunsaturate content
- Four orthologs of FAD2 in oilseed rape (Brassica napus)
- Pyramiding knock outs of three of these have moderate effect on polyunsaturate content of oil
- aim to knock out the fourth ortholog to block pathway
What are the different purposes of oilseed rape?
- conventional rape seed oil is fairly high in polyunsaturated fatty acids
- Industrial base oil is very low in polyunsaturated fatty acids
How was very low polyunsaturate rapeseed oil produced by predictive mutation breeding?
- start by identifying a suitable region of a target protein - identify C and G bases that, when mutated to T or A, would generate a stop codon that prematurely terminates the protein
- use any genome sequences available to design a locus-specific PCR amplicon of an appropriate size for the mutation screening method
- Various screening methods available to detect mismatch/mutations
- Confirm mutations by capillary sequencing after amplification from single plant DNA samples
- Grow plants from the seeds produced by plants with the identified mutations and confirm these mutations
- Produce, by selfing, plants homozygous for mutations and phenotype
- Back cross mutants to untreated material and introgress mutated genes to reduce background mutations
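The first step above, finding C and G bases whose EMS-type mutation would create a premature stop codon, can be sketched as follows. The example sequence is invented, the function name is hypothetical, and the sketch assumes the input is the coding strand in frame from its first base.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
EMS_TRANSITIONS = {"C": "T", "G": "A"}  # EMS gives C/G to T/A transitions


def ems_stop_sites(coding_seq):
    """Return (base index, original codon, mutant codon) for every single
    EMS-type transition that turns a sense codon into a stop codon.
    Assumes coding_seq is the coding strand, in frame from index 0."""
    sites = []
    for start in range(0, len(coding_seq) - 2, 3):
        codon = coding_seq[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon, nothing to gain
        for offset, base in enumerate(codon):
            if base in EMS_TRANSITIONS:
                mutant = (codon[:offset] + EMS_TRANSITIONS[base]
                          + codon[offset + 1:])
                if mutant in STOP_CODONS:
                    sites.append((start + offset, codon, mutant))
    return sites


# Invented in-frame sequence: ATG CAA TGG CGA AAA
hits = ems_stop_sites("ATGCAATGGCGAAAA")
```

Regions rich in such codons (for example CAA, CAG, CGA and TGG) are the most promising places to position a screening amplicon.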
What screening methods are there to detect a mismatch/mutation?
- Mismatch detection by melting
- Mismatch detection by Cel 1 treatment
- Mutation detection by capillary sequencing
- Mutation detection by next generation sequencing
What are the features of using capillary sequencing for screening/confirmation of mutations?
- conventional capillary sequencing of PCR amplicons allows an analysis of sequences at the level of trace files (bases)
- detects mutations as roughly half-height peaks on the traces (if the mutation is heterozygous) using software e.g. Mutation Surveyor
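A toy illustration of the half-height idea, assuming the peak heights for the reference and alternative base at one position have already been read from a trace. The 0.3-0.7 window is an arbitrary assumption, and this is not how any particular software package makes its calls.

```python
def looks_heterozygous(ref_peak, alt_peak, low=0.3, high=0.7):
    """Flag a position when the alternative base accounts for roughly half of
    the combined signal. The default window is an assumed, illustrative one."""
    total = ref_peak + alt_peak
    if total == 0:
        return False  # no signal at this position
    return low <= alt_peak / total <= high


call_het = looks_heterozygous(520, 480)   # two peaks of similar height
call_ref = looks_heterozygous(1000, 30)   # background noise only
```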
How can you phenotypically validate homozygous mutant lines?
- Identify plant lines homozygous for mutations likely to impact the function of a gene product e.g. premature stop codons
- Amplify (by selfing) to produce sufficient seed for phenotype trialling
What does a range of mutations in a protein coding sequence allow?
Enables an analysis of protein function
- analysing mutations which cause amino acid changes, in addition to those producing premature stop codons, enables a probing of proteins for regions critical for function
What can second generation sequencing be used for?
To produce functional genotypes
What do functional genotypes permit?
Association of trait variation with gene sequence variation and gene expression variation, producing molecular markers to assist breeding and identifying causative genes
How does mutation breeding act?
By increasing the range of genetic variation available for breeding
What can be done if the sequences of trait controlling genes are known?
Sequence led approaches can be taken to identify novel alleles following mutagenesis
What does the production of an allelic series of mutations enable?
The probing of the function of gene products as well as trait optimisation