Lecture 6 - Establishing genomics platform for crop species Flashcards

1
Q

Why is comparative genomics important?

A
  • polyploid crops are related to species with simpler genomes
    • polyploid crops are hard to work with because their genomes are complex
  • some species with small genomes have been developed as model systems
  • lower genetic redundancy aids in the identification of genes
  • model species are small and easy to grow
2
Q

Give some examples of polyploid crops and their related diploid species

A
  • Bread wheat - Brachypodium distachyon (wild Triticum species)
  • Oilseed rape - Arabidopsis thaliana (wild Brassica species)
  • potato (wild Solanum species)
  • cotton (wild Gossypium species)
3
Q

What are the elementary events of gene evolution?

A
  • vertical descent (speciation) with modification
  • gene duplication
  • gene loss
  • horizontal gene transfer
  • fusion/fission/rearrangements
4
Q

Define homologues

A

Genes sharing a common origin

5
Q

Define orthologues

A

Genes originating from a single ancestral gene in the last common ancestor of the compared genomes

Doesn’t mean their functions are equivalent, although they normally are

6
Q

Define paralogues

A

genes related by duplication

can exist in different genomes

7
Q

What are homeologous genes?

A

orthologous genes in the same species as a result of recent polyploidy
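The definitions above can be made concrete with a gene tree whose internal nodes are labelled as speciation or duplication events: the event at two genes' last common ancestor decides whether they are orthologues or paralogues, and orthologue-like pairs that sit in the same (polyploid) species are homeologues. The sketch below is a minimal illustration under that assumption; the `Node` class, the tree and every gene name are hypothetical, not real wheat genes.

```python
class Node:
    """A node in a toy gene tree (all names here are hypothetical)."""

    def __init__(self, name, event=None, species=None, children=()):
        self.name = name
        self.event = event        # "speciation" or "duplication" for internal nodes
        self.species = species    # species carrying the gene, for leaves
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def lineage(node):
    """Return the node and all of its ancestors, nearest first."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    ancestors_of_a = {id(n) for n in lineage(a)}
    for n in lineage(b):
        if id(n) in ancestors_of_a:
            return n
    raise ValueError("genes are not in the same tree")


def relationship(a, b):
    """Classify two genes by the event at their last common ancestor."""
    if last_common_ancestor(a, b).event == "duplication":
        return "paralogues"
    if a.species == b.species:
        return "homeologues"  # split by speciation, brought together again by polyploidy
    return "orthologues"


# Toy wheat-like tree: the A and B progenitor lineages split by speciation,
# and one B-genome gene was later duplicated.
tu = Node("Tu_gene", species="T. urartu")
ta_a = Node("TaA_gene", species="T. aestivum")
ta_b1 = Node("TaB_gene1", species="T. aestivum")
ta_b2 = Node("TaB_gene2", species="T. aestivum")
a_split = Node("A_split", event="speciation", children=[tu, ta_a])
b_dup = Node("B_dup", event="duplication", children=[ta_b1, ta_b2])
root = Node("root", event="speciation", children=[a_split, b_dup])

print(relationship(tu, ta_a))      # orthologues
print(relationship(ta_a, ta_b1))   # homeologues
print(relationship(ta_b1, ta_b2))  # paralogues
```

Real orthology inference first has to reconstruct and reconcile the gene tree; here the events are simply given.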

8
Q

Describe the genetics of bread wheat using nomenclature from the relationships of genes

A

Bread wheat has homologous genes inherited from its ancestral genomes. An ancestral species underwent a speciation event giving rise to the Aegilops lineage, and a further speciation event gave rise to Aegilops speltoides and Triticum urartu. These hybridised to form the tetraploid Triticum turgidum (AABB), which then hybridised with the diploid Aegilops tauschii (DD) to form hexaploid bread wheat, Triticum aestivum, with three genomes (A, B and D). The A, B and D copies of a gene are homeologues.

9
Q

Describe the genetics of brassica using nomenclature from the relationships of genes

A

Hybridisation of Brassica rapa (2n=20) and Brassica oleracea (2n=18) formed Brassica napus (2n=38)

A polyploid formed by the hybridisation of two species followed by a doubling of the chromosomes

The A genome of B. rapa and the C genome of B. oleracea combined to form the AACC genome of B. napus (oilseed rape)
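The chromosome numbers follow from simple arithmetic: each parent contributes a gamete carrying half its somatic (2n) number, and chromosome doubling of the hybrid restores pairs. A minimal sketch, assuming both parents are ordinary diploids with even 2n; the function names are illustrative only.

```python
def gamete_number(two_n: int) -> int:
    """Chromosome number (n) carried by a gamete of a parent with 2n chromosomes."""
    if two_n % 2:
        raise ValueError("expected an even somatic chromosome number")
    return two_n // 2


def allopolyploid_2n(parent1_2n: int, parent2_2n: int) -> int:
    """Somatic chromosome number after interspecific hybridisation plus doubling.

    The F1 hybrid carries one gamete from each parent (n1 + n2); doubling the
    chromosome set gives every chromosome a pairing partner again.
    """
    hybrid = gamete_number(parent1_2n) + gamete_number(parent2_2n)
    return 2 * hybrid


# B. rapa (AA, 2n=20) x B. oleracea (CC, 2n=18): n = 10 + 9 = 19, doubled to 38
print(allopolyploid_2n(20, 18))  # 38
```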

10
Q

What is the range of plant genome sizes?

A

Arabidopsis thaliana: 130 000 000 bp, 14% repetitive, 25 000 genes

H: 3 Gb

Barley: about 1/3 the size of the wheat genome

Hexaploid bread wheat: 17 Gb (17 000 000 000 bp), 80% repetitive (the repeats arise from transposon amplification and make the genome hard to assemble, because assembly depends on working out which sequence fragments overlap with which), 90 000 genes (genes are not a large component of the genome size, but it does mean they are hard to target by genetic manipulation)
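Working through the figures on this card shows how little of the wheat genome is genic. The sketch below only re-uses the numbers given above (genome size, repetitive fraction, gene count); the dictionary layout and function name are illustrative.

```python
# (genome size in bp, repetitive fraction, gene count), as given on this card
GENOMES = {
    "Arabidopsis thaliana": (130_000_000, 0.14, 25_000),
    "hexaploid bread wheat": (17_000_000_000, 0.80, 90_000),
}


def summarise(size_bp: int, repetitive_fraction: float, genes: int) -> dict:
    """Convert raw genome figures into Mb of repeats and gene density."""
    size_mb = size_bp / 1_000_000
    return {
        "size_mb": size_mb,
        "repetitive_mb": size_mb * repetitive_fraction,
        "genes_per_mb": genes / size_mb,
    }


for name, figures in GENOMES.items():
    s = summarise(*figures)
    print(f"{name}: {s['repetitive_mb']:,.0f} Mb repetitive, "
          f"{s['genes_per_mb']:.1f} genes per Mb")
```

Arabidopsis comes out at roughly 190 genes per Mb and wheat at roughly 5, even though wheat has more than three times as many genes.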

11
Q

What is the history of genome sequencing in plants?

A

Earliest plant genome published (Arabidopsis): 2000

relatively few genomes were sequenced for many years due to the cost of sequencing and the problems of assembling repetitive genomes

Big increase in genome sequences in recent years due to improvements in next-generation sequencing

Cost reduced, enabling the sequencing of more complex genomes

but mostly only of draft quality

Rice and Arabidopsis are well sequenced

12
Q

What is the structure of the rice genome?

A

Oryza sativa (ssp. japonica cv. Nipponbare)

  • 370 Mb of finished sequence from a genome of around 440 Mb
  • 26% repetitive
  • 37500 genes
  • finished to a very high standard
13
Q

How are the genomes of cereals mostly related?

A

Mostly colinear

however, this isn’t typical of plant genomes

in most, polyploidy and genome rearrangement have occurred, increasing gene copy number and complicating colinearity studies

14
Q

How is plant genome evolution shaped?

A

Cycles of polyploidy and diploidisation

15
Q

Why is it particularly useful that the rice genome has been mostly sequenced?

A

Cereal crop genomes align very well when common markers are used that are present across multiple genomes (extensive marker colinearity)

They show a high degree of colinearity, even between distantly related genomes

  • Triticeae, Maize, Sorghum, Sugar cane, Foxtail millet, rice
16
Q

What is the structure of the arabidopsis genome?

A
  • ancient whole genome duplication event in Arabidopsis: when sequences are aligned between species, it shows signs of being derived from a tetraploid ancestor
  • within the Arabidopsis genome there are many colinearity relationships at multiple places in the genome
  • shows that most species undergo more structural rearrangement than is observed in grass species
  • Arabidopsis thaliana (Columbia)
  • 115Mb genome
  • 14% repetitive
  • 25000 genes
  • finished to a very high standard
17
Q

What is the structure of the Brassica rapa genome?

A
  • first brassica genome to be sequenced
  • Brassica rapa (Chiifu)
  • 285 Mb of finished sequence from a genome of around 480 Mb (not a huge genome)
  • 40% repetitive
  • 41000 genes
  • finished to a moderate standard
18
Q

How can the colinearity between species be illustrated?

A

Colinearity plot

Red dots - positions where the two genomes share their most closely related sequences

Diagonals - regions of colinearity between genomes
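The dots and diagonals of a colinearity plot can be derived from gene order alone. This minimal sketch assumes the pairs of most closely related genes between the two genomes have already been found (e.g. by sequence similarity); it places one dot per pair and then collects diagonal runs, treating a run that descends as an inverted segment. Gene names, `min_len` and the run-finding rule are illustrative assumptions, not a specific published tool.

```python
def dot_plot(order_a, order_b, pairs):
    """Return sorted (x, y) dots: x = gene index in genome A, y = index in genome B."""
    pos_a = {gene: i for i, gene in enumerate(order_a)}
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    return sorted((pos_a[a], pos_b[b]) for a, b in pairs
                  if a in pos_a and b in pos_b)


def colinear_blocks(dots, min_len=3):
    """Collect diagonal runs of dots: each step is +1 in A and +1 (same
    orientation) or -1 (inverted segment) in B."""
    dot_set, used, blocks = set(dots), set(), []
    for start in dots:
        if start in used:
            continue
        for step in (1, -1):
            run = [start]
            x, y = start
            while (x + 1, y + step) in dot_set and (x + 1, y + step) not in used:
                x, y = x + 1, y + step
                run.append((x, y))
            if len(run) >= min_len:
                blocks.append(run)
                used.update(run)
                break
    return blocks


order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
order_b = ["b1", "b2", "b3", "b6", "b5", "b4"]   # last three genes inverted
pairs = [(f"a{i}", f"b{i}") for i in range(1, 7)]
dots = dot_plot(order_a, order_b, pairs)
print(colinear_blocks(dots))  # [[(0, 0), (1, 1), (2, 2)], [(3, 5), (4, 4), (5, 3)]]
```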

19
Q

Why can brassica species be used as a model for comparative genomics?

A

The Brassica genus contains a group of species that are related to each other

20
Q

How have different brassica species evolved?

A

Evolved by polyploidy and hybridisation followed by a period of diploidisation where newly formed polyploid genomes begin to stabilise.

These events can be detected by looking at sequence divergence between species

Arabidopsis derives from an ancestral duplication event followed by a long period of diploidisation.

Brassica species share this ancestral duplication but then went through a genome triplication and divergence, followed by hybridisation in the case of B. napus.

B. napus therefore had the first polyploidy event and two additional rounds of polyploidy before the diploidisation process.

21
Q

How does the way crop species such as Brassica and model species such as arabidopsis evolve determine their ability to be used in GE?

A

This highlights a problem with crops. The Arabidopsis genome is mostly present as duplicated segments, so both copies of a gene may need to be knocked out (K/O) to see an effect, although most genes are now down to a single copy.

Brassica napus has 12 related genome segments, which makes functional genomics complicated
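One way to see where the figure of 12 comes from is to multiply the fold increase of each event in the history described on the previous card: the shared ancestral duplication (x2), the Brassica triplication (x3) and the hybridisation forming B. napus (x2). This reading is an assumption of the sketch; subsequent diploidisation means many genes survive in fewer copies.

```python
from math import prod

# Rounds of genome multiplication in the history of B. napus, as described above.
# Each entry: (event, fold increase in copies of an ancestral segment).
EVENTS = [
    ("ancestral whole genome duplication (shared with Arabidopsis)", 2),
    ("Brassica genome triplication", 3),
    ("hybridisation of B. rapa (A) and B. oleracea (C)", 2),
]


def copies_per_segment(events) -> int:
    """Maximum number of related copies of one ancestral segment (before gene loss)."""
    return prod(fold for _event, fold in events)


print(copies_per_segment(EVENTS))  # 12
```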

22
Q

What are accessions/ecotypes/genetic variants?

A

Related cultivars where the genomes are slightly different

23
Q

When might polymorphisms between homologues be mistaken for allelic variants?

A

Anytime but especially when sequence redundancy is low

24
Q

Why is it harder to genotype polyploid sequences compared to diploid sequences?

A

When genotyping diploids, the reads at a locus all show one base in one cultivar and another base in the other, so SNP markers are simply the bases that differ between the two

In polyploid species, when the same locus is compared between cultivars, there are often complications from the homologue (the corresponding gene in another genome), which also contributes to the sequences observed

In addition to allelic variability, there are also many differences between homologues, which causes confusion because there are two different types of sequence polymorphism. Inter-homologue polymorphism is the type you don’t want; it may be 100 times more abundant than the allelic variation used for mapping

25
Q

How can you deal with inter-homologue polymorphisms when mapping a genome?

A

Use extra nucleotide codes, called ambiguity codes, which represent a mixture of two bases at a position in polyploid sequences

Y corresponds to a mixture of C and T bases

S corresponds to a mixture of G and C bases

Because this is a polymorphism between homologous loci, the polymorphism is present in both cultivars

Inter-homologue polymorphisms can be ignored if the sequencing is deep enough to be sure both bases have been seen

If the call differs between the cultivars at such a position (e.g. one cultivar shows a mixture and the other a single base), it is a hemi-SNP
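The logic of ambiguity codes and hemi-SNPs can be sketched as two small functions: one collapses the read bases at a position into a single base or an IUPAC two-base code, the other compares the calls for two cultivars. The 0.2 error threshold, the category names and the rule that one mixed plus one single call is a hemi-SNP are simplifying assumptions for illustration, not a particular SNP caller's behaviour.

```python
# IUPAC ambiguity codes for two-base mixtures
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
BASES = set("ACGT")


def call_base(counts: dict, min_fraction: float = 0.2) -> str:
    """Collapse read-base counts at one position into a base or ambiguity code.

    Bases seen in fewer than `min_fraction` of reads are treated as sequencing
    errors (the 0.2 threshold is an arbitrary assumption). Mixtures of three or
    more bases are reported as "N" here for brevity.
    """
    total = sum(counts.values())
    present = frozenset(b for b, n in counts.items() if n / total >= min_fraction)
    if len(present) == 1:
        return next(iter(present))
    return IUPAC.get(present, "N")


def classify(call_1: str, call_2: str) -> str:
    """Compare two cultivars' calls at the same reference position."""
    mixed_1, mixed_2 = call_1 not in BASES, call_2 not in BASES
    if call_1 == call_2:
        return "inter-homologue polymorphism" if mixed_1 else "no polymorphism"
    if mixed_1 != mixed_2:
        return "hemi-SNP"
    return "simple SNP" if not mixed_1 else "complex (needs inspection)"


cultivar_1 = call_base({"C": 12, "T": 9})   # "Y": reads show both C and T
cultivar_2 = call_base({"C": 20, "T": 1})   # "C": the single T read is discarded
print(cultivar_1, cultivar_2, classify(cultivar_1, cultivar_2))  # Y C hemi-SNP
```

Positions classed as inter-homologue polymorphisms appear identically in both cultivars and can be filtered out, which is the filtering step described on the next card.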

26
Q

What can be used to score sequences in mapping and look at sequence polymorphisms?

A

Illumina mRNA sequencing can be used

Reads from two cultivars (e.g. two oilseed rape cultivars) are mapped onto a reference sequence (a transcript assembly acts as the reference, e.g. a Brassica transcriptome) to identify SNPs

mRNA-derived sequences are often used because they are cheaper

Call potential variant alleles

Inter-homologue polymorphisms called against the reference sequence can be filtered out, as they occur in both cultivars

Main source of error when using next-generation sequencing to look at SNP markers: if coverage of sequence reads is very thin, you cannot be sure the sequencing depth was enough to capture both types of base in that region
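The coverage problem can be quantified with a simple model. If each read at a site independently comes from one homologue with probability p, the chance that all d reads show the same base (so the mixture is missed) is p^d + (1 - p)^d, which is 2^(1-d) when both homologues are expressed equally. The independence and fixed-p assumptions are idealisations made for this sketch.

```python
def p_miss_mixture(depth: int, p: float = 0.5) -> float:
    """Probability that every read at a site shows the same base even though
    two homologues with different bases are both present.

    Assumes each read independently comes from homologue 1 with probability p
    (p = 0.5 is the idealised case of equal expression).
    """
    if depth < 1 or not 0 < p < 1:
        raise ValueError("need depth >= 1 and 0 < p < 1")
    return p ** depth + (1 - p) ** depth


def min_depth(max_miss: float, p: float = 0.5) -> int:
    """Smallest read depth at which the chance of missing the mixture is <= max_miss."""
    if not 0 < max_miss < 1:
        raise ValueError("max_miss must be between 0 and 1")
    depth = 1
    while p_miss_mixture(depth, p) > max_miss:
        depth += 1
    return depth


for depth in (2, 5, 10):
    print(depth, p_miss_mixture(depth))   # 0.5, 0.0625, 0.001953125
print(min_depth(0.01))                    # 8
```

With unequal expression the required depth rises quickly: under the same model, a 4:1 imbalance (`p=0.8`) needs a depth of 21 for the same 1% miss rate.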

27
Q

How can sequence based SNP calling in polyploid species be used to generate linkage maps for crop species?

A
  1. Need to make a high-density linkage map with more than 50-100 linkage markers.
  2. Start with two varieties that are as genetically divergent from each other as can be found. Cross these to produce an F1 generation.
  3. Self the hybrid and develop recombinant inbred lines (using many rounds of single seed descent, or a capture process in which spores are collected and grown up to produce the mapping population). The resulting lines are invariant and can be maintained indefinitely.
  4. Once a population has been generated, the individuals need to be sequenced, e.g. through mRNA sequencing.
  5. Call alleles for each individual plant by mapping the sequence reads to the reference sequence, calling according to whether mixed or single bases are present, and looking for what varies across the population.
  6. Construct a genetic linkage map using appropriate software.
  7. For each polymorphic marker, give each line in the population a score of a or b, corresponding to the allele that occurs in that line (a refers to the female parent in the original cross and b to the male parent).
  8. Computational processing then uses these scores to find the order in the genome that best fits the markers.
  9. This generates a high-density SNP linkage map.
  10. If based on mRNA sequences, it shows that the polymorphisms are within genes.
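Steps 7 and 8 can be illustrated with a toy ordering routine: score strings of a/b per line give a pairwise recombination fraction (the share of lines scored differently), and markers are then chained from one end by nearest neighbour. This greedy heuristic is swapped in purely for illustration; dedicated mapping software uses more robust algorithms. Treating the raw fraction as the recombination estimate suits doubled-haploid lines, whereas RILs would need a correction. Marker names and scores are hypothetical.

```python
from itertools import combinations


def recombination_fraction(s1: str, s2: str) -> float:
    """Fraction of lines scored differently ('a' vs 'b') at two markers.
    '-' marks a missing score and is skipped."""
    informative = [(x, y) for x, y in zip(s1, s2) if "-" not in (x, y)]
    if not informative:
        return 0.5  # no shared data: treat the markers as unlinked
    return sum(x != y for x, y in informative) / len(informative)


def order_markers(scores: dict) -> list:
    """Greedy nearest-neighbour ordering of markers by recombination fraction."""
    names = list(scores)
    dist = {}
    for m1, m2 in combinations(names, 2):
        dist[m1, m2] = dist[m2, m1] = recombination_fraction(scores[m1], scores[m2])
    # The marker farthest from all others in total is taken as one end of the map.
    current = max(names, key=lambda m: sum(dist[m, o] for o in names if o != m))
    order = [current]
    remaining = [m for m in names if m != current]
    while remaining:
        current = min(remaining, key=lambda m: dist[order[-1], m])
        order.append(current)
        remaining.remove(current)
    return order


# Hypothetical scores for eight lines ('a' = female parent, 'b' = male parent)
scores = {
    "snp3": "aabbbbbb",
    "snp1": "aaaabbbb",
    "snp4": "abbbbbbb",
    "snp2": "aaabbbbb",
}
print(order_markers(scores))  # ['snp1', 'snp2', 'snp3', 'snp4']
```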
28
Q

What is the transcriptome SNP linkage map for oilseed rape?

A

TNDH linkage map with around 21000 SNP markers for 527 recombination lines

The useful number of markers is generally limited by the number of recombination bins present within the sequenced population. Although there are around 21000 markers, they simplify to a smaller number of distinct patterns (scoring strings), each representing a recombination bin across the population.
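Collapsing markers into recombination bins amounts to grouping them by identical scoring strings. A minimal sketch with hypothetical marker names and scores; it ignores missing data and scoring errors, which real pipelines have to tolerate.

```python
from collections import defaultdict


def recombination_bins(scores: dict) -> dict:
    """Group markers that share an identical scoring string across the population."""
    bins = defaultdict(list)
    for marker, pattern in scores.items():
        bins[pattern].append(marker)
    return dict(bins)


# Hypothetical a/b scores for five lines
scores = {
    "snp1": "aabba", "snp2": "aabba", "snp3": "aabbb",
    "snp4": "aabba", "snp5": "abbbb",
}
bins = recombination_bins(scores)
print(f"{len(scores)} markers collapse into {len(bins)} bins")  # 5 markers collapse into 3 bins
```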

29
Q

How do you interpret a linkage map?

A

Each vertical string corresponds to an allele; where a string stops, it represents a recombination point within the population.

Markers are in the order in which they occur in the genome

Can look for sequence similarity in other genomes using these markers and see regions of high colinearity, e.g. between the Arabidopsis and Brassica genomes

30
Q

What was shown by doing comparative genomics between brassica and arabidopsis?

A

Using the SNP genetic linkage map, around 9000 Brassica genes were found to have coevolved directly

Most SNPs have sequence similarity with Arabidopsis

Colour coding shows blocks of colinearity between the genomes, consistent with the diagonals on the genome plot

31
Q

How can comparative genomics be done between brassica species?

A

There was no genome sequence for Brassica napus, but there were sequences for its ancestral species.

For genes in which a polymorphism occurs, the matching scaffold of the ancestral genome sequence can be found through sequence similarity

As the order of the markers is known (from the previous comparison of Brassica napus with Arabidopsis), the scaffolds of the other species can be identified and associated with the constructed linkage maps

125 000 transcripts were anchored to the pseudomolecules (comprising B. rapa and B. oleracea genome sequence scaffolds), representing the hypothetical gene order in oilseed rape

32
Q

What is the generalised approach to genomics platform development?

A
  1. Take genome sequence scaffold of target species and anchor mapped markers to scaffold
  2. This identifies many chimeric (incorrectly assembled) scaffolds
  3. Quality can be improved by splitting and rearranging scaffolds to match them to the correct parts of the genome
  4. These are assembled into pseudomolecules, which represent genome organisation.
  5. Take all genes from the species/transcript assemblies and find sequence similarities by BLAST to anchor the transcripts to the pseudomolecules
  6. This gives a hypothetical order of genes
  7. Genes of the species of interest can then be aligned by sequence similarity
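Steps 1 to 4 can be sketched as grouping mapped markers by the scaffold they hit: a scaffold whose markers fall on more than one linkage group is flagged as a likely chimera, and the remaining scaffolds are ordered along each linkage group by the median cM position of their markers. The input tuples and names are hypothetical, and this toy version neither orients scaffolds nor detects chimeras that join distant parts of the same linkage group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical mapped markers located on assembly scaffolds by sequence similarity:
# (marker, linkage_group, position_cM, scaffold)
MARKERS = [
    ("m1", "LG1", 0.0, "scaf_A"), ("m2", "LG1", 4.5, "scaf_A"),
    ("m3", "LG1", 10.2, "scaf_B"), ("m4", "LG1", 12.0, "scaf_B"),
    ("m5", "LG2", 3.1, "scaf_C"), ("m6", "LG1", 20.7, "scaf_C"),
]


def build_pseudomolecules(markers):
    """Order scaffolds along each linkage group; flag scaffolds whose markers
    map to more than one linkage group as candidate chimeras."""
    hits = defaultdict(list)
    for _marker, group, cm, scaffold in markers:
        hits[scaffold].append((group, cm))
    placed, chimeric = defaultdict(list), []
    for scaffold, scaffold_hits in hits.items():
        groups = {group for group, _ in scaffold_hits}
        if len(groups) > 1:
            chimeric.append(scaffold)  # would need splitting before placement
            continue
        position = median([cm for _, cm in scaffold_hits])
        placed[groups.pop()].append((position, scaffold))
    pseudomolecules = {g: [s for _, s in sorted(items)] for g, items in placed.items()}
    return pseudomolecules, chimeric


print(build_pseudomolecules(MARKERS))
# ({'LG1': ['scaf_A', 'scaf_B']}, ['scaf_C'])
```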

The smaller the sequence scaffolds, the higher the density of the linkage map needed
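The link between scaffold size and map density can be made explicit with a Poisson model: if mapped markers fall at random with density ρ per Mb, a scaffold of L Mb carries at least one marker with probability 1 - e^(-ρL). Random marker placement is an assumption of this sketch (real markers cluster in gene-rich regions), but it shows that ten-fold smaller scaffolds need a ten-fold denser map.

```python
import math


def p_anchored(scaffold_mb: float, markers_per_mb: float) -> float:
    """Probability that a scaffold carries at least one mapped marker, assuming
    markers fall at random along the genome (Poisson model)."""
    return 1 - math.exp(-markers_per_mb * scaffold_mb)


def markers_needed(scaffold_mb: float, target: float = 0.95) -> float:
    """Marker density (per Mb) giving probability `target` of anchoring a scaffold."""
    return -math.log(1 - target) / scaffold_mb


for size in (1.0, 0.1, 0.01):
    print(f"{size} Mb scaffolds: {markers_needed(size):.0f} markers per Mb")
```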

33
Q

How can you generate a genomics platform without the ancestral sequence information?

A

Use a more distant species to organise genes of the species of interest

Exploit colinearity between genomes, at least over a short range, to infer gene order

e.g. comparative genomics of wheat with Brachypodium

4677 genes were anchored directly by SNP mapping to a high-density genetic linkage map (by genetic linkage of transcriptome SNP gene markers)

Enables an analysis of the colinearity of the genomes of hexaploid wheat and B. distachyon

Identify the best sequence match of wheat transcripts to the genome sequence scaffolds of a related species (B. distachyon)

Relatively simple rearrangement of sequences, as the B. distachyon sequence is highly contiguous and cereal genomes show fewer rearrangements than those of Brassica

Once the Brachypodium genome has been organised into pseudomolecules, sequence similarity can be used to position the remaining wheat transcripts and identify a hypothetical organisation of the identified wheat genes in the wheat genome.

112479 transcripts anchored to pseudomolecules (comprising B. distachyon genome sequence scaffolds) to represent the hypothetical gene order in wheat
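The final positioning step can be sketched as a best-hit rule: for each wheat transcript, keep the similarity hit with the highest score against the Brachypodium pseudomolecules, then sort transcripts by that hit's chromosome and position. The hit tuples below are hypothetical (in practice they might be parsed from BLAST tabular output, where a higher bitscore indicates a better hit), and best-hit-only placement is a simplification that ignores paralogous hits and local rearrangements.

```python
# Hypothetical similarity hits: (wheat_transcript, brachypodium_chromosome, position_bp, bitscore)
HITS = [
    ("Ta_t1", "Bd1", 150_000, 410.0), ("Ta_t1", "Bd3", 9_000, 120.0),
    ("Ta_t2", "Bd1", 80_000, 385.0),
    ("Ta_t3", "Bd2", 42_000, 300.0),
    ("Ta_t4", "Bd1", 210_000, 95.0), ("Ta_t4", "Bd1", 10_000, 260.0),
]


def best_hits(hits):
    """Keep the highest-scoring hit for each transcript."""
    best = {}
    for transcript, chrom, pos, score in hits:
        if transcript not in best or score > best[transcript][2]:
            best[transcript] = (chrom, pos, score)
    return best


def hypothetical_order(hits):
    """Transcripts ordered along the proxy pseudomolecules by their best hit."""
    best = best_hits(hits)
    return sorted(best, key=lambda t: (best[t][0], best[t][1]))


print(hypothetical_order(HITS))  # ['Ta_t4', 'Ta_t2', 'Ta_t1', 'Ta_t3']
```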

34
Q

How can you generate a genomics platform when the genome sequence is not available in large scaffolds?

A

Use the genome sequence scaffolds of related species; the more closely related the species, the better its genome works as a proxy