Comparative Genomics I & II (Lectures 6&7) Flashcards
COMPARATIVE GENOMICS: – It’s All About
Similarities and Differences
In the genomes of contemporary related organisms, we see the conservation of: 2
In the genomes of contemporary related organisms, we see the conservation of:
- sequences coding for proteins and functional RNAs from a last common ancestor
- sequences controlling the regulation of genes that have similar patterns of expression
Comparative Genomics : divergence? conservation? recognition?
= 4
- Divergence is seen between sequences that code for proteins, functional RNAs and regulatory regions
- responsible for differences between species
- Conservation of sequence implies conservation of function
- Recognition of orthologues and paralogues to make meaningful comparisons
Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask
Comparisons ACROSS LONG phylogenetic
distances, e.g. 1 billion years since separation: GIVES US? 2
Comparisons across long phylogenetic
distances, e.g. 1 billion years since separation:
- *give information on types and numbers of genes in functional categories
- *show little conservation of gene order and regulatory sequences
Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask:
Comparisons ACROSS MODERATE distances, e.g. 70-100 million years since separation show: 3
Comparisons across moderate distances, e.g. 70-100 million years since separation show:
- *functional and non-functional DNA is found in conserved regions
- *functional sequences will have changed less than non-functional DNA
- *purifying (aka negative) selection – removal of deleterious mutations
UNDERSTAND PHYLOGENETIC TREES
DIAGRAM AND IMAGE ON SLIDE 4
Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask:
Comparisons ACROSS SHORT DISTANCES, e.g. 5 million years since separation give: 3
Comparisons across short distances, e.g. 5 million years since separation give:
- *information about what sequences are responsible for making organisms unique
- *differences due to positive selection (aka Darwinian selection)
- *retention of mutations that benefit an organism
Comparative Genomics – What is
Compared? = 19
- Repeat regions
- *transposable elements, microsatellites
- Markers
- *SNPs
- Non-protein coding regions
- *RNA-only genes
- *gene deserts
- Duplications
- *whole genome
- *segmental
- *gene
- *gene families
- Base composition, e.g. %GC content
- *overall
- *coding regions and non-coding regions
- Gene number in functional categories
- Favourite genes
- Chromosome rearrangements
- Gene order
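As a rough illustration of the base-composition comparison listed above, %GC can be computed overall and separately for coding and non-coding segments. This is only a minimal sketch: the function name `gc_percent` and the toy sequences are made up for illustration and are not taken from the lectures.

```python
def gc_percent(seq: str) -> float:
    """Return %GC of a DNA sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = gc + sum(seq.count(base) for base in "AT")
    # Avoid division by zero for empty or fully ambiguous input.
    return 100.0 * gc / acgt if acgt else 0.0


# Toy (hypothetical) sequences, not real genomic data.
coding = "ATGGCCGCGTGA"
non_coding = "ATATTTAAGCAT"

print(f"coding:     {gc_percent(coding):.1f}% GC")
print(f"non-coding: {gc_percent(non_coding):.1f}% GC")
print(f"overall:    {gc_percent(coding + non_coding):.1f}% GC")
```

On real data the coding and non-coding segments would come from a genome annotation rather than hand-typed strings.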
Synteny?
*genes or genetic elements located
on the same chromosome
*may or may not be linked
Conserved (aka Shared) synteny?
*conservation of synteny of
orthologous genes between two or
more different organisms
*extent is inversely proportional to
length of time since divergence
from the ancestral locus
Collinearity?
*conservation of gene (or marker) order along a chromosomal segment in different species
*Note: in much present-day usage, synteny has the same meaning as collinearity
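One simple way to make the definition of collinearity concrete: given the order of orthologous genes (or markers) along a segment in two species, the longest set of shared genes that appear in the same relative order gives a crude measure of how much order is conserved. The sketch below assumes genes have already been matched by orthologue name. `collinear_genes` and the gene labels are illustrative only, and real synteny tools also deal with inversions, duplications and gaps.

```python
def collinear_genes(order_x: list[str], order_y: list[str]) -> list[str]:
    """Longest common subsequence of two gene orders (classic dynamic programming)."""
    n, m = len(order_x), len(order_y)
    # dp[i][j] = length of the longest common subsequence of order_x[i:] and order_y[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if order_x[i] == order_y[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back through the table to recover one longest collinear set.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if order_x[i] == order_y[j]:
            result.append(order_x[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical gene orders in three species.
species_x = ["A", "B", "C", "D", "E"]
species_y = ["A", "C", "B", "D", "E"]  # B and C have swapped places
species_z = ["A", "B", "D", "E"]       # C has been lost

print(collinear_genes(species_x, species_y))
print(collinear_genes(species_x, species_z))
```

Dividing the length of the result by the number of shared genes would give a simple collinearity score between 0 and 1.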
Collinearity diagram
A-E = genes or markers; X-Z = species; coloured boxes = coding regions slide 8
Synteny, Conserved Synteny and Collinearity: SIGNIFICANCE? = 2
- *in genomes of some grasses see high conservation of synteny and collinearity
- *knowing the genome sequence of a species with a small genome facilitates mapping and isolating genes coding for desirable traits from species with larger genomes
- *in medicine
- *loci with medical or phenotypic consequences can be recognised because of linkage to a cluster of syntenic loci
Factors affecting synteny, conserved synteny and collinearity?
*gene loss, multiple rounds of gene
duplications, chromosomal
rearrangements (fusions, splits,
inversions, reciprocal translocations)
*mask sequences that have been
derived from a common ancestral
sequence
Synteny, Conserved Synteny and Collinearity DIAGRAM
SLIDE 9
Evolutionary Convergence: WHAT IS PHENOTYPIC CONVERGENCE?
- *the independent evolution of similar or identical traits in distantly related species due to selective pressures, for example:
- *eyes
- *echolocation in dolphins and bats
- *particular protein properties
- *biochemical pathways
Evolutionary Convergence: WHAT DOES PHENOTYPIC CONVERGENCE DO? 3
1. IDENTIFY THE DETERMINANTS leading to the independent origins of adaptive traits
- *COMPARING DETERMINANTS gives INFORMATION on the GENOMIC (and ENVIRONMENTAL) background in LIMITING OR FURTHERING ADAPTIVE INFORMATION
- *“evolutionary enablers”
Evolutionary Significance of Convergent Recruitment….5
- The presence of genes able to evolve a new function enhances the chances that a given group of organisms can evolve a new trait
- *but only a few genes have the potential to make a specific phenotypic change
- *“evolutionary enablers”
- AND….
- The absence of these genes (evolutionary enablers) in other groups of organisms can hinder the acquisition of a new trait
Evolutionary Significance of Convergent Recruitment…DIAGRAMS
WITH AND WITHOUT EVOLUTIONARY ENABLERS…
SLIDE 11 AND 12
Evolutionary Convergence – How Does It Happen?
- Phenotypic convergence may result from:
- *alterations of different loci, e.g. changes in different enzymes, can lead to similar phenotypes
- *alteration of homologous genes from different taxonomic groups -> CONVERGENT RECRUITMENT
Evolutionary Convergence – How Does It Happen?
New functions usually evolve by the modification of pre-existing genes
*two criteria need to be met:
1) no deleterious effect through loss of ancestral function
2) expression profiles of the genes and kinetics of the proteins they encode must be suitable for new function
Evolutionary Convergence – How Does It Happen? KREBS CYCLE DIAGRAM
At least 60 independent origins of the
C4 photosynthetic pathway have
occurred in the flowering plants – an
excellent example of convergent
evolution
SLIDE 13
Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages:
‘Evolution of phosphoenolpyruvate carboxylase (PEPC) gene in groups
of grasses (g) and sedges (s)’ = 4
- *PEPC = enzyme catalysing first reaction of C4 photosynthesis
- *letters and numbers after g and s = PEPC gene lineages
- *red and blue triangles = PEPC gene lineages important in C4
grasses and sedges
DIAGRAM IN SLIDE 14
Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages
‘Evolution of phosphoenolpyruvate carboxylase (PEPC) gene in groups
of grasses (g) and sedges (s)’ FOUND? = 3
- *grasses – same lineage of PEPC gene (out of 6 lineages) used > 8 times during the evolution of C4 grasses
- *sedges – same lineage of PEPC gene (out of 5 lineages and distinct from that used by grasses) used > 5 times during evolution of C4 sedges
- This suggests a predisposition of these gene lineages to evolve novel adaptive function
Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages
‘When there is phenotypic convergence due to convergent recruitment:’ 7
- When there is phenotypic convergence due to convergent recruitment:
- *may see identical substitutions in the recruited genes in different lineages of organisms
- *e.g. C4 PEPC genes in grasses (b) and sedges (c) Note: protein sequences shown
- *likely to occur because effects of the resulting amino acid substitutions will be comparable
- *limitations to substitutions that can occur
- *allow emergence of functional, optimised protein
- Note: convergent substitutions do not explain all the convergent phenotypes seen – non-convergent (= divergent) substitutions also play a role in these adaptive changes
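To show what "identical substitutions in the recruited genes in different lineages" can look like in practice, the sketch below scans three aligned protein sequences. It reports positions where two C4 lineages carry the same amino acid and both differ from a non-C4 reference. The sequences are invented toy strings, not real PEPC data. `convergent_sites` is a hypothetical helper, and treating the non-C4 sequence as the ancestral state is a simplifying assumption.

```python
def convergent_sites(reference: str, lineage_a: str, lineage_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference residue, shared derived residue) tuples."""
    if not (len(reference) == len(lineage_a) == len(lineage_b)):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos, (ref, a, b) in enumerate(zip(reference, lineage_a, lineage_b), start=1):
        if "-" in (ref, a, b):
            continue  # skip alignment gaps
        if a == b and a != ref:
            # Both lineages changed to the same residue: a candidate convergent site.
            sites.append((pos, ref, a))
    return sites


# Toy aligned protein fragments (hypothetical, for illustration only).
non_c4 = "MKALTSAGD"
c4_grass = "MKSLTSSGD"
c4_sedge = "MRSLTSSGD"

print(convergent_sites(non_c4, c4_grass, c4_sedge))
```

Position 2 in the toy data changes in only one lineage, so it is not reported. It stands in for the non-convergent (divergent) substitutions mentioned in the note above.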
Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages IMPORTANT SLIDE DIAGRAM
SLIDE 15
Comparative Genomics of Prokaryotes = 5
- Smaller genomes found in organisms living in restricted environments – e.g. ‘Nanoarchaeum equitans’ lives inside another archaeon
- Larger genomes in organisms living in complex habitats – e.g. ‘Bradyrhizobium japonicum’, soil bacterium that forms symbiotic relationship with plants (nitrogen-fixing root nodules)
- Why….?
- Constant environments may allow survival with fewer genes
- Changeable habitats e.g. soils – many genes would be required for survival even though they may not all be used all the time
Comparative Genomics of Prokaryotes – e.g. Influence of HGT and IGD on Protein Family Expansion : 5
- Prokaryotes show rapid adaptation
- *in part due to the acquisition of genes by horizontal gene transfer (HGT)
- *increases genome size
- *in part due to intrachromosomal gene duplication (IGD)
- *increases genome size
Comparative Genomics of Prokaryotes – e.g. Influence of HGT and IGD on Protein Family Expansion DIAGRAM
SLIDE 18
Comparative Genomics of Prokaryotes –
e.g. Influence of HGT and IGD on Protein Family Expansion
110 genomes representing eight distinct clades of prokaryotes compared
*different genome sizes – small (blue), average (red), large (purple)
Comparative Genomics of Prokaryotes –
e.g. Influence of HGT and IGD on Protein Family Expansion
‘SEARCHED FOR:’ 2
- *paralogues – homologues acquired through IGD – identical in sequence to endogenous gene and tandemly arranged
- *xenologues – homologues acquired through HGT – randomly arranged and showing sequence variability to the endogenous gene
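The two search criteria above (sequence identity to the endogenous gene, and tandem versus scattered arrangement) can be sketched as a toy classifier. The thresholds, the `Homologue` fields and the "ambiguous" category are all assumptions made for illustration, not the method of the study.

```python
from dataclasses import dataclass


@dataclass
class Homologue:
    name: str
    identity: float     # fraction of identical positions vs. the endogenous gene (0 to 1)
    gene_distance: int  # difference in gene index along the chromosome (1 = adjacent)


def classify(h: Homologue, min_identity: float = 0.99, max_distance: int = 1) -> str:
    """Label a homologue using the two criteria on the card (thresholds are assumed)."""
    tandem = h.gene_distance <= max_distance
    near_identical = h.identity >= min_identity
    if tandem and near_identical:
        return "paralogue (IGD)"
    if not tandem and not near_identical:
        return "xenologue (HGT)"
    return "ambiguous"  # criteria disagree; would need further evidence


# Hypothetical examples.
for h in (Homologue("copyA", 1.00, 1), Homologue("copyB", 0.72, 340), Homologue("copyC", 0.99, 50)):
    print(h.name, "->", classify(h))
```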
Comparative Genomics of Prokaryotes –
e.g. Influence of HGT and IGD on Protein Family Expansion
FOUND: 5
- Found: 88% - 98% of expansions due to HGT
- *both small and large genomes
- *larger genomes contained the most xenologues
(contrary to previous thinking)
- paralogues have higher expression levels than xenologues
- xenologues had higher expression levels than singletons
WHO ARE NEANDERTHALS? = 3
- Neanderthals and the ancestors of modern humans diverged 270,000 to 400,000 years ago
- *Neanderthals = sister group to modern humans
- Neanderthals and modern humans coexisted in Europe 30,000-45,000 years ago
Why sequence the Neanderthal genome?
= 2
- *record changes (mutations) that have become fixed or risen to high frequency in humans in the last several hundred thousand years
- *identify genes that have been affected by positive selection since Neanderthals and humans diverged
UNDERSTANDING Neanderthal and Modern Human Genomes ..STUDY
What DNA is used? from where? = 9
- Majority of DNA used in the 2010 study was extracted from
- *bones of three females
- *Vindija cave (Croatia)
- Other samples from bones of Neanderthals in
- *Spain
- *Germany
- *Russia
- More recently other ancient human genomes have been sequenced
- Many more comparative analyses followed
Neanderthal and Modern Human Genomes INVESTIGATION PROCESS:
CONTROLS? USED? ASSEMBLED? COMPARED WITH? = 15
- Ran numerous controls and checks to exclude contamination
- *microbial DNA
- *modern human DNA
- Used NGS technologies
- Assembled sequence using reference genomes
- *chimpanzee
- *modern human genome (Venter)
- Compared sequence with
- *chimp
- *Venter
- *South African (San)
- *West African (Yoruba)
- *Papua New Guinean
- *Chinese (Han)
- *Western European (French)
Neanderthal and Modern Human Genomes …WHAT HAPPENED TO THE GENOME? = 8
- One-third of genome not sequenced
- *DNA not of high enough quantity and/or quality
- Genomes are 99.84% identical
- *78 nucleotide substitutions in protein-coding genes in 300,000 years
- *modern humans have derived state of these genes
- *Neanderthals have ancestral (chimp-like) state
- *genes include those coding for proteins involved in skin physiology
- *not clear what effect these changes have at the phenotype level
Neanderthal and Modern Human Genomes …COMPARED SNPs between Neanderthals and present-day humans…= 7
- Compared SNPs between Neanderthals and
- *present-day humans
- *2 European Americans
- *2 East Asians
- *4 West Africans
- *diverse modern humans (French, Yoruba, Han, San, Papuan)
- *chimps
Neanderthal and Modern Human Genomes:
Compare SNPs between Neanderthals and present-day humans…FINDINGS = 3
- Neanderthals share more SNPs with Europeans and East Asians than sub-Saharan Africans, suggesting...
- gene flow from Neanderthals to modern humans after modern humans left Africa, but before migrating into Eurasia
- 1-4% of modern Eurasian genomes derived from Neanderthal genomes
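The logic behind "shares more SNPs with" can be sketched as a simple count, similar in spirit to an ABBA-BABA style comparison. Using chimp as the ancestral state, count sites where the Neanderthal carries a derived allele that is found in one present-day human but not the other. This is a simplified illustration and not the published analysis. The allele strings are invented, and `sharing_counts` and `excess_sharing` are hypothetical names.

```python
def sharing_counts(chimp: str, neanderthal: str, human_1: str, human_2: str) -> tuple[int, int]:
    """Count sites where the Neanderthal derived allele is shared with only one human."""
    only_1 = only_2 = 0
    for c, n, h1, h2 in zip(chimp, neanderthal, human_1, human_2):
        if n == c:
            continue  # Neanderthal carries the ancestral (chimp) allele: uninformative
        if h1 == n and h2 == c:
            only_1 += 1
        elif h2 == n and h1 == c:
            only_2 += 1
    return only_1, only_2


def excess_sharing(only_1: int, only_2: int) -> float:
    """Normalised difference; 0 means the Neanderthal is equally close to both humans."""
    total = only_1 + only_2
    return (only_1 - only_2) / total if total else 0.0


# Toy alleles at 10 sites (hypothetical data).
chimp = "AAAAAAAAAA"
neanderthal = "GGGGGGAAAA"
eurasian = "GGGAGAAAAA"
african = "GAAAAGAAAA"

counts = sharing_counts(chimp, neanderthal, eurasian, african)
print(counts, excess_sharing(*counts))
```

A value clearly above zero across many sites is the kind of signal interpreted as gene flow from Neanderthals into the first population.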
Comparisons of Neanderthal and modern
human Y chromosome (father to son
inheritance)
and mitochondrial DNA (mtDNA;
mother to all progeny)
FINDINGS = 6
- Different evolutionary history for Y chromosome and mtDNA compared to autosomal genomes
- 360,000 to 200,000 years ago
- *complete replacement of Neanderthal mtDNA and Y chromosome with those of modern humans
- *low percentage of X and autosomal genes transferred
- 80,000 to 40,000 years ago
- *Neanderthal X chromosome and autosomal gene flow to modern humans
Neanderthal and Modern Human Genomes :
?? Why replacement ??
FOR THE Y CHRO. = 5
For the Y chromosome:
- Neanderthal population size = lower than H. sapiens population when the two species were separated (700,000 to 500,000 years ago)
- => Accumulation of deleterious genetic variants
- => Lower fitness
- Simulations of deleterious variant accumulation suggest:
- *1-2% Y chromosome fitness reduction increases replacement probability after 50,000 years to 25-50%
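To get a feel for why a small fitness difference matters, the sketch below uses Kimura's diffusion approximation for the eventual fixation probability of a variant in a haploid Wright-Fisher population. This is a swapped-in textbook formula, not the time-bounded simulation referred to above, so its numbers are not expected to reproduce the 25-50% figure. The starting frequency `p0` and the population size `n` are arbitrary assumptions.

```python
import math


def fixation_probability(p0: float, s: float, n: int) -> float:
    """Kimura's diffusion approximation for a haploid Wright-Fisher population.

    p0: starting frequency of the introgressed (modern human) Y chromosome
    s:  its relative fitness advantage (e.g. 0.01 if the Neanderthal Y is 1% less fit)
    n:  number of Y chromosomes in the population (assumed constant)
    """
    if p0 <= 0.0:
        return 0.0
    if p0 >= 1.0:
        return 1.0
    if abs(s) < 1e-12:
        return p0  # neutral case: fixation probability equals starting frequency
    return (1.0 - math.exp(-2.0 * n * s * p0)) / (1.0 - math.exp(-2.0 * n * s))


# Assumed values: 5% starting frequency, 1000 Y chromosomes.
for s in (0.0, 0.01, 0.02):
    print(f"s = {s:.2f}: P(replacement) ~ {fixation_probability(p0=0.05, s=s, n=1000):.3f}")
```

Under these assumptions, a 1-2% advantage raises the replacement probability well above the neutral expectation, which is qualitatively the point made on the card.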
Neanderthal and Modern Human Genomes: NEANDERTHAL DNA…COVID-19 AND HIV …. = 11…50KB REGION
- Neanderthal DNA
- *50 kb region
- *presence on chromosome 3 raises risk of severe COVID-19
- *haplotype frequency of 30% in individuals of south Asian ancestry
- *rare in east Asian and African populations; 4% Latin Americans, 8% Europeans
- *hypothesised to be important in the immune response to other pathogens but elicits hyper response with COVID-19 infection
- *region encodes a cluster of chemokine receptors
- *includes coreceptor for HIV
- *downregulated in carriers of COVID-19-risk haplotype
- *~27% lower risk of HIV infection
- Risk factor predates HIV pandemic
- *HIV not the selection driver
- *smallpox virus, Vibrio cholerae (cholera)?
- *highest frequency found in regions where cholera is endemic
- *genetics only one factor in developing severe COVID-19
Neanderthal and Modern Human Genomes: NEANDERTHAL DNA…COVID-19 AND HIV …. = 7…75KB REGION
- Neanderthal DNA
- *75 kb region
- *presence on chromosome 12 protects against severe COVID-19
- *haplotype frequency of ~25-30% in most Eurasian populations
- *rare in African populations south of the Sahara; at lower frequency in some populations in the Americas (African or Native American ancestry)
- *several genes encode enzymes (oligoadenylate synthetases; OAS) induced by interferons and ds-RNA
- *downstream pathways that lead to degradation of intracellular ds-RNA and activation of antiviral mechanisms
- *at least one (OAS1) shows positive selection
Grass Genomes – WHAT IS IT? Why Study Them Using Comparative Genomics? = 6
- Grasses (Cereals)
- *provide the bulk of human nutrition
- *feed for animals
- *sustainable energy sources – biofuels
- BUT consumption is close to supply; stocks have plateaued
- Comparative genomics gives insights into resistance to biotic and abiotic stresses; growth; production; yield; other desirable traits
UNDERSTANDING Grass Genomes:
= 8
- Three subfamilies contain major food, fodder and fuel grass species
- *ancestor of all underwent whole genome duplication (WGD; shown)
- *lineage-specific WGDs also occurred (not shown)
- Whole-genome sequence available for at least one species in each subfamily
- ‘Brachypodium distachyon’ (Brachy)
- *representative of subfamily containing barley and wheat
- *relatively small genome for this subfamily
- *1/10 the size of barley and wheat
Brachy Genome Compared to Other Grass
Genomes – Transposable Elements = 8
Brachy
- *genome size and gene number – similar to rice and sorghum
- *maize is larger due to lineage-specific WGD
- *chromosome number – half that of others
- *most LTR retrotransposons located in pericentromeric regions and conserved syntenic breaks
- *also seen in other grass genomes
- *DNA transposons more widely distributed
- *majority associated with gene rich regions
- *also seen in other grass genomes
DIAGRAM: Brachy Genome Compared to Other Grass
Genomes – Transposable Elements
SLIDE 33
LTR = long terminal repeat; STA = gene introns and satellite tandem arrays; cLTRs = complete LTRs; sLTRs = solo LTRs; DNATEs = autonomous DNA transposons; MITEs = miniature inverted-repeat transposable elements; CDS = gene exons; triangles = syntenic breakpoints
Brachy Genome Compared to Other Grass
Genomes – Transposable Elements
‘From comparative analyses on sequenced grass genomes can conclude:’ = 2
- retrotransposon content scales with genome size for all grass genomes
- DNA transposon content is not correlated with genome size for all grass genomes
Brachy Genome Compared to Other Grass
Genomes - Conservation of Gene Families = 7
- 77% - 84% of gene families found in rice, sorghum and Brachy are shared
- *reflects relatively recent common origin
- Lineage-specific genes
- *genes for which no orthologue can be found in related species
- *taxonomic levels = grass, grass subfamily (Pooid), Brachy
- *obvious targets for functional analyses
- *may be involved in distinguishing taxa
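Both ideas on this card, shared gene families and lineage-specific genes, reduce to set operations once each gene has been assigned to a family. A minimal sketch follows, with made-up family identifiers rather than real annotation data. `shared_percent` and `lineage_specific` are hypothetical helper names.

```python
def shared_percent(families: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of each species' gene families that are present in every species."""
    shared = set.intersection(*families.values())
    return {sp: 100.0 * len(shared) / len(fams) for sp, fams in families.items() if fams}


def lineage_specific(species: str, families: dict[str, set[str]]) -> set[str]:
    """Families found in one species with no counterpart in any other species listed."""
    others = set().union(*(fams for sp, fams in families.items() if sp != species))
    return families[species] - others


# Toy family identifiers (hypothetical).
families = {
    "Brachy": {"F1", "F2", "F3", "F4", "B1"},
    "rice": {"F1", "F2", "F3", "F4", "R1", "R2"},
    "sorghum": {"F1", "F2", "F3", "S1"},
}

print(shared_percent(families))
print(lineage_specific("Brachy", families))
```

Which families count as "lineage-specific" depends on which species are included in the comparison, which is why the card lists several taxonomic levels.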
Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny
‘In Brachy: six major duplications of chromosomal regions’ = 4
- In Brachy: six major duplications of chromosomal regions
- *covering 92% of the genome
- *originated from the ancient WGD event before grass families diverged
- *creation of paralogues
Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny:
‘Conserved synteny between Brachy, rice,
sorghum and wheat’ = 5
- Conserved synteny between Brachy, rice, sorghum and wheat
- *59 blocks of collinear orthologous genes
- *covering 99% of the Brachy genome
- *provide a framework for understanding grass genome evolution
- *aid the assembly of sequences from other related grasses
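A statement such as "blocks covering 99% of the genome" is a coverage calculation over block coordinates. The sketch below shows one way to compute it, merging overlapping blocks so that no base is counted twice. The coordinates are invented and `covered_fraction` is a hypothetical helper. Blocks are assumed to be half-open intervals (start, end) on a single coordinate axis.

```python
def covered_fraction(blocks: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by the union of (start, end) blocks."""
    covered = 0
    furthest_end = 0  # right-most position counted so far
    for start, end in sorted(blocks):
        start = max(start, furthest_end)  # ignore the part already counted
        if end > start:
            covered += end - start
            furthest_end = end
    return covered / genome_length


# Toy blocks on a 100-unit genome (hypothetical coordinates).
blocks = [(0, 50), (40, 90), (95, 100)]
print(f"{100 * covered_fraction(blocks, 100):.0f}% covered")
```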
Brachy Genome Compared to Other Grass
Genomes - Conservation of Gene Families Diagram
slide 35
Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny diagram
slide 36