Comparative Genomics I & II (Lectures 6&7) Flashcards

1
Q

COMPARATIVE GENOMICS: – It’s All About
Similarities and Differences

In the genomes of contemporary related organisms, we see the conservation of: 2

A

In the genomes of contemporary related organisms, we see the conservation of:

    • sequences coding for proteins and functional
      RNAs from a last common ancestor
    • sequences controlling the regulation of genes
      that have similar patterns of expression
2
Q

Comparative Genomics : divergence? conservation? recognition?
= 4

A
  1. Divergence is seen between sequences that code for proteins, functional RNAs and regulatory regions
      • responsible for differences between species
  2. Conservation of sequence implies conservation of function
  3. Recognition of orthologues and paralogues to make meaningful comparisons
3
Q

Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask

Comparisons ACROSS LONG phylogenetic
distances, e.g. 1 billion years since separation: GIVES US? 2

A

Comparisons across long phylogenetic
distances, e.g. 1 billion years since separation:

  1. *give information on types and numbers of genes in functional categories
  2. *show little conservation of gene order and regulatory sequences
4
Q

Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask:

Comparisons ACROSS MODERATE distances, e.g. 70-100 million years since separation show: 3

A

Comparisons across moderate distances, e.g. 70-100 million years since separation show:

  1. *functional and non-functional DNA is found in conserved regions
  2. *functional sequences will have changed
    less than non-functional DNA
  3. *purifying (aka negative) selection –
    removal of deleterious mutations
5
Q

UNDERSTAND PHYLOGENETIC TREES

A

DIAGRAM AND IMAGE ON SLIDE 4

6
Q

Comparative Genomics – It’s All About
Similarities and Differences AND the
Questions You Ask:

Comparisons ACROSS SHORT DISTANCES, e.g. 5 million years since separation give: 3

A

Comparisons across short distances, e.g. 5 million years since separation give:

  1. *information about what sequences are
    responsible for making organisms unique
  2. *differences due to positive selection (aka Darwinian selection)
  3. *retention of mutations that benefit an
    organism
7
Q

Comparative Genomics – What is
Compared? = 19

A
  1. Repeat regions
    1. *transposable elements, microsatellites
  2. Markers
    1. *SNPs
  3. Non-protein coding regions
    1. *RNA-only genes
    2. *gene deserts
  4. Duplications
    1. *whole genome
    2. *segmental
    3. *gene
    4. *gene families
  5. Base composition, e.g. %GC content
    1. *overall
    2. *coding regions and non-coding regions
  6. Gene number in functional categories
  7. Favourite genes
  8. Chromosome rearrangements
  9. Gene order
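
Base composition (item 5 above) is simple to compute. The following is a minimal sketch, assuming sequences are plain strings of A/C/G/T; the function name `gc_content` and the two toy sequences are hypothetical and not taken from the lectures.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical toy sequences, not real genomic data.
coding_region = "ATGGCCGCGTGA"
non_coding_region = "ATATTTAAGCAT"
print(round(gc_content(coding_region), 1))      # 66.7
print(round(gc_content(non_coding_region), 1))  # 16.7
```

Running the same function on a whole genome, on its coding regions only, and on its non-coding regions only gives the "overall" and "coding vs non-coding" comparisons listed above.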
8
Q

Synteny?

A

*genes or genetic elements located
on the same chromosome

*may or may not be linked

9
Q

Conserved (aka Shared) synteny?

A

*conservation of synteny of
orthologous genes between two or
more different organisms

*extent is inversely proportional to
length of time since divergence
from the ancestral locus
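
One way to picture conserved synteny is to group orthologues by the chromosome they sit on in each species. This is only a sketch under simplifying assumptions: orthologues are identified by sharing a gene name, and the gene-to-chromosome maps are hypothetical.

```python
from collections import defaultdict


def conserved_synteny_groups(species_a, species_b):
    """Group orthologous genes that share a chromosome in both species.

    Each argument maps gene name -> chromosome name. A gene name present in
    both dicts is treated as an orthologous pair (a simplifying assumption).
    """
    groups = defaultdict(list)
    for gene in sorted(species_a.keys() & species_b.keys()):
        groups[(species_a[gene], species_b[gene])].append(gene)
    # Keep only chromosome pairs that carry two or more shared orthologues.
    return {pair: genes for pair, genes in groups.items() if len(genes) >= 2}


# Hypothetical gene-to-chromosome maps for two species.
species_x = {"A": "chr1", "B": "chr1", "C": "chr1", "D": "chr2"}
species_y = {"A": "chr5", "B": "chr5", "C": "chr7", "D": "chr7"}
print(conserved_synteny_groups(species_x, species_y))
# {('chr1', 'chr5'): ['A', 'B']}
```

Note that this checks only co-location on a chromosome, not gene order; order is a separate question.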

10
Q

Collinearity?

A

Collinearity
*conservation of gene (or marker) order along a chromosomal segment in different
species

*Note: in much present-day usage, synteny has same meaning as collinearity
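
Collinearity can be checked by comparing the order of the genes that two segments share. The sketch below assumes each gene appears once per segment and treats an inverted segment as collinear as well; the function name and the gene orders for species X, Y and Z are hypothetical.

```python
def is_collinear(order_a, order_b):
    """Return True if the genes shared by two segments appear in the same order.

    A segment read in the opposite direction (an inversion of the whole
    segment) is also accepted. At least two shared genes are required.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]
    return len(a) >= 2 and (a == b or a == b[::-1])


# Hypothetical marker orders along one chromosomal segment.
species_x = ["A", "B", "C", "D", "E"]
species_y = ["A", "B", "D", "E"]        # gene C lost, order otherwise kept
species_z = ["A", "D", "B", "E", "C"]   # order rearranged

print(is_collinear(species_x, species_y))  # True
print(is_collinear(species_x, species_z))  # False
```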

11
Q

Collinearity diagram

A

A-E = genes or markers; X-Z = species; coloured boxes = coding regions (slide 8)

12
Q

Synteny, Conserved Synteny and Collinearity: SIGNIFICANCE? = 2

A
  1. *in genomes of some grasses see high conservation of synteny and collinearity
    1. *knowing the genome sequence of a species with a small genome facilitates mapping and isolating genes coding for desirable traits from species with larger genomes
  2. *in medicine
    1. *loci with medical or phenotypic consequences can be recognised because of linkage to a cluster of syntenic loci
13
Q

Factors affecting synteny, conserved synteny and collinearity?

A

*gene loss, multiple rounds of gene
duplications, chromosomal
rearrangements (fusions, splits,
inversions, reciprocal translocations)

*mask sequences that have been
derived from a common ancestral
sequence

14
Q

Synteny, Conserved Synteny and Collinearity DIAGRAM

A

SLIDE 9

15
Q

Evolutionary Convergence: WHAT IS PHENOTYPIC CONVERGENCE?

A
  1. *the independent evolution of similar or identical traits
    in distantly related species due to selective
    pressures, for example:
    1. *eyes
    2. *echolocation in dolphins and bats
    3. *particular protein properties
    4. *biochemical pathways
16
Q

Evolutionary Convergence: WHAT DOES PHENOTYPIC CONVERGENCE DO? 3

A

  1. IDENTIFY THE DETERMINANTS leading to the independent origins of adaptive traits
  2. *COMPARING DETERMINANTS gives INFORMATION on the GENOMIC (and ENVIRONMENTAL) background in LIMITING OR FURTHERING ADAPTATIVE INFORMATION
  3. *“evolutionary enablers”
17
Q

Evolutionary Significance of Convergent Recruitment….5

A
  1. The presence of genes able to evolve a new function enhances the chances that a given group of organisms can evolve a new trait
    1. *but only a few genes have the potential to make a specific phenotypic change
  2. *“evolutionary enablers”
  3. AND….
  4. The absence of these genes (evolutionary enablers) in other groups of organisms can hinder the acquisition of a new trait
18
Q

Evolutionary Significance of Convergent Recruitment…DIAGRAMS

A

WITH AND WITHOUT EVOLUTIONARY ENABLERS…

SLIDE 11 AND 12

19
Q

Evolutionary Convergence – How Does It Happen?

  • Phenotypic convergence may result from:
A
  1. *alterations of different loci -> changes in different enzymes can lead to similar phenotypes
  2. *alteration of homologous genes from different taxonomic groups -> CONVERGENT RECRUITMENT
20
Q

Evolutionary Convergence – How Does It Happen?

New functions usually evolve by the modification of pre-existing genes
*two criteria need to be met:

A

New functions usually evolve by the modification of pre-existing genes
*two criteria need to be met:

1) no deleterious effect through loss of ancestral function

2) expression profiles of the genes and kinetics of the proteins they encode must be suitable for new function

21
Q

Evolutionary Convergence – How Does It Happen? KREBS CYCLE DIAGRAM

A

At least 60 independent origins of the
C4 photosynthetic pathway have
occurred in the flowering plants – an
excellent example of convergent
evolution

SLIDE 13

22
Q

Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages:

‘Evolution of phosphoenolpyruvate carboxylase (PEPC) gene in groups
of grasses (g) and sedges (s)’ = 4

A
  1. *PEPC = enzyme catalysing first reaction of C4 photosynthesis
  2. *letters and numbers after g and s = PEPC gene lineages
  3. *red and blue triangles = PEPC gene lineages important in C4
    grasses and sedges

DIAGRAM IN SLIDE 14

23
Q

Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages

‘Evolution of phosphoenolpyruvate carboxylase (PEPC) gene in groups
of grasses (g) and sedges (s)’ FOUND? = 3

A
  1. *grasses – same lineage of PEPC gene (out of 6 lineages) used
    > 8 times during the evolution of C4 grasses
  2. *sedges – same lineage of PEPC gene (out of 5 lineages and distinct from that used by grasses) used > 5 times during
    evolution of C4 sedges

  3. This implies the predisposition of these gene lineages to evolve novel adaptive
    function
24
Q

Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages

‘When there is phenotypic convergence due to convergent recruitment:’ 7

A
  1. When there is phenotypic convergence due to convergent recruitment:
    1. *may see identical substitutions in the recruited
      genes in different lineages of organisms
      1. *e.g. C4 PEPC genes in grasses (b) and sedges (c) Note: protein sequences shown
      2. *likely to occur because effects of the resulting amino acid substitutions will be comparable
      3. *limitations to substitutions that can occur
      4. *allow emergence of functional, optimised protein
  2. Note: convergent substitutions do not explain all the convergent phenotypes seen – non-convergent (= divergent) substitutions also play a role in these adaptive changes
25
Q

Convergent Recruitment – e.g. PEPC Genes in C4 Monocot Lineages IMPORTANT SLIDE DIAGRAM

A

SLIDE 15

26
Q

Comparative Genomics of Prokaryotes = 5

A
  1. Smaller genomes found in organisms living in restricted environments – e.g. ‘Nanoarchaeum equitans’ lives as an obligate symbiont attached to another archaeon
  2. Larger genomes in organisms living in complex habitats – e.g. ‘Bradyrhizobium japonicum’, soil bacterium that forms symbiotic relationship with plants (nitrogen-fixing root nodules)
  3. Why….?
  4. Constant environments may allow
    survival with fewer genes
  5. Changeable habitats e.g. soils –
    many genes would be required for
    survival even though they all may not
    be used all the time
27
Q

Comparative Genomics of Prokaryotes – e.g. Influence of HGT and IGD on Protein Family Expansion : 5

A
  1. Prokaryotes show rapid adaptation
    1. *in part due to the acquisition of genes by horizontal gene transfer (HGT)
      1. *increases genome size
    2. *in part due to intrachromosomal gene duplication (IGD)
      1. *increases genome size
28
Q

Comparative Genomics of Prokaryotes – e.g. Influence of HGT and IGD on Protein Family Expansion DIAGRAM

A

SLIDE 18

29
Q

Comparative Genomics of Prokaryotes –

e.g. Influence of HGT and IGD on Protein Family Expansion

A

110 genomes representing eight distinct clades of prokaryotes compared

*different genome sizes – small (blue), average (red), large (purple)

30
Q

Comparative Genomics of Prokaryotes –

e.g. Influence of HGT and IGD on Protein Family Expansion

‘SEARCHED FOR:’ 2

A
  1. *paralogues – homologues acquired through IGD – identical in sequence to endogenous gene and tandemly arranged
  2. *xenologues – homologues acquired through HGT – randomly arranged and showing sequence variability to the endogenous gene
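
The two search criteria above (sequence identity to the endogenous gene, and tandem versus scattered arrangement) can be sketched as a simple rule. This is an illustration only: the function name, the 0.99 identity cutoff and the one-gene tandem window are hypothetical choices, not the thresholds used in the study.

```python
def classify_homologue(identity, distance_in_genes,
                       identity_cutoff=0.99, tandem_window=1):
    """Label an extra family member as a putative paralogue or xenologue.

    identity          -- fraction of identical positions vs the endogenous gene (0-1)
    distance_in_genes -- how many genes away from the endogenous copy it lies
    The cutoffs are hypothetical defaults chosen for illustration.
    """
    tandem = distance_in_genes <= tandem_window
    near_identical = identity >= identity_cutoff
    if tandem and near_identical:
        return "paralogue"   # consistent with intrachromosomal duplication (IGD)
    if not tandem and not near_identical:
        return "xenologue"   # consistent with horizontal gene transfer (HGT)
    return "ambiguous"       # mixed signals, needs further evidence


print(classify_homologue(identity=1.0, distance_in_genes=1))    # paralogue
print(classify_homologue(identity=0.62, distance_in_genes=840))  # xenologue
```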
31
Q

Comparative Genomics of Prokaryotes –

e.g. Influence of HGT and IGD on Protein Family Expansion

FOUND: 5

A
  1. Found: 88% - 98% of expansions due to HGT
    1. *both small and large genomes
    2. *larger genomes contained the most xenologues
      (contrary to previous thinking)
  2. paralogues have higher expression levels than xenologues
  3. xenologues have higher expression levels than singletons
32
Q

WHO ARE NEANDERTHALS? = 3

A
  1. Neanderthals and the ancestors of modern humans diverged
    270,000 to 400,000 years ago
  2. *Neanderthals = sister group to modern humans
  3. Neanderthals and modern humans
    coexisted in Europe 30,000-45,000 years ago
33
Q

Why sequence the Neanderthal genome?
= 2

A
  1. *record changes (mutations) that have become fixed or risen to high frequency in humans in the last several hundred thousand years
  2. *identify genes that have been affected by positive selection since Neanderthals and humans diverged
34
Q

UNDERSTANDING Neanderthal and Modern Human Genomes ..STUDY

What DNA is used? from where? = 9

A
  1. Majority of DNA used in the 2010 study was extracted from
    1. *bones of three females
    2. *Vindija cave (Croatia)
  2. Other samples from bones of Neanderthals in
    1. *Spain
    2. *Germany
    3. *Russia
  3. More recently other ancient
    human genomes have been
    sequenced
  4. Many more comparative analyses
    followed
35
Q

Neanderthal and Modern Human Genomes INVESTIGATION PROCESS:

CONTROLS? USED? ASSEMBLED? COMPARED WITH? = 15

A
  1. Ran numerous controls and checks to exclude contamination
    1. *microbial DNA
    2. *modern human DNA
  2. Used NGS technologies
  3. Assembled sequence using reference genomes
    1. *chimpanzee
    2. *modern human genome (Venter)
  4. Compared sequence with
    1. *chimp
    2. *Venter
    3. *South African (San)
    4. *West African (Yoruba)
    5. *Papua New Guinean
    6. *Chinese (Han)
    7. *Western European (French)
36
Q

Neanderthal and Modern Human Genomes …WHAT HAPPENED TO THE GENOME? = 8

A
  1. One-third of genome not sequenced
    1. *DNA not of high enough quantity and/or quality
  2. Genomes are 99.84% identical
    1. *78 nucleotide substitutions in protein-coding genes in 300,000 years
      1. *modern humans have derived state of these genes
      2. *Neanderthals have ancestral (chimp-like) state
    2. *genes include those coding for proteins involved in skin physiology
      1. *not clear what effect these changes have at the phenotype level
37
Q

Neanderthal and Modern Human Genomes …COMPARED SNPs between Neanderthals and present-day humans…= 7

A
  1. Compared SNPs between Neanderthals and
    1. *present-day humans
      1. *2 European Americans
      2. *2 East Asians
      3. *4 West Africans
    2. *diverse modern humans (French, Yoruba, Han, San, Papuan)
    3. *chimps
38
Q

Neanderthal and Modern Human Genomes:

Compare SNPs between Neanderthals and present-day humans…FINDINGS = 3

A
  1. Neanderthals share more SNPs with
    Europeans and East Asians than sub-Saharan Africans, suggesting….
  2. gene flow from Neanderthals to modern humans after modern humans left Africa, but before they spread across Eurasia
  3. 1-4% of modern Eurasian genomes derived from Neanderthal genomes
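
Excess SNP sharing of this kind is commonly quantified with the ABBA-BABA test (Patterson's D statistic). The card does not say which test the lecture had in mind, so treat the following as a sketch of that general technique; the function name, the sample order and the four toy sites are hypothetical.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA) from per-site alleles.

    Each site is a tuple of single-letter alleles in the order
    (P1, P2, P3, outgroup), e.g. (African, Eurasian, Neanderthal, chimp).
    The outgroup allele is taken as ancestral.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p3 == out:
            continue  # Neanderthal carries the ancestral allele: uninformative
        if p2 == p3 and p1 == out:
            abba += 1  # derived allele shared by P2 and Neanderthal
        elif p1 == p3 and p2 == out:
            baba += 1  # derived allele shared by P1 and Neanderthal
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


# Hypothetical toy sites, not real genotype data.
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # derived only in Neanderthal: ignored
]
print(round(d_statistic(toy_sites), 3))  # 0.333
```

A D value near zero is expected without gene flow; a positive value indicates that P2 shares more derived alleles with the Neanderthal than P1 does.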
39
Q

Comparisons of Neanderthal and modern
human Y chromosome (father to son
inheritance)

and mitochondrial DNA (mtDNA;
mother to all progeny)

FINDINGS = 6

A
  1. Different evolutionary history for Y
    chromosome and mtDNA compared to
    autosomal genomes
  2. 360,000 to 200,000 years ago
      • complete replacement of Neanderthal mtDNA and Y chromosome with those of modern humans
      • low percentage of X and autosomal genes transferred
  3. 80,000 to 40,000 years ago
      • Neanderthal X chromosome and autosomal gene flow to modern humans
40
Q

Neanderthal and Modern Human Genomes :
?? Why replacement ??

FOR THE Y CHRO. = 5

A

For the Y chromosome:

  1. Neanderthal population size = lower than H. sapiens population when the two species were separated (700,000 to 500,000 years ago)
    ===>
  2. Accumulation of deleterious genetic variants
    ===>
  3. Lower fitness
  4. Simulations of deleterious variant accumulation suggest:
    1. *a 1-2% Y chromosome fitness reduction increases replacement probability after 50,000 years to 25-50%
41
Q

Neanderthal and Modern Human Genomes: NEANDERTHAL DNA…COVID-19 AND HIV …. = 11…50KB REGION

A
  1. Neanderthal DNA
    • 50 kb region
    • presence on chromosome 3 raises risk of severe COVID-19
      • haplotype = frequency of 30% in individuals of south Asian ancestry
      • rare in east Asian and African populations; 4% Latin Americans, 8% Europeans
    • hypothesised to be important in the immune response to other pathogens but elicits hyper response with COVID-19 infection
    • region encodes a cluster of chemokine receptors
      • includes coreceptor for HIV
        • downregulated in carriers of COVID-19-risk haplotype
        • ~27% lower risk of HIV infection
  2. Risk factor predates HIV pandemic
    • HIV not the selection driver
    • smallpox virus, Vibrio cholerae (cholera)?
    • highest frequency found in regions where cholera is endemic
  3. Genetics only one factor in developing severe COVID-19

42
Q

Neanderthal and Modern Human Genomes: NEANDERTHAL DNA…COVID-19 AND HIV …. = 7…75KB REGION

A
  1. Neanderthal DNA
    • 75 kb region
    • presence on chromosome 12 protects against severe COVID-19
      • haplotype = frequency of ~25-30% in most Eurasian populations
      • rare in African populations south of the Sahara; at lower frequency in some
        populations in the Americas (African or Native American ancestry)
    • several genes encode enzymes (oligoadenylate synthetases; OAS) induced by interferons and ds-RNA
      • downstream pathways that lead to degradation of intracellular ds-RNA and activation of antiviral mechanisms
      • at least one (OAS1) shows positive selection
43
Q

Grass Genomes – WHAT IS IT? Why Study Them Using Comparative Genomics? = 6

A
  1. Grasses (Cereals)
  2. *provide the bulk of human nutrition
  3. *feed for animals
  4. *sustainable energy sources – biofuels
  5. BUT consumption is close to supply; stocks have plateaued
  6. Comparative genomics gives insights into resistance to biotic and abiotic stresses; growth; production; yield; other desirable traits
44
Q

UNDERSTANDING Grass Genomes:
= 8

A
  1. Three subfamilies contain major food, fodder and fuel grass species
    1. *ancestor of all underwent whole genome duplication (WGD; shown)
    2. *lineage-specific WGDs also occurred (not shown)
  2. Whole-genome sequence available for at least one species in each subfamily
  3. ‘Brachypodium distachyon’ (Brachy)
    1. *representative of subfamily containing barley and wheat
    2. *relatively small genome for this subfamily
      1. *1/10 the size of barley and wheat
45
Q

Brachy Genome Compared to Other Grass
Genomes – Transposable Elements = 8

A

Brachy
  1. *genome size and gene number – similar to rice and sorghum
    1. *maize is larger due to lineage-specific WGD
  2. *chromosome number – half that of others
  3. most LTR retrotransposons located in
    pericentromeric regions and conserved syntenic breaks
    1. *also seen in other grass genomes
  4. *DNA transposons more widely distributed
    1. *majority associated with gene rich regions
    2. *also seen in other grass genomes
46
Q

DIAGRAM: Brachy Genome Compared to Other Grass
Genomes – Transposable Elements

A

SLIDE 33

LTR = long terminal repeat; STA = gene introns and satellite
tandem arrays; cLTRs = complete LTRs; sLTRs = solo LTRs; DNATEs = autonomous DNA transposons; MITES = miniature
inverted-repeat transposable elements; CDS = gene exons;
triangles = syntenic breakpoints

47
Q

Brachy Genome Compared to Other Grass
Genomes – Transposable Elements

‘From comparative analyses on sequenced grass genomes can conclude:’ = 2

A
  1. retrotransposon content scales with genome
    size for all grass genomes
  2. DNA transposon content is not correlated with
    genome size for all grass genomes
48
Q

Brachy Genome Compared to Other Grass
Genomes - Conservation of Gene Families = 7

A
  1. 77% - 84% of gene families found in rice, sorghum and Brachy are shared
    1. *reflects relatively recent common origin
  2. Lineage-specific genes
    1. *genes for which no orthologue can be found in related species
    2. *taxonomic levels = grass, grass subfamily (Pooid), Brachy
    3. *obvious targets for functional analyses
    4. *may be involved in distinguishing taxa
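
Shared and lineage-specific gene families can be expressed as set operations. The sketch below assumes each species is represented by a set of gene-family identifiers; the function names and the family labels (F1 to F5) are hypothetical, and the 77% - 84% figure above may have been computed with a different denominator.

```python
def shared_fraction(families_by_species):
    """Fraction of all gene families that are present in every species."""
    family_sets = list(families_by_species.values())
    shared = set.intersection(*family_sets)
    total = set.union(*family_sets)
    return len(shared) / len(total)


def lineage_specific(families_by_species, species):
    """Families found in one species and in none of the others."""
    others = set().union(
        *(fams for name, fams in families_by_species.items() if name != species)
    )
    return families_by_species[species] - others


# Hypothetical gene-family identifiers, not real annotation data.
families = {
    "rice": {"F1", "F2", "F3"},
    "sorghum": {"F1", "F2", "F4"},
    "Brachy": {"F1", "F2", "F5"},
}
print(shared_fraction(families))             # 0.4
print(lineage_specific(families, "Brachy"))  # {'F5'}
```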
49
Q

Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny

‘In Brachy: six major duplications of chromosomal regions’ = 4

A
  1. In Brachy: six major duplications of chromosomal regions
    1. *covering 92% of the genome
    2. *originated from the ancient WGD event before grass families diverged
    3. *creation of paralogues
50
Q

Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny:

‘Conserved synteny between Brachy, rice,
sorghum and wheat’ = 5

A
  1. Conserved synteny between Brachy, rice, sorghum and wheat
    1. *59 blocks of collinear orthologous
      genes
    2. *covering 99% of the Brachy genome
    3. *provide a framework for
      understanding grass genome evolution
    4. *aid the assembly of sequences from
      other related grasses
51
Q

Brachy Genome Compared to Other Grass
Genomes - Conservation of Gene Families Diagram

A

slide 35

52
Q

Brachy Genome Compared to Other Grass
Genomes - Conservation of Synteny diagram

A

slide 36