Comparative genomics & Metagenomics Flashcards

1
Q

What is comparative genomics and what is the general motivation behind it?

A

The study of the relationship of genome structure and function across different biological species or strains.

It is done by studying evolution.

Motivation:
Transfer knowledge from and to simpler model organisms

2
Q

What is Sanger sequencing?

A

Chain-termination method: labeled dideoxynucleotides (ddNTPs) stop strand synthesis when they are incorporated

1977: first DNA genome sequenced, that of a phage (phages have small viral genomes that encode only 4-10 genes)

  • capillary-based, semi-automated
  • bottleneck: DNA fragments need to be cloned and amplified in bacteria
  • simultaneous electrophoresis in 96 or 384 independent capillaries

→ sets limits to parallelization
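
The chain-termination idea can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not a model of the real chemistry: it assumes that a labeled ddNTP terminates synthesis once at every position and that fragments are perfectly resolved by length. The function names are hypothetical.

```python
def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return (fragment_length, terminal_label) for every possible stop.

    Toy assumption: a labeled ddNTP ends synthesis after each position of the
    newly synthesized strand, so one fragment exists per position.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length (as electrophoresis does) and read the labels."""
    return "".join(label for _, label in sorted(fragments))


fragments = terminated_fragments("GATTACA")
# The order in which fragments arrive does not matter, only their length.
print(read_sequence(fragments[::-1]))
```

Reading the terminal label of each fragment from shortest to longest recovers the synthesized strand, which is why fragment separation by size is the core of the method.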

3
Q

What is next generation sequencing (NGS)?

A

aka deep-sequencing, high-throughput sequencing

  • 500 Mb – 600 Gb / sequencing run possible
  • major genome centers: 1’000 sequences per second
  • → trick: massively parallel cyclic-array sequencing
4
Q

What are global advantages and disadvantages of NGS relative to Sanger?

A

Global advantages:

  • in vitro construction of sequencing library
  • in vitro clonal amplification
  • array-based sequencing → much higher degree of parallelism
    (hundreds of millions of sequencing reads)
  • array features are immobilized → can be enzymatically manipulated by a single reagent volume
  • lower costs for DNA sequencing (10 - 250 times cheaper)

Disadvantages:

  • short read-length (30 – 350 bp)
  • accuracy at least 10-fold lower than by Sanger sequencing
5
Q

What is 3rd generation sequencing?

A

Single-molecule-sequencing without need to pause between read steps

Goals:

  • higher throughput
  • faster turnaround time (sequencing metazoan genomes in minutes)
  • longer read lengths
  • higher accuracy
  • small amount of starting material (theoretically one molecule needed)
  • low cost (< $100 for one human genome)

PacBio sequencing (SMRT sequencing):
Fluorescence-based detection of dNTP incorporation in real time

Nanopore sequencing:
the change in current depends on the physical and chemical properties of the molecule that passes through the nanopore

6
Q

What did the complete sequence of a human genome do and how was it achieved?

A
  • removes a 20-year-old barrier that has hidden 8% of the genome from sequence-based analysis
  • this 8% of the genome has not been overlooked because of a lack of importance but because of technological limitations
  • used PacBio HiFi and Oxford Nanopore ultralong-read sequencing
7
Q

Why comparative genomics?

A
  • To understand the genomic basis of the present
    • Differences in lifestyle
      • pathogen vs. non-pathogen
      • obligate parasites vs. free-living
    • Host specificity
      • animals vs. plants, plant A vs. plant B, etc
    • In the case of pathogens: this understanding should help us in fighting disease
  • To understand the past
    • How organisms evolved to be what they are

→ Molecular phylogenetics

8
Q

What is molecular phylogenetics?

A

The use of molecular data to establish the relationship between species, organisms or gene families.

9
Q

What are Homologues?

What two categories are there?

A

Sequences/genes that derive from a common ancestral gene

Homology is an all or nothing relation: related genes are not (e.g.) 80% homologous, but 80% similar/identical

Categories:

Orthologous genes: homologs in different species derived by a speciation event

Paralogous genes: homologs in the same species derived by a duplication event
One paralogue of a pair often retains the ancestral gene function → the other is free to evolve and adopt new functions

(thus homologous sequences have same evolutionary ancestor)
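
Because homology is all-or-nothing, the quantity that is actually measured is identity or similarity. A minimal sketch of percent identity for two pre-aligned sequences follows. It assumes equal-length aligned strings with "-" for gaps, and it skips columns where either sequence has a gap, which is only one of several conventions in use. The function name and the sequences are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two made-up aligned sequences that differ at 2 of 10 positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```

The two toy sequences come out as 80% identical. Whether they are homologous is a separate yes/no inference about shared ancestry that the number alone does not settle.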

10
Q

What is Convergence?

A

Convergent evolution creates analogous structures that have similar sequence/form/function, but that were not present in the last common ancestor of those groups

Example:
Lysozyme c of different unrelated organisms evolved convergently. The fact that they all have to be functional in the acidic stomach milieu resulted in a similar amino acid composition in the active site.

11
Q

What is comparative genomics good for in Evolution?

A
  • neutral evolution is "fast" → e.g. pseudogenes cannot be identified as such after a relatively short period of time
  • thus whenever a sequence (DNA, RNA, protein) is conserved, one can conclude that an evolutionary pressure exists (functional constraints)
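
Conservation can be quantified per alignment column. The sketch below scores each column of a toy multiple alignment by the fraction of sequences that share the most common residue. This is a deliberately simple measure (real analyses often use entropy or substitution-aware scores), the sequences are invented, and gap characters are treated like any other symbol.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue in each column."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain equal-length sequences")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Made-up protein alignment of four homologs.
toy_alignment = ["MKTAY", "MKSAY", "MRTAY", "MKTGY"]
scores = column_conservation(toy_alignment)
fully_conserved = [i for i, score in enumerate(scores) if score == 1.0]
print(scores)
print(fully_conserved)
```

In this toy alignment only the first and last columns are fully conserved, so under the reasoning above they would be the first candidates for functionally constrained positions.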
12
Q

What is comparative genomics good for in Function prediction?

A
  • conserved sequences indicate that these regions of the molecule are functionally important!
  • conserved nt or aa most often have similar functions in homologous protein, DNA or RNA molecules
  • with the help of comparative genomics one can predict the functions of molecules based on comparison with the already characterized homolog
  • Comparison of protein domains:
    • identification of a conserved protein domain and its comparison with homologous proteins can help in unraveling the protein function
  • statements about gene function can be made at the genome-wide level
    • this matters because it is very unlikely that one will be able to study all genes/gene products of a particular organism at the functional/structural level
    • even for well-studied organisms (such as E. coli, S. cerevisiae) we do not yet know the role of every gene
13
Q

What did the Homology analysis of the yeast genome show?

A
  • 30% of all genes were previously known
  • function of 30% of all genes could be assigned based on homology search
  • 10% of the genes had homologs in the database; function unknown
  • 30% of all genes (23% + 7%) showed ORFs, but lacked homologs in the database
14
Q

What does synteny mean originally and what is today’s meaning?

A

Synteny (original meaning):
gene loci are on the same chromosome within an individual or species

Conserved (shared) synteny (today’s meaning):

  • describes preserved co-localization of genes on chromosomes of different species
  • two or more genomic regions are derived from a single ancestral genomic region
15
Q

How can genome alignment be visualized and how is it interpreted?

A

Pairwise alignment (dot plot)

  • Match chromosome sequence from species A to species B
  • If the sequences (gene order) were identical, we would see a straight line (identity)
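
A dot plot can be sketched in a few lines. The version below places a dot wherever a short word (here 3 nucleotides) from sequence A matches sequence B exactly. It works on raw nucleotides rather than gene order, and the word size, function names and sequence are illustrative assumptions. Real genome aligners use far more efficient and tolerant matching.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where the word starting at seq_a[i] equals seq_b[j]."""
    # Index every word of seq_b by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


def render(dots: list[tuple[int, int]], n_rows: int, n_cols: int) -> str:
    """Draw the dots as a text grid: rows are seq_a, columns are seq_b."""
    grid = [["." for _ in range(n_cols)] for _ in range(n_rows)]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


chromosome = "ACGTTGCAAG"  # made-up sequence without repeated 3-mers
dots = dot_plot(chromosome, chromosome)
size = len(chromosome) - 3 + 1
print(render(dots, size, size))
```

Comparing a sequence with itself puts every dot on the main diagonal, which is the straight identity line described above. Rearrangements between two species would show up as breaks in that line or shifts away from it.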
16
Q

Where do inversions happen most often?

A

Inversions seem to happen most often around the origin or terminus of replication

17
Q

What is comparative genomics good for in diseases?

A
  • if the function of a disease-causing gene is not known but a homolog in another (micro)organism has been identified, then the function of the disease gene product can be deduced
  • Example: Bloom’s syndrome
    • a mutation in the gene causes a growth defect in humans. The yeast homolog codes for a DNA helicase (which is involved in rRNA transcription and DNA replication)
  • NGS has revolutionized infectious-disease research
  • Bryant et al. (Science, 2016) sequenced 1’080 Mycobacterium abscessus isolates from 517 patients around the world
  • M. abscessus causes human disease in several tissues (e.g. lung; especially in cystic fibrosis patients)
  • Infections were thought to be acquired exclusively from the environment
18
Q

What did the phylogenomic comparisons of sequencing data from three subspecies of M. abscessus reveal?

A
  • whole-genome analysis revealed unexpected similarity at the genomic level between isolates from geographically diverse locations

→ Does not argue for environmental acquisition

  • So far unknown human-to-human transmission (via asymptomatic carriers, long-lived cough aerosols, or infected surfaces)?
  • Dominant clones had more mutations associated with drug resistance and correlated with poorer clinical outcomes
  • Such deep-sequencing-based genome comparisons can potentially capture snapshots of evolution (they also sequenced genomes of M. abscessus over time from the same patient)
19
Q

What is metagenomics and what questions can it help answer?

A
  • Sequencing of DNA from environmental samples
  • allows the study of complex (e.g. microbial) communities
  • no cultivation of species needed

→ Allows you to sequence new organisms that can’t be cultivated in the lab

  • Information about: biodiversity, but also physiology, metabolic pathways…
    • Environmental sample → sequencing of all DNA
    • Who is there? Biodiversity characterization → new organisms
    • Who does what? Physiological characterization → new genes
  • 90-95% of microorganisms remain uncultivable in the laboratory
  • → Tremendous knowledge gap about biodiversity
  • at the last count, 1.8 million species were known to science
  • metagenomics promises to change our view of life on Earth
  • it is expected that billions of life forms exist that we never knew about
20
Q

What is the main principle (workflow) of metagenomics?

A
  1. Sample collection
  2. Whole DNA extraction
  3. Whole DNA amplification
  4. Whole DNA sequencing
  5. Data analysis
21
Q

What problems does metagenomics have?

A
  • sequences are often fragmentary
  • sequences are often highly divergent
  • lack of reference genomes
  • organism of origin is unknown
  • ORFs have to be predicted ab initio
  • huge data
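
In its most naive form, ab initio ORF prediction means scanning a sequence for a start codon followed in frame by a stop codon. The sketch below does only that. It assumes the standard ATG start and TAA/TAG/TGA stops, scans the forward strand only, and applies a hypothetical minimum length. Real gene finders rely on statistical models, and this is not one of them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) of forward-strand ORFs.

    start is inclusive, end is exclusive, and the stop codon is included.
    Only the first ATG of each open stretch is used as the start.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None
    return orfs


fragment = "CCATGAAATTTGGGTAACC"  # made-up metagenomic fragment
print(find_orfs(fragment))
```

An ORF that starts inside a fragment but runs past its end is never closed by a stop codon and is therefore not reported here, which shows how fragmentary sequences make ORF prediction harder.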
22
Q

What is the Marine Genome Sequencing Project?

A

Measuring the genetic diversity of ocean microbes

  • almost 1,000 genomes for uncultivated microbes
  • 6.12 million new proteins uncovered
  • 1,700 totally unique large protein families (mainly viral)
23
Q

What is MetaHIT?

A

Metagenomics of the Human Intestinal Tract

  • Funded by the European Commission; started January 1, 2008 and lasted for 4 years; scientists from 8 countries
  • determine whether a “core” (standard) human microbiome exists
  • establish associations between the genes of the human intestinal microbiota and human health and disease
  • faecal samples from 124 human adults
    • healthy individuals
    • sick individuals
      • Inflammatory Bowel Disease (Crohn’s disease, Ulcerative colitis)
      • obesity
  • generated a total of 576.7 Gb of sequence from DNA prepared from stool samples (an average of 4.5 Gb of sequence per sample)
  • 3.3 million different microbial genes in the gut of the individuals (150-fold more than in our own genome)
  • each individual carries 536,000 microbial genes
    → ~160 microbial species
  • in total: 1’000-1’150 bacterial species found in the 124 individuals
  • Even for the most common 57 species present in > 90% of individuals, the inter-individual variability was between 12- and 2’187-fold

The expected final achievements of the project should be the discovery of associations between bacterial genes and human disease → preventive and personalized medicine
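
The figures quoted above can be sanity-checked with simple arithmetic. The sketch below uses only numbers from this card. The implied human gene count and the genes per species are back-calculated here and are not stated in the source, so treat them as rough consistency checks.

```python
total_sequence_gb = 576.7        # total sequence generated
n_samples = 124                  # faecal samples
microbial_genes = 3_300_000      # gene catalogue size
fold_over_human = 150            # "150-fold more than in our own genome"
genes_per_individual = 536_000   # microbial genes per individual
species_per_individual = 160     # approximate species per individual

mean_gb_per_sample = total_sequence_gb / n_samples
implied_human_genes = microbial_genes / fold_over_human
genes_per_species = genes_per_individual / species_per_individual

print(f"mean sequence per sample: {mean_gb_per_sample:.2f} Gb")
print(f"implied human gene count: {implied_human_genes:,.0f}")
print(f"implied genes per species: {genes_per_species:,.0f}")
```

The simple mean comes out slightly above the quoted average of 4.5, at roughly 4.65 Gb per sample. The 150-fold figure implies about 22,000 human genes, and 536,000 genes spread over about 160 species gives about 3,350 genes per species, which is a plausible bacterial genome size.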

24
Q

What does the gut microbiome affect?

A

Affects childhood growth:

  • childhood undernutrition accounts for ~50% of all deaths in children under the age of five worldwide
  • childhood malnutrition has been associated with an altered microbiota
  • modifying the gut microbial communities in mice can alleviate diet-associated growth deficits
  • It is becoming increasingly apparent that our diet, gut microbiota and health are inextricably linked.

Gut microbiota influences obesity:

  • The gut microbiota co-develops with the host (rats on a high fat diet) and modulates whole-body metabolism by affecting energy balance
  • acetate produced from dietary nutrients by the gut microbiota signals to the brain
  • triggers secretion of the ‘hunger hormone’ ghrelin from stomach → increased food intake
  • also potentiates glucose-stimulated insulin secretion from β-cells in the pancreas, promoting calorie storage and fat gain
  • mechanistic link between onset of obesity and the gut microbiome

Gut microbiota regulate neuronal function and fear extinction learning:

  • Single-nucleus RNA-Seq revealed changes in microglia and neurons related to synapse organization & assembly
  • Metabolomics revealed 4 metabolites to be reduced in germ-free mice:
    • phenyl sulfate
    • pyrocatechol sulfate
    • 3-(3-sulfooxyphenyl)propanoic acid
    • indoxyl sulfate
  • → new insight into the co-evolved relationship between the microbiota, the nervous system and mammalian behavior
25
Q

What did metagenomics tell us about neanderthals?

A
  • bacteria collected from Neanderthal teeth show that our close cousins ate so many roots, nuts, or other starchy foods that they dramatically altered the type of bacteria in their mouths
  • ancestors of both humans and Neanderthals were cooking lots of starchy foods at least 600,000 years ago (long before the invention of agriculture 10,000 years ago)
  • analyzed 124 dental biofilm metagenomes (humans, including Neanderthals, Late Pleistocene to present-day modern humans, chimpanzees, gorillas, New World howler monkeys)
  • core microbiome of primarily biofilm structural taxa has been maintained throughout African hominid evolution
  • microbial profiles of both Neanderthals and modern humans are highly similar (but very different from chimpanzees)
  • evidence of shared genetic diversity in the oral bacteria of Neanderthal and Upper Paleolithic modern humans (older than 14 ka) that is not observed in later modern human populations
  • supporting evidence for earlier admixture and interaction in Ice Age Europe
  • amylase binding (by oral streptococci) is an apparent Homo-specific trait (evidence for starch-rich foods in early Homo evolution > 600 ka ago)
26
Q

How did the view of the tree of life change and through what?

A
  • used new genomic data from 1,011 uncultivated and little-known organisms, together with published sequences
  • dramatically expanded version of the tree of life
  • results reveal the dominance of bacterial diversification
  • Contributing to this expansion in genome numbers are single cell genomics and metagenomics studies