Comparative genomics & Metagenomics Flashcards
What is comparative genomics and what is the general motivation behind it?
The study of the relationship of genome structure and function across different biological species or strains.
It is done by studying evolution.
Motivation:
Transfer knowledge from and to simpler model organisms
What is Sanger sequencing?
Chain termination method: labeled dideoxynucleotides stop strand synthesis
1977: first sequenced DNA genome, that of a phage (small viral genomes that encode only 4-10 genes)
- capillary-based, semi-automated
- bottleneck: DNA fragments need to be cloned and amplified in bacteria
- simultaneous electrophoresis in 96 or 384 independent capillaries
→ sets limits to parallelization
What is next generation sequencing (NGS)?
aka deep-sequencing, high-throughput sequencing
- 500 Mb – 600 Gb / sequencing run possible
- major genome centers: 1,000 sequences per second
- trick: massively parallel cyclic-array sequencing
What are global advantages and disadvantages of NGS relative to Sanger?
Global advantages:
- in vitro construction of sequencing library
- in vitro clonal amplification
- array-based sequencing → much higher degree of parallelism (hundreds of millions of sequencing reads)
- array features are immobilized → can be enzymatically manipulated by a single reagent volume
- lower costs for DNA sequencing (10 - 250 times cheaper)
Disadvantages:
- short read-length (30 – 350 bp)
- accuracy at least 10-fold lower than with Sanger sequencing
What is 3rd generation sequencing?
Single-molecule-sequencing without need to pause between read steps
Goals:
- higher throughput
- faster turnaround time (sequencing metazoan genomes in minutes)
- longer read lengths
- higher accuracy
- small amount of starting material (theoretically one molecule needed)
- low cost (< $100 for one human genome)
PacBio sequencing (SMRT sequencing):
Fluorescence-based detection of dNTP incorporation in real time
Nanopore sequencing:
The change in current depends on the physical and chemical properties of the molecule passing through the nanopore
What did the complete sequence of a human genome do and how was it achieved?
- removes a 20-year-old barrier that has hidden 8% of the genome from sequence-based analysis
- this 8% of the genome has not been overlooked because of a lack of importance but because of technological limitations
- used PacBio HiFi and Oxford Nanopore ultralong-read sequencing
Why comparative genomics?
- To understand the genomic basis of the present
- Differences in lifestyle
- pathogen vs. non-pathogen
- obligate parasites vs. free-living
- Host specificity
- animals vs. plants, plant A vs. plant B, etc
- In the case of pathogens: this understanding should help us in fighting disease
- To understand the past
- How organisms evolved to be what they are
→ Molecular phylogenetics
What is molecular phylogenetics?
The use of molecular data to establish the relationship between species, organisms or gene families.
What are Homologues?
What two categories are there?
Sequences/genes that derive from a common ancestor-gene
Homology is an all-or-nothing relation: related genes are not (e.g.) 80% homologous, but 80% similar/identical
Categories:
Orthologous genes: homologs in different species derived by a speciation event
Paralogous genes: homologs in the same species derived by a duplication event
One paralogue of a pair often retains the ancestral gene function → the other is free to evolve and adopt new functions
(thus homologous sequences have the same evolutionary ancestor)
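The distinction between homology (yes/no) and similarity/identity (a percentage) can be made concrete with a small calculation. The following is a minimal sketch, assuming the two sequences are already aligned and use "-" for gaps; the function name and the toy sequences are hypothetical and only serve as illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned toy sequences (not real genes).
seq_a = "ATGCCGTA-A"
seq_b = "ATGACGTATA"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical")
```

The number describes how similar the two sequences are; whether they are homologous is a separate inference about common ancestry that the percentage alone does not settle.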
What is Convergence?
Convergent evolution creates analogous structures that have similar sequence/form/function, but that were not present in the last common ancestor of those groups
Example:
Lysozyme c of different unrelated organisms evolved convergently. The fact that they all have to be functional in the acidic stomach milieu resulted in a similar amino acid composition in the active site.
What is comparative genomics good for in Evolution?
- neutral evolution is "fast" → e.g. pseudogenes cannot be identified as such after a relatively short period of time
- thus whenever a sequence (DNA, RNA, protein) is conserved, one can conclude that an evolutionary pressure exists (functional constraints)
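One simple way to look for conserved positions is to score each column of a multiple sequence alignment. The sketch below assumes an existing alignment of equal-length sequences and uses the frequency of the most common residue per column as the score; this is only one of many possible conservation measures, and the function name and toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """For each alignment column, return the fraction of sequences
    that carry the most common character in that column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        # Gap characters are counted like any other character in this sketch.
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores


# Hypothetical toy protein alignment (not real sequences).
toy_alignment = [
    "MKTAYL",
    "MKSAYL",
    "MRTAYV",
    "MKTGYL",
]
scores = column_conservation(toy_alignment)
conserved_positions = [i for i, s in enumerate(scores) if s == 1.0]
print(scores)
print(conserved_positions)
```

Fully conserved columns are candidates for positions under functional constraint; with only a handful of closely related sequences, conservation may simply reflect too little divergence time.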
What is comparative genomics good for in Function prediction?
- conserved sequences indicate that these regions of the molecule are functionally important!
- conserved nt or aa most often have similar functions in homologous protein, DNA or RNA molecules
- with the help of comparative genomics one can predict the functions of molecules based on comparison with the already characterized homolog
- Comparison of protein domains:
- identification of a conserved protein domain and its comparison with homologous proteins can help in unraveling the protein function
- statements about gene functions can be made on the genome-wide level
- since it is very unlikely that one will be able to study all genes/gene products of a particular organism on the functional/structural level
- even for well studied organisms (such as E. coli, S. cerevisiae) we do not yet know the role of every gene
What did the Homology analysis of the yeast genome show?
- 30% of all genes were previously known
- function of 30% of all genes could be assigned based on homology search
- 10% of the genes had homologs in the database; function unknown
- 30% of all genes (23% + 7%) were ORFs that lacked homologs in the database
What does synteny mean originally and what is today’s meaning?
Synteny (original meaning):
gene loci are on the same chromosome within an individual or species
Conserved (shared) synteny (today’s meaning):
- describes preserved co-localization of genes on chromosomes of different species
- two or more genomic regions are derived from a single ancestral genomic region
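Conserved synteny can be illustrated by comparing gene order between two species. The following is a minimal sketch, assuming each gene has at most one ortholog in the other species and that orthologs share the same identifier; the function name, the `min_genes` parameter and the gene names are hypothetical. Real synteny tools tolerate gaps and rearrangements that this sketch does not.

```python
def syntenic_blocks(order_a: list[str], order_b: list[str], min_genes: int = 2) -> list[list[str]]:
    """Find runs of consecutive genes in species A whose orthologs are also
    directly adjacent in species B (same or inverted orientation)."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    blocks = []
    current = []
    for gene in order_a:
        if gene not in position_b:
            # A gene without an ortholog in B ends the current block.
            if len(current) >= min_genes:
                blocks.append(current)
            current = []
            continue
        if current and abs(position_b[gene] - position_b[current[-1]]) == 1:
            # Still adjacent in B: extend the block.
            current.append(gene)
        else:
            # Adjacency is broken: close the old block and start a new one.
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene]
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical gene orders along one chromosome of each species.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
order_b = ["g3", "g2", "g1", "g9", "g5", "g6", "g4"]
blocks = syntenic_blocks(order_a, order_b)
print(blocks)
```

In this toy example the block g1-g2-g3 is inverted in species B but still counts as conserved, since co-localization is preserved.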
How can genome alignment be visualized and how is it interpreted?
Pairwise alignment (dot plot)
- Match chromosome sequence from species A to species B
- If the sequences (gene order) were identical, we would see a straight diagonal line (identity)
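The dot plot idea can be sketched in a few lines. This is a minimal illustration, assuming exact matching of short words (k-mers) rather than a full alignment; the function names, the `word` parameter and the toy sequence are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where the word starting at position i in seq_a
    is identical to the word starting at position j in seq_b."""
    # Index every word of seq_b by its start positions.
    index_b: dict[str, list[int]] = {}
    for j in range(len(seq_b) - word + 1):
        index_b.setdefault(seq_b[j:j + word], []).append(j)
    points = []
    for i in range(len(seq_a) - word + 1):
        for j in index_b.get(seq_a[i:i + word], []):
            points.append((i, j))
    return points


def render(points: list[tuple[int, int]], rows: int, cols: int) -> str:
    """Draw the matches as a text grid: rows = positions in A, columns = positions in B."""
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for i, j in points:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Hypothetical toy sequence compared against itself.
seq = "ACGTTGCA"
identical = dot_plot(seq, seq)
print(render(identical, rows=len(seq) - 2, cols=len(seq) - 2))
```

Comparing a sequence with itself places every match on the main diagonal, which corresponds to the identity line described above.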