Week 1 => Intro/origins of genes and genomes Flashcards
5 primary tools of the ‘omics’ revolution?
Genomics, transcriptomics, metagenomics, proteomics, metabolomics
Genomics
- Sequencing genomes of cultures or samples
- Re-sequencing
Transcriptomics
- Traditional approach => ESTs
- Has evolved to RNA-seq technology
Metagenomics
- Culture-independent sequencing
- DNA isolated from a community of organisms, shotgun sequenced, and assembled
- Goal: to gain insight into the composition of the microbial community
Proteomics
- Generally refers to large-scale experimental analysis of proteins in complex mixtures, but smaller partially purified samples can be examined
- Large-scale studies combine fragmentation of proteins, separation by liquid chromatography, and detection by tandem mass spectrometry
- Requires a reference genome with predicted proteins (the predicted proteins are digested in silico and the resulting peptides compared with those observed)
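The in silico digestion step can be sketched in a few lines. This is a minimal illustration, not a real search engine: it assumes trypsin as the protease (the card does not name one) and uses the common simplified rule "cleave after K or R unless the next residue is P". The protein names, sequences, and observed peptides are made up for the example.

```python
import re

def tryptic_digest(protein: str, min_length: int = 1) -> list[str]:
    """Split a protein into peptides with a simplified trypsin rule (assumption)."""
    # Zero-width split point: preceded by K or R and not followed by P.
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop the empty string produced when the protein ends in K or R.
    return [p for p in peptides if len(p) >= min_length]

# Hypothetical predicted proteins from a reference genome.
predicted = {"protA": "MKRPAAKGGR", "protB": "MSTKLLR"}

# Index every predicted peptide by the protein(s) it could come from.
peptide_index: dict[str, set[str]] = {}
for name, seq in predicted.items():
    for pep in tryptic_digest(seq):
        peptide_index.setdefault(pep, set()).add(name)

# Hypothetical peptides identified by mass spectrometry.
observed = ["RPAAK", "LLR", "XXXX"]
matches = {pep: sorted(peptide_index.get(pep, ())) for pep in observed}
```

Real pipelines match measured fragment masses rather than exact strings and allow for missed cleavages and modifications; the exact-string lookup here only shows why predicted proteins are needed.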
Metabolomics
- Study of small-molecule metabolite profiles using gas chromatography, high-performance liquid chromatography (HPLC), etc., for sample separation, and mass spectrometry for detection
- Can provide insight into the physiology of the cell at the time the sample was taken
Why is the genome of an individual not stable?
1) Copy number variation (CNV) between and WITHIN individuals (e.g., in the brain)
2) Cancer
Simplified order of gene concept evolution
1) Mendel determined basic rules of inheritance with discrete units passed between generations
2) Wilhelm Johannsen coined the word ‘gene’
3) Morgan’s work on the placement of genes on chromosomes => genes as beads on a string
4) Beadle and Tatum introduced the concept that one gene makes one enzyme
5) Oswald Avery, Colin MacLeod, and Maclyn McCarty found that genes are made of DNA
6) Watson and Crick determine the chemical structure of DNA (the central dogma of molecular biology emerges from this)
7) Roberts and Sharp discover gene splicing (splicing of introns and exons)
8) The first microRNA is identified in the worm Caenorhabditis elegans
9) GeneSweep: human geneticists come up with a definition for protein-coding genes in order to decide on a winner for a bet on the number of human genes
10) The idea that human genes are one long continuum begins to emerge
What is the central dogma of molecular biology?
Information flows from DNA to RNA to protein
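As a rough illustration of this flow, the sketch below transcribes a short DNA coding strand to mRNA and translates it with the standard genetic code. It is a simplification under stated assumptions: the input is treated as the coding (sense) strand, translation starts at the first AUG, and the example sequence is invented.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """RNA -> protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

dna = "ATGGCCAAGTAA"          # made-up example sequence
mrna = transcribe(dna)        # "AUGGCCAAGUAA"
protein = translate(mrna)     # "MAK"
```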
The ENCODE project
Encyclopedia of DNA Elements => goal to build a comprehensive parts list of FUNCTIONAL elements in the human genome
Do species have a genome?
Yes and no: there are overlapping and unique elements in each genome of a particular species
Pan-genome
The complete set of genes in a species (it also includes the accessory genome, which is made up of genes that are present in some but not all members of the species)
Core-genome
The subset of genes shared by all members of that species
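The pan-genome, core genome, and accessory genome map directly onto set operations. The sketch below uses three hypothetical strains with invented gene contents; only the union/intersection logic is the point.

```python
# Hypothetical gene content of three members of one species.
genomes = {
    "strain_A": {"gyrA", "recA", "lacZ"},
    "strain_B": {"gyrA", "recA", "tetR"},
    "strain_C": {"gyrA", "recA", "lacZ", "blaTEM"},
}

# Pan-genome: every gene found in at least one member.
pan_genome = set().union(*genomes.values())

# Core genome: genes shared by all members.
core_genome = set.intersection(*genomes.values())

# Accessory genome: present in some but not all members.
accessory_genome = pan_genome - core_genome
```

In practice, deciding that two genes in different strains are "the same gene" requires homology clustering first; the sets here assume that step is already done.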
Re-sequencing
determining the sequence of a genome for the purpose of comparison to a reference genome
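A toy version of the comparison step: given a reference and a re-sequenced sample that are assumed to be already aligned and of equal length, report each position where the bases differ. Real re-sequencing works from millions of short reads and a variant caller; the function name and sequences below are illustrative only.

```python
def call_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

reference = "ACGTACGTAC"     # made-up reference sequence
resequenced = "ACGTTCGTAA"   # made-up re-sequenced sample
variants = call_substitutions(reference, resequenced)
```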
ESTs
expressed sequence tags - cDNAs derived from mRNA
RNA-seq technology
whole transcriptome shotgun sequencing (uses next-generation technologies to provide a comprehensive picture of the RNA present in a sample)
In silico
refers to scientific experiments conducted using computer simulations or modeling
Where do genes and genomes come from?
The first living thing(s)
In vitro
“in glass”
In vivo
“in life”
Reasoning for where genes and genomes come from
By comparing extant (living) and extinct organisms (the latter via fossils and indirect evidence), we infer that shared traits originated in a common ancestor
What group has the largest number of branches in the phylogenetic tree?
Prokaryotes
The RNA world timeline (__billion years ago)
4.5 Formation of Earth
4.2 Stable hydrosphere
4.2-4.0 Prebiotic chemistry
4.2-4.0 Pre-RNA world
4.2-3.8 RNA world
4.2-3.6 First DNA/protein life
3.6-present Diversification of life
RNA with catalytic activity would have:
1) Replicated itself
2) Synthesized peptides
Why was compartmentalization important in the RNA world?
It served to bring metabolites in close proximity. It is hard to imagine complex metabolism evolving otherwise.
What is the only reasonable explanation for why a ribozyme (the ribosome) is the vital protein-synthesizing machine?
The RNA world preceded the DNA-protein world we know today
Today what are most enzymes?
Proteins (20 amino acid character states, so much greater potential for structural/functional diversity)
LUCA
Last universal common ancestor
Based on protein/DNA sequence similarity, we can infer that the common ancestor of present-day life (LUCA) had:
- double-stranded DNA genomes
- genes involved in “house-keeping functions” and “core metabolic functions”
House-keeping functions
Transcription, translation, DNA replication, protein folding and turnover, etc.
Core metabolic functions
amino acid metabolism, purine/pyrimidine biosynthesis, carbon metabolism
Where do new genes come from?
1) From pre-existing genes (either from within the genome (duplication) or acquired from another organism by horizontal gene transfer (HGT))
2) From non-coding DNA (‘from scratch’)
Where are the different eukaryotic isoforms of HSP90 gene family that evolved by duplication?
cytosol and ER
Prevalence of gene duplication in all three domains of life
Refs between 65 and 69 for most, with the exception of Homo sapiens at 11
Homologs
In evolutionary biology, genes/proteins that share common ancestry
Orthologs
Genes/proteins in different species that evolved from a common ancestral gene by speciation. Orthologous genes/proteins typically have the same function in the different species
Paralogs
Genes within a genome that are related to one another by gene duplication. Paralogs often evolve new functions
Orphan genes:
genes without obvious homologs in the genomes of other organisms (aka ORFans)
Do orphan genes evolve by gene duplication coupled with extreme sequence divergence?
Yes
Do orphan genes arise from non-coding DNA?
Yes
Proto-gene
A dominant gene, or a hypothetical unit that may have given rise to life
De novo gene acquisition
Is the process by which new genes are created from DNA sequences that were not previously genes
Examples of de novo gene acquisition
- BSC4 in Saccharomyces cerevisiae
- Pldi in Mus musculus
- CLLU1, C22ORF45, and DNAH10OS in humans
- MDF1 in Saccharomyces cerevisiae
- FLJ33706 in humans
Pldi
polymorphic derived intron-containing
Mus musculus
House mouse
Pldi Mus musculus
- Transcript arose in a large intergenic region of the house mouse
- The gene has three exons and shows alternative splicing
- Cryptic signals for transcript regulation and processing exist in intergenic regions and can become the basis for an evolution of a new functional gene
Gene birth
gain of transcription, gain of translation => proto-gene
Gene death
Loss of translation, loss of transcription => Pseudo-gene
What is a significant force in the evolution of cells and their genomes?
Gene duplication
What are the birth rate and fate of duplicate genes influenced by?
- Rate of duplication
- DNA recombination machinery
- structure of gene families
- Population size
- impact