Lecture 16 (George Soultoukis) Flashcards
Important Key terms
Genome:
complete set of genes, complete set of DNA in an organism (or a virus)
Nuclear genome:
DNA of the nuclei (genomic DNA)
Organellar genomes:
DNA of the organelles (mitochondrial DNA, chloroplasts/plastid DNA)
Transcriptome:
complete set of transcripts
–> In a particular organism, organ, tissue, cell, developmental or physiological condition
–> Can include mRNA, tRNA, rRNA, microRNA, siRNA, circRNA, …
Proteome:
complete set of proteins, peptides
Metabolome:
the complete set of metabolites
Epigenome:
the entirety of DNA-regulatory features
Central dogma of molecular biology: DNA –> RNA –> Protein
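The flow DNA –> RNA –> Protein can be sketched in a few lines of Python. This is only an illustration: the codon table is deliberately truncated to the codons used in the example, the input is assumed to be the coding strand, and the function names `transcribe` and `translate` are hypothetical helpers, not part of any library.

```python
# Truncated codon table: only the codons needed for the example below.
# Assumption: a real table would list all 64 codons.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTAAATGGTAA")
print(mrna)             # AUGGCUAAAUGGUAA
print(translate(mrna))  # MAKW
```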
How did genomes start?
- Abiogenesis:
the natural process of transition from non-living matter to living entities
- DNA world hypothesis (single strand, double strand)
- RNA world hypothesis (RNAs as enzymes, panspermia)
- Protein first hypothesis (amino acid abundance)
- Cell evolution: early Earth biophysics created conditions for protocell emergence (a lipid bilayer enclosing organic molecules and macromolecules)
- DNA evolution from first sequence to cell-dividing template
- Genome complexity: various regulatory levels of metabolism linked to health and disease (methylome, transcriptome, proteome, metabolome,…)
- Modern biological research aims to understand molecular events that evolved over >4 billion years
What is Genome mapping?
- Mapping a genome:
determine the location of genes in a genome
- Linkage map:
shows the distance between loci in units based on recombination frequencies
- Phenotype traits (visible)
- Traits detected by biochemical/molecular methods
- Restriction map: constructed by cutting the DNA with restriction enzymes
- Today, genomic maps (physical mapping) are constructed by sequencing the DNA using next-generation sequencing (NGS) technologies: sequencing by synthesis (SBS) and bridge amplification.
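As a worked example for the linkage map card: the distance between two loci is commonly expressed in map units (centimorgans, cM), where 1% recombinant offspring corresponds to roughly 1 cM. The sketch below assumes a simple test cross and short distances (for larger distances, double crossovers make the raw percentage an underestimate). The function name and the offspring counts are made up for illustration.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Recombination frequency in percent, read as map units (cM)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical test cross: 1000 offspring, 120 of them recombinant.
print(map_distance_cm(120, 1000))  # 12.0
```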
What is genetic polymorphism?
- the co-existence of multiple alleles at a locus (typically across individuals)
- Generally, a locus is defined as polymorphic if two or more alleles are present at a frequency of more than 1% in the population (e.g. for human eye color).
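The 1% rule can be checked directly from allele counts. The following is a minimal sketch that assumes the alleles observed at one locus in a population sample are given as a plain list; the sample data and function names are invented for illustration.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its relative frequency in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(alleles, threshold=0.01):
    """True if two or more alleles each exceed the frequency threshold."""
    freqs = allele_frequencies(alleles)
    common = [a for a, f in freqs.items() if f > threshold]
    return len(common) >= 2


common_variant = ["A"] * 970 + ["G"] * 30   # G at 3%
rare_variant = ["A"] * 995 + ["G"] * 5      # G at 0.5%
print(is_polymorphic(common_variant))  # True
print(is_polymorphic(rare_variant))    # False
```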
What is a SNP?
SNP = single nucleotide polymorphism
- In the human genome: on average 1 SNP per 1330 bases
- i.e., ca. 10 million SNPs in a human genome are polymorphic (i.e., they occur at a frequency of more than 1% each)
- SNPs can often be associated with genetic disorders
- Can be used in a diagnostic test to determine whether the individual has (with very high probability) the genetic disorder if the respective gene is not yet well defined molecularly.
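To make the idea of a single-nucleotide difference concrete, the sketch below compares a sample sequence with a reference and reports the positions that differ. It assumes the two sequences are already aligned and of equal length. Strictly speaking, such a difference only counts as a SNP if it also passes the 1% population-frequency criterion, which a two-sequence comparison cannot show. The function name and sequences are hypothetical.

```python
def variant_positions(reference: str, sample: str):
    """List (position, reference base, sample base) for every mismatch.

    Positions are 1-based. Assumes pre-aligned sequences of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


print(variant_positions("ACGTTGCA", "ACGTCGCA"))  # [(5, 'T', 'C')]
```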
Haplotype, GWAS
- Each individual has a unique set of SNPs
- Haplotype = particular combination of SNPs in a particular region of the genome (thus, a haplotype represents only a part of the genome)
- An important question is: which of the many SNPs in the human genome are associated with genetic disorders?
- Similarly in plant breeding: which of the many SNPs in a crop genome are associated with traits (such as crop yield, pathogen resistance, abiotic stress tolerance)?
- A method to identify such SNPs is called “genome-wide association study” = GWAS
- In GWAS, the entire genomes of e.g. healthy people and patients are scanned for SNPs, and SNPs associated with the disease are identified
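The core of a GWAS can be illustrated with a per-SNP association test. The sketch below computes a Pearson chi-square statistic on a 2x2 table of allele counts (patients vs. healthy controls, alternative vs. reference allele). This is one common choice, assumed here for illustration, and not necessarily the test used in the lecture. The SNP names and counts are invented, and a real study would also correct for the very large number of SNPs tested.

```python
def chi_square_2x2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic for a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref)
snps = {
    "snp_1": (60, 40, 40, 60),  # alternative allele enriched in patients
    "snp_2": (50, 50, 50, 50),  # no difference between groups
}
for name, counts in snps.items():
    print(name, round(chi_square_2x2(*counts), 2))
# snp_1 8.0
# snp_2 0.0
```

A larger statistic means a stronger association between the allele and the disease status; snp_1 would be flagged here, snp_2 would not.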
The eukaryotic genome
- Eukaryotic genomes have non-repetitive and repetitive sequences
- Non-repetitive DNA: sequences that are unique: i.e., only one copy in the haploid genome
- There are different types of repetitive sequences
- Moderately repetitive DNA: relatively short sequences repeated 10 – 1,000 times and dispersed throughout the genome; includes a high percentage of transposons (up to 5 kb in length, movable within the genome)
- Highly repetitive DNA: very short sequences, typically shorter than 100 bp; present many thousands of times
- Animals: up to 50% of the nuclear DNA is repetitive
- Plants, amphibians: 80% of the nuclear DNA tends to be repetitive
How to identify protein-coding genes in a genome?
- ORF = open reading frame
Polypeptide-encoding sequence:
* Must have an ORF:
start with ATG, end with a stop codon (TAA, TAG, TGA), and have a number of nucleotides in between that is divisible by 3 (the number of nucleotides per amino acid-encoding triplet)
- Typically there are similar sequences in other (already sequenced) genomes
- Predicting an ORF may be complicated by the fact that eukaryotic genes often have exons and introns.
- Pseudogenes:
the distinction from functional genes can sometimes be unclear. Pseudogenes are still being identified and analysed, and new regulatory elements are still being discovered.
- Computationally derived ORFs (start and stop codon presence)
- Homologies to other known or predicted genes
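The ORF criteria above (ATG start, in-frame stop codon, length divisible by 3) translate into a simple scan. The sketch below is a simplified illustration under stated assumptions: it searches only the forward strand, ignores introns (so it suits intron-free sequence such as cDNA or prokaryotic DNA), and skips ATGs nested inside an ORF it has already reported. `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for ORFs on the forward strand.

    start/end are 0-based and end-exclusive; the sequence includes the
    ATG and the stop codon, so its length is always divisible by 3.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon, in frame, to the first stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j                   # resume after this ORF
                        break
            i += 3
    return orfs


print(find_orfs("CCATGGCTAAATGGTAACC"))
# [(2, 17, 'ATGGCTAAATGGTAA')]
```

Candidate ORFs found this way would then be compared with known or predicted genes in other genomes, as the homology bullet suggests.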
Not only the nucleus contains DNA, but also the organelles (mitochondria, chloroplasts)
- This results in non-Mendelian inheritance
–> Mendelian inheritance typically refers to inheritance of the nuclear genome
- Uni-parental inheritance:
an extreme form of non-Mendelian inheritance; here, the genome of only one parent is inherited
- Maternal inheritance:
mostly the genome of the mother is passed on to the offspring (in plants and animals)
What is LHON?
LHON = Leber's Hereditary Optic Neuropathy
- Passed on only by mothers to their offspring (1:25,000)
- Due to a mutation in the mitochondrial gene that encodes an NADH dehydrogenase subunit; leads to a degeneration of retinal ganglion cells and their axons; problem with the energy metabolism in mitochondria
- Only the egg contributes mitochondria to the embryo; the egg comes from the mother
- Leads to sudden loss of vision in young adults, typically in both eyes
Organellar genomes
- mtDNA: DNA of the mitochondria
–> vary in size by an order of magnitude; animals: ca. 16.6 kb; yeast: ca. 80 kb, i.e. much bigger
- cpDNA: DNA of the chloroplasts; sometimes also called ctDNA
–> 120 – 217 kb (largest in geranium); with 87 – 183 genes (i.e., typically more genes than encoded by mitochondrial genomes)
–> 4 rRNAs, 30 tRNAs, ca. 60 proteins (several of which are photosynthetic proteins, e.g. thylakoid proteins)
- These organellar genomes are often circular
- Typically:
several copies of the genome in the individual organelle; and: there are multiple organelles in a cell; therefore: multiple organelle genomes per cell!
Genome sequence: number of genes in prokaryotes
- Mycoplasma genitalium:
parasitic bacterium without a cell wall; 470 genes
- Genomes of free-living bacteria: 1,700 – 7,500 genes
- Archaea: 1,500 – 2,700 genes
Overall: poor correlation between gene number and nuclear genome size
Genome sequences: number of genes in eukaryotes
- Smallest uni-cellular eukaryotic genomes: 5,300 genes
- Nematodes: 21,700 genes
- Fruit flies: 17,000 genes
- Arabidopsis thaliana: 25,000 genes
- Some crops: 30,000 – 50,000 genes
- Homo sapiens: 20,000 genes
Correlation between gene numbers and genome size not very strict in eukaryotes
What are gene families?
- In more complex/larger genomes: more gene families, fewer unique genes
Gene family:
–> gene members of a family are related to each other at the sequence (and functional) level
What are homologous genes?
- Homologous genes: common gene ancestor
- Paralogous (in): mutated duplicates in the same genome
- Orthologous: same ancestor gene in different organisms
- Paralogous (out): mutated duplicates in different organisms