Genomics - Irina Flashcards
What are ESTs and how are they obtained?
Expressed sequence tags (ESTs)
→ clones of reverse-transcribed mRNA representing a specific physiological condition or tissue
→ produced by a high-throughput, single-pass sequencing pipeline, yielding a large collection of sequence reads of 400-700 bp
→ first thing to do when commencing a genomic project de novo
Which criteria should be fulfilled for the recognition of genes in a genome? Describe the principle of each of them.
- identifying ORFs: a series of amino acid-coding triplets (codons) bounded by a start and a stop codon = open reading frame (ORF)
- codon bias: for most amino acids two or more codons are available in the genetic code, and some codons occur more frequently than others; many uncommon codons → gene may not be actively transcribed
- homology search: gene identification using their similarity to known genes from other species, works only with conserved genes
- association with promoter elements: characteristic sequences upstream of the ORF that match known transcription factor-binding sites; they shed light on the physiological context in which the gene is expressed (conserved elements of this kind are detected by phylogenetic footprinting)
- match with transcript or protein sequences: large collection of cDNA sequences, ESTs, serial analysis of gene expression (SAGE) tags
Define the term “gene”!
gene = complete chromosomal segment responsible for making a functional product
includes structural and regulatory elements (promoter, terminator, transcription factor-binding sites, etc.)
What is an open reading frame?
a series of amino acid-coding triplets (codons) bounded by a start and a stop codon
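The definition above can be turned into a small scan. This is a minimal sketch, not a gene finder: it assumes ATG as the only start codon, the three standard stop codons, and the forward strand only. The names `find_orfs` and `min_codons` are hypothetical and chosen for illustration.

```python
START = "ATG"                    # assumption: only the canonical start codon
STOPS = {"TAA", "TAG", "TGA"}    # standard stop codons

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the span includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first in-frame ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # keep the ORF only if it spans enough codons (stop included)
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 2)]
```

A real pipeline would also scan the reverse complement and use a much larger minimum length, since short ORFs occur by chance.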
What is codon bias?
- For most amino acids two or more codons are available in the genetic code. Some codons occur much more frequently than others.
- tRNAs for different codons differ in abundance
- Synonymous mutations are not strictly neutral: there is selective pressure for the use of preferred codons
- If many uncommon codons are seen in an ORF, it may indicate that the gene is not actively transcribed
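Codon bias can be made concrete by counting how often each codon is used relative to its synonymous alternatives. The sketch below assumes the standard genetic code and an in-frame ORF as input; `codon_fractions` is a hypothetical helper name, not a function from any library.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_fractions(orf):
    """For each codon in the ORF, return its share among synonymous codons."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {c: n / per_amino_acid[CODON_TABLE[c]] for c, n in counts.items()}

fractions = codon_fractions("ATGCTGCTGCTGCTATAA")
print(fractions["CTG"], fractions["CTA"])  # 0.75 0.25
```

In the toy ORF, leucine is encoded three times by CTG and once by CTA. Comparing such fractions with those of highly expressed genes from the same organism is one way to flag ORFs rich in uncommon codons.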
What are synonymous and nonsynonymous mutations?
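Synonymous mutations change a codon without changing the encoded amino acid. They are still not strictly neutral: there is selective pressure for the use of preferred codons.
Non-synonymous mutations change the encoded amino acid (or introduce a stop codon) and are therefore directly exposed to selection on the protein.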
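Whether a single-base substitution is synonymous can be checked by translating the codon before and after the change. The following is a minimal sketch assuming the standard genetic code; `classify_substitution` is a hypothetical name used only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(codon, position, new_base):
    """Classify a point substitution at position 0, 1 or 2 of a codon."""
    codon, new_base = codon.upper(), new_base.upper()
    mutant = codon[:position] + new_base + codon[position + 1:]
    # Same translation = synonymous; any other outcome, including a new
    # stop codon, is counted as nonsynonymous in this sketch.
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "nonsynonymous"

print(classify_substitution("CTG", 2, "A"))  # synonymous (Leu -> Leu)
print(classify_substitution("CTG", 1, "C"))  # nonsynonymous (Leu -> Pro)
```

Most synonymous changes fall on the third codon position, which is why that position tolerates the most variation.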
What is an orphan gene?
Orphan genes = genes with no recognizable homologs in other species. Every newly sequenced genome comes with 20-60% of such previously unknown genes.
What are transcription factors and what is their general function in the genome?
A transcription factor (sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the transcription of genetic information from **DNA to mRNA**. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the **recruitment of RNA polymerase** to specific genes.
Name some databases which are in use for the naming and classification of genes.
KEGG: Kyoto Encyclopedia of Genes and Genomes
KOGs: Eukaryotic Orthologous Groups
GO: the Gene Ontology
JGI: Joint Genome Institute
How does genome size correlate with the complexity of the organisms? What is C value paradox?
It does not → remarkable lack of correspondence between genome size and the organism's complexity = C value paradox
Genome size does increase from viruses to prokaryotes
C value = picograms of DNA in the haploid genome per cell
The non-genic fraction of the DNA is responsible for the C value paradox; in eukaryotes 30-99% of the genome is non-coding DNA.
What is the C value, how is it calculated?
Measured biochemically or by flow cytometry = picograms of DNA in the haploid genome per cell
G (genome size in bp) = 0.978 × 10^9 × C
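The conversion between C value and genome size is a single multiplication. The sketch below assumes the commonly cited factor of 0.978 × 10^9 bp per picogram of DNA and exposes it as a parameter so that a different factor can be supplied. The function names are hypothetical.

```python
BP_PER_PG = 0.978e9  # assumption: base pairs per picogram of double-stranded DNA

def genome_size_bp(c_value_pg, bp_per_pg=BP_PER_PG):
    """Genome size G in base pairs from the C value in picograms."""
    return c_value_pg * bp_per_pg

def c_value_pg(genome_size, bp_per_pg=BP_PER_PG):
    """C value in picograms from the genome size in base pairs."""
    return genome_size / bp_per_pg

print(f"{genome_size_bp(3.5):.3e}")  # 3.423e+09
```

With these assumptions, a haploid DNA content of 3.5 pg corresponds to roughly 3.4 billion base pairs.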
Explain what are gene number and genome size, what is the difference between them and how they are calculated or obtained?
Gene number = the number of chromosomal segments that are responsible for making a functional product. It is obtained by gene recognition in the sequenced genome and is not proportional to genome size.
G (genome size in bp) = 0.978 × 10^9 × C
C = picograms of DNA in the haploid genome per cell
Non-genic fraction of the DNA is responsible for the C value paradox.
In eukaryotes 30-99% of the genome consists of non-coding DNA
– Repetitive sequences
– Mobile elements
– Introns
– Intergenic spacers
– etc.
Why is genome size in organisms not proportional to their gene number?
In eukaryotes 30-99% of the genome consists of non-coding DNA
– Repetitive sequences
– Mobile elements
– Introns
– Intergenic spacers
– etc.
Briefly describe the types of repeated sequences in the human genome.
- simple sequence repeats
- variable number tandem repeats (VNTRs)
- highly repeated sequences at centromeric and subtelomeric regions
- segmental duplications
- transposon-derived repeats
- retroviral-like elements
- transposons
Describe the ways in which genomes become enlarged.
Global polyploidization
- global genome duplication: highly deleterious (disturbs cell division and meiosis)
- destroys the mechanisms of dosage compensation of X chromosomes
- triploidy always leads to sterility
- an even number of chromosome sets may be a mechanism of evolutionary innovation
- common in plants, but rare in animals
Regional genome duplication
- leads to localized repeat sequences
- unequal crossing-over
Duplicative transpositions
- transposable elements (copy & paste, cut & paste)