Genomics - Irina Flashcards

Question 1

Q

What are ESTs and how are they obtained?

Answer

A

Expressed sequence tags (ESTs)

→ clones with reverse-transcribed mRNA representing a specific physiological condition or tissue

→ produced in a high-throughput, single-pass sequencing pipeline, yielding a large collection of sequence reads of 400-700 bp

→ typically the first step when starting a genome project de novo

Question 2

Q

Which criteria should be fulfilled for the recognition of genes in a genome? Describe the principle of each of them.

Answer

A

identifying ORFs: a series of amino-acid-coding nucleotide triplets (codons) bounded by a start and a stop codon = open reading frame (ORF)
codon bias: for most amino acids two or more codons are available in the genetic code, and some codons occur more frequently than others; many uncommon codons → gene may not be actively transcribed
homology search: gene identification by similarity to known genes from other species; works only for conserved genes
association with promoter elements: characteristic sequences upstream of the ORF that match known transcription factor-binding sites; these shed light on the physiological context in which the gene is expressed = phylogenetic footprinting
match with transcript or protein sequences: comparison against large collections of cDNA sequences, ESTs, and serial analysis of gene expression (SAGE) tags

Question 3

Q

Define the term “gene”!

Answer

A

gene = complete chromosomal segment responsible for making a functional product

includes structural and regulatory elements (promoter, terminator, transcription factor-binding sites, etc.)

Question 4

Q

What is an open reading frame?

Answer

A

a series of amino-acid-coding nucleotide triplets (codons) bounded by a start and a stop codon
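The definition can be illustrated with a minimal sketch that scans the three forward reading frames for stretches running from a start codon to the next in-frame stop codon. It assumes ATG as the only start codon and the standard stop codons, and it ignores the reverse strand; the function name `find_orfs` and the `min_codons` parameter are illustrative, not taken from any particular tool.

```python
# Minimal ORF scan on the forward strand only (an illustrative simplification).
START = "ATG"                   # assumed sole start codon
STOPS = {"TAA", "TAG", "TGA"}   # stop codons of the standard genetic code


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs


example = "CCATGGCTTAAGG"
for begin, end, frame in find_orfs(example):
    print(frame, example[begin:end])
```

Real gene finders additionally check both strands, alternative start codons, and a minimum ORF length well above the toy value used here.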

Question 5

Q

What is codon bias?

Answer

A

For most amino acids two or more codons are available in the genetic code. Some codons occur much more frequently than others.
tRNAs for different codons differ in abundance
Synonymous mutations are not selectively neutral – there is selective pressure for the use of preferred codons
If many uncommon codons are seen in an ORF, it may indicate that the gene is not actively transcribed
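A small sketch of how codon preference could be quantified: count the codons of an ORF and compute, for one amino acid, the fraction of each of its synonymous codons. The helper names and the toy sequence are hypothetical, and the lysine codons AAA/AAG are used only as an example.

```python
from collections import Counter


def codon_counts(orf):
    """Count in-frame codons of an ORF; trailing bases that do not fill a codon are ignored."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts, synonymous_codons):
    """Fraction of each codon among the codons that encode the same amino acid."""
    total = sum(counts[c] for c in synonymous_codons)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous_codons}


LYS_CODONS = ("AAA", "AAG")  # the two lysine codons of the standard genetic code

# Toy ORF assembled codon by codon (hypothetical sequence).
toy_orf = "".join(["ATG", "AAA", "AAG", "AAA", "AAA", "TAA"])
counts = codon_counts(toy_orf)
print(relative_usage(counts, LYS_CODONS))
```

Comparing such fractions for a candidate ORF against those of known, highly expressed genes of the same organism is one way to flag ORFs dominated by uncommon codons.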

Question 6

Q

What are synonymous and nonsynonymous mutations?

Answer

A

Synonymous mutations change a codon without changing the encoded amino acid. They are nevertheless not selectively neutral: there is selective pressure for the use of preferred codons.

Non-synonymous mutations change the encoded amino acid and are therefore generally subject to selection rather than being neutral.
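The distinction can be checked mechanically by translating the codon before and after the mutation. The sketch below builds the standard genetic code from its usual compact TCAG-ordered string; the function name `classify_mutation` is illustrative.

```python
# Standard genetic code, codons ordered T, C, A, G at each position ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_mutation(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid, else 'nonsynonymous'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


print(classify_mutation("GAA", "GAG"))  # both codons encode glutamate
print(classify_mutation("GAG", "GTG"))  # glutamate replaced by valine
```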

Question 7

Q

What is an orphan gene?

Answer

A

Every newly sequenced genome contains 20-60% previously unknown genes without recognizable homologs in other genomes = orphan genes

Question 8

Q

What are transcription factors and what is their general function in the genome?

Answer

A

A transcription factor (sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the transcription of genetic information from **DNA to mRNA**. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the **recruitment of RNA polymerase** to specific genes.

Question 9

Q

Name some databases which are in use for the naming and classification of genes.

Answer

A

KEGG: Kyoto Encyclopedia of Genes and Genomes

KOGs: Eukaryotic Orthologous Groups

the Gene Ontology

JGI: Joint Genome Institute

Question 10

Q

How does genome size correlate with the complexity of the organisms? What is C value paradox?

Answer

A

It does not → remarkable lack of correspondence between genome size and the organism's complexity = C value paradox

Increase of genome size from viruses to prokaryotes

C value = amount of DNA in the haploid genome, in picograms

The non-genic fraction of the DNA is responsible for the C value paradox; in eukaryotes 30-99% of the genome is non-coding DNA.

Question 11

Q

What is the C value, how is it calculated?

Answer

A

C value = amount of DNA in the haploid genome, in picograms; measured biochemically or by flow cytometry

G (genome size in bp) = 0.978 x 10⁹ x C (C in picograms)
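The conversion between C value and genome size is a single multiplication, sketched below. The factor is passed as a parameter; the default, 0.978 x 10⁹ bp per picogram, is the commonly cited value and is an assumption of this sketch, as are the function names and the example C value.

```python
BP_PER_PG = 0.978e9  # assumed conversion factor: base pairs per picogram of DNA


def genome_size_bp(c_value_pg, bp_per_pg=BP_PER_PG):
    """Convert a C value (pg of DNA per haploid genome) into a genome size in base pairs."""
    return c_value_pg * bp_per_pg


def c_value_pg(genome_size, bp_per_pg=BP_PER_PG):
    """Convert a genome size in base pairs back into a C value in picograms."""
    return genome_size / bp_per_pg


# Hypothetical organism with a C value of 2.0 pg.
size = genome_size_bp(2.0)
print(f"{size:.3e} bp")
```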

Question 12

Q

Explain what are gene number and genome size, what is the difference between them and how they are calculated or obtained?

Answer

A

Gene number = the number of chromosomal segments that are responsible for making a functional product. It is not proportional to genome size.

G (genome size in bp) = 0.978 x 10⁹ x C

C = picograms of DNA in the haploid genome

Non-genic fraction of the DNA is responsible for the C value paradox.

In eukaryotes 30-99% of the genome consists of non-coding DNA

– Repetitive sequences

– Mobile elements

– Introns

– Intergenic spacers

– etc.

Question 13

Q

Why is genome size in organisms not proportional to their gene number?

Answer

A

In eukaryotes 30-99% of the genome consists of non-coding DNA

– Repetitive sequences

– Mobile elements

– Introns

– Intergenic spacers

– etc.

Question 14

Q

Briefly describe the types of repeated sequences in the human genome.

Answer

A

simple sequence repeats
variable number tandem repeat
highly repeated sequences at centromeric and subtelomeric regions
segmental duplications
transposon-derived repeats
retroviral-like elements
transposons

Question 15

Q

Describe the ways how genomes get enlarged?

Answer

A

Global polyploidization

global genome duplication: usually highly deleterious (disturbs cell division and meiosis)
destroys the mechanisms of dosage compensation of X chromosomes
triploidy always leads to sterility
an even number of chromosome sets may be a mechanism of evolutionary innovation
common in plants, but rare in animals

Regional genome duplication

leads to localized repeat sequences
unequal crossing-over

Duplicative transpositions

transposable elements (copy & paste, cut & paste)

Question 16

Q

What is polyploidy? Name some examples among animals and plants.

Answer

A

Organisms or cells that have more than two “sets” of chromosomes are termed polyploid.

Global genome duplication: usually highly deleterious (disturbs cell division and meiosis)

Destroys the mechanism of dosage compensation of X chromosomes
Triploidy always leads to sterility
An even number of chromosome sets may be a mechanism of evolutionary innovation
Common in plants, but rare in animals: Brassica napus is an allotetraploid (n = 19 chromosomes), certain frogs are triploid, and one rodent, the viscacha rat, is reported to be tetraploid.

Question 17

Q

What is a gene family?

Answer

A

Genes are categorized into families based on shared nucleotide or protein sequences, but also on protein secondary structures.

→ Phylogenomics

prediction of gene function
establishment and clarification of evolutionary relationships
prediction and retracing of lateral gene transfer

If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family (e.g. Pfam, PROSITE, PIRSF, PASS2, SUPERFAMILY, SCOP & CATH)

Question 18

Q

Explain the possibilities of gene functionalization in case of gene duplication.

Answer

A

A large number of genes are similar to each other due to their common descent from a duplication event = paralogous genes.

subfunctionalization
nonfunctionalization
superfunctionalization
neofunctionalization

Question 19

Q

Explain the differences between the paralogous and orthologous genes.

Answer

A

Paralogous genes = genes within a genome that are similar to each other due to common descent from a duplication event

Orthologous genes = genes in different species that descend from a single gene in their common ancestor (separated by speciation)

Question 20

Q

Explain the term phylogenomics and for what is it useful?

Answer

A

Genes are categorized into families based on shared nucleotide or protein sequences, but also on protein secondary structure.

Phylogenetic techniques can be used for:

Prediction of gene function
Establishment and clarification of evolutionary relationships
Prediction and retracing of lateral gene transfer.

Question 21

Q

What is the GC content? Where in genomes do deviations from random occurrence happen? How does the GC content correlate with the complexity of organisms? Draw a CpG island.

Answer

A

GC content = percentage of G and C bases among all bases, (G+C)/(A+T+G+C), in a genetic fragment (gene, locus, non-coding region, chromosome) or compared across species.
A, T, G and C are not distributed randomly in DNA
Deviations from random occurrence within a genome:

– In coding regions the GC content is higher than in the flanking regions of the gene

– 5’ flanking regions (promoter) are GC-richer than 3’ flanking regions

– Biased over long stretches of DNA (ca. 300 kb)

Stretches of DNA of at least 200 bp that are rich in CpG dinucleotides = CpG islands
GC content varies between organisms (variation in selection, mutational bias and bias in recombination-associated DNA repair)
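Both quantities can be computed directly from a sequence. The sketch below gives the GC fraction and the observed/expected CpG ratio, where the expected count assumes C and G occur independently. The thresholds in `looks_like_cpg_island` (at least 200 bp, GC content above 50%, ratio above 0.6) are commonly cited criteria and are an assumption here, not taken from the flashcards; all function names are illustrative.

```python
def gc_content(seq):
    """Fraction of G and C bases among all bases of the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from the C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")      # "CG" cannot overlap itself, so count() is exact
    expected = c * g / len(seq)     # expectation if C and G were placed independently
    return observed / expected


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content and CpG-ratio thresholds to one window."""
    return len(seq) >= min_len and gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_ratio


window = "CG" * 100  # artificial 200 bp window made only of CpG dinucleotides
print(gc_content(window), cpg_obs_exp(window), looks_like_cpg_island(window))
```

In practice such a test is applied in a sliding window along the sequence rather than to a single fragment.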

Question 22

Q

What is synteny? Explain the term on one example.

Answer

A

Genetics: two loci located on the same chromosome
Genomics: a series of genes is arranged in the same order on different genomes
Passarge et al. (1999): Colinearity
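The genomic sense of the term can be made concrete with a toy comparison of gene orders. The sketch below reports runs of genes that are adjacent and in the same order in two genomes. It assumes unique gene names, considers only the same orientation, and uses hypothetical gene identifiers and an illustrative function name.

```python
def collinear_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that appear consecutively and in the same order in both genomes."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}  # position of each gene in genome B
    blocks, current = [], []
    for gene in order_a:
        extends = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)  # gene directly follows the previous one in B as well
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g9", "g2", "g3", "g4", "g1"]
print(collinear_blocks(genome_a, genome_b))
```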

Question 23

Q

How is the order of certain genes maintained during genome recombination?

Answer

A

Selective pressure acting upon the cluster as an integrated whole.
Coherent temporal expression such as in Hox genes.
Single locus-control region controls expression of a group of genes (moving genes away from it causes severe selective disadvantages, e.g. beta-globin cluster)
Interdigitation of regulatory elements = regulatory elements might be physically linked to nearby genes, e.g. located in their introns

Question 24

Q

Explain diverse applications of genome sequencing.

Answer

A

medicine
microbes for energy and environment
bioanthropology
agriculture, livestock, breeding, bioprocessing
DNA identification

Question 25

Q

List some facts about the human genome.

Answer

A

3 billion basepairs
human genome is 99.9% the same in all people
only about 2% of the human genome contains genes, which are instructions for making proteins
humans have an estimated 30,000 genes; the functions of more than half of them are unknown
almost half of all human proteins share similarities with those of other organisms, underscoring the unity of life

Question 26

Q

Explain how you would obtain a sequence of a gene for which you don’t have the genome? Explain several approaches.

Answer

A

Hierarchical sequencing of a genome

construction of whole-genome clone library
chromosome libraries
cDNA clone libraries

Shotgun sequencing
Hierarchical shotgun sequencing for large genomes