Genomics - Irina Flashcards

1
Q

What are ESTs and how are they obtained?

A

Expressed sequence tags (ESTs)

clones containing reverse-transcribed mRNA (cDNA) that represent a specific physiological condition or tissue

→ produced in a high-throughput, single-pass sequencing pipeline, yielding a large collection of sequence reads of 400-700 bp

usually the first step when starting a genome project de novo

2
Q

Which criteria should be fulfilled for the recognition of genes in a genome? Describe the principle of each of them.

A
  • identifying ORFs: a series of amino acid-coding triplets (codons) bounded by a start and a stop codon = open reading frame (ORF)
  • codon bias: for most amino acids two or more codons are available in the genetic code, and some codons occur more frequently than others; many uncommon codons → the gene may not be actively transcribed
  • homology search: gene identification by similarity to known genes from other species; works only with conserved genes
  • association with promoter elements: characteristic sequences upstream of the ORF that match known transcription factor-binding sites; they shed light on the physiological context in which the gene is expressed = phylogenetic footprinting
  • match with transcript or protein sequences: comparison with large collections of cDNA sequences, ESTs, and serial analysis of gene expression (SAGE) tags
3
Q

Define the term “gene”!

A

gene = complete chromosomal segment responsible for making a functional product

includes structural and regulatory elements (promoter, terminator, transcription factor-binding sites, etc.)

4
Q

What is an open reading frame?

A

a series of amino acid-coding triplets (codons) bounded by a start codon and a stop codon
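
The ORF definition can be illustrated with a minimal sketch. It assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, and scans only the three forward-strand reading frames; the function name `find_orfs` and the `min_codons` parameter are hypothetical and chosen for illustration only.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # Keep the ORF only if it spans enough codons (stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

A real gene finder would also scan the reverse complement and apply a much larger minimum length, since short ORFs occur frequently by chance.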

5
Q

What is codon bias?

A
  • For most amino acids two or more codons are available in the genetic code. Some codons occur much more frequently than others.
  • tRNAs for the different codons are not equally abundant
  • Synonymous mutations are not under neutral selection – selective pressure for the use of preferred codons
  • If many uncommon codons are seen in an ORF, it may indicate that the gene is not actively transcribed
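
As a rough sketch of how codon usage in an ORF could be inspected, the snippet below counts codons and reports the fraction that fall outside a set of preferred codons. The function names and the `preferred` set are hypothetical; which codons count as preferred depends on the organism and has to be supplied by the user.

```python
from collections import Counter

def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring trailing bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def codon_counts(cds):
    """Count how often each codon occurs in the coding sequence."""
    return Counter(split_codons(cds))

def rare_codon_fraction(cds, preferred):
    """Fraction of codons that are not in the user-supplied preferred set."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    rare = sum(1 for codon in codons if codon not in preferred)
    return rare / len(codons)

# Toy example: four leucine codons, one of them (CTA) treated as non-preferred.
print(codon_counts("CTGCTGCTACTG"))
print(rare_codon_fraction("CTGCTGCTACTG", {"CTG"}))
```

A high fraction of non-preferred codons is only a hint, in the sense of the last bullet above, not proof that an ORF is not expressed.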
6
Q

What are synonymous and nonsynonymous mutations?

A

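Synonymous mutations change a codon without changing the encoded amino acid. They are nevertheless not strictly neutral: there is selective pressure for the use of preferred codons.

Non-synonymous mutations change the encoded amino acid. Because they alter the protein, they are usually exposed to selection (most often purifying selection) rather than being neutral.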

7
Q

What is an orphan gene?

A

Every newly sequenced genome contains 20-60% previously unknown genes, i.e. genes without recognizable homologs in other known genomes = orphan genes

8
Q

What are transcription factors and what is their general function in the genome?

A

A transcription factor (sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the transcription of genetic information from **DNA to mRNA**. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the **recruitment of RNA polymerase** to specific genes.

9
Q

Name some databases which are in use for the naming and classification of genes.

A

KEGG: Kyoto Encyclopedia of Genes and Genomes

KOGs: Eukaryotic Orthologous Groups

the Gene Ontology

JGI: Joint Genome Institute

10
Q

How does genome size correlate with the complexity of organisms? What is the C value paradox?

A

It does not → remarkable lack of correspondence between genome size and the organism's complexity = C value paradox

Increase of genome size from viruses to prokaryotes

C value = picograms of the haploid genome per cell

Non-genic fraction of the DNA is responsible for the C value paradox, in eukaryotes 30-99% are non-coding DNA.

11
Q

What is the C value, how is it calculated?

A

C value = picograms of the haploid genome per cell, measured biochemically or by flow cytometry

G (genome size in nt) = 0.987 × 10^9 × C
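
The conversion from C value to genome size can be written as a short sketch. The factor 0.987 × 10^9 nt per picogram is taken from this card as given; published conversion factors differ slightly, so treat the constant as an assumption. The names `NT_PER_PG` and `genome_size_nt` are hypothetical.

```python
# Conversion factor from this card: nucleotides per picogram of DNA.
# Assumption: other sources quote slightly different values.
NT_PER_PG = 0.987e9

def genome_size_nt(c_value_pg):
    """Convert a C value (pg of haploid genome per cell) into genome size in nt."""
    if c_value_pg < 0:
        raise ValueError("C value must be non-negative")
    return NT_PER_PG * c_value_pg

# Example: a hypothetical organism with a C value of 3.5 pg.
print(genome_size_nt(3.5))
```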

12
Q

Explain what gene number and genome size are, how they differ, and how they are calculated or obtained.

A

Gene number = the number of chromosomal segments that are responsible for making a functional product. It is not proportional to genome size.

G (genome size in nt) = 0.987 × 10^9 × C

C = picograms of haploid genome per cell

Non-genic fraction of the DNA is responsible for the C value paradox.

In eukaryotes 30-99% of the genome consists of non-coding DNA

– Repetitive sequences

– Mobile elements

– Introns

– Intergenic spacers

– etc.

13
Q

Why is genome size in organisms not proportional to their gene number?

A

In eukaryotes 30-99% of the genome consists of non-coding DNA

– Repetitive sequences

– Mobile elements

– Introns

– Intergenic spacers

– etc.

14
Q

Briefly describe the types of repeated sequences in the human genome.

A
  • simple sequence repeats
  • variable number tandem repeat
  • highly repeated sequences at centromeric and subtelomeric regions
  • segmental duplications
  • transposon-derived repeats
  • retroviral-like elements
  • transposons
15
Q

Describe the ways in which genomes become enlarged.

A

Global polyploidization

  • global genome duplication: highly deleterious (cell division and meiosis)
  • destroys the mechanism of dosage compensation of X chromosomes
  • triploidy always leads to sterility
  • an even number of chromosome sets may be a mechanism of evolutionary innovation
  • common in plants, but rare in animals

Regional genome duplication

  • leads to localized repeat sequences
  • unequal crossing-over

Duplicative transpositions

  • transposable elements (copy & paste, cut & paste)
16
Q

What is polyploidy? Name some examples among animals and plants.

A

Organisms or cells that have more than two "sets" of chromosomes are termed polyploid.

Global genome duplication: highly deleterious (cell division and meiosis)

  • Destroys the mechanism of dosage compensation of X chromosomes
  • Triploidy always leads to sterility
  • An even number of chromosome sets may be a mechanism of evolutionary innovation
  • Common in plants, but rare in animals: Brassica napus is allotetraploid (2n = 38, haploid chromosome number 19), certain frogs are triploid, and some rodents are tetraploid (viscacha rat).
17
Q

What is a gene family?

A

Genes are categorized into families based on shared nucleotide or protein sequences, but also on protein secondary structures.

→ Phylogenomics

  • prediction of gene function
  • establishment and clarification of evolutionary relationships
  • prediction and retracing of lateral gene transfer

If the genes of a gene family encode proteins, the term protein family is often used in an analogous manner to gene family (e.g. Pfam, PROSITE, PIRSF, PASS2, SUPERFAMILY, SCOP & CATH)

18
Q

Explain the possibilities of gene functionalization in case of gene duplication.

A

A large number of genes are similar to each other due to their common descent from a duplication event = paralogous genes.

  • subfunctionalization
  • nonfunctionalization
  • superfunctionalization
  • neofunctionalization
19
Q

Explain the differences between the paralogous and orthologous genes.

A

Paralogous genes = genes within a genome that are similar to each other due to common descent from a duplication event

Orthologous genes = genes in different species that descend from a single gene in their common ancestor (separated by speciation)

20
Q

Explain the term phylogenomics. What is it useful for?

A

Genes are categorized into families based on shared nucleotide or protein sequences, but also on protein secondary structure.

Phylogenetic techniques can be used for:

  • Prediction of gene function
  • Establishment and clarification of evolutionary relationships
  • Prediction and retracing of lateral gene transfer.
21
Q

What is the GC content? Where in genomes do deviations from random occurrence happen? How does the GC content correlate with the complexity of organisms? Draw a CpG island.

A
  • Percentage of G and C bases (GC base pairs) among all base pairs in a genetic fragment (gene, locus, non-coding region, chromosome) or in a whole genome; it can be compared across species.
  • A, T, G and C are not distributed randomly in DNA
  • Deviations from random occurrence within a genome:

– GC content is higher in coding regions than in the flanking regions of the gene

– 5' flanking regions (promoter) are richer in GC than 3' flanking regions

– Biased over long stretches of DNA (ca. 300 kb)

  • GC-rich stretches of DNA of at least 200 bp with many CpG dinucleotides = CpG islands
  • GC content varies between organisms (variation in selection, mutational bias and bias in recombination-associated DNA repair)
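
GC content as a percentage can be computed with a few lines of code. The sketch below is a minimal illustration: it counts only unambiguous A, C, G and T bases, and the function names `gc_content` and `windowed_gc` are hypothetical. The windowed version shows how local deviations along a sequence could be made visible.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

def windowed_gc(seq, window, step):
    """GC content of successive windows, to reveal local deviations."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]

print(gc_content("GGCCAATT"))
print(windowed_gc("GGGGAAAA", window=4, step=4))
```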
22
Q

What is synteny? Explain the term on one example.

A
  • Genetics: two loci located on the same chromosome
  • Genomics: a series of genes is arranged in the same order on different genomes
  • Passarge et al. (1999): Colinearity
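
The genomic sense of synteny (genes arranged in the same order in different genomes) can be illustrated with a small sketch that finds the longest run of genes occurring contiguously and in the same order in two gene lists. The gene names and the function `longest_colinear_block` are hypothetical; real synteny detection also has to handle inversions, gaps and gene duplicates.

```python
def longest_colinear_block(order_a, order_b):
    """Longest run of genes that is contiguous and in the same order in both lists."""
    best_len, best_end = 0, 0
    prev = [0] * (len(order_b) + 1)
    for i, gene_a in enumerate(order_a, 1):
        cur = [0] * (len(order_b) + 1)
        for j, gene_b in enumerate(order_b, 1):
            if gene_a == gene_b:
                # Extend the shared block that ended at the previous gene pair.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return order_a[best_end - best_len:best_end]

genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g9", "g2", "g3", "g4", "g7"]
print(longest_colinear_block(genome_a, genome_b))
```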
23
Q

How is the order of certain genes maintained during genome recombination?

A
  • Selective pressure acting upon the cluster as an integrated whole.
  • Coherent temporal expression such as in Hox genes.
  • A single locus-control region controls expression of a group of genes (moving a gene away from it carries severe selective disadvantages, e.g. beta-globin cluster)
  • Interdigitation of regulatory elements = regulatory elements might be physically linked to nearby genes, e.g. located in their introns
24
Q

Explain diverse applications of genome sequencing.

A
  • medicine
  • microbes for energy and environment
  • bioanthropology
  • agriculture, livestock, breeding, bioprocessing
  • DNA identification
25
Q

List some facts about the human genome.

A
  • 3 billion basepairs
  • human genome is 99.9% the same in all people
  • only about 2% of the human genome contains genes, which are instructions for making proteins
  • humans have an estimated 30,000 genes; the functions of many of them are unknown
  • almost half of all human proteins share similarities with those of other organisms, underscoring the unity of life
26
Q

Explain how you would obtain the sequence of a gene from an organism whose genome sequence is not available. Describe several approaches.

A
  1. Hierarchical sequencing of a genome
  • construction of whole-genome clone library
  • chromosome libraries
  • cDNA clone libraries
  2. Shotgun sequencing
  3. Hierarchical shotgun sequencing for large genomes