Annotation Flashcards

1
Q

what does annotation mean in genomics?

A

to make sense of the assembly by identifying and characterizing its functional elements (e.g. genes)

2
Q

2 ideas about what a gene is

what is their main difference?

A

Johannsen 1905 - the word gene is completely free from any hypothesis; many characteristics of the organism are conditioned by special, separable, and therefore self-existent fundamentals that occur in the gametes. (The gene is defined by its effect on the characteristics of an organism.)

gene model - a region of the genome that is transcribed into RNA and then translated into protein. (No link to the characteristics of an organism.)

3
Q

what is an ORF

A

open reading frame: a region of DNA that starts with a start codon (ATG) and ends with a stop codon (TAA, TAG or TGA), with no in-frame stop codon in between
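A minimal Python sketch of this definition, assuming the standard genetic code and scanning only the forward strand in its three reading frames (real ORF finders also scan the reverse complement and usually apply a much larger minimum length). The function name `find_orfs` and the `min_codons` cutoff are illustrative choices, not taken from any particular tool.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, half-open ORF coordinates on the forward strand.

    An ORF runs from an ATG to the next in-frame stop codon (stop included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG in this frame opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # ORF closed; look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGATGCCCTGA"))  # [(2, 14), (15, 24)]
```

Each ORF starts at the first ATG in its frame, so a downstream ATG inside the same ORF is not reported separately.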

4
Q

difference between prokaryotic (PK) and eukaryotic (EK) genes

A

PK - each gene is a contiguous region of DNA (no introns) and intergenic spaces are small. Genomes are smaller, with fewer genes.
EK - exons separated by introns (introns are removed from the mRNA by splicing before translation).

5
Q

what is key about alternative splicing?

A

it can increase proteome complexity (several protein isoforms from one gene) without increasing genome size.

in humans, 75% of genes have an alternative isoform

6
Q

2 approaches to identify gene models

A
  1. A priori/ab initio: look for sequence patterns. Protein-coding regions have distinctive patterns of codon statistics.
  2. Evidence-based: recognises regions corresponding to previously identified gene models, using the similarity of the translated amino-acid (AA) sequence to known proteins in other species.
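
The evidence-based idea can be sketched in Python: translate a candidate region, then score it against known proteins. This is a deliberately crude stand-in that uses ungapped identity, whereas real pipelines use gapped alignment tools such as BLAST; the `known` proteins and their names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical residues over the shorter sequence (no gaps allowed)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def best_hit(query: str, known: dict[str, str]) -> tuple[str, float]:
    """Return the known protein most similar to the query, with its identity."""
    return max(((name, ungapped_identity(query, seq)) for name, seq in known.items()),
               key=lambda hit: hit[1])


# Hypothetical "known proteins from other species".
known = {"protein_X": "MAKGL", "protein_Y": "MSTPL"}
query = translate("ATGGCTAAAGGTTGA").rstrip("*")  # "MAKG"
print(best_hit(query, known))  # ('protein_X', 1.0)
```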
7
Q

Which is the better approach to identify gene models?

A

A priori is less biased, but as more genomes are annotated, evidence-based annotation becomes more reliable. However, error propagation is a big issue: mistakes in existing annotations get copied into new ones.

8
Q

what features of a gene does the a priori approach look for?

A
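ATG - start codon
TATA box - promoter element (~30 bp upstream of the transcription start site; bound by the TATA-binding protein, which helps recruit RNA polymerase II)
Stop codon
Splice sites at exon-intron boundaries (introns typically start with GT and end with AG)
Polyadenylation signal (AATAAA)

these features are variable rather than fixed sequences, which presents a difficulty for the a priori approach.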
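To see why short, variable signals are a problem, here is a minimal Python sketch that scans a sequence for simplified consensus motifs. The patterns (TATAWAW for the TATA box, AATAAA for the polyadenylation signal, plus start and stop codons) are simplifying assumptions; real ab initio predictors score such signals probabilistically within a whole gene model rather than matching exact strings.

```python
import re

# Simplified consensus patterns (illustrative assumptions, not a real predictor).
MOTIFS = {
    "TATA box": r"TATA[AT]A[AT]",   # TATAWAW, W = A or T
    "polyA signal": r"AATAAA",
    "start codon": r"ATG",
    "stop codon": r"T(?:AA|AG|GA)",
}


def scan(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq = seq.upper()
    # A zero-width lookahead lets finditer report overlapping matches.
    return {name: [m.start() for m in re.finditer(f"(?=(?:{pat}))", seq)]
            for name, pat in MOTIFS.items()}


hits = scan("GGTATAAAAGGCATGCCAATAAAGTAG")
print(hits)
# {'TATA box': [2], 'polyA signal': [17], 'start codon': [12], 'stop codon': [4, 19, 24]}
```

Even in this 27 bp toy sequence the stop-codon pattern matches three times, in different frames, which shows how short motifs occur by chance and why an isolated signal is weak evidence for a gene.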

9
Q

example of 2 protein domains

A

kinase domains - signal transduction

LRR (leucine-rich repeats) - involved in protein-protein interactions.

10
Q

what is domain shuffling

A

gene segments coding for functional domains are shuffled between different genes in evolution.

11
Q

what causes complexity when using the evidence-based method of gene model prediction?

A

Codon usage preference

Different organisms prefer different codons for the same amino acid (AA) over other synonymous codons.
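Codon usage preference can be measured by counting, for each amino acid, how often each synonymous codon is used. A minimal Python sketch, assuming the standard genetic code; the two short coding sequences are made-up examples that encode the same peptide (M-L-L-L) with different leucine codons.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds: str) -> dict[str, dict[str, float]]:
    """For each amino acid, the fraction of its occurrences encoded by each codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        usage[aa][codon] = n / totals[aa]
    return dict(usage)


# Hypothetical coding sequences from two organisms, both encoding M-L-L-L-stop.
org_a = codon_usage("ATGCTGCTGCTGTAA")
org_b = codon_usage("ATGTTACTTTTATAA")
print(org_a["L"])  # {'CTG': 1.0}
print(org_b["L"])  # TTA used for 2/3 of leucines, CTT for 1/3
```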

12
Q

what is Pfam

A

a database of protein families and domains (each represented by a profile hidden Markov model), used to predict the domains of a protein from its AA sequence.

13
Q

How much of the human genome codes for proteins?

A

1.3%

~23,000 protein-coding genes

14
Q

what is different about subtelomeric regions?

A

low in protein encoding genes.

15
Q

what causes huge variation in gene length amongst protein encoding genes?

A

variation in intron size

exon size stays fairly constant, around 200bp.
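The point that intron size, not exon size, drives gene length can be made concrete by summing lengths from exon coordinates. A minimal Python sketch, assuming 1-based, inclusive, non-overlapping exon coordinates as in GFF files; both genes are hypothetical, with three 200 bp exons each.

```python
def gene_structure(exons: list[tuple[int, int]]) -> dict[str, int]:
    """Summarise a gene from its exon coordinates (1-based, inclusive, non-overlapping)."""
    exons = sorted(exons)
    exonic = sum(end - start + 1 for start, end in exons)
    span = exons[-1][1] - exons[0][0] + 1  # from first exon start to last exon end
    return {"gene_length": span, "exonic": exonic,
            "intronic": span - exonic, "n_introns": len(exons) - 1}


gene_a = gene_structure([(1, 200), (301, 500), (601, 800)])
gene_b = gene_structure([(1, 200), (50201, 50400), (100401, 100600)])
print(gene_a)  # {'gene_length': 800, 'exonic': 600, 'intronic': 200, 'n_introns': 2}
print(gene_b)  # {'gene_length': 100600, 'exonic': 600, 'intronic': 100000, 'n_introns': 2}
```

Both genes carry the same 600 bp of exon sequence; the 125-fold difference in length comes entirely from the introns.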

16
Q

example of a very long gene

A

dystrophin gene - 2400kb

17
Q

how can distribution of genes across a chromosome vary?

A

fewer genes in the centre (near the centromere) and at the very ends (subtelomeric regions)
the centre is more repetitive (e.g. LTR elements).
Recombination rate is higher near the ends of chromosomes.

18
Q

how many RNA-coding genes are in the human genome? (not mRNA)

A

~3000
e.g. genes coding for tRNAs and siRNAs
often involved in the control of gene expression

19
Q

what is a pseudogene?

A

degenerate genes that have mutated so far from their original sequence that the polypeptide they encode would no longer be functional

20
Q

what is a processed pseudogene?

How do they act?

A

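arise when mRNA is reverse transcribed (e.g. by reverse transcriptase from retroviruses or LINE elements) and the resulting cDNA is inserted back into the genome.
Introns and promoters have been lost. Not expressed as the original protein, but sometimes transcribed; these transcripts can regulate the parental gene by competing with its mRNA for binding of miRNAs.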

21
Q

what is a non processed pseudogene?

A

derived from gene duplication and located on the same chr as the parental gene

22
Q

what is a unitary pseudogene?

A

arise when the parental gene itself is disabled by mutation, so no functional copy of it remains.

23
Q

What is an enhancer?

A

short (50-1500 bp) region that binds proteins (transcription factors) to increase transcription.
can be up to 1 Mb away from the associated gene

24
Q

How much of our genome is repetitive elements?

A

about 50%

25
Q

4 types of repetitive elements in human genome

A

LINEs (long interspersed nuclear elements): 21% of the genome
SINEs (short interspersed nuclear elements): 13% of the genome
Microsatellites
Minisatellites - together 15%. 10,000-100,000 copies
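Microsatellites are tandem repeats of a very short unit, so perfect ones can be found with a back-referencing regular expression. A minimal Python sketch; the 1-6 bp unit size and the minimum of 5 copies are common but arbitrary thresholds chosen here as assumptions (minisatellites have longer units and would need a larger `max_unit`). Dedicated repeat finders also handle imperfect repeats, which this does not.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 6,
                         min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for perfect tandem repeats."""
    # The lazy group tries the shortest unit first; \1{n,} then requires that
    # unit to be repeated back-to-back at least min_copies - 1 more times.
    pattern = re.compile(rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


seq = "TT" + "CA" * 6 + "G" + "GATA" * 5 + "TT"
print(find_microsatellites(seq))  # [(2, 'CA', 6), (15, 'GATA', 5)]
```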

26
Q

who discovered transposable elements?

A

Barbara McClintock 1940s

Studied maize pigments

27
Q

How do transposable elements work in corn kernels to give color variation?

A

dark color - anthocyanin
yellow - a transposable element disrupts expression of an anthocyanin gene.
Transposition occurs rapidly during kernel development.
In some cultivars, Chr 9 is more prone to breakage. The breaks are caused by a transposable element named the Dissociation element (Ds). Ds can only act in the presence of the Activator element (Ac), which is unlinked (located elsewhere in the genome).
Heterozygous for colorless (c): loss of the part of Chr 9 carrying the dominant C allele = yellow.

28
Q

when does a mottled or striped kernel phenotype occur?

A

If Ds inserts directly into the C gene: yellow.
If Ac is present, Ds can jump out of the C gene and reactivate it: purple.
This causes spots, as all cells derived from that cell are purple.
The size of a spot indicates the stage of kernel development at which transposition occurred: earlier transposition gives larger spots (more descendant cells).

29
Q

where else in nature can we see the effects of transposable elements?

A

Rose

grape colour

30
Q

Names of 2 types of TEs

A

Class 1 - Retrotransposons.

Class 2 - DNA transposons

31
Q

Describe retrotransposons

A

many resemble degenerate retroviruses.
require an RNA intermediate to replicate.
in mammals: LINEs and SINEs.
LINEs encode reverse transcriptase and can replicate autonomously.
SINEs are too short to encode their own reverse transcriptase and depend on LINEs for replication.
LTR retrotransposons have long terminal repeats (LTRs) at their ends.

32
Q

Describe DNA transposons

A

Do not require an RNA intermediate.
Encode a transposase enzyme, which recognizes sequences in the transposon itself, cuts it out and inserts it somewhere else. This often leaves a mutation at the original site.
Contain terminal inverted repeats at their ends, which are the targets of the excision machinery.
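Terminal inverted repeats mean the sequence at one end of the element equals the reverse complement of the other end. A minimal Python check; the TIR length `k` and the toy element are made up for illustration, and real TIRs may contain mismatches, which this exact comparison ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_tirs(element: str, k: int) -> bool:
    """True if the first k bases are the reverse complement of the last k bases."""
    element = element.upper()
    if k <= 0 or 2 * k > len(element):
        return False
    return element[:k] == revcomp(element[-k:])


# Hypothetical element: left TIR, an internal region, then the inverted copy.
element = "CAGGGTA" + "ACGTACGTAC" + revcomp("CAGGGTA")
print(revcomp("CAGGGTA"))               # TACCCTG
print(has_tirs(element, 7))             # True
print(has_tirs("CAGGGTAACGTTTTTTT", 7))  # False
```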

33
Q

How are inserted transposable elements flanked by repeats?

A

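Transposase makes a staggered cut in the target DNA, leaving sticky (single-stranded) ends.

The TE inserts, and the single-stranded gaps are filled in. This duplicates the short target sequence, so the TE ends up flanked by direct repeats (target-site duplication).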
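The staggered-cut mechanism can be simulated on a single strand to show where the flanking direct repeats come from. A minimal Python sketch: the cut position, overhang length and sequences are hypothetical, and the TE is written in lower case only to make it easy to spot.

```python
def insert_with_tsd(target: str, cut: int, overhang: int, te: str) -> str:
    """Insert `te` at a staggered cut and return the resulting top strand.

    One strand is cut at `cut`, the other `overhang` bases further along,
    leaving single-stranded ends. Filling in those gaps copies
    target[cut:cut + overhang] on both sides of the TE: a target-site
    duplication (direct repeat).
    """
    tsd = target[cut:cut + overhang]
    return target[:cut] + tsd + te + tsd + target[cut + overhang:]


result = insert_with_tsd("AAAAGATCTTTT", cut=4, overhang=4, te="ggttcc")
print(result)  # AAAAGATCggttccGATCTTTT
```

The four target bases GATC appear once before the insertion and twice afterwards, once on each side of the TE.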

34
Q

Biological functions of transposable elements

A
  1. Altering properties of genes - insertion into a gene can cause a knockout phenotype; insertion into a promoter region can alter the expression pattern or splicing of nearby genes.
  2. Gene evolution by gene fusion or exon shuffling. E.g. rapidly evolving virulence effectors in fungal plant pathogens are preferentially located in repeat-rich regions of the genome.
35
Q

How do TEs control grape colour?

A

insertion of the Gret1 LTR retrotransposon into the VvmybA1 allele in black grapes caused loss of function = white grapes.
Rearrangement of Gret1 = red-coloured grapes.

36
Q

How have TEs driven the evolution of peppered moths?

A

replacement of the pale form by the black form, driven by predation and pollution.
the black form is due to insertion of a large, tandemly repeated TE into the first intron of the cortex gene, which increases cortex expression.

37
Q

what do TEs partially explain?

A

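C-value paradox: genome size does not correlate with organismal complexity.

differences in repeat content explain much of the variation in genome size.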

38
Q

What is ENCODE

A

Encyclopedia of DNA elements
international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI).
aims to compile a list of all functional elements in the human genome, including regulatory elements.
In 2012 it reported biochemical functions for 80% of the genome, but this claim is highly debated.

39
Q

What is BUSCO

A

Benchmarking Universal Single-Copy Orthologs

defines sets of near-universal single-copy orthologs (genes expected in a single copy in almost all species of a given lineage), used to assess the completeness of an assembly and annotation.
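BUSCO results are usually summarised as the percentage of the lineage's orthologs found complete (single-copy or duplicated), fragmented or missing. A minimal Python sketch that formats such a summary from counts; the counts are hypothetical, and the real BUSCO tool obtains them by searching the assembly or annotation against a lineage dataset, which is not shown here. The string layout imitates BUSCO's short summary notation, but treat the exact format as an approximation.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style completeness percentages from category counts."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> str:
        return f"{100 * count / n:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")


print(busco_summary(single=230, duplicated=10, fragmented=5, missing=5))
# C:96.0%[S:92.0%,D:4.0%],F:2.0%,M:2.0%,n:250
```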