ab initio gene prediction Flashcards

Question 1

Q

bacteria vs eukaryotes

Answer

A

much easier for bacteria
no introns (single coding region)
smaller intergenic regions
- genes easier to find
- by contrast, only 2-3% of a eukaryotic genome is genes
look for largest ORFs
- accurate for low-GC genomes
- high GC → fewer stop codons (TAA, TAG, TGA are all AT-rich)
  - many long ORFs will occur by chance
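
A minimal sketch of the two ideas above: how often a stop codon turns up by chance at a given GC content, and a naive "largest ORF" search. It assumes independent bases with G = C and A = T, scans only the forward strand, and requires an ATG start. The function names are illustrative and do not come from any gene finder.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is a stop codon, given GC fraction `gc`.

    Assumes independent bases with P(G) = P(C) and P(A) = P(T).
    """
    g = gc / 2.0
    a = t = (1.0 - gc) / 2.0
    # P(TAA) + P(TAG) + P(TGA)
    return t * a * a + t * a * g + t * g * a


def longest_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Return (frame, start, end) of the longest ATG..stop ORF.

    Forward strand only; `end` is exclusive and includes the stop codon.
    Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[2] - best[1]:
                    best = (frame, start, end)
                start = None  # look for the next ORF in this frame
    return best
```

Under these assumptions a random codon is a stop about 8% of the time at 30% GC but only about 2% of the time at 70% GC, so chance ORFs run roughly four times longer in the high-GC case.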

Question 2

Q

gene finding programs

Answer

A

artemis - widely used
prodigal
- doesn’t just look for ORFs
- log-likelihood information
- accuracy >90%
- performs well with high GC
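
To illustrate what "log-likelihood information" can mean in practice, here is a hedged sketch of a log-likelihood ratio score: each codon in a candidate ORF contributes log(P(codon | coding) / P(codon | background)). This is a generic illustration, not Prodigal's actual scoring code. The single-codon model, the pseudocount, and the function names are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable

BASES = "ACGT"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def codon_frequencies(seqs: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate in-frame codon frequencies from sequences, with add-k smoothing."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    # Only the 64 unambiguous codons enter the model; codons with N etc. are ignored.
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_likelihood_score(orf: str, coding: Dict[str, float],
                         background: Dict[str, float]) -> float:
    """Sum of log(P_coding / P_background) over the codons of `orf`.

    Positive scores mean the ORF looks more like the coding model.
    """
    orf = orf.upper()
    score = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in coding:  # skip ambiguous codons
            score += math.log(coding[codon] / background[codon])
    return score
```

Scores add across codons, so a longer ORF that matches the coding model accumulates a higher score than a short one.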

Question 3

Q

prodigal

Answer

A

create training set for protein-coding regions
- look for G/C bias at each position of ORFs
- build model of predicted ORFs with positional bias
dicodon bias also used
penalise ORFs downstream of another larger ORF
- difference between the two scores is subtracted from the smaller ORF's score
- add length factor to each
  - higher in genomes with lower GC
iteration
dynamic programming
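
The first training step (G/C bias at each codon position) can be sketched as follows. The example is simplified and assumes the training ORFs are already in frame and given as plain strings. The function name is hypothetical, and the sketch does not cover the dicodon model, the ORF penalty, or the iteration.

```python
from typing import Iterable, List


def gc_by_codon_position(orfs: Iterable[str]) -> List[float]:
    """Fraction of G/C at codon positions 1, 2 and 3 across in-frame ORFs."""
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for orf in orfs:
        orf = orf.upper()
        usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
        for i in range(usable):
            pos = i % 3
            total[pos] += 1
            if orf[i] in "GC":
                gc[pos] += 1
    return [g / t if t else 0.0 for g, t in zip(gc, total)]


# Tiny usage example: codons ATG, GCC, GCG
bias = gc_by_codon_position(["ATGGCCGCG"])
print([round(x, 2) for x in bias])
```

A frame whose third codon position is much richer in G/C than the other two is the kind of positional signal such a model could use to separate real coding frames from chance ORFs.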

Question 4

Q

log-likelihood

Answer

A

Question 5

Q

prodigal iteration

Answer

A

Question 6

Q

prodigal dynamic programming

Answer

A

performed over all start-stop pairs
score each gene on start and dicodon scores
allow some overlap
- opposite strands particularly
- smaller overlap on same strand
determine final gene prediction
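
The steps above resemble a weighted chaining problem. The sketch below is a simplified illustration rather than Prodigal's implementation. It assumes each start-stop pair already has one combined score, checks overlap only between neighbouring genes in the chain, and uses illustrative overlap limits (`max_same`, `max_opp`) that are looser for opposite strands. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    strand: str   # "+" or "-"
    score: float  # e.g. start score + dicodon score


def overlap(a: Candidate, b: Candidate) -> int:
    """Number of bases shared by two candidates."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def select_genes(cands: List[Candidate], max_same: int = 60,
                 max_opp: int = 200) -> List[Candidate]:
    """Pick the highest-scoring chain of candidates, left to right."""
    if not cands:
        return []
    order = sorted(cands, key=lambda c: (c.end, c.start))
    best = [c.score for c in order]  # best chain score ending at each candidate
    prev = [-1] * len(order)         # back-pointers for traceback
    for i, cur in enumerate(order):
        for j in range(i):
            left = order[j]
            # The chain must move strictly rightwards (no nested genes).
            if left.start >= cur.start or left.end >= cur.end:
                continue
            limit = max_same if left.strand == cur.strand else max_opp
            if overlap(left, cur) <= limit and best[j] + cur.score > best[i]:
                best[i] = best[j] + cur.score
                prev[i] = j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

With candidates at 0-300 (+), 100-500 (+), 250-600 (+) and 450-900 (-), the 100-500 candidate is dropped because it overlaps its same-strand neighbours by more than 60 bases, while the 150-base overlap between the last two is accepted because they lie on opposite strands.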

Question 7

Q

eukaryotic gene prediction

Answer

A

(7 cards)