Lecture 7 gene prediction Flashcards
Once a genome has been successfully sequenced and assembled, what type of approach is used to predict its gene structure?
Computational approaches
What is computational gene prediction and what does it include?
Computational gene prediction is necessary to obtain comprehensive functional information on genes and genomes. The process includes detection of the location of open reading frames (ORFs).
What else does computational gene prediction require in eukaryotes?
A description of the intron/exon structure of each gene.
What is the main goal of gene prediction?
The main goal is to describe all the genes computationally with as close to 100% accuracy as possible.
What isn’t conserved in coding regions, and how does this affect gene prediction?
Sequence motifs aren’t conserved in coding regions, which makes gene prediction one of the most difficult problems in the field of pattern recognition.
What are the three ways of finding genes in genomes?
1) Similarity-based or Comparative
2) Ab initio = “from the beginning”
3) Combined “evidence-based” (BEST)
1) Similarity-based or Comparative
- BLAST: do other organisms have a similar sequence?
(Is the sequence similar to a known gene or protein?)
2) Ab initio
- Ab initio (“from the beginning”) methods predict genes without explicit comparison to cDNA or protein sequences, using “rule-based” gene models whose rules are derived from statistical analysis of datasets.
3) Combined “evidence-based”
- Combines gene models with alignments to known ESTs and protein sequences
What is the gene density in prokaryotes?
High: more than 90% of the genome is coding sequence, with very few repetitive sequences.
What gene structure does gene prediction in prokaryotes rely on?
Each prokaryotic gene consists of a single contiguous ORF coding for a single protein or RNA, with no interruptions within the gene.
- Most bacterial genes start with the codon ATG; GTG and TTG are sometimes used as alternative start codons.
- The protein-coding region ends with a stop codon: TAA, TAG, or TGA.
As there may be multiple ATG, GTG, or TTG codons in a frame in prokaryotes, how can the start codon be located?
- Identifying the ribosome binding site (Shine-Dalgarno sequence) can help locate the start codon. The ribosome binding site lies slightly upstream of the translation start codon and has the consensus motif AGGAGGT.
- Identifying the stop codon is straightforward.
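The idea of using the ribosome binding site to choose between candidate start codons can be sketched in Python. This is only an illustration: the function name `best_sd_match`, the 20-nucleotide upstream window, and the simple count of matching positions are assumptions made here, not part of the lecture, and real gene finders score this signal statistically.

```python
SD_CONSENSUS = "AGGAGGT"  # consensus motif given in the card above


def best_sd_match(seq, start, window=20):
    """Find the best ungapped match to the consensus upstream of a start codon.

    seq    : DNA sequence (forward strand)
    start  : 0-based index of the candidate start codon
    window : how many nucleotides upstream to search (assumed value)

    Returns (matches, position), where matches is the number of identical
    positions (0 to 7) and position is the 0-based index of the match in seq,
    or None if nothing matched at all.
    """
    seq = seq.upper()
    lo = max(0, start - window)
    upstream = seq[lo:start]
    k = len(SD_CONSENSUS)
    best = (0, None)
    for i in range(len(upstream) - k + 1):
        # Count positions where the upstream k-mer agrees with the consensus
        matches = sum(a == b for a, b in zip(upstream[i:i + k], SD_CONSENSUS))
        if matches > best[0]:
            best = (matches, lo + i)
    return best


# Toy example: a perfect motif sits 5 nt upstream of the ATG at index 16
example = "TTTTAGGAGGTCCCCCATGAAA"
print(best_sd_match(example, 16))
```

A candidate start codon with a strong match a short distance upstream is more plausible than one with no match, which is how this signal helps pick among several in-frame ATG, GTG, or TTG codons.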
What is the ribosome binding site (Shine-Dalgarno sequence)?
A purine-rich stretch of sequence complementary to the 16S rRNA in the ribosome.
How can potential coding regions be detected?
By looking for ORFs
What kind of ORF should be looked for, and how can a proposed frame be confirmed as containing a gene?
- A long open reading frame may be a gene.
- A basic approach is to scan for ORFs whose length exceeds a certain threshold (60 codons, i.e. 60 amino acids or 180 nucleotides).
- A proposed frame can be confirmed by the presence of other signals, such as the Shine-Dalgarno sequence.
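The threshold scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `find_orfs` is made up here, only the three forward-strand frames are scanned (a real scan would also cover the reverse complement), and each ORF is taken from the first start codon after the previous stop.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=60):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start and end are 0-based and end-exclusive; the span includes the stop
    codon. min_codons is the minimum number of codons from the start codon
    up to, but not including, the stop codon (60 is the threshold on the card).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first start codon seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif codon in STOP_CODONS:
                # Keep the ORF only if it is long enough
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence: ATG followed by 60 alanine codons and a stop, in frame 2
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(toy))
```

Lowering `min_codons` finds more candidates but also more chance ORFs, which is why a second signal such as the Shine-Dalgarno sequence is used to confirm a frame.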
How often should a stop codon be seen at random?
About one stop codon every 64/3 ≈ 21 codons, since 3 of the 64 possible codons are stop codons.
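The arithmetic behind this figure, and why a long stop-free frame is suggestive of a gene, can be checked with a short calculation. It assumes all four bases are equally likely and that codons are independent, which real genomes only approximate; the helper name `prob_stop_free` is hypothetical.

```python
# 3 of the 64 possible codons are stops (TAA, TAG, TGA)
p_stop = 3 / 64

# Expected spacing between stop codons in random sequence
expected_spacing = 1 / p_stop
print(f"expected codons per stop: {expected_spacing:.1f}")


def prob_stop_free(n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1 - p_stop) ** n_codons


# Chance that a random frame runs 60 codons without hitting a stop
print(f"P(no stop in 60 codons) = {prob_stop_free(60):.3f}")
```

Under these assumptions a stop-free run of 60 codons arises by chance only a few percent of the time, which is the reasoning behind using a length threshold.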
What is a disadvantage of using stop codons in an ORF to detect a gene?
Genes are usually longer than 21 codons, so if stop codons alone are used, a whole gene may not be identified.