Lecture 24 Flashcards
how can we predict the presence of an exon/gene using ORFs?
translate the sequence in all 6 reading frames –> if a stretch not interrupted by a stop codon (an ORF) is above a specified minimum length, it may be an exon
what is an ORF?
“Open Reading Frame”
a stretch of consecutive codons, in one reading frame, that is not interrupted by a stop codon
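A minimal sketch of the 6-frame ORF scan described above. The function names and the `min_codons` cutoff are made up for illustration, and the standard stop codons (TAA, TAG, TGA) are assumed. It only looks for stop-free runs and does not require a start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, codons) for every stop-free run of codons
    that is at least min_codons long, scanning all 6 reading frames."""
    seq = seq.upper()
    orfs = []
    # 3 frames on the given strand plus 3 on the reverse complement
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run = []  # codons seen since the last stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOP_CODONS:
                    if len(run) >= min_codons:
                        orfs.append((strand, frame, run))
                    run = []
                else:
                    run.append(codon)
            # keep a run that reaches the end of the sequence
            if len(run) >= min_codons:
                orfs.append((strand, frame, run))
    return orfs
```

Raising `min_codons` trades sensitivity for fewer false positives: short stop-free runs occur by chance, long ones much less often.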
once we have a predicted ORF, 3 ways to look for evidence it is part of an exon
- conservation across species
- if cDNAs contain the sequence
- codon bias
how can we use conservation as evidence a predicted ORF is part of an exon?
if a predicted ORF lines up with a stretch that is conserved across species, it is more likely to be an exon (functional sequence tends to be conserved) –> this is evidence, not proof
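One way to picture the conservation check is a sliding-window identity score between two species. This is a rough sketch only: it assumes the two sequences are already aligned to the same length, it does not treat gaps specially, and `window_identity` and its `window` parameter are hypothetical names.

```python
def window_identity(seq_a, seq_b, window=6):
    """Fraction of identical positions in each sliding window of two
    already-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        scores.append(matches / window)
    return scores
```

Windows with high identity that overlap a predicted ORF would count as supporting evidence; low identity does not rule an exon out.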
what is the gold standard method for finding an exon?
looking for cDNAs that map to the gene
describe the use of cDNAs to find an exon
if a cDNA has the same sequence as the predicted exon –> that sequence was once mRNA –> it was transcribed –> it’s part of an expressed gene
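A toy version of the cDNA check, assuming exact substring matching on the same strand. Real pipelines use an alignment tool that tolerates mismatches and splicing; `supporting_cdnas` is a hypothetical helper, not part of any library.

```python
def supporting_cdnas(exon, cdnas):
    """Return the indices of cDNA sequences that contain the predicted
    exon as an exact, case-insensitive substring."""
    exon = exon.upper()
    return [i for i, cdna in enumerate(cdnas) if exon in cdna.upper()]
```

An empty list means no cDNA in the set supports the prediction, which may only mean the gene was not expressed in the sampled tissue.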
what is an EST?
“Expressed Sequence Tag”
a cDNA for which only the 5’ and 3’ ends have been sequenced
describe the use of codon bias to find a predicted exon
most aa can be coded for by >1 codon (redundancy), but these codons are not used equally and the preferences differ btwn organisms –> an ORF that matches the organism’s codon preferences is more likely to be a gene
ex. only 1% of arginine codons in E. coli are AGA –> an ORF with many AGA codons is unlikely to be coding bc AGA is so rare in E. coli genes
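A small sketch of the codon-bias idea: count the in-frame codons of a predicted ORF and see what fraction are rare in that organism. The set of rare codons is supplied by the caller (e.g. `{"AGA"}` for the E. coli example above); the function names are hypothetical.

```python
from collections import Counter


def codon_counts(orf):
    """Count each in-frame codon of an ORF (a trailing partial codon is ignored)."""
    orf = orf.upper()
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


def rare_codon_fraction(orf, rare_codons):
    """Fraction of the ORF's codons that belong to rare_codons."""
    counts = codon_counts(orf)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[c] for c in rare_codons) / total
```

A high fraction of rare codons argues against the ORF being a real gene in that organism; what counts as "high" is a judgment call and is not fixed here.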
what % of genome is exons of protein-encoding genes?
3%
what % of genome is exons + introns + regulatory sequences?
28%
what % of genome encodes protein sequences?
1%
what % of genome is repetitive sequence?
45%
why is the cDNA method not always helpful?
3% of genome is exons of protein-encoding genes but only 1% of genome encodes protein sequences
therefore, not all exon sequence encodes protein –> a cDNA match shows the sequence is transcribed, not that it is translated
if 3% is exons, why does only 1% encode protein?
start codon is not always at start of exon and stop codon is not always at end of exon –> exons also contain untranslated sequence (5’ and 3’ UTRs)
3 ways to conduct comparative genomics
- within an organism
- btwn individuals in a species
- btwn organisms