Lecture 24 Flashcards
how can we predict the presence of an exon/gene using ORFs?
translate the sequence in all 6 reading frames –> if a stretch not interrupted by a stop codon (an ORF) is above a specified minimum length, it may be an exon
what is an ORF?
open reading frame: a stretch of consecutive codons with no stop codon
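A minimal sketch of the six-frame scan, assuming the definition used on these cards (an ORF is any stop-free run of codons, with no start codon required). The function names and the default `min_codons` threshold are illustrative, not from the lecture.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons (DNA alphabet)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, used for the 3 minus-strand frames."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) of stop-free codon runs with at least min_codons codons."""
    run_start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if (i - run_start) // 3 >= min_codons:
                yield run_start, i
            run_start = i + 3  # next run begins after the stop codon
    # Run that reaches the end of the sequence without hitting a stop
    end = frame + ((len(seq) - frame) // 3) * 3
    if (end - run_start) // 3 >= min_codons:
        yield run_start, end


def six_frame_orfs(seq, min_codons=3):
    """Scan 3 frames on each strand; coordinates are relative to the strand scanned."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                hits.append((strand, frame, start, end, s[start:end]))
    return hits
```

In practice the minimum length would be set far higher than 3 codons; the small default only keeps toy inputs readable.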
once we have a predicted ORF, 3 ways to look for evidence it is part of an exon
- conservation across species
- if cDNAs contain the sequence
- codon bias
how can we use conservation as evidence a predicted ORF is part of an exon?
if stretches of sequence conserved across species line up with the predicted exon, it is likely a real exon (conservation is supporting evidence, not proof)
what is the gold standard method for finding an exon?
looking for cDNAs that map to the gene
describe the use of cDNAs to find an exon
if a cDNA has the same sequence as the predicted exon –> that sequence was once in mRNA –> it was transcribed –> it’s part of an expressed gene
what is an EST?
“Expressed Sequence Tag”
a cDNA for which only the 5’ and 3’ ends have been sequenced
describe the use of codon bias to find a predicted exon
most aa can be coded for by >1 codon (redundancy) but these codons are not used equally often in different organisms –> an organism-typical codon usage pattern can be a signature of an ORF that’s a gene
ex. only 1% of arginine codons in E. coli are AGA –> an ORF with many AGA codons is unlikely to be coding in E. coli bc AGA is so rare there
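A rough sketch of how codon bias could be measured for a candidate ORF. The six arginine codons are standard; everything else (function names, the idea of passing in a set of rare codons) is an assumption for illustration, and real usage tables would come from known genes of the organism.

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds):
    """Count codons in frame 0 of a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def synonymous_fraction(counts, codon, synonyms):
    """Fraction of one amino acid's codons that are the given codon."""
    total = sum(counts[c] for c in synonyms)
    return counts[codon] / total if total else 0.0


def rare_codon_fraction(orf, rare_codons):
    """Fraction of all codons in the ORF that belong to the rare set."""
    counts = codon_counts(orf)
    n = sum(counts.values())
    return sum(counts[c] for c in rare_codons) / n if n else 0.0
```

A candidate ORF whose `rare_codon_fraction` is much higher than that of known genes from the same organism would be weaker evidence for a real exon; where to draw that line is a judgment call not covered here.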
what % of genome is exons of protein-encoding genes?
3%
what % of genome is exons + introns + regulatory sequences?
28%
what % of genome encodes protein sequences?
1%
what % of genome is repetitive sequence?
45%
why is the cDNA method not always helpful?
3% of genome is exons of protein-encoding genes but only 1% of genome encodes protein sequences
therefore, a cDNA match shows a sequence is transcribed, not that it encodes protein (not all genes encode protein)
if 3% is exons, why does only 1% encode protein?
start codon is not always at start of first exon and stop codon is not always at end of last exon –> exons also contain untranslated sequence (5’ and 3’ UTRs)
3 ways to conduct comparative genomics
- within an organism
- btwn individuals in a species
- btwn organisms
purpose of doing comparative genomics within an organism
to identify gene families and gene duplications
purpose of doing comparative genomics btwn individuals in a species
to identify differences associated with phenotype or disease
what is a paralog?
related genes within an organism that encode proteins with similar aa sequence
function of genes that are paralogs
may be functionally redundant or have independent function
what is a possible origin of paralogs?
Gene duplication during evolution
why do we sequence the exome?
compare the exomes of ppl with disease and without disease to identify disease-associated genes
benefits of sequencing the exome compared to whole-genome sequencing
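cost-effective: only the exons (a small % of the genome) are sequenced, yet it is still effective at finding disease-associated coding variants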
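The case-vs-control comparison of exomes can be sketched as a set operation. This is a deliberately strict toy filter (variant present in every affected person, absent from every unaffected person), which is an assumption; the gene and variant labels used with it are hypothetical placeholders.

```python
def candidate_genes(affected, unaffected):
    """Return genes carrying a variant shared by all affected and absent from all unaffected.

    affected / unaffected: lists with one set of (gene, variant) tuples per person.
    """
    # Variants found in every affected exome
    shared = set.intersection(*affected) if affected else set()
    # Variants found in at least one unaffected exome
    seen_in_controls = set().union(*unaffected)
    return sorted({gene for gene, _variant in shared - seen_in_controls})
```

Real studies use statistical association rather than an all-or-nothing filter, since disease variants are rarely shared by every patient.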
similarities btwn human and mouse genome?
99% of mouse genes have a human homolog and vice versa
genome organization (order of genes and non-coding regions) is largely the same
what is synteny?
conserved order of genes btwn 2 species
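A small sketch of detecting conserved gene order between two species. It assumes orthologous genes already share the same name in both lists, each name appears once, and only same-orientation runs count; the function name and `min_len` parameter are illustrative.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both species."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        # Extend the run only if this gene directly follows the previous one in species B
        if gene in pos_b and current and pos_b[gene] == pos_b[current[-1]] + 1:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks
```

With `order_a = ["g1", "g2", "g3", "g4", "g5"]` and `order_b = ["g3", "g4", "g5", "g1", "g2"]` (made-up gene names), the result is two blocks: `["g1", "g2"]` and `["g3", "g4", "g5"]`.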
why do we use phylogenetic inference?
to see how genes evolved over time
what is an ortholog?
homologous genes at same genetic locus in diff species
describe the evolution of the Vitellogenin gene in platypus
platypus has 1 vitellogenin gene but other mammals don’t have any
chickens have 3 vitellogenin genes and share a common ancestor with platypus –> the ancestor likely had 3 vitellogenin genes, which were then lost in mammals (platypus kept 1)
what is uniparental disomy?
correct number of chromosomes, but both copies of one chromosome came from the same parent
what causes uniparental disomy?
non-disjunction
why does non-disjunction in UPD not cause trisomy?
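there is trisomy rescue: the trisomic cell randomly loses 1 of the 3 copies –> if the lost copy is the single one from the other parent, both remaining copies come from the same parent (UPD)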
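Trisomy rescue can be worked through by enumeration. This sketch assumes each of the three copies is equally likely to be lost (the "random" loss on the card); "M" and "P" are just labels for maternal and paternal copies, and the function names are illustrative.

```python
from fractions import Fraction


def rescue_outcomes(copies=("M", "M", "P")):
    """List the pair of copies left after losing each of the three copies in turn."""
    return [
        tuple(c for i, c in enumerate(copies) if i != lost)
        for lost in range(len(copies))
    ]


def upd_probability(copies=("M", "M", "P")):
    """Probability that both remaining copies come from the same parent."""
    outcomes = rescue_outcomes(copies)
    upd = [o for o in outcomes if len(set(o)) == 1]
    return Fraction(len(upd), len(outcomes))
```

Under that equal-loss assumption, only 1 of the 3 possible losses (losing "P") leaves two copies from the same parent, so `upd_probability()` returns `Fraction(1, 3)`; the other two losses restore a normal biparental pair.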