Gene Finding Flashcards
Genome - Genes
~20,000 (protein-coding) genes: not all of the genome is made of genes!
Pseudogenes: DNA sequences that look like genes but are not functional; they can be leftovers from ancestral genes
Regions with many repeats: possibly junk, structural elements or regulatory elements
Non-coding functional RNA: microRNA (miRNA) silences genes by blocking the translation of their mRNA into protein; ribosomal RNA (rRNA) is part of the ribosome, which translates mRNA into protein; transfer RNA (tRNA) carries amino acids and links each codon to its amino acid
Telomeres: repetitive DNA at the ends of chromosomes that gets shorter each time the cell divides, which is linked to aging
Genes to amino acids
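Codons: 3 bases form a codon, which codes for 1 amino acid
64 codons => 20 amino acids (+ stop): redundancy
The 3rd base is often unimportant: wobble
Start codon ATG (methionine); stop codons TAA, TAG, TGA
The genetic code is nearly universal, with some exceptions: e.g. in mitochondria some codons code for a different amino acid or act as a different stop codon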
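A minimal Python sketch of the codon facts above, assuming the NCBI standard genetic code (translation table 1) written in DNA letters (T instead of U). The 64-letter string is the usual compact encoding of that table with codons in TCAG order, and `*` marks a stop codon. `VERTEBRATE_MITO` applies the four differences of the vertebrate mitochondrial code (NCBI table 2) as one example of the exceptions. All names here are illustrative, not from any library.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard code (table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Vertebrate mitochondrial code (NCBI table 2) differs from the standard code at four codons
VERTEBRATE_MITO = {**STANDARD_CODE, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

def translate(dna, code=STANDARD_CODE):
    """Translate in frame 0; trailing bases that do not form a full codon are ignored."""
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def stop_codons(code):
    return sorted(c for c, aa in code.items() if aa == "*")

def fourfold_degenerate(code):
    """First-two-base prefixes where the 3rd base never changes the amino acid (wobble)."""
    return [
        a + b
        for a, b in product(BASES, repeat=2)
        if len({code[a + b + c] for c in BASES}) == 1
    ]

print(len(STANDARD_CODE))                        # 64
print(len(set(STANDARD_CODE.values()) - {"*"}))  # 20
print(stop_codons(STANDARD_CODE))                # ['TAA', 'TAG', 'TGA']
print(len(fourfold_degenerate(STANDARD_CODE)))   # 8
print(translate("ATGTGA"), translate("ATGTGA", VERTEBRATE_MITO))  # M* MW
```

The last line shows the mitochondrial exception: TGA is a stop in the standard code but codes for tryptophan (W) in vertebrate mitochondria.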
Open reading frames
ORF
start with a start codon
end with a stop codon
contain only full codons (check that the length is a multiple of 3!)
Can start reading from 3 different positions (offsets 0, 1, 2), giving 3 reading frames per strand
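The ORF definition above can be sketched in Python as a scan over the 3 reading frames, assuming the standard start codon ATG and stop codons TAA, TAG, TGA. This is a simplified sketch: it only scans the forward strand (a full search would also scan the reverse complement), and from the first ATG in a frame it reads to the next in-frame stop, so nested ATGs are not reported separately. `find_orfs` and `min_codons` are hypothetical names.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Return (frame, start, end) for forward-strand ORFs: ATG ... in-frame stop.

    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the 3 possible reading frames on this strand
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                assert (end - start) % 3 == 0  # only full codons
                if (end - start) // 3 >= min_codons:
                    orfs.append((frame, start, end))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```

In the example only frame 2 contains an ORF (ATG AAA TTT TAG, 4 codons); frame 0 hits a TGA but has no preceding start codon.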
Mutations: an insertion or deletion (not a multiple of 3 bases) has a greater impact than substituting one nucleotide, because it shifts the reading frame of every downstream codon (frameshift); a single-base substitution may change nothing at all (wobble/redundancy)
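A small Python illustration of the point above, assuming the NCBI standard genetic code and a made-up toy sequence. It compares a synonymous 3rd-base substitution, a substitution that changes one amino acid, and a 1-base insertion that shifts the frame. `substitute` and `insert` are hypothetical helpers.

```python
from itertools import product

# NCBI standard genetic code in DNA letters, codons in TCAG order; '*' = stop
CODE = {
    "".join(c): aa
    for c, aa in zip(product("TCAG", repeat=3),
                     "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
}

def translate(dna):
    """Frame-0 translation; leftover bases that do not fill a codon are ignored."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def substitute(dna, pos, base):
    return dna[:pos] + base + dna[pos + 1:]

def insert(dna, pos, base):
    return dna[:pos] + base + dna[pos:]

gene = "ATGGCCATTGTAATGGGC"  # toy sequence: ATG GCC ATT GTA ATG GGC
print(translate(gene))                      # MAIVMG
print(translate(substitute(gene, 5, "A")))  # MAIVMG  (GCC -> GCA, both alanine: wobble)
print(translate(substitute(gene, 4, "A")))  # MDIVMG  (GCC -> GAC: one amino acid changes)
print(translate(insert(gene, 4, "T")))      # MVHCNG  (frameshift: every downstream codon changes)
```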
Open reading frames are candidate genes
Statistical significance of candidate genes
Check how likely it is that the ORF would occur by chance
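The likelihood depends on length:
3 of the 64 codons are stop codons, so (assuming all 64 codons are equally likely) a random codon is not a stop with probability 61/64
Probability that k codons in a row contain no stop codon:
(61/64)^k
For p < 0.05 a candidate gene therefore needs more than 62 codons: (61/64)^k < 0.05 gives k > ln(0.05)/ln(61/64) ≈ 62.4, so k ≥ 63
For p < 0.01 (99% confident it would not occur by chance): k > ln(0.01)/ln(61/64) ≈ 95.9, so about 96 codons (roughly 100)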
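The length thresholds above can be checked with a few lines of Python, under the same assumption that all 64 codons are equally likely. `p_no_stop` and `min_codons` are hypothetical names; `min_codons` searches for the smallest k rather than rounding the logarithm by hand.

```python
import math

P_NOT_STOP = 61 / 64  # 61 of the 64 codons are not stops, all codons assumed equally likely

def p_no_stop(k):
    """Probability that k random codons in a row contain no stop codon."""
    return P_NOT_STOP ** k

def min_codons(alpha):
    """Smallest k with p_no_stop(k) < alpha."""
    k = 0
    while p_no_stop(k) >= alpha:
        k += 1
    return k

print(round(math.log(0.05) / math.log(P_NOT_STOP), 1))  # 62.4
print(min_codons(0.05), min_codons(0.01))              # 63 96
```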
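Alternatively, use a randomisation test: count the codon frequencies and generate random sequences from them, or permute (shuffle) the real sequence. Then count the ORFs in the random sequences and compare with the real one, e.g. whether long ORFs, or particular features within ORFs, are more frequent in the real sequence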
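A rough Python sketch of the randomisation test above, with two possible null models: shuffling the bases (keeps nucleotide composition) and sampling codons with their observed frame-0 frequencies. Assumptions: only forward-strand ORFs are counted, the statistic is the number of ORFs of at least `min_codons` codons, and the p-value uses the common "+1" correction. All function names and the toy sequence are hypothetical.

```python
import random
from collections import Counter

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def count_orfs(seq, min_codons):
    """Count forward-strand ORFs (ATG ... in-frame stop) of at least min_codons codons, stop included."""
    count = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    count += 1
                start = None
    return count

def shuffled(seq, rng):
    """Null model 1: permute the bases (same nucleotide composition, random order)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def codon_sampled(seq, rng):
    """Null model 2: draw codons at random with the frame-0 codon frequencies of seq."""
    freqs = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    codons = list(freqs)
    weights = [freqs[c] for c in codons]
    return "".join(rng.choices(codons, weights=weights, k=sum(weights)))

def randomisation_test(seq, min_codons, null_model, n_trials=1000, seed=0):
    """Empirical p-value: how often a random sequence has at least as many long ORFs."""
    rng = random.Random(seed)
    observed = count_orfs(seq, min_codons)
    at_least_as_many = sum(
        count_orfs(null_model(seq, rng), min_codons) >= observed
        for _ in range(n_trials)
    )
    # +1 in numerator and denominator so the p-value is never exactly 0
    return observed, (at_least_as_many + 1) / (n_trials + 1)

# Toy example: one 70-codon ORF (ATG + 68 x GCC + TAA), tested at the 63-codon threshold
toy = "ATG" + "GCC" * 68 + "TAA"
for model in (shuffled, codon_sampled):
    print(model.__name__, randomisation_test(toy, 63, model))
```

Shuffling bases is the simpler null model; codon sampling keeps codon usage, which is closer to the "count codon frequencies and randomly output" idea. On real data the choice of null model can change the p-value noticeably.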