PART II: GENES & GENOMES: SEQUENCE VARIATION AND ANALYSIS Flashcards
What are examples of regulatory sequences like promoters and RBS respectively?
- Promoter: TTGACA-17bp-TATAAT (the -35 and -10 boxes)
- RBS: GGAGG just upstream of the start codon of a gene encoding a protein
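As a rough illustration of how these consensus motifs could be located in a sequence, here is a minimal Python sketch. The spacer ranges (15-19 bp between the promoter boxes, 4-10 bp between the RBS and the start codon) and the function name `find_motifs` are assumptions made for this example. Real promoters and ribosome binding sites rarely match the consensus exactly, so this is not how a production tool works.

```python
import re

# -35 box, a spacer of 15-19 bp (17 bp is the consensus), then the -10 box.
PROMOTER = re.compile(r"TTGACA[ACGT]{15,19}TATAAT")
# RBS core, a short spacer (assumed 4-10 bp), then a start codon.
RBS = re.compile(r"GGAGG[ACGT]{4,10}(?:ATG|GTG|TTG)")

def find_motifs(seq):
    """Return (name, start, matched_text) for exact consensus matches."""
    seq = seq.upper()
    hits = []
    for name, pattern in (("promoter", PROMOTER), ("rbs", RBS)):
        for m in pattern.finditer(seq):
            hits.append((name, m.start(), m.group()))
    return hits

# Made-up demo sequence containing one promoter and one RBS.
demo = "CC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC" + "GGAGG" + "AAACC" + "ATG" + "AAATAA"
for hit in find_motifs(demo):
    print(hit)
```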
What two things does gene annotation encompass?
- Predicting and MARKING the position of genes and OTHER elements on a genome sequence
- Predicting protein function
What is involved in 1. Predicting and MARKING the position of genes and OTHER elements on a genome sequence (in particular for RNA features and protein-coding genes)?
- RNA features: Predict function and location SIMULTANEOUSLY
- Protein coding genes: Predict the location of genes on genome (Gene Finders)
- Translate the encoded protein and predict function
What is involved in 2. Predicting protein function?
- Assigning function by similarity to characterised proteins
- Labelling as “Hypothetical Proteins” those not similar to any characterised protein
What are features of protein coding genes in prokaryotes?
- Contained in an ORF (the sequence between the INITIATION and STOP codons; ORFs >50bp are ideal)
- Initiation codon, Ribosome binding site, Minimum length
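The ORF features above can be sketched as a small scanner. This is a simplified illustration only: it checks the forward strand only, takes the first start codon in each open frame, and uses the 50 bp minimum from the card as its default. The function name `find_orfs` is hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=50):
    """Return (start, end, frame) for forward-strand ORFs.

    `end` is exclusive and includes the stop codon; positions are 0-based.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon seen in this open frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

# Made-up demo: a 66 bp ORF starting at position 2.
demo = "CC" + "ATG" + "GCT" * 20 + "TAA" + "G"
print(find_orfs(demo))
```

A full scan would repeat this on the reverse complement to cover all six reading frames.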
Can prokaryotes contain introns?
- Yes, but introns are rare
- Prokaryotic transcripts can also be polycistronic (one mRNA: many proteins)
What is usually the initiation codon in prokaryotes?
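- ATG (90%) -> Met
- GTG (8%) -> normally Val, but read as Met when used as the initiation codon
- TTG (1%) -> normally Leu, but read as Met when used as the initiation codon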
Why aren’t ALL ORFS marked as genes?
- Because one region can contain several overlapping candidate ORFs in different reading frames (many of them short and broken up by stop codons), BUT usually only one of them ACTUALLY CODES for a protein
- Gene overlap is VERY RARE, so generally only the longest ORF in a region is marked as the coding gene
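The "keep the longest ORF" rule can be sketched as a greedy filter. This is an assumption-laden simplification: it treats any overlap as disqualifying and ranks purely by length, whereas real gene finders also use statistical models of coding sequence. The function name `keep_longest` is hypothetical.

```python
def keep_longest(orfs):
    """Greedy filter over (start, end) intervals, end exclusive.

    Visits ORFs from longest to shortest and keeps each one only if it
    does not overlap an ORF that has already been kept.
    """
    kept = []
    for start, end in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

# Made-up candidate ORFs: two short ones overlap the long one at 10-400.
candidates = [(0, 90), (10, 400), (350, 420), (500, 800)]
print(keep_longest(candidates))
```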
Is gene overlap for coding genes common?
- NO it is RARE
What are Gene Finders?
- Programs such as GeneMarkS, GLIMMER, Prodigal (scan the genome for ORFs and predict which ones are genes)
What do Gene Finders do?
- Identify a potential START codon (ATG but also GTG and TTG)
- They check for CONTEXT with the RIBOSOME BINDING SITE
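The two steps above can be illustrated with a toy check: find candidate start codons, then keep only those with an RBS-like motif shortly upstream. The exact-match motif `GGAGG`, the 20-base upstream window, and the function name `starts_with_rbs` are all assumptions for this sketch. GeneMarkS, GLIMMER and Prodigal use trained scoring models rather than an exact string match.

```python
START_CODONS = ("ATG", "GTG", "TTG")

def starts_with_rbs(seq, rbs="GGAGG", window=20):
    """Return 0-based positions of start codons with `rbs` in the upstream window."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in START_CODONS:
            upstream = seq[max(0, i - window):i]
            if rbs in upstream:
                hits.append(i)
    return hits

# Made-up demo: the ATG at position 0 has no upstream RBS, the second ATG does.
demo = "ATGCCC" + "GGAGG" + "AAACCC" + "ATGAAA"
print(starts_with_rbs(demo))
```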
How can we predict protein function? (2 tools)
- Databases -> For protein sequences and function (GenBank)
- Using sequence SIMILARITY to predict function -> Proteins with almost the same sequence are likely to have the same/similar function
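One simple way to put a number on "almost the same sequence" is percent identity over an existing alignment. The sketch below assumes the two sequences are already aligned to equal length, with `-` marking gaps. It does not perform the alignment, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned (non-gap) columns where both residues are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        aligned += 1
        if x == y:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

# Made-up aligned peptides: 3 identical residues out of 4 aligned columns.
print(percent_identity("MKT-A", "MKSLA"))
```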
What is BLASTp used for?
- Protein query versus protein database
What is BLASTx used for?
- Nucleotide query versus protein database (the query is translated in all 6 reading frames, giving 6 peptide sequences that are each searched against the database)
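The six-frame translation step can be sketched as follows, assuming the standard genetic code and writing stop codons as `*`. This only illustrates the idea and is not BLASTx's own implementation. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame(seq):
    """Return the peptide for frames +1, +2, +3 and -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for name, peptide in six_frame("ATGGCCTAA").items():
    print(name, peptide)
```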
What does the ‘Expect’ section in BLAST mean?
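- It gives the E-value: the number of matches with a score at least this good that you would expect to find by chance in a database of this size
- The closer it is to zero, the less likely the match happened by chance (0.1 cut off generally)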
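To show why a higher score gives a smaller Expect value, here is a simplified sketch using the textbook relation E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the total database length. It ignores the effective-length corrections BLAST applies, so the numbers will not exactly match real BLAST output. The function name and the example lengths are assumptions.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score and the size of the search space."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Made-up search space: a 300-residue query against 10 million residues.
for score in (20, 40, 60):
    e = expect_value(score, 300, 10_000_000)
    verdict = "below" if e < 0.1 else "above"
    print(f"bit score {score}: E = {e:.3g} ({verdict} the 0.1 cut off)")
```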