Lecture 7 Flashcards
What is an Example Pattern?
- restriction enzyme sites (DNA)
- transcription factor binding sites (DNA)
e.g.:
Estrogen response element
5’ AGGTCA NNN TGACCT 3’ (N = any nucleotide)
An example pattern refers to a recurring sequence feature observed in biological sequences, such as a specific nucleotide or amino acid sequence that appears repeatedly. These patterns provide insights into the function, structure, and evolution of biomolecules.
-> The PROSITE database stores patterns for proteins, so sequences can be compared against them
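The estrogen response element above can be written as a regular expression. This is a minimal Python sketch, assuming N stands for any of A, C, G or T; the DNA string is made up for illustration.

```python
import re

# Estrogen response element consensus: AGGTCA NNN TGACCT (N = any nucleotide)
ERE = re.compile(r"AGGTCA[ACGT]{3}TGACCT")

# Hypothetical example sequence, invented for this sketch
dna = "TTGAGGTCACAGTGACCTGA"

match = ERE.search(dna)
if match:
    # Report where the site starts (0-based) and which bases it covers
    print(match.start(), match.group())
```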
What is ProRule?
Patterns + Profiles = ProRule
ProRule is a database that provides curated information and rules for the functional annotation of protein domains, sites, and patterns. It helps researchers interpret the functional significance of protein features and guide further investigations.
- Patterns: exact description of a motif; even a single mismatch is rejected; easy to formulate and to use
- Profiles (weight matrices): assign a numerical weight to each match / mismatch; more sensitive but more complex
- ProRule: combines both approaches, manually curated
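The difference between a pattern and a profile can be sketched in a few lines of Python. The weights (+2 for the consensus base, -1 otherwise) and the threshold are made-up values for illustration, not real PROSITE profile scores, and a short DNA motif is used only to keep the example small.

```python
import re

CONSENSUS = "AGGTCA"

# Pattern: exact description, a single mismatch is rejected
pattern = re.compile(CONSENSUS)

# Toy profile (made-up weights): +2 for the consensus base, -1 otherwise
profile = [
    {base: (2 if base == consensus_base else -1) for base in "ACGT"}
    for consensus_base in CONSENSUS
]

def profile_score(seq):
    """Sum the positional weights of seq (same length as the profile)."""
    return sum(column[base] for column, base in zip(profile, seq))

THRESHOLD = 8  # hypothetical cutoff chosen for this sketch

for seq in ["AGGTCA", "AGGTTA"]:  # the second sequence has one mismatch
    exact_hit = pattern.fullmatch(seq) is not None
    score = profile_score(seq)
    print(seq, "pattern hit:", exact_hit, "profile score:", score,
          "profile hit:", score >= THRESHOLD)
```

The sequence with one mismatch fails the exact pattern but still scores above the cutoff, which is the extra sensitivity a profile buys at the cost of having to choose weights and a threshold.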
PROSITE Syntax (cheat sheet)
* “A” … standard IUPAC one-letter codes
* “x” position where any aa is accepted
* “[ALT]” ambiguity: any of Ala or Leu or Thr
* “{ALT}” negative ambiguity: any aa but not Ala, Leu or Thr
* “x(2,4)-L(3)” repetition: two to four amino acids of any type, then exactly three Leu
* “L(2,4)” however is not allowed :( (a range may only be used with “x”)
* “-” aa separator
* “<M” N terminal Met
* “A>” C terminal Ala
* unambiguous residues are written as plain letters: “M-A-S-K-E” matches “MASKE”
What is regex?
Regex (regular expression) is a sequence of characters that defines a search pattern. It is supported by most programming languages and tools (e.g. grep() in R) and lets you match and manipulate text based on specific rules: find matches, extract substrings, replace text, validate input, and more.
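The four typical operations (find, extract, replace, validate) can be sketched with Python's standard `re` module; the protein string is a made-up example.

```python
import re

protein = "MASKELLLAGT"  # hypothetical sequence, for illustration only

# Find: does the sequence start with Met?
starts_with_met = re.search(r"^M", protein) is not None

# Extract: all runs of two or more Leu
leu_runs = re.findall(r"L{2,}", protein)

# Replace: mask every Leu run with a dash
masked = re.sub(r"L{2,}", "-", protein)

# Validate: only the 20 standard amino acid letters are present
is_valid = re.fullmatch(r"[ACDEFGHIKLMNPQRSTVWY]+", protein) is not None

print(starts_with_met, leu_runs, masked, is_valid)
```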
Regex Syntax (cheat sheet)
- “A”… standard IUPAC one-letter codes
- “.” position where any aa is accepted
- “[ALT]” ambiguity any of Ala or Leu or Thr
- “[^ALT]” negative ambiguity: any aa but not Ala, Leu or Thr
- “.{2,4}”, “L{3}” repetition: two to four amino acids of any type, exactly three Leu
- “L{2,4}” two to four Leu, why not :)
- “+” one or more of previous, i.e. “{1,}”
- “*” zero or more of previous, i.e. “{0,}”
- “^M” N terminal Met
- “A$” C terminal Ala
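Because the two cheat sheets map onto each other almost one to one, a PROSITE pattern can be translated into a regex mechanically. The function below is a minimal sketch, not an official tool: `prosite_to_regex` is a hypothetical helper name, it covers only the elements listed in the cheat sheets, and the pattern and sequence in the usage lines are invented.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Sketch only: handles x, [..], {..}, (n) / (n,m), '-', '<' and '>'.
    """
    pattern = pattern.rstrip(".")  # PROSITE entries may end with a period
    parts = []
    for element in pattern.split("-"):  # '-' separates positions
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        element = element.replace("x", ".")  # any amino acid
        # Negative ambiguity {ALT} -> [^ALT]; must run before the next step
        element = element.replace("{", "[^").replace("}", "]")
        # Repetition (2,4) -> {2,4}
        element = element.replace("(", "{").replace(")", "}")
        parts.append(prefix + element + suffix)
    return "".join(parts)

regex = prosite_to_regex("<M-x(2,4)-L(3)-{ALT}-[ST]-A>")
print(regex)
print(re.search(regex, "MKKLLLGSA") is not None)
```

The order of the two `replace` steps matters: curly braces are turned into a negated character class first, so the braces produced from the parentheses afterwards are not touched again.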
What’s the difference between PROSITE and Regex?
In summary, PROSITE pattern syntax is a notation designed specifically for describing protein sequence motifs, while regex is a general-purpose pattern-matching technique used in programming and text processing. They express largely the same ideas with different symbols.
- PROSITE: pattern description for protein sequences
- Regex: can be used for almost everything
What are the disadvantages of regex?
- too rigid to pick up divergent sequences: a single mismatch means no match
- short patterns might be too unspecific / will find false positives
- cannot include information about relative frequencies
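Why short patterns produce false positives can be estimated with simple arithmetic. The sketch below assumes all 20 amino acids are equally frequent and positions are independent, which real proteins do not satisfy; `expected_chance_hits` is a hypothetical helper, and the two patterns and the sequence length of 500 are illustrative choices.

```python
def expected_chance_hits(allowed_per_position, seq_length, alphabet_size=20):
    """Expected number of matches in a random sequence.

    allowed_per_position: how many residues are accepted at each pattern
    position, e.g. [2, 20, 2] for the regex [ST].[RK].
    """
    p_match = 1.0
    for n_allowed in allowed_per_position:
        p_match *= n_allowed / alphabet_size
    start_positions = seq_length - len(allowed_per_position) + 1
    return p_match * start_positions

# Short pattern [ST].[RK] in a 500-residue protein
short_hits = expected_chance_hits([2, 20, 2], 500)

# Same pattern extended by four fixed residues
long_hits = expected_chance_hits([2, 20, 2, 1, 1, 1, 1], 500)

print(f"short pattern: {short_hits:.2f} chance hits expected")
print(f"longer pattern: {long_hits:.6f} chance hits expected")
```

Under these assumptions the three-position pattern is expected to match about five times per protein by chance alone, while the longer one almost never does.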