Patterns in Protein Sequences Flashcards

1
Q

Prosite aims

A

Database by SIB (Swiss Institute of Bioinformatics)
Database of protein domains, families and functional sites.
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.
PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of those profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.

2
Q

Prosite Syntax

A
  • “A”… standard IUPAC one-letter codes
  • “x” position where any amino acid is accepted
  • “[ALT]” ambiguity: any of Ala, Leu or Thr
  • “{ALT}” (curly braces) negative ambiguity: any amino acid except Ala, Leu or Thr
  • “x(2,4)”, “L(3)” repetition: two to four amino acids of any type, exactly three Leu
  • “-” separator between pattern positions
  • “<M” N-terminal Met
  • “A>” C-terminal Ala
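The PROSITE notation above maps almost one-to-one onto regular expressions. A minimal sketch of that translation in Python follows; the function name `prosite_to_regex` is made up for illustration, and the sketch assumes simple patterns only (it ignores special cases such as an end anchor placed inside square brackets).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax (simplified sketch)."""
    regex = pattern.rstrip(".")                          # drop an optional trailing period
    regex = regex.replace("-", "")                       # '-' only separates positions
    regex = regex.replace("x", ".")                      # x = any amino acid
    regex = regex.replace("{", "[^").replace("}", "]")   # {ALT}  -> [^ALT]
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> .{2,4}
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex

print(prosite_to_regex("[ALT]-{ALT}-x(2,4)-L(3)"))   # [ALT][^ALT].{2,4}L{3}
print(prosite_to_regex("<M-x-A>"))                   # ^M.A$
print(bool(re.search(prosite_to_regex("<M-x-A>"), "MKA")))
```

The order of the replacements matters: curly braces are converted to negated character classes before parentheses are converted to curly braces, so the two notations do not collide.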
3
Q

Regex tools for the terminal: syntax

A

Regular Expression Tools
• a powerful language for specifying patterns
• used in different contexts and in different “dialects”
- Linux shells
- text editors
- protein motif databases (e.g. PROSITE)
- R: grep('ALA', psa.7rsa$AA)
• pattern
- in everyday life
- in biological sequences: Proteins, DNA

Regex Syntax
• characters:
- exact matches: x q ATG SS2010
• metacharacters:
- characters with special meaning:
[ ] . ? * + | { } ( ) \ ^ $
• metasymbols:
- sequences of characters with special meaning:
\s \t \n \w
• “A”… standard IUPAC one-letter codes
• “.” position where any amino acid is accepted
• “[ALT]” ambiguity: any of Ala, Leu or Thr
• “[^ALT]” negative ambiguity: any amino acid except Ala, Leu or Thr
• “.{2,4}”, “L{3}” (curly braces) repetition: two to four amino acids of any type, exactly three Leu
• “+” one or more
• “*” zero or more
• “^M” N-terminal Met
• “A$” C-terminal Ala
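As a rough illustration, the regex elements listed above can be tried out with Python's `re` module. The sequence `seq` is made up for this example, and Python's regex dialect differs in details from those of grep or R.

```python
import re

seq = "MKALLLTGSA"  # made-up protein sequence for illustration

checks = {
    "N-terminal Met":        r"^M",
    "C-terminal Ala":        r"A$",
    "any of Ala/Leu/Thr":    r"[ALT]",
    "anything but A/L/T":    r"[^ALT]",
    "K, 2-4 of any aa, T":   r"K.{2,4}T",
    "exactly three Leu":     r"L{3}",
    "one or more Leu":       r"L+",
    "K, zero or more G, A":  r"KG*A",
}

for label, pat in checks.items():
    m = re.search(pat, seq)
    # print the first match of each pattern, if any
    print(f"{label:22} {pat:10} ->", m.group() if m else "no match")
```

Note that `re.search` reports only the first match; `re.findall` or `re.finditer` would list all non-overlapping occurrences.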

4
Q

grep real regex syntax - reading

A

Command-line tool grep
• searches text for patterns
• extracts the lines of a file that contain a match for the pattern
• is one of the most useful and powerful Linux commands

Syntax:
grep [options] pattern file
Example:
[selbig@white LehreSS16]$ grep M ss_aa_matrix.txt
M 48532 331720 58130 227032 137 ..
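To make the line-by-line behaviour of grep concrete, here is a minimal Python sketch of the same idea. The function name `grep_lines` and the sample lines are invented for illustration; they are not the contents of `ss_aa_matrix.txt`, and Python's `re` syntax is closer to `grep -E` than to grep's default basic syntax.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for pattern, like `grep pattern file`."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

# hypothetical file content, one string per line
lines = ["A 10 20", "M 30 40", "W 50 60 # Met-rich"]

print(grep_lines("M", lines))    # every line containing an M anywhere
print(grep_lines("^M", lines))   # only lines that start with M
```

The unanchored pattern `M` also returns the third line because of the comment text, which is why anchoring with `^` is often the safer choice when a row label is meant.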
5
Q

Regex Disadvantages

A
  • too rigid to pick up divergent sequences
  • short patterns may be too nonspecific and produce false positives
  • cannot include information about the relative frequencies of amino acids at a position
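The first two points can be illustrated with a small Python sketch. Both patterns and all sequences here are made up for the example; they are not taken from a real database entry.

```python
import re

# Rigidity: a single substitution at a fixed position destroys the match.
motif = re.compile(r"C.{2}C.{3}H")   # hypothetical motif
canonical = "AACKKCGGGHAA"
divergent = "AACKKSGGGHAA"           # second Cys replaced by Ser

print(bool(motif.search(canonical)))   # True
print(bool(motif.search(divergent)))   # False

# Low specificity: a short pattern matches many times in an arbitrary sequence.
short = re.compile(r"[ST].[RK]")     # hypothetical three-residue pattern
random_like = "MSTRKSAKTLRSGKLSPR"   # made-up sequence
hits = short.findall(random_like)
print(len(hits), hits)
```

The third point also shows up here: a class such as `[ST]` accepts Ser and Thr equally and cannot express that one of them is far more common at that position.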