prediction of protein function Flashcards
1
Q
sequence similarity
A
- most common method of function prediction
- high identity indicates similar function
- but proteins with similar sequences can have different function
- a change at a single position can alter function, e.g. binding
- so predictions need experimental confirmation
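A minimal sketch of how identity between two already-aligned sequences can be scored. The function name, the toy sequences, and the choice to skip gapped columns are assumptions for illustration; real tools differ in how they treat gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment: 6 of the 7 ungapped columns are identical.
print(round(percent_identity("MKT-AYIA", "MKTQAYLA"), 1))
```

A high value here only suggests similar function; the single mismatching column could still be the residue that matters.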
2
Q
MSAs
A
- show where to look for evolutionary constraints
- highly conserved residues important
- e.g. PML
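One way to find strongly constrained positions is to scan MSA columns for a dominant residue. This is a sketch only: the function name, the threshold, and the toy alignment are made up for illustration.

```python
from collections import Counter


def conserved_columns(msa, threshold=1.0):
    """Return (index, residue) for columns whose top residue frequency >= threshold."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all rows of an MSA must have the same length")
    hits = []
    for i in range(length):
        column = [seq[i] for seq in msa]
        residue, count = Counter(column).most_common(1)[0]
        # A column dominated by gaps is not treated as conserved.
        if residue != "-" and count / len(msa) >= threshold:
            hits.append((i, residue))
    return hits


toy_msa = ["CAKHLC", "CGRHMC", "CSKHVC"]  # hypothetical alignment
print(conserved_columns(toy_msa))  # [(0, 'C'), (3, 'H'), (5, 'C')]
```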
3
Q
PML
A
- human TF with RING domains
- zinc finger protein
- Zn-coordinating residues needed for binding
- C..H..C..C
- the same C..H..C..C pattern shows up as conserved columns in the MSA
4
Q
regular expressions
A
- characteristic sequence patterns in PROSITE
- more plasticity than a consensus sequence
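PROSITE patterns use their own notation (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends). A minimal sketch of translating that notation into a Python regular expression follows; the converter name and the example pattern are hypothetical, not a real PROSITE entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate PROSITE pattern syntax into Python regex syntax."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("<", "^").replace(">", "$")  # terminus anchors
        element = element.replace("x", ".")                    # any residue
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")  # repeat counts
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("C-x(2)-C-x(3)-H")  # made-up pattern for illustration
print(regex)  # C.{2}C.{3}H
match = re.search(regex, "AACKLCDEFHG")
print(match.group(), match.start())  # CKLCDEFH 2
```

The character classes and variable-length repeats are what give a pattern more plasticity than a single consensus string.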
5
Q
sequence logos
A
- more detailed representation than regular expression
- total column height (bits) = degree of conservation
- height of each symbol = relative frequency of that residue in that position
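The two heights can be computed per column from residue frequencies. This sketch assumes a 20-letter protein alphabet, ignores gaps, and leaves out the small-sample correction that logo tools usually apply; the function name is illustrative.

```python
import math
from collections import Counter


def logo_column(column, alphabet_size=20):
    """Return (total height in bits, {residue: symbol height}) for one MSA column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0, {}
    total = len(residues)
    freqs = {r: c / total for r, c in Counter(residues).items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Conservation = maximum possible entropy minus observed entropy.
    info = math.log2(alphabet_size) - entropy
    # Each symbol gets a share of the column height equal to its frequency.
    heights = {r: f * info for r, f in freqs.items()}
    return info, heights


print(logo_column("CCCC"))  # fully conserved column: log2(20) bits, all for C
print(logo_column("AAGG"))  # two residues at 50% each: one bit less in total
```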
6
Q
profile HMM
A
- needed to find more remote homologs
- machine learning approach to function prediction
- profile = built from an MSA
- hidden = states are not directly observed
- markov = each state depends only on the state before it
7
Q
Pfam HMMs
A
- MSA of full domain
- characterise columns by conservation
- absolute/high conservation (capital)
- some/no conservation (lower case)
- insert (generates new column)
- conserved columns make up a consensus
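A rough sketch of labelling MSA columns in this capital / lower case / insert style. The two thresholds, the function name, and the toy alignment are assumptions chosen for illustration and are not Pfam's actual cut-offs.

```python
from collections import Counter


def classify_columns(msa, max_gap_fraction=0.5, strong=0.8):
    """Build a consensus string: capital = highly conserved, lower case = weakly
    conserved, '.' = insert column (mostly gaps)."""
    n = len(msa)
    labels = []
    for i in range(len(msa[0])):
        column = [seq[i] for seq in msa]
        gap_fraction = column.count("-") / n
        if gap_fraction > max_gap_fraction:
            labels.append(".")  # treated as an insert column
            continue
        residue, count = Counter(r for r in column if r != "-").most_common(1)[0]
        labels.append(residue.upper() if count / n >= strong else residue.lower())
    return "".join(labels)


toy_msa = ["CA-KHC", "CG-RHC", "CSTKHC", "CA-KHC"]  # hypothetical alignment
print(classify_columns(toy_msa))  # Ca.kHC
```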
8
Q
HMM states
A
- insert
- generate a residue for an inserted column
- delete
- skip a normally conserved column (no residue generated)
- match
- generate a residue for a conserved column
- according to residue frequencies in MSA
- HMM = machine for generating sequences belonging to that protein family
- probabilistic model
- each path through model has associated probability
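To make "each path has an associated probability" concrete, here is a toy three-column profile HMM. Every number, the position-independent transition table, and the uniform insert emission are invented for illustration; the begin state is treated like a match state and the end transition is ignored.

```python
# Hypothetical parameters for a 3-column family model.
MATCH_EMISSIONS = [
    {"C": 0.9, "S": 0.1},
    {"H": 0.8, "N": 0.2},
    {"C": 1.0},
]
INSERT_EMISSION = 1.0 / 20  # assumption: uniform over 20 amino acids
TRANSITIONS = {
    ("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
    ("I", "M"): 0.50, ("I", "I"): 0.50,
    ("D", "M"): 0.80, ("D", "D"): 0.20,
}


def path_probability(path, sequence):
    """Probability of generating `sequence` along one specific state path."""
    prob = 1.0
    prev = "M"  # begin state behaves like a match state in this toy model
    pos = 0     # next residue of the sequence to be generated
    for kind, column in path:
        prob *= TRANSITIONS.get((prev, kind), 0.0)
        if kind == "M":    # match: emit from that column's residue frequencies
            prob *= MATCH_EMISSIONS[column].get(sequence[pos], 0.0)
            pos += 1
        elif kind == "I":  # insert: emit an extra residue
            prob *= INSERT_EMISSION
            pos += 1
        # delete: consumes a column but emits nothing
        prev = kind
    if pos != len(sequence):
        raise ValueError("path does not generate the whole sequence")
    return prob


print(path_probability([("M", 0), ("M", 1), ("M", 2)], "CHC"))            # all match
print(path_probability([("M", 0), ("I", 0), ("M", 1), ("M", 2)], "CAHC"))  # one insert
print(path_probability([("M", 0), ("D", 1), ("M", 2)], "CC"))             # one delete
```

The all-match path is far more probable than the paths that use an insert or a delete, which is the sense in which the model describes the family.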
9
Q
forward algorithm
A
- dynamic programming, used in Pfam searches
- finds probability that model could generate a given sequence
- align HMM with sequence
- get likelihood for the sequence-profile alignment
- bit score and E value
- identities and similar residues
- posterior probability
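A minimal sketch of the forward recursion on a generic two-state HMM (not a full profile HMM and not the Pfam/HMMER implementation). The state names, the two-letter alphabet (`h` = hydrophobic, `p` = polar), all probabilities, and the uniform null model used for the bit score are assumptions for illustration.

```python
import math

STATES = ("helix", "loop")
START = {"helix": 0.5, "loop": 0.5}
TRANS = {
    "helix": {"helix": 0.8, "loop": 0.2},
    "loop": {"helix": 0.3, "loop": 0.7},
}
EMIT = {
    "helix": {"h": 0.9, "p": 0.1},
    "loop": {"h": 0.2, "p": 0.8},
}


def forward(sequence):
    """P(sequence | model), summed over every possible state path."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # f[s] = probability of the prefix seen so far, ending in state s
    f = {s: START[s] * EMIT[s].get(sequence[0], 0.0) for s in STATES}
    for symbol in sequence[1:]:
        f = {
            s: EMIT[s].get(symbol, 0.0) * sum(f[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(f.values())


def bit_score(sequence, null_prob_per_symbol=0.5):
    """Log-odds (base 2) of the model against a uniform null model."""
    return math.log2(forward(sequence) / null_prob_per_symbol ** len(sequence))


print(forward("hp"))
print(bit_score("hhhhhh"))
```

Dynamic programming keeps the cost linear in sequence length even though the number of paths grows exponentially.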
10
Q
TargetP
A
- predicts N-terminal sorting signals (signal or transit peptides) characteristic of specific cellular compartments
- published versions use neural networks rather than profile HMMs
- mitochondrion, chloroplast or secretory system
11
Q
TMHMM
A
- HMM to detect transmembrane (TM) helices
- predict architecture of integral membrane proteins
12
Q
STRING
A
- uses multiple sources of information to predict functional interactions
- genomic context
- high throughput experiments
- coexpression
- previous knowledge
- some aspects of function not directly related to sequence
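One simple way to merge independent evidence channels into a single confidence is a noisy-OR: the interaction is supported unless every channel fails to support it. This is an illustrative assumption, not necessarily the exact scoring STRING uses; the channel names and scores are made up.

```python
def combine_scores(channel_scores):
    """Noisy-OR combination of per-channel probabilities in [0, 1]."""
    prob_none = 1.0
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
        prob_none *= 1.0 - score  # chance that this channel gives no support
    return 1.0 - prob_none


# Hypothetical evidence for one protein pair.
evidence = {"genomic_context": 0.5, "coexpression": 0.4, "text_mining": 0.2}
print(round(combine_scores(evidence), 2))  # 0.76
```

Several weak channels together can give more confidence than any one of them alone.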
13
Q
sources used by STRING
A
- genomic context:
- conserved proximity of 2 genes in the genome
- e.g. trpA trpB
- can sometimes fuse into one protein coding gene
- indication of involvement in same pathway
- confirm with protein interaction experiments
- gene coexpression experiments
- conserved coexpression pattern indicates functional association
- text mining
- search abstracts for co-mentions
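The co-mention idea can be sketched as counting how often two gene names appear in the same abstract. The abstracts below are invented, and exact token matching is a simplifying assumption; real text mining has to handle synonyms and ambiguous names.

```python
import re
from collections import Counter
from itertools import combinations


def co_mentions(abstracts, gene_names):
    """Count abstracts in which each pair of gene names appears together."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        present = sorted(g for g in gene_names if g in tokens)
        counts.update(combinations(present, 2))
    return counts


toy_abstracts = [  # hypothetical abstracts
    "trpA and trpB encode the two subunits of tryptophan synthase.",
    "Expression of trpB was measured.",
    "trpA, trpB and lacZ were compared.",
]
counts = co_mentions(toy_abstracts, ["trpA", "trpB", "lacZ"])
print(counts[("trpA", "trpB")])  # 2
```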