Lecture 4 Flashcards
What is the aim of BLAST?
To find the sequences or subsequences in a database that are most similar to a query sequence.
What are the BLAST components?
Algorithm to FIND similar sequences
Probabilistic model to RANK them
(Two sequences are compared and their similarity is quantified; a substitution matrix gives the alignment score.)
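The scoring step can be sketched as follows. This is a minimal illustration, assuming an ungapped alignment of two equal-length sequences; the matrix values are made up (not BLOSUM or PAM scores), and `TOY_MATRIX`, `pair_score` and `alignment_score` are hypothetical names.

```python
# Toy substitution scores (illustrative values only, not a real matrix).
TOY_MATRIX = {
    ("A", "A"): 4, ("S", "S"): 4, ("T", "T"): 5,
    ("A", "S"): 1, ("A", "T"): 0, ("S", "T"): 1,
}

def pair_score(a, b):
    """Look up a symmetric substitution score for residues a and b."""
    return TOY_MATRIX[(a, b)] if (a, b) in TOY_MATRIX else TOY_MATRIX[(b, a)]

def alignment_score(seq1, seq2):
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

# A-A (4) + S-T (1) + T-T (5)
score = alignment_score("AST", "ATT")
```

Identical residues score highest, so a sequence aligned with itself gives the maximum score for that sequence.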
What is a profile?
A table listing the frequency of each amino acid at each position of a protein sequence.
It is calculated from an MSA (multiple sequence alignment) covering a domain of interest.
It allows a consensus sequence to be identified.
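A profile and its consensus can be sketched like this. It is a minimal example, assuming a gap-free toy alignment; `build_profile` and `consensus` are hypothetical helper names, and real profile tools also handle gaps, sequence weighting and pseudocounts.

```python
from collections import Counter

def build_profile(msa):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):  # iterate over alignment columns
        counts = Counter(column)
        profile.append({res: n / len(msa) for res, n in counts.items()})
    return profile

def consensus(profile):
    """Pick the most frequent residue in each column."""
    return "".join(max(col, key=col.get) for col in profile)

msa = ["GKST", "GKTT", "GRST", "GKSA"]  # toy alignment, no gaps
profile = build_profile(msa)
consensus_seq = consensus(profile)
```

In this toy alignment the first column is absolutely conserved (G in every sequence), while the other columns each show one variant.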
What is the use of profiles?
Highly conserved sets of residues are likely to be part of the active site –> clues to function.
Poorly conserved positions are likely to be in surface loops.
How can we use profile patterns to identify homologues?
Match the query sequence against the sequences in the alignment table.
Give higher weight to positions that are conserved.
Absolutely conserved positions must be matched by the query.
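The three rules above can be sketched as a scoring function. This is only an illustration under stated assumptions: the weight for each column is taken to be the frequency of its most common residue (an assumed scheme, not one given in the lecture), the profile is a hand-written toy, and `score_against_profile` is a hypothetical name.

```python
def score_against_profile(query, profile):
    """Score a query against per-column residue frequencies.

    Returns None if the query misses an absolutely conserved column.
    """
    if len(query) != len(profile):
        raise ValueError("query and profile must have the same length")
    total = 0.0
    for residue, column in zip(query, profile):
        top_freq = max(column.values())
        if top_freq == 1.0 and residue not in column:
            return None  # absolutely conserved position not matched
        # Assumed weighting: conserved columns (high top_freq) count more.
        total += top_freq * column.get(residue, 0.0)
    return total

# Toy profile: column 1 is absolutely conserved, the rest are variable.
profile = [
    {"G": 1.0},
    {"K": 0.75, "R": 0.25},
    {"S": 0.75, "T": 0.25},
    {"T": 0.75, "A": 0.25},
]

best = score_against_profile("GKST", profile)
variant = score_against_profile("GRST", profile)
rejected = score_against_profile("AKST", profile)
```

A query matching the consensus scores highest, a query with a rarer residue scores lower, and a query that breaks the absolutely conserved first column is rejected outright.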
What is a good E-value?
Lower than 0.0001
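For context, the E-value is the expected number of chance hits scoring at least as well as the one observed. A minimal sketch, assuming the Karlin-Altschul form E = K * m * n * exp(-lambda * S): the default `K` and `lam` below are placeholder values chosen for illustration (in practice they depend on the scoring system), and `e_value` and `is_significant` are hypothetical names.

```python
import math

def e_value(score, query_len, db_len, K=0.1, lam=0.3):
    """Expected number of chance hits scoring at least `score`.

    K and lam are placeholder statistical parameters (assumptions).
    """
    return K * query_len * db_len * math.exp(-lam * score)

def is_significant(e, threshold=1e-4):
    """Apply the rule of thumb from the card: E-value below 0.0001."""
    return e < threshold

low_score = e_value(20, query_len=300, db_len=1_000_000)
high_score = e_value(100, query_len=300, db_len=1_000_000)
```

The E-value falls exponentially as the score rises and grows with the size of the database searched, which is why the same score can be significant in a small database and not in a large one.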
Give an example of a multi domain protein.
GARs-AIRs-GARt in vertebrates
(In bacteria each enzyme domain is encoded separately)
Why is domain identification needed?
For high resolution structure
Sequence analysis
Multiple alignments
Fold recognition.
How are substitution matrices derived?
Count the number of each kind of change (e.g. S –> T) in homologous ALIGNED sequences.
Use the relative frequencies of the changes to form a scoring matrix for substitutions.
Likely changes have a relative frequency (observed/expected ratio) higher than 1.
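The counting step can be sketched as below. This is a simplified version of the general log-odds idea, assuming gap-free pairwise alignments and unordered residue pairs; the toy data and the names `odds_ratios` and `log_odds` are illustrative, and published matrices involve further steps (clustering, scaling, rounding) that are omitted here.

```python
import math
from collections import Counter

def odds_ratios(aligned_pairs):
    """Observed/expected ratio for each unordered residue pair."""
    pair_counts = Counter()
    residue_counts = Counter()
    for seq1, seq2 in aligned_pairs:
        for a, b in zip(seq1, seq2):
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    ratios = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        # A pair of different residues can arise as (a, b) or (b, a).
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        ratios[(a, b)] = observed / expected
    return ratios

def log_odds(ratios):
    """Convert ratios to log2 scores: ratio > 1 gives a positive score."""
    return {pair: math.log2(r) for pair, r in ratios.items()}

# Toy gap-free pairwise alignments (made-up data).
aligned_pairs = [("SSAA", "STAA"), ("TTAA", "TSAA")]
ratios = odds_ratios(aligned_pairs)
scores = log_odds(ratios)
```

Taking the logarithm turns a ratio above 1 into a positive score and a ratio below 1 into a negative one, so scores can be summed along an alignment.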
What must you ensure when making matrices?
Restrict the sequences to ones that are similar enough that no position is likely to have changed more than once.
What is required for domain fusion?
Both domains must fold correctly and bury a fraction of their solvent-exposed surface area in the interdomain interface.
What is an HMM?
A hidden Markov model: a probabilistic (machine learning) model used for profile searches of sequence databases.