Sequence Alignment and Searching Flashcards

Question 1

Q

What are homologues?

Answer

A

Evolutionarily related proteins (proteins that share a common ancestor)
Two types: orthologs and paralogs

Question 2

Q

How do protein sequences evolve?

Answer

A

Substitutions due to single-base mutations
Insertions or deletions of residues (indels), usually in the connecting loops rather than in the secondary structure elements
Indels make it harder to compare sequences (need to line up the equivalent regions and put gaps where there are indels)

Question 3

Q

The formula for % sequence identity

Answer

A

(number of identical residues / number of residues in the shorter protein) x 100
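
As a rough illustration of the formula above, here is a minimal Python sketch. It assumes the two sequences are already aligned to equal length and that gaps are written as "-". The function name percent_identity and the gap character are illustrative choices, not part of the lecture.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (a gap never counts).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Denominator: number of residues in the shorter protein, ignoring gap characters.
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * identical / shorter


# 7 identical columns, shorter protein has 8 residues
print(percent_identity("ACDEFGHIK", "ACD-FGHLK"))  # 87.5
```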

Question 4

Q

How do you search sequence databases?

Answer

A

Do fast scans using approximate methods (BLAST or PSI-BLAST)
Align proteins carefully using a dynamic programming method
Scan against sequence profiles/HMMs in secondary databases
Align query sequences against family relatives

Question 5

Q

Tuple size

Answer

A

The length of the runs of identical residues that are matched between two sequences (at least 3 in a row)

Question 6

Q

Window (path matrix)

Answer

A

The band marked by the two red bars on either side of the centre diagonal of the path matrix (as drawn in the lecture figure). Only cells within a set distance of the centre diagonal are considered.
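
One way to picture the window is as a restriction on which cells of the path matrix get visited. The sketch below assumes the distance from the diagonal is measured as |i - j| in matrix cells; the function name cells_in_window is hypothetical.

```python
def cells_in_window(len_a: int, len_b: int, window: int):
    """Yield the (i, j) path-matrix cells lying within `window` cells of the main diagonal."""
    for i in range(len_a):
        first = max(0, i - window)            # leftmost column still inside the band
        last = min(len_b, i + window + 1)     # one past the rightmost column inside the band
        for j in range(first, last):
            yield (i, j)


# A 4 x 4 matrix has 16 cells; a window of 1 keeps only the 10 nearest the diagonal.
print(len(list(cells_in_window(4, 4, 1))))  # 10
```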

Question 7

Q

Score (path matrix)

Answer

A

The score of the path through the matrix (watch the YouTube video)

Question 8

Q

Types of residue substitution matrices

Answer

A

Identity matrix
Physicochemical properties matrix
Evolutionary matrix

Question 9

Q

Physicochemical properties matrix

Answer

A

Score residue pairs according to similarities in their physicochemical properties

Question 10

Q

Identity matrix

Answer

A

Simplest scoring scheme - amino acids are either identical (1) or non-identical (0)

Question 11

Q

Evolutionary matrices

Answer

A

Score residue pairs according to how frequently the mutation is observed to occur in evolution

Question 12

Q

Dayhoff matrix

Answer

A

Based on evolutionary relationships: it is derived by analysing the substitutions observed in closely related sequences (>80% identity)
The method measures evolutionary distance by counting the number of point accepted mutations (PAMs)

Question 13

Q

BLOSUM substitution matrices

Answer

A

The matrix is derived from analysing substitution patterns in more distant relatives (<85% sequence identity)
For clusters of related sequences, derive multiple alignments without gaps
For these short, ungapped regions of related sequences, use the alignments to calculate residue substitution frequencies
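
The frequency calculation can be sketched in Python as below. This is a simplified illustration only: it takes one ungapped block, counts residue pairs column by column, and converts them to log-odds scores (observed pair frequency divided by the frequency expected by chance). It leaves out the sequence clustering and the scaling and rounding used for the published BLOSUM matrices, and the function name log_odds_scores is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(block):
    """Log-odds substitution scores (in bits) from one ungapped alignment block."""
    # Count every residue pair seen within each column of the block.
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total_pairs = sum(pair_counts.values())
    # Observed pair frequencies.
    observed = {pair: n / total_pairs for pair, n in pair_counts.items()}
    # Background residue frequencies implied by those pairs.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2
    scores = {}
    for (a, b), freq in observed.items():
        # Chance expectation: p_a * p_a for identical pairs, 2 * p_a * p_b otherwise.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = math.log2(freq / expected)
    return scores


# Toy block: five sequences, two ungapped columns.
scores = log_odds_scores(["AS", "AS", "AS", "AS", "SA"])
print(scores[("A", "A")] > 0, scores[("A", "S")] < 0)  # True True
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative scores.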

Question 14

Q

How do we know which matrix to use?

Answer

A

Matrices derived from observed substitution data (e.g. BLOSUM) are better than identity matrices or those based on physical properties
In database searching it may be best to use PAM120 or BLOSUM62
Various studies suggest that PAM250 gives the best result when aligning distant proteins using dynamic programming algorithms

Question 15

Q

Steps of the Needleman & Wunsch dynamic programming algorithm

Answer

A

Score the path matrix
Accumulate scores in the path matrix
Trace the highest-scoring path in the path matrix

Question 16

Q

How do we accumulate scores in the path matrix?

Answer

A

Start at the bottom right
Move right to left accumulating scores
Move up to the next row
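
The three Needleman & Wunsch steps, with accumulation starting at the bottom right as described above, can be sketched as follows. This is a minimal illustration, not the lecture's exact procedure: it assumes identity scoring (1 for a match, 0 for a mismatch) and a linear gap penalty of -1 per gap position, and the function and parameter names are illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # Step 1: score the path matrix (identity scoring for every residue pair).
    pair = [[match if x == y else mismatch for y in b] for x in a]
    # Step 2: accumulate scores. acc[i][j] is the best score for aligning a[i:] with b[j:].
    acc = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        acc[i][m] = gap * (n - i)           # rest of a aligned against gaps only
    for j in range(m - 1, -1, -1):
        acc[n][j] = gap * (m - j)           # rest of b aligned against gaps only
    for i in range(n - 1, -1, -1):          # start at the bottom row, move up a row at a time
        for j in range(m - 1, -1, -1):      # sweep each row from right to left
            acc[i][j] = max(
                pair[i][j] + acc[i + 1][j + 1],   # align a[i] with b[j]
                gap + acc[i + 1][j],              # align a[i] with a gap
                gap + acc[i][j + 1],              # align b[j] with a gap
            )
    # Step 3: trace the highest-scoring path, from the top left to the bottom right.
    i = j = 0
    out_a, out_b = [], []
    while i < n or j < m:
        if i < n and j < m and acc[i][j] == pair[i][j] + acc[i + 1][j + 1]:
            out_a.append(a[i]); out_b.append(b[j]); i += 1; j += 1
        elif i < n and acc[i][j] == gap + acc[i + 1][j]:
            out_a.append(a[i]); out_b.append("-"); i += 1
        else:
            out_a.append("-"); out_b.append(b[j]); j += 1
    return acc[0][0], "".join(out_a), "".join(out_b)


print(needleman_wunsch("ACDE", "ACE"))  # (2, 'ACDE', 'AC-E')
```

Swapping the identity scores for a PAM or BLOSUM lookup changes only step 1.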

Question 17

Q

How does BLAST work?

Answer

A

A highest-scoring segment pair (HSP) is found between two sequences
The sequences may be related if the HSP score > cutoff
1. Make a list of significant words from the query sequence
2. Compare the word list to the database and identify exact matches
3. For each word match, extend the alignment using a PAM matrix and dynamic programming
BLAST searches for 2 non-overlapping segments on the same diagonal. They must be within a certain distance of each other before the extension is invoked. It can also allow gaps so that the method joins segments on different diagonals.
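
The word-matching and same-diagonal ideas above can be sketched in Python. This is a simplified illustration and not the BLAST implementation: it uses exact word matches only, skips the extension step, and the word size of 3, the maximum distance of 40 and both function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def word_hits_by_diagonal(query: str, subject: str, word_size: int = 3):
    """Exact word matches between query and subject, grouped by diagonal."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for q in range(len(query) - word_size + 1):
        index[query[q:q + word_size]].append(q)
    # Scan the subject; a hit's diagonal is (subject position - query position).
    hits = defaultdict(list)
    for s in range(len(subject) - word_size + 1):
        for q in index.get(subject[s:s + word_size], ()):
            hits[s - q].append((q, s))
    return dict(hits)


def two_hit_seeds(hits, word_size: int = 3, max_distance: int = 40):
    """Pairs of non-overlapping hits on the same diagonal within max_distance of each other."""
    seeds = []
    for diagonal, positions in hits.items():
        for (q1, _), (q2, _) in combinations(sorted(positions), 2):
            # Non-overlapping means the second word starts after the first one ends.
            if word_size <= q2 - q1 <= max_distance:
                seeds.append((diagonal, q1, q2))
    return seeds


hits = word_hits_by_diagonal("ACDEFGHIK", "XXACDYYGHIK")
print(hits)                  # {2: [(0, 2), (5, 7), (6, 8)]}
print(two_hit_seeds(hits))   # [(2, 0, 5), (2, 0, 6)]
```

Each seed would then be handed to the extension step described in point 3.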

Question 18

Q

How do we assess the significance of a sequence match?

Answer

A

Length - we can get artificially high scores between short sequences
Composition - if sequences are rich in particular amino acid residues we can get high scores for unrelated proteins
To assess the significance of a match it is necessary to compare the score with that returned by random or unrelated sequences
If the database is small or when considering a pair-wise comparison, the sequences can be shuffled to generate random sequences
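
The shuffling idea can be sketched as below. Shuffling keeps the length and composition of a sequence but destroys its order, so the shuffled scores give a background to compare against. The toy scoring function, the use of a z-score as the summary and all names here are assumptions for illustration; in practice the score would come from a proper alignment.

```python
import random
import statistics


def ungapped_identity_score(a: str, b: str) -> int:
    """Toy score: identical positions when the two sequences are laid side by side."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_z_score(a: str, b: str, score=ungapped_identity_score,
                    n_shuffles: int = 200, seed: int = 0) -> float:
    """How many standard deviations the real score lies above shuffled-sequence scores."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)          # same length and composition, random order
        background.append(score(a, "".join(residues)))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


seq = "ACDEFGHIKLMNPQRSTVWY"
print(shuffle_z_score(seq, seq) > 3)  # True: an identical pair stands far above the background
```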

Question 19

Q

S (BLAST)

Answer

A

Score for the pairwise alignment

Question 20

Q

E-value (BLAST)

Answer

A

Expected number of hits by chance with score S or higher, given the size of the database and the length of the alignment
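
For reference, the E-value is commonly computed with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S). The sketch below assumes m is the query length and n the total database length, and that lambda and K are statistical parameters that depend on the scoring scheme. The parameter names and the numbers in the example call are illustrative placeholders, not values from the lecture.

```python
import math


def expected_hits(score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S): chance hits expected with score >= S."""
    return k * query_len * db_len * math.exp(-lam * score)


# Placeholder parameters, chosen only to show the trend.
low = expected_hits(40, query_len=250, db_len=1_000_000, lam=0.3, k=0.1)
high = expected_hits(60, query_len=250, db_len=1_000_000, lam=0.3, k=0.1)
print(high < low)  # True: a higher score is expected less often by chance
```

E grows in proportion to the database size and falls exponentially as the score rises.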

Question 21

Q

How do you conduct a multiple sequence alignment?

Answer

A

Align the most closely related pairs using dynamic programming (DP), then gradually align these groups together, keeping the gaps that appear in earlier alignments fixed
(or) Add sequences one at a time to a growing multiple alignment

Week 3 Lecture 1 (21 cards)