Bioinformatics 2 Flashcards
What algorithm does BLAST use?
A heuristic approximation of the Smith-Waterman local alignment algorithm.
What does the Smith-Waterman algorithm do?
Uses dynamic programming to compare every amino acid in one sequence with every amino acid in the other, and finds the optimal local alignment between the two sequences.
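A minimal sketch of the Smith-Waterman scoring step, for illustration only. The match, mismatch and gap values are assumed toy numbers (a real protein search would use a substitution matrix such as BLOSUM62 and usually affine gap penalties), and only the best score is returned, not the alignment itself.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor of 0 lets an alignment restart anywhere (local, not global)
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

Every cell of the matrix is filled in, which is why the method is exact but slow on large databases.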
What does BLAST do to the S-W algorithm to improve its speed?
Uses short words of a minimum size to identify ‘Hot Spots’ shared by the query and database sequences, and then extends the alignment only from these.
What are the three stages of the BLAST algorithm?
Seeding, extension and evaluation.
Explain seeding.
The query sequence is broken up into all possible overlapping 3-letter words, and each of these is scored against all possible 3-letter words using a substitution matrix.
Words that score above a certain threshold against a query word are called neighbourhood words - these are recorded and then looked up in the database sequences to find seeds.
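A rough sketch of seeding, under stated assumptions: `toy_score` is a made-up stand-in for a real substitution matrix such as BLOSUM62, and the function names and threshold values are hypothetical, chosen only to show the idea.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(x, y):
    # Assumption: a toy scoring scheme standing in for BLOSUM62
    return 5 if x == y else -1

def words(seq, k=3):
    """All overlapping k-letter words in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def neighbourhood(word, threshold, score=toy_score):
    """All possible words scoring at or above threshold against word."""
    return [
        "".join(candidate)
        for candidate in product(AMINO_ACIDS, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, candidate)) >= threshold
    ]
```

For example, `words("MKTAY")` gives `["MKT", "KTA", "TAY"]`. With this toy scoring, a threshold of 15 keeps only the exact word, while a threshold of 9 also keeps every word that differs at one position.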
Explain extension.
Matched seeds are extended in both directions to form alignments. Extension is stopped when the score falls more than a certain drop-off threshold below the best score seen so far. This identifies high-scoring segment pairs (HSPs).
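A simplified sketch of ungapped seed extension with a drop-off rule. It assumes the seed is already known to match at `q_pos` in the query and `s_pos` in the subject; `toy_score`, the function names and the default `x_drop` are illustrative assumptions, not BLAST's actual parameters.

```python
def toy_score(x, y):
    # Assumption: a toy scoring scheme standing in for BLOSUM62
    return 5 if x == y else -4

def _extend(query, subject, q, s, step, x_drop, score):
    """Walk from (q, s) in direction step; return (best gain, residues kept)."""
    running = best = 0
    kept = walked = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += score(query[q], subject[s])
        walked += 1
        if running > best:
            best, kept = running, walked
        elif best - running > x_drop:
            break  # score has dropped too far below the best seen so far
        q += step
        s += step
    return best, kept

def extend_seed(query, subject, q_pos, s_pos, k=3, x_drop=8, score=toy_score):
    """Return (score, query_start, query_end) of the extended ungapped pair."""
    seed = sum(score(query[q_pos + i], subject[s_pos + i]) for i in range(k))
    right, r_len = _extend(query, subject, q_pos + k, s_pos + k, +1, x_drop, score)
    left, l_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1, x_drop, score)
    return seed + left + right, q_pos - l_len, q_pos + k + r_len
```

With query `"AAMKTLLVW"`, subject `"GGMKTLLPP"` and the seed `MKT` at position 2 in both, the extension keeps the two matching `L` residues on the right and nothing on the left, giving the segment `MKTLL` with a score of 25.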
Explain evaluation.
Determining whether the alignments produced are statistically significant - the significant ones are reported as HSPs, each with an E-value.
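For reference, the E-value is commonly computed with the Karlin-Altschul formula E = K × m × n × e^(−λS), where S is the raw score, m the query length and n the database length. The sketch below assumes K and λ are supplied by the caller, since they depend on the scoring matrix and gap penalties in use.

```python
import math

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

A higher score gives a smaller E-value, and a larger database gives a proportionally larger one, so the same alignment is less significant when found in a bigger search space.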
What do PAM and BLOSUM not take into account?
That amino acid substitution rates are not uniform: evolutionary pressure varies both between different positions within a sequence and between sequences.
What are PSSMs (position-specific scoring matrices)?
Scoring matrices built from MSAs, which score amino acids based on their position within the sequence.
What do high and low PSSM scores indicate?
High scores are given to amino acids at a specific position which are conserved between several different sequences, and low scores are given to amino acids that rarely appear at that specific position.
Explain how PSI-BLAST works.
The 1st iteration works the same as BLAST and uses BLOSUM62.
An MSA is generated from the highest-scoring hits in this iteration, and a PSSM is generated from this MSA.
The 2nd iteration searches the database with the PSSM and detects sequences that match the conserved patterns it specifies.
How would you create a PSSM?
Create an MSA and calculate the frequency of each amino acid at each position.
What is a pseudocount in a PSSM?
Some of the observed frequencies are equal to 0. This is due to the limited number of sequences in the MSA and may not reflect reality. A pseudocount is a small number added to the observed frequencies to correct for this.
What is the score equal to in a PSSM?
Score = (AA frequency + pseudocount) / (Number of sequences + 20 × pseudocount)
What is the simplest pseudocount?
1
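A small sketch that applies the pseudocount formula from these cards to each column of an MSA. It assumes a gap-free alignment of equal-length sequences, and the function name is hypothetical. Real tools usually go one step further and convert these values to log-odds scores against background amino acid frequencies, which is not shown here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pssm_from_msa(msa, pseudocount=1):
    """Return one {amino acid: score} dict per alignment column."""
    n_seqs = len(msa)
    # Denominator: number of sequences + 20 * pseudocount
    denom = n_seqs + len(AMINO_ACIDS) * pseudocount
    pssm = []
    for column in zip(*msa):  # iterate over alignment columns
        counts = Counter(column)
        pssm.append({aa: (counts[aa] + pseudocount) / denom for aa in AMINO_ACIDS})
    return pssm
```

For the toy alignment `["MKT", "MKS", "MRT"]` with a pseudocount of 1, M at position 1 scores (3 + 1) / (3 + 20) = 4/23, while an amino acid never seen at that position scores 1/23 instead of 0.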