Bioinformatics 2 Flashcards

1
Q

What algorithm does BLAST use?

A

A heuristic approximation of the Smith-Waterman local alignment algorithm.

2
Q

What does the Smith-Waterman algorithm do?

A

Uses dynamic programming to compare every amino acid in one sequence with every amino acid in the other, and finds the optimal (highest-scoring) local alignment between the two sequences.
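
A minimal sketch of the Smith-Waterman scoring step in Python. The match, mismatch and gap values are illustrative assumptions (a real protein search would use a substitution matrix such as BLOSUM62), and the linear gap penalty is a simplification.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which is what makes it local.
            H[i][j] = max(0, diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Every cell of the matrix is filled, so the cost grows with the product of the two sequence lengths. That cost is why a faster heuristic is attractive for database searches.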

3
Q

What does BLAST do to the S-W algorithm to improve its speed?

A

Uses short words of a minimum size to identify ‘hot spots’ shared by the query and database sequences, then extends the alignment from these.

4
Q

What are the three stages of the BLAST algorithm?

A

Seeding, extension and evaluation.

5
Q

Explain seeding.

A

The query sequence is broken up into all possible overlapping 3-letter words, and each one is compared to all possible 3-letter words.
Words which produce a match score above a certain threshold are called neighbourhood words - these are recorded, and the database is scanned for matches to them.
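
The seeding step can be sketched in Python as follows. The scoring function `toy_score` (+5 for a match, -1 for a mismatch) and the threshold of 9 are assumptions standing in for a real substitution matrix lookup and BLAST's own threshold; the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def query_words(query, w=3):
    """All overlapping words of length w in the query, with their offsets."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]

def toy_score(x, y, match=5, mismatch=-1):
    # Stand-in for summing substitution matrix scores over the word.
    return sum(match if p == q else mismatch for p, q in zip(x, y))

def neighbourhood(word, threshold=9):
    """All possible words whose score against `word` meets the threshold."""
    candidates = ("".join(c) for c in product(AMINO_ACIDS, repeat=len(word)))
    return {c for c in candidates if toy_score(word, c) >= threshold}

def find_seeds(query, subject, w=3, threshold=9):
    """(query offset, subject offset) pairs where a neighbourhood word occurs."""
    seeds = []
    for qi, word in query_words(query, w):
        hood = neighbourhood(word, threshold)
        for si in range(len(subject) - w + 1):
            if subject[si:si + w] in hood:
                seeds.append((qi, si))
    return seeds
```

With these toy scores, a word is a neighbour if it differs from the query word at no more than one position. For example, `find_seeds("MKV", "AAMKIAA")` records a seed where `MKI` in the subject is close enough to `MKV`.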

6
Q

Explain extension.

A

Matched seeds are extended in both directions to form alignments. Extension stops when the score falls below the best score seen so far by more than a certain drop-off threshold. This identifies high-scoring segment pairs (HSPs).
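
A minimal sketch of ungapped extension with a drop-off threshold, in Python. The scores (+5 match, -1 mismatch), the `x_drop` value and the function name `extend_seed` are illustrative assumptions, not BLAST's actual parameters.

```python
def extend_seed(query, subject, qi, si, w=3, x_drop=4, match=5, mismatch=-1):
    """Extend a seed at (qi, si) in both directions without gaps.

    Returns (score, query start, subject start, length) of the extended pair.
    """
    def score(a, b):
        return match if a == b else mismatch

    seed_score = sum(score(query[qi + k], subject[si + k]) for k in range(w))

    def extend(q, s, step):
        best = running = best_len = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += score(query[q], subject[s])
            n += 1
            if running > best:
                best, best_len = running, n
            elif best - running > x_drop:
                break  # score has dropped too far below the best seen
            q += step
            s += step
        return best, best_len

    right_score, right_len = extend(qi + w, si + w, +1)
    left_score, left_len = extend(qi - 1, si - 1, -1)
    total = seed_score + left_score + right_score
    return total, qi - left_len, si - left_len, left_len + w + right_len
```

Only the stretch up to the best-scoring position is kept in each direction, so trailing mismatches are trimmed from the reported pair.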

7
Q

Explain evaluation.

A

Determining whether the alignments produced are statistically significant. Significant ones are reported as HSPs, each with an E-value.
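
The E-value is commonly computed with the Karlin-Altschul formula, E = K × m × n × e^(-λS), where S is the raw score, m the query length and n the database length. A minimal Python sketch follows; the default values of `lam` and `k` are placeholder assumptions, since the real values depend on the scoring matrix and gap penalties in use.

```python
import math

def e_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance alignments scoring at least raw_score."""
    # lam and k are placeholder statistical parameters (assumed values).
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

The E-value falls exponentially as the score rises and grows in proportion to the size of the search space, so the same score is less significant in a larger database.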

8
Q

What do PAM and BLOSUM not take into account?

A

That amino acid substitution is not uniform: evolutionary pressure varies both between positions within a sequence and between sequences.

9
Q

What are PSSMs (position-specific scoring matrices)?

A

Matrices built from MSAs that score each amino acid based on its position within the sequence.

10
Q

What do high and low PSSM scores indicate?

A

High scores are given to amino acids at a specific position that are conserved between several different sequences; low scores are given to amino acids that rarely appear at that position.

11
Q

Explain how PSI-Blast works.

A

The 1st iteration works the same as BLAST and uses BLOSUM62.
An MSA is generated from the highest-scoring hits in this iteration, and a PSSM is generated from this.
The 2nd iteration searches the database with the PSSM and detects sequences that match the conserved patterns it specifies. The process can be repeated with the new hits.

12
Q

How would you create a PSSM?

A

Create a MSA and calculate the frequency of amino acids at each position.

13
Q

What is a pseudocount in a PSSM?

A

Some of the observed frequencies are equal to 0. This is due to the limited number of sequences in the MSA and may not reflect reality. A pseudocount is a small number added to the observed frequencies to correct for this.

14
Q

What is the score equal to in a PSSM?

A

Score = (AA frequency + pseudocount) / (Number of sequences + 20 × pseudocount)
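
A minimal Python sketch of this calculation for one MSA column, reading "AA frequency" as the number of times the amino acid is observed at that position. It assumes the column contains no gap characters; the function name `column_scores` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_scores(column, pseudocount=1):
    """Pseudocount-adjusted score for each amino acid in one MSA column.

    `column` is a string with one residue per aligned sequence.
    """
    counts = Counter(column)
    denominator = len(column) + 20 * pseudocount
    return {aa: (counts[aa] + pseudocount) / denominator for aa in AMINO_ACIDS}
```

For the column `"AAAG"` with a pseudocount of 1, alanine scores (3 + 1) / (4 + 20) and an unobserved amino acid scores (0 + 1) / (4 + 20), so nothing is left at zero and the 20 scores still sum to 1.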

15
Q

What is the simplest pseudocount?

A

1

16
Q

What is the final stage of creating a PSSM?

A

Divide the score by the expected frequency (simplest would be 1/20 = 0.05), then convert the result into log form to give a log-odds score.
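
A minimal Python sketch of the log-odds conversion, taking the logarithm of the ratio between the score and the expected frequency. The uniform expected frequency of 0.05 follows the card; the choice of base 2 is an assumption.

```python
import math

def log_odds(score, expected=0.05):
    """Log (base 2) of how much more often than expected a residue appears."""
    return math.log2(score / expected)
```

A residue seen more often than expected gets a positive value, and one seen less often gets a negative value. With a pseudocount of 1 and four sequences, a score of 4/24 is positive and a score of 1/24 is negative.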

17
Q

BLAST cannot identify domains; what can?

A

Domain databases such as PROSITE, which uses PSSMs (profiles).

18
Q

What is the simplest method of capturing sequence variation?

A

Producing a pattern.

19
Q

What does this symbol ‘

A

The pattern must be located at the N terminus.

20
Q

What program can be used for patterns?

A

PROSITE.
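
To show what such a pattern looks like in practice, here is a minimal Python sketch that converts a PROSITE-style pattern into a regular expression. It covers only a subset of the syntax: single residues, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, and `(n)` or `(n,m)` repeats. Terminus anchors are not handled, and the function name `prosite_to_regex` is hypothetical.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (supported subset) into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except these
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

# Example: the N-glycosylation site pattern N-{P}-[ST]-{P}
site = re.search(prosite_to_regex("N-{P}-[ST]-{P}"), "AANGSAA")
```

Here `site` matches `NGSA` in the example sequence, because the residue after N is not proline, the next is S, and the last is not proline.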