Sequence alignment Flashcards

Question 1

Q

Search for sequence similarity

Answer

A

one of the most common tasks
why? some amino acids may be replaced by other amino acids
- similar may equal to similar function
gaps may occur
- insertions and deletions [-]

Question 2

Q

Algorithms for sequence similarity

Answer

A

able to find the best alignments
deal with gaps and mismatches
- not all mismatches are the same, some are more important and should hold more weight
- matches should also have a different weight on the alignment [: identical, . similar]

Question 3

Q

Evaluating the best alignment

Answer

A

scoring/substitution matrix is needed, score for every pair of aminoacids
objective function, computational evaluation of the alignment
- ∑(i=1 ->L) s(a_i, b_i ) − ∑(j=1 ->G) γ+δ(len(j)−1))
- s score between aminoacids
- L lenght of alignment
- G
- γ fixed penalty
- δ
algorithm maximizes the value of the obj function

Question 4

Q

Types of scoring matrices?

Answer

A

PAM matrix
- Margaret Dayhoff, 1978
- calculate the probability in a small evolutionary step (1% of amino acid substitutions) that any amino acid would change or stay the same [PAM1]
- extended to longer evolutionary periods (240 steps, PAM240)
Blosum matrix
- used BLOCK database, alignments of most conserved regions of protein families
- only sequences with higher percentage of identity used
- blocks with 60% identity [Blosum60]
for both used the log of the ratio divided by the probability of a casual alignment
- indicates probability for the pairing not to be casual but because of homology
- s(a,b)= int(10*log(M(a,b)/C(a,b)))
- M probability matrix
- C casual probability matrix

Question 5

Q

Dot matrix

Answer

A

easy way to visualize alignments
- black dots indicate matches
- white no matches

Question 6

Q

Sequence alignment by dynamic programming

Answer

A

automatically calculate the path within the dot plot, giving the top score (best alignment)
alignments cannot go back, search only three positions (bottom, right and bottom-right diagonal)
going from top left to right ensures that we always know the score of the alignment for all possible options. We choose the highest

Question 7

Q

Smith & Waterman algorithm

Answer

A

based on dynamic programming
returns best alignment
O(n*m), n and m lenghts of sequences
can be used both on amino acid and nucleic acid sequences
too slow to search in a large database
constructs a scoring matrix used to trace-back to possible best alignment
- any negative region set to 0

Question 8

Q

Heuristics approaches for pairwise sequence alignment

Answer

A

faster than dynamic programming
- return excellent alignments, no guarantee of optimum
- based of search for words (k-tuple)
- useful in large-scale searches

Question 9

Q

Heuristics general strategy

Answer

A

Two main steps
- lookup table, k-mers indexed and positions saved
- read second sequence and for every k-mer check were it occurs in table, try to extend alignment
kmers can be converted to indegers
- a^k possible words
- A=00, C=01, G=10, T=11
- sequence of coded basis is converted to integer
- sequence of integers is converted to another integer

Question 10

Q

FASTA

Answer

A

one sequence by axis at top left
- growing diagonals
- identify regions of identity
- scan regions with scoring matrix and save best initial regions
- join regions with scores > threshold
- recalculate and optimize alignments around highest initial scoring region
lenght is ktup, number of amino acids considered
- for nucleic acids is generally 6

Question 11

Q

BLAST

Answer

A

most popular
produces an index of similar words < threshold, not only identical
tries to extend locally each “seed”, to obtain a highscoring segment pair

Question 12

Q

BLAT

Answer

A

similar to Blast but keeps index of entire genome in memory
non-overlapping 11-mers
designed to quickly find sequences of 95% and greater similarity of length 40 bases or more
may miss more divergent or shorter sequence alignments

Sequence alignment Flashcards

(12 cards)