Sequence Analysis Flashcards

Question 1

Q

Alignment-Based Methods

Answer

A

Goal: find best alignment.
Measure/Score: introduce as few gaps and substitutions as possible.
Question: How to achieve this?
Approach: Pairwise vs multiple sequence alignment.

Question 2

Q

Edit distance invented by Levenshtein (1965)

Answer

A

Each single change (inserting/deleting a letter, changing a letter, …) = distance +1
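
The counting rule above can be sketched as a small dynamic-programming routine. This is a minimal illustration, not a reference implementation; the function name `levenshtein` is chosen here for the example and each operation is assumed to cost exactly 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion or substitution costs 1."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters are equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept in memory, which is enough when just the distance (and not the alignment itself) is needed.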

Question 3

Q

Damerau

Answer

A

• flip operations (swapping two adjacent letters) count as one change
• brid (Old English) –> bird (modern English) –> 1 operation
• a mistyping such as “ebya” is more easily recognized by web search engines
• also used in biology, spell checking, …
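
As a hedged sketch of the flip rule, the following computes the restricted Damerau variant (often called optimal string alignment distance), in which swapping two adjacent letters is one operation. The function name `osa_distance` is an assumption made for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent letters counts as one change."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(b) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Flip: the last two letters of both prefixes are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


if __name__ == "__main__":
    print(osa_distance("brid", "bird"))
    print(osa_distance("ebya", "ebay"))
```

Without the flip rule, "brid" and "bird" would be two substitutions apart; with it, they are one operation apart.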

Question 4

Q

Global Alignment “Needleman-Wunsch”

Answer

A

• gaps can be scored differently from edits (substitutions)
• an exchange matrix scores the different letter changes
• find the best global alignment –> Needleman-Wunsch
• opening and extending a gap can be penalized differently –> Needleman-Wunsch-Gotoh
• find the best local alignment –> Smith-Waterman
• the exchange matrix has smaller penalties for more similar letters
• example: d/t are both dental sounds; leucine and isoleucine have similar biophysical properties
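
A minimal sketch of the global alignment score with a linear gap penalty follows. The default scores (`match=1`, `mismatch=-1`, `gap=-2`) and the optional `exchange` dictionary are illustrative assumptions, not values from the cards; the affine gap model of Gotoh is not covered here.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2, exchange=None):
    """Best global alignment score of a and b (linear gap penalty)."""

    def substitution(x, y):
        # An exchange matrix, if given, overrides the plain match/mismatch score.
        if exchange is not None and (x, y) in exchange:
            return exchange[(x, y)]
        return match if x == y else mismatch

    # First row: aligning the empty prefix of a with b[:j] needs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]  # first column: i gaps
        for j, y in enumerate(b, start=1):
            curr.append(max(
                prev[j - 1] + substitution(x, y),  # align x with y
                prev[j] + gap,                     # x against a gap
                curr[j - 1] + gap,                 # y against a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical exchange entries: leucine/isoleucine are treated as similar.
    similar = {("L", "I"): 0, ("I", "L"): 0}
    print(needleman_wunsch_score("L", "I"))
    print(needleman_wunsch_score("L", "I", exchange=similar))
```

The exchange dictionary shows the idea of the last two bullets: a similar pair of letters is penalized less than an arbitrary mismatch.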

Question 5

Q

Smith-Waterman algorithm

Answer

A

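finds locally (partially) optimal alignments
can align shorter sequences with longer ones
changes from a negative to a positive view: similarity is scored and maximized, and negative cell values are reset to zero
finds the maximal score in the matrix
backtracks in the matrix from the maximal score to the starting point (the first cell with score 0)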
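
The steps above can be sketched as follows. This is a simplified illustration with a linear gap penalty; the scores (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions made for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(i, j),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Backtrack from the maximal score until a cell with score 0 is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

Only the shared core of the two sequences is reported; the unrelated flanking letters do not lower the score because of the reset to zero.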

Question 6

Q

Differences FASTA vs BLAST

Answer

A

FASTA: the first fast algorithm, not very time-consuming
• FASTA and BLAST both start with small good alignments, try to extend them, and finally optimize the best hits
• FASTA is derived from the dot plot:
1) Identify common k-words (nucleotides: 6 letters, amino acids: 2 letters)
2) Score the dot-plot diagonals
3) Rescore, possibly with an exchange matrix
4) Join regions over gaps, penalizing the gaps
5) Use dynamic programming to finalize the alignments
➔ BLAST follows a different principle: it first searches for perfect matches and then for other similar pieces of varying length…
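
FASTA steps 1 and 2 above can be sketched roughly like this: shared k-words are looked up and counted per dot-plot diagonal. This is only an illustration of the idea, not the scoring used by the actual FASTA program; the function name `diagonal_hits` is hypothetical.

```python
from collections import Counter, defaultdict


def diagonal_hits(query: str, subject: str, k: int = 6) -> Counter:
    """Count shared k-words per dot-plot diagonal (offset = subject pos - query pos)."""
    # Step 1: index every k-word of the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # Step 2: every shared k-word is a hit on the diagonal with offset j - i.
    hits = Counter()
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return hits


if __name__ == "__main__":
    hits = diagonal_hits("ACGTACGT", "TTACGTACGT", k=6)
    print(hits.most_common(1))  # diagonal with the most shared k-words
```

For protein sequences the same function would be called with `k=2`, matching the word length given in step 1.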

BLAST: Basic Local Alignment Search Tool
compares a single sequence to an entire database of sequences
can also compare two sequences
much faster than FASTA
BLAST is based on Poisson and extreme value distributions
heuristic approach (no brute force over all possible permutations)
word size: 3 AA or 11 nucleotides by default, similarity
gaps are not treated well
Poisson distribution of score values –> P-value
E-value = P-value * number of entries in the database
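
As a small worked example of the last formula, taken in the simplified form stated on this card, the numbers below are made up for illustration only.

```python
def e_value(p_value: float, database_entries: int) -> float:
    """E-value as stated on the card: P-value times the number of database entries."""
    return p_value * database_entries


if __name__ == "__main__":
    # Hypothetical numbers: a hit with P = 1e-9 in a database of 500,000 entries.
    print(e_value(1e-9, 500_000))
```

The same P-value therefore becomes less convincing as the database grows, because more chance hits are expected.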

Question 7

Q

Alignment Significance

Answer

A

generate random scores
compute the mean and sd of the random scores
compute the deviation of the real score from the random scores (Z-score)
convert the Z-score to an E-value (probability of a Z-score)
E-value: 10e-6 significant
E-value: 10e-3 might be …
E-value: > 10e-3 ignore …
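
The first three steps can be sketched as below. The sketch assumes that random scores are produced by shuffling one sequence and rescoring it; `score` is a placeholder for any alignment scoring function, and the population standard deviation is used. The conversion from Z-score to E-value is not shown.

```python
import random
import statistics


def shuffled_scores(query, subject, score, n=100, seed=0):
    """Score the query against n shuffled copies of the subject."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    letters = list(subject)
    scores = []
    for _ in range(n):
        rng.shuffle(letters)
        scores.append(score(query, "".join(letters)))
    return scores


def z_score(real_score, random_scores):
    """Deviation of the real score from the random scores, in units of sd."""
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)  # raises ZeroDivisionError below if sd == 0
    return (real_score - mean) / sd


if __name__ == "__main__":
    # Placeholder score: number of identical positions (not a real alignment score).
    identity = lambda q, s: sum(x == y for x, y in zip(q, s))
    query, subject = "ACGTACGTAC", "ACGTACGTAC"
    background = shuffled_scores(query, subject, identity, n=200)
    print(z_score(identity(query, subject), background))
```

Shuffling keeps the letter composition of the sequence, so the random scores reflect what composition alone would produce.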

Question 8

Q

FASTA Variants

Answer

A

Protein:
• protein-protein FASTA (fasta)
• protein-protein Smith-Waterman (ssearch)
• global protein-protein (Needleman-Wunsch) (ggsearch)
Nucleotide:
• nucleotide-nucleotide (DNA/RNA fasta)
• ordered nucleotides vs nucleotide (fastm)
• unordered nucleotides vs nucleotide (fasts)

Question 9

Q

Multiple sequence alignment

Answer

A

MSA is for comparing homologous sequences
• Homologs: genes related to one another by descent from a common ancestral DNA sequence
- Orthologs: genes in different species that evolved from a common ancestral gene by speciation; they normally retain their function
- Paralogs: genes related by duplication within a genome; they might acquire new functions

MSA aligns three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences studied.
By contrast, pairwise sequence alignment tools are used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences.

Question 10

Q

Progressive Alignments

Answer

A

combines pairwise alignments, starting with the most similar sequences
builds an initial guide tree, then adds more sequences
not guaranteed to be globally optimal
errors at the beginning might propagate to the end
examples: ClustalW, MAFFT (fast but might give more errors), T-Coffee (slow but very accurate)
state of the art: Clustal Omega
tradeoff between speed and accuracy …
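
The "most similar first" idea can be illustrated with a strongly simplified sketch. It only determines the order in which sequences would be added; it does not build a guide tree or an alignment. Position-wise identity of ungapped sequences stands in for a real pairwise alignment score, and both function names are assumptions made for this example.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions (stand-in for a pairwise alignment score)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def progressive_order(seqs):
    """Indices of seqs in the order a progressive method might add them."""
    if len(seqs) < 2:
        return list(range(len(seqs)))
    # Start with the most similar pair.
    first, second = max(combinations(range(len(seqs)), 2),
                        key=lambda p: identity(seqs[p[0]], seqs[p[1]]))
    order = [first, second]
    remaining = set(range(len(seqs))) - set(order)
    # Repeatedly add the sequence most similar to any sequence already included.
    while remaining:
        nxt = max(sorted(remaining),
                  key=lambda r: max(identity(seqs[r], seqs[o]) for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    print(progressive_order(["TTTTTTTT", "ACGTACGT", "ACGTTTTT", "ACGTACGA"]))
```

Because early choices are never revisited, a poor first pairing would carry through to the end, which is the error propagation mentioned above.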

Question 11

Q

Iterative Alignment Methods

Answer

A

similar to progressive methods
but might realign initial alignments
examples: MUSCLE, Dialign

Question 12

Q

Clustal Omega

Answer

A

Solves the problem of being both fast and accurate.

Clustal Omega is a multiple sequence alignment program.
It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be explored by viewing cladograms or phylograms.