Sequence Analysis Flashcards

1
Q

Alignment Based Methods

A
  • Goal: find best alignment.
  • Measure/Score: introduce as few gaps and substitutions as possible.
  • Question: How to achieve this?
  • Approach: Pairwise vs multiple sequence alignment.
2
Q

Edit distance invented by Levenshtein (1965)

A

Each single change (inserting/deleting a letter, changing a letter…) = distance +1
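The counting rule above can be sketched as a small dynamic program. This is a minimal illustration, assuming unit cost for every insertion, deletion and substitution; the function name `levenshtein` is chosen here for illustration and is not from the card.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-letter insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept in memory, which is enough when just the distance is needed and not the alignment itself.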

3
Q

Damerau

A

• flip operations (swapping two adjacent letters) count as one change
• brid (Old English) → bird (modern English) → 1 operation
• a mistyping such as “ebya” is more easily recognized by web search engines
• also used in biology, spell checking, …
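As a rough sketch of how a flip can be counted as one change, the following implements the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment), which is an assumption made here for brevity. The function name `osa_distance` is illustrative.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent letters costs 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            # Flip: the last two letters of both prefixes are swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("brid", "bird"), osa_distance("ebya", "ebay"))
```

With plain Levenshtein distance both examples would cost 2, because a flip has to be expressed as two substitutions.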

4
Q

Global Alignment “Needleman-Wunsch”

A

• gaps can be scored differently from substitutions
• an exchange matrix scores the different letter changes
• find the best global alignment → Needleman-Wunsch
• opening and extending a gap can be penalized differently → Needleman-Wunsch-Gotoh
• find the best local alignment → Smith-Waterman
• the exchange matrix has smaller penalty values for more similar letters
• example: d/t are both dental sounds; leucine and isoleucine have similar biophysical properties
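A minimal sketch of the global alignment score, assuming a simple match/mismatch score in place of a full exchange matrix and a single linear gap penalty (no separate gap-opening cost as in Gotoh's variant). The default values and the name `needleman_wunsch_score` are illustrative assumptions.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Score of the best global alignment of a and b (linear gap penalty)."""
    # First row: aligning the empty prefix of a against b[:j] needs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # first column: a[:i] against the empty prefix
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag,              # align ca with cb
                            prev[j] + gap,     # ca against a gap
                            curr[j - 1] + gap))  # cb against a gap
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

Replacing the match/mismatch expression with a lookup in an exchange matrix would let similar letters, such as leucine and isoleucine, receive a milder penalty.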

5
Q

Smith-Waterman algorithm

A
  • finds optimal local (partial) alignments
  • aligns shorter with longer sequences
  • changes from a negative (penalty) to a positive (score) view
  • finds the maximal score in the matrix
  • backtracks in the matrix from the maximal score to the starting point
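The points above can be sketched as follows. This is an illustrative version, assuming a simple match/mismatch score and a linear gap penalty; cell values never drop below zero, the maximal cell is recorded, and the traceback stops at the first zero. The name `smith_waterman` and the default scores are assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                    # start a new local alignment
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Backtrack from the maximal score until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

The floor at zero is what makes the alignment local: a poorly matching region cannot drag down the score of a good region that follows it.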
6
Q

Differences FASTA vs BLAST

A

FASTA: not so time-consuming; the first FAST algorithm
• FASTA and BLAST start with small good alignments, try to extend them, and finally optimize the best hits
• FASTA is derived from the dot plot
1) Identify common k-words (nucleotides: 6 letters, amino acids: 2 letters)
2) Score dot-plot diagonals
3) Rescore, possibly with an exchange matrix
4) Join regions over gaps, penalize for gaps
5) Dynamic programming to finalize alignments
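Steps 1 and 2 above can be sketched roughly as follows: index the k-words of the query, look up each k-word of the target, and count hits per dot-plot diagonal. This is a simplified illustration and not the FASTA program's actual code; the name `kword_diagonals` and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict


def kword_diagonals(query: str, target: str, k: int = 2) -> Counter:
    """Count shared k-words per dot-plot diagonal (target pos - query pos)."""
    # Step 1: remember where every k-word occurs in the query.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)
    # Step 2: each shared k-word is a dot; dots with the same offset
    # lie on the same diagonal, so a high count marks a candidate region.
    diagonals = Counter()
    for j in range(len(target) - k + 1):
        for i in positions.get(target[j:j + k], ()):
            diagonals[j - i] += 1
    return diagonals


print(kword_diagonals("ACGT", "TTACGT", k=2).most_common(1))
```

Only the best-scoring diagonals would then be passed on to the rescoring, joining and dynamic programming steps.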
➔ BLAST follows a different principle: it first searches for perfect matches and then for other similar stretches of varying length…

  • Basic Local Alignment Search Tool
  • compare single sequence to entire database of sequences
  • compare two sequences
  • much faster than FASTA
  • BLAST is based on Poisson and Extreme Value distributions
  • heuristic approach (no brute force over all possible permutations)
  • word size: 3 amino acids or 11 nucleotides by default, similarity
  • gaps are not treated well
  • Poisson distribution of score values → P-value
  • E-value = P-value * Number of entries in the database
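The last relation can be written out as a tiny worked example. The numbers below are made up for illustration and the function name `e_value` is an assumption; the formula is the one stated on the card.

```python
def e_value(p_value: float, database_entries: int) -> float:
    """E-value as stated above: P-value times the number of database entries."""
    return p_value * database_entries


# Hypothetical hit: P-value of 1e-9 in a database with one million entries.
print(e_value(1e-9, 1_000_000))
```

The same P-value therefore becomes less convincing as the database grows, because more entries give more opportunities for a chance hit.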
7
Q

Alignment Significance

A
  • generate random scores
  • compute mean and sd from random scores
  • compute the deviation of the real score from the random scores
  • Z-score to E-value (probability of a Z-score)
  • E-value: 10e-6 significant
  • E-value: 10e-3 might be …
  • E-value: > 10e-3 ignore …
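The first three steps can be sketched as below. Random scores are generated here by shuffling one sequence and re-scoring, which is one common choice and an assumption on this card; `identity_score` is a toy scoring function used only to keep the example self-contained.

```python
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Toy score (assumption): number of identical positions, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def shuffled_scores(a: str, b: str, score=identity_score,
                    n: int = 100, seed: int = 0) -> list:
    """Score a against n shuffled copies of b to get random scores."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n):
        rng.shuffle(letters)
        scores.append(score(a, "".join(letters)))
    return scores


def z_score(real_score: float, random_scores: list) -> float:
    """Deviation of the real score from the random mean, in sd units."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (real_score - mean) / sd


a, b = "ACGTACGTACGTACGT", "ACGTACGTACGAACGT"
print(z_score(identity_score(a, b), shuffled_scores(a, b)))
```

Shuffling keeps the letter composition of the sequence, so the random scores reflect what that composition alone would produce.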
8
Q

FASTA Variants

A

Protein:
• protein-protein FASTA (fasta).
• protein-protein Smith-Waterman (ssearch).
• global protein-protein Needleman-Wunsch (ggsearch)
Nucleotide:
• nucleotide-nucleotide (DNA/RNA fasta)
• ordered nucleotides vs nucleotide (fastm)
• unordered nucleotides vs nucleotide (fasts)

9
Q

Multiple sequence alignment

A

MSA is for comparing homologous sequences.
• Homologs: genes related to a second gene by descent from a common ancestral DNA sequence
- Orthologs: genes in different species that evolved from a common ancestral gene by speciation; they normally retain their function
- Paralogs: genes related by duplication within a genome; they might acquire new functions

An MSA is generally the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences studied.
By contrast, pairwise sequence alignment tools are used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences.

10
Q

Progressive Alignments

A
  • combines pairwise alignments, starting with the most similar sequences
  • initial guide tree, adding more sequences step by step
  • not guaranteed to be globally optimal
  • errors at the beginning might propagate to the end
  • examples: ClustalW, MAFFT (fast but might give more errors), T-Coffee (slow but very accurate)
  • state of the art: Clustal Omega
  • tradeoff between speed and accuracy …
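The idea of starting with the most similar sequences can be sketched by ranking all pairs by a cheap distance. The sketch below uses a k-mer set distance and a plain sorted list, both assumptions made for illustration; real tools build a proper guide tree, and the names `kmer_distance` and `joining_order` are hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int = 2) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 minus the fraction of shared k-mers (0 = same k-mer content)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 0.0
    return 1.0 - len(ka & kb) / len(ka | kb)


def joining_order(seqs: dict, k: int = 2) -> list:
    """Pairs of sequence names, most similar pair first."""
    pairs = [(kmer_distance(seqs[x], seqs[y], k), x, y)
             for x, y in combinations(sorted(seqs), 2)]
    return sorted(pairs)


toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
for distance, x, y in joining_order(toy):
    print(x, y, round(distance, 2))
```

In a progressive method the first pair in this order would be aligned first, and the remaining sequences added to that alignment later, which is also why an early mistake can propagate.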
11
Q

Iterative Alignment Methods

A
  • similar to progressive methods
  • but might realign initial alignments
  • examples: MUSCLE, Dialign
12
Q

Clustal Omega

A

Solves the problem of being both fast and accurate.

Clustal Omega is a multiple sequence alignment program.
It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be seen via viewing Cladograms or Phylograms.
