7. pw, DP Flashcards by Stevie Davies

What is the difference between similarity/identity and homology?

homology: all-or-nothing condition (homologous or not homologous)

similarity / identity: quantitative measure, e.g. 20% identity

Can homology be observed?

No: homology cannot be observed or known directly, only inferred (e.g. from sequence similarity)

Comparative sequence analysis: starting with seq A and seq B, what kind of analysis can we do?

Assess similarity (and from it infer homology) by computing an (optimal) alignment

What does syntenic mean?

(of genes) occurring on the same chromosome.

What is a dotplot?

What signals do they give?

In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity: one sequence is laid out along each axis, and a dot is drawn wherever the residues at the two positions match.

It is a type of recurrence plot.

Signals:
- identity, similarity (individual dots)
- length of consecutive signals (runs of dots along a diagonal)
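A minimal sketch of the idea, assuming the simplest case of single-residue identity (no word size, window or scoring matrix). `dotplot`, `diagonal_runs` and `render` are hypothetical helper names, not functions from any package. The dots are the identity signal; the diagonal run lengths are the "length of consecutive signals".

```python
def dotplot(seq_a, seq_b):
    """Return the set of (i, j) where seq_a[i] == seq_b[j] (identity signal)."""
    return {(i, j)
            for i, x in enumerate(seq_a)
            for j, y in enumerate(seq_b)
            if x == y}


def diagonal_runs(dots):
    """Lengths of runs of consecutive dots along diagonals, longest first."""
    runs = []
    for i, j in dots:
        if (i - 1, j - 1) not in dots:  # this dot starts a diagonal run
            length = 1
            while (i + length, j + length) in dots:
                length += 1
            runs.append(length)
    return sorted(runs, reverse=True)


def render(seq_a, seq_b, dots):
    """ASCII plot: seq_a down the rows, seq_b across the columns."""
    lines = ["  " + seq_b]
    for i, x in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(x + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "GATTACA", "GTTACAG"
    dots = dotplot(a, b)
    print(render(a, b, dots))
    print("longest diagonal run:", diagonal_runs(dots)[0])  # the shared "TTACA"
```

Isolated dots are mostly chance matches; a long diagonal run is the signal that suggests a similar region.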

Define pairwise sequence alignment (genes)

the comparison and arrangement of two sequences by
* searching for pairwise matches and “good mismatches” between their characters
* possibly inserting gaps in each sequence

What do we need to consider when obtaining a scoring matrix?

- observe trusted alignments of related proteins: which residues are paired (i.e., which substitutions have occurred)?
- the values differ for sequences of different evolutionary divergence, so further diverged sequences need different scoring matrices!
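The first step, observing which residues are paired in trusted alignments, can be sketched as counting residue pairs column by column. This is a simplified sketch, assuming a tiny made-up alignment block and unordered pairs; real matrix construction (PAM or BLOSUM) adds weighting, normalization and the log-odds step. `count_aligned_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def count_aligned_pairs(block):
    """Count residue pairs that occur together in the columns of a trusted alignment.

    `block` is a list of equal-length aligned sequences; '-' marks a gap.
    Pairs are unordered, so ('A', 'S') and ('S', 'A') are counted together.
    """
    counts = Counter()
    for column in zip(*block):
        residues = [r for r in column if r != "-"]  # gaps are not substitutions
        for x, y in combinations(residues, 2):
            counts[tuple(sorted((x, y)))] += 1
    return counts


if __name__ == "__main__":
    trusted = ["ACKL", "ACRL", "SC-L"]  # toy block, not real data
    for pair, n in sorted(count_aligned_pairs(trusted).items()):
        print(pair, n)
```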

Name two approaches for amino acid scoring matrices

What are their origins?

PAM (compiled by Margaret Dayhoff and her colleagues in the 1970s, from very little data)
BLOSUM (Steven and Jorja Henikoff in 1992)

PAM matrices

What are they based on?

What does PAM1 imply?

PAM = Point Accepted Mutation

based on observed amino acid substitutions in families of evolutionarily related proteins
PAM1 implies 1 substitution per 100 amino acids, accepted by the processes of natural selection (i.e. 1% divergence)

PAM matrices

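How do we get PAM250?

by extrapolating the values to more distantly related proteins:
PAM250 = (PAM1)²⁵⁰ (the PAM1 mutation probability matrix multiplied by itself 250 times, then converted into log-odds scores)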
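The extrapolation is a matrix power. A minimal sketch, assuming a toy 2-letter alphabet in place of the real 20×20 PAM1 mutation probability matrix, and stopping before the log-odds conversion. `mat_mult` and `mat_power` are hypothetical helpers written in plain Python so no library is needed.

```python
def mat_mult(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, p):
    """Raise a square matrix to the integer power p by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while p > 0:
        if p & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        p >>= 1
    return result


if __name__ == "__main__":
    # Toy 2-letter "PAM1": each residue stays the same with probability 0.99
    pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
    pam250 = mat_power(pam1, 250)
    print(pam250)  # probability of staying the same has dropped to about 0.5
```

Each row still sums to 1, but after 250 steps the matrix is far from the identity, which is why PAM250 suits distantly related proteins.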

PAM matrices

What are the guidelines for which PAM matrix to choose?

PAM250 for proteins of 20% identity
PAM120 for proteins of 40% identity
PAM60 for proteins of 60% identity

BLOSUM matrices

What does BLOSUM stand for?

What is it based on?

BLOcks amino acid SUbstitution Matrices

based on ungapped local alignments (blocks) of conserved regions in divergent protein sequences

BLOSUM matrices

How do we get different BLOSUM matrices?

e.g. BLOSUM50?

Different BLOSUM matrices are not extrapolated but are each derived from observed alignments

e.g. BLOSUM50 is derived from alignment blocks in which sequences sharing at least 50% identity are clustered together

You want to compare two sequences that you believe may be distantly related.

How would you choose a BLOSUM matrix? a PAM matrix?

Choose a BLOSUM matrix with a lower number (e.g. BLOSUM45)

Choose a PAM matrix with a higher number (e.g. PAM250)

(Perhaps start with BLOSUM62 and then adjust)

BLOSUM matrices

What are the guidelines for how to choose a BLOSUM matrix?

e.g. when would you choose BLOSUM50?

Guideline: a BLOSUM matrix index should approximately match the percent identity of the sequences to be aligned

→ BLOSUM50 is best used for sequences that are about 50% identical
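Both rules of thumb (the PAM guideline points and "BLOSUM index ≈ percent identity") can be written as a nearest-match lookup. This is only a sketch: the PAM table is taken from the PAM guideline card, while the list of available BLOSUM indices is an assumption (a commonly offered set), and `suggest_matrices` is a hypothetical helper.

```python
# Guideline points from the PAM card: percent identity -> PAM matrix
PAM_GUIDE = {20: "PAM250", 40: "PAM120", 60: "PAM60"}
# Assumed (hypothetical) set of BLOSUM matrices available in your tool
BLOSUM_AVAILABLE = [45, 50, 62, 80, 90]


def suggest_matrices(percent_identity):
    """Return the (PAM, BLOSUM) names closest to the guidelines for this identity."""
    pam_key = min(PAM_GUIDE, key=lambda pid: abs(pid - percent_identity))
    blosum = min(BLOSUM_AVAILABLE, key=lambda idx: abs(idx - percent_identity))
    return PAM_GUIDE[pam_key], f"BLOSUM{blosum}"


if __name__ == "__main__":
    for pid in (20, 38, 62):
        print(pid, suggest_matrices(pid))
```

Lower identity leads to a higher PAM number but a lower BLOSUM number. Treat the result as a starting point and adjust according to the results.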

What is a good all purpose substitution matrix for proteins?

BLOSUM62
- all-purpose: works reasonably well whether sequences are conserved or divergent
- best for testing: start with it, then change parameters according to the results

Scores in substitution matrices:

What do they mean?
How are they calculated?

Scores express which amino acids occur together in alignment columns more often (positive score) or less often (negative score) than expected by chance.

$s(a, b) = \log \frac{p_{ab}}{q_a q_b}$

$p_{ab}$: observed frequency of residues a and b being aligned
$q_a$, $q_b$: background frequencies of residues a and b
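A minimal sketch of the log-odds formula, assuming toy frequencies rather than values from a real matrix. Base 2 is used here as a choice; published matrices multiply the log-odds by a constant and round to integers (BLOSUM62, for instance, is in half-bit units), which is what the hypothetical `scale` parameter stands for.

```python
import math


def log_odds_score(p_ab, q_a, q_b, scale=1.0):
    """s(a, b) = scale * log2(p_ab / (q_a * q_b)).

    Positive: a and b are aligned more often than expected by chance.
    Negative: less often than expected by chance.
    """
    return scale * math.log2(p_ab / (q_a * q_b))


if __name__ == "__main__":
    # Toy frequencies, not taken from a real matrix
    print(log_odds_score(0.02, 0.1, 0.1))   # observed twice as often as chance
    print(log_odds_score(0.005, 0.1, 0.1))  # observed half as often as chance
```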

Explain affine gap penalties

The penalty depends on the length g of the contiguous gap: $\gamma(g) = -d - (g - 1)e$
gap opening penalty is larger: d
gap extension penalty is smaller: e
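The affine formula can be checked with a few numbers. This is a sketch using hypothetical penalties d = 10 and e = 1; the point it shows is that one long gap is penalized much less than the same number of gap positions split into separate gaps.

```python
def affine_gap_penalty(length, d, e):
    """gamma(g) = -d - (g - 1) * e for one contiguous gap of `length` positions."""
    if length <= 0:
        return 0
    return -d - (length - 1) * e


def linear_gap_penalty(length, d):
    """Linear model for comparison: every gap position costs d."""
    return -d * length


if __name__ == "__main__":
    d, e = 10, 1  # hypothetical penalties, d > e
    print("one gap of length 3:    ", affine_gap_penalty(3, d, e))      # -12
    print("three gaps of length 1: ", 3 * affine_gap_penalty(1, d, e))  # -30
    print("linear, 3 gap positions:", linear_gap_penalty(3, d))         # -30
```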

In which different ways can an alignment be ‘optimal’?

Which kind of optimality are we aiming for? Which can we actually achieve?

functionally
structurally
evolutionarily
algorithmically

We aim for evolutionary optimality, but algorithmic optimality is the only kind we can really achieve.

What does it mean if an alignment is functionally optimal?

aligned residues have the same function
e.g. residues in functional domains

What does it mean if an alignment is structurally optimal?

aligned residues play a similar role / are in corresponding positions in the 3D structure
e.g. hydrophobic residues

What does it mean if an alignment is evolutionarily optimal?

aligned residues are homologous, i.e. share a common ancestry

What does it mean if an alignment is algorithmically optimal?

the highest-scoring alignment for a given substitution model and gap penalties

What problem does dynamic programming solve for pairwise alignments?

GOAL: optimal (highest-scoring) pairwise alignment

PROBLEM:
- As the length of the sequences increases, the number of possible alignments increases exponentially!
- Constructing and scoring all possible alignments and picking the best one is not an option!
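The explosion can be made concrete by counting the paths through the DP matrix, i.e. alignments in which every column is residue/residue, residue/gap or gap/residue. This is a sketch under that counting convention (it treats adjacent gap columns in different orders as distinct; other textbooks count slightly differently but also get exponential growth). Storing each subproblem's answer with `functools.lru_cache` is itself a small instance of dynamic programming. `count_alignments` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # store each subproblem's answer instead of recomputing it
def count_alignments(n, m):
    """Number of distinct paths through an (n+1) x (m+1) alignment matrix.

    Every global alignment of sequences of lengths n and m ends with one of
    three columns: residue/residue, residue/gap, or gap/residue.
    """
    if n == 0 or m == 0:
        return 1  # only one way: all remaining residues against gaps
    return (count_alignments(n - 1, m - 1)   # residue aligned to residue
            + count_alignments(n - 1, m)     # residue of A aligned to a gap
            + count_alignments(n, m - 1))    # residue of B aligned to a gap


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20, 50):
        print(n, count_alignments(n, n))
```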

What kind of problems is dynamic programming used for? What is the basic principle?

optimization
* problems are broken into smaller, nested subproblems
* solutions to subproblems are computed and stored; these are used to construct solutions to larger and larger portions of the original problem

How is DP applied to alignment?

build up the best alignment by using optimal alignments of smaller subsequences

What are the 3 steps for DP in optimal pairwise alignment?

1. initialization of the score matrix
2. scoring: matrix fill (calculate alignment scores)
3. traceback and deduction of the alignment
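The three steps map directly onto a global (Needleman-Wunsch style) alignment. A minimal sketch, assuming a simple match/mismatch score instead of a substitution matrix and a linear gap penalty; the parameter values are hypothetical. When several cells tie during traceback, this sketch prefers the diagonal, then the vertical move, so it returns one optimal alignment, not all of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def s(x, y):
        return match if x == y else mismatch

    # 1. Initialization: first row/column = aligning a prefix against gaps only
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # 2. Scoring (matrix fill): each cell reuses the optimal scores of smaller prefixes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match/mismatch
                          F[i - 1][j] + gap,                         # gap in b
                          F[i][j - 1] + gap)                         # gap in a

    # 3. Traceback from the bottom-right cell to deduce one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```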

What was the original algorithm designed for sequence alignment? What kind of alignment did it do?

Needleman-Wunsch, for global alignment

Which algorithm was designed for local alignments?

Smith-Waterman (based on Needleman-Wunsch)

How does traceback work in local alignments?

local pairwise alignment:
- cells with negative scores are set to zero
- traceback starts at the highest-scoring cell
- traceback stops when a 0 is encountered
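The three local-alignment rules can be sketched as a small variation of the global DP. This is a minimal sketch, assuming a match/mismatch score and a linear gap penalty with hypothetical values, not a production Smith-Waterman implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def s(x, y):
        return match if x == y else mismatch

    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # negative scores are replaced by 0: a local alignment may start anywhere
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # traceback starts at the highest-scoring cell and stops at a 0
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```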

What is the consequence of affine gap penalties when using DP?

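Consequence for the dynamic programming implementation: we have to keep track of 3 scores and pointers at each cell (best alignment ending in a match/mismatch, ending in a gap in sequence A, and ending in a gap in sequence B)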
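A score-only sketch of the three-matrix idea (often attributed to Gotoh), assuming the affine penalty γ(g) = -d - (g - 1)e from the gap-penalty card, a simple match/mismatch score and hypothetical parameter values. The pointers needed for traceback are omitted to keep it short; a full implementation would store one pointer per matrix per cell.

```python
NEG = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, d=3, e=1):
    """Global alignment score with affine gaps gamma(g) = -d - (g - 1) * e.

    Three scores per cell:
      M[i][j]: best score with a[i-1] aligned to b[j-1]
      X[i][j]: best score ending with a[i-1] aligned to a gap
      Y[i][j]: best score ending with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -d - (i - 1) * e  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -d - (j - 1) * e  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # either open a new gap (cost d) or extend the current one (cost e)
            X[i][j] = max(M[i - 1][j] - d, X[i - 1][j] - e, Y[i - 1][j] - d)
            Y[i][j] = max(M[i][j - 1] - d, Y[i][j - 1] - e, X[i][j - 1] - d)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # best alignment is ACGT / A--T: 1 + (-3 - 1) + 1
    print(affine_global_score("ACGT", "AT"))  # -2
```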

What is the effect of increasing the word size when generating dotplots?

A larger word size reduces the noise, as short (chance) matches are removed. However, it also reduces the signal for the areas that appear homologous.
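The word size effect can be sketched by drawing a dot only where a whole word matches. This illustrates the idea behind a word size parameter; it is an assumption, not how polydot is actually implemented. `word_dotplot` is a hypothetical helper, and the two sequences are made up so that they share the region "ACGTTGCA".

```python
def word_dotplot(seq_a, seq_b, word_size):
    """Dot at (i, j) if the words of length word_size starting there are identical."""
    k = word_size
    return {(i, j)
            for i in range(len(seq_a) - k + 1)
            for j in range(len(seq_b) - k + 1)
            if seq_a[i:i + k] == seq_b[j:j + k]}


if __name__ == "__main__":
    a = "ACGTTGCAACGTAGCT"
    b = "TTACGTTGCATCAGGA"  # shares "ACGTTGCA" with a
    for k in (1, 2, 3, 4):
        print(f"word size {k}: {len(word_dotplot(a, b, k))} dots")
```

Every dot for a larger word is also a dot for a smaller word, so the count can only drop as the word size grows. Scattered chance matches disappear first, while the shared diagonal survives but loses its last k - 1 positions (the weaker signal mentioned above).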

What program from which package can be used for generating dotplots? What is this useful for?

polydot, or dotmatcher (more sensitive; uses a scoring matrix), from EMBOSS (European Molecular Biology Open Software Suite)
Good way to get an overview of the similarities between sequences

How can you get a dotplot? What parameters are there?

Combine (concatenate) the sequences in one FASTA file.
Use polydot. Parameters:
- word size
- type of output
Use dotmatcher. Parameters:
- window size
- threshold
- scoring matrix
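The window size / threshold idea can be sketched as follows: sum the scores along a diagonal window and draw a dot only if the sum reaches the threshold. This is an assumption about the general principle, not dotmatcher's actual code. The score functions stand in for a real substitution matrix, and `toy_similarity` (which treats E and Q as a "good mismatch") is purely hypothetical.

```python
def identity_score(x, y):
    """Hypothetical stand-in for a substitution matrix: 1 for a match, else 0."""
    return 1 if x == y else 0


def toy_similarity(x, y):
    """Hypothetical matrix that also rewards one 'good mismatch' (E/Q)."""
    return 1 if x == y or {x, y} == {"E", "Q"} else 0


def window_dotplot(seq_a, seq_b, window, threshold, score=identity_score):
    """Dot at (i, j) if the summed score over a diagonal window reaches threshold."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total >= threshold:
                dots.add((i, j))
    return dots


if __name__ == "__main__":
    a, b = "ACDEFG", "ACDQFG"
    print(sorted(window_dotplot(a, b, 3, 3)))                  # identity only
    print(sorted(window_dotplot(a, b, 3, 3, toy_similarity)))  # E/Q now counts
```

Scoring similar residues, not only identical ones, is what makes the matrix-based approach more sensitive.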

EXAM QUESTION
- Sequences A and B have a length of 1000 aa. Seq A has an N-terminal region (front) with high similarity to a tandemly duplicated region in the middle of sequence B. Draw a dotplot presenting the similarities. (2019)
- The first 250 amino acids are tandemly duplicated in the middle of B. (2020)
- Dotplot of 2 sequences. (2020)
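A scaled-down toy version of the exam scenario shows the expected picture. A 6-residue region stands in for the 250 aa N-terminal region, and the filler residues are made up so that they cannot produce chance matches. With seq A down the rows and seq B across the columns, the result is two parallel diagonal segments in the top rows, side by side in the middle of B, one per copy of the duplicated region.

```python
def word_dotplot(seq_a, seq_b, word_size):
    """Dot at (i, j) if the words of length word_size starting there are identical."""
    k = word_size
    return {(i, j)
            for i in range(len(seq_a) - k + 1)
            for j in range(len(seq_b) - k + 1)
            if seq_a[i:i + k] == seq_b[j:j + k]}


def render(seq_a, seq_b, dots):
    """seq_a along the rows, seq_b along the columns."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a)))


if __name__ == "__main__":
    region = "MKVLHW"                          # stands in for A's N-terminal region
    seq_a = region + "GGGG"                    # region at the front of A
    seq_b = "EEEE" + region + region + "RRRR"  # tandem duplicate in the middle of B
    print(render(seq_a, seq_b, word_dotplot(seq_a, seq_b, 4)))
```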

(35 cards)