Bioinformatics 7: Sequence alignment and its significance Flashcards

Question 1

Q

The 2 types of homolog and their differences?

Answer

A

ortholog - separated via speciation event

paralog - separated via duplication event

Question 2

Q

What is meant by the orthology conjecture?

Answer

A

Orthologs are more likely to show functional conservation than paralogs

i.e. orthologous genes are usually related in function
paralogous genes arise by duplication and then diverge (the 2 copies start with the same function, so one is free to change)

Question 3

Q

What is meant by ‘chance similarity’?

Answer

A

Similarity between two sequences that arises purely by chance

i.e. the sequences are not actually related structurally or functionally

Question 4

Q

2 ways DNA sequences might differ?

Answer

A

Mismatches
Gaps

created by substitutions and indels

Question 5

Q

What is a dotplot in the context of alignment?

Answer

A

Matrix with one sequence along the rows and the other along the columns, marked wherever the row and column residues match

Used by alignment algorithms to find most likely evolutionary pathway between the 2 sequences
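
A minimal sketch of the dotplot idea, assuming a simple residue-by-residue comparison with no window or filtering. The function names `dotplot` and `render` and the example sequences are made up for illustration.

```python
def dotplot(seq_a: str, seq_b: str) -> list[list[bool]]:
    """Return a matrix that is True wherever seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a: str, seq_b: str) -> str:
    """Draw the matrix as text: '*' marks a match, '.' marks no match."""
    matrix = dotplot(seq_a, seq_b)
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Hypothetical example sequences; related regions show up as diagonal runs.
print(render("GATTACA", "GATCACA"))
```

Diagonal runs of marks correspond to stretches where the two sequences match position by position.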

Question 6

Q

Which is more common - indels or substitutions? How does this affect alignment?

Answer

A

Substitutions are far more common than indels
-> alignment algorithms must reflect this by penalising gaps more heavily than mismatches

thus ‘quality’ of alignments is assessed via a scoring scheme (matches +ve, mismatches 0, gaps -ve) -> algorithms maximise score
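
A small sketch of how such a score could be computed for two already-aligned sequences. The values +1 / 0 / -1 follow the signs on the card but are otherwise assumptions, and `score_alignment` is a hypothetical helper name.

```python
def score_alignment(aligned_a: str, aligned_b: str,
                    match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum per-column scores for two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap        # indel column
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution column
    return score


# Hypothetical alignment: 5 matches, 1 mismatch, 1 gap.
print(score_alignment("GATTACA", "GA-TCCA"))
```

An alignment algorithm would search over possible gap placements for the arrangement that maximises this total.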

Question 7

Q

Types of gap penalty?

Answer

A

Constant
Proportional
Affine
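
The three schemes can be sketched as functions of gap length. The numeric defaults are arbitrary assumptions, and the affine form used here (open cost for the first position, extension cost for each further position) is one common convention; some texts instead charge open + extend x length.

```python
def constant_gap(length: int, penalty: float = 2.0) -> float:
    """Constant: the same penalty for any gap, whatever its length."""
    return penalty if length > 0 else 0.0


def proportional_gap(length: int, per_position: float = 2.0) -> float:
    """Proportional: penalty grows linearly with gap length."""
    return per_position * length


def affine_gap(length: int, gap_open: float = 5.0, gap_extend: float = 1.0) -> float:
    """Affine: a large cost to open a gap, a smaller cost to extend it."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


for n in (1, 2, 5):
    print(n, constant_gap(n), proportional_gap(n), affine_gap(n))
```

With the affine scheme one long gap costs less than several short ones of the same total length, which matches the idea that a single indel event can insert or remove many residues at once.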

Question 8

Q

How would a penalty applied to an amino acid substitution vary in severity?

Answer

A

If amino acid which has been substituted is similar in chemical properties (function) -> low penalty

If completely different, likely to be deleterious -> high penalty
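
A toy illustration of this idea, assuming a coarse textbook grouping of amino acids by side-chain chemistry and made-up penalty values (0 / 1 / 4). This is not a real substitution matrix such as BLOSUM or PAM; the grouping and the name `substitution_penalty` are illustrative only.

```python
# Coarse, illustrative grouping by side-chain chemistry (an assumption).
GROUPS = {
    "nonpolar": set("AVLIMFWPG"),
    "polar": set("STCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def group_of(residue: str) -> str:
    """Return the name of the chemical group a one-letter residue belongs to."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def substitution_penalty(a: str, b: str) -> int:
    """0 for identity, low for a same-group swap, high for a cross-group swap."""
    if a == b:
        return 0
    return 1 if group_of(a) == group_of(b) else 4


print(substitution_penalty("L", "I"))  # similar chemistry: low penalty
print(substitution_penalty("L", "D"))  # different chemistry: high penalty
```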

Question 9

Q

How do heuristic algorithms work and why are they used over dynamic programming algorithms? Example of one?

Answer

A

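Dynamic programming guarantees the optimal alignment but is too slow for searching large databases; heuristic methods trade that guarantee for speed

Heuristic methods assume high-scoring alignments contain short regions of exact matches

> they break queries into short ‘words’ and look for matches above a threshold
> initial hits are examined to see if they can be extended
> the alignment is then scored to quantify similarity
e.g. BLAST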
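
The word-matching step can be sketched as below. This is a simplification that only finds exact word matches (no scoring threshold), and the names `index_words` and `find_seeds` are hypothetical.

```python
from collections import defaultdict


def index_words(sequence: str, word_size: int) -> dict[str, list[int]]:
    """Map every overlapping word of length word_size to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_seeds(query: str, subject: str, word_size: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word occurs in subject."""
    subject_index = index_words(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        word = query[q:q + word_size]
        for s in subject_index.get(word, []):
            seeds.append((q, s))
    return seeds


# Hypothetical sequences: the word "ACG" is shared.
print(find_seeds("ACGTT", "TTACGA"))
```

Looking words up in an index is what makes this approach fast: only positions that share a word are examined further, rather than every cell of a full dynamic programming matrix.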

Question 10

Q

How does BLAST work?

Answer

A

Basic local alignment search tool

word (W) size: 3 (proteins), 11 (DNA)
-> searches only for word matches above threshold, T

> matches above T are extended in both directions (forming HSPs) until mismatches or gaps cause the alignment score to fall drastically
> neighbouring HSPs are joined; HSPs in low-identity regions are not joined

High-scoring segment pairs (HSPs) along query are reported + ordered by score
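
The extension step can be sketched as an ungapped walk in both directions from a word hit, stopping once the running score has dropped too far below the best score seen. The scores (+1 / -1), the drop-off value `x_drop`, and the function name `extend_seed` are assumptions for illustration; real BLAST uses substitution matrices and more elaborate rules.

```python
def extend_seed(query: str, subject: str, q_start: int, s_start: int,
                word_size: int, match: int = 1, mismatch: int = -1,
                x_drop: int = 2) -> tuple[int, int, int]:
    """Extend a word hit without gaps; return (score, query_start, query_end)."""

    def walk(q: int, s: int, step: int) -> tuple[int, int]:
        # Track the best score and how many positions were needed to reach it.
        best = score = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # score has fallen too far below the best: stop
            q += step
            s += step
        return best, best_len

    seed_score = sum(
        match if query[q_start + k] == subject[s_start + k] else mismatch
        for k in range(word_size)
    )
    right_score, right_len = walk(q_start + word_size, s_start + word_size, 1)
    left_score, left_len = walk(q_start - 1, s_start - 1, -1)
    total = seed_score + left_score + right_score
    return total, q_start - left_len, q_start + word_size + right_len


# Hypothetical sequences with a seed "ACG" at position 2 in both.
score, start, end = extend_seed("TTACGTAGG", "CCACGTACC", 2, 2, 3)
print(score, "TTACGTAGG"[start:end])
```

The returned segment is trimmed back to the point of the best score, so trailing mismatches explored during the walk are not included in the reported pair.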

Question 11

Q

Types of BLAST searches and their uses?

Answer

A

blastn: nucleotide query vs nucleotide db (what gene is this?)
blastp: protein query vs protein db (what protein is this?)
blastx: translated nucleotide query vs protein db (does this DNA code for a known protein?)
tblastn: protein query vs translated nucleotide db (what DNA might encode this protein?)
tblastx: translated nucleotide query vs translated nucleotide db (does this DNA code for a novel protein?)
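
The list above can be summarised as a lookup on query type, database type, and whether nucleotide sequences are compared as translations. `choose_blast` is a hypothetical helper written for this card, not part of any BLAST toolkit.

```python
def choose_blast(query_type: str, db_type: str, translated: bool = False) -> str:
    """Pick a BLAST program name from the query and database sequence types.

    translated only matters for nucleotide vs nucleotide searches, where it
    selects tblastx (both sides translated) instead of blastn.
    """
    key = (query_type, db_type)
    if key == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    if key == ("protein", "protein"):
        return "blastp"
    if key == ("nucleotide", "protein"):
        return "blastx"   # query is translated before comparison
    if key == ("protein", "nucleotide"):
        return "tblastn"  # database is translated before comparison
    raise ValueError(f"unsupported combination: {key}")


print(choose_blast("nucleotide", "protein"))
```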

Question 12

Q

What is FASTA? How does it compare to BLAST?

Answer

A

Heuristic algorithm
AND sequence format (single description line starting with ‘>’, followed by sequence data)

the FASTA algorithm is more sensitive to distant relationships but slower than BLAST
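
A minimal sketch of reading the FASTA sequence format, assuming each record is a description line starting with '>' followed by one or more lines of sequence data. The record names in the example are made up.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example.
example = """>seq1 example description
GATTACA
GATTACA
>seq2
ACGT
"""
for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```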

Question 13

Q

In the context of a search result, what is a P value and an E value?

Answer

A

P value = probability of observing an alignment score at least as high between 2 unrelated sequences of the same length + composition

E (Expect) value = number of matches scoring at least this high that would be expected to occur by chance when searching a db of that size (lower E = more significant match)
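
A short sketch of how the two values relate, assuming chance hits follow a Poisson distribution (so P = 1 - e^(-E)) and the Karlin-Altschul form E = K * m * n * e^(-lambda * S). The parameters `k` and `lam` depend on the scoring system and must be supplied by the caller; the function names and the example numbers are illustrative.

```python
import math


def expected_hits(score: float, query_len: int, db_len: int,
                  k: float, lam: float) -> float:
    """E value: chance hits scoring >= score in a search space of m * n."""
    return k * query_len * db_len * math.exp(-lam * score)


def p_value_from_e(e_value: float) -> float:
    """P value: probability of at least one chance hit, given E expected hits."""
    return -math.expm1(-e_value)  # equals 1 - exp(-E), accurate for small E


# For small E the two values are nearly equal; for large E, P approaches 1.
for e in (0.001, 1.0, 10.0):
    print(e, p_value_from_e(e))
```

Because E scales with the size of the search space, the same alignment score is less significant in a larger database.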

(13 cards)