Lecture 2 DA Flashcards
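What is a BLAST search?
A sequence alignment search: a query sequence is aligned against a database of sequences (BLAST = Basic Local Alignment Search Tool).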
What is checked for in a protein when a sequence is found (4)?
Whether the protein is:
- known, with known function
- known, but with unknown function
- novel, but with known similar sequences of known function
- novel, with no known similar sequences
Does high sequence similarity necessarily mean homology? Why/why not?
No, because two sequences could be similar due to convergent evolution.
What is homology?
Descent from a common ancestor.
What is orthology?
Descent from a speciation event.
What is paralogy?
Descent from a duplication event.
What is xenology?
Descent from a horizontal transfer event.
What percentage of identical amino acids is required for two proteins to have similar folding patterns? What is the most likely relation between them, and what does this depend on?
More than 25%. They are most likely homologous. This depends heavily on the e-value.
What percentage of identical amino acids between two proteins is the twilight zone for determining similar folding patterns?
18-25%.
What percentage of identical amino acids between two proteins means that similar folding patterns cannot be determined by a sequence alignment?
Below 18%.
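The identity thresholds on these cards can be sketched as a small calculation. This is a minimal illustration, not a lecture method: the function names are hypothetical, percent identity is assumed to be counted over gap-free alignment columns (other denominators are also in use), and the "undetermined" zone below 18% is inferred from the twilight-zone bounds.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def identity_zone(pct: float) -> str:
    """Map a percent identity onto the zones used in these cards."""
    if pct > 25.0:
        return "similar fold likely"
    if pct >= 18.0:
        return "twilight zone"
    # Assumption: anything under the twilight zone is undetermined.
    return "undetermined"
```

For example, `identity_zone(percent_identity("ACDE", "ACDF"))` classifies a pair that is identical in 3 of 4 aligned positions.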
What does the e-value represent?
A measure of the chance of obtaining the result at random.
What is more informative, amino acid/primary sequence alignment, or a nucleic acid alignment?
Protein, as it involves 20 characters, vs 4.
What is rarer: insertion of a gap, or the exchange of a nucleotide? What consequence does this have on the scoring system?
Insertion of a gap. Therefore, a bigger penalty on inserting gaps.
What is the difference between local and global alignment?
Global - aligns two sequences over their entire length, finding global/overall similarity.
Local - looks for regions of similarity, rather than a complete alignment.
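The difference can be seen by scoring the same pair both ways. The sketch below is an illustration under assumed parameters (+1 match, -1 mismatch, and a simple linear gap penalty of -2 per gap position). The function name is hypothetical. The global score follows the Needleman-Wunsch recurrence and the local score follows Smith-Waterman, where no cell may drop below 0.

```python
def alignment_scores(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Return (global_score, local_score) under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1

    # Global matrix: first row and column accumulate gap penalties.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local matrix: first row and column stay at 0.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + sub,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # Local alignment may restart anywhere, so scores floor at 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + sub,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is the bottom-right cell; local score is the best cell.
    return glob[-1][-1], best_local
```

With a short query embedded in a longer sequence, such as `alignment_scores("ACGT", "GGACGTGG")`, the global score is dragged down by the unmatched flanks while the local score only reflects the shared region.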
What are the three steps of dynamic programming?
Initialisation
Scoring the matrix
Traceback
What does initialisation in dynamic programming involve?
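An extra row and column are added to the matrix, starting from 0 in the first cell. They are filled with 0s (local alignment) or cumulative gap penalties (global alignment).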
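The three steps can be sketched for a global alignment as follows. This is a minimal Needleman-Wunsch illustration, not necessarily the lecture's exact convention: the function name, the +1/-1 scores, and the linear gap penalty of -2 are assumptions. Initialisation here uses the common textbook setup of a 0 in the top-left cell and cumulative gap penalties along the first row and column.

```python
def global_align(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1, initialisation: 0 in the top-left cell, then cumulative
    # gap penalties along the first row and the first column.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2, scoring the matrix: each cell takes the best of a
    # diagonal move (match/mismatch) or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Step 3, traceback: walk from the bottom-right cell back to the
    # top-left, re-deriving which move produced each cell's score.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

When several moves tie, this sketch prefers the diagonal, so it returns one optimal alignment rather than all of them.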
Describe the affine gap penalty in dynamic programming.
Large penalty for opening a gap, smaller penalty for extending one.
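As a rough worked example of why this favours one long gap over several short ones, the sketch below computes the cost of a single gap. The function name is hypothetical. The default values reuse the -5 open / -2 extend figures from the BLASTN scoring card. It assumes the first gap position costs the opening penalty and each further position costs the extension penalty. Some tools instead charge the opening penalty plus an extension for every position.

```python
def affine_gap_penalty(length: int, gap_open: int = -5,
                       gap_extend: int = -2) -> int:
    """Cost of one contiguous gap of the given length."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0
    # Assumed convention: the first position pays gap_open,
    # every additional position pays gap_extend.
    return gap_open + (length - 1) * gap_extend


# One gap of length 3 versus three separate gaps of length 1.
one_long_gap = affine_gap_penalty(3)
three_short_gaps = 3 * affine_gap_penalty(1)
```

Under these assumptions the single long gap is penalised less than the three separate gaps, which matches the idea that one insertion/deletion event is more plausible than three.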
What are substitution matrices developed from?
From direct observation of homologous sequences.
What is a PAM? How regularly do they occur?
Point accepted mutation.
1 PAM unit corresponds to 1 accepted mutation per 100 amino acids.
What is a BLOSUM62 matrix derived from? What is its percentage identity?
Derived from BLOCKS protein families.
Derived from proteins that have no more than 62% identity.
What is the use of a BLOSUM62 matrix?
Default matrix for BLAST.
Describe BLASTN scoring.
+1 match, -1 mismatch
-5 open gap, -2 extend gap
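Applying these numbers to an already-aligned pair could look like the sketch below. It is an illustration only, with a hypothetical function name, and it does not reproduce BLAST's actual implementation. It assumes `-` marks a gap, that the first position of a gap costs the opening penalty, and that each further position in the same sequence costs the extension penalty.

```python
def blastn_style_score(aligned_a: str, aligned_b: str, match: int = 1,
                       mismatch: int = -1, gap_open: int = -5,
                       gap_extend: int = -2) -> int:
    """Score two aligned nucleotide strings with the card's parameters."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    prev_side = None  # which sequence the previous column's gap was in
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            # Continuing a gap in the same sequence is an extension;
            # anything else opens a new gap.
            total += gap_extend if side == prev_side else gap_open
            prev_side = side
        else:
            total += match if a == b else mismatch
            prev_side = None
    return total
```

For instance, `blastn_style_score("AC--GT", "ACTTGT")` combines four matches with one gap of length two.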
What is the formula for the e-value?
e = M / 2^s
M = mn, where m is the length of the query and n is the total number of residues in the database.
s is the score.
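The formula can be evaluated directly. The sketch below uses hypothetical names and assumes s is given as a bit score, so that each extra point halves the e-value.

```python
def expected_hits(score: float, query_length: int,
                  database_residues: int) -> float:
    """e = M / 2^s, with search space M = m * n."""
    search_space = query_length * database_residues  # M = m * n
    return search_space / 2.0 ** score
```

Trying `expected_hits(10, 4, 256)` and then doubling the database size to 512 shows the e-value doubling for the same score, which is why e-values from different searches are not directly comparable.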
What does the e-value formula imply about the e-value?
It is the expected frequency of score s in a database search, or the chance of getting this score at random.
What does it mean if the e-value is 1?
One match with this score is expected in the search by chance alone (1 false match).
What e-value is preferred? What is an interesting e-value, and what do e-values depend on?
The closer to 0, the better.
Less than 1e-4 is interesting.
e-values are search dependent: they vary with the query length and the database size.