Genome sequencing and sequence analysis II Flashcards
What’s the core information contained in protein data banks?
The X, Y and Z coordinates for the atoms in proteins.
There are 150 000 proteins logged; many of them have similar structures.
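A minimal sketch of how those coordinates can be read, assuming the fixed-column PDB text format in which ATOM records hold x, y and z in columns 31-38, 39-46 and 47-54. The function name `parse_atom_xyz` and the sample record are made up for illustration.

```python
def parse_atom_xyz(line):
    """Return (x, y, z) from one ATOM/HETATM record, or None for other records."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    # PDB columns are 1-based: x = 31-38, y = 39-46, z = 47-54.
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


# Made-up record, built field by field so the column positions stay visible.
sample = (
    "ATOM  "      # cols 1-6   record name
    "    1"       # cols 7-11  atom serial number
    " "           # col 12     blank
    " N  "        # cols 13-16 atom name
    " "           # col 17     alternate location
    "MET"         # cols 18-20 residue name
    " A"          # cols 21-22 blank + chain identifier
    "   1"        # cols 23-26 residue number
    "    "        # cols 27-30 insertion code + blanks
    "  27.340"    # cols 31-38 x
    "  24.430"    # cols 39-46 y
    "   2.614"    # cols 47-54 z
)

print(parse_atom_xyz(sample))
```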
Define sequence similarity.
- Sequence similarity is the number of identical and similar residues.
- “Similar” residues have similar chemical properties and thus interact in the same way.
- Similarity is the product of sequence invariance and sequence (functional!) conservation.
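The definition above can be sketched as a simple count over two already-aligned sequences. The residue groups below are a hypothetical, simplified grouping by chemical property, chosen only for illustration; real tools use a substitution matrix instead.

```python
# Hypothetical grouping of amino acids by chemical property (illustration only).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("STNQ")]


def identity_and_similarity(seq_a, seq_b):
    """Count identical and similar positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap positions count as neither identical nor similar
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    return identical, similar


identical, similar = identity_and_similarity("MKLVDE", "MRIVEE")
# Similarity, as defined above, is identical + similar residues.
print(identical, similar, identical + similar)
```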
Define sequence homology.
Two sequences are homologous if it can be concluded that they derive from a common evolutionary ancestor.
Define sequence identity.
The extent to which two sequences are invariant (the same nt at the same aligned position).
Define sequence conservation.
The sequences may differ in their nucleotide bases, but the encoded amino acids have similar chemical interactions, so the functionality is conserved.
What’s the general threshold of DNA sequence similarity required to claim that two proteins share the same general structure?
20-30%
Form gives rise to function. The proteins may very well differ at a greater level of detail.
Why are global alignments not necessarily positive?
Global alignments look at sequence similarity over the entire sequence. Many times, DNA sequences (which correspond to different protein domains) have been shuffled around, still creating the same product, but in a different order. These re-shufflings don’t change the protein functionality, only the DNA sequence.
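A rough sketch of the effect, comparing a global (Needleman-Wunsch style) score with a local (Smith-Waterman style) score on a toy pair of sequences whose two halves are swapped. The scoring values (match +1, mismatch -1, linear gap -1) and the sequences are assumptions chosen for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score by dynamic programming (global by default)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are charged along the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Local: best cell anywhere; global: the bottom-right corner.
    return best if local else score[-1][-1]


# Two "domains" (AAAA and CCCC) present in both sequences, in swapped order.
global_score = alignment_score("AAAACCCC", "CCCCAAAA")
local_score = alignment_score("AAAACCCC", "CCCCAAAA", local=True)
print(global_score, local_score)
```

The local score picks up one shared domain in full, while the global score is dragged down by the regions that cannot be lined up end to end.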
Why is it logical to penalise the alignment score of 1nt gaps proportionally more than nucleotides in a larger gap?
The first nt of a gap is penalized harder (gap opening) than each additional nt in the same gap (gap extension).
TATCTAAA
vs
TATC**AT
It makes sense as the 1nt gap likely has the same functional consequence as a longer gap.
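This scheme is usually called an affine gap penalty. A minimal sketch, assuming an opening cost of 5 for the first gapped nt and an extension cost of 1 for each further nt (both values are arbitrary choices for illustration):

```python
def affine_gap_penalty(gap_length, gap_open=5, gap_extend=1):
    """Penalty for one gap: the first nt costs gap_open, each further nt gap_extend."""
    if gap_length <= 0:
        return 0
    return gap_open + gap_extend * (gap_length - 1)


for length in (1, 2, 5):
    total = affine_gap_penalty(length)
    # The per-nt cost falls as the gap grows longer.
    print(length, total, total / length)
```

With these values the 2 nt gap in the example above costs 6 in total, only slightly more than a 1 nt gap.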
What’s the concept behind amino acid substitution matrices?
A substitution matrix gives a score for every possible pair of amino acids, so you can read off how well a substitution is conserved. The scores reflect a chemical comparison of the two residues: can they interact in similar ways? If not, the penalty is high. If yes, the score may still be positive even though the residues are not identical.
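A small sketch of how such a matrix is used to score an ungapped alignment. The matrix below is a toy excerpt with illustrative values for three residues only, not a complete published matrix; the helper names are made up.

```python
# Toy excerpt of a substitution matrix (illustrative values only).
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,    # chemically similar pair: score is still positive
    ("L", "D"): -4,   # chemically dissimilar pair: penalty
    ("I", "D"): -3,
}


def pair_score(a, b):
    """Look up a pair in either order, since the matrix is symmetric."""
    if (a, b) in TOY_MATRIX:
        return TOY_MATRIX[(a, b)]
    return TOY_MATRIX[(b, a)]


def ungapped_score(seq_a, seq_b):
    """Sum of pair scores over two aligned sequences without gaps."""
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(ungapped_score("LID", "ILD"))  # similar substitutions
print(ungapped_score("LID", "DIL"))  # dissimilar substitutions
```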
What do you use blastn for?
DNA –> DNA search
What do you use blastp for?
Protein –> protein search
What do you use blastx for?
DNA –> Protein search.
What do you use tblastn for?
Protein –> DNA search.
What do you use tblastx for?
Translated DNA –> translated DNA search (both the DNA query and the DNA database are translated to protein before comparison).
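The five programs can be summarised as a lookup from query type and database type. This is only a memory aid written as code; the dictionary and the `pick_program` helper are hypothetical and not part of any BLAST package.

```python
# (query type, database type) for each program; "translated DNA" means the
# DNA is translated to protein before it is compared.
BLAST_PROGRAMS = {
    "blastn": ("DNA", "DNA"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated DNA", "protein"),
    "tblastn": ("protein", "translated DNA"),
    "tblastx": ("translated DNA", "translated DNA"),
}


def pick_program(query, database):
    """Return the program name for a query/database combination."""
    for name, kinds in BLAST_PROGRAMS.items():
        if kinds == (query, database):
            return name
    raise ValueError(f"no program for {query} vs {database}")


print(pick_program("protein", "translated DNA"))
```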
Explain the blast search strategy (protein-protein search).
- Make a list of k-letter words (words with 2, 3 or 6 aa).
- Set a word score threshold (the threshold for what alignment score is needed for a word to be counted as being aligned).
- Search for your words in other protein sequences.
- When BLAST identifies a word-match, the search extends bilaterally until the alignment score drops below a certain level. (90% of BLAST processing is spent here).
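The steps above can be sketched in a heavily simplified form. This is not the real BLAST implementation: it assumes exact word matches instead of a word score threshold, scores with match +1 and mismatch -1 instead of a substitution matrix, extends without gaps, and uses made-up function names and sequences.

```python
def make_words(query, k=3):
    """All overlapping k-letter words in the query, with their start positions."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


def find_seeds(query, subject, k=3):
    """Exact word matches as (query_pos, subject_pos) pairs."""
    seeds = []
    for q_pos, word in make_words(query, k):
        s_pos = subject.find(word)
        while s_pos != -1:
            seeds.append((q_pos, s_pos))
            s_pos = subject.find(word, s_pos + 1)
    return seeds


def extend_seed(query, subject, q_pos, s_pos, k=3, match=1, mismatch=-1, drop=2):
    """Extend a seed in both directions; stop once the running score has
    fallen more than `drop` below the best score seen in that direction."""
    def walk(qi, si, step):
        score = best = best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q_pos + k, s_pos + k, +1)
    left_score, left_len = walk(q_pos - 1, s_pos - 1, -1)
    total = k * match + left_score + right_score
    return total, query[q_pos - left_len:q_pos + k + right_len]


query, subject = "MKVLAAGIW", "TTKVLAAGPP"
seeds = find_seeds(query, subject)
print(seeds)
print(extend_seed(query, subject, *seeds[0]))
```

The extension step runs once per seed, which gives a feel for why most of the processing time goes there.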