Lecture 4 - Bioinformatics Flashcards
Name 7 different types of RNA
mRNA, rRNA, tRNA, snRNA (part of the spliceosome, which splices pre-mRNA during post-transcriptional modification), microRNA, siRNA, and snoRNA.
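Why isn’t there a charge on the peptide bonds linking amino acids in a polypeptide?
When the carboxyl group of one amino acid condenses with the amino group of the next, both charged groups are consumed and the resulting peptide (amide) bond is uncharged. The C-N bond also has partial double-bond character, so it is rigid and planar.
Note: This is not the case for the single bonds on either side of the alpha carbon, so rotation is possible around those bonds.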
Define a Homolog
When used in regard to sequences, homologs are sequences related by descent from a common ancestor; their similarity is the evidence used to infer homology. Homologs are divided into two subgroups: orthologs and paralogs.
State the difference between orthologs and paralogs, and give an example of both
- Homologs that perform identical or very similar functions in different species are called orthologs.
- Example of orthologs: the digestive RNase enzyme found in both humans and cows.
- Homologs that perform different functions within one species are called paralogs.
- Example of paralogs: human digestive RNase and its homolog human angiogenin, which stimulates blood vessel growth.
Which is more informative for detecting homology: DNA sequences or protein sequences?
Protein sequences. Each position in a protein has 20 possible amino acids, whereas each position in DNA has only 4 possible bases, so a match between protein sequences is much less likely to arise by chance and is therefore more telling.
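The arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration: it assumes every letter of the alphabet is equally frequent, which real sequences are not, and the function name `chance_identity` is made up for this example.

```python
def chance_identity(alphabet_size: int) -> float:
    """Probability that two randomly chosen letters match.

    Assumes all letters are equally likely (a simplification).
    """
    return 1.0 / alphabet_size


dna = chance_identity(4)       # 4 possible bases
protein = chance_identity(20)  # 20 possible amino acids

print(f"DNA positions match by chance about {dna:.0%} of the time")
print(f"Protein positions match by chance about {protein:.0%} of the time")
```

Under this assumption a random DNA match is five times more likely than a random protein match, which is why a given level of protein identity carries more weight.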
What are the differences and similarities between myoglobin and hemoglobin (pertaining to protein structure)?
Both proteins contain a heme group, but hemoglobin is a tetramer of four subunits, whereas myoglobin is a single chain. The alpha chain of hemoglobin is regarded as similar in structure to myoglobin.
Describe the process of sliding sequence comparison.
Two protein sequences are placed alongside each other and one is “slid” along the other, one position at a time, counting the identical residues at each offset. (Hemoglobin and myoglobin score 22-23 identities at their best offsets.)
The software also introduces gaps to account for insertions and deletions that arose by mutation, so that the maximum number of matches can be found. However, every gap adds to the alignment’s gap penalty, so excessive gaps lower the score of a potential homolog.
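A minimal sketch of the ungapped sliding step, in Python. The function names and the short test sequences are invented for illustration; real tools also handle gaps, which this sketch does not.

```python
def count_identities(a: str, b: str, offset: int) -> int:
    """Count identical residues when b is shifted right by `offset` relative to a."""
    matches = 0
    for i, residue in enumerate(b):
        j = i + offset
        if 0 <= j < len(a) and a[j] == residue:
            matches += 1
    return matches


def best_sliding_offset(a: str, b: str) -> tuple[int, int]:
    """Try every ungapped offset and return (offset, identities) for the best one."""
    offsets = range(-(len(b) - 1), len(a))
    scored = ((off, count_identities(a, b, off)) for off in offsets)
    # max() keeps the first offset that reaches the highest identity count.
    return max(scored, key=lambda pair: pair[1])


# Made-up sequences: the short one matches the long one starting at position 3.
print(best_sliding_offset("GLSDGEWQLV", "DGEWQ"))
```

Because no gaps are allowed here, two sequences that differ by an insertion or deletion will only line up on one side of it, which is the limitation that gap insertion addresses.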
How does “shuffling” help establish that a matching homolog is genuine?
Alignments are scored on the sequences’ similarities, with points deducted for the number and length of inserted gaps.
To test significance, one sequence is “shuffled” (its residues are randomly rearranged, keeping the same composition) and the alignment is repeated many times. The shuffled scores show what chance alone produces; if the authentic alignment score stands well above them, the match is likely to be significant.
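One common way to use shuffling is as a control: randomly rearrange one sequence many times, rescore each time, and see how often chance does as well as the real alignment. The Python sketch below assumes a simple ungapped identity count as the score; the function names, trial count, and seed are all illustrative choices, not part of any particular tool.

```python
import random


def best_identities(a: str, b: str) -> int:
    """Best identity count over all ungapped offsets of b against a."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, residue in enumerate(b)
            if 0 <= i + offset < len(a) and a[i + offset] == residue
        )
        best = max(best, matches)
    return best


def shuffle_test(a: str, b: str, trials: int = 200, seed: int = 0) -> tuple[int, float]:
    """Return the real score and the fraction of shuffled controls scoring at least as high."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    real = best_identities(a, b)
    residues = list(b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, random order
        if best_identities(a, "".join(residues)) >= real:
            at_least += 1
    return real, at_least / trials
```

A small fraction means shuffled sequences rarely reach the real score, so the real alignment is unlikely to be a chance result.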
What is the difference between a conservative substitution and a nonconservative one?
A conservative substitution replaces an amino acid with another of similar size and properties. The resulting effects on structure and function are expected to be minor.
A nonconservative substitution replaces an amino acid with a structurally dissimilar one, with potentially more radical effects.
State what a substitution matrix does and discuss characteristics of the most popular type
A substitution matrix is a scoring table used to score the alignment of one amino acid sequence against another. Each aligned pair of amino acids receives a score: positive for identities and conservative substitutions, negative for nonconservative ones. The most popular matrix is Blosum-62.
- Opening a single-residue gap reduces the score by 12, and each additional residue in the gap reduces the score by a further 2 points.
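The scoring arithmetic can be sketched in Python as follows. The gap costs (12 to open, 2 per extra residue) come from the notes above; the small `TOY_MATRIX` holds a handful of values chosen for illustration and is not a full Blosum-62 table, and the function names are hypothetical.

```python
GAP_OPEN = 12    # cost of a one-residue gap
GAP_EXTEND = 2   # cost of each additional residue in the same gap

# Illustrative pair scores only; a real matrix covers all 20 x 20 pairs.
TOY_MATRIX = {
    ("A", "A"): 4,
    ("L", "L"): 4,
    ("L", "I"): 2,
    ("K", "K"): 5,
    ("K", "E"): 1,
}


def gap_penalty(length: int) -> int:
    """Penalty for one gap spanning `length` residues."""
    if length <= 0:
        return 0
    return GAP_OPEN + GAP_EXTEND * (length - 1)


def score_alignment(a: str, b: str, matrix: dict, default: int = -1) -> int:
    """Score two equal-length aligned strings in which '-' marks a gap position."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    gap_len = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gap_len += 1  # still inside a gap (adjacent gap columns count as one gap)
            continue
        total -= gap_penalty(gap_len)  # close any gap that just ended
        gap_len = 0
        # Look the pair up in either order; unknown pairs get `default`.
        total += matrix.get((x, y), matrix.get((y, x), default))
    total -= gap_penalty(gap_len)  # gap running to the end of the alignment
    return total


print(score_alignment("ALK--K", "AIKLLK", TOY_MATRIX))
```

In the example, the four aligned pairs contribute 4 + 2 + 5 + 5 = 16 and the two-residue gap costs 12 + 2 = 14, leaving a score of 2.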
Between Blosum-62 and the sliding/shuffling method of counting identities, which method is more accurate?
Blosum-62 is generally considered more sensitive, and it can trace more divergent amino acid sequences to a possible common ancestor.
What is the general rule of thumb for deciding whether the similarity between two sequences is due to chance?
By a commonly quoted rule of thumb, if a sequence more than 100 amino acids long is over 25% identical to another sequence, the two are very likely homologous.
Under 15%? - Probably not
Between 15%-25% - Uncertain. Further analysis is needed.
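The three bands above can be written as a small Python helper. This is a sketch of the rule of thumb only: the function names are invented, and computing percent identity over aligned non-gap positions is one assumed convention among several.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, non-gap positions that hold identical residues."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def homology_call(identity_pct: float, length: int) -> str:
    """Apply the 15% / 25% rule of thumb for sequences over 100 residues."""
    if length <= 100:
        return "too short for this rule"
    if identity_pct > 25:
        return "probably homologous"
    if identity_pct < 15:
        return "probably not"
    return "uncertain"


print(homology_call(30.0, 150))
print(homology_call(20.0, 150))
print(homology_call(10.0, 150))
```

An "uncertain" result is the cue to apply further analysis, such as a substitution-matrix score or a structural comparison.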
In a BLAST search result, where are the identifying numbers of the query sequence and of the homolog being compared?
The number following gi| on the Query entry is the identifier of the query (the searched sequence), and the number following gi| on the Subject entry is the identifier of the subject sequence (the search result and possible homolog).
In the comparison, what do the symbols between the query and subject lines signify?
In the line between the query and subject, a letter marks an identical residue, whereas a plus sign (+) marks a conservative substitution.
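As a rough illustration, the middle line of an alignment can be tallied in Python. The function name and the example string are made up, and the sketch assumes the middle line has already been extracted with its spaces intact; it counts identities as positives too, which is how BLAST reports its "Positives" figure.

```python
def summarize_midline(midline: str) -> dict:
    """Tally identities (letters) and positives (letters plus '+') in a midline."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return {
        "identities": identities,
        "positives": positives,
        "length": len(midline),
    }


# Hypothetical midline: four identities, two conservative substitutions, two blanks.
print(summarize_midline("MK+ A L+"))
```

Blank positions are mismatches or gaps, so they count toward the length but toward neither tally.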
Which aspect of protein structure is most significant in conservation?
Tertiary structure is much more evolutionarily conserved than primary structure (tertiary structure is also more directly responsible for function than primary structure).
For example, myoglobin, hemoglobin, and leghemoglobin fold almost identically, yet their primary structures are less than 15% identical.
Hsp70 and actin are also almost identically folded in their secondary and tertiary structure, but have different biochemical functions.
- Because of this, it’s easy to place proteins into “classes” based on their tertiary folding pattern
What is another example of a group of enzymes with similar function, but different folding patterns?
Chymotrypsin and subtilisin both use a His, Ser, Asp catalytic triad to cleave peptide bonds, but despite this striking similarity, their different folding patterns and sequences make a recent common ancestor unlikely. This is an example of convergent evolution.