Structure bioinformatics - computational approaches Flashcards
Describe the RMSD metric
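Root Mean Square Deviation (RMSD) is a metric that shows how similar a fold prediction is to a template fold.
After superposing the two structures, you measure the distance between each pair of corresponding alpha carbons in the backbone of the template and the prediction, square those distances, average them and take the square root.
Because the metric is an average over all residue pairs, it depends on the size of the protein, and a few badly placed regions can dominate the value.
It is a global metric, so an alignment (superposition) is necessary.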
A good value is less than 2Å.
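As a rough illustration of the calculation, here is a minimal Python sketch. It assumes the two structures are already superposed and that residues are paired by position in the lists; the function name `rmsd` and the toy coordinates are made up for this example.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance for every pair of corresponding atoms.
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(coords_a, coords_b)
    ]
    # Mean of the squared distances, then the square root.
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates in Å (hypothetical, not from a real structure).
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
print(round(rmsd(template, model), 3))  # 1.291
```

Real tools first find the superposition that minimizes this value; that step is left out here.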
Describe the TM metric
Template modeling score is a metric that describes how close a fold prediction is to a template structure.
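TM-score is like RMSD but gives more weight to residue pairs that superpose well than to pairs that are far apart, and it is normalized by the length of the target, which makes it largely size-independent. The score has no unit and lies between 0 and 1; a TM-score of 1 is a perfect match.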
It is a global metric so alignments are necessary.
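The commonly quoted TM-score formula can be sketched as below. This scores one given superposition only (the real score is the maximum over superpositions), and the floor of 0.5 Å on d0 for short proteins is an assumption of this sketch. The function name `tm_score` is hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance in Å for each aligned residue pair.
    target_length: number of residues in the target (the normalization length).
    """
    if target_length > 15:
        # Length-dependent distance scale d0.
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: fall back to a small scale for very short chains
    d0 = max(d0, 0.5)  # assumption: keep d0 from becoming tiny or negative
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

print(tm_score([0.0] * 100, 100))  # 1.0 for a perfect match
```

Because the sum is divided by the target length rather than the number of aligned pairs, aligning only half of the target perfectly gives 0.5, not 1.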
Describe the GDT-TS metric
Global Distance Test Total Score is a metric that describes how close a fold prediction is to a fold template.
GDT-TS is a global score that measures the percentage of residues in the predicted structure that fall within a certain distance threshold of the corresponding residues in the experimental structure. The percentage is averaged over four thresholds (1, 2, 4 and 8 Å), so GDT-TS is expressed on a scale from 0 to 100.
A GDT of 90 is considered competitive with experimental methods.
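A minimal sketch of the averaging, assuming the per-residue C-alpha distances after superposition are already known and using the standard cutoffs of 1, 2, 4 and 8 Å. The real GDT procedure searches for the best superposition at each cutoff, which is skipped here; `gdt_ts` is a hypothetical name.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance cutoff (Å)."""
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy distances in Å for four residues (hypothetical values).
print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
```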
Describe the lDDT metric
Local Distance Difference Test. AlphaFold reports its per-residue confidence as pLDDT, a predicted lDDT.
lDDT is a local quality assessment score that evaluates the accuracy of predicted local structures. It compares the distances between pairs of nearby atoms in the experimental structure with the same distances in the predicted structure and reports the fraction that are preserved within a set of tolerance thresholds.
Alignments are not required since the metric is superposition-free.
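To show why no superposition is needed, here is a simplified C-alpha-only sketch. It assumes an inclusion radius of 15 Å and tolerances of 0.5, 1, 2 and 4 Å; the full lDDT works on all atoms and has further details that are omitted. The name `lddt_ca` is made up for this example.

```python
import math

def lddt_ca(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT on C-alpha atoms only.

    Only distances inside each structure are compared, so the two
    structures never have to be placed on top of each other.
    """
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local pairs in the reference are scored
            d_model = math.dist(model[i], model[j])
            diff = abs(d_ref - d_model)
            # Count how many tolerance thresholds this distance satisfies.
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0

reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# The same structure moved 10 Å along x: internal distances are unchanged.
shifted = [(x + 10.0, y, z) for x, y, z in reference]
print(lddt_ca(reference, shifted))  # 1.0
```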
What is CASP?
The CASP competition is a biennial event that evaluates the accuracy of computational methods in predicting protein structures.
What are Ab Initio models?
Computational method for fold prediction.
The method takes the amino acid sequence, considers its physicochemical properties and searches for the most energetically favorable fold, since the conformation with the lowest free energy is the most thermodynamically stable.
Ab initio modeling is useful when you cannot find any template, or none with high sequence identity. If you have a good template, homology modeling is a much better method.
These methods do not use templates.
Describe the homology modeling method
A method to predict the fold of a protein sequence.
- Identify related template structures (structures from PDB)
- Look for a template with high sequence identity.
- align target sequence to template structure
- Build a model for the target sequence using info from the template structure
- Evaluate the model
Based on the alignment, transfer coordinates for conserved backbone regions from template to target. Also copy coordinates for conserved side chains.
This method assumes that structure is more conserved than sequence and that conserved regions have identical spatial coordinates in template and target.
Can you do homology modeling if the sequence identity between prediction and template is low?
No, not reliably. Modeling accuracy depends on high sequence identity (preferably above 30-40% identity, so that you are out of the twilight zone). A correct alignment is essential.
The lower the identity, the more outliers you will see.
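Sequence identity can be computed from a pairwise alignment as in the sketch below. Definitions differ in what they divide by; this version assumes identity is counted over columns where neither sequence has a gap. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Toy alignment: 6 gap-free columns, 5 of them identical.
print(round(percent_identity("MKT-AYIA", "MKSQAY-A"), 1))  # 83.3
```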
In homology modeling, how do you find your templates?
You align your sequence against the sequences in a database of proteins with known structures (such as the PDB) to see whether any of them is similar to your sequence.
How do we model for insertions and deletions in homology modeling?
If the alignment shows an insertion, we fit it into a loop in the structure, since homology modeling assumes that insertions and deletions occur in loops. This is because loops vary a lot and are poorly conserved.
If the insertion is long, it becomes hard to model.
How do we model for long insertions/deletions?
Make a database with loop fragments from PDB and look for one that has:
* Has the same length as the insertion (number of residues)
* Goes from point A to point B (spans the gap between the two anchor points on either side of the loop)
* Has high sequence identity to the insertion.
We assume that long insertions are in loops but we do not really know. If the insertion is too long then you can’t model it with homology modeling because of the lack of templates.
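The three search criteria can be sketched as a simple filter over a fragment database. Everything here is a toy assumption: the dictionary layout, the fragment names, the 1 Å tolerance on the end-to-end distance, and the function `pick_loop_fragment` itself.

```python
def pick_loop_fragment(fragments, loop_sequence, anchor_distance, tolerance=1.0):
    """Return the name of the best matching loop fragment, or None.

    A fragment qualifies if it has the same length as the loop and its
    end-to-end distance fits the gap between the anchor points (A to B).
    Qualifying fragments are ranked by sequence identity to the loop.
    """
    candidates = []
    for frag in fragments:
        if len(frag["sequence"]) != len(loop_sequence):
            continue  # wrong number of residues
        if abs(frag["end_to_end"] - anchor_distance) > tolerance:
            continue  # does not span the gap from A to B
        matches = sum(a == b for a, b in zip(frag["sequence"], loop_sequence))
        candidates.append((matches / len(loop_sequence), frag["name"]))
    return max(candidates)[1] if candidates else None

# Hypothetical fragment database (distances in Å).
fragments = [
    {"name": "frag_a", "sequence": "GSGNG", "end_to_end": 9.8},
    {"name": "frag_b", "sequence": "GSDNG", "end_to_end": 10.3},
    {"name": "frag_c", "sequence": "GSDNG", "end_to_end": 14.0},
    {"name": "frag_d", "sequence": "GSD", "end_to_end": 10.0},
]
print(pick_loop_fragment(fragments, "GSDNG", 10.0))  # frag_b
```

When no fragment passes the length and distance filters, the function returns `None`, which mirrors the case where the insertion cannot be modeled for lack of templates.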
How do we model for the side chains in homology modeling?
If the side chain is conserved, its coordinates are copied from the template; this works well in 90% of all cases. If the target and template side chains differ but have overlapping atoms, the coordinates of those shared atoms can also be copied.
What GDT did AlphaFold achieve in CASP13 and CASP14, respectively?
CASP13 = around 60 (AF1)
CASP14 = around 90 (AF2)
What are neural networks and deep learning?
Neural networks are computational models inspired by the structure and function of the human brain. They are a fundamental component of machine learning and artificial intelligence.
When a neural network has multiple hidden layers, it is referred to as a deep neural network. Deep learning involves training these deep networks.
In deep neural networks, explain the following terms:
- Nodes/neurons
- Weights
- Activation function
Nodes/neurons = Each neuron takes one or more inputs, processes them using a set of weights, and produces an output. The output is typically passed through an activation function.
Weights = Weights are parameters associated with the connections between neurons. They represent the strength of the connections. During training, these weights are adjusted based on the error in the network’s predictions.
Activation function = The activation function is a mathematical function that determines the output of a neuron given its input. It introduces non-linearity (input and output can have relationships other than linear ones) into the network, allowing it to learn complex patterns.
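The three terms fit together as in this minimal sketch of a single neuron. The sigmoid is just one possible activation function, and the bias term is an addition not mentioned in the cards; both are assumptions of the example.

```python
import math

def sigmoid(x):
    """A common non-linear activation function with output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs, passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# The weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) = 0.0, and sigmoid(0) = 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25]))  # 0.5
```

Training would adjust `weights` (and `bias`) to reduce the error of the network's predictions; that step is not shown.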
In deep neural networks, explain the following terms:
- layers
- hidden layers
- transformers.
layers = Neurons are organized into layers. A neural network typically consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the initial data, and the output layer produces the final predictions or decisions.
hidden layers = In a neural network, the term “hidden layer” refers to layers of neurons that come between the input layer and the output layer. These layers are called “hidden” because they do not directly interact with the external environment (input or output), and their activations are not observed in the final output.
transformer = Transformers are neural networks that add context to the input data. Through a mechanism called attention, the representation of each part of the input changes depending on the rest of the input.
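The idea that a representation depends on the rest of the input can be shown with a toy version of self-attention, the mechanism transformers are built on. This sketch leaves out the learned projection matrices a real transformer uses, so queries, keys and values are simply the input vectors themselves; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy scaled dot-product self-attention without learned weights."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this element to every element of the input.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted mix of all input vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs

# The same first vector gets a different representation in a different context.
print(self_attention([[1.0, 0.0], [1.0, 0.0]])[0])
print(self_attention([[1.0, 0.0], [0.0, 1.0]])[0])
```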
Explain fold recognition (threading)
Unlike homology modeling, threading is not based on sequence identity.
The method assumes that there is a limited number of folds.
You take your sequence, thread it onto the different possible folds, and compare the compatibility scores to find the best fit.
Whether the sequence fits a template is decided by whether the result has reasonable bonds, few clashes, a hydrophobic core, etc. Basically, whether the fold looks like a protein.
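A heavily simplified sketch of the scoring idea: each fold is described only by whether each position is buried (B) or exposed (E), and a residue scores a point when a hydrophobic residue is buried or a polar one is exposed. The scoring rule, the hydrophobic residue set and the fold library are toy assumptions; real threading programs use much richer potentials and allow gaps.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: a simple hydrophobic residue set

def compatibility(sequence, environments):
    """Count positions where the residue type suits its environment."""
    if len(sequence) != len(environments):
        raise ValueError("this toy version needs equal lengths (no gaps)")
    score = 0
    for residue, env in zip(sequence, environments):
        buried = env == "B"
        if (residue in HYDROPHOBIC) == buried:
            score += 1  # hydrophobic in the core, or polar on the surface
    return score

def best_fold(sequence, fold_library):
    """Thread the sequence onto every fold and return the best-scoring one."""
    return max(fold_library, key=lambda name: compatibility(sequence, fold_library[name]))

# Hypothetical fold library: B = buried position, E = exposed position.
fold_library = {"fold_1": "BEBEB", "fold_2": "EBEBE"}
print(best_fold("LKVEL", fold_library))  # fold_1
```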
What are the assumptions that homology modeling makes?
Structure is more conserved than sequence.
Identical spatial coordinates for conserved regions (the ones with high sequence identity).
Insertions and deletions are assumed to mainly be in loop regions.
In homology modeling, what should we do for regions that are difficult to align?
A correct alignment is critical for modeling accuracy, so regions that are hard to align should be analysed visually. We should:
- Make sure that there are no insertions or deletions in secondary structures.
- Use multiple sequences of homologous proteins to improve the alignment manually.
- Use biological information: make sure that binding site or active site residues are aligned properly.
Describe all of the steps in homology modeling.
- Identify templates from the PDB.
- Do an alignment to see if any of the sequences with a structure has high sequence identity to your target.
- Transfer coordinates for the conserved backbone from template to target. Also copy coordinates for conserved side chains.
- Model loop regions with insertions/deletions.
- Add non-conserved side chains.
- Energy-minimize the model.
- Evaluate the quality.
What are the typical errors in homology modeling?
Incorrect prediction of interactions between side chains (hydrogen bonds, salt bridges, etc.).
Loops can be incorrectly modeled due to a lack of templates in the loop database.
Misalignments. Homology modeling cannot recover from errors in the alignment.
Bad selection of templates.
Give an example of an ab initio method
The Rosetta program.
Rosetta can perform ab initio structure prediction, where the three-dimensional structure of a protein is predicted without using homologous structures as templates. It does this by sampling many small peptide fragments and assembling them into a large number of candidate 3D structures. The program then optimizes the structures based on an energy function.