Structure bioinformatics - computational approaches Flashcards

1
Q

Describe the RMSD metric

A

Root Mean Square Deviation (RMSD) is a metric for how similar a predicted fold is to a template (reference) fold.

After superimposing the two structures, you measure the distance between each pair of corresponding alpha carbons in the backbone of the template and the prediction, square the distances, average them, and take the square root.

Because the metric is an average over all aligned residues, it depends on protein size and is sensitive to a few badly placed residues.
It is a global metric, so superposition of the structures is necessary.

A good value is less than 2 Å.
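
A minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and their alpha carbons are paired by index. The function name `rmsd` and the toy coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance for every pair of corresponding atoms.
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Toy C-alpha coordinates in angstroms (illustrative values only).
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
prediction = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
print(round(rmsd(template, prediction), 3))
```

Because the distances are squared before averaging, the single residue that is 2 Å off contributes four times as much as the one that is 1 Å off.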

2
Q

Describe the TM metric

A

Template modeling score is a metric that describes how close a fold prediction is to a template structure.

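The TM-score is like RMSD but weights small distances more heavily than large ones, so it focuses on the parts of the structures that overlap well. Distances are scaled by a length-dependent factor, which makes the score largely size-independent. It has no unit and ranges from 0 to 1, where 1 is a perfect match.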

It is a global metric so alignments are necessary.
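
A minimal sketch of the TM-score for one fixed superposition, assuming the commonly quoted form: each aligned residue pair contributes 1 / (1 + (d / d0)^2), and the sum is divided by the target length. The real score is the maximum over many superpositions, which this sketch does not search for. The lower bound on `d0` for short chains is an assumption.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: C-alpha distances (angstroms) for the aligned residue pairs.
    target_length: number of residues in the reference (target) structure.
    """
    # Length-dependent distance scale; this is what makes the score size-independent.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed lower bound so d0 stays positive for short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

perfect = tm_score([0.0] * 100, 100)
one_outlier = tm_score([0.0] * 99 + [50.0], 100)
print(perfect, one_outlier)
```

A single residue that is 50 Å off barely lowers the score, whereas the same outlier would dominate an RMSD.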

3
Q

Describe the GDT-TS metric

A

Global Distance Test Total Score is a metric that describes how close a fold prediction is to a fold template.

GDT-TS is a global score based on the percentage of residues in the predicted structure that fall within a distance cutoff of the corresponding residues in the experimental structure after superposition. The percentages for cutoffs of 1, 2, 4 and 8 Å are averaged, so GDT-TS ranges from 0 to 100.

A GDT-TS of about 90 is considered competitive with experimental methods.
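
A minimal sketch of the GDT-TS average, assuming the per-residue distances come from one fixed superposition. The actual test searches for the superposition that maximizes each cutoff separately, which is not done here. The function name `gdt_ts` and the distances are illustrative.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance cutoff (angstroms)."""
    if not distances:
        raise ValueError("need at least one distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over the cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Illustrative C-alpha distances for five residues.
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))
```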

4
Q

Describe the lDDT metric

A

Local Distance Difference Test (lDDT). AlphaFold reports a predicted version of this score (pLDDT) as its per-residue confidence.

lDDT is a local quality score that evaluates the accuracy of predicted local structure. For each pair of atoms that lie close together in the experimental structure, it checks whether the distance between them is preserved in the predicted structure within a set of tolerance thresholds, and reports the fraction of preserved distances.

Superposition is not required, since the metric only compares distances measured within each structure.
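
A simplified sketch of the idea, using alpha carbons only. The full lDDT works on all atoms and excludes pairs within the same residue. The 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances are the commonly quoted defaults and should be read as assumptions here, as should the function name `ca_lddt`.

```python
import math

def ca_lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT; reference and model are index-paired (x, y, z) lists."""
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only distances that are local in the reference are scored
            d_mod = math.dist(model[i], model[j])
            diff = abs(d_ref - d_mod)
            # Count how many tolerance thresholds this distance satisfies.
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0

reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Translating the model changes no internal distance, so the score is unaffected.
shifted = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in reference]
print(ca_lddt(reference, shifted))
```

The translated copy scores the same as the original, which is what superposition-free means in practice.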

5
Q

What is CASP?

A

CASP (Critical Assessment of protein Structure Prediction) is a biennial competition that evaluates the accuracy of computational methods in predicting protein structures.

6
Q

What are Ab Initio models?

A

Computational method for fold prediction.

The method takes the amino acid sequence, considers its physicochemical properties, and searches for the energetically most favorable fold, since the fold with the lowest free energy is the most thermodynamically stable.

Ab initio modeling is useful if you cannot find any template, or none with high sequence identity. If you have a good template, homology modeling is a much better method.

These methods do not use templates.

7
Q

Describe the homology modeling method

A

A method to predict the fold of a protein sequence.

  • Identify related template structures (structures from PDB).
  • Look for a template with high sequence identity.
  • Align the target sequence to the template structure.
  • Build a model for the target sequence using information from the template structure.
  • Evaluate the model.

Based on the alignment, transfer coordinates for conserved backbone regions from template to target. Also copy coordinates for conserved side chains.

This method assumes that structure is more conserved than sequence and that conserved regions have identical spatial coordinates.

8
Q

Can you do homology modeling if the sequence identity between prediction and template is low?

A

No, not reliably. Modeling accuracy depends on high sequence identity (preferably above about 30-40%, to get out of the twilight zone), because the alignment between target and template has to be correct.

The lower the identity, the more outliers you will see.
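
A minimal sketch of how sequence identity can be computed from a pairwise alignment. Definitions differ in what they divide by; this one assumes the denominator is the number of columns where neither sequence has a gap. The function name and the example sequences are made up.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Illustrative alignment: one gap column and one mismatch.
print(percent_identity("ACDE-G", "ACDFHG"))
```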

9
Q

In homology modeling, how do you find your templates?

A

You align your sequence against the sequences in a database of proteins with known structures (such as PDB) to see whether any of them is similar to your sequence.

10
Q

How do we model for insertions and deletions in homology modeling?

A

If the alignment contains an insertion, we fit it into a loop in the structure, since homology modeling assumes that insertions and deletions are located in loops. This is because loops vary a lot and are poorly conserved.

If the insertion is long, it becomes hard to model.

11
Q

How do we model for long insertions/deletions?

A

Build a database of loop fragments from PDB and look for one that:
* Has the same length as the insertion (number of residues)
* Goes from point A to point B (the anchor residues on either side of the gap)
* Has high sequence identity to the inserted sequence.

We assume that long insertions are in loops, but we do not really know. If the insertion is too long, you cannot model it with homology modeling because of the lack of templates.

12
Q

How do we model for the side chains in homology modeling?

A

If the side chain is conserved, its coordinates are copied from the template; this works well in about 90% of cases. If the side chain differs but shares atoms with the template side chain, the coordinates of the shared atoms can also be copied.

13
Q

What GDT did AlphaFold achieve in CASP13 and CASP14, respectively?

A

CASP13 = around 60 (AF1)
CASP14 = around 90 (AF2)

14
Q

What are neural networks and deep learning?

A

Neural networks are computational models inspired by the structure and function of the human brain. They are a fundamental component of machine learning and artificial intelligence.

When a neural network has multiple hidden layers, it is referred to as a deep neural network. Deep learning involves training these deep networks.

15
Q

In deep neural networks, explain the following terms:
- Nodes/neurons
- Weights
- Activation function

A

Nodes/neurons = Each neuron takes one or more inputs, processes them using a set of weights, and produces an output. The output is typically passed through an activation function.

Weights = Weights are parameters associated with the connections between neurons. They represent the strength of the connections. During training, these weights are adjusted based on the error in the network’s predictions.

Activation function = The activation function is a mathematical operation that determines the output of a neuron given its weighted input. It introduces non-linearity into the network (the relationship between input and output no longer has to be linear), allowing it to learn complex patterns.
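
A minimal sketch of a single neuron that ties the three terms together. The sigmoid is just one possible activation function, and the bias term, the function names and the numbers are assumptions added for illustration.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Two inputs with illustrative weights; training would adjust the weights and bias.
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```

Without the activation function the neuron would only compute a weighted sum, and stacking such neurons in layers would still give a purely linear model.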

16
Q

In deep neural networks, explain the following terms:
- layers
- hidden layers
- transformers.

A

layers = Neurons are organized into layers. A neural network typically consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the initial data, and the output layer produces the final predictions or decisions.

hidden layers = In a neural network, the term “hidden layer” refers to layers of neurons that come between the input layer and the output layer. These layers are called “hidden” because they do not directly interact with the external environment (input or output), and their activations are not observed in the final output.

transformer = Transformers are neural networks that use attention to add context to the input data: the representation of each part of the input changes depending on the rest of the input.

17
Q

Explain fold recognition (threading)

A

Unlike homology modeling, threading does not rely on sequence identity.

The method assumes that there is a limited number of folds.

You take your sequence, thread it onto each of the candidate folds, and compare the compatibility scores to find the best fit.

Whether the sequence fits a template is decided by whether the result has reasonable bonds, few clashes, a hydrophobic core, etc. Basically, whether the fold looks like a protein.

18
Q

What are the assumptions that homology modeling makes?

A

Structure is more conserved than sequence.

Identical spatial coordinates for conserved regions (the ones with high sequence identity).

Insertions and deletions are assumed to mainly be in loop regions.

19
Q

In homology modeling, what should we do for regions that are difficult to align?

A

A correct alignment is critical for modeling accuracy, so regions that are hard to align should be inspected visually. We should:

Make sure that there are no insertions or deletions in secondary structure elements.

Use multiple sequences of homologous proteins to improve the alignment manually.

Use biological information: make sure that binding site or active site residues are aligned properly.

20
Q

Describe all of the steps in homology modeling.

A

Identify templates from PDB.

Do an alignment to see if any of the sequences with a known structure have high sequence identity to your target.

Transfer coordinates for the conserved backbone from template to target. Also copy coordinates for conserved side chains.

Model loop regions with insertions/deletions.

Add non-conserved side chains.

Minimize the energy of the model.

Evaluate the quality.

21
Q

What are the typical errors in homology modeling?

A

Incorrect prediction of interactions between side chains (hydrogen bonds, salt bridges, etc.).

Loops can be modeled incorrectly due to a lack of templates in the loop database.

Misalignments. Homology modeling cannot recover from errors in the alignment.

Bad selection of templates.

22
Q

Give an example of an ab initio method

A

The Rosetta program.

Rosetta can perform ab initio structure prediction, where the three-dimensional structure of a protein is predicted without using homologous structures as templates. It does this by sampling many small peptide fragments and assembling them into a large number of candidate 3D structures. The program then optimizes the structures based on its energy function.

23
Q

Why do we need molecular dynamics?

A

There is no general analytical solution for the equations of motion when three or more bodies (such as atoms or molecules) interact.

In all cases we can instead make numerical approximations (simulations), for example with Euler's method, and this is what molecular dynamics does: the motion is broken down into many time steps, and the position of each body is calculated at every step.

24
Q

When making numerical approximations of motion, do we want small or large time steps and why?

A

It is crucial to use small time steps for higher accuracy, at the price of more computation for the same simulated time.

The time step is usually limited by the fastest vibration in the system. These are typically bonds involving hydrogens.
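
A minimal sketch of why the time step matters, using explicit Euler integration on a unit-mass harmonic oscillator as a stand-in for a vibrating bond. The units are arbitrary and the function name is made up. The energy of the true motion is constant, so any change in energy is numerical error.

```python
def euler_energy_drift(dt, total_time=10.0):
    """Integrate a unit-mass harmonic oscillator (force = -x) with explicit Euler.

    Returns the final energy divided by the initial energy; 1.0 would mean no drift.
    """
    x, v = 1.0, 0.0
    e0 = 0.5 * v * v + 0.5 * x * x
    for _ in range(round(total_time / dt)):
        a = -x  # acceleration from the force at the current position
        # Both updates use the values from the start of the step.
        x, v = x + v * dt, v + a * dt
    return (0.5 * v * v + 0.5 * x * x) / e0

for dt in (0.1, 0.01, 0.001):
    print(dt, euler_energy_drift(dt))
```

The drift shrinks as the step gets smaller, but the number of steps needed to cover the same simulated time grows in proportion.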

25
Q

To simulate molecular trajectories, what do we need to define?

A

The forces: Related to how the atoms interact with each other and how they influence each other's movement.

The velocities: Temperature related, related to how fast the atoms are moving.

26
Q

What parameters do you look at when trying to define a force field to be used in a molecular dynamics simulation of a system?

A
  • Non-bonded interactions
  • Bond terms (bond stretching)
  • Angle terms
  • Torsion terms (dihedral angles)
  • Improper torsion terms

We want all of these terms to be close to their equilibrium values, which keeps the energy of the system low. Distances and angles away from equilibrium are penalized.

27
Q

When we define a force field for a molecular dynamics simulation, we look at the non-bonded terms. Among these are van der Waals interactions.

Explain what they are and why they are important.

A

Van der Waals interactions:
When atoms come close enough for their outer electron shells to touch, van der Waals interactions can form.

These interactions are distance dependent: if the atoms get too close they repel each other, and if they are too far apart the interaction approaches 0.

These interactions are crucial for modeling the forces that act on a system, and we want them as close to equilibrium as possible.

We use the Lennard-Jones potential to model them as a function of the distance between the atoms.

28
Q

When we define a force field for a molecular dynamics simulation, we look at the non-bonded terms. Among these are electrostatic interactions.

Explain what they are and why they are important.

A

Electrostatic interactions are the attractive or repulsive forces between electrically charged particles. They are described by the Coulomb potential and calculated using partial charges.

They are modeled together with the van der Waals interactions to capture the forces on a system that are due to non-bonded interactions.

To reproduce the electrostatic interactions, we need to assign partial charges to all individual atoms.

29
Q

Which are the bonded terms used to define the force field that work on a system?

A
  • Bond terms (bond distances)
  • Angle terms
  • Torsion terms (dihedral angles)
  • Improper torsion terms
30
Q

Describe bond-distance as a bonded-term parameter of defining the force field of a system?

A

The equilibrium distance between two bonded atoms is the bond length.

We penalize all distances smaller or greater than this, since stretching or compressing the bond raises the energy and makes it less favorable.

31
Q

Describe angle-terms as a bonded-term parameter of defining the force field of a system?

A

There is an equilibrium angle, and deviations from it are penalized.

Angles are less rigid than bonds, meaning that they can change more before the energy becomes too unfavorable. We need 3 atoms to define an angle.

32
Q

Describe torsion term (dihedral angles) as a bonded-term parameter of defining the force field of a system?

A

Atoms can rotate around the axis of a bond, which changes the angle between the two planes defined by the atoms on either side of it.

You need 4 atoms in a row to define such an angle.

33
Q

Do we always have bonded and non-bonded terms in the same molecule?

A

Usually, atoms that are connected through bond and angle terms do not have non-bonded interactions computed between them.

In a bigger molecule, atoms separated by more than 4 connections can have non-bonded interactions, and you should compute these energies from the distance between the atoms.

There is, however, a distance cutoff beyond which the non-bonded interactions are assumed to be 0.

34
Q

What does this represent?

Upot = kb (r - r0)^2

A

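It represents the potential energy of the bond term: a harmonic (Hooke's law) spring that penalizes stretching or compressing the bond away from its equilibrium length r0, with force constant kb. The force on the atoms is the negative derivative of this energy.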
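
A minimal sketch of the bond term and the force derived from it. The function names and the parameter values (r0 = 1.0, kb = 300.0) are made up for illustration and are not taken from any real force field.

```python
def bond_energy(r, r0, kb):
    """Harmonic bond energy: Upot = kb * (r - r0)^2."""
    return kb * (r - r0) ** 2

def bond_force(r, r0, kb):
    """Force along the bond, the negative derivative of the energy: -2 * kb * (r - r0)."""
    return -2.0 * kb * (r - r0)

r0, kb = 1.0, 300.0  # illustrative equilibrium length and force constant
for r in (0.9, 1.0, 1.1, 2.0, 5.0):
    print(r, bond_energy(r, r0, kb), bond_force(r, r0, kb))
```

The energy keeps rising however far the bond is stretched, so this form always pulls the atoms back together and can never describe a broken bond.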

35
Q

What does this represent?

Upot = (Aij / rij^12 - Bij / rij^6)

A

It’s the Lennard-Jones potential, which describes the energy of the van der Waals interaction between atoms i and j: the r^12 term is the short-range repulsion and the r^6 term is the attraction.
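
A minimal sketch of this potential in the same A/B form as the card. Setting the derivative to zero gives the minimum at r^6 = 2A/B. The function names and the values A = 1.0 and B = 2.0 are chosen only to make the numbers easy to follow.

```python
def lennard_jones(r, a, b):
    """Lennard-Jones energy: Upot = A / r^12 - B / r^6."""
    return a / r ** 12 - b / r ** 6

def lj_minimum(a, b):
    """Distance and energy at the minimum, from dU/dr = 0, i.e. r^6 = 2A / B."""
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    return r_min, lennard_jones(r_min, a, b)

a, b = 1.0, 2.0  # illustrative parameters in arbitrary units
print(lj_minimum(a, b))
for r in (0.5, 1.0, 2.0, 10.0):
    print(r, lennard_jones(r, a, b))
```

Closer than the minimum the energy rises steeply (repulsion), and far away it approaches 0, which is why a distance cutoff can be applied.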

36
Q

What does this represent?

Upot = qiqj / rij

A

It represents Coulomb's law, which describes the energy of the electrostatic interaction between the partial charges qi and qj at distance rij.
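
A minimal sketch of the Coulomb term. The prefactor `k` defaults to 1.0 to match the formula on the card; real force fields multiply by a unit conversion constant, which is left as a parameter here. The function name and the charges are illustrative.

```python
def coulomb_energy(qi, qj, rij, k=1.0):
    """Electrostatic energy between two partial charges: Upot = k * qi * qj / rij."""
    return k * qi * qj / rij

# Opposite charges attract (negative energy), like charges repel (positive energy).
print(coulomb_energy(0.4, -0.4, 3.0))
print(coulomb_energy(0.4, 0.4, 3.0))
# The energy falls off as 1 / r, so doubling the distance halves it.
print(coulomb_energy(1.0, -1.0, 2.0), coulomb_energy(1.0, -1.0, 4.0))
```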

37
Q

What is minimization in molecular dynamics? Why do we need it?

A

Minimization refers to the process of finding the configuration of a molecular system where the potential energy is minimized.

The potential energy landscape represents the energy of a system as a function of the coordinates of its atoms, and the goal of minimization is to identify the lowest-energy state or a local minimum on this landscape.

Energy minimization is a crucial step in molecular dynamics simulations and structure optimization. It is used to relax the initial atomic coordinates, remove steric clashes, and find stable conformations.

Ligand binding also happens at low energies.

38
Q

What is local minimum when you do minimization in molecular dynamics?

A

A local minimum on a potential energy landscape is a configuration of atomic coordinates where the potential energy is lower than in the immediate surrounding region. It’s a point where the system is stable with respect to small variations in atomic positions.

In a simulation of a bigger molecule we would probably end up in different local minima if we run the minimization from different starting coordinates, because the minimizer only moves downhill and stops in the minimum closest to where it started.
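
A minimal sketch of this behavior, using steepest descent on a one-dimensional toy energy function with two minima. The function x^4 - 3x^2 + x is invented for illustration and is not a force field; the step size and iteration count are arbitrary choices.

```python
def energy(x):
    """Toy energy landscape with two minima (not a real force field)."""
    return x ** 4 - 3.0 * x ** 2 + x

def gradient(x):
    """Derivative of the toy energy with respect to x."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def minimize(x, step=0.01, iterations=5000):
    """Steepest descent: repeatedly move a small step downhill along the gradient."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

left = minimize(-2.0)   # start on the left side of the barrier
right = minimize(2.0)   # start on the right side of the barrier
print(left, energy(left))
print(right, energy(right))
```

Both runs stop where the gradient is zero, but only the one started on the left reaches the lower of the two minima.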

39
Q

What is global minimum when you do minimization in molecular dynamics?

A

The global minimum is the lowest point on the entire potential energy landscape. It represents the most stable configuration of the system among all possible atomic arrangements. Finding the global minimum is essential for understanding the most energetically favorable state of the system.

We would like to find the global minimum, but usually it is only possible to find a local minimum. To have a chance of reaching the global minimum, we need to explore the whole energy landscape of the system, for example with molecular dynamics.

40
Q

To create molecular dynamics simulations, what do we need and how is it performed?

A

Input data:
- The force field of the system.
- Coordinates of the atoms

The coordinates are used by the force field to get the energy landscapes and from the force field we can also compute the forces on each atom giving us their accelerations.

Velocities are used to control the temperature of the system according to the thermostat algorithm.

Minimize the system energy to find the local minimum and avoid potential clashes between atoms.

Run simulation and compute trajectories.

Compute the free energies.

41
Q

Why do we need to compute the free energies for molecular dynamics simulations?

A

Free energy drives physical processes like protein folding and ligand binding and describes spontaneous processes.

High free energy would mean that the model is not very good because ligand binding and such happens at low energies.

42
Q

What are the periodic boundary conditions?

A

Molecular dynamics simulations are performed in finite boxes but need to represent an effectively infinite system.

Periodic boundary conditions (PBC) are a set of conditions used to simulate a system as if it were part of an infinite one.

43
Q

Define the periodic boundary conditions.

A

PBC involves creating periodic replicas of the simulation box in all three spatial dimensions (x, y, and z). These replicas are essentially copies of the original box.

If a particle crosses the boundary in the x-direction, for example, it reappears on the opposite side of the box at the same y- and z-positions and with the same velocity.
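
A minimal sketch of the wrapping rule for a rectangular box with one corner at the origin. The function names, the box size and the particle values are assumptions for illustration.

```python
def wrap(position, box):
    """Wrap an (x, y, z) position back into the box [0, length) along each axis."""
    return tuple(p % b for p, b in zip(position, box))

def step_with_pbc(position, velocity, dt, box):
    """Move a particle for one time step, then apply periodic boundary conditions."""
    moved = tuple(p + v * dt for p, v in zip(position, velocity))
    return wrap(moved, box), velocity  # wrapping never changes the velocity

box = (10.0, 10.0, 10.0)
position, velocity = step_with_pbc((9.5, 5.0, 5.0), (1.0, 0.0, 0.0), 1.0, box)
print(position, velocity)
```

The particle leaves through the face at x = 10 and re-enters at the face at x = 0, keeping its y, z and velocity.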

44
Q

What is free energy? How do we compute it? What is the difference from potential energy?

A

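The free energy combines enthalpy and entropy: G = H - TS. The enthalpic contribution depends on the interactions in your system, and the entropic contribution depends on how disordered your system is. Disorder = high entropy.

Potential energy is the energy stored in bonds etc. for one configuration of the system; unlike free energy, it does not include entropy.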
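
A minimal sketch of the standard relation between the change in free energy, enthalpy and entropy, delta G = delta H - T * delta S. The function name and the numbers are made up; they assume kJ/mol for enthalpy and kJ/(mol K) for entropy.

```python
def free_energy_change(delta_h, temperature, delta_s):
    """Gibbs free energy change: delta G = delta H - T * delta S."""
    return delta_h - temperature * delta_s

# Illustrative binding event: favorable enthalpy, unfavorable entropy (loss of disorder).
delta_g = free_energy_change(delta_h=-50.0, temperature=298.0, delta_s=-0.1)
print(delta_g)
print("spontaneous" if delta_g < 0 else "not spontaneous")
```

A process is spontaneous when delta G is negative, so a favorable enthalpy can outweigh an entropy penalty, as in this example.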

45
Q

What is the equilibration (equilibrium) phase of a molecular dynamics simulation?

A

After minimizing your system, you run several short MD simulations while gradually increasing the temperature until it reaches lab conditions.

46
Q

What is the production phase of a molecular dynamics simulation?

A

The final stage, where you run the actual simulation and compute the trajectories.

47
Q

Why is structure alignment (superimposing) necessary for the global metrics?

A

Because they look at the whole structure and are therefore sensitive to its overall orientation and position in space, so we need to superimpose the prediction and the template before computing the metrics.

48
Q

Explain the difference between TM score and RMSD score?

A

Both are metrics for how similar a predicted fold is to a template fold. Both look at the distances between corresponding alpha carbons in the backbone of the template and the prediction.

However, RMSD looks at the whole alignment, whether the atoms superimpose well or not, and gives equal weight to all distances by averaging them.

TM-score emphasizes the parts of the superimposed structures that overlap well and down-weights large distances instead of weighting all distances equally. It is designed to be size-independent.

49
Q

What are the limitations of homology modeling, threading and AlphaFold?

A

Homology modeling: Only works with high sequence identity, and only if a sequence with high identity has an experimental structure.

Threading: Testing all available folds can be time-consuming, so you have to choose a selection of them to test.

AlphaFold: Works better for helices than sheets.

50
Q

What are the pros and cons of using molecular docking vs using molecular dynamics?

A

Molecular docking is a fast way to filter through many compounds and find the ones that are worth working on further. It is a good initial screening method. However, the approximations that make the algorithm fast also lower its accuracy. Docking algorithms are also not good at ranking ligands by affinity, for example picking the highest-affinity ligand out of several ligands with known affinity. We need MD simulations for that.

Therefore, once you have found the initial compounds, molecular dynamics simulations give a more realistic and accurate picture of how the protein and ligand interact, since we can find the most energetically favorable state of the system and do not take as many shortcuts.

Molecular dynamics is too computationally heavy to use as a screening method.

Therefore, in drug discovery, use molecular docking as a screening method and then refine the results with molecular dynamics simulations.

51
Q

Why can it be hard to use MD simulations to simulate enzymatic reactions?

A

Because enzymatic reactions involve breaking and forming bonds, which MD simulations cannot model: in the mathematical representation of the forces, bonds cannot break.

52
Q

What are the steps of virtual screening?

A

Screening databases and chemical libraries for compounds.

Docking screening - which compounds fit into the binding pocket?

Docking algorithm:
- sampling of ligand conformations
- ligand scoring.

53
Q

When you want to do a virtual screen, what should you screen for possible compounds?

A

The full drug-like chemical space is too big to screen, so you screen a subset of it, such as:

High-throughput screening libraries.
Commercial chemical space.

54
Q

Describe the steps of a molecular dynamics simulation.

A

Get the structure

Define the force field - using the force field parameters.

Minimize the system to find a local/global minimum.

Equilibration phase, where we slowly increase the temperature to reach lab conditions.

Production phase where we run the simulation and get the trajectories.

Compute the free energies.

55
Q

What parameters influence the forces on O2 gas?

A

Within the molecule, only the vibration of the bond, modeled by Hooke’s law. Between separate O2 molecules we also have the non-bonded terms: van der Waals and electrostatic interactions, modeled by the Lennard-Jones potential and Coulomb's law.

56
Q

Why cannot bonds break when doing MD simulations?

A

In reality, bonds would break if we pulled the atoms too far apart, but in the mathematical representation of the forces that come from bonds, the energy keeps increasing the more we stretch the bond.

57
Q

Briefly explain the process of AlphaFold.

A

AlphaFold uses machine learning to predict protein structure.

You input an amino acid sequence, and AlphaFold builds a multiple sequence alignment (MSA) to create a sequence representation of the input.

It also looks for related structures in PDB to create a pair representation of the input.

The pair and sequence representations are passed through a transformer called the Evoformer, which extracts information from the representations and forms a structural hypothesis.

The structural hypothesis is used to improve the MSA representation, which in turn gives a new hypothesis. This is done in 48 blocks.

The refined hypothesis is then passed to a second neural network that gives an initial prediction of the structure and applies physical and chemical constraints.

This prediction goes back through the Evoformer and the second network 3 times before an output is given.