Structure bioinformatics - Jens lectures Flashcards

Question

What is the difference between 1st, 2nd and 3rd generation methods for protein structure prediction?

Answer 1

1st: based on single-residue propensities (Chou-Fasman), Q about 50%. 2nd: context dependent (GOR), Q 60-70%. 3rd: evolutionary information and neural networks (PHD, PSIPRED), Q > 70%.
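The Q values above are per-residue accuracies. A minimal sketch of how such a number could be computed, assuming the usual three-state definition (H = helix, E = strand, C = coil); the function name `q3` and the example strings are illustrative, not from the lectures:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up example: 16 residues, 3 of them predicted wrongly.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEEHCC"
print(f"Q3 = {q3(predicted, observed):.2f}%")
```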

Answer 2

Helices, helix bundles, beta barrels.

Answer 3

Helices are 15-30 residues long. Tightly packed helices with coiled-coil structure (multiple helices coiled together). Predominantly apolar/hydrophobic helices. Positive residues such as K and R are more common on the cytosolic side. Aromatic residues such as Y and W are more common at the ends of helices.

Answer 4

The sheets in the barrel have hydrophobic residues every second position. Beta-strands flanked by aromatic residues. They are more difficult to predict than transmembrane alpha-helices.

Answer 5

A method to predict transmembrane helices in a protein using a hidden Markov model (HMM). The HMM is trained on known examples of transmembrane and non-transmembrane regions in protein sequences. The model learns the characteristics of these regions and uses this knowledge to predict the likelihood of transmembrane helices in a given protein sequence.

Answer 6

In the GenTHREADER approach we look at local and global measures for scoring. Local measures: secondary structure preferences (specific amino acids are more probable in particular secondary structures). Global measures: whether there are clashes in the structure, whether there is a hydrophobic core, whether the bonds are reasonable, etc.

Answer 7

De novo, fold recognition, homology modeling.

Answer 8

Unlike homology modeling, threading is not based on sequence identity but is based on the fact that there is a limited number of folds that a protein can have. In threading you simply thread the target sequence onto different fold possibilities and look at compatibility scores to find the fold that best fits the sequence.

Answer 9

Identify structure templates (all proteins that have structures in PDB) and then do alignments to those templates to find the ones with the highest sequence identity. For the parts that have high sequence identity you transfer the coordinates of the backbone of the template to the target. Then evaluate the model.

Answer 10

We ask how protein-like the structure is. Stereochemistry: bond lengths and angles, peptide bond planarity, torsion angles, clashes, Ramachandran plots. Spatial features: hydrophobic core, solvent accessibility, distribution of charged groups. If we get big insertions in secondary structures then something is off. There are servers that do these evaluations (PROCHECK, MolProbity).

Answer 11

Being in the twilight zone means that the sequence identity is lower than around 30%. In these situations homology modeling is not reliable and you should choose fold recognition instead.
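A minimal sketch of that decision, assuming a pairwise alignment is already available as two gapped strings. The identity definition used here (matches over aligned, non-gap columns) is one of several in use, and the 30% cutoff is the approximate value from the card; the function names are illustrative:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_method(identity: float, threshold: float = 30.0) -> str:
    """Below the (approximate) twilight-zone threshold, prefer fold recognition."""
    return "homology modeling" if identity >= threshold else "fold recognition"


identity = percent_identity("AAAA", "AAGG")
print(identity, suggest_method(identity))
```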

Answer 12

If the side chain is conserved and the atoms overlap, the coordinates are copied. For non-overlapping side-chain atoms, use a rotamer library and choose a rotamer that does not clash with neighbouring residues.
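The rotamer step can be sketched roughly as follows. This is a toy version under stated assumptions: rotamers are given as lists of atom coordinates already ordered by library preference, a clash is any atom pair closer than a hypothetical 3.0 Å cutoff, and the names `clashes` and `pick_rotamer` are made up for illustration:

```python
import math


def clashes(atoms, environment, min_dist=3.0):
    """True if any rotamer atom is closer than min_dist to an environment atom."""
    return any(math.dist(a, e) < min_dist for a in atoms for e in environment)


def pick_rotamer(rotamers, environment, min_dist=3.0):
    """Return the name of the first rotamer (in library order) without a clash."""
    for name, atoms in rotamers:
        if not clashes(atoms, environment, min_dist):
            return name
    return None  # every rotamer clashes; the backbone may need adjusting


# Made-up coordinates: the first rotamer sits 1 A from a neighbouring atom.
rotamers = [("rotamer_1", [(0.0, 0.0, 0.0)]), ("rotamer_2", [(5.0, 0.0, 0.0)])]
environment = [(1.0, 0.0, 0.0)]
print(pick_rotamer(rotamers, environment))
```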

Answer 13

Every amino acid in the query sequence is in one of the following states, located outside, inside or in the membrane of the cell: core TM helix, helical tails, loop on the cytoplasmic side, short loop, long loop outside of the cell, globular domain inside a loop. Each state has a probability distribution over the amino acids, and there are probabilities for moving between states. When moving through the query sequence, the HMM chooses the state path that has the highest probability, which gives the most probable topology.
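The "most probable state path" idea can be illustrated with a toy Viterbi decoder. This is only a sketch, not TMHMM: it uses three states instead of the full set, lumps amino acids into hydrophobic and non-hydrophobic classes, does not enforce inside/outside alternation, and every probability below is a hypothetical value picked for illustration:

```python
import math

STATES = ("i", "M", "o")  # inside loop, membrane helix, outside loop
HYDROPHOBIC = set("AILMFVWC")

# Hypothetical start and transition probabilities (illustration only).
START = {"i": 0.5, "M": 0.0, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "M": 0.1},
    "M": {"M": 0.9, "i": 0.05, "o": 0.05},
    "o": {"o": 0.9, "M": 0.1},
}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def _emit(state, aa):
    """Hypothetical emission log-probability for a residue class in a state."""
    hydrophobic = aa in HYDROPHOBIC
    if state == "M":
        return _log(0.9 if hydrophobic else 0.1)
    return _log(0.3 if hydrophobic else 0.7)


def viterbi(seq):
    """Return the highest-probability state path as a string over 'i', 'M', 'o'."""
    if not seq:
        return ""
    score = {s: _log(START[s]) + _emit(s, seq[0]) for s in STATES}
    back = []
    for aa in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda r: score[r] + _log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + _log(TRANS[prev].get(s, 0.0)) + _emit(s, aa)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("KKKK" + "L" * 20 + "KKKK"))
```

In the example, the 20 leucines are long and hydrophobic enough to outweigh the cost of entering and leaving the membrane state, so they are labelled as a helix while the flanking lysines stay in loop states.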

Answer 14

Use several of the methods and take the consensus prediction, i.e. the average over all the methods, as the result.
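One simple way to form such a consensus is a per-residue majority vote, sketched below. This is an assumption about what "average" means in practice; real consensus servers may weight methods differently. The function name and the example predictions are made up:

```python
from collections import Counter


def consensus(predictions):
    """Per-position majority vote over equally long prediction strings."""
    if not predictions:
        raise ValueError("need at least one prediction")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    # zip(*predictions) yields one column of states per residue position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*predictions))


# Three hypothetical methods predicting the same four residues.
print(consensus(["HHHC", "HHCC", "HECC"]))
```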

Answer 15

Methods like GOR look only at the query sequence and make a prediction based on the preference of each amino acid, or of windows of several amino acids, for the different structures. Some of the models include context but they do not include evolutionary information. Methods like Jpred also look at homologous sequences and make predictions about the target based on the structures of the homologous proteins. This is good because secondary and tertiary structures are conserved between homologous sequences. Insertions and deletions can also be found; these are less likely to be in secondary structures and can be predicted to be in coils.

Answer 16

GPCRs are all transmembrane proteins and they consist of seven alpha-helices that span the membrane, packed roughly parallel to one another.

Answer 17

The fact that structure is more conserved than sequence in evolution.

Answer 18

Fold recognition is used if you cannot find any templates with high sequence identity to your target and if you want to predict the overall structure of a protein, meaning that you are not very interested in the atomic details.

Answer 19

It assumes that a protein with a similar fold has been observed before. This is very likely.

Answer 20

Ab initio modeling, fold recognition, AlphaFold.

Answer 21

A technique applied to your predicted structure to find the most energetically favorable conformation of the molecule. The aim is to refine and relax the structure and make it more physically plausible. It is often performed using force fields and molecular dynamics simulations.

Answer 22

- Construct a template library (CATH and SCOP can be used to select a representative set of folds from PDB). - Sequence-to-structure alignment (the threading). - Design a scoring function. - Template selection and model construction.

Answer 23

You can thread your target to all fold templates and score all of them and find the optimal one - time consuming and challenging. You can score only a few of the threadings and then improve the alignments using sequence profiles for the templates - faster and works relatively well.

Answer 24

You should do separate searches for each domain.

Answer 25

A molecule that is expected not to bind, with properties similar to those of a ligand [2] (logP, molecular weight, number of rotatable bonds, ...), but with a different topology (molecular connectivity).

Answer 26

Does the algorithm favor ligands or decoys? It is an analysis to test if the algorithm can tell the difference between actives and non-actives and in a way test the rate of false positives (chooses decoys). Ligand enrichment will be viewed as a curve far to the left of the random curve in a graph with decoys on x and ligands on y. Decoy enrichment will be viewed as a curve far to the right.
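A minimal sketch of how such an enrichment curve could be built from a ranked docking result, assuming that a lower docking score means a better pose and that each molecule is labelled as ligand (True) or decoy (False). The function names and the example scores are illustrative only:

```python
def enrichment_curve(scored):
    """scored: list of (docking_score, is_ligand). Returns (decoy_frac, ligand_frac) points."""
    ranked = sorted(scored, key=lambda item: item[0])  # best (lowest) score first
    n_ligands = sum(1 for _, is_ligand in ranked if is_ligand)
    n_decoys = len(ranked) - n_ligands
    if n_ligands == 0 or n_decoys == 0:
        raise ValueError("need at least one ligand and one decoy")
    points = [(0.0, 0.0)]
    ligands_found = decoys_found = 0
    for _, is_ligand in ranked:
        if is_ligand:
            ligands_found += 1
        else:
            decoys_found += 1
        points.append((decoys_found / n_decoys, ligands_found / n_ligands))
    return points


def area_under_curve(points):
    """Trapezoidal area: 1.0 is perfect ligand enrichment, about 0.5 is random."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up scores where both ligands rank ahead of both decoys.
curve = enrichment_curve([(-10.0, True), (-9.0, True), (-5.0, False), (-4.0, False)])
print(curve, area_under_curve(curve))
```

A curve that rises steeply at the left (area close to 1) corresponds to ligand enrichment; an area well below 0.5 would mean the algorithm prefers decoys.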

Answer 27

The sum of all of the energies (electrostatic interactions). Low energy favors docking.

Answer 28

Decoys are molecules that are very similar to the ligands but they are not active. We allow the algorithm to dock these and learn that they get low docking scores and should not be docked in the future.

Answer 29

Recognition of the active site of a protein by a ligand is key to protein function and drug design. - Negative free energy (ΔG = ΔH - TΔS < 0) - Shape complementarity
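The free-energy condition is simple arithmetic. A small sketch, assuming enthalpy in kJ/mol, entropy in kJ/(mol K) and temperature in kelvin; the numbers in the example are invented:

```python
def binding_free_energy(delta_h, delta_s, temperature=298.15):
    """Gibbs free energy change: delta_G = delta_H - T * delta_S."""
    return delta_h - temperature * delta_s


def is_favorable(delta_g):
    """Binding is energetically favorable when delta_G is negative."""
    return delta_g < 0


# Invented values: favorable enthalpy, unfavorable entropy.
dg = binding_free_energy(delta_h=-40.0, delta_s=-0.05, temperature=300.0)
print(dg, is_favorable(dg))
```

Here the enthalpy gain (-40 kJ/mol) outweighs the entropic penalty (-TΔS = +15 kJ/mol), so the net ΔG is negative.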

Answer 30

If the free energy from the interactions between ligand and protein is negative it means that it is energetically favorable for them to form a complex. This means that we can predict which ligands will interact with proteins and improve their binding by altering the chemical structures of the ligand.

Answer 31

Molecular docking is computational structure-based drug discovery. We want to rapidly predict protein-ligand complexes by looking at the protein structure. This is more time and cost efficient than high-throughput screening for a ligand, where the hit rates are also very low with many false positives.

Answer 32

1. Sampling: Generation of orientations of molecule in the receptor. 2. Scoring: Determine which orientation has the lowest energy. Scoring is more difficult than sampling.

Answer 33

By making approximations like: - Freeze the protein to only look at the crystal structure of the binding pocket without any flexibility. - Do not include water molecules explicitly. Important water molecules can be treated as part of the protein. - Constrain search to only one binding site on the protein (If you know where you want your ligand to bind). - Only treat the ligand as flexible. Sample different orientations in the binding site.

Answer 34

- Force field based: Score is calculated from molecular mechanics force fields. - Empirical: Empirical scoring functions assign a score to a ligand pose based on empirical rules and parameters derived from experimental data. We can look at, for example, hydrogen bonds and hydrophobic interactions to see if they are consistent with experimental binding affinities. - Knowledge-based: Collect statistics on which interactions are favorable and score an interaction based on its probability.

Answer 35

The force field based function calculates the energy of the bond with Coulomb's law. The empirical function looks at the distance and angle of the bond and scores based on how these parameters look in experimental data. If the atoms are within 2.8 Å and have a good angle then the score is positive. The knowledge-based method looks at the probability of this bond occurring based on statistics from PDB. High probability gives a high score.
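The first two approaches can be contrasted in a short sketch. Assumptions: charges are partial charges in units of e, distances are in Å, the constant 332.06 converts to kcal/mol, and the 120 degree angle cutoff is a hypothetical stand-in for "a good angle" (only the 2.8 Å distance comes from the card):

```python
COULOMB_KCAL = 332.06  # kcal * Angstrom / (mol * e^2)


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Force-field style: electrostatic energy between two point charges."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance)


def empirical_hbond_score(distance, angle_deg, max_distance=2.8, min_angle=120.0):
    """Empirical style: reward a hydrogen bond with acceptable geometry."""
    return 1.0 if distance <= max_distance and angle_deg >= min_angle else 0.0


# Hypothetical donor/acceptor partial charges of +0.4 and -0.4.
print(coulomb_energy(0.4, -0.4, 2.0))
print(empirical_hbond_score(2.7, 160.0))
```

Opposite charges give a negative (favorable) Coulomb energy that weakens with distance, while the empirical term only checks whether the geometry falls inside the accepted window.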

Answer 36

An empirical scoring function is a way to see how well an orientation of a ligand forms a complex with the protein in terms of binding affinity. The function looks at the energy change that happens when the ligand binds to the protein in the following terms and scores based on how they look in experimental data: - Solvation: changes in interaction with the solvent. - Conformation. - Interaction: energy that comes from interactions based on distance/angle etc. - Rotatable bonds: reduction of conformational freedom of the ligand’s rotatable bonds. - Trans/rot: loss of translational and rotational freedom for the ligand. - Vibrations: changes in bond and angle vibrations (often ignored).
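In practice such a function is often a weighted sum of the individual terms, with the weights fitted to experimental affinities. A minimal sketch; the term names follow the list above, while the weights and term values are hypothetical and not fitted to anything:

```python
# Hypothetical weights; a real function would fit these to measured affinities.
HYPOTHETICAL_WEIGHTS = {
    "solvation": 0.5,
    "interaction": 1.0,
    "rotatable_bonds": 0.3,
    "trans_rot": 1.0,
}


def empirical_score(terms, weights):
    """Weighted sum of per-term energy contributions for one ligand pose."""
    return sum(weights[name] * value for name, value in terms.items())


# Invented term values for one pose: negative favors binding, positive opposes it.
pose_terms = {"solvation": -2.0, "interaction": -6.0, "rotatable_bonds": 3.0, "trans_rot": 1.5}
print(empirical_score(pose_terms, HYPOTHETICAL_WEIGHTS))
```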

Answer 37

Redocking is a way to see if a molecular docking algorithm works and finds what it should find. For this we collect experimentally determined complexes and take them apart to see if the algorithm will predict the complex. To measure success we look at the similarity between the experimental ligand pose and the one that the algorithm decides on (RMSD). The algorithms can find poses very similar to the experimental one (low RMSD) but not the exact one.
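A minimal sketch of the RMSD comparison, assuming both poses list the same ligand atoms in the same order and share the receptor's coordinate frame, so no superposition is done. The function name and coordinates are illustrative:

```python
import math


def pose_rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


# Made-up two-atom ligand: the docked pose is shifted 1 A along x.
experimental = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
docked = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]
print(pose_rmsd(experimental, docked))
```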

Answer 38

A way to test if a molecular docking algorithm works by seeing if it can find ligands among decoys. Performance is based on target and software.

Answer 39

A way to test the molecular docking algorithm to see if it can predict the complex with highest binding affinity out of multiple closely related ligands with known affinity to the protein. This prediction is generally very weak.

Answer 40

If you do not have a ligand in mind then you can do a virtual screen for suitable structures from chemical libraries, screen them against the protein and choose the ones that seem to fit. Then run these through a docking algorithm. This is much more time and cost efficient than high-throughput screening since it is computational.

Answer 41

We make approximations like: - We do not look at the explicit water molecules around the protein. - We only treat the ligand as flexible, not the binding pocket of the protein. - We only look at one active site when sampling for compounds. We make these approximations to keep the algorithm fast, since it is supposed to be able to screen through a large number of possible compounds.

Answer 42

Ligand enrichment - can the algorithm find active ligands out of closely related non-active decoys? Redocking - can the algorithm predict complexes similar to experimental complexes that we have taken apart? Relative affinity prediction - can the algorithm find the complex with the highest binding affinity out of multiple closely related ligands?

Answer 43

GOR and Chou-Fasman are purely sequence based, while the PHD method incorporates evolutionary information from multiple sequence alignments.

Answer 44

Top left: beta sheet. Lower left: right-handed helix. To the right: left-handed helix. If we only have two outliers then they could be the termini of the peptide, because these have a lot of variation and could for example be loops. If we have many outliers then that would instead indicate that the structure is not optimal.
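The regions can be sketched as rough rectangles in (phi, psi) space. The boundaries below are hypothetical approximations for illustration only; validation servers use empirically derived contours instead of boxes:

```python
def ramachandran_region(phi, psi):
    """Classify backbone torsion angles (degrees) into coarse, hypothetical regions."""
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta sheet"  # top left of the plot
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "right-handed helix"  # lower left
    if 20 <= phi <= 120 and -30 <= psi <= 90:
        return "left-handed helix"  # right side
    return "outlier"


def count_outliers(angles):
    """Number of (phi, psi) pairs falling outside all three regions."""
    return sum(1 for phi, psi in angles if ramachandran_region(phi, psi) == "outlier")


# Made-up torsion angles for four residues.
residues = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0), (60.0, -150.0)]
print([ramachandran_region(phi, psi) for phi, psi in residues], count_outliers(residues))
```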

Answer 45

X-ray crystallography: high resolution, but proteins must form crystals and we may not see the density map for the entire protein. NMR: good for small soluble proteins. 3D microscopy: good for larger complexes.

Answer 46

No, because disordered peptides usually do not have crystal structures.

(70 cards)