Protein structure prediction Flashcards

1
Q

What is the protein folding problem?

A

- over 200 million sequences but only 110k structures
- working hypothesis: the sequence of a protein in a given environment determines its structure
- aim: to develop a theoretical approach to predict structure from sequence

3
Q

Why predict protein structure?

A

- the sequence-structure gap
- structure can inform about function
- to guide rational drug design
- to guide mutagenesis studies
- to help solve structures from experimental data
- focuses on fundamental understanding of the chemistry of protein structure

4
Q

How do you calculate the similarity of two proteins?

A

- superpose the structures (often just the main chain) and quantify the average separation between equivalent positions
- quantified as the root mean square deviation (RMSD) of equivalent positions
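The RMSD calculation can be sketched as follows. This is a minimal illustration that assumes the two structures have already been superposed and that the residue equivalences are known; the function name and the toy coordinates are made up for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are ALREADY superposed and that
    coords_a[i] and coords_b[i] are equivalent positions.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy C-alpha positions in angstroms (invented numbers).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(round(rmsd(model, target), 3))  # 1.732
```

A single residue displaced by 3 Å dominates the result here, which is one reason RMSD is mainly useful for close structures.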

5
Q

How do you quantify the accuracy of predicted models?

A
  • superpose the predicted and X-ray structures
  • RMSD is used for close structures
  • typically reported in the form "70 out of 90 superposed residues have an RMSD of 2.6 Å"
  • involves arbitrary decisions, such as the choice of maximum distance between equivalent residues
6
Q

What is the TM score?

A
  • the template modelling (TM) score removes the arbitrary choices needed for RMSD, such as a distance cutoff
  • a score between 0 and 1 that includes all equivalences and is scaled for the number of residues in the protein
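As a rough sketch, the TM-score for one fixed superposition can be computed from the distances between equivalent residues using the standard length-dependent scale d0. The function names are illustrative, and the real TM-score additionally searches for the superposition that maximises this value, which is not done here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # Assumption of this sketch: refuse very short chains rather than clamp d0.
        raise ValueError("target too short for the d0 formula")
    return d0

def tm_score(distances, target_length):
    """TM-score for a FIXED superposition.

    distances: separation (angstroms) of each equivalent residue pair.
    target_length: number of residues in the target protein, so
    unaligned residues lower the score.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# A perfect model scores 1; every pair at distance d0 scores 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([tm_d0(100)] * 100, 100))
```

Because each residue pair contributes at most 1/L, a few badly placed residues cannot dominate the score in the way they dominate RMSD.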
7
Q

What TM score do you need to say that the fold of your protein is good?

A

- TM > 0.5 means the overall fold of the protein is good
- TM > 0.75 means a good predicted structure

8
Q

How do we know that predictions work?

A

- evaluate on known structures
- but if you know the answer you have an advantage, even if you pretend that you don't

9
Q

What is CASP?

A

- Critical Assessment of protein Structure Prediction
- a blind trial is required to evaluate the different approaches
- sequences are sent to predictors before the experimental coordinates are revealed
- held every two years, with manual evaluation of results
- predictions with manual intervention and server-only predictions are both assessed, which lets the community know which servers are good

10
Q

What are ab initio energy calculations?

A
  • original idea: describe the interactions between atoms and search for the conformation of lowest energy
    - methods are energy minimisation and molecular dynamics
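Energy minimisation can be illustrated on a toy system: two atoms interacting through a Lennard-Jones potential, with the separation adjusted by steepest descent until the energy stops falling. The potential, step size and function names are assumptions chosen for the sketch; real protein force fields have many thousands of coupled terms.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lennard_jones_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimise(r, step=0.01, iterations=2000):
    """Steepest descent: repeatedly move downhill along the gradient."""
    for _ in range(iterations):
        r -= step * lennard_jones_gradient(r)
    return r

r_min = minimise(1.5)
print(round(r_min, 4), round(lennard_jones(r_min), 4))
```

The separation converges to the analytical minimum at 2^(1/6) sigma. Minimisation only finds the nearest local minimum, which is why a search over conformations is also needed.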
11
Q

What is the potential energy of a protein in a particular conformation?

A

Bond length + bond angle + bond dihedral rotation + van der Waals contacts + electrostatic interactions
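One common way to write this sum, in the style of classical molecular mechanics force fields, is shown below. The functional forms (harmonic bonds and angles, a cosine dihedral term, a 12-6 Lennard-Jones term and Coulomb's law) and all symbols are assumptions of this sketch rather than something stated on the card.

```latex
E = \sum_{\text{bonds}} k_b (b - b_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} k_\phi \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \sum_{i<j} \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_r r_{ij}}
```

Here b, θ and φ are bond lengths, bond angles and dihedral angles, r_ij is the distance between non-bonded atoms i and j, and q_i are partial charges.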

12
Q

Secondary structure predictions

A
  • aim to identify local secondary structures
  • theory is that to a large extent local sequence determines local structure
  • current methods use multiply aligned sequences to provide extra information
14
Q

What information about the structure can you get from the sequence?

A
15
Q

What is the current state of secondary structure prediction?

A
  • nearly every helix is identified
  • most beta strands are identified, but short edge strands are still poorly predicted
  • errors tend to be in defining the precise ends
  • programs such as PSIPRED
16
Q

What are the three major approaches to protein structure prediction?

A
  • template based - reliable: protein fold space is limited; < 50% of a typical proteome covered
  • template free - sometimes reliable: deep learning with multiply aligned sequences can sometimes, but not always, give good results
  • hybrids - deep learning with templates produces excellent models, e.g. AlphaFold
17
Q

How does template based modelling work?

A
  • magenta protein (query): structure unknown
  • cyan protein (template): structure known
  • a sequence search finds that the magenta sequence is similar to the cyan sequence
  • predict the structure of the magenta protein from the structure of the cyan protein
18
Q

Describe how Phyre2 works

A
19
Q

How do you do loop modelling?

A

- fragment the PDB
- find fragments with sequences similar to the insertion or deletion
- check end-point distances
- check backbone geometry
- fit the fragment to the core structure
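The end-point distance check in the steps above can be sketched as a simple filter: a candidate fragment is only worth fitting if the distance between its two ends roughly matches the gap between the anchor residues in the core structure. The function names, the 1 Å tolerance and the toy coordinates are all hypothetical.

```python
import math

def endpoint_distance(fragment):
    """Distance between the first and last point of a fragment."""
    return math.dist(fragment[0], fragment[-1])

def candidates_fitting_gap(fragments, anchor_start, anchor_end, tolerance=1.0):
    """Names of fragments whose end-to-end distance matches the gap.

    fragments: dict mapping a fragment name to a list of (x, y, z) points.
    tolerance: allowed mismatch in angstroms (an arbitrary choice here).
    """
    gap = math.dist(anchor_start, anchor_end)
    return [name for name, coords in fragments.items()
            if abs(endpoint_distance(coords) - gap) <= tolerance]

# Toy fragment library (invented coordinates, angstroms).
library = {
    "a": [(0.0, 0.0, 0.0), (3.0, 2.0, 0.0), (9.5, 0.0, 0.0)],
    "b": [(0.0, 0.0, 0.0), (3.0, 2.0, 0.0), (5.0, 0.0, 0.0)],
}
print(candidates_fitting_gap(library, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)))  # ['a']
```

Fragments that pass this filter would still need the backbone geometry check before being fitted to the core.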

20
Q

Loop modelling accuracy

A

- insertions and deletions relative to the template are modelled from a loop library, up to 15 amino acids in length
- short loops (under 5 residues) are good; longer loops are less trustworthy
- be wary of basing any interpretation on the structural effect of point mutations

21
Q

Side chain modelling

A

- fit the most probable rotamer at each position
- according to the given backbone angles
- whilst avoiding clashes
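A greedy version of this idea is sketched below: at each position, take the most probable rotamer that does not clash with rotamers already placed. The data layout, the clash test and the greedy strategy are assumptions made for illustration; real side-chain packers use backbone-dependent rotamer libraries and a more global optimisation.

```python
def choose_rotamers(positions, clashes):
    """Pick one rotamer per position, most probable first, avoiding clashes.

    positions: list (one entry per residue) of [(rotamer_id, probability), ...]
    clashes: function(rotamer_id_a, rotamer_id_b) -> True if the pair clashes
    """
    chosen = []
    for options in positions:
        ranked = sorted(options, key=lambda option: option[1], reverse=True)
        # First non-clashing rotamer; fall back to the most probable one.
        pick = next(
            (rid for rid, _ in ranked
             if not any(clashes(rid, prev) for prev in chosen)),
            ranked[0][0],
        )
        chosen.append(pick)
    return chosen

# Toy example: rotamer B1 is more probable than B2 but clashes with A1.
clash_pairs = {("A1", "B1")}

def toy_clashes(a, b):
    return (a, b) in clash_pairs or (b, a) in clash_pairs

toy_positions = [[("A1", 0.7), ("A2", 0.3)], [("B1", 0.6), ("B2", 0.4)]]
print(choose_rotamers(toy_positions, toy_clashes))  # ['A1', 'B2']
```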

22
Q

Side chain modelling - accuracy

A

- side chains will be modelled with ~80% accuracy (chi 1) IF the backbone is correct
- clashes will sometimes occur and, if frequent, probably indicate a wrong alignment or poor template
- analyse with Phyre Investigator

23
Q

Interpreting results - sequence identity and model accuracy

A

- high confidence (> 90%) and high sequence identity (> 35%): almost always very accurate, with TM score > 0.7 and RMSD 1-3 Å
- high confidence (> 90%) and low sequence identity (< 30%): almost certainly the correct fold, accurate in the core (2-4 Å) but may show substantial deviations in loops and non-core regions

24
Q

What is the structural coverage of the human proteome?

A

53%: 36% from Phyre and 17% from the PDB

25
Q

What is another template based modelling program?

A

SWISS-MODEL

26
Q

Describe the template free approach

A
  1. Take fragments of the sequence
  2. Predict the possible structures for each fragment
  3. Trial structures for the local sequence are taken from a database of segments of known 3D structure
  4. Put the fragments together and check whether the result makes sense
  5. Make changes and check whether each change gives a better structure; then either keep it or discard it
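The keep-or-discard loop in steps 4 and 5 can be sketched as follows. The conformation is reduced to a list of numbers, the fragment library and scoring function are toy stand-ins, and a change is kept only if it improves the score; all names are hypothetical, and real fragment-assembly programs use physical or knowledge-based scores and usually also accept some unfavourable moves.

```python
import random

def assemble(length, fragment_library, score, steps=1000, seed=0):
    """Fragment assembly by repeated trial replacement.

    fragment_library: list of equal-length fragments (lists of values).
    score: function(conformation) -> float, lower is better.
    """
    rng = random.Random(seed)
    frag_len = len(fragment_library[0])
    current = [0.0] * length
    best = score(current)
    for _ in range(steps):
        start = rng.randrange(length - frag_len + 1)
        trial = list(current)
        trial[start:start + frag_len] = rng.choice(fragment_library)
        trial_score = score(trial)
        if trial_score < best:  # keep improvements, discard everything else
            current, best = trial, trial_score
    return current, best

# Toy problem: the "native" conformation is all ones.
toy_library = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]

def toy_score(conformation):
    return sum((value - 1.0) ** 2 for value in conformation)

final, final_score = assemble(9, toy_library, toy_score)
print(final_score)
```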
27
Q

Are fragment based methods reliable?

A

- fragment-based methods can sometimes give reasonable predictions but sometimes fail
- can be integrated with template methods to fill gaps or uncertain regions
- I-TASSER (Zhang) and Robetta (Baker) widely used
- now superseded by deep learning, e.g. AlphaFold

28
Q

How do you use sequence correlation in multiple sequence alignment to predict contacts?

A

- residues that interact with each other tend to evolve together (coevolution)
- so correlated changes between columns of the alignment give some information about which residues are in contact in the structure
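One simple way to measure how strongly two alignment columns vary together is their mutual information, sketched below on a toy alignment. Mutual information is used here only as an easy-to-follow stand-in; it is an assumption of this example, not the method used by modern contact predictors, which correct for indirect correlations.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 1 and 3 change together (K with E, R with D).
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
print(mutual_information(toy_msa, 1, 3))  # 1.0
print(mutual_information(toy_msa, 0, 1))  # 0.0
```

Column pairs with high scores are candidates for residues in contact; a fully conserved column carries no covariation signal at all.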

29
Q

AlphaFold approach - 1

A

- the input is a multiple sequence alignment (MSA) of the query sequence
- in addition, known PDB structures provide structural data known as “templates”
- two-track learning, called the Evoformer and the structure network
- the first stage, the Evoformer, learns features including residue-residue contacts at different distances (distograms)

30
Q

AlphaFold approach - 2

A

- the second stage of learning is the “structure” network
- each residue is an independent unit (termed a “residue gas”) and they are not linked together
- the positions of the main-chain residues are then predicted
- then the side chains are fitted
- the learning is termed “end-to-end”: the function optimised (the “loss function”) is the difference between the final model and the true structure, and all steps are learnt together
- the algorithm also predicts the expected accuracy of each part of the model (see later slides)

31
Q

AlphaFold approach - 3

A

- finally the structure is refined by restrained energy minimisation with the Amber force field; this did not improve the model in terms of RMSD but did correct some local stereochemistry
- AlphaFold does not distinguish between template-based and ab initio approaches
- AlphaFold does use information from homologous structures, but this is within the deep learning
- AlphaFold does not use the Phyre/SWISS-MODEL approach of taking a known template as the starting point

32
Q

AlphaFold database: pLDDT accuracy metric

A

- per-residue confidence metric pLDDT (colour coded on EBI models) on a scale of 0-100
- pLDDT stands for predicted Local Distance Difference Test
- LDDT measures local agreement between two protein structures
- pLDDT > 90: expected to be modelled to high accuracy
- pLDDT between 70 and 90: expected to be modelled well (a generally good backbone prediction)
- pLDDT between 50 and 70: low confidence, should be treated with caution
- pLDDT < 50: often shown with a ribbon-like appearance and should not be interpreted; often disordered regions
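The bands above map directly onto a small lookup function. The band labels and the treatment of the exact boundary values 70 and 90 are choices made for this sketch.

```python
def plddt_band(plddt):
    """Interpretation band for a per-residue pLDDT value (0-100)."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt > 90:
        return "high accuracy"
    if plddt >= 70:
        return "modelled well"
    if plddt >= 50:
        return "low confidence"
    return "do not interpret"

# Invented per-residue values for a short stretch of a model.
for value in (95.2, 81.0, 62.5, 31.7):
    print(value, plddt_band(value))
```

Counting the residues in each band gives a quick summary of how much of a model can be trusted.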

33
Q

AlphaFold database: PAE accuracy metric

A

- PAE: Predicted Aligned Error
- how well predicted the relative position (distance) of two residues is
- used to assess confidence in domain packing
- colour coded
- regions with high PAE between them can be totally misplaced relative to one another
- example: the extracellular and intracellular regions are predicted to pack together, which is biologically impossible
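One way to assess domain packing from a PAE matrix is to average the PAE values between the residues of two domains: a large value suggests that the relative placement of the domains should not be trusted, even if each domain is well modelled on its own. The matrix, the domain boundaries and the function name below are invented for illustration.

```python
def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE (angstroms) between two sets of residue indices.

    pae: square matrix (list of lists); pae[i][j] is the predicted error
         at residue j when the model is aligned on residue i.
    Both directions are averaged because the matrix need not be symmetric.
    """
    values = [pae[i][j] for i in domain_a for j in domain_b]
    values += [pae[j][i] for i in domain_a for j in domain_b]
    return sum(values) / len(values)

# Toy 4-residue model: two confident 2-residue domains, uncertain packing.
toy_pae = [
    [0.5, 1.0, 20.0, 20.0],
    [1.0, 0.5, 20.0, 20.0],
    [20.0, 20.0, 0.5, 1.0],
    [20.0, 20.0, 1.0, 0.5],
]
print(mean_interdomain_pae(toy_pae, [0, 1], [2, 3]))  # 20.0
```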

34
Q

What's the "but" with AlphaFold?

A

- models for > 200M proteins: an amazing resource!
- models for 98.5% of human proteins
- but only ~58% of residues in the human proteome are predicted with high confidence
- compare PDB + Phyre, which cover ~53% of residues

35
Q

What can you use for tertiary structure predictions?

A

CASP15