Structural bioinformatics Flashcards by Ola Żyto

Why do we need protein-protein docking?

• The PDB contains ~4,000 hetero-dimers (including duplicate copies of the same complex)
• Compare this with the total number of entries, ~170,000
• Can one predict the structure of a complex starting from unbound components?
• Unbound components can be experimental or high-quality predicted structures
• Need to be able to model limited conformational change

Describe typical protein-protein docking

Walk through protein docking step by step

Global search: find a good overlap of the protein surfaces
Residue-residue interactions: score with an empirical residue pair potential. You need to check whether a given residue pair occurs in protein structures more frequently than expected by chance
Clustering: search for clusters of similar complex geometry with low energy. There are many more ways to be wrong than correct, so the correct solution will be found much more often than any individual wrong solution
Refinement: search for the optimal combination of side-chain rotamers by energy calculation
Functional residue information: knowledge of the function can give you information about the structure
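
The "more frequently than by chance" test in the scoring step can be sketched as a log-odds calculation. This is a minimal illustration only, not the potential used by any particular docking program: the function names, the toy contact list and the choice to estimate residue frequencies from the contacts themselves are all assumptions made for the example.

```python
import math
from collections import Counter


def pair_log_odds(contacts):
    """Log-odds score for each residue pair seen in a list of contacts.

    contacts: iterable of (residue, residue) one-letter pairs, e.g. ("K", "E").
    Returns {(a, b): score} with a <= b. A score > 0 means the pair is seen
    more often than expected from residue composition alone.
    """
    pairs = Counter(tuple(sorted(p)) for p in contacts)
    total_pairs = sum(pairs.values())

    # Residue composition, counted over both partners of every contact
    # (an assumption; real potentials use a carefully chosen reference state).
    residues = Counter()
    for (a, b), n in pairs.items():
        residues[a] += n
        residues[b] += n
    total_residues = sum(residues.values())
    freq = {r: n / total_residues for r, n in residues.items()}

    scores = {}
    for (a, b), n in pairs.items():
        observed = n / total_pairs
        # An unordered pair with a != b can arise in two orders.
        expected = freq[a] * freq[b] * (1 if a == b else 2)
        scores[(a, b)] = math.log(observed / expected)
    return scores


def score_pose(pose_contacts, scores, unseen=0.0):
    """Sum the pair scores over the interface contacts of one docked pose."""
    return sum(scores.get(tuple(sorted(p)), unseen) for p in pose_contacts)


# Hypothetical toy data: 10 observed interface contacts.
toy_contacts = (
    [("L", "L")] * 4 + [("K", "E")] * 3 + [("K", "K")] * 1 + [("K", "L")] * 2
)
toy_scores = pair_log_odds(toy_contacts)
```

With this sign convention a higher score is more favourable; an energy-like potential would simply use the negative of the log-odds.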

Template based prediction from homologous structure of a complex

• Start from an X-ray structure of protein A’ in complex with protein B’
• A’ is a homologue of A; B’ is a homologue of B
• If the modelled A/B interface is evaluated as favourable in 3D, then predict that A interacts with B

Template-based modelling 1 - sequence search

• Start with the sequences of protein A and protein B
• Based on sequence similarity, search the library of complexes in the PDB for a complex A’/B’ where A is homologous to A’ and B is homologous to B’
• Align sequence A to A’ and B to B’
• Sequence search via BLAST, PSI-BLAST or a more advanced statistical model, the hidden Markov model (HMM)

Template-based modelling 2 - model construction

• On the 3D structure of the complex, change the sequence from A’ to A and from B’ to B
• Adjust any loops where there is an insertion or deletion
• Refine the complex

Template-based modelling 3 - alternate model selection

• Sometimes there are several suitable templates because several have similar sequence identity
• Construct several models
• Score the models (similar to ab initio docking)
• Choose the best model
• NB: this is one approach; there are several variations in template-based modelling

Coevolution and protein interactions

• The concept of correlated mutations extends to homo- and hetero-complexes
• AlphaFold-Multimer and Colab can consider complexes
• Active area of research; results are very encouraging

What is the gene ontology?

A controlled vocabulary that can be applied to all organisms
• Used to describe gene products (proteins and RNA) in any organism
• All descriptions are supported by some level of evidence

How does GO work?

It captures information about 3 important features of function:
• What does the gene product do?
• Why does it perform these activities?
• Where does it act?

Describe the 3 gene ontologies

• Molecular Function = biochemical function
  • The tasks performed by individual gene products; examples are carbohydrate binding and GTPase activity
• Biological Process = biological goal or objective (higher-level function)
  • Broad biological goals, such as mitosis or purine metabolism, that are accomplished by combinations of individual molecular functions
• Cellular Component = active location
  • Subcellular structures, locations and macromolecular complexes; examples include nucleus, telomere and RNA polymerase II holoenzyme

Can you compare gene ontologies?

No. Each of the three ontologies is an independent branch, so terms from different branches cannot really be compared

Where do annotations come from?

• The annotation source is important
• It enables you to assess how reliable the annotation is
• GO associates each annotation with an evidence code that indicates its source

Uses of GO

• Enhanced predictors of protein function return predictions of GO terms
• Common features in a set of over-expressed / under-expressed genes can be reported as belonging to a common GO group
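
One common way to decide whether a GO term is shared by a gene set more than chance would predict is a one-sided hypergeometric test. The card does not name a test, so the choice of statistic is an assumption, and the function name and the example numbers below are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the GO term
    n: genes in the study set (e.g. over-expressed genes)
    k: study-set genes annotated with the GO term
    """
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / comb(N, n)


# Hypothetical example: 10 genes in total, 3 carry the term,
# and all 3 of them fall in a study set of 4 genes.
p_value = go_enrichment_pvalue(k=3, n=4, K=3, N=10)
```

In practice many GO terms are tested at once, so the resulting p-values would normally be corrected for multiple testing.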

Why do you want to computationally predict protein function?

There are about 240M sequences, but determining the function of a protein experimentally can take several years

Approaches to function prediction

General homology search
• BLAST (single sequence information)
• PSI-BLAST (multiple sequence information in a profile)
• HMM scan (multiple sequence information in a hidden Markov model)

Accuracy of database searching

Orthologues and paralogues and their EC numbers

Orthologues: same EC number and high % sequence identity (%id)
Paralogues: different EC number and lower %id

Relationship between %id and functional transfer

• To some extent these concepts about EC numbers apply to all types of function, but there are no clear-cut rules
• Function is challenging to quantify
• Start with a query sequence
• Perform a BLASTP search

What can you predict based on the %id between two proteins?

• If you find a protein from another species with >~50% identity, it could easily be an orthologue with the same function
• If you find a protein from another species with >~30% identity, it could be a paralogue with a related function
• If you find a protein from the same species with >30% identity, it could be a paralogue with related functions
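
These rules of thumb can be written down as a short sketch. The function names are hypothetical, the thresholds are the approximate ones from the card, and computing %id over aligned (non-gap) columns is one convention among several, chosen here as an assumption.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def suggest_relationship(pid, same_species):
    """Apply the approximate %id rules of thumb; a hint, not a clear-cut rule."""
    if not same_species and pid > 50:
        return "could easily be an orthologue with the same function"
    if pid > 30:
        return "could be a paralogue with a related function"
    return "no confident inference from %id alone"


# Hypothetical pairwise alignment with one gap column.
pid = percent_identity("ACDE-G", "ACDFHG")
hint = suggest_relationship(pid, same_species=False)
```

The output is only a starting hypothesis: as the cards note, function is hard to quantify and domain content can change the picture.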

Further complications - Domains

• A search program can report a confident match
• Search programs identify local similarities. Two sequences may share one region/domain of similarity but also have other domains that differ
• If not all domains are shared, the function is probably different

NetGO - using different sources (details not important)

Structural searching

• Use structural searching to identify a similar fold (e.g. DALI, or in CATH)
• A similar 3D structure may suggest a related function
• If the functional site in the match is known, examine whether similar residues are present in your newly determined protein

Can finding a similar fold tell you anything about function?

No. To find out about function you need to dig a little deeper

Convergence of serine protease active site

• Trypsin and subtilisin have quite different 3D folds
• Trypsin and subtilisin have totally different sequences
• The same three active-site residues (Ser, His, Asp) are located in very similar positions
• The order of Ser, His and Asp along the chain is different
• Infer convergence of function
• A favourable arrangement of residues to perform a function, which arose independently via evolution

Principles for non-polar proteins

• Apart from the ends, the protein chain in the membrane is in a non-polar (i.e. hydrophobic) environment
• Side-chains tend to be hydrophobic
• Main-chain NH and CO groups cannot be left without hydrogen bonds
• A single membrane-spanning region is one α-helix
• Most (but not all) membrane-bound proteins are formed from α-helical segments

Transmembrane helices

• The hydrophobic section of the membrane is about 35-50 Å wide
• The section that is not translocated (i.e. does not pass through the membrane) and abuts the membrane often contains positive charges
• Each residue in an α-helix advances the structure by 1.5 Å
• Transmembrane helices tend to be 20-30 residues long

Simple methods to identify transmembrane regions

• Early methods searched for runs of very non-polar (hydrophobic) residues along the sequence
• Use a scale of how hydrophobic each residue is
  • Hopp & Woods scale
  • Kyte & Doolittle scale
• Calculate the average over a window of several residues (typically 11)
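
The sliding-window average can be sketched as follows, using the published Kyte & Doolittle hydropathy values and the 11-residue window mentioned above. The function names, the toy sequence and the cutoff of 1.6 are assumptions for illustration; this is a sketch of the early approach, not a replacement for a modern predictor.

```python
# Kyte & Doolittle (1982) hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=11):
    """Average hydropathy for every full window along the sequence.

    Returns a list of (centre_index, average) with 0-based centre indices.
    """
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        average = sum(KYTE_DOOLITTLE[res] for res in segment) / window
        profile.append((start + half, average))
    return profile


def hydrophobic_positions(sequence, window=11, cutoff=1.6):
    """Window centres whose average hydropathy exceeds the (assumed) cutoff."""
    return [i for i, avg in hydropathy_profile(sequence, window) if avg > cutoff]


# Hypothetical toy sequence: a 20-residue leucine run flanked by charged residues.
toy_sequence = "DKDKDKDKDK" + "L" * 20 + "DKDKDKDKDK"
toy_profile = hydropathy_profile(toy_sequence)
toy_hits = hydrophobic_positions(toy_sequence)
```

A run of consecutive window centres above the cutoff marks a candidate transmembrane segment; a signal peptide can produce a similar peak near the N-terminus.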

Signal peptides

• Signal peptides are sequences at the start of proteins, ranging from 15 to 60 residues, that direct the protein to the correct cellular location
• Signal peptides are generally cleaved off
• They often have a hydrophobic region followed by a pattern typical of the cleavage site
• In predictions, transmembrane regions need to be distinguished from signal peptides

Current methods for transmembrane predictions

• Until recently, prediction methods used HMMs (hidden Markov models) based on aligned sequences
• DeepTMHMM: a deep learning approach that predicts transmembrane structures and signal peptides. The algorithm predicts how the sequence maps onto the different potential locations (e.g. inside, membrane, outside) from the N- to the C-terminus

Low complexity regions

• Regions with composition biased strongly towards a small number of amino acids
• e.g. RRRRRRR, RSSRSRSSRSR, GGSSSSDDS
• Occur in the sequences of a significant number of proteins
• Distort the statistical significance scores of alignments
• The program SEG (by Wootton) is often used
• Low complexity regions are replaced with lower case in BLAST searches at NCBI
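
The idea of flagging compositionally biased windows can be illustrated with Shannon entropy. This is a simplified stand-in and not the SEG algorithm itself; the function names, the 12-residue window and the 2.2-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(sequence, window=12, threshold=2.2):
    """Lower-case every residue covered by a window with entropy <= threshold."""
    masked = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(
        res.lower() if flag else res for res, flag in zip(sequence, masked)
    )


# Hypothetical toy sequence with a 12-residue arginine run in the middle.
toy_sequence = "ACDEFGHIKLMNPQ" + "R" * 12 + "STVWYACDEFGH"
toy_masked = mask_low_complexity(toy_sequence)
```

Because windows overlap, a few residues flanking the biased run are masked as well; SEG refines the boundaries of such segments, which this sketch does not attempt.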

Coiled-coils

• Two or three intertwined α-helices
• Can be a short segment (20 residues) or far longer
• Identified using COILS, written by Lupas

Disordered proteins

• Some globular proteins have small regions that are disordered and cannot be identified by crystallography or NMR, often the N- or C-terminus or a long loop
• Some proteins do not adopt a single structure but are highly flexible
• Often the protein becomes structured when it binds to another protein

Prediction of disordered regions

• Based on principles: disordered regions tend to have a lower fraction of hydrophobic residues than a folded protein with a hydrophobic core
• Machine-learning based: most programs are now based on neural networks or support vector machines
  • PONDR-FIT (Dunker)
  • DISOPRED2 (Jones, UCL)
  • IUPred
  • (see the Expasy page for a list of programs)
• Use the AlphaFold prediction, where low confidence is given by pLDDT < 70 (see Piovesan et al., Protein Science, 31(11), e4466)
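
The pLDDT criterion can be sketched as a scan for runs of low-confidence residues. The function name, the `min_length` parameter and the example scores are hypothetical; the input is assumed to be a plain list of per-residue pLDDT values (0-100) that has already been extracted from a prediction.

```python
def low_confidence_regions(plddt, cutoff=70.0, min_length=1):
    """Return (start, end) residue ranges, 1-based and inclusive, with pLDDT < cutoff."""
    regions = []
    start = None
    for i, score in enumerate(plddt):
        if score < cutoff:
            if start is None:
                start = i  # a low-confidence run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(plddt) - start >= min_length:
        regions.append((start + 1, len(plddt)))
    return regions


# Hypothetical per-residue scores for a 9-residue protein.
toy_plddt = [40, 50, 90, 95, 92, 60, 65, 55, 91]
toy_regions = low_confidence_regions(toy_plddt)
toy_long_regions = low_confidence_regions(toy_plddt, min_length=3)
```

Low pLDDT is a proxy for disorder rather than a direct measurement, so the regions are candidates to check against a dedicated disorder predictor.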

Structural bioinformatics Flashcards

(34 cards)