Protein Structures Flashcards
Levinthal’s paradox
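Levinthal’s paradox is a thought experiment in the theory of protein folding. In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. If the chain had to find its native structure by sequentially sampling all of them, folding would take longer than the age of the universe. Yet proteins fold spontaneously, often on a millisecond or even microsecond time scale.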
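The counting behind the paradox can be reproduced with a little arithmetic. This sketch follows a commonly quoted version of the estimate: a 100-residue chain has 99 peptide bonds, hence 198 backbone (phi/psi) angles, each assumed to take one of 3 stable values. The sampling rate of 10^13 conformations per second is an illustrative assumption, not a measured figure, and the helper name is hypothetical.

```python
import math

# Age of the universe (about 13.8 billion years), roughly 4.4e17 seconds.
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

def levinthal_estimate(n_residues=100, states_per_angle=3, samples_per_second=1e13):
    """Illustrative (hypothetical) helper: count backbone conformations and the
    time needed to visit them one by one. samples_per_second is an assumption."""
    n_bonds = n_residues - 1                       # peptide bonds in the chain
    n_angles = 2 * n_bonds                         # one phi and one psi angle per bond
    conformations = states_per_angle ** n_angles   # exact Python integer
    seconds = conformations / samples_per_second
    return n_angles, conformations, seconds

n_angles, conformations, seconds = levinthal_estimate()
print(n_angles, f"about 10^{math.log10(conformations):.1f} conformations,",
      f"{seconds / AGE_OF_UNIVERSE_S:.1e} times the age of the universe")
```

Changing the assumed rate by a few orders of magnitude does not change the conclusion, because the conformation count is close to 10^94.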
Chameleon Sequences
One Sequence with More than One Fold
Some amino-acid sequences can assume different secondary structures in different structural contexts
The concept that the secondary structure of a protein is essentially determined locally by the amino-acid sequence is at the heart of most methods of secondary structure prediction; it also underlies some of the computational approaches to predicting tertiary structure directly from sequence. Although this concept appears to be valid for many sequences, as the database of protein structures has grown, a number of exceptions have been found. Some stretches of sequence up to seven residues in length have been identified that adopt an alpha-helical conformation in the context of one protein fold but form a beta strand when embedded in the sequence of a protein with a different overall fold. These sequences have been dubbed chameleon sequences for their tendency to change their appearance with their surroundings.
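Spotting chameleon candidates amounts to looking for identical stretches of sequence whose observed secondary structure differs between two proteins. This is a minimal sketch, assuming per-residue secondary-structure strings with DSSP-style letters (H for helix, E for strand, anything else for other states) and a window length k; the function name and the two example sequences are made up for illustration.

```python
def chameleon_candidates(seq_a, ss_a, seq_b, ss_b, k=7):
    """Hypothetical helper: return identical k-residue stretches that are
    entirely helical (H) in one protein and entirely strand (E) in the other."""
    def uniform_windows(seq, ss):
        # map each k-mer to the set of uniform states (H or E) it was seen in
        found = {}
        for i in range(len(seq) - k + 1):
            states = set(ss[i:i + k])
            if len(states) == 1 and states <= {"H", "E"}:
                found.setdefault(seq[i:i + k], set()).update(states)
        return found

    win_a, win_b = uniform_windows(seq_a, ss_a), uniform_windows(seq_b, ss_b)
    hits = []
    for kmer, states_a in win_a.items():
        states_b = win_b.get(kmer, set())
        if ("H" in states_a and "E" in states_b) or ("E" in states_a and "H" in states_b):
            hits.append(kmer)
    return sorted(hits)

# Made-up example: the same 7-mer is helical in protein A and a strand in protein B.
print(chameleon_candidates("GGKVTVEVKGG", "CCHHHHHHHCC",
                           "PPKVTVEVKPP", "CCEEEEEEECC"))  # prints ['KVTVEVK']
```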
Conformational Switches
Protein conformational switches alter their shape upon receiving an input signal, such as ligand binding, chemical modification, or change in environment.
Secondary Structure Elements
alpha helix, amphipathic alpha helix, beta sheet
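An amphipathic helix has its hydrophobic side chains clustered on one face. Because an alpha helix advances about 100° per residue (3.6 residues per turn), this can be quantified on a helical wheel as a hydrophobic moment: the length of the vector sum of per-residue hydrophobicities. The sketch below deliberately uses a simplified binary scale (+1 for residues in the hypothetical set HYDROPHOBIC, -1 otherwise); published analyses use graded hydrophobicity scales.

```python
import math

# Simplified binary hydrophobicity (an assumption for illustration):
# +1 for these residues, -1 for everything else.
HYDROPHOBIC = set("AVILMFWC")

def hydrophobic_moment(seq, degrees_per_residue=100.0):
    """Length of the vector sum of per-residue hydrophobicities placed on a
    helical wheel, divided by the number of residues."""
    x = y = 0.0
    for i, aa in enumerate(seq):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        angle = math.radians(degrees_per_residue * i)
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y) / len(seq)
```

A uniformly hydrophobic 18-residue stretch such as "L" * 18 gives a moment of essentially zero, because its vectors cancel around the wheel, whereas a sequence with its hydrophobic residues on one face gives a clearly positive value.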
PDB - Protein Data Bank
http://www.rcsb.org or http://www.pdb.org
• central repository for biomolecular structures:
- experimental: NMR, X-ray, neutron, EM
- theoretical (separate site)
- structural “version tracking”
• fixed format(s) for representation of structural data:
- PDB format
- mmCIF format
• search engine
• some analyses
• information provided for each entry:
- Sequence
- Coordinates
- Links to relevant DBs
- Citing paper
- Taxonomy
- Chemical information
- Experimental conditions and artifacts
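The fixed-column PDB format can be read by slicing each line at standard column positions: atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, and x/y/z in 31-38, 39-46 and 47-54. This minimal sketch extracts only ATOM and HETATM records and ignores alternate locations, insertion codes and all other record types. Dedicated parsers such as Biopython's Bio.PDB module handle those cases. The names Atom and parse_atoms are hypothetical.

```python
from typing import List, NamedTuple

class Atom(NamedTuple):
    """Hypothetical container for one coordinate record."""
    record: str
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atoms(lines) -> List[Atom]:
    """Read ATOM/HETATM records from PDB-format lines by fixed column positions."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and all other records
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms
```

Because an open text file iterates line by line, the same function can be applied to a locally downloaded entry file as well as to a list of strings.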
UniProt
3 Layers of UniProt:
• the UniProt Archive (UniParc):
- UniProtKB + all other protein sequences publicly available
- completeness
• the UniProt Reference Clusters (UniRef):
- non-redundant views of UniProtKB + selected UniParc sets
- speed
• the UniProt Knowledgebase (UniProtKB)
- central database of annotated protein sequences and functional information
- UniProtKB/Swiss-Prot + UniProtKB/TrEMBL
Swiss-Prot and TrEMBL
Swiss-Prot / TrEMBL - 2017-06-14
• Swiss-Prot (554,860)
- Manually annotated and reviewed.
- Records with information extracted from literature and curator-evaluated computational analysis.
• TrEMBL (87,291,332)
- Automatically annotated and not reviewed.
- Records that await full manual annotation.
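As a quick worked check on the two counts above (same release), the manually reviewed share of UniProtKB was well under one percent, fewer than 1 entry in 150:

```latex
\frac{554\,860}{554\,860 + 87\,291\,332} = \frac{554\,860}{87\,846\,192} \approx 0.0063 \approx 0.63\,\%
```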
Predicting Secondary Structures
Chou-Fasman method: The Chou-Fasman method is an empirical technique for the prediction of secondary structures in proteins, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman. The method is based on analyses of the relative frequencies of each amino acid in alpha helices, beta sheets, and turns in known protein structures solved with X-ray crystallography. From these frequencies a set of probability parameters was derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand, or a turn in a protein. The method is at most about 50-60% accurate in identifying correct secondary structures, which is significantly less accurate than modern machine-learning-based techniques. (Wikipedia)
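Deriving parameters from observed frequencies can be sketched as follows. For each residue type and state, the propensity is the fraction of that residue type found in the state divided by the fraction of all residues found in it, so values above 1 mean the residue favours the state. The prediction step here is a simplified stand-in (a sliding-window average with an assumed window of 5 and threshold of 1.0). The published method adds nucleation and extension rules that are not reproduced, and the training data is a toy example. The function names are hypothetical.

```python
from collections import Counter

def propensities(training, states="HEC"):
    """Chou-Fasman-style conformational parameters from labelled examples.

    training: iterable of (sequence, ss_string) pairs of equal length; each
    ss letter must be one of `states` (H helix, E strand, C other).
    Returns {state: {amino_acid: propensity}}.
    """
    aa_total, state_total = Counter(), Counter()
    aa_in_state = {s: Counter() for s in states}
    for seq, ss in training:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings differ in length")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            state_total[s] += 1
            aa_in_state[s][aa] += 1
    n = sum(aa_total.values())
    # propensity = (fraction of this residue type in state s) / (fraction of all residues in s)
    return {s: {aa: (aa_in_state[s][aa] / aa_total[aa]) / (state_total[s] / n)
                for aa in aa_total}
            for s in states if state_total[s]}

def predict(seq, table, window=5, threshold=1.0):
    """Simplified prediction: label each residue H or E if that state's average
    propensity over a centred window is highest and above threshold, else C."""
    half = window // 2
    labels = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        best, best_score = "C", threshold
        for s in ("H", "E"):
            # residues unseen in training count as neutral (1.0)
            scores = [table.get(s, {}).get(aa, 1.0) for aa in segment]
            avg = sum(scores) / len(scores)
            if avg > best_score:
                best, best_score = s, avg
        labels.append(best)
    return "".join(labels)

toy = [("AAAAGGGG", "HHHHCCCC"), ("VVVVGGGG", "EEEECCCC")]
table = propensities(toy)
print(table["H"]["A"], predict("AAAAAAA", table))
```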
Structure Similarity
• when are two structures similar?
• given two protein structures, what is their largest common substructure? The structures of the bacteriochlorophyll-A protein (4bcl) and the transmembrane part of porin (2omf) would be appropriate for this question.
• which atoms in a protein structure A correspond to which atoms in protein structure B? The myoglobin and leghemoglobin structures would be appropriate structures for this question.
Structural Alignment
Structural Alignment: Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules.
3D Matching
• collection of (possibly typed) atoms or groups of atoms (“points”) in some given relative 3D placement.
• the placement of a group of atoms is defined by the position of a reference point (e.g., the center of an atom) and the orientation of a reference direction.
• the type can be the atom ID, the amino-acid ID, etc…
Two structures A and B match if:
1. Correspondence:
There is a one-to-one map between their points.
2. Alignment:
There exists a rigid-body transform T such that the RMSD between the points in A and those in T(B) is less than some threshold ε.
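Written out as a formula, the alignment condition reads as follows. The notation is introduced here for illustration: T(b) = Rb + t for a rotation R and a translation t, A and B each have N points, and σ is the one-to-one correspondence that pairs point a_i of A with point b_σ(i) of B.

```latex
\exists\, R \in SO(3),\ \mathbf{t} \in \mathbb{R}^3 :\quad
\mathrm{RMSD}\bigl(A, T(B)\bigr)
  = \sqrt{\frac{1}{N} \sum_{i=1}^{N}
      \bigl\lVert \mathbf{a}_i - \bigl(R\,\mathbf{b}_{\sigma(i)} + \mathbf{t}\bigr) \bigr\rVert^2}
  < \varepsilon
```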
Root Mean Square Deviation (RMSD)
RMSD: In bioinformatics, the root-mean-square deviation of atomic positions (or simply root-mean-square deviation, RMSD) is the measure of the average distance between the atoms (usually the backbone atoms) of superimposed proteins. Note that RMSD calculation can also be applied to other, non-protein molecules, such as small organic molecules. In the study of globular protein conformations, one customarily measures the similarity in 3D structure by the RMSD of the Cα atomic coordinates after optimal rigid-body superposition.
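For two coordinate sets that are already superimposed and listed in corresponding order (for example, matching Cα atoms), the RMSD reduces to a few NumPy operations. This sketch assumes N×3 arrays and leaves out the superposition step.

```python
import numpy as np

def rmsd(P, Q):
    """RMSD between two (N, 3) arrays whose rows are corresponding atoms.
    No superposition is performed here."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of corresponding atoms")
    # squared distance per atom pair, averaged over atoms, then square root
    return float(np.sqrt(np.mean(np.sum((P - Q) ** 2, axis=1))))
```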
Alignment operations
Translation and rotation
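The translation and rotation that minimise the RMSD have a closed-form solution. This sketch uses the Kabsch algorithm, a technique named here rather than in the notes. It translates both structures so their centroids sit at the origin, then takes the optimal rotation from a singular value decomposition of their covariance matrix, with a sign correction that rules out reflections. It assumes the atom correspondence is already known (rows of P and Q match). structures_match is a hypothetical helper applying the threshold ε from the 3D-matching definition.

```python
import numpy as np

def superpose(P, Q):
    """Kabsch superposition of Q onto P (rows are corresponding atoms).

    Returns (R, t, rmsd) such that Q @ R.T + t is the best rigid-body fit to P.
    """
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - cP, Q - cQ                  # translation: centroids to the origin
    H = Q0.T @ P0                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation, det(R) = +1
    t = cP - R @ cQ
    fitted = Q @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((P - fitted) ** 2, axis=1))))
    return R, t, rmsd

def structures_match(P, Q, eps):
    """Hypothetical helper: True if the optimal superposition gives RMSD < eps."""
    return superpose(P, Q)[2] < eps
```

The centroid step handles the translation exactly, so the SVD only has to solve for the rotation.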
Double Dynamic Programming
SSAP (Sequential Structure Alignment Program), used for the CATH database
• lower level
- fix coordinate frame on the backbone of one residue
- align residue environments
• upper level
- accumulates the scores of residue-environment similarities
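Here is a compact sketch of the two-level idea. It makes two simplifications that belong to this example, not to SSAP's actual procedure. Each residue's environment is described by its distances to all other residues, which are rotation-invariant, so no local coordinate frame is needed. Each upper-level cell simply holds the best lower-level score for that residue pair. The scoring constants a and b, the gap penalty and the function names are hypothetical. The cost grows as O(n²m²), so it only suits small inputs.

```python
import numpy as np

def global_dp(score, gap):
    """Global alignment maximising the summed scores of aligned pairs minus a
    linear gap penalty. Returns (best score, list of aligned index pairs)."""
    n, m = score.shape
    F = np.zeros((n + 1, m + 1))
    F[1:, 0] = -gap * np.arange(1, n + 1)
    F[0, 1:] = -gap * np.arange(1, m + 1)
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            F[k, l] = max(F[k - 1, l - 1] + score[k - 1, l - 1],
                          F[k - 1, l] - gap, F[k, l - 1] - gap)
    pairs, k, l = [], n, m
    while k > 0 and l > 0:  # traceback from the bottom-right corner
        if np.isclose(F[k, l], F[k - 1, l - 1] + score[k - 1, l - 1]):
            pairs.append((k - 1, l - 1))
            k, l = k - 1, l - 1
        elif np.isclose(F[k, l], F[k - 1, l] - gap):
            k -= 1
        else:
            l -= 1
    return float(F[n, m]), pairs[::-1]

def double_dp(A, B, gap=1.0, a=50.0, b=2.0):
    """Two-level (double) dynamic programming on (N, 3) coordinate arrays."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    dA = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)  # distances within A
    dB = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=-1)  # distances within B
    summary = np.zeros((len(A), len(B)))
    for i in range(len(A)):
        for j in range(len(B)):
            # lower level: compare the view from residue i (in A) with the
            # view from residue j (in B); higher values mean more similar
            env = a / (b + np.abs(dA[i][:, None] - dB[j][None, :]))
            # the lower-level path is forced through (i, j)
            before, _ = global_dp(env[:i, :j], gap)
            after, _ = global_dp(env[i + 1:, j + 1:], gap)
            summary[i, j] = before + env[i, j] + after
    # upper level: align residues using the summary matrix
    return global_dp(summary, gap)
```

Aligning a structure with itself, for example double_dp(A, A), recovers the one-to-one diagonal alignment.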
CE algorithm
Protein structure alignment by incremental Combinatorial Extension (CE) of the optimal path.
Define an Alignment Fragment Pair (AFP) as a continuous segment of protein A (a submatrix) aligned against a continuous segment of protein B (a submatrix), without gaps. An alignment is a path of AFPs such that between every two consecutive AFPs gaps may be inserted into either A or B, but not into both. That is, for every two consecutive AFPs i and i+1 of length m:
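$p^{A}_{i+1} = p^{A}_{i} + m$ and $p^{B}_{i+1} = p^{B}_{i} + m$,
or $p^{A}_{i+1} = p^{A}_{i} + m$ and $p^{B}_{i+1} > p^{B}_{i} + m$,
or $p^{A}_{i+1} > p^{A}_{i} + m$ and $p^{B}_{i+1} = p^{B}_{i} + m$,
where $p^{A}_{i}$ is the starting position of AFP i in protein A (and $p^{B}_{i}$ likewise in protein B).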
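The three allowed cases translate directly into a check on a candidate path. In this sketch an AFP is represented, as an assumption, by the pair of its start positions (p^A, p^B), all AFPs share the fragment length m, and the function name is hypothetical.

```python
def is_valid_ce_path(afps, m):
    """Check the CE rule for consecutive AFPs (hypothetical helper).

    afps: list of (pA, pB) start positions, each AFP covering m residues
    in both proteins.
    """
    for (a0, b0), (a1, b1) in zip(afps, afps[1:]):
        next_a = a1 == a0 + m   # AFP i+1 starts right after AFP i in A
        next_b = b1 == b0 + m   # AFP i+1 starts right after AFP i in B
        if next_a and next_b:
            continue            # no residues skipped
        if next_a and b1 > b0 + m:
            continue            # residues skipped in B only
        if a1 > a0 + m and next_b:
            continue            # residues skipped in A only
        return False            # overlap, backward step, or skips in both proteins
    return True

print(is_valid_ce_path([(0, 0), (8, 8), (16, 20), (30, 28)], m=8))  # True
print(is_valid_ce_path([(0, 0), (10, 12)], m=8))                    # False
```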