Protein Structures Flashcards

1
Q

Levinthal’s paradox

A

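Levinthal’s paradox is a thought experiment in the theory of protein folding. In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. If the chain had to sample them one by one, folding would take longer than the age of the universe, yet small proteins fold within milliseconds to seconds, so folding cannot be a random search.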
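
The size of the search space can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a 100-residue chain, three stable states per backbone dihedral angle, and a sampling rate of 10^13 conformations per second; all three numbers are illustrative assumptions, not values given on this card.

```python
import math

def levinthal_estimate(n_residues=100, states_per_angle=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all)."""
    # Assumption: one phi and one psi angle per peptide bond.
    n_angles = 2 * (n_residues - 1)
    n_conformations = states_per_angle ** n_angles
    seconds = n_conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, years

n_conf, years = levinthal_estimate()
print(f"conformations: about 10^{math.log10(n_conf):.0f}")
print(f"exhaustive search: about 10^{math.log10(years):.0f} years")
```

Under these assumptions an exhaustive search takes vastly longer than the age of the universe (about 1.4 × 10^10 years).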

2
Q

Chameleon Sequences

A

One Sequence with More than One Fold
Some amino-acid sequences can assume different secondary structures in different structural contexts.

The concept that the secondary structure of a protein is essentially determined locally by the amino-acid sequence is at the heart of most methods of secondary structure prediction; it also underlies some of the computational approaches to predicting tertiary structure directly from sequence. Although this concept appears to be valid for many sequences, as the database of protein structures has grown, a number of exceptions have been found. Some stretches of sequence up to seven residues in length have been identified that adopt an alpha-helical conformation in the context of one protein fold but form a beta strand when embedded in the sequence of a protein with a different overall fold. These sequences have been dubbed chameleon sequences for their tendency to change their appearance with their surroundings.

3
Q

Conformational switches

A

Protein conformational switches alter their shape upon receiving an input signal, such as ligand binding, chemical modification, or change in environment.

4
Q

Secondary Structure Elements

A

alpha helix, amphipathic alpha helix, beta sheet

5
Q

PDB - Protein Data Bank

A

http://www.rcsb.org or http://www.pdb.org
• central repository for biomolecular structures:
- experimental: NMR, X-ray, neutron, EM
- theoretical (separate site)
- structural “version tracking”
• fixed format(s) for representation of structural data:
- PDB format
- mmCIF format
• search engine
• some analyses

  • Sequence
  • Coordinates
  • Links to relevant DBs
  • Citing paper
  • Taxonomy
  • Chemical information
  • Experimental conditions and artifacts
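
As a small illustration of the fixed-column PDB format mentioned above, the sketch below reads the coordinate fields of a single ATOM record. The column ranges follow the legacy PDB format; the example record and its coordinates are made up for illustration, and `parse_atom_record` is a hypothetical helper name, not part of any PDB toolkit.

```python
def parse_atom_record(line):
    """Parse selected fields of one ATOM/HETATM line (legacy fixed-column PDB format)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "atom_name": line[12:16].strip(), # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21],             # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }

# Made-up record, assembled field by field so the column widths are explicit.
example = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
    + "   1" + "    " + "  27.340" + "  24.430" + "   2.614"
)

atom = parse_atom_record(example)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

For real work, the mmCIF format and an established parser are usually the safer choice, since the fixed-column format cannot represent very large structures.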
6
Q

UniProt

A

3 Layers of UniProt:
• the UniProt Archive (UniParc):
- UniProtKB + all other protein sequences publicly available
- completeness

• the UniProt Reference Clusters (UniRef):
- non-redundant views of UniProtKB + selected UniParc sets
- speed

• the UniProt Knowledgebase (UniProtKB):
- central database of annotated protein sequences and functional information
- UniProtKB/Swiss-Prot + UniProtKB/TrEMBL

7
Q

Swiss-Prot and TrEMBL

A

Swiss-Prot / TrEMBL (entry counts as of 2017-06-14)
• Swiss-Prot (554,860)
- Manually annotated and reviewed.
- Records with information extracted from literature and curator-evaluated computational analysis.
• TrEMBL (87,291,332)
- Automatically annotated and not reviewed.
- Records that await full manual annotation.

8
Q

Predicting Secondary Structures

A

Chou-Fasman method: The Chou-Fasman method is an empirical technique for the prediction of secondary structures in proteins, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman. The method is based on analyses of the relative frequencies of each amino acid in alpha helices, beta sheets, and turns based on known protein structures solved with X-ray crystallography. From these frequencies a set of probability parameters was derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand, or a turn in a protein. The method is at most about 50-60% accurate in identifying correct secondary structures, which is significantly less accurate than the modern machine learning-based techniques. (Wikipedia)
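
The step from relative frequencies to per-residue parameters can be sketched as follows. The propensity used here, P(residue | state) / P(residue), is one common way to express such parameters; the toy sequence and its secondary-structure labels (H = helix, E = strand, C = other) are invented for illustration and are not the published Chou-Fasman data.

```python
from collections import Counter

def propensities(sequence, structure):
    """Propensity of residue r for state s: P(r | s) / P(r)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(structure)
    pair_counts = Counter(zip(sequence, structure))
    return {
        (res, state): (n / state_counts[state]) / (residue_counts[res] / total)
        for (res, state), n in pair_counts.items()
    }

# Toy data (made up): one residue letter and one state label per position.
toy_seq = "AAEELVVIAG"
toy_ss = "HHHHHEEECC"
p = propensities(toy_seq, toy_ss)
print(round(p[("A", "H")], 3), round(p[("V", "E")], 3))
```

A propensity above 1 means the residue occurs in that state more often than its overall frequency would suggest; a value below 1 means it is under-represented there.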

9
Q

Structure Similarity

A

• when are two structures similar?
• given two protein structures, what is their largest common substructure? The structures of bacteriochlorophyll-A (4bcl) and the transmembrane part of porin (2omf) would be appropriate for this question.
• which atoms in a protein structure A correspond to which atoms in protein structure B? The myoglobin and leghemoglobin structures would be appropriate structures for this question.

10
Q

Structural Alignment

A

Structural Alignment: Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules.

11
Q

3D Matching

A

• collection of (possibly typed) atoms or groups of atoms (“points”) in some given relative 3D placement.
• the placement of a group of atoms is defined by the position of a reference point (e.g., the center of an atom) and the orientation of a reference direction.
• the type can be the atom ID, the amino-acid ID, etc…
Two structures A and B match if:
1. Correspondence:
There is a one-to-one map between their points.
2. Alignment:
There exists a rigid-body transform T such that the RMSD between the points in A and those in T(B) is less than some threshold ε.
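
The two conditions can be checked directly once a correspondence and a candidate transform are known. The sketch below assumes the points are already paired by list position and that T is given as a 3×3 rotation matrix plus a translation vector; finding the best T (for example by a least-squares superposition) is not shown. All function names are hypothetical.

```python
import math

def transform(point, rotation, translation):
    """Apply the rigid-body transform T(p) = R p + t to one 3D point."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

def rmsd(points_a, points_b):
    """RMSD between two equally long lists of already paired points."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(points_a, points_b))
    return math.sqrt(squared / len(points_a))

def structures_match(points_a, points_b, rotation, translation, epsilon):
    """True if a one-to-one pairing exists and RMSD(A, T(B)) < epsilon."""
    if len(points_a) != len(points_b) or not points_a:
        return False
    moved_b = [transform(p, rotation, translation) for p in points_b]
    return rmsd(points_a, moved_b) < epsilon

# Toy example: A is B rotated by 90 degrees about the z axis and shifted along x.
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
shift = (5.0, 0.0, 0.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points_b = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
points_a = [transform(p, rot_z_90, shift) for p in points_b]

print(structures_match(points_a, points_b, rot_z_90, shift, epsilon=0.5))
print(structures_match(points_a, points_b, identity, (0.0, 0.0, 0.0), epsilon=0.5))
```

The first call uses the transform that generated A and therefore matches; the second leaves B in place and does not.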

12
Q

Root Mean Square Deviation (RMSD)

A

RMSD: In bioinformatics, the root-mean-square deviation of atomic positions (or simply root-mean-square deviation, RMSD) is the measure of the average distance between the atoms (usually the backbone atoms) of superimposed proteins. Note that RMSD calculation can be applied to other, non-protein molecules, such as small organic molecules. In the study of globular protein conformations, one customarily measures the similarity in 3D structure by the RMSD of the Cα atomic coordinates after optimal rigid body superposition.
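
Written out, the definition above reads as follows. The symbols are notation chosen for this card, not taken from the source: a_i and b_i are the coordinates of the N paired atoms in the two structures, R is a rotation matrix, and t is a translation vector.

```latex
\mathrm{RMSD}(A, B) \;=\; \min_{R,\,t}\; \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl\lVert a_i - (R\, b_i + t) \bigr\rVert^{2}}
```

The minimisation over R and t is the "optimal rigid body superposition"; without it, the value would depend on how the two structures happen to be placed in space.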

13
Q

Alignment operations

A

Translation and rotation

14
Q

Double Dynamic Programming

A

SSAP - Sequential Structure Alignment Program used for CATH database
• lower level
- fix coordinate frame on the backbone of one residue
- align residue environments
• upper level
- accumulates the similarity scores of residue environments

15
Q

CE algorithm

A

Protein structure alignment by incremental Combinatorial Extension (CE) of the optimal path.
Define an Alignment Fragment Pair (AFP) as a continuous segment of protein A (submatrix) aligned against a continuous segment of protein B (submatrix), without gaps. An alignment is a path of AFPs such that for every two consecutive AFPs, gaps may be inserted into either A or B, but not into both. That is, for every two consecutive AFPs $i$ and $i+1$ of length $m$:

$p^A_{i+1} = p^A_i + m$ and $p^B_{i+1} = p^B_i + m$

or
$p^A_{i+1} = p^A_i + m$ and $p^B_{i+1} > p^B_i + m$

or
$p^A_{i+1} > p^A_i + m$ and $p^B_{i+1} = p^B_i + m$
where $p^A_i$ is the starting position of AFP $i$ in protein A.
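
The three conditions above amount to a simple check on consecutive AFP start positions. The sketch below assumes every AFP has the same length m and that a path is given as a list of (start in A, start in B) pairs; `is_valid_afp_path` is a hypothetical helper, not part of the CE software.

```python
def is_valid_afp_path(starts, m):
    """Check the path conditions for consecutive AFPs of equal length m.

    starts: list of (p_a, p_b) start positions, in path order.
    """
    for (a, b), (next_a, next_b) in zip(starts, starts[1:]):
        gap_a = next_a - (a + m)  # residues of A skipped between the two AFPs
        gap_b = next_b - (b + m)  # residues of B skipped between the two AFPs
        if gap_a < 0 or gap_b < 0:
            return False  # AFPs overlap or run backwards
        if gap_a > 0 and gap_b > 0:
            return False  # gaps in both proteins at once are not allowed
    return True

print(is_valid_afp_path([(0, 0), (8, 8), (16, 20)], m=8))  # no gap, then a gap in B only
print(is_valid_afp_path([(0, 0), (10, 12)], m=8))          # gaps in both A and B
```

The first path satisfies the conditions; the second inserts gaps into both proteins between the same two AFPs and is rejected.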

16
Q

CE algorithm step by step

A
  • goal: Find a “good” local alignment for structures of proteins A and B
  • basic idea:
  1. select some initial AFP
  2. build an alignment path by incrementally adding AFPs in a way that satisfies the AFP path conditions (between consecutive AFPs, gaps may be inserted into A or into B, but not into both)
  3. repeat step (2) until the length of each protein is traversed, or until no “good” AFPs remain
17
Q

CE problems

A

• how do we choose the starting AFP?
• what are the criteria for adding AFPs to our alignment path?
• how do we know when to stop? That is, at what point do we know that there are no “good” AFPs left?
There are various heuristics that could be used to supply answers to the above questions.
To assess how good the alignment produced by CE is, i.e. its significance, we can compare it to the alignments of random pairs of structures and compute the Z-score of the corresponding RMSD values.
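
A minimal sketch of the Z-score idea, assuming the RMSD values of random structure pairs are already available as a list. The numbers below are invented, and the sample standard deviation is used as a simple choice; the statistics actually used by CE may differ.

```python
import statistics

def z_score(observed, random_values):
    """Number of standard deviations by which `observed` differs from the random mean."""
    mean = statistics.mean(random_values)
    stdev = statistics.stdev(random_values)  # sample standard deviation
    return (observed - mean) / stdev

# Made-up RMSD values (in Å) for alignments of random structure pairs.
random_rmsds = [6.0, 7.0, 8.0, 9.0, 10.0]
z = z_score(3.0, random_rmsds)
print(round(z, 2))
```

Because a lower RMSD means a better superposition, a strongly negative Z-score here indicates an alignment that is much better than those of random pairs.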

18
Q

Contact maps

A

A protein contact map represents the distance between all possible amino acid residue pairs of a three-dimensional protein structure using a binary two-dimensional matrix. For two residues i and j, the ij element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: the distance between Cα atoms with a threshold of 6-12 Å; the distance between Cβ atoms with a threshold of 6-12 Å (Cα is used for glycine); and the distance between the side-chain centers of mass.
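
The definition translates almost line by line into code. The sketch below uses the Cα-Cα definition with an 8 Å threshold, an assumed value from within the 6-12 Å range quoted above; the coordinates are made up.

```python
import math

def contact_map(ca_coords, threshold=8.0):
    """Binary matrix: entry [i][j] is 1 if residues i and j are closer than threshold (Å)."""
    n = len(ca_coords)
    return [
        [1 if math.dist(ca_coords[i], ca_coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]

# Made-up Cα positions: four residues on a straight line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

The matrix is symmetric, and its diagonal is all ones because every residue is at distance zero from itself; many tools ignore the diagonal and near-diagonal entries when analysing contacts.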

19
Q

DALI algorithm

A
  • DALI: distance-matrix alignment method
  • splits proteins into hexapeptides
  • pairwise comparison of the intramolecular distance patterns (distance-matrix submatrices) of all fragments of both structures
  • accounts for the possibility of order changes of structures
  • database FSSP (Families of Structurally Similar Proteins)
  • link http://ekhidna.biocenter.helsinki.fi/dali/
  • database, standalone program, webservice