Lecture 4: Protein Sequence and Structure Determination Flashcards
Uses of determining the amino acid sequence of a protein?
- compare with all other known sequences (including DNA) to determine whether similarities exist
- sequence comparison of the same protein in different species can yield clues about evolutionary pathways
- sequence comparison of the same protein within one species (e.g., humans) can reveal the molecular mechanisms of (genetic) disease
* amino acids conserved between very different species suggest that those residues are essential to the protein's structure or function
HCl - Determination of amino acid composition
- hydrolyze the polypeptide in HCl at 110 °C for 24 hours
- the amino acids can then be separated by ion exchange chromatography
- ninhydrin can be used to detect and quantify the separated amino acids
- acid hydrolysis converts asparagines and glutamines to aspartic acid and glutamic acid
What does ninhydrin do?
- reacts with most amino acids to form a purple-colored product, so each amino acid eluting from the column can be detected and quantified

Elution profile of amino acids
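- peak sizes are proportional to the amount of each amino acid, so the peaks can be compared as ratios
- a peak twice as large means twice as much of that amino acid compared to another
- if asparagine is present, acid hydrolysis degrades it into aspartic acid and NH3 (both peaks appear)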
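A minimal sketch of reading molar ratios off an elution profile. It assumes every amino acid gives the same detector response per mole (in practice the profile is calibrated against standards), and the peak areas below are hypothetical.

```python
# Hypothetical peak areas (arbitrary units) from an elution profile.
peak_areas = {"Asp": 4.1, "Gly": 2.0, "Lys": 1.0, "NH3": 2.1}


def molar_ratios(areas):
    """Express each peak relative to the smallest one.

    Assumes peak area is proportional to the number of moles of that
    amino acid (equal detector response for every amino acid).
    """
    smallest = min(areas.values())
    return {name: round(area / smallest) for name, area in areas.items()}


ratios = molar_ratios(peak_areas)
print(ratios)
# An NH3 peak hints that some of the Asp (or Glu) came from Asn (or Gln)
# that was deamidated during acid hydrolysis.
```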
Purpose of cleavage and reduction of polypeptides
- accuracy of amino acid sequencing generally declines as the length of the polypeptide increases
- the polypeptide must be fragmented enzymatically (proteases) or chemically to be sequenced efficiently
- if disulfide bridges are present, they must be broken (reduced) and the resulting cysteine sulfhydryl groups must be modified (e.g., alkylated) so the disulfide bonds cannot re-form [add a reductant, then modify the sulfhydryl groups]
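As an illustration of site-specific enzymatic fragmentation, here is a sketch of an in-silico digest using the trypsin rule (cut after Lys or Arg, but not when the next residue is Pro). Trypsin is chosen here only as a common example protease, the input sequence is hypothetical, and disulfide reduction is not modeled.

```python
def trypsin_digest(sequence):
    """Split a one-letter protein sequence after K or R, except before P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal piece without K/R at its end
        fragments.append(sequence[start:])
    return fragments


print(trypsin_digest("AGKLRPMSRW"))
```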
What is Edman degradation?
- sequentially removing one residue at a time from the amino end of a peptide (fragment)
Steps of Edman degradation
- phenyl isothiocyanate reacts with the uncharged terminal amino group of the peptide to form a phenylthiocarbamoyl derivative
- under mildly acidic conditions, a cyclic derivative of the terminal amino acid is liberated (leaving the rest of the peptide intact) and can be identified by chromatographic methods
- by repeating the labeling-release cycle, the amino acid sequence of a peptide can be determined
* keeps attacking the new N-terminus over and over to remove successive amino acids
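A simplified simulation of the repeated labeling-release cycle. The per-cycle efficiency of 98% is an assumed value, not a measured one; it shows why accuracy declines with peptide length, since the fraction of molecules still "in phase" after n cycles is efficiency^n.

```python
def edman_degradation(peptide, efficiency=0.98):
    """Simulate sequential removal of N-terminal residues.

    Each cycle releases the current N-terminal residue and leaves the
    shortened peptide for the next cycle. `efficiency` (assumed) is the
    fraction of molecules that react correctly in each cycle.
    """
    remaining = peptide
    in_phase = 1.0
    cycles = []
    while remaining:
        in_phase *= efficiency
        released, remaining = remaining[0], remaining[1:]
        cycles.append((len(cycles) + 1, released, in_phase))
    return cycles


for cycle, residue, fraction in edman_degradation("GAVKLIM"):
    print(f"cycle {cycle}: {residue} (in-phase fraction {fraction:.3f})")

# After 50 cycles only about a third of the molecules are still in phase.
print(f"{0.98 ** 50:.2f}")
```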

Overlapping and Edman degradation
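- cleave separate samples of the same polypeptide chain with different reagents (e.g., trypsin and cyanogen bromide) to get two different sets of segments
- sequence the segments and arrange them so the two sets overlap
- this way you can tell how to connect the segments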
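A sketch of ordering fragments by their overlaps. The two fragment sets below are hypothetical (one cut after K/R, the other after F/W), and the greedy "merge the longest suffix-prefix overlap first" strategy is a simplification chosen for illustration; with short overlaps it can be ambiguous on real data.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len=1):
    """Greedily merge overlapping fragments from two different cleavages."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # A fragment lying entirely inside another adds no ordering information.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the pieces cannot be ordered
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


set_1 = ["GAVK", "LIMR", "FASTK", "WEH"]  # hypothetical cut after K/R
set_2 = ["GAVKLIMRF", "ASTKW", "EH"]      # hypothetical cut after F/W
print(assemble(set_1 + set_2))
```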
What is mass spectrometry?
- analytical technique that measures the mass-to-charge ratio (m/z) of charged particles in the gas phase
Steps of mass spectrometry
- the molecules to be analyzed (analytes) are first ionized in a vacuum
- when the newly charged molecules are introduced into an electric and/or magnetic field, their paths through the fields are a function of their m/z ratio (mass/charge)
- the measured properties of the ionized species can be used to deduce the mass (M) of the analyte with high precision
Three essential components of mass spectrometer
- ion source
- mass analyzer
- detector
What made conversion of macromolecules into gas-phase ions possible?
conversion of macromolecular analytes such as proteins and nucleic acids into gas-phase ions (ionization) could not be achieved efficiently until the development of
–> electrospray ionization mass spectrometry (ESI MS)
–> matrix assisted laser desorption/ionization mass spectrometry (MALDI MS)
Mass-to-charge ratio of ions and molecular mass of proteins
- gas-phase macromolecules acquire a variable number of protons, and thus positive charges, from the solvent, which creates a spectrum of species with different mass-to-charge ratios
- each successive peak corresponds to a species that differs from its neighboring peak by a charge difference of 1 and a mass difference of 1 (one proton)
- the molecular mass M can be determined from any two neighboring peaks
Mass Spectrometry Equations
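compare two adjacent peaks (p1 has charge z1; p2, the neighboring peak at higher m/z, carries one fewer proton)
p1 = (M + z1) / z1
p2 = (M + z1 - 1) / (z1 - 1)
- p2 is obtained by subtracting 1 from both the charge and the added proton mass of p1
- adjacent peaks differ by 1 in number of protons and by 1 in mass
can solve for M and z1
- M = z1 (p1 - 1)
- z1 = (p2 - 1) / (p2 - p1)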
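A short sketch that applies M = z1 (p1 - 1) and z1 = (p2 - 1) / (p2 - p1) to two adjacent peaks. Like those equations, it treats each proton as adding a mass of exactly 1; rounding z1 to the nearest integer (charges are whole numbers) is an added assumption that absorbs small measurement errors. The 10,000 Da protein is hypothetical.

```python
def mass_and_charge(p1, p2):
    """Molecular mass M and charge z1 from adjacent peaks p1 < p2 (m/z values).

    p1 = (M + z1) / z1 and p2 = (M + z1 - 1) / (z1 - 1), with each proton
    counted as mass 1.
    """
    z1 = round((p2 - 1) / (p2 - p1))  # charge must be a whole number
    mass = z1 * (p1 - 1)
    return mass, z1


# Hypothetical 10,000 Da protein: the z = 10 and z = 9 peaks.
p1 = (10000 + 10) / 10  # 1001.0
p2 = (10000 + 9) / 9    # about 1112.11
print(mass_and_charge(p1, p2))
```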
ESI MS process
- the analyte solution is passed through an electrically charged nozzle into a low-pressure chamber, where the solvent evaporates, ultimately yielding the ionized analyte
How fast do different ions move in ESI MS (time-of-flight analysis)
higher m/z (more mass per charge) –> slower
lower m/z (less mass per charge) –> faster, detected first
MALDI MS process
- analyte solution is evaporated to dryness in the presence of a volatile, aromatic compound (the matrix) that can absorb light at specific wavelengths
- a laser pulse tuned to one of these wavelengths excites and vaporizes the matrix, converting some of the analyte into the gas phase
- subsequent gaseous collisions enable the intermolecular transfer of charge, ionizing the analyte
* the matrix absorbs most of the laser energy, keeping the proteins from being destroyed and making sure they end up in the gas phase
MALDI TOF analyzer
- the protein sample is ionized by a laser pulse, which also triggers a clock
- an electrical field accelerates the ions
- the ions drift through a flight tube, and the lightest ions (lowest m/z) arrive at the detector first
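TOF and mass-to-charge ratio
m_i/z_i = 2eEl (t_i/l_d)^2
(e = elementary charge, E = accelerating field strength, l = length of the acceleration region, l_d = length of the field-free drift region, t_i = flight time of ion i)
* the mass-to-charge ratio depends on flight time (m/z is proportional to t^2)
plug in the time and get out the mass/charge ratio
–> the first particles to arrive have the smallest mass/charge ratio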
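A sketch converting flight time to m/z with m/z = 2eEl (t/l_d)^2, which follows from equating the energy gained in the accelerating field (zeEl) with ½mv² and setting v = l_d/t. The instrument dimensions and field strength below are hypothetical values chosen only to give realistic microsecond-scale times.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e, in coulombs
DALTON = 1.66053906660e-27   # 1 Da in kilograms


def mz_from_flight_time(t, field, accel_len, drift_len):
    """m/z in Da per elementary charge from m/z = 2 e E l (t / l_d)^2 (SI inputs)."""
    mz_kg = 2 * E_CHARGE * field * accel_len * (t / drift_len) ** 2
    return mz_kg / DALTON


def flight_time_from_mz(mz_da, field, accel_len, drift_len):
    """Inverse relation: t = l_d * sqrt((m/z) / (2 e E l))."""
    return drift_len * math.sqrt(mz_da * DALTON / (2 * E_CHARGE * field * accel_len))


# Hypothetical instrument: 20 kV across a 1 cm source, 1 m drift tube.
FIELD, ACCEL, DRIFT = 2.0e6, 0.01, 1.0
for mz in (1000, 10000, 50000):
    t = flight_time_from_mz(mz, FIELD, ACCEL, DRIFT)
    print(f"m/z {mz}: {t * 1e6:.1f} microseconds")
```

Heavier ions (higher m/z) take longer, so the lowest-m/z ions reach the detector first.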
Tandem mass spectrometry
- alternative to edman degradation as a means of sequencing proteins
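- ionized proteins (or peptides) are analyzed by a first mass spectrometer, then broken down into smaller peptide fragments that are analyzed by a second mass spectrometer
Steps of tandem MS
ion source –> MS1 –> collision cell –> MS2 –> detector
- MS1 selects ions of one particular m/z (e.g., those arriving after a certain flight time)
- the selected ions are fragmented in the collision cell and the fragments are analyzed further in MS2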
* pretty difficult to do
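A simplified sketch of how MS2 data can yield a sequence: if the fragment ions form a "ladder" in which neighbors differ by one residue, each mass difference identifies a residue. The standard monoisotopic residue masses are used; Leu and Ile have identical masses and are both reported as L. The ladder is built from a hypothetical peptide, and real spectra (mixed ion series, missing peaks) need far more work than this.

```python
from itertools import accumulate

RESIDUE_MASS = {  # monoisotopic residue masses in Da (L also stands for I)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(masses, tol=0.01):
    """Infer residues from successive mass differences in a fragment ladder."""
    masses = sorted(masses)
    sequence = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        residue, ref = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        sequence.append(residue if abs(ref - delta) <= tol else "?")
    return "".join(sequence)


# Hypothetical ladder: cumulative residue masses of "SAMPLER", starting at 0.
demo_ladder = [0.0] + list(accumulate(RESIDUE_MASS[r] for r in "SAMPLER"))
print(read_ladder(demo_ladder))
```

Only differences matter, so a constant offset on every fragment (such as an added proton) does not change the result.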
From protein mixtures to peptide mass identification
protein mixture (separation/2D electrophoresis) –> individual proteins (fragmentation/spot excision, site specific cleavage) –>
peptide fragments (fingerprinting/ mass spec) –>
peptide mass spectrum (database search/MSfit, MASCOT, ProteinLynx) –>
ID
Unique peptide sequences
- most peptide sequences of at least 6 amino acids are unique in the proteome of an organism and map to single gene products
Effect of significant figures (decimal places) in m/z searches
no decimals could lead to MANY hits (478)
going up to 4 decimals could lead to only 2 hits
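A toy illustration of why more decimal places shrink the hit list: a candidate "matches" when it agrees with the query to within half a unit in the last reported decimal place. The candidate masses are invented for the example and are much fewer than a real database search would return.

```python
def count_hits(query, candidates, decimals):
    """Count candidates that agree with `query` at the given precision."""
    tolerance = 0.5 * 10 ** -decimals
    return sum(abs(c - query) <= tolerance for c in candidates)


# Hypothetical candidate peptide masses (Da) from a database.
candidates = [1234.5678, 1234.5681, 1234.9012, 1234.1234,
              1235.0, 1233.98, 1234.5672]
query = 1234.5678

for decimals in (0, 3, 4):
    print(decimals, "decimals:", count_hits(query, candidates, decimals), "hits")
```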
Determination of fidelity of “known” sequences
- recombinant proteins
- synthetic peptides
Detection of natural or biosynthetic mutations
- isoforms
- in vitro mutated proteins (random or site-specific)
Identification of endogenous post-translational modifications
- phosphorylation (regulatory or catalytic)
- disulfide bonds
Identification of experimental chemical modifications
- affinity or group-specific labels
X ray diffraction general steps
crystal –> diffraction pattern –> electron density map –> atomic model
(refinement)
Components in an X-ray crystallographic analysis
- an X-ray source generates a beam, which is diffracted (scattered) by a crystal
- the resulting diffraction pattern is collected on a detector
- need good, well-ordered crystals (many regularly arranged copies of the molecule) for it to work
- to obtain crystals, bring the protein solution close to its precipitation point; crystals will sometimes form
Physical principles of x ray crystallography
- electrons scatter X-rays –> the amplitude of the wave scattered by an atom is proportional to its number of electrons
- the scattered waves recombine
- the way in which the scattered waves recombine depends only on the atomic arrangement
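A one-dimensional sketch of how scattered waves recombine: each atom contributes a wave whose amplitude is its electron count (as stated above) and whose phase depends on its fractional position x, summed as F(h) = Σ n_j exp(2πi h x_j), the structure factor. Treating atoms as point scatterers in one dimension is a simplifying assumption; the atom positions are hypothetical.

```python
import cmath


def structure_factor(atoms, h):
    """Sum the waves scattered by atoms along one dimension.

    atoms: list of (electron_count, fractional_position); h: reflection index.
    Amplitude is taken as proportional to the electron count (point atoms).
    """
    return sum(n * cmath.exp(2j * cmath.pi * h * x) for n, x in atoms)


def intensity(atoms, h):
    """Observed intensity is the squared magnitude of the summed wave."""
    return abs(structure_factor(atoms, h)) ** 2


half = [(6, 0.0), (6, 0.5)]     # two carbon-like atoms half a cell apart
quarter = [(6, 0.0), (6, 0.25)]  # same atoms, different arrangement
for h in (1, 2):
    print(h, round(intensity(half, h), 6), round(intensity(quarter, h), 6))
```

The same two atoms give different intensities when moved, which is why the diffraction pattern encodes the atomic arrangement.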
Resolution and electron density
- the better (more ordered) the crystal, the better the resolution of the image
NMR
- carried out on macromolecules in solution (X-ray crystallography is limited to molecules that can be crystallized)
- illuminates the dynamic side of protein structure, including conformational changes, protein folding and interactions with other molecules
- depends on the fact that certain atomic nuclei are intrinsically magnetic due to a nuclear spin angular momentum, which can take either of two orientations, or spin states, called α and β when a magnetic field is applied
spin states
- the energies of the two orientations of a spin-1/2 nucleus (such as 1H) depend on the strength of the applied magnetic field
- absorption of electromagnetic radiation of the appropriate frequency induces a transition from the lower to the upper level (this is resonance)
- change of energy of one H will be transferred only to others in close proximity
* see chemical shifts of Hs on the neighboring C
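A worked sketch of how the energy gap between the two spin states scales with field strength: for 1H the resonance frequency is ν = (γ/2π)·B0 with γ/2π ≈ 42.577 MHz/T, and ΔE = hν. The field strengths in the loop are just common example values.

```python
PLANCK = 6.62607015e-34      # Planck constant h, in J*s
GAMMA_1H_MHZ_PER_T = 42.577  # gyromagnetic ratio of 1H divided by 2*pi


def proton_resonance(b0_tesla):
    """Return (resonance frequency in Hz, energy gap in J) for a 1H nucleus."""
    frequency = GAMMA_1H_MHZ_PER_T * 1e6 * b0_tesla  # nu = gamma * B0 / (2*pi)
    energy_gap = PLANCK * frequency                  # delta E = h * nu
    return frequency, energy_gap


for field in (9.4, 14.1, 18.8):
    nu, de = proton_resonance(field)
    print(f"{field} T: {nu / 1e6:.0f} MHz, delta E = {de:.2e} J")
```

Doubling the field doubles the energy gap, so the radiation frequency needed for resonance doubles as well.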
2D NMR - nuclear Overhauser effect (NOE)
- identifies pairs of protons that are in close proximity (within about 5 Å)
- peaks on the diagonal show no interactions (each proton is correlated with itself)
- off-diagonal peaks (cross-peaks) show protons that interact
- when proteins fold, Hs that are far apart in the chain can still come into contact and interact
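A sketch of reading a 2D NOE peak list: peaks with equal coordinates lie on the diagonal and are skipped, while each off-diagonal cross-peak is mapped back to the two protons whose chemical shifts it matches. The proton names, residue numbers and shifts (ppm) are hypothetical, and the matching tolerance is an assumed value.

```python
def close_pairs(peaks, shifts, tol=0.02):
    """Return the set of proton pairs implied by off-diagonal NOE peaks."""
    pairs = set()
    for w1, w2 in peaks:
        if abs(w1 - w2) <= tol:
            continue  # diagonal peak: a proton correlated with itself
        first = [name for name, s in shifts.items() if abs(s - w1) <= tol]
        second = [name for name, s in shifts.items() if abs(s - w2) <= tol]
        for a in first:
            for b in second:
                if a != b:
                    pairs.add(frozenset((a, b)))
    return pairs


# Hypothetical assignments: residues 5 and 40 are far apart in sequence.
shifts = {"Leu5 HA": 4.30, "Trp40 HE1": 10.10, "Val12 HG": 0.85}
peaks = [(4.30, 4.30), (10.10, 10.10), (0.85, 0.85),
         (4.30, 10.10), (10.10, 4.30)]
print(close_pairs(peaks, shifts))
```

The symmetric pair of cross-peaks reports one contact, between protons of residues 5 and 40, that only folding could bring together.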
homologs
- biochemical entities related by common ancestry
- detectable by significant similarity in nucleotide or amino acid sequence, which is (almost always) also manifested in three-dimensional structure
paralogs
homologs that are present within one species
- usually a result of gene duplication
- often differ in their detailed biological function
- e.g., different versions of the same protein for different tissues
* evolution –> new genes develop from an existing gene “template” and are altered from there –> every gene arises from another
orthologs
- homologs that are present within different species and have very similar or identical functions
statistical analysis of sequence alignments
- significant sequence similarity between two DNA, RNA or protein molecules implies that they are homologs and hence have the same evolutionary origin
- as proteins are composed of a larger number of building blocks (20 aas) than DNA or RNA (4 nucleotides) random sequence agreements are less likely
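A rough sketch of the argument: if two unrelated sequences of length n are compared position by position without gaps, and every residue is assumed equally likely and independent, the chance of a match at one position is 1/4 for nucleic acids and 1/20 for proteins. The binomial tail then gives the probability of at least k identities by chance. The uniform composition and gapless comparison are simplifying assumptions.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k identities in n positions), each matching with probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, k = 100, 30  # 30% identity over 100 positions
print("nucleic acid:", prob_at_least(k, n, 1 / 4))
print("protein:     ", prob_at_least(k, n, 1 / 20))
```

The same level of identity is far less likely to arise by chance in a protein comparison.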
alignment algorithms to compare sequences
- identities
- gap insertion
- conservative substitution
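A minimal global-alignment (Needleman-Wunsch style) score that rewards identities, gives partial credit to conservative substitutions and penalizes gap insertions. The residue groups and score values are hypothetical choices for illustration; real tools use substitution matrices such as BLOSUM.

```python
CONSERVATIVE_GROUPS = ["ILVM", "DE", "KR", "ST", "FYW", "NQ"]  # hypothetical


def pair_score(a, b, match=2, conservative=1, mismatch=-1):
    """Score one aligned pair of residues."""
    if a == b:
        return match
    if any(a in g and b in g for g in CONSERVATIVE_GROUPS):
        return conservative
    return mismatch


def global_alignment_score(s, t, gap=-2):
    """Best total score over all global alignments of s and t."""
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # s aligned against leading gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # t aligned against leading gaps
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),  # align pair
                dp[i - 1][j] + gap,  # gap inserted in t
                dp[i][j - 1] + gap,  # gap inserted in s
            )
    return dp[-1][-1]


print(global_alignment_score("GAVK", "GAIK"))  # V/I is a conservative change
print(global_alignment_score("GAVK", "GAK"))   # needs one gap
```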
conservation of 3d structures
3d structures of proteins or RNA relate directly to their functions and hence are more evolutionarily conserved than primary structures or amino acid sequences
- similarities in structures:
–> can be detected without significant similarities in sequence alignments
–> usually indicate common functional mechanisms
similar structures but different functions
example: α-lactalbumin and lysozyme (homologous proteins with very similar 3D structures)
- in primates α-lactalbumin expression is upregulated in response to the hormone prolactin and increases the production of lactose
- lysozymes are generally enzymes that damage bacterial cell walls by hydrolyzing their peptidoglycan component
Convergent evolution
some proteins are structurally and functionally similar in many important ways but do not have a common ancestor
–> very different evolutionary pathways can lead to the same biochemical solution
example: chymotrypsin vs subtilisin (both serine proteases with the same Ser-His-Asp catalytic triad, but unrelated overall structures)