Determining amino acid sequence of a protein Flashcards
What are the two important techniques Fred Sanger used to determine the amino acid sequence?
- N-terminal tagging identifies the first amino acid in the chain
- Limited hydrolysis breaks the chain into smaller, more manageable pieces
How can we identify the first amino acid in a protein?
- Tagging it with fluorodinitrobenzene (bright yellow)
- The amino acid at the N terminus of a protein has a free α-amino group
- At high pH, this group is deprotonated (-NH3+ → -NH2), and the lone pair on N makes it a nucleophile
- It reacts with fluorodinitrobenzene, displacing fluoride (a good leaving group here), so HF is released
- Hydrolysis releases the N-terminal amino acid, with yellow tag attached; can identify it by chromatography
What is the big snag?
Hydrolysis destroys the rest of the polypeptide chain
Why was Sanger’s method not enough? Who improved it and how?
- Sanger’s method can only determine the N-terminal amino acid ONCE; we can’t repeat it, since hydrolysis destroys the rest of the polypeptide chain
- Pehr Edman (Sweden, 1956) improved the Sanger method
- His method allows the N-terminal amino acid to be reacted, removed, and identified without hydrolysing the other peptide bonds
- The reaction can be repeated to identify amino acid #2, etc.
- Up to 50 amino acids can be identified using automated sequencers based on the Edman degradation
What is the Edman degradation?
The peptide bond between amino acids #1 and #2 is cleaved in 2 steps:
- Coupling requires base; reaction must be complete before the next cyclization step can take place (labels the N-terminal AA with PITC)
- Cyclization requires acid; reaction must be complete before the next coupling step can take place (cleaves the first peptide bond)
- Phenylisothiocyanate reacts with a deprotonated N-terminal amino group.
- Deprotonation exposes the lone pair of N, allowing it to react as a nucleophile, which can then attack an electron deficient nucleus, the C atom of isothiocyanate. This requires mildly basic conditions, pH 9, which is achieved by carrying out the reaction in a weak base such as pyridine.
- The coupled product is called a phenylthiocarbamoyl peptide.
- The phenylthiocarbamoyl peptide is transferred into weak anhydrous (no H2O) acid, which causes the S of the C=S group to attack the nearest peptide bond, i.e. the one linking the N-terminal amino acid to the rest of the chain. The result is a cyclization reaction that splits off the first amino acid, leaving the rest of the chain intact.
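The repeated couple-then-cleave cycle can be pictured with a toy loop. This is only a bookkeeping sketch, not a model of the chemistry: the function name `edman_cycles` and the one-letter example peptide are illustrative assumptions, and the 50-cycle cap is taken from the limit quoted above.

```python
def edman_cycles(peptide, max_cycles=50):
    """Yield (cycle number, residue removed, chain left) for each Edman cycle.

    Each cycle removes only the N-terminal residue and leaves the rest of
    the chain intact, so the next cycle can read the following residue.
    """
    remaining = peptide
    for cycle in range(1, max_cycles + 1):
        if not remaining:
            break
        residue, remaining = remaining[0], remaining[1:]
        yield cycle, residue, remaining


# Hypothetical peptide in one-letter code (Ser-Glu-Thr-Val-Gly-Pro-Arg).
for cycle, residue, rest in edman_cycles("SETVGPR"):
    print(cycle, residue, rest)
```

Sanger's tagging corresponds to running only the first cycle and then losing the rest of the chain to hydrolysis.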
What is the molecular biology approach to determining the amino acid sequence of a protein?
- Nowadays, it is common for the DNA sequence of a protein gene to be determined first, using cloning and molecular biology methods
- Can then determine the protein sequence, using the genetic code
DNA sequence → amino acid sequence
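As a minimal sketch of the DNA sequence → amino acid sequence step, the snippet below translates a coding-strand DNA string with the standard genetic code. The function name `translate` and the example DNA string are illustrative assumptions; the sketch assumes the sequence starts in frame and ignores introns and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding-strand DNA, stopping at the first stop codon (*)."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example: ATG GGT CTG TCT GAT GGT TAA
print(translate("ATGGGTCTGTCTGATGGTTAA"))  # MGLSDG
```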
How does one study long protein chains?
- To study long protein chains, they are cut into shorter oligopeptides by selective hydrolysis (divide and conquer)
- Selective hydrolysis cuts the polypeptide at specific locations, to yield a limited number of oligopeptides of definite size
- The digestive enzyme trypsin binds and recognizes Arg or Lys side chains in peptides
- The carboxyl side of the Arg or Lys is positioned next to the catalytic unit of trypsin, and the peptide bond there is the target for hydrolysis
How do trypsin and chymotrypsin convert polypeptides into smaller fragments?
- Trypsin is an enzyme that binds a polypeptide and cuts the peptide bond on the carboxyl side of the target Arg or Lys.
- Note that all fragments will have Arg/Lys at the C-terminal end, EXCEPT the fragment from the C-terminus (so can easily identify it)
- Chymotrypsin cuts the polypeptide on the carboxyl side of Phe, Tyr or Trp.
- All fragments (except C-terminus) have Phe/Tyr/Trp at the C-end
- In both cases, if the next amino acid after the target is proline, the polypeptide fails to bind to the enzyme and can’t be cut at that point. Proline has an unusual conformation due to the side chain bonding to the α-amino N.
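The cutting rules above (cut after the target residue, but not when the next residue is proline) can be sketched in a few lines. The function name `digest`, the constants, and the example peptide are illustrative assumptions; real enzymes also show missed cleavages and other sequence preferences that this sketch ignores.

```python
TRYPSIN = "KR"        # Lys, Arg
CHYMOTRYPSIN = "FYW"  # Phe, Tyr, Trp


def digest(sequence, targets, skip_before_proline=True):
    """Cut on the carboxyl side of each target residue (one-letter code)."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence[:-1]):
        blocked = skip_before_proline and sequence[i + 1] == "P"
        if residue in targets and not blocked:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])  # C-terminal fragment
    return fragments


peptide = "AGKPLRSFKWME"  # made-up sequence; note K-P is not cut
print(digest(peptide, TRYPSIN))       # ['AGKPLR', 'SFK', 'WME']
print(digest(peptide, CHYMOTRYPSIN))  # ['AGKPLRSF', 'KW', 'ME']
```

Every fragment except the last ends in a target residue, which is how the C-terminal fragment is recognised.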
What is cyanogen bromide and what does it do?
Cyanogen bromide is a chemical reagent which cuts polypeptide chains at methionine residues
- Cyanogen bromide, Br-C≡N, attacks the S atom of Met
- The peptide chain is broken on the carboxyl side of Met, and Met is converted to homoserine, Hse (serine with an extra CH2)
- All fragments (except C-terminus) have Hse at C-end
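A small sketch of the same idea for cyanogen bromide: cut after each Met and record the converted residue as Hse. The function name `cnbr_cleave`, the `(Hse)` notation, and the example peptide are illustrative assumptions; a Met at the very C-terminus is left unchanged here because there is no following peptide bond to break, which is a simplification.

```python
def cnbr_cleave(sequence):
    """Split a one-letter sequence after each internal Met, marking it as Hse."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "M" and i < len(sequence) - 1:
            fragments.append(sequence[start:i] + "(Hse)")
            start = i + 1
    fragments.append(sequence[start:])  # C-terminal fragment, no Hse
    return fragments


print(cnbr_cleave("AGMKLMWE"))  # ['AG(Hse)', 'KL(Hse)', 'WE']
```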
There are many potential cut sites in myoglobin; what do these yield?
These yield oligopeptides of defined size
- Proteins digested with enzymes like trypsin yield characteristic patterns of fragments of different molar mass
- can be analyzed by mass spectrometry
- this is a definitive means of identifying a known protein
- In experiments to determine the amino acid sequence of an unknown protein, oligopeptide fragments from selective hydrolysis are first separated by chromatography, then each can be sequenced by Edman’s method
- The peptide sequences are reassembled into a complete sequence by the overlap method
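To illustrate why a tryptic digest gives a characteristic pattern of masses, the sketch below computes the mass of each fragment from standard monoisotopic residue masses. The names `peptide_mass` and `tryptic_fingerprint` and the example peptide are illustrative assumptions; modifications and charge states are ignored. It relies on `re.split` accepting zero-width matches, which needs Python 3.7 or later.

```python
import re

# Monoisotopic residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide in one-letter code."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def tryptic_fingerprint(sequence):
    """Return (mass, fragment) pairs for a tryptic digest, lightest first."""
    # Split after K or R unless the next residue is P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    return sorted((round(peptide_mass(p), 4), p) for p in pieces)


for mass, fragment in tryptic_fingerprint("AGKPLRSFKWME"):  # made-up sequence
    print(f"{mass:10.4f}  {fragment}")
```

Comparing such a list of masses against the lists predicted for known proteins is the basis of identifying a known protein from its digest.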
The sequence of myoglobin showing sites where the polypeptide chain can be cut: red for sites where chymotrypsin attacks; blue where trypsin attacks (note the underlined Lys-Pro is not cut); and green where cyanogen bromide attacks Met.
If myoglobin is digested with chymotrypsin, all the red-labelled sites will be hydrolysed at the peptide bonds immediately following the target amino acid, since it’s not possible to attack at only one location at a time. Similarly, all the sites labelled in blue will be cut by trypsin.
What is the overlap method?
- Two samples of the original polypeptide are each cut separately using two hydrolysis methods, each targeting different sites (e.g. trypsin, chymotrypsin)
- Sequences from one set of oligopeptides are lined up to overlap with oligopeptides from the other set, to deduce how they were originally joined
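A brute-force sketch of the overlap idea: try every ordering of each fragment set and keep the full sequences that both sets can produce. The function names and the fragment lists are illustrative assumptions, and trying all permutations is only practical for a handful of fragments; it is a stand-in for lining up overlaps by eye, not a real assembly algorithm.

```python
from itertools import permutations


def candidate_sequences(fragments):
    """All full-length sequences obtainable by ordering the fragments."""
    return {"".join(order) for order in permutations(fragments)}


def assemble_by_overlap(fragments_a, fragments_b):
    """Sequences consistent with both digests of the same polypeptide."""
    return sorted(candidate_sequences(fragments_a) & candidate_sequences(fragments_b))


# Made-up fragments, listed in scrambled order as they might come off a column.
tryptic = ["WME", "AGKPLR", "SFK"]
chymotryptic = ["KW", "ME", "AGKPLRSF"]
print(assemble_by_overlap(tryptic, chymotryptic))  # ['AGKPLRSFKWME']
```

Neither digest alone fixes the order, but only one ordering is consistent with both.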
How can one use mass spectrometry to sequence and identify proteins?
- Tandem mass spectrometry
- Only tiny amounts of sample needed (spot from 2D gel)
1. Sample hydrolyzed by protease
2. First MS-1: separate peptides of different masses
3. Collision cell: fragment each peptide molecule once (usually at a peptide bond) in a random fashion
4. Second MS-2: measure fragment masses
How are peptides generated?
- When peptides are generated with trypsin, each peptide has K or R at its C-terminus
- Peptides generated at low pH:
- acidic residues have no charge on side chain, e.g. COOH
- basic residues have +1 charge on side chain, e.g. NH3+
- Peptides with charges produce the highest signal, i.e. the ones with K or R at the C-terminus
- Example peptide that we will use to illustrate the process of sequencing with MS:
Ser-Glu-Thr-Val-Gly-Pro-Arg
 0   0   0   0   0   0  +1
This peptide then undergoes fragmentation, breaking one peptide bond per molecule on average, in a statistically random fashion. The example peptide might be fragmented at one of six possible break sites. These fragments then go through the second MS, where peptides with charges produce the highest signal.
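The six possible break sites of the example peptide can be listed explicitly. This is a small enumeration sketch; the function name `single_breaks` is an illustrative assumption, and the charge bookkeeping follows the side-chain charges given for the example (only Arg carries +1).

```python
peptide = ["Ser", "Glu", "Thr", "Val", "Gly", "Pro", "Arg"]


def single_breaks(residues):
    """Return (N-terminal piece, C-terminal piece) for each peptide bond."""
    return [(residues[:i], residues[i:]) for i in range(1, len(residues))]


for n_piece, c_piece in single_breaks(peptide):
    # Only the piece containing Arg keeps the +1 side-chain charge.
    print("-".join(n_piece), "|", "-".join(c_piece), "(+1)")
```

Seven residues give six peptide bonds, so six pairs are printed, and in every pair the charged piece is the one that ends in Arg.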
How does mass spectrometry fragment the peptide?
It cleaves one peptide bond per molecule, in a random manner
The difference in mass between fragments is used to identify the amino acid, using the following list of amino acid masses. The only ambiguity is leucine and isoleucine, which have exactly the same mass.
The mass of each peak represents the mass of one charged fragment type. The difference in mass between adjacent peaks represents the mass of one amino acid as you go from one fragment to the next.
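Reading residues from peak-to-peak mass differences can be sketched as follows. The peak list here is simulated from the example peptide rather than measured, the function names and the 0.01 Da tolerance are illustrative assumptions, and the residue masses are standard monoisotopic values.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def c_terminal_ladder(peptide):
    """Simulated masses of the charged C-terminal fragments, shortest first."""
    masses, total = [], WATER + PROTON
    for residue in reversed(peptide):
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def residues_from_ladder(peak_masses, tolerance=0.01):
    """Name the residue matching each gap between neighbouring peaks."""
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        diff = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tolerance]
        calls.append("/".join(matches) if matches else "?")
    return calls


ladder = c_terminal_ladder("SETVGPR")  # Ser-Glu-Thr-Val-Gly-Pro-Arg
print(residues_from_ladder(ladder))    # ['P', 'G', 'V', 'T', 'E', 'S']
```

The residues come out from the C-terminal side inwards, so the list is read in reverse to recover Ser-Glu-Thr-Val-Gly-Pro, with the lightest peak itself being Arg. A gap matching leucine is reported as `L/I` because the two cannot be told apart by mass.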
What is BLAST searching?
BLAST (Basic Local Alignment Search Tool)
Compares input sequence to databank of all known protein sequences
Produces a list of the best “hits”
- 100% identical: positive match
- Good identity: a homolog
- No identity: a new protein?
Perform biochemical tests to determine identity
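The idea of percent identity behind a "hit" can be shown with a deliberately simple comparison. This is not how BLAST works internally (BLAST finds local alignments with a scoring matrix and heuristics); the sketch assumes two sequences that are already aligned, of equal length, with no gaps, and the function name and sequences are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues in two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


query = "MGLSDG"  # made-up query
print(percent_identity(query, "MGLSDG"))  # 100.0
print(round(percent_identity(query, "MGLSEG"), 1))  # one mismatch out of six
```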