Week 5.10: Transcriptomics and Proteomics Flashcards
5.10 Transcriptomics and Proteomics
Proteomics methods
Two types:
1. 2D gel electrophoresis
2. LC-MS/MS
2D Gel electrophoresis
In 2D gel electrophoresis, proteins from a complex mixture are first separated according to their isoelectric point in a gel tube, and then separated in a second dimension across an SDS-polyacrylamide gel (SDS-PAGE) according to their molecular weight.
Staining the gel gives a picture in which each spot is a protein, and the colour and size of the spot indicate abundance.
Comparison of 2D gels can reveal differences in protein expression between different samples. A spot that is present in one gel may be absent in another (indicates expression of protein switched off), or the spot size may be different (indicates difference in protein expression).
To do this, some clever software is required to warp the image of one gel onto another.
However, 2D gel electrophoresis does not provide protein identifications. You may try to guess based on protein mass and isoelectric point, but more normally spots of interest are excised and identified using mass spectrometry.
So what we do with this is:
Proteolysis (the breakdown of proteins into smaller polypeptides or amino acids).
We punch out spots at particular positions and then analyse them using mass spec.
This is quite laborious because you need to punch out all these spots.
LC-MS/MS
With LC-MS, peptides are separated in the aqueous phase, rather than on a gel. This is much better suited to high throughput.
Sample → identified peptides → inferred proteins
Shotgun Proteomics Workflow
1. Sample preparation
2. Proteolysis step
3. Separation of peptides
4. Tandem mass spectrometry
4a. MS1 (peptides)
4b. MS2 (products)
5. Peptide spectrum matching
6. Protein inference
1. Sample preparation
Depending on what proteins you have, samples are often fractionated to reduce complexity (trying to analyse a full sample will make life difficult), for example by running a 1D gel, cutting it into ten or so slices and analysing each slice in turn.
2. Proteolysis step
Break up proteins into much smaller peptides
Mass spectrometers can typically only measure masses up to about 3,000 daltons (Da).
Proteins are much larger than this, so need to be broken down into much smaller peptides prior to analysis.
Most people use TRYPSIN – a proteolytic enzyme that cleaves proteins after arginine (R) and lysine (K) except when followed by proline (P)
Example: trypsin cuts a protein sequence into smaller peptides, typically 4–30 amino acids in length, which is “just the kind of mass” that can be analysed by mass spectrometry.
We have gone from hundreds of intact proteins to thousands of peptides, which we can analyse in mass spec, but they are all mixed up.
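The trypsin rule above can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the lecture: the function name `trypsin_digest` and the example sequence are made up, and real digests also contain missed cleavages, which are ignored here.

```python
import re

def trypsin_digest(protein: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    # Zero-width split point: preceded by K or R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # A protein ending in K or R leaves an empty trailing piece; drop it.
    return [p for p in pieces if p]

# Hypothetical sequence: the K before P is protected, the other sites are cut.
print(trypsin_digest("AAKPGGRCCKDD"))
```

In a fuller version you would probably also filter by length, since only peptides of roughly 4 to 30 residues are useful for analysis.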
3. Separation of peptides
High performance liquid chromatography (HPLC) is used to separate peptides in the sample.
The HPLC passes the sample through a column packed with chemicals through which peptides move at different speeds according to their physicochemical properties.
The result of this is that peptides come out (elute) from the column at different times.
In reality the mixture isn’t that simple; separation is rarely complete because samples are simply too complex.
As the sample comes out of (elutes from) the HPLC, small aliquots are passed sequentially to a mass spectrometer for analysis. This is where the mass spectrometry comes in.
4. Tandem mass spectrometry
First phase of the mass spec;
4a. MS1 (Peptides)
In the first phase of tandem mass spectrometry (MS1), the masses of the peptides present in the aliquot from the particular elution time are measured.
Mass spectrometry is used to determine the mass of the peptides in the sample. What you get is a basic spectrum of ion count against m/z; each peak corresponds to a different mass, and the numbers on the peaks show where they are.
In the example spectrum there might be up to seven peptides in the sample (judging from the number of peaks).
What is mass spec? For the purpose of this lecture, simply think of it as a way of measuring the mass of molecules.
Peptide Mass Fingerprinting
Each peak represents a peptide, and because its mass is essentially the sum of the masses of the peptide’s amino acids, it gives us an idea of what the peptide is.
Identifying peptides from their masses is known as peptide mass fingerprinting.
The problem is that peptides with different sequences can have equal masses. For example, two peptides containing the same amino acids in a different order have the same mass.
We need a second MS step (MS2) if we are to differentiate between these
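To make the equal-mass problem concrete, here is a small sketch that sums residue masses. The monoisotopic residue masses are standard values rounded to five decimals; the function name `peptide_mass` and the example peptides are assumptions for illustration only.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini

def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

# Same amino acids in a different order: different sequence, same mass.
print(round(peptide_mass("GASK"), 4), round(peptide_mass("KSAG"), 4))
# Leucine (L) and isoleucine (I) are isomers, so these also share a mass.
print(peptide_mass("ALK") == peptide_mass("AIK"))
```

MS1 alone cannot tell these peptides apart.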
4b. MS2 (products)
MS2 is used to give more specific information about each peptide
After each MS1 scan, peptide ions with the highest peaks are taken (one at a time) and smashed into pieces (fragmented) in a collision cell.
The masses of the fragments (PRODUCT IONS) produced by each peptide are then measured by mass spectrometry. The result is an MS2 spectrum, in which each peak represents one fragment of the peptide.
Conveniently, the peptide backbone is weaker than the bonds within the amino acids, so most fragmentation occurs along the backbone.
If you look at the generic chemical structure of a peptide, you would expect most of the fragmentation to occur somewhere along the backbone; for a given copy of a peptide you would expect it to break at one of these points (a1…b1…etc.).
The clever thing is that another copy of the peptide that breaks elsewhere gives a fragment with a different mass, and the difference between the two masses helps determine the amino acid.
Principle Simon – this was one of his main research areas
We can assume that the majority of peptides are broken in only one place, giving two pieces – usually a B ION and Y ION. This means that we can work out the sequence of the peptide by considering the mass differences between the most intense MS2 peaks, for example;
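The mass-difference idea can be sketched as follows. This is an idealised illustration that assumes singly charged b and y ions and a perfect spectrum; the function names, the tolerance and the peptide `GASK` are made up for the example.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def b_ions(peptide: str) -> list[float]:
    """m/z of singly charged b ions (N-terminal fragments b1..b(n-1))."""
    total, ladder = PROTON, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder

def y_ions(peptide: str) -> list[float]:
    """m/z of singly charged y ions (C-terminal fragments y1..y(n-1))."""
    total, ladder = WATER + PROTON, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder

def read_ladder(ladder: list[float], tol: float = 0.01) -> list[str]:
    """Name the residue(s) matching each gap between neighbouring peaks."""
    calls = []
    for low, high in zip(ladder, ladder[1:]):
        gap = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls

print(read_ladder(b_ions("GASK")))  # gaps between b1, b2, b3
print(read_ladder(y_ions("GASK")))  # gaps between y1, y2, y3
```

Note that a gap of about 113.084 Da is reported as `L/I`, because leucine and isoleucine cannot be distinguished by mass.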
However, experimental MS2 spectra are rarely perfect so reading the sequence directly from a spectrum is not easy.
It is more common to do a “database search” where the peptide spectra are compared to 1,000s of simulated spectra derived from a database of peptide sequences that may be present in the sample.
For human samples, these sequences would be the human proteome, which comes from… The genome!
5. Peptide spectrum matching
Experimental peptide spectra are matched to simulated spectra from the database using some statistical similarity score (e.g. correlation coefficient).
For each spectrum, a list of potential matches will be found.
In the simplest algorithms, these matches are ranked and only the highest scoring match retained.
If there is no match above a certain threshold score then the spectrum is flagged as unidentified. There are many reasons why a spectrum may not be identified, and many spectra (sometimes >50%) can fall into this category.
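A toy version of this matching step is sketched below, assuming a shared-peak fraction as the similarity score (a simpler stand-in for the correlation-type scores mentioned above, not what any particular search engine does). All names, peak lists and the 0.6 threshold are hypothetical.

```python
def match_score(observed, theoretical, tol=0.5):
    """Fraction of theoretical peaks with an observed peak within tol."""
    if not theoretical:
        return 0.0
    hits = sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)
    return hits / len(theoretical)

def best_match(observed, candidates, threshold=0.6):
    """Rank candidate peptides and keep only the top one if it passes.

    candidates maps a peptide sequence to its simulated peak list.
    Returns (peptide, score), or (None, score) if unidentified.
    """
    ranked = sorted(
        ((match_score(observed, peaks), pep) for pep, peaks in candidates.items()),
        reverse=True,
    )
    if not ranked:
        return None, 0.0
    score, peptide = ranked[0]
    return (peptide, score) if score >= threshold else (None, score)

# Made-up peak lists, purely for illustration.
simulated = {"PEPA": [100.0, 200.0, 300.0], "PEPB": [110.0, 210.0, 300.0]}
print(best_match([100.1, 199.8, 300.2, 450.0], simulated))
print(best_match([500.0], simulated))  # nothing matches: flagged unidentified
```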
6. Protein inference
So, from one sample we end up with 1,000s of peptides, each with a peptide sequence that it appears to match. But this is proteomics: we want proteins, not peptides. Protein presence is inferred by mapping peptides to the proteins in the reference sequence.
Because tryptic peptides are short, many of them are ambiguous (matching to many proteins). This is usually resolved using some probabilistic algorithm.
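One simple way to picture inference is a greedy "fewest proteins" rule. Note that this is a parsimony heuristic swapped in for illustration, not the probabilistic algorithms referred to above, and the protein names and sequences are invented.

```python
def map_peptides(peptides, proteins):
    """Map each peptide to the set of proteins whose sequence contains it."""
    return {
        pep: {name for name, seq in proteins.items() if pep in seq}
        for pep in peptides
    }

def greedy_inference(peptide_map):
    """Pick proteins one at a time, each explaining the most remaining peptides."""
    remaining = {pep for pep, prots in peptide_map.items() if prots}
    chosen = []
    while remaining:
        counts = {}
        for pep in remaining:
            for prot in peptide_map[pep]:
                counts[prot] = counts.get(prot, 0) + 1
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = {pep for pep in remaining if best not in peptide_map[pep]}
    return chosen

# Hypothetical reference proteins and identified peptides.
proteins = {"P1": "AAKCCKDDR", "P2": "CCKEEK", "P3": "GGGK"}
mapping = map_peptides(["AAK", "CCK", "DDR"], proteins)
print(mapping["CCK"])             # ambiguous: found in both P1 and P2
print(greedy_inference(mapping))  # P1 alone explains all three peptides
```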
Finally, we get our list of identified proteins. Quantitation is also possible – but too complex to cover today!
example of open-source software we can use to illustrate the data
Many steps in this process, with lots of computation and statistical analysis – it is important to take all the evidence into account to understand what was found.
Some applications of proteomics & transcriptomics
PROTEOMICS – WILL PROBABLY COME UP IN THE EXAM!
Biomarker discovery
Typically finding differences in expression between cases and controls (healthy/disease); this was shown earlier in the diagonal line example.
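As a rough sketch of what finding differences in expression can mean in practice, the snippet below computes a log2 fold change per protein between case and control abundances. The abundance values, the pseudocount and the cut-off of 1.0 are assumptions for illustration, and a real study would also need replicates and statistical testing.

```python
import math

def log2_fold_changes(cases, controls, pseudocount=1.0):
    """log2(case / control) per protein; the pseudocount avoids dividing by zero."""
    shared = cases.keys() & controls.keys()
    return {
        protein: math.log2(
            (cases[protein] + pseudocount) / (controls[protein] + pseudocount)
        )
        for protein in shared
    }

def flag_candidates(fold_changes, min_abs_log2=1.0):
    """Proteins whose abundance changes at least two-fold in either direction."""
    return sorted(p for p, fc in fold_changes.items() if abs(fc) >= min_abs_log2)

# Hypothetical abundances for three proteins.
disease = {"A": 7.0, "B": 3.0, "C": 0.0}
healthy = {"A": 1.0, "B": 3.0, "C": 3.0}
changes = log2_fold_changes(disease, healthy)
print(changes)
print(flag_candidates(changes))
```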
Network analysis
The idea of systems biology, where we build networks of interactions between genes, proteins, or both.
Building models to describe how the system works. Think about how engineers do things: they don’t work the way we do in biology; they don’t just decide to make a phone and start playing around until they have a phone.
In biology we do a lot of that, trying loads of random things until we find something.
We are moving towards a much more evidence-based, engineering-type approach, where we try to model things using interaction networks.
Proteins, genes, metabolites: these kinds of nodes all interact with each other, and the aim is to understand how they work.
Until recently it has been difficult to find out how they interact, but now we can analyse different examples and actually try to build models of these interactions. This is an ongoing area of work.
Gene annotation
Mapping transcripts and proteins back to a genome can show us gene structure. Prof has been involved in this, where transcriptomics was used to localise and assemble, allowing us to see the different transcripts in the sample.
Allows annotation of genes and the genome to show which bits produce transcripts and which produce other things…