Lecture 27 (RR15): Systems biology approaches. Flashcards
Genome sequence
- At the end of the 1990s, John Horgan and Russell Stannard came to the conclusion that once we had the genome sequences of all these organisms, we would have answered all of the questions that haunted us.
- The genome sequence is an alphabet for us. It shows us how to make sentences, paragraphs, and chapters, and eventually how all of this unfolds into a novel.
- The analogy: the genome sequence gives rise to mRNA, which gives rise to protein. Proteins work together in complexes and in processes within the cell, and together they make a living cell. Cells cooperate with other cells to form tissues and organs, and the organs together make up an organism.
- In practice, the genome sequence on its own provided us with tools, and not many answers.
Analysis of full genomes indicates that much characterisation remains to be done
When researchers looked at the genomes of a number of organisms that had been sequenced, it was clear that there are a lot of common genes that seem to be essential (involved in processes that are critical for life).
* The basic cellular toolkit shared by each organism is strikingly conserved
* Genes required for cellular metabolism make up a large proportion of the total number of genes
* Transcription and translation related genes are also present in significant numbers
* A very large share of the genes identified are considered to be of unknown function: roughly half, if not more, of the genes annotated from these genome sequences fall into this class.
Modifying yeast genome
important
This is a technique used to figure out what the genes might be doing.
* The technique is to eliminate every single one of the predicted genes, one at a time, to understand what each individual gene contributes to the growth or viability of the organism.
Steps:
* The disruption construct is introduced into diploid yeast cells to replace the appropriate region.
* Presence of the dominant selectable marker confers resistance to the drug G-418, so the cells can grow on drug.
→ The targeted region is replaced with this drug resistance gene. This eliminates the actual gene sequence and puts a selectable marker in its place.
* HOW TO ADD HETEROLOGOUS SEQUENCES:
1) You make PCR products that contain flanking sequences corresponding to the regions on either side of the sequence you want to eliminate. As long as the 3’ end of a primer anneals to the template during PCR, the primer can carry a long sequence that hangs off its 5’ end. This lets you add sequences that were not originally there (this is how a lot of mutagenesis is done: sections are added to each end of a given amplicon using this property and rounds of linear amplification).
2) You can add on double-stranded oligonucleotide sequences and adapt them to the ends. You can select for it because of this dominant drug resistance marker.
* When the diploid is allowed to sporulate, the haploid progeny (spores) will have either a wild type chromosome or a recombinant chromosome. Out of 4 haploid spores: 2 will have the wild type chromosome and 2 will be modified (carry the dominant selectable marker).
> If only 2 out of the 4 spores grow out, then you know that it was an essential gene: the cell needs it to survive.
* The effects of the gene replacement can then be assessed, i.e. viability or growth rate. Evaluate the contribution of this gene product, or the loss of this gene product, in various conditions. Assess what that gene contributed to growth and survival.
OVERALL: helps you identify how individual genes contribute to the growth and viability of a given organism.
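The primer trick described under HOW TO ADD HETEROLOGOUS SEQUENCES can be sketched in Python. This is only an illustration under stated assumptions, not a lab protocol: the sequences are invented toy strings, the flanks and priming regions are far shorter than anything used at the bench, and the function names (`design_disruption_primers`, `pcr_product`) are hypothetical.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def design_disruption_primers(upstream, downstream, marker, prime_len=6):
    """Build two primers: the 3' end of each anneals to the marker,
    and the 5' tail carries a flank of the gene to be deleted."""
    forward = upstream + marker[:prime_len]
    reverse = revcomp(downstream) + revcomp(marker[-prime_len:])
    return forward, reverse


def pcr_product(forward, reverse, template, prime_len=6):
    """Top strand of the amplicon, assuming both 3' ends anneal perfectly."""
    start = template.index(forward[-prime_len:])
    end = template.rindex(revcomp(reverse[-prime_len:])) + prime_len
    fwd_tail = forward[:-prime_len]           # overhang added at the left end
    rev_tail = revcomp(reverse[:-prime_len])  # overhang added at the right end
    return fwd_tail + template[start:end] + rev_tail


# Toy sequences, invented for illustration only.
upstream = "ATGCCGTA"    # flank just before the gene to delete
downstream = "GGATCCAT"  # flank just after the gene to delete
marker = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"  # stands in for the G-418 cassette

forward, reverse = design_disruption_primers(upstream, downstream, marker)
construct = pcr_product(forward, reverse, marker)
print(construct)
```

The resulting construct is the marker with a gene-specific flank on each side, which is what lets homologous recombination swap it in for the targeted gene.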
Summary of this: You can make these disruption constructs for every predicted gene in the yeast genome. You then transform the yeast with the construct and hope that a homologous recombination event takes place. Even though that happens in only one in a million cells or so, you can select for it using your dominant selectable marker (the drug resistance gene). All the yeast that did not do it will die on drug, but yeast that did undergo homologous recombination and incorporated the dominant selectable marker on one of the chromosomes will live.
You end up with yeast that have one chromosome where you have modified the genome sequence and eliminated a single gene and replaced it with this marker.
The other chromosome is the wild type.
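The spore-counting logic from the steps above (2 wild type : 2 modified, and only 2 survivors if the gene is essential) can be written out as a small Python sketch. The labels "WT" and "KO" and the function names are assumptions made for this example, and the model ignores real-world complications such as spores that die for unrelated reasons.

```python
def sporulate(diploid):
    """A heterozygous diploid gives a tetrad of four haploid spores,
    two carrying each parental chromosome."""
    return sorted(diploid * 2)


def surviving_spores(tetrad, gene_is_essential):
    """Spores carrying the knockout die if the deleted gene is essential."""
    return [s for s in tetrad if not (s == "KO" and gene_is_essential)]


def classify_gene(n_viable, n_drug_resistant):
    """Read the result of a tetrad back into a conclusion about the gene."""
    if n_viable == 4 and n_drug_resistant == 2:
        return "non-essential"
    if n_viable == 2 and n_drug_resistant == 0:
        return "essential"
    return "inconclusive"


tetrad = sporulate(("WT", "KO"))  # one wild type and one marked chromosome
for essential in (False, True):
    survivors = surviving_spores(tetrad, essential)
    resistant = survivors.count("KO")  # only KO spores carry the G-418 marker
    print(essential, survivors, classify_gene(len(survivors), resistant))
```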
Functional genomics
- C. elegans researchers “knocked-down” every predicted transcription unit on chromosome 1 by using feeding RNAi… (you feed the animals double-stranded RNA and get a phenotype that corresponds to the loss of function of the gene matching that double-stranded RNA).
- …of the analysed genes, 339 were assigned some function as determined by the visible RNAi phenotype
- Later they did it for each predicted gene in the C. elegans genome (~19,000 genes). In this organism, RNAi is systemic, so even though the small RNAs are taken up in the gut, they spread into every cell in every organ thereafter.
- Use this to understand how every single predicted gene might contribute to the development or physiology of a growing animal.
Steps of functional genomics
1) Looked at all the genes predicted on chromosome 1.
2) Made a double-stranded RNA (RNAi) library, so that they could make double-stranded RNA corresponding to every single one of the genes predicted on chromosome 1.
3) Analyzed the animals after putting them into a buffer and allowing them to take up the double stranded RNA (animals ingested them).
4) Found a spectrum of different phenotypes that arose. Allowed them to assign 339 gene functions, in a single pass experiment.
- Realized they could do the whole genome using the same technology.
- Double-stranded RNA was made for each of the predicted 18 000 genes present in the C. elegans genome, and these were used to make libraries.
- In this case, the double-stranded RNAs were incorporated into bacteria, which make them at relatively high levels when you induce the bacteria with IPTG.
- The bacteria express convergent RNAs, so dsRNA forms inside the bacteria themselves, which you can then use to feed to C. elegans 1800 times.
These libraries can then be used all over the world. Each expresses the double stranded RNA that corresponds to a given C elegans gene.
What did functional analysis allow for?
By carrying out the analysis above, they identified a number of new gene functions.
* A variety of genes were found to give rise to the various visible phenotypes
* Genes that gave rise to phenotypes like embryonic lethality and sterility (cause animals to die) were often involved in basal cellular function (required in all cells in organisms in order to survive)…
* Genes that affect post embryonic aspects tend to be more specific (involved in more specialized functions that are typical to the organism).
* RNAi analysis allowed researchers to assign function to unknown genes simply by feeding C. elegans bacteria that expressed dsRNA corresponding to each individual predicted gene in the entire genome.
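The rule of thumb in the list above (lethal or sterile phenotypes point to basal cellular functions, post-embryonic phenotypes point to more specialized ones) can be sketched as a lookup over screen results. Everything here is hypothetical: the clone names and phenotypes are made-up placeholders, not data from the actual screen, and `interpret` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical results: RNAi clone fed to the worms -> phenotype scored.
screen = {
    "clone_A": "embryonic lethal",
    "clone_B": "sterile",
    "clone_C": "uncoordinated",
    "clone_D": "wild type",
}

BASAL_PHENOTYPES = {"embryonic lethal", "sterile"}


def interpret(phenotype):
    """Apply the rule of thumb from the notes to one scored phenotype."""
    if phenotype == "wild type":
        return "no function assigned"
    if phenotype in BASAL_PHENOTYPES:
        return "likely basal cellular function"
    return "likely more specialized function"


# Group the clones by the interpretation of their phenotype.
by_class = defaultdict(list)
for clone, phenotype in screen.items():
    by_class[interpret(phenotype)].append(clone)

for label, clones in by_class.items():
    print(label, clones)
```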
Proteomic analysis
- Takes advantage of the idea that transcription factors are modular (DNA binding domains and transcriptional activation domains).
- Investigators decided to add on specific proteins to these domains that they thought might interact together or associate in an in vivo situation.
They could use the idea of the modularity of DNA binding transcription factors to test whether two proteins actually associate.
Steps:
- Add a DNA binding domain to protein A
- Add a transcriptional activation function to protein B that you think might interact with protein A.
- Neither one of these alone can activate transcription, because A cannot act as a transcriptional activation domain for the DNA binding domain (yellow), and B cannot act as a DNA binding domain for the activation domain (red).
- HOWEVER, if A and B come together and associate, you reconstitute the transcription factor and activate downstream genes that contain the DNA sequence recognized by the yellow DNA binding domain.
- You can assess whether protein A and B came together.
Take it one step further: attach protein A to the DNA binding domain and refer to it as the bait. Then test all of the proteins in a given cDNA library by attaching a transcriptional activation function to every one of those cDNAs. These are referred to as the prey. After transformation, any cDNA that encodes a protein interacting with protein A will activate the reporter, and you can learn the nature of that protein simply by sequencing the cDNA. In this case, protein B was amongst the thousands of variants that you transformed initially.
- It is all dependent on reconstituting a transcriptional activator function by virtue of a protein protein interaction.
- A lot simpler than immunoprecipitation.
- The overall goal is to understand which proteins interact with which other proteins at the genome-wide level. This is important because if we do not understand the function of a given protein, the proteins that it interacts with might give us a clue. The protein-protein interactions tell us all the partners that protein A interacts with.
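The logic of the two-hybrid test and the bait/prey library screen can be sketched in a few lines of Python. This is a toy model only: the protein names, the `interactions` set, and the function `reporter_on` are assumptions for illustration, and a real screen also has to deal with false positives and negatives, which this ignores.

```python
def reporter_on(bait, prey, interactions):
    """The reporter is transcribed only if the bait (fused to the DNA binding
    domain) and the prey (fused to the activation domain) bind each other."""
    if bait is None or prey is None:
        return False  # one fusion alone cannot reconstitute the activator
    return frozenset((bait, prey)) in interactions


# Hypothetical ground truth: A binds B, and nothing else interacts.
interactions = {frozenset(("A", "B"))}

# Single test: does protein B interact with protein A?
print(reporter_on("A", "B", interactions))

# Library screen: bait A against every prey encoded by a cDNA library.
cdna_library = ["B", "C", "D", "E"]
hits = [prey for prey in cdna_library if reporter_on("A", prey, interactions)]
print(hits)  # the clones you would pick and sequence
```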
Different types of fragment complementation
Protein fragment complementation assays
Yeast Two Hybrid Assay: Reconstituting a TF to activate a selectable marker. Now they use similar concepts but with different means of indicating whether you have a bait-prey interaction. These are referred to as **protein fragment complementation assays.**
Protein fragment complementation assays: based on the idea that you can split a protein into 2 halves, neither of which can carry out the protein's function on its own. The 2 halves have no intrinsic affinity for each other, so they do not come together by themselves. However, if you attach one half to a bait and the other to a prey, and the bait and prey come together, they reconstitute the protein and its function. GFP is often used: it is divided into an N-terminal and a C-terminal half that do not normally associate, and they are brought together through the protein-protein interaction conferred by the bait and prey. By associating, you reconstitute GFP and get fluorescence that you can detect.
It is the association of the 2 proteins that reconstitutes this particular protein function.
BioID
**BioID: using proximity labelling to tag the proteins in your neighbourhood**
* New strategy that evolved from an initial strategy called DamID, which takes advantage of the prokaryotic Dam methylase (an enzyme that adds methyl groups to DNA).
* Use a prokaryotic heterologous gene product that allows us to tag all the proteins that come close to a given protein that you are interested in. This gives you an idea of the proteomic environment, the neighborhood around the protein you are interested in.
* You do this by using an enzyme that comes from E. coli: a biotin ligase, which E. coli uses to add biotin to its acetyl-CoA carboxylase.
* Some mutations in the gene for this biotin ligase, BirA, reduce its specificity and allow it to release biotin much more readily.
STEPS:
* **If you use that variant BirA, then you can add biotin molecules to anything within a specific vicinity of the enzyme.**
1) You can add the sequence that encodes BirA to any protein you are interested in, and then transfect cells with the new vector that carries your protein bait fused to BirA.
2) Once the protein that you are interested in is expressed in the cells or the organism that you have transfected, the fusion protein will add biotin molecules to all the proteins that come within a reasonable distance.
3) Eventually, any of the proteins that are interacting with that bait will become tagged with biotin molecules.
4) When you have biotin attached to a protein, it is easy to purify. Run the sample over a liquid chromatography column coated with streptavidin. The biotin present on all of the proteins that were tagged in vivo will attach to streptavidin with high affinity, and you will capture all of the proteins that were in close proximity to your protein of interest.
5) Then you can take the proteins that were bound to the streptavidin and run them through mass spectrometry to identify each of them. This gives you an immediate idea of the proteins that were near your protein of interest at that given time.
* Newer variations (APEX, APEX2 and TurboID) are very fast, so you get the information very quickly.
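The BioID steps above amount to "tag everything within some distance of the bait, then pull down and identify whatever is tagged". A minimal Python sketch of that idea follows. The coordinates, protein names, and labelling radius are all invented for illustration (arbitrary units, not measured values), and the function names are hypothetical.

```python
import math


def biotinylate(bait_position, proteins, radius):
    """Steps 2-3: the BirA fusion tags every protein within `radius` of the bait."""
    return {
        name
        for name, position in proteins.items()
        if math.dist(bait_position, position) <= radius
    }


def streptavidin_pulldown(lysate, tagged):
    """Step 4: only biotin-tagged proteins stay bound to the column."""
    return [name for name in lysate if name in tagged]


# Hypothetical cell: protein name -> (x, y) position in arbitrary units.
proteins = {"P1": (1.0, 0.0), "P2": (0.0, 2.0), "P3": (8.0, 8.0)}
bait_position = (0.0, 0.0)

tagged = biotinylate(bait_position, proteins, radius=3.0)
captured = streptavidin_pulldown(sorted(proteins), tagged)
print(captured)  # step 5: the list you would identify by mass spectrometry
```

Note that the tagged set reflects proximity, not necessarily direct binding, which is why the result is described as a neighbourhood rather than a list of interaction partners.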
What does BioID help with?
- You can get an understanding of which proteins are interacting with which other proteins. When you cover the entire proteome, you have maps.
- When you look genome wide, you start to get an idea of what those individual proteins might be doing.
- The neighborhood helps you learn something about its function.
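The idea that a protein's neighbourhood hints at its function can be sketched as a simple "majority vote among known neighbours" over an interaction map. This is one illustrative heuristic, not a method described in the lecture; the protein names, interaction pairs, and annotations below are all made up.

```python
from collections import Counter, defaultdict


def build_map(pairs):
    """Turn a list of interacting pairs into a neighbour map."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def guess_function(protein, graph, known):
    """Guess a function from the most common annotation among neighbours."""
    votes = Counter(known[p] for p in graph.get(protein, set()) if p in known)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical interaction data and annotations.
pairs = [("A", "B"), ("A", "C"), ("D", "A"), ("D", "E")]
known = {"B": "DNA repair", "C": "DNA repair", "E": "transport"}

graph = build_map(pairs)
print(guess_function("A", graph, known))
print(guess_function("D", graph, known))
```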