TB2-1: understanding your protein Flashcards
Many biochemical experiments require pure samples of proteins for analysis. List 4 uses of pure protein in in vitro experiments (i.e. what is being determined as opposed to the method used)
- Structure determination
- Binding property determination
- Analysis of protein function
- Reconstitution of functional systems from components
Name 3 methods used for structural determination of proteins that require a pure sample
- NMR
- Crystallography
- Cryo-EM
Why is having a pure protein sample for in vitro (and in vivo) experiments so important? What happens when there isn’t a pure sample (broad two outcomes)?
- The purer the sample, the less ambiguous the data
Without a purified protein, you might:
- miss important findings
- attribute the wrong function to a protein
Name an example demonstrating how poor protein purification can cause a protein to be attributed to an incorrect function
Acyl transferase activity resulting from an impurity in endophilin purifications
(the impurity contained the function)
Describe the ‘acyl transferase/endophilin’ example that demonstrates how an impurity in protein purification can lead to a protein being attributed with an incorrect function
It was originally believed that the protein endophilin 1 could convert lipids from outward- to inward-pointing cone shapes (which adopt very different curvatures). Endophilin was therefore credited with acting as an acyl transferase and, through this, with affecting membrane curvature.
A later paper showed that the activity was caused by an E. coli protein contaminant (the human protein of interest had been expressed in E. coli)
On a very basic level, why would you consider it important to know what type of protein you are investigating before making it and carrying out an experiment?
Different types of protein are made using different approaches
List 4 characteristics to consider when deciding what type of protein you have.
- single/multi-domain?
- intracellular, extracellular, integral membrane?
- folded, natively unfolded, partially folded?
- post-translational modifications?
On a basic level, why would you want to consider whether a protein is intracellular, extracellular or an integral membrane protein before making it (and using it in experiments for investigation of the protein)?
Each type of protein has evolved to survive in a different environment, so each has a different context. This changes how you go about expressing and making it.
What key question should be considered when investigating a protein that has post translational modifications?
Do you want the protein to have this post translational modification in your experiments?
i.e. should you make the protein with these modifications or not?
Why might you not want a protein you're investigating to carry its post-translational modifications in your experiments?
Post-translational modifications can lead to heterogeneity, which makes experiments more difficult because the sample is more complex.
What does “heterogeneity” mean?
“the quality or state of being diverse in character or content”
If you were investigating a virus protein (that binds to human surface proteins) would you want to include post-translational modifications or not? Why/why not? What is the post-translational modification?
Modification = glycosylation
You would want to include post-translational modifications.
This is because glycans (glycosylation events) are required for virus proteins to bind to human surface proteins (therefore essential for function)
List 4 reasons for why you want to make a protein for investigation (i.e. what is the purpose of the experiment)?
- for crystallisation
- to reconstitute a biological system
- to assess pathogen binding
- to dissect the function of different domains
If you are making a protein for the purpose of crystallisation, what should you consider when making the protein?
(more in a later module)
You want to minimise disorder in the protein
e.g. glycans block crystal contacts, meaning you are unable to solve the structure
If you are making a protein for the purpose of reconstituting a biological system, what should you consider in the making of the protein?
You might want it to be as close to the native as possible…
e.g. include any naturally occurring modifications (glycosylations) and produce the whole protein (as opposed to e.g. a single domain)
If you are making a protein for the purpose of assessing pathogen binding, what should you consider in the making of the protein?
(similar to previous question)
You would want it properly glycosylated and processed (i.e. include any post-translational modifications)
In addition to keeping post-translational modifications (glycosylation) when studying virus proteins that bind to the human cell surface, what else might you consider for the making of your protein?
You probably only need to make the extracellular part of the molecule, as this is the region bound by the virus (not the intracellular part).
This is useful because the extracellular part can be studied in a soluble context, whereas the whole protein would require a membrane context.
Other parts of the protein are unlikely to affect virus binding.
What is a fusion protein?
A protein made from a fusion gene which is made by joining two different parts of genes
Why would you want to add a fusion protein/peptide to modify your protein of investigation?
Useful for purification
e.g. His tag
List three ways you might wish to modify your protein of interest (i.e. consider before the making stage)
- adding fusion proteins or peptides
- choosing which section of the protein to produce
- making mutations to remove post-translational modifications
Why might you choose to only make part of your protein of interest? (3 points)
- remove parts of the protein predicted to be disordered or flexible, or that might interfere with protein function
- only express e.g. a single domain that mediates all the binding interactions of a multidomain protein
(this would make the experiment easier because it is simplified)
- consider if membrane anchors or transmembrane helices are required
Using a single domain from a multidomain protein might make the experiment easier because it's simplified, but what assumption has been made and how would you prove this assumption is true?
The assumption is that the single domain contains the whole function of the protein and mediates all the binding interactions
Prove with control experiments
When investigating a protein, what might you want to know about the protein structure before making the protein for use in experiments?
List 6 questions.
Is there a structure already available (in the PDB)?
Is there a secretion signal?
Are there transmembrane regions or anchors?
Are there post-translational modifications?
Are there regions of disorder?
What is the domain architecture?
Presence of a secretion signal in a protein is useful to predict what?
Useful to predict the location in which the protein is found
What do different secretion signals correspond to?
Different target destinations
Where is the signal peptide in eukaryotes located?
At the N terminal
Where are eukaryotic proteins with an N terminal signal peptide likely to be located to?
Directed to the ER (and Golgi) and secreted
What important observation can be made about the tertiary structure of proteins that are secreted/extracellular?
They have lots of disulphide bonds (able to form these because the extracellular environment is oxidising, whereas the intracellular environment is reducing)
They tend to be well structured and folded
Do eukaryotic proteins have a strict consensus sequence for secretion signal peptides? If so, what is it?
No, it does not have a strict consensus sequence
It has particular chemical features instead
Eukaryotic secretion signal peptides don’t have strict consensus sequences but they do have particular chemical features. Describe these chemical features.
~22 amino acids long
Positively charged N-terminal region (1-5 a.a.)
Hydrophobic central region (7-15 a.a.)
Polar (uncharged) C-terminal region (3-7 a.a.)
Followed by a cleavage site
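The features above can be turned into a very rough check. This is a minimal sketch only: the residue sets, the 30-residue scan length, the 8-residue window and the function name are all assumptions made for illustration, and real predictors learn from known examples rather than using fixed rules.

```python
# Assumed residue classes (illustrative only; not a published scale).
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KR")


def signal_peptide_features(seq, scan=30, h_window=8):
    """Summarise signal-peptide-like features near the N terminus.

    Finds the most hydrophobic window in the first `scan` residues and
    counts positively charged residues that come before it.
    """
    head = seq[:scan].upper()
    best_start, best_frac = 0, 0.0
    for i in range(len(head) - h_window + 1):
        window = head[i:i + h_window]
        frac = sum(aa in HYDROPHOBIC for aa in window) / h_window
        if frac > best_frac:  # keep the first window with the highest score
            best_start, best_frac = i, frac
    n_region = head[:best_start]
    return {
        "n_region_positive": sum(aa in POSITIVE for aa in n_region),
        "h_region_start": best_start,  # 0-based index
        "h_region_hydrophobic_fraction": best_frac,
    }


# Hypothetical sequence made up for this example.
features = signal_peptide_features("MKKTLLLLLLLLALASSAQAEVQ")
```

A high hydrophobic fraction preceded by one or more positive charges is only suggestive of a signal peptide; it says nothing about the cleavage site.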
Why do eukaryotic secretion signal peptides have a C terminal cleavage site?
Because the signal peptide is removed from the rest of the protein in the ER
Name a program that predicts the probability that a protein sequence contains a signal peptide, based on "learning" from known examples.
SignalP-5.0
What two types of secondary structure commonly make up the transmembrane region or anchor of a protein?
Helices or beta barrels
How long are the shortest transmembrane helices? i.e the shortest length to span the membrane
~18 residues long
Are transmembrane helices generally hydrophilic or hydrophobic?
Hydrophobic
What properties of transmembrane helices' residues are used to predict whether a protein contains them?
Length of helix
Hydrophobicity
Describe how transmembrane helices are predicted in a protein.
Scan along a peptide sequence using a ~20 residue box and measure the hydrophobicity within the box
(If you plot hydrophobicity against residue number, peaks indicate transmembrane helices: the number of peaks gives the number of helices and the residue numbers give their location in the protein)
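The sliding-box scan can be sketched in a few lines. This assumes the Kyte-Doolittle hydropathy scale and a cutoff of 1.6; neither is named in these cards, and the function names are made up for this example. Programs such as TMHMM use a trained model rather than a simple threshold.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=20):
    """Mean hydropathy of each `window`-residue box along the sequence.

    Raises KeyError on non-standard residue codes.
    """
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq, window=20, threshold=1.6):
    """Return (first, last) 1-based residue numbers covered by each
    consecutive run of boxes scoring at or above `threshold`."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:  # run continues to the end of the sequence
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# Hypothetical sequence: 20 leucines flanked by aspartates.
toy = "D" * 30 + "L" * 20 + "D" * 30
segments = candidate_tm_segments(toy)
```

Each run of high-scoring boxes corresponds to one peak in the plot. The reported span is wider than the helix itself because it covers every box that passed the cutoff.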
Name a program that can be used to predict the probability of there being a transmembrane region or anchor?
TMHMM
Why is it difficult to predict the presence of beta barrels in a protein?
You cannot rely on length and hydrophobicity alone, as you can for transmembrane helices,
because beta-barrel strands often have alternating hydrophobic and hydrophilic residues
When using programs such as TMHMM to predict the presence of transmembrane regions/anchors, why might there be a false positive prediction? What does this correspond to and where is it located in the sequence? Why is the probability low?
Signal peptides often come up as hits in the N terminus because they contain a hydrophobic region (but with low probability because they are not ~20 a.a. long but rather 7-15 in their hydrophobic region)
Name two types of software used to predict GPI-anchors or lipid modifications?
CSS-Palm
PredGPI
What is the consensus sequence for N-linked glycans?
N-x-S/T
What does the N refer to in N-linked glycosylation?
Asparagine
Are all N-x-S/T sequences glycosylated?
no - not all of them
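Finding candidate N-x-S/T sites is a simple pattern search. The sketch below is illustrative only: the function name and the `exclude_proline` option are assumptions. Proline at the x position is widely reported to disfavour glycosylation, but that rule is not stated in these cards, so it is off by default. The hits are potential sites only, since not every N-x-S/T is glycosylated.

```python
import re


def find_sequons(seq, exclude_proline=False):
    """Return 1-based positions of each N that starts an N-x-S/T motif.

    A lookahead is used so that overlapping motifs (e.g. NNTS) are all found.
    """
    pattern = r"N(?=[^P][ST])" if exclude_proline else r"N(?=.[ST])"
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]


# Hypothetical sequence made up for this example.
example = "MNGSANPTNNTS"
all_sites = find_sequons(example)
no_proline_sites = find_sequons(example, exclude_proline=True)
```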
Programs must take the sequence context into account when predicting N-linked glycans, what specific thing should they look for and why?
Look for an N-terminal secretion signal, because this directs the protein to the ER and Golgi
- the protein must enter the ER/Golgi secretory pathway in order to be glycosylated
What is the name given to the program most commonly used to predict N-linked glycosylation?
NetNGlyc
n.b. predicts POTENTIAL N-linked sites
Give a glycosylation example for each of wanting to remove the modification and wanting to keep it.
- keep - virus protein
- remove - crystallography application
How would you remove the N-linked glycan modification?
single point mutation at either:
- N or S/T
- n.b. not ‘x’
Why can you introduce a point mutation to either N or S/T in order to remove N-linked glycan? and not at ‘x’?
Both N and S/T are important for recognition
'x' can be any amino acid, so mutating it has no effect on recognition
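As a sketch of the point mutation, the helper below removes one N-x-S/T site by changing either the N or the S/T. The substitutions used (N to Q, S/T to A) are common choices assumed for this example; the cards do not say which replacement residue to use, and the function name is hypothetical.

```python
def remove_sequon(seq, asn_position, strategy="N_to_Q"):
    """Return a copy of `seq` with the N-x-S/T site at `asn_position`
    (1-based position of the N) disrupted by a single point mutation."""
    i = asn_position - 1
    if i < 0 or i + 2 >= len(seq) or seq[i] != "N" or seq[i + 2] not in "ST":
        raise ValueError("no N-x-S/T motif starts at this position")
    residues = list(seq)
    if strategy == "N_to_Q":
        residues[i] = "Q"      # mutate the asparagine
    elif strategy == "ST_to_A":
        residues[i + 2] = "A"  # mutate the serine/threonine
    else:
        raise ValueError("strategy must be 'N_to_Q' or 'ST_to_A'")
    return "".join(residues)


# Hypothetical five-residue sequence with one site at N2.
mutant_n = remove_sequon("MNGSA", 2)
mutant_st = remove_sequon("MNGSA", 2, strategy="ST_to_A")
```

Note that the x position is never touched, in line with the answer above.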
What percentage of the human proteome is predicted to be disordered (no fixed order)?
37-50% (large amount!)
How would you define a disordered region compared to an ordered region of a protein?
Ordered - the region vibrates around a specific conformation
Disordered - the region has no particular consensus position
Name 4 examples where you might see regions of disorder in a protein.
- disordered loops/termini in an ordered protein
- disordered linkers between domains
- longer disordered regions in an otherwise ordered protein
- largely natively unfolded proteins
Give an example of a natively disordered protein
Histones
(must flex and wrap around DNA)
What is the common function of natively unfolded proteins?
to act as “hubs” for many proteins to bind to
Would you want to keep or remove disorder for crystallisation?
remove
Where is the ordered part of the protein usually located?
at the core
What are common traits of a disordered region of a protein? (3 key points)
more hydrophilic residues
fewer hydrophobic residues
lower complexity e.g. hydrophilic residue repeats
By comparing to ordered regions, why do disordered regions of proteins have
more hydrophilic residues
fewer hydrophobic residues
lower complexity e.g. hydrophilic residue repeats?
Folding of the ordered core is driven by the hydrophobic effect (hence ordered regions have more hydrophobic residues)
Ordered regions have higher complexity because the core uses all 20 a.a., whereas disordered regions contain repeats
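These traits can be measured on a stretch of sequence. The sketch below is an illustration under stated assumptions: the hydrophobic residue set is an arbitrary choice, and Shannon entropy is used here as a stand-in for "complexity". Neither comes from these cards, and real disorder predictors combine many more features.

```python
import math
from collections import Counter

# Assumed set of hydrophobic residues (illustrative only).
HYDROPHOBIC = set("AVLIMFWC")


def window_features(segment):
    """Return (hydrophobic fraction, Shannon entropy in bits) for a segment.

    Low hydrophobic fraction and low entropy are both traits associated
    with disordered regions.
    """
    counts = Counter(segment.upper())
    n = len(segment)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    hydrophobic_fraction = sum(counts[aa] for aa in HYDROPHOBIC) / n
    return hydrophobic_fraction, entropy


# Hypothetical segments: a serine repeat and a mixed hydrophobic stretch.
repeat_like = window_features("SSSSSSSS")
core_like = window_features("AVLIMFWC")
```

The serine repeat scores zero on both measures, while the mixed hydrophobic stretch scores high on both, matching the contrast described in the answer above.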
Name the two programs commonly used to determine regions of disorder
RONN
flDPnn
What information do RONN and flDPnn use in order to predict regions of disorder within a protein?
common features of disordered regions
- also “learns” from known examples
What two programs are used to explore domain architecture?
Phyre and Fugue
How do Phyre and Fugue identify known domain architectures?
- use known PDB structures as templates
- use structural homology to determine if regions are similar to the target protein
What is one advantage and disadvantage of using Phyre and Fugue for determining domain architecture?
- adv = freely available
- disadv. = takes ~2-3 hrs
up to using de novo structures