Lecture 11&12 - Neural Networks for Protein Structure Flashcards
What is an embedding?
A representation of an object (e.g. a word or an amino acid) as a vector in a high-dimensional continuous space. These embeddings are usually initialized randomly and then learned during training.
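For instance, a minimal NumPy sketch of a randomly initialised embedding table for amino acids (the alphabet and embedding dimension here are illustrative assumptions; in a real model the table is updated during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: the 20 standard amino acids, embedded in 8 dimensions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM = 8

# Randomly initialised embedding table: one vector per amino acid.
embedding_table = rng.normal(size=(len(AMINO_ACIDS), EMBED_DIM))

def embed(sequence: str) -> np.ndarray:
    """Map a protein sequence to a matrix of per-residue embedding vectors."""
    indices = [AMINO_ACIDS.index(aa) for aa in sequence]
    return embedding_table[indices]   # shape: (len(sequence), EMBED_DIM)

print(embed("MKTAYIAKQR").shape)      # (10, 8)
```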
How is similarity measured between 2 vectors?
Cosine similarity is calculated (the cosine of the angle between the two vectors). The Euclidean distance can also be used.
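For example, both measures for a pair of vectors (a minimal NumPy sketch):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance between the two points."""
    return float(np.linalg.norm(u - v))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))   # 1.0 (parallel vectors)
print(euclidean_distance(a, b))  # ~3.74
```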
What is the role of a transformer?
The goal of a transformer is to take a set of vectors representing something and transform them into new vectors enriched with contextual information.
Transformers themselves are permutation invariant: without added positional information, the order of the input vectors does not affect the resulting set of output vectors.
What is a self attention mechanism?
A self-attention mechanism is able to capture relationships between terms and how relevant they are in context.
What are the steps in a scaled dot-product self-attention mechanism?
- Dot products are calculated between all pairs of input vectors, giving a similarity score for each pair.
- These scores are then scaled by dividing by the square root of the input vector dimension.
- The scaled similarities are normalized using the softmax function so that each row of the matrix sums to 1. This matrix is the attention matrix.
- Each row of the attention matrix provides the weights for a weighted average of the input vectors.
- These weighted averages become the output vectors (see the sketch below).
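A minimal NumPy sketch of these steps (single attention head, and omitting the learned query/key/value projections that real transformers apply first):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over input vectors X of shape (n, d)."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                  # dot products between all pairs, scaled
    A = softmax(scores, axis=-1)                   # attention matrix: each row sums to 1
    return A @ X                                   # each row of A weights an average of the inputs

X = np.random.default_rng(0).normal(size=(5, 16))  # 5 input vectors, 16 dimensions
print(self_attention(X).shape)                      # (5, 16): one output vector per input
```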
How can a transformer be trained to provide useful embeddings?
By training it as a BERT model (Bidirectional Encoder Representations from Transformers). An objective function is needed for this.
One such objective used by BERT involves encoding randomly masked versions of the input text (or amino acid sequence).
The transformer is then trained to correctly predict the tokens that have been masked out.
The performance of BERT in predicting the masked tokens is scored with a cross-entropy loss function.
What is masking?
Masking means replacing the original token (word/letter/amino acid) with a special placeholder token indicating "don't know" (e.g. [MASK] in BERT).
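A minimal sketch of such masking, assuming BERT's typical ~15% masking rate and a "[MASK]" placeholder (real BERT also sometimes substitutes random tokens, which is omitted here):

```python
import random

def mask_sequence(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly replace tokens with a placeholder; return masked tokens and target positions."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok          # the model is trained to predict these original tokens
        else:
            masked.append(tok)
    return masked, targets

tokens = list("MKTAYIAKQR")
masked, targets = mask_sequence(tokens)
print(masked)    # e.g. ['M', 'K', '[MASK]', 'A', ...]
print(targets)   # e.g. {2: 'T'} -- the positions the cross-entropy loss is computed on
```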
What are the different ways a language model can be used?
Single-task supervised training
Unsupervised pre-training + Supervised fine-tuning
Unsupervised pre-training + Supervised training of small downstream classifier
Unsupervised pre-training at scale + prompting (Future)
Explain Single-task Supervised Training.
The LM is trained directly on labelled sequences to predict the correct labels.
Explain Unsupervised pre-training + Supervised fine-tuning
This involves first training a transformer LM using BERT on unlabelled sequences.
Then, a new output layer is added, and the entire model is trained on labelled sequences to predict the desired labels.
Explain Unsupervised pre-training + Supervised training of small downstream classifier
Involves pre-training a transformer LM using BERT on unlabelled sequences, followed by freezing the weights of the LM.
The frozen LM outputs for labelled sequences are then used as inputs to train a new, smaller downstream classifier to predict labels.
Explain what a pre-trained language model is, and how it can be used.
For a given input sequence, the fixed pre-trained language model generates an embedding for each residue.
These embeddings can then be averaged to produce an embedding for the entire sequence, which can be used as input to train a downstream classifier for a specific prediction task.
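A minimal sketch of this pipeline; `pretrained_lm` is a hypothetical stand-in for a frozen protein language model, and scikit-learn's logistic regression plays the role of the small downstream classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

EMBED_DIM = 32
rng = np.random.default_rng(0)

def pretrained_lm(sequence: str) -> np.ndarray:
    """Hypothetical frozen language model: returns one embedding per residue.
    Random numbers stand in here; a real protein BERT model would be used instead."""
    return rng.normal(size=(len(sequence), EMBED_DIM))

def sequence_embedding(sequence: str) -> np.ndarray:
    """Average the per-residue embeddings into a single fixed-length sequence embedding."""
    return pretrained_lm(sequence).mean(axis=0)

# Toy labelled data: sequences with illustrative binary labels.
sequences = ["MKTAYIAKQR", "GAVLIPFMW", "STCYNQDEKRH", "AAAAGGGG"]
labels = [0, 1, 0, 1]

X = np.stack([sequence_embedding(s) for s in sequences])
clf = LogisticRegression().fit(X, labels)   # the small downstream classifier
print(clf.predict(X))
```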
What are correlated mutations in proteins and how is it used in protein structure prediction?
Residues in close proximity have a tendency to covary, likely to maintain a stable microenvironment.
This means that changes at one site can be compensated for by a mutation in another site.
By observing patterns of covarying residues in deep MSAs of homologous sequences, we can infer structural information, specifically accurate lists of residue pairs that are in contact.
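As a simple illustration of the covariation idea (not the method AlphaFold uses), the mutual information between two MSA columns can be computed; column pairs with high mutual information are candidate contacts, and methods such as direct coupling analysis refine this by removing indirect correlations:

```python
import numpy as np
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information between two MSA columns (one residue per aligned sequence)."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        pab = count / n
        mi += pab * np.log(pab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Toy MSA: rows are homologous sequences, columns are alignment positions.
msa = ["MKLV", "MRLI", "MKLV", "MRLI", "MKMV"]
columns = list(zip(*msa))
print(mutual_information(columns[1], columns[3]))  # high MI: columns 1 and 3 covary
```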
What is the outline of AlphaFold?
AlphaFold2 in outline ENCODES a multiple sequence alignment using transformer blocks to produce an EMBEDDING, and then DECODES the MSA embedding to generate 3-D coordinates.
How are transformers applied to MSAs?
Attention mechanisms can be applied along rows (across different residues in a single sequence) and down columns (across the same residue position in different homologous sequences).
Columns with gaps in the target sequence are often removed for simplicity.
BERT can again be used for training models on MSAs.
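A minimal sketch of row- and column-wise attention over an MSA embedding array of shape (number of sequences, alignment length, embedding dimension); AlphaFold2's actual MSA attention blocks are considerably more elaborate (e.g. gating and pair-representation bias):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X):
    """Scaled dot-product self-attention over the rows of X, shape (n, d)."""
    n, d = X.shape
    A = softmax(X @ X.T / np.sqrt(d))
    return A @ X

def msa_attention(msa_embed):
    """Row attention (within each sequence), then column attention (within each position)."""
    n_seq, n_pos, d = msa_embed.shape
    # Row-wise: attend across residue positions within each aligned sequence.
    rows = np.stack([attention(msa_embed[s]) for s in range(n_seq)])
    # Column-wise: attend across the homologous sequences at each alignment position.
    cols = np.stack([attention(rows[:, p]) for p in range(n_pos)], axis=1)
    return cols

msa_embed = np.random.default_rng(0).normal(size=(4, 10, 16))  # 4 sequences, 10 positions
print(msa_attention(msa_embed).shape)                           # (4, 10, 16)
```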
How is a 3-D structure created in AlphaFold?
AlphaFold2 produces a 3-D structure from the processed MSA information.
AlphaFold projects the high-dimensional internal representation down to 3-D coordinates.
The structure module ensures that the 3-D structure produced is consistent with steric and torsion-angle constraints.
What are some limitations of AlphaFold?
Model quality depends on having a good multiple sequence alignment: sequences with few homologs may result in poor predictions.
Reliance on evolutionary information means that AF2 cannot predict mutation effects, or structures with little co-evolutionary signal such as antibodies.
It only produces a single “maximum likelihood” conformation: there is a need for better methods that can model conformational change.