Lecture 11&12 - Neural Networks for Protein Structure Flashcards
What is an embedding?
A representation of an object (e.g. a word or an amino acid) as a vector in a high-dimensional continuous space. These embeddings are usually initialized randomly and then learned during training.
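For instance, a minimal NumPy sketch of a randomly initialised embedding table for amino acids (the alphabet and embedding dimension here are illustrative assumptions; in a real model the table is updated during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: the 20 standard amino acids, embedded in 8 dimensions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM = 8

# Randomly initialised embedding table: one vector per amino acid.
embedding_table = rng.normal(size=(len(AMINO_ACIDS), EMBED_DIM))

def embed(sequence: str) -> np.ndarray:
    """Map a protein sequence to a matrix of per-residue embedding vectors."""
    indices = [AMINO_ACIDS.index(aa) for aa in sequence]
    return embedding_table[indices]   # shape: (len(sequence), EMBED_DIM)

print(embed("MKTAYIAKQR").shape)      # (10, 8)
```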
How is similarity measured between 2 vectors?
Cosine similarity is calculated (the cosine of the angle between the two vectors). The Euclidean distance can also be used.
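For example, both measures for a pair of vectors (a minimal NumPy sketch):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance between the two points."""
    return float(np.linalg.norm(u - v))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))   # 1.0 (parallel vectors)
print(euclidean_distance(a, b))  # ~3.74
```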
What is the role of a transformer?
The goal of a transformer is to take a set of vectors representing something and transform them into new vectors enriched with contextual information.
Transformers themselves are permutation invariant: without added positional information, the order of the input vectors does not affect the resulting set of output vectors.
What is a self attention mechanism?
A self-attention mechanism is able to capture relationships between terms and how relevant they are in context.
What are the steps in a scaled dot-product self-attention mechanism?
- Dot products are calculated between all pairs of input vectors, giving a similarity score for each pair.
- These scores are then scaled by dividing by the square root of the input vector dimension.
- The scaled similarities are normalized using the softmax function so that each row of the matrix sums to 1. This matrix is the attention matrix.
- Each row of the attention matrix provides the weights for a weighted average of the input vectors.
- These weighted averages become the output vectors (see the sketch below).
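A minimal NumPy sketch of these steps (single attention head, and omitting the learned query/key/value projections that real transformers apply first):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over input vectors X of shape (n, d)."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                  # dot products between all pairs, scaled
    A = softmax(scores, axis=-1)                   # attention matrix: each row sums to 1
    return A @ X                                   # each row of A weights an average of the inputs

X = np.random.default_rng(0).normal(size=(5, 16))  # 5 input vectors, 16 dimensions
print(self_attention(X).shape)                      # (5, 16): one output vector per input
```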
How can a transformer be trained to provide useful embeddings?
By training it as a BERT model (Bidirectional Encoder Representations from Transformers). An objective function is needed for this.
One such objective used by BERT involves encoding randomly masked versions of the input text (or amino acid sequence).
The transformer is then trained to correctly predict the tokens that have been masked out.
The performance of BERT in predicting the masked tokens is scored with a cross-entropy loss function.
What is masking?
Masking means replacing the original token (word/letter/amino acid) with a special placeholder token indicating "don't know" (e.g. [MASK] in BERT).
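A minimal sketch of such masking, assuming BERT's typical ~15% masking rate and a "[MASK]" placeholder (real BERT also sometimes substitutes random tokens, which is omitted here):

```python
import random

def mask_sequence(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly replace tokens with a placeholder; return masked tokens and target positions."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok          # the model is trained to predict these original tokens
        else:
            masked.append(tok)
    return masked, targets

tokens = list("MKTAYIAKQR")
masked, targets = mask_sequence(tokens)
print(masked)    # e.g. ['M', 'K', '[MASK]', 'A', ...]
print(targets)   # e.g. {2: 'T'} -- the positions the cross-entropy loss is computed on
```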
What are the different ways a language model can be used?
Single-task supervised training
Unsupervised pre-training + Supervised fine-tuning
Unsupervised pre-training + Supervised training of small downstream classifier
Unsupervised pre-training at scale + prompting (Future)
Explain Single-task Supervised Training.
The LM is trained directly on labelled sequences to predict the correct labels.
Explain Unsupervised pre-training + Supervised fine-tuning
This involves first training a transformer LM using BERT on unlabelled sequences.
Then, a new output layer is added, and the entire model is trained on labelled sequences to predict the desired labels.
Explain Unsupervised pre-training + Supervised training of small downstream classifier
Involves pre-training a transformer LM using BERT on unlabelled sequences, followed by freezing the weights of the LM.
The frozen LM outputs for labelled sequences are then used as inputs to train a new, smaller downstream classifier to predict labels.
Explain what a pre-trained language model is, and how it can be used.
For a given input sequence, the fixed pre-trained language model generates an embedding for each residue.
These embeddings can then be averaged to produce an embedding for the entire sequence, which can be used as input to train a downstream classifier for a specific prediction task.
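A minimal sketch of this pipeline; `pretrained_lm` is a hypothetical stand-in for a frozen protein language model, and scikit-learn's logistic regression plays the role of the small downstream classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

EMBED_DIM = 32
rng = np.random.default_rng(0)

def pretrained_lm(sequence: str) -> np.ndarray:
    """Hypothetical frozen language model: returns one embedding per residue.
    Random numbers stand in here; a real protein BERT model would be used instead."""
    return rng.normal(size=(len(sequence), EMBED_DIM))

def sequence_embedding(sequence: str) -> np.ndarray:
    """Average the per-residue embeddings into a single fixed-length sequence embedding."""
    return pretrained_lm(sequence).mean(axis=0)

# Toy labelled data: sequences with illustrative binary labels.
sequences = ["MKTAYIAKQR", "GAVLIPFMW", "STCYNQDEKRH", "AAAAGGGG"]
labels = [0, 1, 0, 1]

X = np.stack([sequence_embedding(s) for s in sequences])
clf = LogisticRegression().fit(X, labels)   # the small downstream classifier
print(clf.predict(X))
```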
What are correlated mutations in proteins and how is it used in protein structure prediction?
Residues in close proximity have a tendency to covary, likely to maintain a stable microenvironment.
This means that changes at one site can be compensated for by a mutation in another site.
By observing patterns of covarying residues in deep MSAs of homologous sequences, we can infer structural information, specifically accurate lists of residue pairs that are in contact.
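As a simple illustration of the covariation idea (not the method AlphaFold uses), the mutual information between two MSA columns can be computed; column pairs with high mutual information are candidate contacts, and methods such as direct coupling analysis refine this by removing indirect correlations:

```python
import numpy as np
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information between two MSA columns (one residue per aligned sequence)."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        pab = count / n
        mi += pab * np.log(pab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Toy MSA: rows are homologous sequences, columns are alignment positions.
msa = ["MKLV", "MRLI", "MKLV", "MRLI", "MKMV"]
columns = list(zip(*msa))
print(mutual_information(columns[1], columns[3]))  # high MI: columns 1 and 3 covary
```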
What is the outline of AlphaFold?
AlphaFold2 in outline ENCODES a multiple sequence alignment using transformer blocks to produce an EMBEDDING, and then DECODES the MSA embedding to generate 3-D coordinates.
How are transformers applied to MSAs?
Attention mechanisms can be applied along rows (across different residues in a single sequence) and down columns (across the same residue position in different homologous sequences).
Columns with gaps in the target sequence are often removed for simplicity.
BERT can again be used for training models on MSAs.
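A minimal sketch of row- and column-wise attention over an MSA embedding array of shape (number of sequences, alignment length, embedding dimension); AlphaFold2's actual MSA attention blocks are considerably more elaborate (e.g. gating and pair-representation bias):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X):
    """Scaled dot-product self-attention over the rows of X, shape (n, d)."""
    n, d = X.shape
    A = softmax(X @ X.T / np.sqrt(d))
    return A @ X

def msa_attention(msa_embed):
    """Row attention (within each sequence), then column attention (within each position)."""
    n_seq, n_pos, d = msa_embed.shape
    # Row-wise: attend across residue positions within each aligned sequence.
    rows = np.stack([attention(msa_embed[s]) for s in range(n_seq)])
    # Column-wise: attend across the homologous sequences at each alignment position.
    cols = np.stack([attention(rows[:, p]) for p in range(n_pos)], axis=1)
    return cols

msa_embed = np.random.default_rng(0).normal(size=(4, 10, 16))  # 4 sequences, 10 positions
print(msa_attention(msa_embed).shape)                           # (4, 10, 16)
```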
How is a 3-D structure created in AlphaFold?
AlphaFold2 produces a 3-D structure from the processed MSA information.
AlphaFold projects the high-dimensional internal representation down to 3-D coordinates.
The structure module ensures that the 3-D structure produced is consistent with steric and torsion-angle constraints.
What are some limitations of AlphaFold?
Model quality depends on having a good multiple sequence alignment: sequences with few homologs may result in poor predictions.
Reliance on evolutionary information means that AF2 cannot predict mutation effects, or structures with little co-evolutionary signal such as antibodies.
It only produces a single “maximum likelihood” conformation: there is a need for better methods that can model conformational change.