Miscellaneous Flashcards
In humans, how much does the size of a gene vary?
From a few hundred DNA bases to more than 2 million
How many chromosomes, base pairs and genes are in the human genome?
23 pairs of chromosomes
3.2 billion base pairs (6.4 billion bases)
20,000-25,000 genes
How many genes in human genome?
20,000-25,000
Difference between intron and exon?
Introns are non-coding DNA sequences within a gene that are removed by RNA splicing during maturation of the RNA product.
Exons are protein-coding DNA sequences that have the necessary codons for protein synthesis.
How big is a typical IgG mAb (in kDa and number of aa)?
Roughly 150 kDa and 1400 amino acids long
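The two figures are consistent with each other. A quick sanity check, assuming a rule-of-thumb average of about 110 Da per amino acid residue (an assumption, not from the card) and ignoring glycans:

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average mass per residue (assumption)

def approx_mass_kda(n_residues: int) -> float:
    """Estimate protein mass in kDa from residue count, ignoring glycosylation."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000

igg_mass_kda = approx_mass_kda(1400)
print(f"~{igg_mass_kda:.0f} kDa")
```

The estimate lands close to the quoted 150 kDa; the exact value depends on the sequence and glycosylation.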
How is a mAb’s binding surface constructed?
The mAb’s binding surface is the paratope, made up of 6 CDRs or distinct variable loops: 3 on the light chain (L1, L2, L3) and 3 on the heavy chain (H1, H2, H3)
Why is the CDR-H3 loop so challenging?
The H3 loop is challenging because it is highly diverse in length, sequence and conformation
What was the first tumor-agnostic FDA approval?
In 2017, the FDA approved pembrolizumab (Keytruda, Merck; anti-PD-1) for patients with unresectable or metastatic solid tumors that are microsatellite instability-high (MSI-H) or mismatch repair deficient (dMMR)
Tumors with a deficient mismatch repair (dMMR) system can develop MSI. Microsatellites are regions of repeated DNA that change in length (show instability) when mismatch repair is not working properly
What are two main types of immunogenicity?
1) Driven by T cells (easy to identify by AI)
2) Driven by immune complex formation (typically because of aggregation)
How many known proteins, protein 3D structures, protein-protein complexes and antibody-antigen complexes are there in the PDB?
200 million known proteins
≈200,000 protein 3D structures
≈100,000 protein-protein complexes
3000 antibody-antigen complexes
How is TCR (octameric) complex organized?
What are CD4/CD8?
2 TCR chains (alpha and beta in ~95% of T cells; gamma and delta in ~5%) that form the ligand-binding site (diversity); associated in the membrane with CD3 (3 distinct chains arranged as 2 heterodimers: epsilon-gamma and epsilon-delta) and a zeta-zeta homodimer, giving 6 signaling chains and 8 chains in total
CD4/CD8 are co-receptors
What are 3 signals for T cell activation?
- Signal 1: antigen recognition (peptide:MHC binds to TCR/CD4 (or CD8); typically takes place in secondary lymphoid organs)
- Signal 2: co-stimulation. Th cells: CD28 binds to CD80 (B7.1) or CD86 (B7.2) on APC –> T cell proliferation. Cytotoxic T cells: co-stimulation (CD70 and 4-1BB/CD137). Lack of signal 2 leads to anergy/tolerance
- Signal 3: cytokines
First TCE to be FDA-approved?
Blinatumomab (Blincyto), CD3/CD19, approved in 2014 for R/R B-ALL. Short half-life (~2 hrs), so it requires continuous IV infusion
How many codons exist?
There are 64 different codons: 61 specify amino acids and 3 are used as stop signals.
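The count follows from 4 bases at each of 3 positions (4^3 = 64). A small sketch that enumerates them, using the three stop codons of the standard genetic code:

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

# Every ordered triplet of bases is a codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons), len(sense_codons), len(STOP_CODONS))  # 64 61 3
```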
How many GPCRs are there, and how many are drug target opportunities?
Approx. 800; over half are sensory/olfactory, and the remaining ~370 present drug-targeting opportunities
GPCR-targeting drugs represent about 34% of FDA-approved drugs
What are the two types of epitopes?
Continuous residues –> linear epitope
Discontinuous residues –> conformational epitope
Describe the pMHC complex
pMHC complex consists of: HLA class I heavy chain, beta2-microglobulin and peptide
The MHC class I heavy chain consists of three extracellular domains (α1, α2, α3), a transmembrane domain, and a cytoplasmic domain. β2 microglobulin forms a fourth extracellular domain and is held in the complex by noncovalent interactions.
How many aa exist and what are the major classes?
20 natural amino acids
Major classes: hydrophobic, aromatic, charged, polar
How is 10^36 search space calculated?
20 natural aa at each of ~30 positions in the H1/H2/H3 loops gives 20^30 ≈ 10^39 combinations. Combinations that clearly don’t make valid antibodies are then subtracted, which brings the estimate down toward 10^36.
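The arithmetic behind the upper bound can be checked directly. This sketch only reproduces the 20^30 figure; how many invalid combinations are removed to reach 10^36 is not specified in the card and is not modeled here:

```python
import math

N_AMINO_ACIDS = 20
N_POSITIONS = 30  # approximate number of H1/H2/H3 loop positions, per the card

combinations = N_AMINO_ACIDS ** N_POSITIONS
log10_combinations = N_POSITIONS * math.log10(N_AMINO_ACIDS)

print(f"20^30 = {combinations:.2e} (about 10^{log10_combinations:.1f})")
```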
What is the repertoire of unique sequences of antibodies in the human immune system?
10^13
What is the RMSD in structural biology?
The root-mean-square deviation (RMSD) is the average distance (typically in Å) between corresponding atoms of two superimposed structures. It is used to measure how close a prediction or model is to a reference or target structure. A lower RMSD value indicates a more accurate prediction.
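A minimal sketch of the calculation, assuming the two structures are already superimposed and their atoms are paired by index (real tools also perform the optimal superposition first, which is omitted here). The coordinates are made up for illustration:

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates: the prediction is the reference shifted by 1 Å in z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(reference, predicted))  # 1.0
```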
What is the Rosetta classical scoring function?
Calculation of the energy/score of a protein structure based on its 3D shape and physical and chemical properties (bond angles, distances etc).
A lower score is better (indicates more stable and energetically favourable structure)
What is a diffusion model?
Diffusion models are generative models that work by gradually adding Gaussian noise to the training data (aka the forward diffusion process) and then learning to reverse the process (aka denoising or the reverse diffusion process) to recover the data. New samples are generated by running the learned reverse process starting from pure noise
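The forward (noising) half can be sketched in a few lines. This assumes a DDPM-style formulation with a linear noise schedule; the schedule values, step count and toy data vector are illustrative assumptions, and the learned denoiser (the reverse process) is not implemented:

```python
import math
import random

def make_alpha_bars(n_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (illustrative values)."""
    alpha_bars, running = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from x_0 in one shot: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise  # a denoiser would be trained to predict `noise` from (xt, t)

rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [1.0, -0.5, 0.25]  # toy "data point"
for t in (0, 50, 99):
    xt, _ = forward_diffuse(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 3), [round(v, 3) for v in xt])
```

As t grows, the signal weight sqrt(abar_t) shrinks and the sample is increasingly dominated by noise.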
How is a latent space a lower-dimensional representation of the antibody structure?
In the case of antibody structures, the latent space is a mathematical representation of the structure that captures the most important features of the antibody (e.g. overall shape, positions of the amino acids and interactions between the amino acids).
How are camelid and shark antibodies unique?
Camelid: heavy-chain-only antibodies with no LC and no CH1 domain; antigen is bound by a single variable domain (VHH); CDR-H3 loop is typically longer than in human mAbs
Shark: VNARs (variable domains of new antigen receptor) are the smallest antigen-binding domains found in nature; no LC
Each H chain: 5 CNAR domains + 1 VNAR. HC does not have a CDR-H2 (only H1 and H3)
What is a generative model?
Generative modeling is a way for computers to learn how to create new things that are similar to things they’ve seen before (i.e. the training data)
What are:
- ProteinMPNN
- AbFormer
- AbFold
ProteinMPNN: external benchmark, published and used out of the box (Baker lab)
AbFormer: LLM (sequence only, no structural information)
Generates CDR sequences that are similar to human CDR sequences
Masked-language model; trained by randomly masking an aa and learning to reconstruct the missing aa in that sequence; trained on 1.3 billion VH sequences from 80 studies (OAS) –> 36 million unique CDR3 sequences + 1 million unique CDR1 + 1 million unique CDR2 sequences.
Then fine-tune AbFormer on therapeutic antibodies
AbFold: H3 design tool based on AlphaFold and AbFormer
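The masked-language-model objective described for AbFormer can be illustrated with a toy masking step. This is only a sketch of the general technique, not AbFormer's actual code; the mask symbol and the example sequence are hypothetical:

```python
import random

MASK = "#"  # hypothetical mask symbol; real models use a dedicated token id

def mask_one_residue(sequence, rng):
    """Return (masked_sequence, position, hidden_residue) for one randomly masked position."""
    pos = rng.randrange(len(sequence))
    masked = sequence[:pos] + MASK + sequence[pos + 1:]
    return masked, pos, sequence[pos]

rng = random.Random(0)
cdr3 = "ARDYYGSSYFDY"  # made-up CDR-H3-like sequence, for illustration only
masked, pos, target = mask_one_residue(cdr3, rng)
print(masked, pos, target)
```

During training, the model sees the masked sequence and is scored on how well it predicts the hidden residue at the masked position.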
What are the Fable Foundation Models (Fable-FM)?
Fable-Abformer (enhances manufacturability and ensures properly folded abs)
Fable-RE:
1) Equivariant transformer (global frame vector graph-based transformer)
2) Transfer learning (pre-training on large sets of protein structure data; to overcome data scarcity challenge)
What are generative and semi-generative models built on top of Fable-FM?
1) RE-Dock = antibody pose on epitope; RE-Diff-NG = will generate poses and loops (Q3)
2) RE-Diff/Abfold = CDR loop conformations (fully de novo CDR loops)
3) RE-Masked lang model = sequence / side chain conformations
Fable pitch deck: what does ab design require?
1) Antibody structure compatible with high affinity binding
2) Antibody sequence that is developable
Fable pitch deck: what are the four things going on at the same time?
- Ab pose on epitope (identify ab pose on chosen epitope suitable for high affinity binding)
- CDR loop conformations (given ab pose, design CDR loop conformations for high affinity binding)
- Sequence/side chain conformations (given pose and CDR loops, identify suitable side-chain sequences)
- Manufacturability (choose sequences suitable for human antibody-like behavior)
Why is it easier to in silico predict T cell epitopes compared to B cell epitopes?
T cell epitopes are linear peptides presented in the context of MHC, whereas B cell epitopes can be conformational (i.e. they depend on structural folds in addition to specific aa sequences)
How many total T cells are there in a human, and how many unique TCRs are expressed within this population?
~3x10^11 total T cells
~1x10^8 unique TCRs within this population
What is the self-attention mechanism?
Allows LLMs to weigh the importance of every other word (token) in a sentence when processing each word and generating text
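A minimal sketch of scaled dot-product self-attention, the standard form of this mechanism. The toy embeddings are made up, and the learned query/key/value projections of a real model are omitted (the embeddings are used directly as an assumption):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one row per token."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this token attends to each token
        all_weights.append(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, all_weights

# Toy 2-dimensional embeddings for three tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention weights over all tokens; the weights in a row sum to 1, and the output for that token is the correspondingly weighted average of the value vectors.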
What is Werewolf Tx’s platform?
Conditionally activated cytokines (Indukines): systemically delivered as inactive pro-drugs. Each Indukine is a single molecule containing 1) an inactivation domain and 2) a half-life extension domain, tethered by a protease-cleavable linker to 3) a fully active cytokine (e.g. IL-2)
IL-2: Ph 1 (onc)
IL-12: Ph 1 (onc)
IFNa: Ph 1 (onc, Jazz Tx)
IL-21: IND (onc)
IL-18: discovery (onc)
What are Bonum Tx’s two ATP-dependent programs?
ATP-dependent cytokines are inactive in circulation; become active with ATP present in TME
ATP-IFNa
ATP-IL12
What are Bonum Tx’s conditionally activated biologics programs?
PD-1/IL-2 (Roche)
PDL1/IFNa
LAG3/IL-2
LRRC15/IFN-a (LRRC15 targets CAFs)
LRRC15/TGFbR2
What are Biolojic’s four programs?
AU-007: mAb specific to the CD25 (IL-2Ra) binding site on IL-2 – prevents IL-2 binding to trimeric receptor (Treg), while allowing binding to dimer receptor (Teff, NK) (Ph 1, Aulos spinout)
IL-13/TSLP dual-specific ab (IND, Teva)
Treg agonist (IL-2 mAb specific for CD122/CD132 epitope, leaving IL-2 free to bind CD25 on Treg) (discovery)
TNFR2 agonist (with Nektar)
What’s in Synthekine’s pipeline for I&I?
Ph 1: IL-2 + CD19 CAR (no lymphodepletion)
Preclinical: engineered IL-10 (partial agonist)
Preclinical: engineered IL-22 (partial agonist)
Where do mAbs bind to FcRn?
IgG monoclonal antibodies (mAbs) bind to the neonatal Fc receptor (FcRn) primarily in the acidic environment of endosomes within cells. This interaction occurs mainly in the endothelial cells of blood vessels and in epithelial cells of tissues such as the intestine and kidney.