HC6: Big data analyses in immunology Flashcards

HC 6

1
Q

Big Data: the three V’s

A
  • Volume of data: a lot
  • Velocity of processing of data: fast processing
  • Variety of data sources: multiple sources, omics part of it
2
Q

Ad-hoc tools

A

For Big Data analysis: dedicated programs are needed to gain speed
> scripting with known functions: you only need to supply the inputs and know which function to use

3
Q

Big Data experiments

A

Data-driven hypothesis testing from publicly available large datasets without performing every experiment
> possible because the raw data of every experiment has to be stored, so it can be re-used
> form a hypothesis on previous data and verify it in vitro or in vivo
> some questions can be answered with data that is already there

4
Q

Omics role in immunology

A
  • Transcriptome
  • Cytome: expression of proteins on the cell surface
5
Q

Why use single cell RNA sequencing (scRNAseq)

A
  • More detailed information of individual cells
  • Reveals expression heterogeneity and subpopulations
    > However: more expensive and more complex analysis.
6
Q

scRNAseq technologies

A
  • Plate based approach
  • Droplet
  • Microwell separation
7
Q

Plate based scRNAseq: SMART-seq2, MARS-seq

A
  • FACS
  • One cell in one well: 384 well plate
  • Physical separation
  • Cells are individually sorted into the wells of a plate containing lysis buffer > cell lysis, RT, and downstream processing
8
Q

Droplet based scRNAseq: 10xGenomics

A
  • Microfluidic chip system: each cell is encapsulated in a water-in-oil droplet together with a barcoded gel bead > cell lysis and RT within each droplet
  • Barcoded cDNA is pooled for downstream processing, but cells remain distinguishable by their barcode
  • Oil and water do not mix: this separates the cells, each with a gel bead carrying barcoded probes (hairy bead); the RNA hybridizes with the barcoded probes on the gel bead
  • Flow creates droplets with beads and cells
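The role of the barcode after pooling can be sketched as below. This is a minimal illustration, not the 10x software: the barcodes, gene names and the `demultiplex` helper are hypothetical, and real barcodes are much longer.

```python
from collections import Counter, defaultdict

# Hypothetical pooled reads as (cell_barcode, gene) pairs. The barcode stands
# for the gel bead oligo that captured the RNA inside one droplet.
pooled_reads = [
    ("AAAC", "CD3E"), ("AAAC", "CD3E"), ("AAAC", "IL7R"),
    ("GGTT", "CD19"), ("GGTT", "MS4A1"),
    ("AAAC", "CD2"), ("GGTT", "CD19"),
]

def demultiplex(reads):
    """Group pooled reads back into per-cell gene counts using the barcode."""
    per_cell = defaultdict(Counter)
    for barcode, gene in reads:
        per_cell[barcode][gene] += 1
    return per_cell

counts = demultiplex(pooled_reads)
for barcode, genes in sorted(counts.items()):
    print(barcode, dict(genes))
```

Although all cDNA is processed in one tube, every read can still be assigned to its cell of origin.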
9
Q

Microwell separation scRNAseq: BD Rhapsody

A
  • Cells are loaded in microwells together with a barcoded magnetic bead
  • Very tiny wells: flow across plate to get one cell per well
  • Cell lysis is performed within each microwell, where the RNA of each cell binds to the barcoded magnetic bead > the bead carries probes with a barcode: a unique barcode for each cell
  • Downstream processing performed on pooled beads
10
Q

scRNAseq technologies and tissue atlas generation

A

Good with 10x and Rhapsody, less with plate based

11
Q

Rare population profiling with scRNAseq techniques

A

Best with plate based
> fewer cells loaded (96- or 384-well plate), so less suited for a tissue atlas
> deep sequencing depth (10000 genes)
> 10x and Rhapsody: shallow depth (2500 genes)
> rare populations: when the population is rare and you know how to sort it, deep sequencing with plate based gives more information on these small populations
> shallow sequencing won’t cover it: for example, when only B-cells are taken, the top genes are the same, but deeper genes might expose a rare subpopulation

12
Q

Plate based scRNAseq and throughput

A

Lower throughput > time-consuming

13
Q

Increase information for multi-omics after scRNAseq

A
  • Protein markers: CITE-seq
    > for plate based, 10x and Rhapsody
  • Epigenomics: ATAC-seq
    > 10x and Rhapsody
  • BCR/TCR sequencing
    > mostly 10x and Rhapsody
14
Q

Limitation scRNAseq

A

Spatial organization is not captured > the way the single cells were organized in the tissue is completely lost
> some tissues are highly organized, and the same cell might behave differently (transcriptome) depending on where it is located, e.g. lymph nodes

15
Q

Spatial transcriptomics techniques

A
  • Visium
  • MERFISH
16
Q

MERFISH

A

In situ hybridization
> high-throughput fluorescence in situ hybridization (FISH), in which RNA expression is visualized with a complementary fluorescent probe
> tissue placed on slide
> DNA probes for the gene of interest hybridize with its RNA > see where the gene is expressed
> picture taken on microscope
> FISH: white dots where the gene of interest is expressed
> MERFISH: uses barcode multiplexing to visualize up to 10,000 genes: each gene gets a barcode sequence, and multiple rounds of fluorochrome-labelled oligo flow, each reading part of the barcode, identify all genes
> targeted analysis: only known transcripts are visualized; quantification and comparison are difficult, but it gives a nice picture
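The barcode read-out over several rounds can be sketched as below. It is a simplified illustration under stated assumptions: the 6-bit codebook is made up, each bit stands for "signal seen or not seen" in one imaging round, and allowing one misread round is a choice made for this example.

```python
# Hypothetical codebook: one binary barcode per gene, one bit per imaging round.
# Any two of these barcodes differ in 4 positions.
CODEBOOK = {
    "CD3E": "110100",
    "CD19": "011010",
    "CD14": "101001",
}

def hamming(a, b):
    """Number of rounds in which two barcodes disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode_spot(observed_bits, codebook=CODEBOOK, max_errors=1):
    """Return the gene with the nearest barcode, or None if no barcode is close enough."""
    scored = sorted((hamming(observed_bits, code), gene) for gene, code in codebook.items())
    best_distance, best_gene = scored[0]
    if best_distance > max_errors:
        return None  # too many rounds disagree with every known barcode
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous: two genes fit equally well
    return best_gene

print(decode_spot("110100"))  # exact barcode
print(decode_spot("110000"))  # one round misread
print(decode_spot("000000"))  # matches no barcode
```

Because only barcodes in the codebook can be decoded, the sketch also shows why the analysis is targeted: a transcript without a barcode is never seen.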

17
Q

Visium by 10xGenomics

A
  • Spatial glass slide which contains 4 capture areas
  • Stick the tissue section on the glass slide
  • Spots on glass slide (hairy spots): barcoded spots containing oligos (DNA probes) with a barcode for each spot (location on slide) and a sequence to catch RNA (hybridizes with the polyA tail of mRNA)
    > additional barcode for location this time!
  • Tissue is placed on the capture area, the slide goes into the Visium machine and a picture is taken
  • Then: tissue is permeabilized and RNA of each cell is released and hybridizes with spot-barcoded oligos
  • RNA is sequenced and expression per spot is obtained
  • Overlay expression on the first picture taken
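How the spot barcode turns sequencing reads back into a location can be sketched as below. The barcodes, grid positions and the `expression_map` helper are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical spot barcodes with their known (row, col) position on the slide.
SPOT_POSITIONS = {"ACGT": (0, 0), "TTAG": (0, 1), "CCGA": (1, 0)}

# Hypothetical sequenced reads as (spot_barcode, gene) pairs.
reads = [
    ("ACGT", "CD3E"), ("ACGT", "CD3E"), ("TTAG", "CD19"),
    ("CCGA", "CD3E"), ("XXXX", "CD19"),  # last barcode is not on the slide
]

def expression_map(reads, gene, positions=SPOT_POSITIONS):
    """Return {(row, col): read count} for one gene; unknown barcodes are skipped."""
    grid = {position: 0 for position in positions.values()}
    for barcode, read_gene in reads:
        if read_gene == gene and barcode in positions:
            grid[positions[barcode]] += 1
    return grid

print(expression_map(reads, "CD3E"))
```

The resulting map per position is what gets overlaid on the tissue picture.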
18
Q

Problem Visium

A

Not single cell resolution
> RNA released from up to 10 cells can hybridize with one barcoded hairy spot (multiple hairs per spot)
> 1-10 cells per spot
> bulk spatial RNA seq of small biopsy of 1-10 cells really

19
Q

Visium HD by 10x Genomics

A
  • Barcoded spots are so close to each other that single cell resolution is reached
  • Single cell transcriptome with spatial orientation
20
Q

Immunotyping

A

Characterization of immune cells based on their cellular phenotype

21
Q

THE technique for immunotyping

A

Flow cytometry
> differentiation based on expression of specific markers on immune cells
> use antibodies specific for these markers, labeled with different fluorochromes, which can distinguish the different cells
> measured via their specific emission spectra after excitation by laser

22
Q

CD markers for immune cells and what is CD

A

CD: cluster of differentiation: markers used to determine which cell it is, but they also have functions of their own
> CD3: T-cell
> CD19: B-cell
> CD14: monocyte
> CD16 / CD56: NK-cell
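The marker list above can be read as a simple decision rule. The sketch below is a simplification for illustration: the order in which markers are checked and the `classify_cell` helper are assumptions, and real gating uses signal intensities and more markers.

```python
def classify_cell(positive_markers):
    """Assign a coarse cell type from the set of markers a cell stains positive for."""
    markers = set(positive_markers)
    # Checking order is an assumption made for this example.
    if "CD3" in markers:
        return "T-cell"
    if "CD19" in markers:
        return "B-cell"
    if "CD14" in markers:
        return "monocyte"
    if "CD16" in markers or "CD56" in markers:
        return "NK-cell"
    return "unclassified"

for stained in [{"CD3"}, {"CD19"}, {"CD14"}, {"CD16", "CD56"}, set()]:
    print(sorted(stained), "->", classify_cell(stained))
```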

23
Q

Deep immunotyping

A

Simultaneous detection of large numbers (up to 50) of cellular markers

24
Q

Different ways to achieve deep immunotyping

A
  • Increased number of lasers
  • Tandem dyes
  • Spectral flow cytometry
  • CyTOF
25
Q

Deep immunotyping: increased lasers

A

Increasing the spectrum of excitation light that can be used > allows the use of additional fluorochromes
> UV/IR lasers are possible to further increase fluorochrome availability > more markers and antibodies included

26
Q

Deep immunotyping: Tandem Dyes

A

Two fluorochromes are bound together so that the emission light of the first fluorochrome, excited by the laser, excites the second fluorochrome
> shifted emission spectrum: more molecules and antibodies can be used

27
Q

Deep immunotyping: Spectral flow cytometry

A

Detection of the entire emission spectrum of a fluorochrome rather than specific wavelengths
> no single detectors that each measure one filtered band of wavelengths (fewer filters used) > entire spectrum measured
> more variety: better distinction between fluorochromes that are similar within a small band of wavelengths > more fluorochromes can be used
>> completely different machine required!

28
Q

Deep immunotyping: CyTOF

A
  • Antibodies labelled with metals
  • Each cell is subjected to mass cytometry (mix of FC and MS) by time of flight
  • For each cell you obtain a distinctive metal mass spectrum (intensity vs mass)
  • No lasers used > works with mass rather than light
  • Less overlap between the masses of different metals than between fluorochrome spectra > more antibody-metal conjugates can be used for more markers > up to 100 markers
29
Q

Deep immunotyping requires:

A

Big data analysis > the number of markers that are simultaneously detected increases
> need for dimensionality reduction and advanced computational tools for analysis
> dimensionality reduction for overview: UMAP > to 2 dimensions > cluster cells of similar expression / transcriptome

30
Q

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq)

A

Sequencing of the BCR and TCR repertoire
> BCR and TCR allow B- and T-cells to recognize antigen at the antigen binding site
> uniquely made per antigen
> VDJ recombination for TCR and BCR >> through cutting and pasting at the DNA level

31
Q

Variety in BCR/TCR

A
  • VDJ recombination: random choice of gene fragments
  • Overhangs appear through cutting and pasting: these are filled with random nucleotides, creating enormous variability
    > these parts are, for example, CDR3, one of the Ag binding loops with a lot of variability: junction of V-(D)-J, main site of Ag interaction
    >> this process happens independently in every B and T cell during development
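The two sources of variability above (random segment choice plus random nucleotides at the junctions) can be simulated in a few lines. This is a toy sketch: the segment names and sequences are hypothetical and far shorter than real gene segments, and the maximum insert length of 4 is an assumption.

```python
import random

# Hypothetical, shortened gene segments (2 V x 2 D x 2 J = 8 combinations).
V_SEGMENTS = {"V1": "TGTGCCAGC", "V2": "TGTGCTGTG"}
D_SEGMENTS = {"D1": "GGGACA", "D2": "ACTAGC"}
J_SEGMENTS = {"J1": "AATGAGCAG", "J2": "TACGAGCAG"}

def random_insert(rng, max_len=4):
    """Random nucleotides that fill the gap at a junction."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))

def recombine(rng):
    """Pick one V, D and J segment at random and join them with random inserts."""
    v_name, v_seq = rng.choice(sorted(V_SEGMENTS.items()))
    d_name, d_seq = rng.choice(sorted(D_SEGMENTS.items()))
    j_name, j_seq = rng.choice(sorted(J_SEGMENTS.items()))
    sequence = v_seq + random_insert(rng) + d_seq + random_insert(rng) + j_seq
    return {"v": v_name, "d": d_name, "j": j_name, "sequence": sequence}

rng = random.Random(0)  # fixed seed so the run is repeatable
receptors = [recombine(rng) for _ in range(1000)]
distinct = {r["sequence"] for r in receptors}
print(len(distinct), "distinct sequences from 8 segment combinations")
```

Even with only 8 possible segment combinations, the junctional inserts produce many more distinct sequences, which is the point of the card.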
32
Q

Can you sequence TCR/BCR with normal sequencing protocols?

A

No, these rely on a reference genome, which works for genes that are the same in every cell > shotgun fragmentation > sequencing and alignment to reference
> for TCR/BCR the reference genome only contains little blocks of V, (D), J, and C segments: alignment of the recombined receptor will never work

33
Q

AIRR seq pipeline

A
  • Do not chop DNA or RNA (no reference)
  • Long read sequencing
  • Alignment within each gene area (gene segments) > for V and J segments (variable and joining; not diversity D, too short)
  • No reference for the region between the end of V and the start of J (junction, CDR3): it has to be identified
  • Everyone has their own repertoire of TCRs and BCRs that is sequenced > expressed as: V-gene, J-gene and CDR3
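Once every read has been annotated, the repertoire can be summarized per V-gene, J-gene and CDR3 combination. The sketch below assumes the annotation step has already been done; the records, the CDR3 strings and the `clonotype_frequencies` helper are hypothetical.

```python
from collections import Counter

# Hypothetical annotated reads: each already has a V gene, J gene and CDR3.
annotated = [
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDLGYW"},
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDLGYW"},
    {"v": "IGHV1-69", "j": "IGHJ6", "cdr3": "CARGGYYYGMDVW"},
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDLGYW"},
    {"v": "IGHV4-34", "j": "IGHJ4", "cdr3": "CARRDYW"},
]

def clonotype_frequencies(records):
    """Fraction of reads per (V gene, J gene, CDR3), most frequent first."""
    counts = Counter((r["v"], r["j"], r["cdr3"]) for r in records)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.most_common()}

frequencies = clonotype_frequencies(annotated)
dominant = next(iter(frequencies))
print(dominant, frequencies[dominant])
```

A summary like this makes a strongly expanded receptor stand out from the rest of the repertoire.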
34
Q

Applications AIRR-seq

A
  • Get antibodies for treatment or research: antibody discovery
    > create a new antibody: immunize mice and collect all B-cells, screen for antigen-specific BCR, sequence the BCR, express the antibodies and test for efficacy, e.g. mAbs against a CD marker
  • Diagnostics and monitoring: find the hugely expressed BCR in B-cell lymphoma and use a treatment against this BCR; when the sequence comes back during monitoring, the tumor has reawakened
  • Research: understanding the immune system in health and disease
    >> understand TCR-epitope binding: super powerful for cellular therapies, diagnostics and vaccine design: recognize an epitope, as in CAR therapy, and design a specific CAR to kill the tumor, because finding and expanding ex vivo a TCR that works in the patient is expensive
    >> track antibody formation and maturation: vaccination, autoimmunity, alloimmunization (how to steer the response to get the best antibody response)
35
Q

Track B-cell maturation

A

CD71+ activated B-cell > recent GC (germinal center) graduate, rGCG > becomes either a switched memory B cell or a long-lived ASC (antibody secreting cell)
> activated B cell only a short time after infection
> decision point at exiting the GC, close expression profiles
> Immunotyping: lineage tracing in mice, barcode for BCR: V-J-CDR3 barcode for every B-cell
> multimodal / multi-omics scRNAseq with BD Rhapsody > sequence BCR > phenotype cells with Ab oligos for specific markers like CDR3 (not in plate based, too many cells)
> UMAPs can be made RNA based or protein marker based.

36
Q

Steps data analysis

A

0: Input data
1: Quality control
2: Normalization, selection of HVG and scaling
3: Dimensionality reduction 1
4: Clustering
5: Dimensionality reduction 2
6: Define cell identity
6A: Find marker genes
6B: Check expression of defining genes
6C: Algorithm for cell type identification

37
Q

1. Quality control Big Data

A
  • Remove outliers: empty wells, dead cells, doublets, dying cells
    > Dying cells: high content of mitochondrial genes (should be <20%): mtDNA gets into the cytoplasm, so the fraction gets too high
    > Doublets: high gene count
    > Empty wells/droplets: low gene count
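The three filters can be written as one rule per cell. In this sketch only the 20% mitochondrial cutoff comes from the card; the gene count limits (200 and 6000), the example cells and the `passes_qc` helper are assumptions that would be tuned per dataset.

```python
# Hypothetical per-cell summaries.
cells = [
    {"id": "c1", "n_genes": 1800, "mito_fraction": 0.05},  # healthy single cell
    {"id": "c2", "n_genes": 90,   "mito_fraction": 0.02},  # likely empty well/droplet
    {"id": "c3", "n_genes": 7200, "mito_fraction": 0.04},  # likely doublet
    {"id": "c4", "n_genes": 1500, "mito_fraction": 0.35},  # likely dying cell
]

def passes_qc(cell, min_genes=200, max_genes=6000, max_mito=0.20):
    """Keep cells with a plausible gene count and a mitochondrial fraction below 20%."""
    enough_genes = cell["n_genes"] >= min_genes   # removes empty wells/droplets
    not_doublet = cell["n_genes"] <= max_genes    # removes doublets
    alive = cell["mito_fraction"] < max_mito      # removes dying cells
    return enough_genes and not_doublet and alive

kept = [cell["id"] for cell in cells if passes_qc(cell)]
print(kept)
```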
38
Q

2: Normalization, selection of HVG and scaling

A

Pre-processing step performed to reduce experimental and technical confounders in the dataset in order to highlight biological signals
1: Data normalization
2: Selection of Highly Variable Genes (HVG): important step: select genes that show the highest variation across cells and are probably related to specific cell behavior or phenotype. Downstream analysis is performed on these genes
3: Data scaling, for visualization

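The three sub-steps can be sketched on a tiny count matrix. This is a minimal illustration under stated assumptions: the counts are made up, normalization to 10,000 counts per cell followed by a log transform is one common choice rather than the only one, and genes are ranked by plain variance.

```python
import math
from statistics import mean, pstdev, pvariance

# Hypothetical raw counts: rows are cells, columns follow GENES.
# Cell 2 was sequenced twice as deep as cells 1 and 4.
GENES = ["ACTB", "CD3E", "CD19", "GAPDH"]
raw = [
    [50, 20, 0, 30],
    [100, 0, 40, 60],
    [25, 10, 0, 15],
    [50, 0, 20, 30],
]

def normalize(counts, target=10_000):
    """Scale each cell to the same total count, then log-transform."""
    return [[math.log1p(x / sum(row) * target) for x in row] for row in counts]

def top_variable_genes(matrix, genes, n_top):
    """Rank genes by their variance across cells and keep the top n_top."""
    variances = {g: pvariance([row[i] for row in matrix]) for i, g in enumerate(genes)}
    return sorted(genes, key=lambda g: variances[g], reverse=True)[:n_top]

def scale(matrix, genes, selected):
    """Z-score each selected gene across cells (mean 0, standard deviation 1)."""
    columns = [genes.index(g) for g in selected]
    stats = {i: (mean(row[i] for row in matrix), pstdev([row[i] for row in matrix]))
             for i in columns}
    return [[(row[i] - stats[i][0]) / stats[i][1] for i in columns] for row in matrix]

normalized = normalize(raw)
hvg = top_variable_genes(normalized, GENES, n_top=2)
scaled = scale(normalized, GENES, ["CD3E", "CD19"])
print(hvg)
print(scaled)
```

After normalization ACTB and GAPDH are identical in every cell, so the depth difference is gone and only CD3E and CD19 remain as variable genes.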
39
Q

3. Dimensionality reduction

A

Big data: too many dimensions for easy analysis
> Principal Component Analysis (PCA): linear method that selects the components (groups of genes) that together induce most variation in the data
> Elbow plot: shows how much variance is explained by the top PCs > when adding a PC gives little increase in variance > stop including > choose a set number of PCs

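The elbow rule can be written as a simple cutoff on the variance explained per PC. The sketch below does not compute a PCA; it assumes the variance explained per component is already available, and both the example values and the 2% cutoff are assumptions (in practice the elbow is often judged by eye).

```python
from itertools import accumulate

def choose_n_pcs(explained_variance_ratio, min_gain=0.02):
    """Keep PCs until the next one adds less than min_gain of the total variance."""
    n_pcs = 0
    for ratio in explained_variance_ratio:
        if ratio < min_gain:
            break
        n_pcs += 1
    return n_pcs

# Hypothetical variance explained per PC, sorted from PC1 downwards.
ratios = [0.30, 0.18, 0.10, 0.05, 0.015, 0.012, 0.010]

print([round(total, 3) for total in accumulate(ratios)])  # cumulative variance
print(choose_n_pcs(ratios))
```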
40
Q

4. Clustering

A

Comparing each cell to every other cell in scRNAseq to identify differences between cells is time-consuming and computationally demanding
> single cells can be of the same cell type
> clustering: cells with a very similar transcriptional profile are grouped together and treated as one entity in comparative analysis
>> too strict clustering: separation of cells that are the same type
>> too loose clustering: grouping of cells that are different types

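The effect of too strict or too loose clustering can be shown with a toy example. This sketch uses a simple distance threshold (single linkage) as a stand-in; it is not the method of any particular scRNAseq package, and the points and thresholds are hypothetical.

```python
import math

def cluster(points, max_distance):
    """Group points that are connected by steps of at most max_distance."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # spread the label to every reachable neighbour
            i = stack.pop()
            for j, other in enumerate(points):
                if labels[j] is None and math.dist(points[i], other) <= max_distance:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical cells in a reduced 2D space: two real groups, far apart.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]

for threshold in (0.5, 1.5, 9):
    labels = cluster(points, threshold)
    print(threshold, labels, len(set(labels)), "clusters")
```

With the smallest threshold every cell becomes its own cluster (too strict), with the largest everything merges into one (too loose), and the middle value recovers the two groups.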
41
Q

5. Dimensionality reduction 2

A

For visualization purposes > too many dimensions to see > reduce to a two-dimensional picture for a biologically meaningful representation
> tSNE or UMAP
> tSNE focuses on local data structures
> UMAP focuses on both local and global data structures

42
Q

6. Define cell identity

A

A: Unsupervised approach: find marker genes
> find marker genes of each cluster, specifically expressed in that cluster, whose pattern could define the cells of that cluster
> differential expression analysis among all clusters against each other
B: Supervised approach: check expression of defining genes
> when you know which cells are included in the dataset and you know the marker genes > check expression of marker genes
C: Semi-supervised: algorithm for cell type identification
> you roughly know which cells might be in the dataset, but have no defining marker genes for the cell types
> algorithm that tries to match the total expression profile of all cells with that of specifically defined and sorted immune cell populations from databases
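Approach A can be sketched as a comparison of each cluster against all other cells. This is a simplified illustration: it ranks genes by log2 fold change of the mean expression with a pseudocount of 1 and applies no statistical test; the expression values, cluster labels and the `find_markers` helper are hypothetical.

```python
import math
from statistics import mean

def find_markers(expression, clusters, genes, top_n=1):
    """Per cluster, rank genes by log2 fold change versus all other cells.

    Needs at least two clusters, otherwise there is nothing to compare against.
    """
    markers = {}
    for cluster_id in sorted(set(clusters)):
        inside = [row for row, c in zip(expression, clusters) if c == cluster_id]
        outside = [row for row, c in zip(expression, clusters) if c != cluster_id]
        scores = {}
        for i, gene in enumerate(genes):
            mean_in = mean(row[i] for row in inside)
            mean_out = mean(row[i] for row in outside)
            # Pseudocount of 1 avoids division by zero for unexpressed genes.
            scores[gene] = math.log2((mean_in + 1) / (mean_out + 1))
        markers[cluster_id] = sorted(genes, key=lambda g: scores[g], reverse=True)[:top_n]
    return markers

# Hypothetical normalized expression: rows are cells, columns follow GENES.
GENES = ["CD3E", "CD19", "CD14"]
expression = [
    [9, 0, 1], [8, 1, 0],  # cluster 0
    [0, 7, 1], [1, 9, 0],  # cluster 1
    [0, 1, 8], [1, 0, 9],  # cluster 2
]
clusters = [0, 0, 1, 1, 2, 2]

markers = find_markers(expression, clusters, GENES)
print(markers)
```

The top gene per cluster can then be compared with known markers, which connects approach A to approach B.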