HC6: Big data analyses in immunology Flashcards

HC 6

1
Q

Big Data: the three V’s

A
  • Volume of data: a lot
  • Velocity of processing of data: fast processing
  • Variety of data sources: multiple sources, omics part of it
2
Q

Ad-hoc tools

A

For Big Data analysis: dedicated programs are needed to gain speed
> scripting with known functions: you only need to supply the inputs and know which function to use

3
Q

Big Data experiments

A

Data-driven hypothesis testing from publicly available large datasets without performing every experiment
> because: raw data has to be stored for every experiment, so it can be re-used
> make hypothesis on previous data and verify in vitro or in vivo
> some questions can be answered with data that is already there

4
Q

Omics role in immunology

A
  • Transcriptome
  • Cytome: proteins expressed on the cell surface
5
Q

Why use of single cell RNA sequencing (scRNAseq)

A
  • More detailed information of individual cells
  • Reveals expression heterogeneity and subpopulations
    > However: more expensive and more complex analysis.
6
Q

scRNAseq technologies

A
  • Plate based approach
  • Droplet
  • Microwell separation
7
Q

Plate based scRNAseq: SMART-seq2, MARS-seq

A
  • FACS
  • One cell in one well: 384 well plate
  • Physical separation
  • Cells are individually sorted into the wells of a plate containing lysis buffer > cell lysis, RT, and downstream processes
8
Q

Droplet based scRNAseq: 10xGenomics

A
  • Microfluidic chip system: each cell is encapsulated in an oil droplet together with a barcoded gel bead > cell lysis and RT within each droplet
  • Barcoded cDNA is pooled for downstream processing, but cells remain distinguishable by their barcode
  • Oil and water do not mix: cells are partitioned together with gel beads that carry barcoded probes (hairy bead) > RNA hybridizes with the barcoded probes on the gel bead
  • flow creates droplets with beads and cells
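
The pool-then-separate idea above can be sketched in a few lines. This is only an illustration, not the 10x pipeline: the read tuples, the barcode strings and the function name `count_by_barcode` are made up for the example.

```python
from collections import defaultdict

def count_by_barcode(reads):
    """Group pooled reads by cell barcode and count reads per gene."""
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    # Convert to plain dicts for easy inspection
    return {bc: dict(genes) for bc, genes in counts.items()}

# Hypothetical pooled reads: (cell barcode, gene the cDNA maps to)
pooled_reads = [
    ("AAAC", "CD3E"), ("AAAC", "CD3E"), ("AAAC", "IL7R"),
    ("GGTT", "CD19"), ("GGTT", "MS4A1"),
]
per_cell = count_by_barcode(pooled_reads)
print(per_cell["AAAC"])
```

Although all cDNA is processed in one pool, the barcode is enough to rebuild one expression profile per cell.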
9
Q

Microwell separation scRNAseq: BD Rhapsody

A
  • Cells are loaded in microwells together with barcoded magnetic bead
  • Very tiny wells: flow across plate to get one cell per well
  • Cell lysis is performed within each microwell where RNA of each cell binds > barcoded magnetic bead > also tagged with probes with barcode: all unique barcodes for cells
  • Downstream processing performed on pooled beads
10
Q

scRNAseq technologies and tissue atlas generation

A

Good with 10x and Rhapsody, less with plate based

11
Q

Rare population profiling with scRNAseq techniques

A

Best with plate based
> fewer cells loaded (96- or 384-well plate), so less suited for a tissue atlas
> deep sequencing depth (10000 genes)
> 10x and Rhapsody: shallow depth (2500 genes)
> rare populations: when the population is rare and you know how to sort it, deep sequencing with plate based gives more information on these small populations
> shallow sequencing won’t cover it: for example, if only B-cells are taken, the top genes are the same, but deeper genes might expose a rare subpopulation

12
Q

Plate based scRNAseq and throughput

A

Lower throughput > time-consuming

13
Q

Increase information for multi-omics after scRNAseq

A
  • Protein markers: CITE-seq
    > for plate based, 10x and Rhapsody
  • Epigenomics: ATAC-seq
    > 10x and Rhapsody
  • BCR/TCR sequencing
    > mostly 10x and Rhapsody
14
Q

Limitation scRNAseq

A

No spatial organization taken > the way the single cells were organized in the tissue is completely lost
> some tissues are highly organized, and the same cell might behave differently (transcriptome) depending on where it is located, e.g. lymph nodes

15
Q

Spatial transcriptomics techniques

A
  • Visium
  • MERFISH
16
Q

MERFISH

A

In situ hybridization
> high-throughput fluorescence in situ hybridization (FISH) in which expression of RNA is visualized by fluorescent probe with complementarity
> tissue placed on slide
> DNA probes complementary to the gene of interest hybridize with its RNA > shows where the gene is expressed
> picture taken on microscope
> FISH: white dots where the gene of interest is expressed
> MERFISH: uses barcode multiplexing to visualize up to 10,000 genes: each gene gets its own barcode sequence, and multiple rounds of fluorochrome-labelled oligo flow read out parts of the barcode to identify all genes
> targeted analysis: only known transcripts are visualized: quantification and comparison are difficult, but it gives a nice picture
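
Barcode multiplexing can be illustrated with a toy decoder. This is a sketch under simplifying assumptions: the codebook, the four imaging rounds and the exact-match rule are hypothetical, and real MERFISH codebooks are designed to tolerate read-out errors, which is ignored here.

```python
# Hypothetical codebook: one binary barcode per gene, one bit per imaging round
CODEBOOK = {
    "CD3E": (1, 1, 0, 0),
    "CD19": (1, 0, 1, 0),
    "CD14": (0, 1, 0, 1),
}

def decode_spot(signal_per_round, codebook=CODEBOOK):
    """Return the gene whose barcode matches the on/off pattern, else None."""
    pattern = tuple(signal_per_round)
    for gene, barcode in codebook.items():
        if barcode == pattern:
            return gene
    return None

# A spot that lit up in rounds 1 and 3 only
print(decode_spot([1, 0, 1, 0]))
```

With n rounds, up to 2^n - 1 non-empty on/off patterns exist, which is why a limited number of rounds can separate many genes.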

17
Q

Visium by 10xGenomics

A
  • Special glass slide which contains 4 capture areas
  • Stick tissue slide on glass slide
  • Spots on glass slide: hairy spots: barcoded spots containing oligo (DNA probe) containing barcode for each spot (location on slide) and sequence to catch RNA (hybridize with polyA of mRNA)
    > additional barcode for location this time!
  • Tissue placed on capture area and slides go into visium machine and picture is taken
  • Then: tissue is permeabilized and RNA of each cell is released and hybridizes with spot-barcoded oligos
  • RNA is sequenced and expression per spot is obtained
  • overlay expression with first picture taken
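
The role of the location barcode can be sketched as follows. The spot barcodes, their positions and the function name `expression_map` are all hypothetical; the point is only that the barcode of a read tells you where on the slide the RNA came from.

```python
from collections import Counter

# Hypothetical spot barcodes and their known (row, col) positions on the slide
SPOT_POSITION = {"SPOT_A": (0, 0), "SPOT_B": (0, 1), "SPOT_C": (1, 0)}

def expression_map(reads, gene):
    """Count reads of one gene per slide position, using the spot barcode."""
    counts = Counter()
    for spot_barcode, read_gene in reads:
        if read_gene == gene and spot_barcode in SPOT_POSITION:
            counts[SPOT_POSITION[spot_barcode]] += 1
    return dict(counts)

# Hypothetical sequenced reads: (spot barcode, gene)
reads = [("SPOT_A", "CD19"), ("SPOT_A", "CD19"), ("SPOT_B", "CD3E"), ("SPOT_C", "CD19")]
cd19_map = expression_map(reads, "CD19")
print(cd19_map)
```

The resulting position-to-count map is what gets overlaid on the tissue picture.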
18
Q

Problem Visium

A

Not single cell resolution
> up to 10 cells can release RNA and get hybridized with one barcoded hairy spot (multiple hairs on spot)
> 1-10 cells per spot
> bulk spatial RNA seq of small biopsy of 1-10 cells really

19
Q

Visium HD by 10x Genomics

A
  • Barcoded spots so close to each other, single cell resolution reached
  • Single cell transcriptome with spatial orientation
20
Q

Immunotyping

A

Characterization of immune cells based on their cellular phenotype

21
Q

THE technique for immunotyping

A

Flow cytometry
> differentiation based on expression specific markers on immune cells
> use antibodies specific for these markers, labeled with different fluorochromes, which can distinguish the different cells
> measured due to different specific emission spectra after excitation by laser

22
Q

CD markers for immune cells and what is CD

A

CD: cluster of differentiation: markers used to determine which cell type it is, but they also have functions of their own
> CD3: T-cell
> CD19: B-cell
> CD14: monocyte
> CD16 / CD56: NK-cell
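
As a rough illustration, the marker list above can be written as a small lookup. This is a deliberately simplified sketch: the function name and the order of the checks are assumptions, and real gating uses many more markers and intensity thresholds.

```python
def classify_cell(positive_markers):
    """Assign a lineage from the set of CD markers a cell stains positive for."""
    markers = set(positive_markers)
    if "CD3" in markers:
        return "T cell"
    if "CD19" in markers:
        return "B cell"
    if "CD14" in markers:
        return "monocyte"
    if "CD16" in markers or "CD56" in markers:
        return "NK cell"
    return "unclassified"

print(classify_cell({"CD3"}))
print(classify_cell({"CD16", "CD56"}))
```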

23
Q

Deep immunotyping

A

Simultaneous detection of a large number (up to 50) of cellular markers

24
Q

Different ways to achieve deep immunotyping

A
  • Increased number of lasers
  • Tandem dyes
  • Spectral flow cytometry
  • CyTOF
25
Q

Deep immunotyping: increased lasers

A
  • Increasing the spectrum of excitation light that can be used > allows the use of additional fluorochromes (UV/IR lasers can further increase fluorochrome availability > more markers and antibodies included)
26
Q

Deep immunotyping: Tandem Dyes

A
  • Two fluorochromes are bound together so that the emission light of the first fluorochrome, excited by the laser, excites the second fluorochrome
    > shift emission spectrum: more molecules and antibodies can be used
27
Q

Deep immunotyping: Spectral flow cytometry

A

Detection of entire emission spectrum of a fluorochrome rather than specific wavelengths
> not single detectors used for single filtered band of wavelengths (less filters used)
> entire spectrum measured
> more variety: better distinguishing of fluorochromes that are similar in a small band of wavelength
> increase fluorochromes that can be used
» completely different machine required!

28
Q

Deep immunotyping: CyTOF

A
  • Antibodies labelled with metals
  • Each cell is subjected to mass cytometry (mix FC and MS) by time of flight
  • For each cell you obtain distinctive metal mass spectrum (intensity vs mass)
  • No lasers used > work with mass rather than light
  • Less overlap between different masses of metals than fluorochrome spectra
    > more antibody-metal conjugates can be used for more markers
    > up to 100 markers
29
Q

Deep immunotyping requires :

A

Big data analysis
> increase number of markers that are simultaneously detected
> Need for dimensionality reduction and advanced computational tools for analysis
> dimensionality reduction for overview: UMAP > down to 2 dimensions > cells with similar expression / transcriptome cluster together

30
Q

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq)

A

Sequencing of BCR and TCR repertoire
> they allow B and T cells to recognize antigen at the antigen-binding site
> uniquely made per antigen
> VDJ recombination for TCR and BCR
» through cutting and pasting on DNA level

31
Q

Variety in BCR/TCR

A
  • VDJ recombination: random choice of gene fragments
  • Overhangs appear through cutting and pasting: these are filled with random nucleotides: create enormous variability
    > these parts are CDR3 for example, one of the Ag binding loops with a lot of variability: junction of V-(D)J, main site Ag interaction
    » this process happens independently in every B and T cell during development
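
A toy simulation shows how random segment choice plus random junction nucleotides creates variability. The segment sequences, `max_insert` and the function name are hypothetical, and other sources of diversity are ignored.

```python
import random

def recombine(v_segments, d_segments, j_segments, rng, max_insert=4):
    """Pick one V, D and J segment at random and join them with random
    nucleotides at the two junctions."""
    def random_nucleotides():
        length = rng.randint(0, max_insert)
        return "".join(rng.choice("ACGT") for _ in range(length))

    v = rng.choice(v_segments)
    d = rng.choice(d_segments)
    j = rng.choice(j_segments)
    return v + random_nucleotides() + d + random_nucleotides() + j

# Hypothetical, heavily shortened gene segments
V_SEGMENTS = ["TGTGCC", "TGTGCG"]
D_SEGMENTS = ["GGG", "TAC"]
J_SEGMENTS = ["TTTGGC", "TACTGG"]

rng = random.Random(0)  # fixed seed so the run is repeatable
receptors = [recombine(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS, rng) for _ in range(5)]
print(receptors)
```

Each call stands for one developing B or T cell. Even with only two options per segment, the random junctions make identical receptors unlikely.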
32
Q

Can you sequence TCR/BCR with normal sequencing protocols

A

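No, these rely on a reference genome in which the genes are the same for everyone
> shotgun fragmentation > sequencing and alignment to reference
> in the reference genome, the receptor loci are separate little blocks of V, (D), J, and C segments, while each rearranged receptor is unique: alignment of short fragments will not work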

33
Q

AIRR seq pipeline

A
  • Do not chop DNA or RNA (no reference)
  • Long read sequencing
  • Alignment within each gene area (gene segments)
    > for V and J segments (variable and joining, not diversity D, too short)
  • No reference for the start J and end of V (junction, CDR3): identification of it
  • Everyone has their own repertoire of TCRs and BCRs that are sequenced
    > expressed as: V-gene, J-gene and CDR3
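
Once each receptor is expressed as V-gene, J-gene and CDR3, a repertoire can be summarized by counting identical triplets. A minimal sketch, assuming the sequences have already been annotated: the record layout, the CDR3 strings and the function name are made up for illustration.

```python
from collections import Counter

def clonotype_frequencies(receptors):
    """Count clonotypes, each defined as (V gene, J gene, CDR3 sequence)."""
    counts = Counter((r["v"], r["j"], r["cdr3"]) for r in receptors)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}

# Hypothetical annotated receptor sequences
receptors = [
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDYW"},
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDYW"},
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3": "CARDYW"},
    {"v": "IGHV1-69", "j": "IGHJ6", "cdr3": "CARGGW"},
]
freqs = clonotype_frequencies(receptors)
dominant = max(freqs, key=freqs.get)
print(dominant, freqs[dominant])
```

A single clonotype that dominates the repertoire is the kind of signal used to follow an expanded clone over time.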
34
Q

Applications AIRR-seq

A
  • Get antibodies for treatment or research: antibody discovery
    > Create new antibody in antibody discovery: immunized mice and gain all B-cells, screen for antigen specific BCR, sequence BCR, expression of antibodies and testing for efficacy, mAbs against CD marker
    > Diagnostic and monitoring: find hugely expressed BCR in B-cell lymphoma, use treatment against BCR as treatment, when sequence comes back when monitoring, tumor has reawakened
    > Research: understanding immune system in health and disease
    » understand TCR-epitope binding: super powerful for cellular therapies, diagnostics and vaccine design: e.g. for CAR therapy, design a specific CAR that recognizes the epitope and kills the tumor, because finding and expanding ex vivo a TCR that works in the patient is expensive
    » Track antibody formation and maturation: vaccination, autoimmunity, alloimmunization (how to steer response to get best antibody response)
35
Q

Track B-cell maturation

A

CD71+ activated B-cell > recent GC graduate, rGCG > becomes either switched memory B cell or long lived ASCs (antibody secreting cell)
> activated B cell only short time after infection
> decision point at exiting GC, close expression profiles
> Immunotyping: lineage tracing in mice, barcode for BCR: V-J-CDR3 barcode for every B-cell
> multimodal/ multi-omics scRNAseq with BD Rhapsody > sequence BCR > phenotype cells with Ab oligos for specific surface markers (not in plate based, too many cells)
> UMAPs can be made RNA based or protein marker based.

36
Q

Steps data analysis

A

0: Input Data
1: Quality control
2: Normalization, selection HVG and scaling
3: Dimensionality reduction 1
4: Clustering
5: Dimensionality reduction 2
6: define cell identity
6A: find marker genes
6B: check expression of defining genes
6C: Algorithm for cell type identification

37
Q
1: Quality control Big Data
A
  • Remove outliers: empty well, dead cells, doublets, dying cells
    > Dying cells: high content of mitochondrial genes (should be <20%): cytoplasmic RNA leaks out of the damaged cell, so the mitochondrial fraction gets too high
    > Doublets: high gene count
    > Empty wells/droplets: low gene count
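
These filters can be sketched as one function. The 20% mitochondrial cut-off comes from the card; the gene-count limits (`min_genes`, `max_genes`) and the cell records are assumptions chosen only for the example.

```python
def passes_qc(cell, min_genes=200, max_genes=6000, max_mito_fraction=0.20):
    """Return True if a cell looks like a single, intact cell."""
    if cell["n_genes"] < min_genes:          # likely empty well/droplet
        return False
    if cell["n_genes"] > max_genes:          # likely doublet
        return False
    mito_fraction = cell["mito_counts"] / cell["total_counts"]
    if mito_fraction >= max_mito_fraction:   # likely dying cell
        return False
    return True

# Hypothetical per-cell summary statistics
cells = [
    {"id": "ok",      "n_genes": 2500, "total_counts": 8000,  "mito_counts": 400},
    {"id": "empty",   "n_genes": 50,   "total_counts": 120,   "mito_counts": 5},
    {"id": "doublet", "n_genes": 9000, "total_counts": 60000, "mito_counts": 2000},
    {"id": "dying",   "n_genes": 1200, "total_counts": 3000,  "mito_counts": 1200},
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
```

In practice the limits are set per dataset after looking at the distributions, not fixed in advance.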
38
Q

2: Normalization, selection of HVG and scaling

A

Pre-processing step performed to reduce experimental and technical confounders in dataset in order to highlight biological signals
1: data normalization
2: selection of Highly Variable Genes (HVG): important step: select the genes that show the highest variation across cells and are probably related to specific cell behavior or phenotype. Downstream analysis is performed on these genes
3: Data scaling, for visualization
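
The three sub-steps can be sketched with the standard library only. This is a simplified illustration: the scale factor of 10,000, the use of plain variance to rank genes and the toy count matrix are all assumptions, and analysis packages use more refined HVG selection.

```python
import math
from statistics import mean, pstdev, pvariance

def log_normalize(counts, scale=10_000):
    """Scale each cell to the same total count, then log-transform."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized

def top_variable_genes(matrix, n_top):
    """Indices of the n_top genes with the highest variance across cells."""
    n_genes = len(matrix[0])
    variances = [pvariance([cell[g] for cell in matrix]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:n_top]

def scale_genes(matrix, genes):
    """Z-score the selected genes so each has mean 0 and unit variance."""
    columns = []
    for g in genes:
        values = [cell[g] for cell in matrix]
        mu, sd = mean(values), pstdev(values)
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    # Transpose back to cells x genes
    return [list(row) for row in zip(*columns)]

# Hypothetical counts: 4 cells x 3 genes (gene 1 is switched on in only two cells)
counts = [[10, 0, 5], [10, 20, 5], [10, 0, 6], [10, 30, 4]]
norm = log_normalize(counts)
hvg = top_variable_genes(norm, 1)
scaled = scale_genes(norm, hvg)
print(hvg)
```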

39
Q
3: Dimensionality reduction 1
A

Big data: too many dimensions for easy analysis
> Principal Component Analysis (PCA): linear method that selects the components (group of genes) that together are inducing most variation in data
> Elbow plot: shows how much variance is explained by the top PCs > when adding a PC gives little increase in explained variance > stop including > choose that number of PCs
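
The elbow rule can be written down as a simple cut-off on the variance each PC adds. A sketch, assuming the per-PC variances were already computed by PCA and are sorted from high to low; the numbers and the 5% threshold (`min_gain`) are hypothetical.

```python
def explained_variance_ratio(pc_variances):
    """Fraction of the total variance explained by each PC."""
    total = sum(pc_variances)
    return [v / total for v in pc_variances]

def choose_n_pcs(pc_variances, min_gain=0.05):
    """Keep PCs until the next one adds less than min_gain of total variance."""
    n = 0
    for ratio in explained_variance_ratio(pc_variances):
        if ratio < min_gain:
            break
        n += 1
    return n

# Hypothetical variances of the first 10 PCs
pc_variances = [50.0, 25.0, 10.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
n_pcs = choose_n_pcs(pc_variances)
print(n_pcs)
```

In practice the cut-off is usually chosen by eye from the plot; a fixed threshold is just one way to make the choice reproducible.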

40
Q
4: Clustering
A

Comparing every cell to every other cell in scRNAseq to identify differences between cells is time-consuming and computationally demanding
> single cells can be of same cell type
> clustering: cells with very similar transcriptional profile are grouped together and treated as one entity in comparative analysis
» too strict clustering: separation of cells that are same type
» too loose clustering: grouping of cells that are different type
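
The effect of too strict versus too loose clustering can be shown with a toy distance-threshold method. This is not the graph-based clustering normally used for scRNAseq; it is a swapped-in, simpler technique, and the points and thresholds are made up.

```python
import math

def cluster_by_distance(points, max_distance):
    """Group points into connected components: two cells are linked when
    their profiles lie within max_distance of each other."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= max_distance:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical cells in a 2-PC space: two tight groups of two cells each
cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
strict = cluster_by_distance(cells, 0.5)   # too strict: every cell on its own
balanced = cluster_by_distance(cells, 2.0)
loose = cluster_by_distance(cells, 50.0)   # too loose: everything merged
print(strict, balanced, loose)
```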

41
Q
5: Dimensionality reduction 2
A

For visualization purposes > too many dimensions to see
> reduce to a two-dimensional picture for a biologically meaningful representation
> tSNE or UMAP
> tSNE focuses on local data structures
> UMAP focuses on both local and global data structures

42
Q
6: Define cell identity
A

A: Unsupervised approach: find marker genes
> find marker genes of each cluster, specifically expressed in cluster and whose pattern could define cells of that cluster
> differential expression analysis among all clusters against each other
B: Supervised approach: check expression of defining genes
> when you know which cells are included in data set and you know marker genes
> check expression of marker genes
C: Semi-supervised: Algorithm for cell type identification
> you kind of know which cells might be in dataset, but no defining marker genes for cell types
> algorithm that tries to match total expression profile of all cells with that of specifically defined and sorted immune cell populations from databases
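
Approach C can be sketched as matching a cluster's mean expression profile to reference profiles by correlation. The reference values, the gene panel and the function names are hypothetical; real tools compare far more genes against curated databases.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def annotate_cluster(cluster_profile, references):
    """Return the best-matching reference cell type and all scores."""
    scores = {name: pearson(cluster_profile, ref) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical mean expression over the genes [CD3E, CD19, CD14, NCAM1]
REFERENCES = {
    "T cell":   [9.0, 0.5, 0.2, 0.3],
    "B cell":   [0.4, 8.5, 0.1, 0.2],
    "monocyte": [0.3, 0.2, 9.5, 0.1],
}
cluster_mean = [0.6, 7.9, 0.3, 0.1]
label, scores = annotate_cluster(cluster_mean, REFERENCES)
print(label)
```

The label is only as good as the reference set: a cell type that is missing from the references will still be given the closest available name.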