Single Cell Lectures Flashcards
ENCODE project
where all the parts are and what they do
GTEx
after ENCODE
variants and relate what it does
How do we understand the genotype effect
GWAS Catalog
The basic unit of life
the cell
Hooke in 1665
first look at dead plant cells
Leeuwenhoek in 1675
first look at a live cell
Most diversity is in which organ tissue
brain-lots of different jobs
From all cells come
cells
Human Cell Atlas Project
sequencing individual cells to find function and variants
How many tissues are in the human cell atlas project
1-2,000
Bulk RNA sequencing
analyze gene expression change in a mixture of cell types
Single cell RNA sequencing
analyzing gene expression in a single cell or nucleus
Why use a single cell perspective
basic tenet of biological variation
- within organ systems, individual cell types or their subtypes vary in proportion and behave differently at the transcriptional level depending on their environment
Why sc/snRNAseq approach can help solve biological problems
multiple hypothesis testing with cell types
- cell type composition often changes in an organ over time or upon perturbation
-cell to cell communication through altered gene expression is dynamic between cell types
-individual gene expression by cell type can vary within an organ across individuals and disease vs. healthy
- gene expression changes in one cell type can alter the fate of differentiation of other cells
Key advancements in single cell RNA seq
integrated fluidic circuits
nanodroplets
In situ barcoding
What drives scRNAseq technology adoption
cost
ease of the technique
data robustness
experimental objectives
personnel bias
Major steps to sc/snRNAseq data generation
- lipid encapsulation of beads, cells and reverse transcription enzyme mix
- cell lysis and mRNA binding to the capture beads
- cDNA synthesis with reverse transcriptase
- pooling all multi-barcoded cDNA and sequencing
Splicing occurs in nuclei in
pre-mRNA
What do we gain from cell atlas data
molecular profiles that define cell type and their subtypes
unique cell types by tissue
gene markers that define cell type
the general transcriptional behavior of cell types
Tissue preparation
- dissect tissues-> live cells use enzymatic digestion
- filter out everything except cells
- FACS/MACS sorting of cells
Cell/Nuclei isolation points
Tissue source will dictate isolation protocols
cell liberation conditions highly variable by tissue source
cell lysis conditions highly variable by tissue source for nuclei preparation
Live cell isolation technique
proteases
Nuclei isolation technique
detergents
A mammalian diploid cell has
10-30 pg total RNA and <0.1 pg mRNA
nuclear RNA is 10-20% of total RNA
Live cells give what type of RNA
mRNA
Nuclei gives what type of RNA
pre-mRNA- contains introns
Nuclei use advantages
sample processing logistics
pre-mRNA processing can be measured
less stress and mitochondrial signal
cell state is more accurately captured
Cell Use advantages
more complete transcriptome
detection sensitivity
better connection with translation
What are the keys to single cell transcript identification
10x barcode- what cell
UMI- unique molecular identifier- what molecule
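A minimal sketch of how the two keys work together, assuming reads have already been reduced to hypothetical (cell barcode, UMI, gene) tuples. Reads that share all three are treated as copies of one captured molecule and counted once.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count unique UMIs per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples; this
    tuple layout is an assumption made for illustration.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        # The barcode says which cell; the UMI says which molecule.
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

reads = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),  # duplicate read: same cell, UMI and gene
    ("AAAC", "CCA", "GeneA"),
    ("GGGT", "TTG", "GeneA"),  # same UMI, but a different cell
]
counts = count_molecules(reads)
```

Here the cell AAAC ends up with two GeneA molecules even though three reads were seen.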
Major steps in single-cell or nuclei data processing
filtering noise
normalization
neighbor networks- dimension reduction and clustering
A good droplet should include
barcode bead
cell
Doublet
droplet with two cells
Ambient RNA
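RNA in the immediate surroundings of the cells- free-floating RNA in the suspension that gets captured in a droplet along with a cell's own RNA
Which type, nuclei or cells, contains more ambient RNA
Nuclei, because you have to pop the cell to get to the nucleus- this releases RNA into the suspension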
Normalization
the practice of adjusting raw counts so that expression values are comparable across cells, for example by scaling each cell to the same total count
Why do we normalize the data
cells can have different numbers of gene counts owing to differences in mRNA containing volume (cell size) or purely randomly during sequencing
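One common way to remove this effect is to scale each cell to the same total count and then log-transform. The sketch below assumes that approach; the function name and the `target_sum` of 10,000 are illustrative choices, not something the lecture specifies.

```python
import math

def normalize_cell(counts, target_sum=10_000):
    """Scale one cell's gene counts to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # empty cell: nothing to scale
    return [math.log1p(c / total * target_sum) for c in counts]

# Two cells with the same relative expression but a 10x difference in depth.
small_cell = [1, 3, 0, 6]
large_cell = [10, 30, 0, 60]
norm_small = normalize_cell(small_cell)
norm_large = normalize_cell(large_cell)
```

After normalization the two cells have the same profile, so the difference in size or sequencing depth no longer looks like a difference in expression.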
What are batch effects
sequencing depth
technologies
sample quality
technician
cell cycle
Two types of batch effects
technical and biological
example of biological batch effects
cell cycle
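Principal components
for a collection of points in a real coordinate space, a sequence of unit vectors where each vector is the direction of the line that best fits the data while being orthogonal to the vectors before it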
Principal components analysis
a process of computing the principal components and using them to perform a change of basis on the data
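A small sketch of both steps on 2-D toy data: find the first principal component, then re-express each point along it. It uses power iteration on the covariance matrix, which is one simple way to get the leading component; real tools use more general solvers, and the data points are invented.

```python
import math

def first_principal_component(points):
    """Return (unit vector of greatest variance, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vx, vy = 1.0, 1.0
    for _ in range(200):
        wx = sxx * vx + sxy * vy
        wy = sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    return (vx, vy), centered

points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
pc1, centered = first_principal_component(points)
# Change of basis: the coordinate of each point along the first component.
scores = [x * pc1[0] + y * pc1[1] for x, y in centered]
```

Because the toy points lie close to the line y = 2x, the first component points along that line and one score per point captures almost all of the variation.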
Data integration is important because
it allows us to compare data
gets rid of bias
Best practice for batch correction algorithm
Harmony
Data integration steps
- soft assign cells to clusters, favoring mixed dataset representation
- get cluster centroids for each data set
- get dataset correction factors for each cluster
- move cells based on soft cluster membership
Two types of neighbor networks
dimension reduction
clustering
Why do we use dimensionality reduction “Feature selection”
with thousands of individual cells and genes per cell for each sample it is necessary to reduce the complexity of the data for visual inspection and to facilitate downstream clustering
PCA
principal components analysis projects a set of possibly correlated variables into a set of linear orthogonal variables
t-SNE
t-distributed stochastic neighbor embedding creates a probability distribution using the Gaussian distribution that defines the relationships between the points in high-dimensional space
UMAP
uniform manifold approximation and projection. A UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology
Dimensionality reduction is
highly variable
Clustering
grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups
Nearest neighbor graph
k-nearest neighbor (KNN) graph- a directed graph defined for a set of points in a metric space, with an edge from each point to each of its k nearest neighbors
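A brute-force sketch of building such a graph with Euclidean distance. The function name and points are made up for illustration, and real pipelines use faster approximate searches.

```python
import math

def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} as directed edges."""
    graph = {}
    for i, p in enumerate(points):
        distances = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        distances.sort()
        graph[i] = [j for _, j in distances[:k]]
    return graph

points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
graph = knn_graph(points, k=1)
```

The graph is directed because being a nearest neighbor is not mutual: the far point at index 3 points to index 2, but index 2 points to index 1.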
K-means
iteratively finds a predefined number of k cluster centers (centroids) by minimizing the sum of the squared Euclidean distance between each cell and its closest centroid
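A bare-bones sketch of the iteration (Lloyd's algorithm), assuming the starting centroids are supplied by the caller. The points and starting centroids are invented for illustration.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            # Assign each point to its closest centroid.
            nearest = min(
                range(len(centroids)), key=lambda c: math.dist(p, centroids[c])
            )
            clusters[nearest].append(p)
            labels.append(nearest)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on k and on the starting centroids, which is why k has to be chosen in advance.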
Hierarchical
two types
1) agglomerative- individual cells are progressively merged into clusters according to distance measures
2) divisive- all cells start in one group, which is split recursively into smaller groups down to the individual cell level
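A short sketch of the agglomerative direction, assuming single linkage (the distance between two clusters is the distance between their closest members). The linkage choice and the points are illustrative.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per cell

    def linkage(a, b):
        # Single linkage: smallest distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters = agglomerative(points, n_clusters=3)
```

Stopping at a different number of clusters corresponds to cutting the merge tree at a different height.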
Community
nodes refer to cells and cell-cell pairwise distances are applied in the Leiden algorithm
Optimizing graph modularity locally on all nodes, then each small community is grouped into one node and the first step is repeated
Steps for clustering
KNN graph
- find communities
initial partition
-refine
-aggregate network
-refine
Final partition
Underlying concept of mapping cell clusters to cell identities
a set of genes within a cluster of cells or nuclei will be significantly different in their level of expression compared to all other clusters of cells or nuclei
discovery of differentially expressed genes steps
aligned dataset
integrated analysis
compare composition
compare expression for aligned cells
Underlying concept for differential gene expression by cell type: bulk RNAseq principles
single cell data sets follow a negative binomial distribution, defined as a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed trials before a specified number of successes occurs
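The definition can be checked numerically. This sketch evaluates the probability mass function for an integer number of successes r (an assumption made to keep it simple; count models usually allow a non-integer dispersion) and compares mean and variance.

```python
import math

def neg_binomial_pmf(k, r, p):
    """P(exactly k failures before the r-th success), success probability p."""
    return math.comb(k + r - 1, k) * (p ** r) * ((1 - p) ** k)

r, p = 3, 0.5
probabilities = [neg_binomial_pmf(k, r, p) for k in range(200)]
total = sum(probabilities)
mean = sum(k * pk for k, pk in enumerate(probabilities))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probabilities))
```

The variance comes out larger than the mean. That extra spread is what makes this distribution a better fit for count data than a Poisson, whose variance equals its mean.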
Pseudo bulk analysis
the method applies generalized linear mixed models with random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual
DEG process
the sample view aggregates counts per sample-label combination to create pseudobulks
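A minimal sketch of that aggregation, assuming each cell is a hypothetical (sample, label, gene counts) record. The donor, cell type and gene names are placeholders.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum per-cell gene counts within each (sample, cell-type label) pair.

    `cells` is a list of (sample, label, {gene: count}) records; this
    layout is hypothetical and chosen for illustration.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, label, counts in cells:
        for gene, count in counts.items():
            totals[(sample, label)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

cells = [
    ("donor1", "T cell", {"GeneA": 2, "GeneB": 0}),
    ("donor1", "T cell", {"GeneA": 1, "GeneB": 4}),
    ("donor1", "B cell", {"GeneA": 0, "GeneB": 3}),
    ("donor2", "T cell", {"GeneA": 5, "GeneB": 1}),
]
bulk = pseudobulk(cells)
```

Each pseudobulk can then be treated like one bulk RNA-seq sample, so the individual, not the cell, is the unit of replication.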
sc or snRNA Data Analysis Summary
sequence reads
generate count matrix
filter cells using quality metrics
normalize data and regress out unwanted variation- integration
clustering
marker identification
1) trajectory analysis 2) DE of cell types or genes between sample groups 3) custom analyses
Deconvolution
a process of resolving something into its constituent elements or removing complication in order to clarify it
Goal of deconvolution
estimate the proportion of a cell type present among a heterogenous mixture of cells using expressed marker genes that define a specific cell type
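A toy sketch for the simplest case, a mixture of two cell types with known marker-gene signatures. It finds the proportion by least squares; the signature values are invented and real deconvolution tools handle many cell types and noise.

```python
def estimate_proportion(bulk, sig_a, sig_b):
    """Least-squares estimate of p in: bulk = p * sig_a + (1 - p) * sig_b."""
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    numerator = sum((m - b) * d for m, b, d in zip(bulk, sig_b, diff))
    denominator = sum(d * d for d in diff)
    if denominator == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    # A proportion must lie between 0 and 1.
    return min(1.0, max(0.0, numerator / denominator))

# Hypothetical marker-gene expression for two cell types.
sig_a = [100.0, 5.0, 20.0]
sig_b = [10.0, 80.0, 20.0]
# Simulated bulk sample: 30% type A, 70% type B.
bulk = [0.3 * a + 0.7 * b for a, b in zip(sig_a, sig_b)]
proportion_a = estimate_proportion(bulk, sig_a, sig_b)
```

The third gene is expressed equally in both types, so it carries no information about the proportion. This is why marker genes that differ between cell types are needed.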
Trajectory
the curve that a body describes in space; a path, progression, or line of development resembling a physical trajectory
Single-cell or trajectory Analysis
a collection of cells is a snapshot of their transcriptomes that are each at distinct points in their dynamic state of being
Cell trajectory analysis
allocation of cells to lineages and then ordering them based on pseudotime values within lineages
Pseudotime
the distance along the trajectory from a cell's position back to the beginning
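A small sketch of this definition, assuming the cells have already been ordered along one lineage and placed in a low-dimensional space. The coordinates are made up.

```python
import math

def pseudotime(path):
    """Cumulative distance from the first point of an ordered trajectory."""
    times = [0.0]
    for previous, current in zip(path, path[1:]):
        times.append(times[-1] + math.dist(previous, current))
    return times

# Ordered positions of four cells along one lineage (hypothetical 2-D embedding).
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 6.0), (6.0, 10.0)]
times = pseudotime(path)
```

The values are distances, not clock time: they order cells by progress along the path without saying how long that progress took.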
Trajectory analyses outcomes
discover unique cell lineages
identify differentially expressed genes between lineages
determine which genes are potentially driving cell differentiation
Trajectory goal
estimate how gene expression levels change along cells or nuclei placed in a continuous path
Transcription factors
a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence