single cell transcriptomics Flashcards by Chantal Fifield

heterogeneity in cell populations

cell types
somatic mutations
cell cycle stage
epigenetic modifications
stochastic gene expression

limitations of bulk assays

assuming homogeneous relationships can lead you to the wrong conclusion
rare cell types can become lost
can’t see real time changes
- need to order by differentiation progress, not time

process of SCT

isolate cells
lyse
reverse transcribe and amplify cDNA
qPCR or RNAseq
up to 10,000s of genes in 10,000s of cells

methods of single cell isolation

low throughput:
- manual/automated micropipetting
- cytoplasmic aspiration
high throughput:
- FACS
- microfluidics

qPCR

quantitative/real-time PCR
gene specific PCR primers
include housekeeping genes (e.g. GAPDH)
fluorescent dye to detect PCR product
measure Ct value for each gene
- threshold cycle number
normalise data

qPCR normalisation

higher Ct means less cDNA
arbitrary maximum Ct value
calculate ΔCt for each gene
- ΔCt = max Ct - gene Ct
- higher ΔCt means more cDNA
normalise with hk genes
- assume hk expression constant
- calculate gene ΔCt - hk ΔCt
Ct is on a log2 scale (product doubles each cycle), so normalise by subtraction, not division
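
The ΔCt steps above can be sketched as follows. This is a minimal illustration, not a prescribed pipeline: the Ct values, the gene names `geneA`/`geneB` and the maximum Ct of 40 are all assumed for the example.

```python
# Hypothetical Ct values for one cell; all numbers are illustrative.
MAX_CT = 40.0  # assumed arbitrary maximum Ct

def delta_ct(ct, max_ct=MAX_CT):
    """ΔCt = max Ct - gene Ct, so a higher value means more cDNA."""
    return max_ct - ct

def normalise(ct_values, hk_gene, max_ct=MAX_CT):
    """Subtract the housekeeping gene's ΔCt from every gene's ΔCt."""
    hk = delta_ct(ct_values[hk_gene], max_ct)
    return {gene: delta_ct(ct, max_ct) - hk for gene, ct in ct_values.items()}

cell = {"GAPDH": 18.0, "geneA": 21.0, "geneB": 16.0}
norm = normalise(cell, hk_gene="GAPDH")

# Each cycle is one doubling, so a normalised ΔCt of v corresponds to a
# 2**v fold difference relative to the housekeeping gene.
fold = {gene: 2 ** value for gene, value in norm.items()}
print(norm)
print(fold)
```

Because Ct counts doublings, the subtraction on the Ct scale is equivalent to a division on the linear (fold-change) scale.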

RNAseq

sequence cDNA library
map reads to reference
count read number for each gene
need quality control
can have coverage bias (5’/3’) in some protocols

technical dropouts

zero counts
common
when some mRNA not captured during reverse transcription
capture efficiency:
- % of mRNA molecules in cell lysate detected
- often 10-20%
more frequent in low expression genes
varies between cells
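
Why dropouts hit low expression genes hardest can be shown with a simple model. The sketch below assumes each mRNA molecule is captured independently with the same probability, which is a simplification introduced here, and `dropout_probability` is a hypothetical helper name.

```python
def dropout_probability(n_molecules, capture_efficiency):
    """Chance that none of a gene's n mRNA molecules is captured,
    assuming independent capture with equal probability (a simplification)."""
    return (1.0 - capture_efficiency) ** n_molecules

# Compare genes with few and many molecules at 10% capture efficiency.
for n in (1, 5, 20, 100):
    print(n, dropout_probability(n, 0.10))
```

Under this model the dropout probability falls off geometrically with the number of molecules, so highly expressed genes are rarely lost entirely.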

RNAseq normalisation

convert raw read counts into expression levels per cell
correct for cell to cell variation
- in capture, amplification, sequencing efficiency
method depends on protocol used
- spike in or UMI

extrinsic spike-ins

add RNA of known sequence and quantity to lysate
internal control
equal quantity in each lysate
normalise counts by number of reads mapped to spike-in RNA
assumes same capture, amplification and sequencing efficiencies
be cautious:
- no 5’ cap or polyA tail
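
A minimal sketch of spike-in normalisation, assuming the same spike-in quantity was added to every lysate. The function names, cell names and counts are hypothetical; real tools estimate size factors more carefully.

```python
def spike_in_size_factors(spike_totals):
    """Size factor per cell: spike-in reads relative to the mean across cells."""
    mean_total = sum(spike_totals.values()) / len(spike_totals)
    return {cell: total / mean_total for cell, total in spike_totals.items()}

def normalise_counts(counts, size_factors):
    """Divide each cell's gene counts by that cell's size factor."""
    return {
        cell: {gene: c / size_factors[cell] for gene, c in genes.items()}
        for cell, genes in counts.items()
    }

# Illustrative data: cell2 recovered half as many spike-in reads as cell1.
spike_totals = {"cell1": 1000, "cell2": 500}
counts = {
    "cell1": {"geneA": 200, "geneB": 0},
    "cell2": {"geneA": 100, "geneB": 10},
}
size_factors = spike_in_size_factors(spike_totals)
normalised = normalise_counts(counts, size_factors)
print(normalised)
```

After scaling, geneA has the same value in both cells, which reflects the assumption that the difference in raw counts was purely technical.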

UMIs

unique molecular identifiers
barcode on each cDNA molecule
6-10 nt added before amplification
track how much of amplified DNA comes from each original molecule
count number of unique UMIs associated with each gene
assume library sequenced to saturation
corrects for variation in amplification efficiency but not other sources
- e.g. reverse transcription
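
Counting unique UMIs per gene can be sketched as below. The read tuples and the `count_umis` helper are hypothetical, and the sketch ignores sequencing errors within the UMI itself.

```python
from collections import defaultdict

def count_umis(reads):
    """reads: iterable of (cell, gene, umi) tuples.
    Returns {(cell, gene): number of distinct UMIs}."""
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("cell1", "geneA", "ACGTAC"),
    ("cell1", "geneA", "ACGTAC"),  # amplification duplicate of the read above
    ("cell1", "geneA", "TTGCAA"),
    ("cell1", "geneB", "ACGTAC"),  # same UMI, different gene: counted separately
    ("cell2", "geneA", "GGATCC"),
]
umi_counts = count_umis(reads)
print(umi_counts)
```

Reads sharing a cell, gene and UMI collapse to one count, so over-amplified molecules no longer inflate expression estimates.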

normalisation without spike in or UMI

same as used by bulk RNAseq data
assume hk gene expression or total mRNA content the same
normalise read counts by hk expression/total mRNA
can also combine techniques
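
Normalising by total mRNA can be sketched as a counts-per-million scaling, one common form of total-count normalisation. It is used here as an assumed example; the notes do not name a specific method.

```python
def counts_per_million(counts):
    """Scale one cell's read counts so they sum to one million.
    Assumes total mRNA content is comparable between cells."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

cell = {"geneA": 30, "geneB": 10, "geneC": 60}  # illustrative raw counts
cpm = counts_per_million(cell)
print(cpm)
```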

SC data analysis techniques

clustering
dimensionality reduction
differential expression
pseudotemporal ordering
network inference

single cell clustering

cluster cells by transcriptomic profile to:
- analyse sub-population structure
- identify cell sub-types/rare cell types
cluster genes by expression state across cells to:
- identify co-varying genes

SC clustering methods

partitional
- produces disjoint groups
- k-means
hierarchical clustering
- divisive or agglomerative
- hierarchical tree
- can provide more information
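
As an illustration of the agglomerative approach, here is a minimal single-linkage sketch. Single linkage and Euclidean distance are assumptions made for the example, and the function is written for clarity rather than speed.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points):
    """Repeatedly merge the two closest clusters.
    Returns the merges as (cluster_a, cluster_b, distance), in order."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

cells = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]  # illustrative 2-gene profiles
merges = single_linkage(cells)
for merge in merges:
    print(merge)
```

The ordered list of merges is the information a hierarchical tree (dendrogram) displays, which is why this approach can tell you more than a flat partition.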

k-means clustering

algorithm that partitions data points into k clusters
assign each point to the cluster with the nearest mean (centroid)
have to choose k
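
A minimal k-means sketch in plain Python. Taking the first k points as starting centroids is a naive assumption made to keep the example deterministic; practical implementations use random or smarter starts and several restarts.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def k_means(points, k, n_iter=100):
    centroids = list(points[:k])  # naive start: first k points (assumption)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids

cells = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
labels, centroids = k_means(cells, k=2)
print(labels)
```

Rerunning with a different k gives a different partition, which is why the choice of k matters.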

bi-clustering

allows clustering by genes and cells simultaneously
find genes that behave similarly within a cell cluster
can give better resolution
only some subsets are informative

dimensionality reduction

transform to lower dimensional space e.g. 3D to 2D
easier to visualise data and detect patterns
often before clustering
- distance can behave non-intuitively in high dimensions
many algorithms with different assumptions
- PCA

PCA

linear transformation of the data into uncorrelated principal components
PCs:
- orthogonal
- ordered by contribution to variance in data
- weighted sum of original dimensions
PC1:
- vector through dataset giving largest amount of variation
- PC2 gives second largest
shows which genes contribute most to heterogeneity
can combine with clustering to analyse distinct populations
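
PC1 can be found without any library by power iteration on the covariance matrix. This is a sketch under assumptions: the data are illustrative, the starting vector must not be orthogonal to PC1, and real analyses would normally use a library SVD or eigendecomposition.

```python
import math

def centre(data):
    """Subtract each gene's mean so columns have mean zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centred):
    n, d = len(centred), len(centred[0])
    return [[sum(row[i] * row[j] for row in centred) / (n - 1)
             for j in range(d)] for i in range(d)]

def first_pc(cov, n_iter=500):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Rows are cells, columns are genes A, B, C (illustrative values).
# A and B co-vary; C is constant and so contributes nothing.
data = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 5]]
centred = centre(data)
pc1 = first_pc(covariance(centred))
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centred]
print(pc1)
print(scores)
```

The entries of `pc1` are the weights on the original genes, so the genes with the largest absolute weights are the ones contributing most to the heterogeneity along that component.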

differential expression

aim to detect difference in gene expression levels or distribution between 2 cell populations
statistical tests
- t-test, Mann-Whitney
- specialised methods needed for noise/dropouts
  - more noise in SC than bulk
multiple testing corrections
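
One common multiple testing correction is the Benjamini-Hochberg procedure, sketched below. It is chosen here as an assumed example since the notes do not name a method, and the p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]  # one test per gene
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

With thousands of genes tested per comparison, uncorrected p-values would produce many false positives, which is why a step like this follows the per-gene tests.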

GO enrichment

gene ontology
- describes gene functions and relationships between them
- molecular function, cellular component, biological process
identify terms over/underrepresented in a given set of genes
select input list of genes of interest
- DE genes, bi-clustering genes

GO output

calculate probability of seeing observed sample frequency by chance given the background frequency for each term
- sample frequency = no of genes annotated to that term in the input
- background frequency = no of genes annotated to a term in the background set
- background set = all genes in the genome
identify which terms appear more frequently than expected
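
The over-representation probability for one term is often computed with a hypergeometric test, sketched below. The test choice and the gene numbers are assumptions for illustration; `enrichment_p_value` is a hypothetical helper.

```python
from math import comb

def enrichment_p_value(sample_hits, sample_size, background_hits, background_size):
    """Probability of seeing at least `sample_hits` annotated genes in the
    input list by chance, given the background frequency of the term."""
    upper = min(sample_size, background_hits)
    total = comb(background_size, sample_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total

# Illustrative numbers: 5 of 20 background genes carry the term,
# and 3 of the 4 genes in the input list carry it.
p = enrichment_p_value(sample_hits=3, sample_size=4,
                       background_hits=5, background_size=20)
print(p)
```

Since this is repeated for every term, the resulting p-values also need a multiple testing correction.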

pseudotemporal ordering

aim to infer gene expression dynamics from snapshot data
- true temporal data unavailable
- measurement destroys cells by lysis
cells ordered by progress through a biological process
- differentiation, response to stimuli
sampling time may not correlate well with stages
- asynchrony
assumes cells follow same response or differentiation path
e.g. Monocle algorithm

gene regulatory network inference

aim to construct a network graph where nodes represent genes and edges indicate regulatory interactions
assumes that a strong statistical relationship between 2 gene expression profiles indicates a potential functional relationship
- correlation vs mutual information
linked groups of genes indicate that they undergo coordinated changes in expression
some correlations may be due to confounding factors e.g. cell cycle
may need to choose subpopulations

correlation vs mutual information

correlation
- most common measure for strength of a statistical relationship
mutual information
- alternative that can identify non-linear relationships
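
The difference shows up on a toy non-linear relationship. In this sketch the values are assumed to be already discretised, since mutual information on continuous expression data needs a binning or density step that is left out here.

```python
import math
from collections import Counter

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_information(x, y):
    """Mutual information in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y depends entirely on x, but not linearly

r = pearson(x, y)
mi = mutual_information(x, y)
print(r, mi)
```

Here the correlation is zero even though y is fully determined by x, while the mutual information is clearly positive.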

single cell transcriptomics Flashcards

(25 cards)