single cell transcriptomics Flashcards

1
Q

heterogeneity in cell populations

A
  • cell types
  • somatic mutations
  • cell cycle stage
  • epigenetic modifications
  • stochastic gene expression
2
Q

limitations of bulk assays

A
  • assuming homogeneous relationships can lead you to the wrong conclusion
  • rare cell types can become lost
  • can’t see real time changes
    • need to order by differentiation progress, not time
3
Q

process of SCT

A
  • isolate cells
  • lyse
  • reverse transcribe and amplify cDNA
  • qPCR or RNAseq
  • up to 10,000s of genes in 10,000s of cells
4
Q

methods of single cell isolation

A
  • low throughput:
    • manual/automated micropipetting
    • cytoplasmic aspiration
  • high throughput:
    • FACS
    • microfluidics
5
Q

qPCR

A
  • quantitative/real-time PCR
  • gene specific PCR primers
  • include housekeeping genes (e.g. GAPDH)
  • fluorescent dye to detect PCR product
  • measure Ct value for each gene
    • threshold cycle number
  • normalise data
6
Q

qPCR normalisation

A
  • higher Ct means less cDNA
  • arbitrary maximum Ct value
  • calculate ΔCt for each gene
    • max - gene
    • higher Δ means more cDNA
  • normalise with hk genes
    • assume hk expression constant
    • calculate gene ΔCt - hk ΔCt
  • each cycle doubles the product, so Ct is on a log2 scale: normalise by subtraction, not division
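
The arithmetic on this card can be sketched in a few lines. This is only an illustration: the maximum Ct of 40, the gene names and the Ct values are assumptions made up for the example, not values from the card.

```python
# Hypothetical Ct values for one cell; names and numbers are illustrative only.
MAX_CT = 40.0  # arbitrary maximum Ct (assumed cutoff)

def delta_ct(ct, max_ct=MAX_CT):
    """Delta Ct = max Ct - gene Ct, so a higher value means more cDNA."""
    return max_ct - ct

def normalised_expression(gene_ct, hk_ct, max_ct=MAX_CT):
    """Gene delta Ct minus housekeeping delta Ct (a log2-scale ratio)."""
    return delta_ct(gene_ct, max_ct) - delta_ct(hk_ct, max_ct)

cell = {"GAPDH": 18.0, "geneA": 22.0, "geneB": 30.0}

# Normalise every gene against the housekeeping gene by subtraction
norm = {
    gene: normalised_expression(ct, cell["GAPDH"])
    for gene, ct in cell.items()
    if gene != "GAPDH"
}

# Because Ct is log2-scaled, 2 ** difference gives a fold change
fold = {gene: 2.0 ** value for gene, value in norm.items()}
```

Here geneA sits 4 cycles behind the housekeeping gene, which corresponds to roughly 1/16 of its cDNA under the doubling assumption.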
7
Q

RNAseq

A
  • sequence cDNA library
  • map reads to reference
  • count read number for each gene
  • need quality control
  • can have coverage bias (5’/3’) in some protocols
8
Q

technical dropouts

A
  • zero counts
  • common
  • when some mRNA not captured during reverse transcription
  • capture efficiency:
    • % of mRNA molecules in cell lysate detected
    • often 10-20%
  • more frequent in low expression genes
  • varies between cells
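
One way to see why dropouts hit low-expression genes hardest is a toy model. It assumes each mRNA molecule is captured independently with the same probability, which is a simplification introduced here and not something the card states.

```python
def dropout_probability(n_molecules, capture_efficiency):
    """Chance that none of n molecules is captured (independent-capture assumption)."""
    return (1.0 - capture_efficiency) ** n_molecules

# Illustrative numbers: 10% capture efficiency, a lowly and a highly expressed gene
low_expression = dropout_probability(2, 0.10)
high_expression = dropout_probability(50, 0.10)
```

With two molecules in the lysate the gene is missed most of the time, while with fifty molecules a zero count becomes rare.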
9
Q

RNAseq normalisation

A
  • convert raw read counts into expression levels per cell
  • correct for cell to cell variation
    • in capture, amplification, sequencing efficiency
  • method depends on protocol used
    • spike in or UMI
10
Q

extrinsic spike-ins

A
  • add RNA of known sequence and quantity to lysate
  • internal control
  • equal quantity in each lysate
  • normalise counts by number of reads mapped to spike-in RNA
  • assumes same capture, amplification and sequencing efficiencies
  • be cautious:
    • spike-in RNA differs from endogenous mRNA (e.g. no 5’ cap, short or absent polyA tail), so its efficiencies may not match
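
A minimal sketch of spike-in normalisation, assuming the same quantity of spike-in RNA went into every lysate. The function name, the data layout (plain dicts) and the counts are hypothetical, chosen only for illustration.

```python
def spike_in_normalise(gene_counts, spike_counts):
    """Scale each cell's gene counts by a size factor from its spike-in reads.

    gene_counts:  dict of cell -> dict of gene -> read count
    spike_counts: dict of cell -> total reads mapped to spike-in RNA
    """
    mean_spike = sum(spike_counts.values()) / len(spike_counts)
    normalised = {}
    for cell, counts in gene_counts.items():
        # A cell with more spike-in reads was captured/sequenced more efficiently
        size_factor = spike_counts[cell] / mean_spike
        normalised[cell] = {gene: c / size_factor for gene, c in counts.items()}
    return normalised

gene_counts = {"cell1": {"geneA": 50}, "cell2": {"geneA": 100}}
spike_counts = {"cell1": 100, "cell2": 200}
normalised = spike_in_normalise(gene_counts, spike_counts)
```

In this toy input cell2 has twice the spike-in reads and twice the geneA reads, so after normalisation both cells show the same geneA level.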
11
Q

UMIs

A
  • unique molecular identifiers
  • barcode on each cDNA molecule
  • 6-10 nt added before amplification
  • track how much of the amplified DNA comes from each original molecule
  • count number of unique UMIs associated with each gene
  • assume library sequenced to saturation
  • corrects for variation in amplification efficiency but not other sources
    • e.g. reverse transcription
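
UMI counting amounts to collapsing amplification duplicates. The sketch below assumes reads have already been mapped and reduced to hypothetical (cell, gene, UMI) tuples, and it ignores sequencing errors in the UMI itself.

```python
from collections import defaultdict

def count_umis(reads):
    """Count unique UMIs per (cell, gene); PCR duplicates share a UMI and count once."""
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Illustrative reads: three copies of one molecule, plus two other molecules
reads = [
    ("cell1", "geneA", "ACGTAC"),
    ("cell1", "geneA", "ACGTAC"),
    ("cell1", "geneA", "ACGTAC"),
    ("cell1", "geneA", "TTGCAA"),
    ("cell1", "geneB", "GGATCC"),
]
umi_counts = count_umis(reads)
```

geneA has four reads but only two distinct UMIs, so it is counted as two original molecules.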
12
Q

normalisation without spike in or UMI

A
  • same as used by bulk RNAseq data
  • assume hk gene expression or total mRNA content the same
  • normalise read counts by hk expression/total mRNA
  • can also combine techniques
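
As a sketch of the total-mRNA variant, each cell's counts can be rescaled to a common total, in the style of counts per million. The function name and the scale value are assumptions for the example.

```python
def total_count_normalise(counts, scale=1_000_000):
    """Rescale one cell's read counts so they sum to `scale`."""
    total = sum(counts.values())
    if total == 0:
        # Empty cell: nothing to rescale
        return {gene: 0.0 for gene in counts}
    return {gene: c * scale / total for gene, c in counts.items()}

# Small scale used here only to keep the numbers readable
cell_counts = {"geneA": 10, "geneB": 30}
scaled = total_count_normalise(cell_counts, scale=100)
```

This only makes cells comparable if their total mRNA content really is similar, which is the assumption named on the card.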
13
Q

SC data analysis techniques

A
  • clustering
  • dimensionality reduction
  • differential expression
  • pseudotemporal ordering
  • network inference
14
Q

single cell clustering

A
  • cluster by transcriptomic profile to:
    • analyse sub-population structure
    • identify cell sub-types/rare cell types
  • cluster by cell expression states to:
    • identify co-varying genes
15
Q

SC clustering methods

A
  • partitional
    • produces disjoint groups
    • k-means
  • hierarchical clustering
    • divisive or agglomerative
    • hierarchical tree
    • can provide more information
16
Q

k-means clustering

A
  • algorithm that partitions data points into k clusters
  • each point is assigned to the cluster with the nearest mean (centroid)
  • have to choose k
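
A bare-bones version of k-means (Lloyd's algorithm) is sketched below in plain Python. The random initialisation, the iteration cap and the toy 2D points are assumptions for illustration; real analyses would normally use a library implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

The two well-separated groups end up in different clusters; with real expression data the result depends on the choice of k and on the starting centroids.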
17
Q

bi-clustering

A
  • allows clustering by genes and cells simultaneously
  • find genes that behave similarly within a cell cluster
  • can give better resolution
  • only some subsets are informative
18
Q

dimensionality reduction

A
  • transform to lower dimensional space e.g. 3D to 2D
  • easier to visualise data and detect patterns
  • often before clustering
    • distance can behave non-intuitively in high dimensions
  • many algorithms with different assumptions
    • PCA
19
Q

PCA

A
  • linear transformation of the data into uncorrelated principal components
  • PCs:
    • orthogonal
    • ordered by contribution to variance in data
    • weighted sum of original dimensions
  • PC1:
    • vector through dataset giving largest amount of variation
    • PC2 gives second largest
  • shows which genes contribute most to heterogeneity
  • can combine with clustering to analyse distinct populations
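
To make the idea of PC1 concrete, the sketch below finds it by power iteration on the covariance matrix, using only the standard library. This is a teaching shortcut under the assumption of a tiny dataset; the function name and the example data are hypothetical, and a library PCA would normally be used instead.

```python
import math

def first_principal_component(data, iterations=200):
    """Return (PC1 direction, PC1 scores) for rows of numeric data."""
    n, d = len(data), len(data[0])
    # Centre each dimension on its mean
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d)
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the direction of largest variance
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Each score is a weighted sum of the original (centred) dimensions
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores

# Two "genes" where the second is always twice the first
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
pc1, scores = first_principal_component(data)
```

The weights in `pc1` show how much each original dimension contributes; here the second gene carries twice the weight of the first.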
20
Q

differential expression

A
  • aim to detect difference in gene expression levels or distribution between 2 cell populations
  • statistical tests
    • T-test, Mann-Whitney
    • specialised methods needed for noise/dropouts
      • more noise in SC than bulk
  • multiple testing corrections
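
Testing thousands of genes at once inflates false positives, which is why a correction is needed. The sketch below implements the Benjamini-Hochberg adjustment as one common choice; the card does not say which correction was intended, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values from a two-population test
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
```

Genes are then called differentially expressed when the adjusted value falls below the chosen false discovery rate.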
21
Q

GO enrichment

A
  • gene ontology
    • describes gene functions and relationships between them
    • molecular function, cellular component, biological process
  • identify terms over/underrepresented in a given set of genes
  • select input list of genes of interest
    • DE genes, bi-clustering genes
22
Q

GO output

A
  • calculate probability of seeing observed sample frequency by chance given the background frequency for each term
    • sample frequency = no of genes annotated to that term in the input
    • background frequency = no of genes annotated to a term in the background set
    • background set = all genes in the genome
  • identify which terms appear more frequently than expected
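
The probability on this card is usually computed with a hypergeometric test. The sketch below assumes that test and uses hypothetical parameter names; the card itself does not name a specific statistic.

```python
from math import comb

def enrichment_p_value(sample_hits, sample_size, background_hits, background_size):
    """P(at least `sample_hits` annotated genes) when drawing `sample_size`
    genes at random from a background with `background_hits` annotated genes."""
    upper = min(sample_size, background_hits)
    favourable = sum(
        comb(background_hits, i)
        * comb(background_size - background_hits, sample_size - i)
        for i in range(sample_hits, upper + 1)
    )
    return favourable / comb(background_size, sample_size)

# Toy numbers: 10 background genes, 5 annotated to the term,
# input list of 5 genes, all 5 annotated to the term
p = enrichment_p_value(5, 5, 5, 10)
```

A small p-value means the term appears in the input list more often than the background frequency would predict by chance.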
23
Q

pseudotemporal ordering

A
  • aim to infer gene expression dynamics from snapshot data
    • true temporal data unavailable
    • measurement destroys cells by lysis
  • cells ordered by progress through a biological process
    • differentiation, response to stimuli
  • sampling time may not correlate well with stages
    • asynchrony
  • assumes cells follow same response or differentiation path
  • e.g. the Monocle algorithm
24
Q

gene regulatory network inference

A
  • aim to construct a network graph where nodes represent genes and edges indicate regulatory interactions
  • assumes that a strong statistical relationship between 2 gene expression profiles indicates a potential functional relationship
    • correlation vs mutual information
  • linked groups of genes indicate that they undergo coordinated changes in expression
  • some correlations may be due to confounding factors e.g. cell cycle
  • may need to choose subpopulations
25
Q

correlation vs mutual information

A
  • correlation
    • most common measure for strength of a statistical relationship
  • mutual information
    • alternative that can identify non-linear relationships
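
The difference between the two measures shows up on a non-linear toy relationship. The sketch assumes discrete values; real expression data would need binning first, which is not covered here, and the data are invented.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def mutual_information(x, y):
    """Mutual information in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

# Gene y depends entirely on gene x, but not linearly (y = x squared)
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r = pearson(x, y)
mi = mutual_information(x, y)
```

Correlation sees no relationship in this symmetric case, while mutual information is clearly above zero because knowing x fixes y.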