Final Exam Flashcards

1
Q

What can we do with scRNA-seq data?

A
  1. Explore which cell types are present in a tissue
  2. Identify unknown/rare cell types or states
  3. Elucidate the changes in gene expression during differentiation processes or across time or states
  4. Identify genes that are differentially expressed in a particular cell type between conditions (e.g. treatments or disease)
  5. Explore changes in expression within a cell type while incorporating spatial, regulatory, and/or protein information
  6. Analyze cell velocity to uncover the direction and activity of processes
  7. Identify cell-level mutations and study their expression
  8. Uncover molecular relationships and regulatory links
2
Q

What tools can be used to do gene count normalization and scaling when exploring sample heterogeneity?

A

Seurat and Cellenics

3
Q

What tools can be used to perform dimensionality reduction when exploring sample heterogeneity?

A

PCA, UMAP, t-SNE

4
Q

What tools can be used to perform cell clustering when exploring sample heterogeneity?

A

Seurat and Cellenics

5
Q

What tools can be used to identify known cell types when exploring sample heterogeneity?

A

SingleR, ScType

6
Q

What tools would you use to identify unknown/rare cell types?

A

Seurat, Cellenics, SingleR, ScType

7
Q

What is Pseudotime?

A

a latent (unobserved) dimension which measures the cells’ progress through the transition

8
Q

what does it mean to estimate pseudotime?

A

de-confound single cell time series and order the cells by pseudotime

9
Q

what tools would you use to elucidate changes in gene expression during time or across states?

A

Slingshot, Monocle, PAGA (Partition-based graph abstraction)

10
Q

what can you do with pseudo-time inference?

A
  1. analyze cell similarity and diversity
  2. trace differentiation processes
  3. clonal evolution
  4. cell state transitions of a specific cell type or between different cell types (from cell of origin to development A or B)
11
Q

what makes scRNA-seq different than bulk RNA-seq?

A

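single-cell resolution: expression is measured in each individual cell instead of being averaged across all cells in the sample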

12
Q

what tools would you use for a differential gene expression analysis (DGE)?

A

Seurat, Cellenics, DESeq2

13
Q

what do you measure to determine cell velocity?

A

spliced vs. unspliced transcripts

14
Q

what tools could you use to measure cell velocity?

A

scVelo, velocyto

15
Q

what tools could you use for multimodal analysis?

A

Seurat

16
Q

what can you integrate multimodal analysis with?

A
  • spatial data (sequence- or image-based)
  • sc-ATAC-seq data
  • cell surface protein and T-cell receptor (TCR)/immunoglobulin (IG) clonotyping
17
Q

a cell is found to have more spliced transcripts than unspliced transcripts. is the expression increasing or decreasing?

A

decreasing
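A minimal sketch of the steady-state model behind this card (the simplified model introduced with velocyto): velocity = u - gamma * s, where u and s are a gene's unspliced and spliced counts and gamma is a degradation rate fitted on cells assumed to be at steady state. The card's rule of thumb is a simplification: the model compares unspliced counts with what the steady state predicts for the observed spliced counts. A negative velocity (too few unspliced transcripts) means expression is decreasing; a positive one means it is increasing. The quantile-based fit and the function name are illustrative assumptions, not velocyto's or scVelo's exact procedure, and the toy data are made up.

```python
import numpy as np


def steady_state_velocity(u, s, quantile=0.05):
    """Simplified steady-state RNA velocity for a single gene (illustrative).

    u, s: unspliced and spliced counts for the gene, one value per cell.
    gamma is fitted by least squares through the origin, using only cells
    in the lowest and highest `quantile` of spliced expression, which are
    assumed (a simplification) to be close to steady state.
    """
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    lo, hi = np.quantile(s, [quantile, 1 - quantile])
    extreme = (s <= lo) | (s >= hi)
    # slope of u vs s through the origin on the assumed steady-state cells
    gamma = np.sum(u[extreme] * s[extreme]) / np.sum(s[extreme] ** 2)
    # > 0: more unspliced than expected -> induction (increasing)
    # < 0: fewer unspliced than expected -> repression (decreasing)
    velocity = u - gamma * s
    return velocity, gamma


# made-up toy data: steady-state cells lie on u = 0.5 * s
s = np.arange(11, dtype=float)
u = 0.5 * s
u[3] = 3.0   # unspliced above the steady-state line
u[7] = 1.0   # unspliced below the steady-state line
velocity, gamma = steady_state_velocity(u, s)
print(round(gamma, 3), velocity[3] > 0, velocity[7] < 0)
```

Cell 3 comes out with positive velocity (induction) and cell 7 with negative velocity (repression), even though both have more spliced than unspliced counts.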

18
Q

in cell velocity, what is the curve called when the unspliced counts are increasing?

A

induction

19
Q

in cell velocity, what is the curve called when the unspliced counts are decreasing?

A

repression

20
Q

what tools could you use to study cell-level mutations?

A

SCExecute + variant caller, scReadCounts

21
Q

T/F averaged expression is equal to within cell molecular relationships

A

false

22
Q

An inverse correlation between target and suppressor genes can indicate

A

potential regulation

23
Q

what are the benefits associated with the technological advances in scRNA-seq?

A
  1. number of analyzed cells increased
  2. cost exponentially reduced
  3. number of published papers increased
  4. technology evolved using more sophisticated, accurate, high throughput analyses
24
Q

how do you isolate single cells for scRNA-seq?

A
  1. limiting dilution (plate based)
  2. micromanipulation
  3. laser capture microdissection (LCM)
  4. fluorescence-activated cell sorting (FACS)
  5. Circulating Tumor Cell (CTC) capture
  6. microfluidics-based scRNA-seq
  7. droplet-based scRNA-seq
25
Q

what are the cons of plate-based single cell isolation?

A

low throughput and efficiency (mostly of historical significance)

26
Q

what are the pros and cons of fluorescence-activated cell sorting?

A
  1. targeted cell isolation
  2. high-precision sorting
  3. multiparameter sorting
  4. cell viability
  5. limited by marker availability
  6. throughput and time efficiency lower than other methods
27
Q

what are the pros and cons of circulating tumor cell (CTC) capture as a method of single cell isolation?

A
  1. utilization of antibodies to specifically target and capture CTCs from peripheral blood
  2. rarity of CTCs in the bloodstream
  3. potential for bias in antibody-based capture
  4. sensitivity and specificity of the chosen antibodies
  5. throughput and time-efficiency lower than other methods
28
Q

what are the pros and cons of microfluidics-based scRNA-seq?

A
  1. precise manipulation of cells and fluids at the microscale
  2. ability to integrate multiple steps into a single microfluidic chip, reducing sample loss and technical variability
  3. lower throughput
  4. complexity and cost of the microfluidic chips
  5. low efficiency for small or fragile cells
29
Q

describe the process of microfluidics-based scRNA-seq

A

capturing and processing individual cells in microfluidic channels or chambers; the controlled environment benefits the study of specific cell types or low-abundance transcripts

30
Q

describe the process of droplet-based scRNA-seq

A

encapsulating individual cells in oil droplets, each containing a unique barcode. designed to process a high number of cells in a single run

31
Q

what are the pros and cons of droplet-based scRNA-seq?

A
  1. Scalability and parallel processing
  2. Reduced cost and time per cell
  3. Large scale and high throughput by barcoded beads in droplets, which tag the mRNA of individual cells
  4. Difficulty in capturing large/irregularly shaped cells
  5. Potential for capturing multiple cells in a droplet
  6. Many cells = lower depth of sequencing per cell
32
Q

what company developed the most widely used droplet-based platform (Chromium)?

A

10X genomics

33
Q

what are the features of the Drop-seq platform?

A
  1. uses droplets for single-cell isolation
  2. no ERCC spike-ins
  3. 8 bp UMI
  4. no full length coverage
  5. PCR amplification
  6. not usable for bulk
  7. paired-end sequencing
34
Q

what are the features of the SmartSeq2 platform?

A
  1. uses FACS for single cell isolation
  2. ERCC spike-ins
  3. no UMI
  4. full length coverage
  5. PCR amplification
  6. usable for bulk
  7. single-end sequencing
35
Q

what are UMIs and why can they be helpful?

A

unique molecular identifiers: short nucleotide sequences added to RNA molecules before amplification to tag each original RNA molecule uniquely, which makes it possible to distinguish true RNA molecules from PCR duplicates. This significantly improves the quantitative accuracy of scRNA-seq
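To make the deduplication concrete, here is a minimal Python sketch that assumes each read has already been assigned a cell barcode, a gene, and a UMI (the tuple layout, barcodes, and gene names are hypothetical). Reads that share all three values are collapsed into one molecule. Real pipelines also correct sequencing errors within UMIs, which this sketch skips.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell barcode, gene) using UMIs.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per read
    (a hypothetical input format). Reads with identical barcode, gene,
    and UMI are treated as PCR duplicates of one original RNA molecule.
    """
    umis = defaultdict(set)
    for barcode, gene, umi in reads:
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("CELL1", "GENE_A", "ACGTACGT"),
    ("CELL1", "GENE_A", "ACGTACGT"),  # PCR duplicate of the read above
    ("CELL1", "GENE_A", "TTGCAAGC"),  # same gene, different molecule
    ("CELL2", "GENE_A", "ACGTACGT"),  # same UMI but a different cell
]
print(count_molecules(reads))
```

Counting reads would report 4 transcripts here; counting UMIs reports 3 molecules (2 in CELL1, 1 in CELL2).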

36
Q

what is the aim of full length transcript sequencing and why can it be useful?

A

to sequence the entire RNA molecule from the 5’ to the 3’ end. provides comprehensive info about transcript isoforms, alternative splicing events, and other post-transcriptional modifications

37
Q

why are UMIs and full length transcript sequencing incompatible?

A

full length sequencing requires reading the entire RNA transcript, so if a UMI is added only to one end, it becomes ineffective or gets lost in the process of sequencing the full-length transcript

38
Q

in what type of methods are UMIs particularly useful?

A

counting gene expression (i.e. counting transcripts)

39
Q

which platform is the majority of existing scRNA-seq data generated on?

A

10X genomics

40
Q

t/f smart-seq2 isolates cells by limiting dilution (plate-based isolation)

A

false

41
Q

t/f the Smart-seq2 analytical pipeline for the scRNA-seq data of each cell is analogous to bulk RNA-seq

A

true

42
Q

in what kinds of cells do you expect higher than average mitochondrial content?

A
  • myocytes
  • brown adipocytes
  • neurons
  • sperm cells
  • oocytes
  • hepatocytes
  • endocrine cells
43
Q

what can high mitochondrial gene expression indicate?

A

cell stress
apoptosis
low RNA integrity
low-quality RNA extraction
technical errors in library prep

44
Q

what can high ribosomal gene expression indicate?

A

RNA integrity
cell viability
technical artifacts (such as cell doublets)
batch effects
cell heterogeneity

45
Q

are mitochondrial or ribosomal genes used more often as a QC metric for scRNA-seq?

A

mitochondrial

46
Q

t/f the use of UMIs in 10X genomics provides lower quantitation accuracy in ribosomal gene expression detection

A

true

47
Q

t/f 10X genomics detects more genes than Smart-seq2

A

false

48
Q

t/f 10X genomics identifies more cell clusters/types than Smartseq2

A

true

49
Q

t/f 10X genomics has a higher dropout ratio than Smart-seq2

A

true

50
Q

what are the pros and cons of 10X Genomics Visium?

A
  • integrates well with existing 10x genomics workflow
  • offers a relatively large capture area, which is beneficial for analyzing tissue sections
  • provides high quality data with robust technical support
  • limited to predefined capture areas, which may not suit all experimental designs
  • the cost can be relatively high
  • the capture areas are not cell-resolution
51
Q

what are the different platforms for single cell spatial transcriptomics

A
  1. 10X Genomics Visium
  2. StereoSeq
  3. NanoString GeoMx Digital Spatial Profiler
  4. Slide-seq
  5. Seq-scope
  6. MERFISH
52
Q

what are the pros and cons of StereoSeq?

A
  • high spatial resolution
  • comprehensive coverage
  • flexibility in targeting (can target a wide variety of RNA species)
  • compatibility with standard histological samples
  • complexity and cost
  • requires robust bioinformatics support
  • instrumentation requirements
53
Q

what are the pros and cons of the NanoString GeoMx Digital Spatial Profiler?

A
  • high-plex analysis, enabling simultaneous assessment of numerous targets
  • flexible in terms of target selection (RNA and protein)
  • compatible with standard FFPE samples
  • lower spatial resolution compared to other platforms
  • dependency on predefined probes (limited novel transcript discoveries)
54
Q

what are the pros and cons of SLIDE-SEQ?

A
  • high spatial resolution
  • allows for discovery of novel spatial biomarkers
  • technically challenging and requires special equipment
  • lower throughput, limits parallel sample processing
55
Q

what are the pros and cons of Seq-scope?

A
  • exceptionally high spatial resolution
  • still in developmental stages, potentially high cost and technical complexity
56
Q

what are the pros and cons of MERFISH?

A
  • extremely high-plex capacity
  • high spatial resolution
  • requires specialized and expensive equipment
  • complex data analysis pipeline
57
Q

what are the common limitations of single cell spatial transcriptomics?

A

trade off between spatial resolution and throughput

58
Q

what are the types of single cell DNA-seq?

A
  1. single cell Whole Genome Sequencing (scWGS)
  2. single cell Copy Number Variation (CNV) profiling
  3. single cell Whole Exome Sequencing (scWES) and single cell targeted DNA sequencing
59
Q

what are the challenges associated with scDNA-seq?

A
  • higher technical noise compared to scRNA-seq
  • need for high sequencing depth to detect rare mutations
  • the potential for DNA amplification biases
  • cost-efficiency
  • currently rare, not a lot of data for reference/comparison
60
Q

what does scATAC-seq stand for and what is it used for?

A

single cell Assay for Transposase-Accessible Chromatin using sequencing; surveys the physical structure of the genome by identifying regions of open chromatin

61
Q

what is the goal of single cell immune profiling and what is measured?

A

comprehensive characterization of immune cells

  • gene expression
  • surface proteins
  • cytokines
  • functional states
62
Q

what does the 10X genomics single cell immune profiling solution provide?

A
  • 5’ transcriptome gene expression
  • T and B cell repertoire
  • antigen specificity
63
Q

what is CITE-seq and what are its uses?

A

Cellular Indexing of Transcriptomes and Epitopes by sequencing; used to determine interactions between different immune cell groups and to identify novel, distinct immune cell subsets in health and disease

64
Q

what is a common limitation across all platforms?

A

the trade off between spatial resolution and throughput

65
Q

why is it difficult to pick a superior platform for spatial transcriptomics?

A

depends on research question, tissue type, and available resources

66
Q

what are the biggest challenges of scDNA-seq?

A
  • high technical noise
  • high cost
  • potential for DNA amplification bias
67
Q

what can scDNA-seq be used to study?

A
  • tumor heterogeneity (in terms of mutations)
  • hematology
  • gene editing
68
Q

what other method is ATAC-seq a proxy for?

A

scRNA-seq

69
Q

t/f ATAC-seq cannot be used to identify cell types

A

false

70
Q

t/f scRNA-seq data is not zero-inflated relative to the sequencing depth

A

true

71
Q

what are some confounding factors of scRNA-seq?

A
  • large volume of data
  • low depth of sequencing per cell
  • biological variability across cells/samples
  • technical variability across cells/samples
72
Q

what is the danger of scRNA-seq having a low depth of sequencing per cell?

A

a zero count can either mean the gene is not expressed or that the transcript was not detected (false negative)
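The arithmetic behind this danger can be sketched under a simple assumption (the copy number and capture probability below are hypothetical, not values from the lecture): if each of a gene's mRNA copies in a cell is captured and sequenced independently with probability p, then the chance of observing zero counts for a gene that is actually expressed with n copies is (1 - p)^n.

```python
def prob_false_zero(mrna_copies, capture_prob):
    """P(observed count == 0) for a gene that IS expressed.

    Assumes (a simplification) that each of the gene's mRNA copies is
    captured independently with probability capture_prob, so the observed
    count follows Binomial(mrna_copies, capture_prob).
    """
    return (1.0 - capture_prob) ** mrna_copies


# a gene with 5 mRNA copies and an assumed 10% capture probability
print(round(prob_false_zero(5, 0.10), 3))
```

With these assumed numbers a zero is more likely than not (about 59%) even though the gene is expressed; with 50 copies the probability drops below 1%.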

73
Q

what are uninteresting sources of biological variation in scRNA-seq (unless the study specifically is testing the variation)?

A
  • transcriptional bursting
  • varying rates of RNA processing
  • continuous or discrete cell identities
  • environmental stimuli
  • temporal changes
74
Q

what are sources of technical variation in scRNA-seq?

A
  • cell-specific capture efficiency
  • library quality
  • amplification bias (dropout)
  • batch effects
  • dilution factor
75
Q

what are factors that contribute to batch effects?

A
  • RNA isolation not performed on the same day
  • library prep not performed on the same day
  • different people performing RNA isolation/library prep for all samples
  • not using same reagents for all samples
  • RNA isolation/library prep not performed at same location
76
Q

how can you combat batch effects?

A
  • split replicates of different sample groups across batches
  • include batch info in experimental metadata
77
Q

what method could you use to remove doublets from your data?

A

DoubletDecon

78
Q

what kinds of quality filtering are performed on scRNA-seq data?

A
  • filter out cells based on mitochondrial reads (%)
  • filter out cells with too few or too many reads
  • filter out cells based on n features (genes) (too few or too many)
  • filter out genes based on expression across the cells
  • integrate and remove batch effects
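A minimal numpy sketch of the cell and gene filters listed above, assuming a cells x genes count matrix and human-style "MT-" prefixes for mitochondrial gene names. The function name and threshold defaults are illustrative placeholders, not values from the lecture; in practice thresholds are chosen from the QC distributions of each dataset. Batch integration is out of scope here.

```python
import numpy as np


def qc_filter(counts, gene_names, min_genes=200, max_genes=2500,
              max_pct_mt=5.0, min_cells=3):
    """Filter cells, then genes, in a cells x genes count matrix.

    Returns the filtered matrix, the kept gene names, and a boolean mask
    of kept cells. All thresholds are illustrative assumptions, and
    mitochondrial genes are assumed to start with "MT-".
    """
    counts = np.asarray(counts)
    gene_names = np.asarray(gene_names, dtype=str)
    is_mt = np.char.startswith(gene_names, "MT-")

    n_genes = (counts > 0).sum(axis=1)                 # genes detected per cell
    total = counts.sum(axis=1)                          # total counts per cell
    pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1)

    keep_cells = ((n_genes >= min_genes) & (n_genes <= max_genes)
                  & (pct_mt <= max_pct_mt))
    counts = counts[keep_cells]

    # drop genes detected in fewer than min_cells of the remaining cells
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    return counts[:, keep_genes], gene_names[keep_genes], keep_cells


# tiny made-up example with thresholds scaled down to fit it
counts = np.array([[5, 3, 0, 1],    # ~11% mitochondrial -> removed
                   [2, 2, 2, 0],    # passes
                   [0, 0, 1, 0]])   # too few genes -> removed
names = ["GENE_A", "GENE_B", "GENE_C", "MT-CO1"]
filtered, kept_genes, kept_cells = qc_filter(
    counts, names, min_genes=2, max_genes=10, max_pct_mt=10.0, min_cells=1)
print(filtered, kept_genes, kept_cells)
```

Cells are filtered first so that the gene filter counts detection only among the cells that survive QC.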
79
Q

what is the difference between CITE-seq and immune cell profiling?

A

CITE-seq integrates scRNA-seq with simultaneous protein-level data, enabling characterization of both transcriptomes and cell surface protein markers from single cells while immune cell profiling may include 5’-end transcript sequences, offering insights into transcriptional initiation patterns specific to immune cells without direct protein-level measurements

80
Q

how does paired-end sequencing improve mapping? what is it particularly useful for?

A

because we know the approximate distance between the two reads (the fragment size), which constrains where the pair can map. this is especially helpful with indels

81
Q

what is splice-aware alignment?

A

find the genomic coordinates of the sequencing reads, taking into account that RNA undergoes splicing (prioritize mapping in non-intronic regions)

82
Q

what do you need for a splice-aware alignment?

A
  • data (raw sequencing reads)
  • high performance computing platform
  • software
  • reference genome sequence in FASTA format
  • exonic/intronic genome coordinates or gene annotation file
83
Q

what is a reference genome?

A

a digital nucleic acid sequence database assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species (it does not accurately represent the set of genes of any single individual organism)

84
Q

what is a gene annotation file?

A

a description of where genetic elements (intron, exon, transcript, gene) are located in the genome, in the form of start and end coordinates
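As an illustration of what such a file contains, here is a minimal parser for one line of a GTF file, a common gene annotation format; the example line and its identifiers are made up. GTF has nine tab-separated columns, start and end are 1-based and inclusive, and the last column holds semicolon-separated `key "value"` attributes.

```python
def parse_gtf_line(line):
    """Parse one line of a GTF (GFF version 2) annotation file.

    Minimal sketch: assumes a well-formed, non-comment line and keeps
    only the last value when an attribute key is repeated.
    """
    (seqname, source, feature, start, end,
     score, strand, frame, attrs) = line.rstrip("\n").split("\t")
    attributes = {}
    for field in attrs.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attributes[key] = value.strip('"')
    return {"seqname": seqname, "source": source, "feature": feature,
            "start": int(start), "end": int(end), "score": score,
            "strand": strand, "frame": frame, "attributes": attributes}


# hypothetical exon record: 1-based, inclusive coordinates 1000..1200
line = 'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
rec = parse_gtf_line(line)
print(rec["feature"], rec["start"], rec["end"], rec["attributes"])
```

Because both coordinates are inclusive, this exon spans end - start + 1 = 201 bases.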

85
Q

can you align RNA without a gene annotation file?

A

yes- RNA aligners that do not use gene annotation exist (some are called de novo aligners)

86
Q

can you align RNA without a reference genome?

A

yes- RNA reads can instead be aligned to a target transcriptome provided as a multi-FASTA file

87
Q

what software can perform splice-aware alignment?

A

STAR, HISAT2, BBMap

88
Q

what software can perform DNA (no splice-aware) alignment?

A

BWA, Bowtie2

89
Q

what are the two steps of STAR alignment?

A
  1. seed searching
  2. clustering, stitching, and scoring
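A toy sketch of the seed-searching idea: STAR looks for the maximal mappable prefix (MMP) of a read, then repeats the search on the unmapped remainder, so a read that spans a splice junction ends up as separate seeds that are later stitched across the intron. This naive version uses `str.find` on a tiny made-up reference instead of STAR's suffix arrays. It ignores the reverse strand, multimapping, and mismatch handling, and only reports the reference gap between consecutive seeds; the function names are hypothetical.

```python
def mmp_seeds(read, reference, min_len=1):
    """Sequential maximal-mappable-prefix search (naive, illustrative).

    Returns a list of (read_pos, ref_pos, length) seeds. ref_pos is the
    leftmost exact match; real aligners consider every match.
    """
    seeds = []
    i = 0
    while i < len(read):
        best_len, best_pos = 0, -1
        # extend the prefix one base at a time while it still matches
        for length in range(1, len(read) - i + 1):
            pos = reference.find(read[i:i + length])
            if pos == -1:
                break
            best_len, best_pos = length, pos
        if best_len < min_len:
            i += 1  # base cannot start a seed: skip it
            continue
        seeds.append((i, best_pos, best_len))
        i += best_len
    return seeds


def gaps_between(seeds):
    """Reference gap between consecutive seeds (a candidate intron length)."""
    return [g2 - (g1 + l1) for (_, g1, l1), (_, g2, _) in zip(seeds, seeds[1:])]


# toy reference: exon 1 = AAAA, "intron" = CCCC, exon 2 = GGGG
reference = "AAAACCCCGGGGTTTT"
read = "AAAAGGGG"  # spliced read joining exon 1 to exon 2
seeds = mmp_seeds(read, reference)
print(seeds, gaps_between(seeds))
```

The read maps as two seeds separated by a 4-base gap on the reference, which the stitching step would interpret as an intron.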
90
Q

what essential features are involved in scRNA-seq preprocessing compared to bulk-RNA-seq? how are these features achieved?

A
  • cell calling
  • removing PCR duplicates
  • assigning reads to individual genes and cells

achieved through barcode and UMI sequences
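A minimal sketch of how barcode and UMI sequences are pulled out of a droplet read, assuming a Drop-seq-style read 1 with a 12 bp cell barcode followed by an 8 bp UMI (other chemistries use different lengths). The `min_reads` cutoff for calling cells is a hypothetical stand-in: real cell callers use more careful approaches, for example looking for the "knee" in the ranked barcode counts. All sequences below are made up.

```python
from collections import Counter


def split_read1(seq, barcode_len=12, umi_len=8):
    """Split a read-1 sequence into (cell barcode, UMI).

    Layout assumption: barcode first, then UMI (Drop-seq style defaults).
    """
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:barcode_len + umi_len]
    return barcode, umi


def call_cells(read1_seqs, min_reads, barcode_len=12, umi_len=8):
    """Keep barcodes with at least min_reads reads (naive cell calling)."""
    per_barcode = Counter(
        split_read1(s, barcode_len, umi_len)[0] for s in read1_seqs)
    return {bc for bc, n in per_barcode.items() if n >= min_reads}


bc_cell = "ACGTACGTACGT"   # made-up barcode of a droplet containing a cell
bc_empty = "TTTTGGGGCCCC"  # made-up barcode of an (almost) empty droplet
reads = ([bc_cell + "AAAACCCC"] * 3 + [bc_cell + "GGGGTTTT"] * 2
         + [bc_empty + "ACACACAC"])
print(split_read1(reads[0]))
print(call_cells(reads, min_reads=3))
```

The barcode assigns each read to a cell and the UMI (second element) is what later collapses PCR duplicates within that cell.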

91
Q

what are the outputs of scRNA-seq alignment for SmartSeq2 and 10X?

A

SmartSeq2- each cell has its own .bam

10X- one combined .bam plus barcodes.tsv, features.tsv, and matrix.mtx
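The matrix.mtx file is in Matrix Market coordinate format. Below is a minimal pure-Python reader, included as a sketch of what the file contains; in practice `scipy.io.mmread` or Seurat's `Read10X` would be used. It assumes integer values. In Cell Ranger output, rows follow the order of features.tsv, columns follow barcodes.tsv, and only non-zero entries are stored. The example text is made up.

```python
def read_mtx(text):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps 0-based (row, col) -> integer value; missing pairs are 0.
    Lines starting with '%' (the header and comments) are skipped.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_entries = map(int, lines[0].split())
    entries = {}
    for ln in lines[1:]:
        row, col, value = ln.split()
        # the file uses 1-based indices
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    if len(entries) != n_entries:
        raise ValueError("number of entries does not match the header")
    return n_rows, n_cols, entries


# made-up example: 3 features x 2 barcodes, two non-zero counts
example = """%%MatrixMarket matrix coordinate integer general
%
3 2 2
1 1 5
3 2 7
"""
print(read_mtx(example))
```

Storing only non-zero entries keeps the file small, since most genes have zero counts in most cells.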

92
Q

what is the scRNA-seq workflow?

A
  1. Process data (on a server/cloud) and obtain gene expression (GE) values per cell (small, manageable outputs)
  2. Filter out genes
  3. Filter cells
  4. Normalize expression values
  5. Identify highly variable genes
  6. Scale data, regress out unwanted variation
  7. Reduce dimensions
  8. Determine significant principal components
  9. Use the PCs to cluster cells with graph-based clustering
  10. Visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
  11. Detect and visualize marker genes for the clusters
  12. Classify the cells by cell type
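Steps 7 and 8 (reduce dimensions, determine significant principal components) can be sketched with numpy's SVD, assuming the input has already been normalized and scaled (cells x genes). The function name is hypothetical and the random input is only a stand-in for real data. The explained-variance ratios are what an elbow plot displays; choosing the elbow is a heuristic, and Seurat also offers other approaches such as JackStraw.

```python
import numpy as np


def run_pca(scaled, n_components=10):
    """PCA via SVD on a cells x genes matrix of scaled expression values.

    Returns cell embeddings (cells x n_components) and the fraction of
    total variance explained by each returned component.
    """
    X = np.asarray(scaled, dtype=float)
    X = X - X.mean(axis=0)                  # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)     # sorted, largest first
    embeddings = U[:, :n_components] * S[:n_components]
    return embeddings, explained[:n_components]


rng = np.random.default_rng(0)
scaled = rng.normal(size=(100, 30))         # stand-in for real scaled data
embeddings, explained = run_pca(scaled, n_components=5)
print(embeddings.shape, np.round(explained, 3))
```

The cell embeddings on the top PCs are what step 9 feeds into graph-based clustering.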
93
Q

t/f you should filter out genes that are not expressed in any cells or in only a few of them

A

t- removing them makes the data smaller and computations faster

94
Q

what factors contribute to the noise of single cell gene expression?

A
  • low mRNA content in a cell
  • variable mRNA capture
  • variable sequencing depth
95
Q

how would you perform global scale normalization?

A
  • divide gene’s UMI count in a cell by the total number of UMIs in that cell
  • multiply the ratio by a scale factor (10,000 by default)
  • transform the results by taking natural log
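The three steps above can be sketched in numpy for a cells x genes UMI count matrix. Seurat's LogNormalize applies log1p, i.e. ln(x + 1), so that zero counts stay zero; the function name here is hypothetical, and cells with zero total counts are assumed to have been filtered out already.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000):
    """Global-scaling normalization of a cells x genes UMI count matrix.

    1. divide each count by the cell's total UMIs
    2. multiply by scale_factor
    3. take the natural log of (value + 1)
    Assumes every cell has at least one UMI (filter empty cells first).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale_factor)


counts = np.array([[1, 1, 2],
                   [0, 5, 5]])
print(log_normalize(counts))
```

Dividing by each cell's total removes differences in sequencing depth between cells, so the two cells above become comparable even though the second has more UMIs.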
96
Q

in what case does global scale normalization not work well and what can be done instead?

A

highly expressed genes; use SCTransform instead

97
Q

what are the steps in SCTransform?

A
  • modeling of gene expression data
  • normalization and variance stabilization
  • feature selection
  • mitigation of batch effects
  • scalability
98
Q

does scRNA-seq data have a weak or strong mean-variance relationship?

A

strong; lowly expressed genes have higher variance

99
Q

how would you perform Variance Stabilizing Transformation (VST)?

A
  • compute the mean and variance of each gene using the unnormalized UMI counts
  • take log10 of mean and variance
  • fit curve to predict the variance of each gene as a function of its mean expression
  • standardize the counts using the observed mean and the variance expected from the fitted curve
  • for each gene, compute the variance of the standardized values across all cells
  • rank the genes based on standardized variance and use the top 2000 for PCA and clustering
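A sketch of these steps in numpy, assuming a cells x genes matrix of raw UMI counts. Seurat fits a local regression (loess) curve; here a quadratic polynomial fit (`np.polyfit`) stands in for loess to keep the sketch dependency-free, so results will differ from Seurat's. Standardized values are capped at the square root of the number of cells, as Seurat does by default. The function name and the simulated data are illustrative.

```python
import numpy as np


def vst_top_genes(counts, n_top=2000, degree=2):
    """Rank genes by standardized variance (sketch of the VST steps).

    counts: cells x genes matrix of unnormalized UMI counts.
    A polynomial fit stands in for Seurat's loess curve (an assumption).
    Returns indices of the n_top genes and every gene's standardized variance.
    """
    X = np.asarray(counts, dtype=float)
    n_cells, n_genes = X.shape
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)

    fit_ok = (mean > 0) & (var > 0)                  # log10 needs positive values
    log_mean = np.log10(mean[fit_ok])
    coef = np.polyfit(log_mean, np.log10(var[fit_ok]), degree)

    # expected SD from the fitted mean-variance curve; genes without a fit
    # get an infinite SD, so their standardized values (and variance) are 0
    expected_sd = np.full(n_genes, np.inf)
    expected_sd[fit_ok] = np.sqrt(10 ** np.polyval(coef, log_mean))

    z = (X - mean) / expected_sd                     # standardized counts
    z = np.minimum(z, np.sqrt(n_cells))              # clip extreme values
    std_var = z.var(axis=0, ddof=1)
    top = np.argsort(std_var)[::-1][:n_top]
    return top, std_var


# simulated data: 30 Poisson genes with different means, plus one gene
# (index 10) whose variance is far larger than its mean would predict
rng = np.random.default_rng(0)
lam = np.linspace(1, 20, 30)
counts = rng.poisson(lam, size=(500, 30)).astype(float)
counts[:, 10] = np.where(np.arange(500) % 2 == 0, 0.0, 15.0)
top, std_var = vst_top_genes(counts, n_top=5)
print(top)
```

Ranking by variance relative to the fitted curve, rather than raw variance, keeps highly expressed genes from being selected just because their counts are large.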
100
Q

why do we need to scale data prior to PCA?

A

gives equal weight in downstream analyses so the highly expressed genes do not dominate

101
Q

how is scaling expression values prior to dimensional reduction done?

A

Z score normalization in Seurat’s ScaleData function:
- shifts the expression of each gene so that the mean expression across cells is 0
- scales the expression of each gene so that the variance across cells is 1
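The two bullets above amount to a per-gene z-score. A minimal numpy version, assuming a cells x genes matrix of normalized values; the upper clip mimics ScaleData's default `scale.max` of 10, and the function name is hypothetical.

```python
import numpy as np


def scale_data(expr, max_value=10.0):
    """Z-score each gene (column) across cells.

    expr: cells x genes matrix of normalized expression.
    After scaling, each gene has mean 0 and variance 1 across cells;
    genes with zero variance are left at 0. Values above max_value are
    clipped (mimicking ScaleData's default scale.max of 10).
    """
    X = np.asarray(expr, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                       # avoid dividing by zero
    return np.minimum((X - mean) / sd, max_value)


expr = np.array([[1.0, 2.0],
                 [3.0, 2.0],
                 [5.0, 2.0]])
print(scale_data(expr))
```

The first gene becomes [-1, 0, 1]; the constant second gene carries no information and is set to 0.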

102
Q

how can we remove unwanted sources of variation from expression values prior to dimensional reduction?

A

Seurat constructs linear models to predict gene expression based on user-defined variables
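A minimal sketch of that idea with numpy least squares: for every gene, fit expression against the unwanted variables (for example, percent mitochondrial reads or total counts) and keep the residuals. In Seurat this happens inside ScaleData through its `vars.to.regress` argument; the function below is a hypothetical stand-in, not Seurat's code, and the toy data are made up.

```python
import numpy as np


def regress_out(expr, covariates):
    """Return residuals of expr ~ intercept + covariates, gene by gene.

    expr: cells x genes; covariates: cells x k (or a 1-D array for k = 1).
    The residuals are the part of each gene's expression that the
    covariates cannot explain linearly.
    """
    Y = np.asarray(expr, dtype=float)
    C = np.asarray(covariates, dtype=float).reshape(Y.shape[0], -1)
    design = np.column_stack([np.ones(Y.shape[0]), C])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ beta


pct_mt = np.arange(10, dtype=float)          # made-up unwanted variable
rng = np.random.default_rng(0)
expr = np.column_stack([
    2.0 * pct_mt + 1.0,                      # driven entirely by pct_mt
    rng.normal(size=10),                     # unrelated noise
])
residuals = regress_out(expr, pct_mt)
print(np.round(residuals, 3))
```

The first gene's signal disappears completely because it was fully explained by the unwanted variable. The same pattern covers cell cycle regression: pass each cell's S and G2/M scores (or their difference) as covariates.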

103
Q

what are the steps in cell cycle phase regression?

A
  1. compute cell cycle scores for each cell based on its expression of G2/M and S phase markers
  2. model each gene’s relationship between expression and the cell cycle score
  3. regress: 2 options
    - remove ALL signals associated with cell cycle stage
    - remove the difference between G2M and S phase scores (preserves signals separating non-cycling and cycling cells; only differences in cell cycle phase among the dividing cells are removed. useful when studying differentiation processes)