Gene module Flashcards

1
Q
  • Why should RNA-seq reads be mapped with splicing-aware read mappers?
A

RNA-seq reads need splicing-aware mappers because mature mRNA is spliced: introns are removed, so some reads span exon-exon junctions. When such a read is aligned to the genome, its two parts map to positions separated by an intron. Mappers that are not splice-aware cannot place these split reads, whereas splicing-aware tools (e.g., STAR, HISAT2) align them correctly, which supports accurate gene expression quantification and detection of splicing events.
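
A toy sketch of the problem, using made-up sequences (the exon and intron strings and both helper functions are hypothetical, and real aligners such as STAR or HISAT2 work very differently internally): a read that spans an exon-exon junction has no contiguous match in the genome, but it does match once it is allowed to split across a gap.

```python
# Hypothetical sequences for illustration only.
EXON1 = "ATGGCCTTAC"
INTRON = "GTAAGTCCCCCCTTTTCAG"
EXON2 = "CGATTCGGAA"

GENOME = EXON1 + INTRON + EXON2   # genomic DNA still contains the intron
TRANSCRIPT = EXON1 + EXON2        # mature mRNA: intron removed


def contiguous_hit(read, genome):
    """Position of an exact, ungapped match in the genome, or None."""
    pos = genome.find(read)
    return pos if pos >= 0 else None


def split_hit(read, genome, min_anchor=3):
    """Try each split point of the read; return (left_pos, right_pos, gap)
    for the first split whose two parts both match exactly, with the right
    part located downstream of the left part and at least one base between."""
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_pos = genome.find(left)
        if left_pos < 0:
            continue
        right_pos = genome.find(right, left_pos + len(left) + 1)
        if right_pos >= 0:
            return left_pos, right_pos, right_pos - (left_pos + len(left))
    return None


# A read covering the last 4 bases of exon 1 and the first 4 bases of exon 2.
read = TRANSCRIPT[6:14]

print(contiguous_hit(read, GENOME))  # no ungapped match is found
print(split_hit(read, GENOME))       # the gap equals the intron length
```

The gap reported by the split match corresponds to the intron, which is the information a splicing-aware mapper recovers and a regular mapper misses.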

2
Q

What are the RPKM/FPKM and DESeq2/VST techniques?

A

Normalization techniques for bulk RNA-seq

RPKM/FPKM (Reads/Fragments Per Kilobase of transcript per Million mapped reads):
- Normalizes for gene length and sequencing depth
- RPKM is used for single-end reads, FPKM for paired-end reads

TPM (Transcripts Per Million):
- Normalizes for gene length first, then for sequencing depth
- Makes expression levels more comparable across genes and samples, because TPM values sum to one million in every sample

DESeq2/VST (variance stabilizing transformation):
- DESeq2 normalizes count data and performs differential gene expression analysis using a negative binomial model
- VST is a transformation within DESeq2 that stabilizes variance across genes, making the data more suitable for visualization and clustering
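
The RPKM and TPM definitions above can be sketched in a few lines. This is a minimal illustration on made-up counts and gene lengths; the function names are hypothetical, and real pipelines take these values from a quantification tool.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    per_million = sum(counts) / 1_000_000          # sequencing depth factor
    return [c / (length / 1_000) / per_million
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale."""
    rates = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates) / 1_000_000
    return [r / scale for r in rates]


# Made-up example: three genes whose counts are proportional to their length.
counts = [100, 200, 700]
lengths_bp = [1_000, 2_000, 7_000]

print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))   # sums to one million by construction
```

Because TPM rescales after the length correction, the values of a sample always add up to one million, which is not guaranteed for RPKM/FPKM.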

3
Q

What are the key metrics for QC (in bulk DNA analysis)?

A

Read Quality: A measure of the accuracy and reliability of sequencing reads, often represented as a Phred score indicating the probability of an error in each base call.

Adapter Content: The presence of adapter sequences (used in library preparation) within the sequencing reads, which can interfere with downstream analysis if not removed.

Sequence Length Distribution: A summary of the lengths of the sequencing reads, used to check for consistency and identify potential trimming or sequencing issues.

GC Content: The proportion of guanine (G) and cytosine (C) bases in the sequences, often analyzed for biases that may affect sequencing coverage or downstream analysis.

4
Q

What is the purpose and general idea of a Linear Mixed-Effects Model (LME)?

A

Purpose: Account for both fixed and random effects
* Fixed effects: consistent and systematic across all observations (e.g. treatment or condition)
* Random effects: sources of variation such as batch effects or individual variability
* An LME makes it possible to control for confounding sources of variation (random effects) while estimating the impact of the variables of interest (fixed effects)
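
In standard textbook notation (the symbols below are not from the card), the model can be written as follows: y is the vector of observations, X the design matrix of the fixed effects with coefficients β, Z the design matrix of the random effects b (for example one intercept per batch or individual), and ε the residual noise.

```latex
y = X\beta + Zb + \varepsilon, \qquad
b \sim \mathcal{N}(0, G), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I)
```

The fixed-effect coefficients β are the quantities of interest, while the variance of b absorbs the group-level variability.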

5
Q

What is the issue with testing many genes and how can this be mitigated?

A
  • With thousands of genes, a massive number of statistical tests is performed
  • Some genes will be detected as differential purely by chance
  • Correction methods mitigate the risk of false positives, but increase the
    likelihood of false negatives (missing truly differentially expressed genes)

Multiple test correction
* Differential expression: many tests are performed
* This needs to be taken into account, e.g. using Benjamini–Hochberg (BH) multiple testing correction
* BH adjusts each p-value based on its rank and the total number of tests
* It controls the False Discovery Rate (FDR): among all genes called significantly differentially expressed, the proportion that in reality comes from the null model (i.e. is not differentially expressed)
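
A minimal sketch of the BH adjustment, assuming a plain list of p-values (the function name and the example values are made up): each p-value is multiplied by the number of tests divided by its rank, and a running minimum from the largest p-value downwards keeps the adjusted values monotone.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
significant = [p <= 0.05 for p in padj]   # genes called at an FDR of 5%
print(padj, significant)
```

Calling genes with an adjusted p-value of at most 0.05 then aims for an expected false discovery proportion of at most 5% among the called genes.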

6
Q

Applications of PCA in RNA-seq

A
  • Visualizing relationships between samples
  • Detecting outliers (problematic samples)
  • Identifying patterns (e.g. influence of treatment)
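
A dependency-free sketch of how PCA separates samples, on made-up log-expression values (in practice a library routine would be used; the function name and data are hypothetical). It finds the first principal component by power iteration on the covariance matrix and projects every sample onto it.

```python
import math


def first_pc_scores(samples, n_iter=200):
    """Coordinate of each sample (row) on the first principal component."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Gene-by-gene covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Start from the covariance row of the most variable gene.
    top = max(range(p), key=lambda j: cov[j][j])
    v = cov[top][:]
    for _ in range(n_iter):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Rows: control 1, control 2, treated 1, treated 2; columns: three genes.
samples = [
    [5.0, 8.0, 3.0],
    [5.2, 7.8, 3.1],
    [9.0, 4.0, 3.0],
    [9.1, 4.3, 2.9],
]
scores = first_pc_scores(samples)
print(scores)   # controls and treated samples fall on opposite sides of zero
```

Plotting such scores for the first two components is the usual way to spot treatment effects and outlier samples.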
7
Q

What are average linkage and complete linkage methods for and what is the difference between them?

A

Both are linkage methods used in hierarchical clustering: they define the distance between two groups (clusters) of data points, which determines which clusters are merged next.

Complete linkage uses maximal intercluster dissimilarity.
The largest of the pairwise dissimilarities is used.

Average linkage uses mean intercluster dissimilarity.
The average of the pairwise dissimilarities is used.
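
The difference is easy to see on two tiny clusters. This sketch assumes Euclidean distance as the dissimilarity and uses made-up one-dimensional points; the function names are hypothetical.

```python
import math


def pairwise_dissimilarities(cluster_a, cluster_b):
    """All Euclidean distances between a point in A and a point in B."""
    return [math.dist(a, b) for a in cluster_a for b in cluster_b]


def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise dissimilarity between the two clusters."""
    return max(pairwise_dissimilarities(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise dissimilarity between the two clusters."""
    d = pairwise_dissimilarities(cluster_a, cluster_b)
    return sum(d) / len(d)


cluster_a = [(0.0,), (1.0,)]
cluster_b = [(4.0,), (6.0,)]
print(complete_linkage(cluster_a, cluster_b))  # driven by the farthest pair
print(average_linkage(cluster_a, cluster_b))   # driven by all pairs
```

Complete linkage is therefore more sensitive to a single distant point, while average linkage smooths over all pairs.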

8
Q

What is Enrichment Analysis for and what are the steps?

A

Enrichment analysis comprises statistical techniques used to identify whether specific biological categories (e.g., pathways, gene sets, or functional annotations) are overrepresented or “enriched” in a given list of genes, compared to what would be expected by chance.

Steps:
1. Input gene list: a set of genes of interest (e.g., differentially expressed genes, genes from a specific cluster, or genes with mutations).
2. Reference background: a larger set of genes representing the entire genome, transcriptome, or experimental dataset.
3. Gene annotations: categories or functional terms, often from curated databases such as:
    * Gene Ontology (GO) terms (e.g., biological processes, cellular components, molecular functions)
    * Pathway databases
    * Disease databases
4. Statistical testing: compares the overlap between the input gene list and annotated gene sets to assess overrepresentation. Methods include:
    * Fisher's exact test or hypergeometric test: determines whether the overlap is statistically significant
5. Multiple testing correction
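
The statistical testing step can be sketched with a one-sided hypergeometric test. This is a minimal version using only the standard library; the function name and the gene numbers in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random,
    without replacement, from a background containing n_in_set annotated genes."""
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    favourable = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total


# Made-up example: 20,000 background genes, a pathway with 200 genes,
# 500 differentially expressed genes, 15 of which are in the pathway.
p = hypergeom_enrichment_p(20_000, 200, 500, 15)
print(p)
```

With these numbers about 5 pathway genes would be expected by chance, so an overlap of 15 gives a small p-value; one such p-value is computed per gene set and then corrected for multiple testing.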
9
Q

What procedure is commonly used to control the FDR?

A

Benjamini-Hochberg (BH)

10
Q

What are the benefits of single-cell approaches compared to bulk RNA-seq?

A
  • Bulk RNA-seq analyzes average gene expression, which masks cell-to-cell variability
  • Single-cell approaches capture heterogeneity
  • They profile gene expression at the single-cell level
  • They give insights into (rare) cell types and cell states
  • They can resolve dynamic processes
11
Q

What are the applications and the preprocessing workflow of single-cell RNA sequencing (scRNA-seq)?

A
  • Identifying rare cell types (in bulk RNA-seq these would not be picked up)
  • Understanding differentiation
  • Defining “cell trajectories”
  • Following disease progression

Workflow (scRNA-seq):
  1. Cell dissociation and isolation (e.g., FACS, microfluidics)
  2. Cell barcoding and amplification
    * Amplification using PCR
    * Barcoding is needed to distinguish the individual cells during data analysis: a short nucleotide sequence is added to the mRNA
    * All the molecules from a single cell carry the same barcode
  3. After barcoding and amplification, the cells are pooled into one sequencing library
12
Q

Challenges in scRNA-seq

A
  1. Single cell vs. single nucleus:
    * Some cell types are hard to capture intact during dissociation.
    * Nuclei are more resistant to mechanical force, which makes it easier to isolate nuclei than whole cells in some cases.
    * Nuclei reflect transcriptional patterns: nuclear transcripts approximate gene expression but may lack the full context.
  2. Dropouts:
    * A phenomenon where a gene is expressed in one cell but not detected in another cell of the same type, due to low expression levels or technical issues.
    * Can complicate interpretation.
  3. Batch Effects:
    * Variations caused by technical differences between experiments (e.g., processing on different days or labs).
    * These differences may overshadow true biological variation, requiring normalization to remove non-biological effects.
13
Q

What is spatial transcriptomics and what are its applications?

A
  • Maps gene expression to tissue locations, preserving spatial context
    (Techniques: Slide-seq, Visium, MERFISH, Stereo-seq)

Applications:
* Reveals spatial organization of tissues: map gene expression to brain anatomy
* Understanding cell-type diversity
* Interactions and cell-cell communication

14
Q

What does single-cell ATAC-seq do and what insights can be gained from it?

A

Single-Cell ATAC-seq: Profiling Chromatin Accessibility

Purpose: Profiles chromatin accessibility at single-cell resolution to identify active regulatory regions (e.g., enhancers and promoters).
Insight: Reveals which regions of the genome are open and potentially regulating gene expression in specific cell types.

15
Q

What does Single-Cell DNA Methylation Sequencing do and what insights can be gained from it?

A

  • Purpose: Profiles DNA methylation (an epigenetic modification) at single-cell resolution, using bisulfite sequencing.

  • Insight: Studies cell-to-cell variation in methylation, helping to understand stable epigenetic regulation and its role in cell identity and development.
16
Q

Explain Multi-Omics at Single-Cell Resolution

A
  • Integrates multiple omics data from the same single cell
  • Example: scNMT-seq (combines methylation, chromatin accessibility and
    transcriptomics)
  • Comprehensive understanding of cellular states and functions
17
Q

What are the QC metrics in single-cell RNA-seq?

A

Importance of QC: Crucial due to variability in cell quality

Metrics:
* Total reads per cell
* Number of detected genes per cell
* Mitochondrial gene content (a high fraction often indicates damaged or dying cells)
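
A minimal sketch of computing these three metrics for one cell. It assumes counts are stored as a gene-to-count dictionary and that mitochondrial genes can be recognised by an "MT-" prefix (the human gene symbol convention); both the function name and the example counts are made up.

```python
def cell_qc(counts_by_gene, mito_prefix="MT-"):
    """Per-cell QC metrics from a {gene_symbol: count} dictionary."""
    total = sum(counts_by_gene.values())
    detected = sum(1 for c in counts_by_gene.values() if c > 0)
    # Assumption: mitochondrial genes are identified by their symbol prefix.
    mito = sum(c for gene, c in counts_by_gene.items()
               if gene.upper().startswith(mito_prefix))
    return {
        "total_counts": total,
        "n_genes": detected,
        "pct_mito": 100 * mito / total if total else 0.0,
    }


cell = {"ACTB": 50, "GAPDH": 30, "MT-CO1": 20, "CD3E": 0}
print(cell_qc(cell))
```

Cells are then typically filtered with thresholds on these values; suitable cut-offs depend on the tissue and protocol.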

18
Q

What are doublets?

A

A problem in single-cell RNA-seq:
* Reads originating from two cells are assigned to a single cell
* Doublets can skew results
* They can be removed computationally

19
Q

What are the unique challenges of single-cell RNA-seq in the alignment and quantification process? How are these addressed?

A
  1. Low Read Depth:
    - Each cell is sequenced at a shallower depth due to the high number of cells, resulting in fewer reads per cell.
  2. Dropout Events:
    - Genes with low expression might not be detected in some cells, leading to zero counts for genes that are actually expressed.
  3. High Technical Noise:
    - Variability caused by the amplification and sequencing process, rather than biological differences.

Addressed by filtering: low-quality cells and genes are removed (e.g., cells with low gene counts, genes not expressed in enough cells)

20
Q

What are the steps in the single cell RNA-seq pipeline?

A
  1. QC, alignment, quantification, normalization
  2. Cell clustering
  3. Cell annotation
  4. Differential expression
  5. Trajectory inference
  6. Multi-modal integration
21
Q

Normalization and Scaling in scRNA-seq (challenges and solutions)

A

Challenges:
* Zero Inflation: Excessive zero counts due to dropout events or technical issues.
* Variable Sequencing Depth: Uneven read counts between cells.

Solutions:
* Imputation: Fills in missing values using statistical models (e.g., negative binomial).
* Log-Normalization: Scales counts for sequencing depth and applies log transformation to stabilize variance.
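
Log-normalization for a single cell can be sketched as below. The scale factor of 10,000 is an assumption (a commonly used default, not something stated on the card), and the function name and counts are made up.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Divide by the cell's total count, rescale, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# Two cells with the same composition but a tenfold difference in depth.
shallow_cell = [1, 3, 0]
deep_cell = [10, 30, 0]
print(log_normalize(shallow_cell))
print(log_normalize(deep_cell))   # same values: depth has been normalized away
```

Zero counts stay at zero after the transformation, so log-normalization handles sequencing depth but does not address dropouts.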
22
Q

How do we evaluate clusters in single-cell RNA-seq?

A
  • Cluster Validation: Methods for evaluating cluster quality (e.g., silhouette scores, differential
    expression analysis).
  • Biological Interpretation:
    Associating clusters with cell types or states
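
A small sketch of the silhouette score mentioned above, assuming Euclidean distance and at least two clusters (function name and points are made up). For each point, a is the mean distance to the other points of its own cluster and b the smallest mean distance to another cluster; the score is (b - a) / max(a, b).

```python
import math


def silhouette_scores(points, labels):
    """Silhouette value per point; assumes at least two distinct labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)   # common convention for singleton clusters
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette_scores(points, [0, 0, 1, 1]))  # well-separated clustering
print(silhouette_scores(points, [0, 1, 0, 1]))  # poor clustering
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.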
23
Q

What is annotation?

A

Annotating clusters means linking them to cell types using marker genes identified by differential expression analysis, either manually or with automated tools such as Garnett.

24
Q

What are the goal, methods and applications of pseudotime analysis?

A

Goal: Arrange cells along a temporal trajectory based on their gene expression profiles, simulating a time order of cellular processes without actual time points.
Methods:
* Clustering-based approach:
    * Group cells into clusters.
    * Connect clusters to form a trajectory, reflecting transitions between cell states.
* Probabilistic frameworks:
    * Calculate transition probabilities between cells or clusters.
    * Build trajectories by modeling the most likely paths cells follow.

Applications:
* Study cell differentiation (e.g., stem cells becoming specialized).
* Analyze developmental processes (e.g., organ formation).
* Explore cell responses to stimuli (e.g., immune activation).

Summary: Pseudotime analysis reconstructs cellular transitions, revealing dynamic processes like differentiation or development from static single-cell data.

25
Q

How does advanced trajectory inference differ from pseudotime analysis?

A

Pseudotime Analysis: Simpler, primarily linear or unbranched paths.

Advanced Trajectory Inference:
* Extends pseudotime to include branching and more complex biological processes.
* Ideal for studies of cell fate decisions and differentiation.

26
Q

Approaches to dimensionality reduction

A
  • density/distribution based approaches: t-SNE and UMAP
  • autoencoders
  • approaches specifically developed for single cell data (e.g. trajectory
    inference)
27
Q

t-SNE vs UMAP in terms of focus

A

t-SNE: Focuses heavily on local clusters; great for visualizing small datasets but lacks meaningful global structure.
UMAP: Balances local and global relationships; faster and better suited for larger datasets while retaining interpretable structures.

28
Q

Explain the two most important steps of t-SNE and UMAP, and where they differ

A

Step 1: Construction of High-Dimensional Probability Distribution

t-SNE:
* Computes pairwise similarities using a Gaussian distribution.
* The bandwidth of the Gaussian is adjusted based on perplexity (a hyperparameter that controls the effective number of neighbors).
UMAP: Uses a graph-based approach:
* Points are connected based on the overlap of radii.
* The radius is chosen locally, based on the distance to the nth nearest neighbor (key hyperparameter: number of neighbors).
* Every point is connected to at least its closest neighbor for continuity.

Step 2: Mapping to Lower Dimensions

t-SNE:
* Minimizes Kullback-Leibler (KL) divergence to preserve local relationships, focusing on grouping nearby points.
UMAP:
* Similar, but minimizes a cross-entropy loss, balancing local and global structure.
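
For reference, the two steps can be written out in the commonly published notation (the symbols are not from the card): x are the high-dimensional points, N the number of points, σ_i the Gaussian bandwidth set by the perplexity, p and q the high- and low-dimensional similarities in t-SNE, and v and w the high- and low-dimensional edge weights in UMAP.

```latex
% t-SNE, step 1: Gaussian similarities in the high-dimensional space
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% t-SNE, step 2: minimize the KL divergence between P and Q
C_{\text{t-SNE}} = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

% UMAP, step 2: minimize a cross-entropy between edge weights
C_{\text{UMAP}} = \sum_{i \neq j} \left[ v_{ij} \log \frac{v_{ij}}{w_{ij}}
                + (1 - v_{ij}) \log \frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

The second term of the UMAP loss penalizes placing points close together in the embedding when they are not neighbors in the original space, which is one common explanation for why UMAP tends to retain more global structure.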
29
Q

Regularized AE

A

Put a constraint in the loss function to prevent the model from being too complex (we don't want it to overfit or fit noise):
* Penalty on latent variables
* Denoising AE: the loss compares the original image with the output of the model applied to a corrupted version of the image
* Contractive AE: penalty on the derivatives of the hidden-layer nodes w.r.t. the input, so that small changes in the input are disregarded

30
Q

Tensor factorization (+ its relation to PCA and denoising autoencoders)

A
Tensor factorization (for imputation):
  • What it is: A method that decomposes high-dimensional data (e.g., gene expression matrices) into simpler factors (like a sum of smaller matrices or tensors).
  • Purpose: Used to impute missing values (e.g., dropout events in single-cell data) by reconstructing the data from these factors.
  • Relation to PCA and denoising autoencoders:
    * Like PCA: Breaks data into components, identifying dominant patterns.
    * Like denoising autoencoders: Learns latent representations to reconstruct noisy or incomplete data, enhancing signal clarity.
31
Q

PAGA

A

PAGA (Partition-Based Graph Abstraction):

What It Is: A method for trajectory inference in single-cell data that represents cells as nodes and their relationships as edges in a graph.
Key Idea: Simplifies trajectories by grouping similar cells into clusters (partitions) and modeling transitions between these clusters instead of individual cells.
Purpose: Captures both global and local cellular relationships, useful for branching or complex trajectories.
32
Q

RNA velocity

A

Example answer: An approach to predict future cell states based on splicing kinetics.

ChatGPT answer: RNA velocity is a computational method used in single-cell RNA-seq analysis to infer the direction and speed of gene expression changes within individual cells. It provides insights into the dynamic processes of cellular state transitions, such as differentiation or response to stimuli.
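
The idea behind "splicing kinetics" can be sketched with the usual simple rate model, in which unspliced pre-mRNA (u) is spliced at rate beta and spliced mRNA (s) is degraded at rate gamma, so that ds/dt = beta * u - gamma * s. The function names and all numbers below are made up, and real tools estimate the rates from the data.

```python
def rna_velocity(unspliced, spliced, gamma, beta=1.0):
    """ds/dt = beta * u - gamma * s for one gene in one cell."""
    return beta * unspliced - gamma * spliced


def extrapolate_spliced(unspliced, spliced, gamma, dt=1.0, beta=1.0):
    """Predicted spliced abundance after a small time step (never below 0)."""
    velocity = rna_velocity(unspliced, spliced, gamma, beta)
    return max(0.0, spliced + dt * velocity)


# Hypothetical gene with degradation rate gamma = 0.5.
print(rna_velocity(4.0, 2.0, gamma=0.5))   # positive: expression is increasing
print(rna_velocity(1.0, 4.0, gamma=0.5))   # negative: expression is decreasing
print(rna_velocity(2.0, 4.0, gamma=0.5))   # zero: steady state
```

A surplus of unspliced transcripts relative to the steady state indicates that a gene is being switched on, and combining these per-gene velocities across genes gives the predicted direction of a cell in expression space.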