Introduction to pathway enrichment analysis Flashcards

1
Q

What happens once you receive a “hit” from a screen?

A
  • Characterize the hits with genomics and proteomics studies
  • Gather information about their enrichment in known pathways, complexes, and functions
2
Q

Explain

Pathway and Network Analysis

A
  • Any analysis involving pathway or network information
  • Most commonly applied to interpret gene lists
  • Most popular is pathway enrichment analysis
  • Helps gain mechanistic insight into omics data
3
Q

Define

Pathway

A
  • Detailed, high-confidence consensus
  • Biochemical reactions
  • Small-scale, fewer genes
  • Made from decades of literature
4
Q

Define

Network

A
  • Simplified cellular logic; noisy
  • Large-scale, genome-wide
  • Constructed from omics data integration
5
Q

What are the benefits of pathway data (vs. transcripts, proteins, SNPs)?

A
  • Improves statistical power (fewer tests)
  • More reproducible than gene-level expression signatures
  • Easier to interpret
  • Identifies mechanisms
  • Predicts new roles for genes by association
6
Q

What is the workflow of pathway analysis?

A
  • Collect genomics data
  • Normalize and score (compute differential expression)
  • Generate gene list
  • Learn about underlying cellular mechanism using pathway and network analysis
7
Q

Define

Gene lists

A
Genes grouped by a shared property:
  • Biological system: complex, pathway, physical interactors
  • Similar gene function
  • Similar cell or tissue location
  • Chromosomal location
8
Q

How are genes and proteins identified?

A
  • IDs are unique, stable names or numbers
  • Important to recognize the correct record type (gene, protein, RNA)
9
Q

What are the main uses of identifier mapping?

A
  • Searching for a gene name
  • Link to related resources
  • Identifier translation (proteins to genes)
  • Merging data from different sources
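
Identifier translation and merging can be sketched in Python as below. The mapping table and every ID in it are hypothetical placeholders (a real table would come from an identifier-mapping service or database), and keeping the largest-magnitude score when several proteins map to one gene is only one possible policy.

```python
# Hypothetical protein-to-gene mapping table (placeholder IDs, not real records)
PROTEIN_TO_GENE = {
    "PROT_1": "GENE_A",
    "PROT_2": "GENE_A",  # two protein isoforms encoded by the same gene
    "PROT_3": "GENE_B",
}

def protein_to_gene_scores(protein_scores):
    """Translate protein-level scores into gene-level scores.

    Unmapped proteins are returned separately rather than silently dropped.
    When several proteins map to one gene, the score with the largest
    magnitude is kept (one possible policy, chosen for illustration).
    """
    gene_scores, unmapped = {}, []
    for protein, score in protein_scores.items():
        gene = PROTEIN_TO_GENE.get(protein)
        if gene is None:
            unmapped.append(protein)
        elif gene not in gene_scores or abs(score) > abs(gene_scores[gene]):
            gene_scores[gene] = score
    return gene_scores, unmapped

scores, missing = protein_to_gene_scores({"PROT_1": 1.2, "PROT_2": -2.5, "PROT_3": 0.4, "PROT_9": 3.0})
print(scores)   # {'GENE_A': -2.5, 'GENE_B': 0.4}
print(missing)  # ['PROT_9']
```

Reporting unmapped IDs explicitly matters when merging sources: silently dropping them shrinks the gene list without warning.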
10
Q

What are some other annotations that can be added?

A
  • GO terms for molecular function, cell location
  • Chromosome position
  • Disease association
  • DNA, transcript, protein properties
  • Interactions with other genes
11
Q

Describe

Annotation sources

A
  • Manual annotation by scientists (high quality, time-consuming)
  • Electronic annotation (variable accuracy, lower quality)
12
Q

What are the types of pathway/network analysis?

A
  • Enrichment of fixed gene sets (what biological processes are altered in this cancer)
  • De novo sub-network construction/clustering (are new pathways altered in this cancer)
  • Pathway-based modeling (how are pathway activities altered in a particular patient)
13
Q

Advantages/Disadvantages

Enrichment of fixed gene sets

A
  • Advantages: easy to perform, good tools, good statistical model
  • Disadvantages: many possible gene sets with lots of overlap; gene sets are “bags of genes” that obscure regulatory relationships

MOST POPULAR

14
Q

Explain the two-class design for gene lists

A

The expression matrix (cases vs. controls) becomes either:
* Genes ranked by a differential statistic, or
* A gene list selected by a threshold (e.g., on the differential statistic)
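
A minimal Python sketch of both options, assuming Welch's t-statistic as the differential statistic and a made-up three-gene expression matrix. The |t| > 3 cutoff is arbitrary, which is exactly the kind of threshold choice the ranked design avoids.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(cases, controls):
    """Welch's t-statistic for one gene (cases minus controls)."""
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se

# Hypothetical expression matrix: gene -> (case samples, control samples)
expression = {
    "GENE_A": ([5.1, 5.3, 5.0], [3.0, 3.2, 2.9]),
    "GENE_B": ([2.0, 2.1, 1.9], [2.0, 2.2, 1.8]),
    "GENE_C": ([1.0, 1.2, 0.9], [2.5, 2.7, 2.6]),
}

stats = {gene: welch_t(cases, ctrls) for gene, (cases, ctrls) in expression.items()}

# Option 1: genes ranked by the statistic (most up-regulated first)
ranked = sorted(stats, key=stats.get, reverse=True)

# Option 2: genes selected by a threshold on |t| (cutoff chosen arbitrarily)
selected = [gene for gene in ranked if abs(stats[gene]) > 3]

print(ranked)    # ['GENE_A', 'GENE_B', 'GENE_C']
print(selected)  # ['GENE_A', 'GENE_C']
```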

15
Q

How to perform a gene list enrichment test?

A
  • Define the gene list and the background list
  • Select gene sets (pathways) to test for enrichment
  • Run enrichment tests with multiple-testing correction (Bonferroni or Benjamini-Hochberg)
  • Interpret enrichments
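
Before testing, the gene list and every gene set must be restricted to the same background (typically all genes measured in the experiment). A minimal Python sketch of that bookkeeping, assuming gene sets are a plain dict from a set name to gene IDs; all names are hypothetical. The four counts it returns are the inputs to the hypergeometric test, and the resulting P-values would then be corrected for the number of gene sets tested.

```python
def enrichment_counts(gene_list, background, gene_sets):
    """Count (k, n, K, N) for each gene set, restricting everything to the background.

    N: number of background genes
    n: genes from the gene list that are in the background
    K: genes from the gene set that are in the background
    k: genes in both the gene list and the gene set
    """
    universe = set(background)
    hits = set(gene_list) & universe
    counts = {}
    for name, members in gene_sets.items():
        annotated = set(members) & universe
        counts[name] = (len(hits & annotated), len(hits), len(annotated), len(universe))
    return counts

# Hypothetical data: 10 measured genes, a 4-gene hit list, two gene sets
background = [f"GENE_{i}" for i in range(1, 11)]
gene_list = ["GENE_1", "GENE_2", "GENE_3", "GENE_99"]  # GENE_99 was not measured
gene_sets = {
    "PATHWAY_X": ["GENE_1", "GENE_2", "GENE_5"],
    "PATHWAY_Y": ["GENE_8", "GENE_9", "GENE_42"],
}

print(enrichment_counts(gene_list, background, gene_sets))
# {'PATHWAY_X': (2, 3, 3, 10), 'PATHWAY_Y': (0, 3, 2, 10)}
```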
16
Q

Describe

Outline of theory component

A
  • Hypergeometric test for calculating enrichment P-values for gene lists
  • Minimum hypergeometric test for computing enrichment P-values for ranked lists
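
For a gene list of n genes drawn from N background genes, K of which belong to a gene set, the enrichment P-value is the probability of seeing k or more set members by chance: P = sum over i = k..min(n, K) of C(K, i) C(N − K, n − i) / C(N, n). This is equivalent to a one-sided Fisher's exact test. A minimal Python sketch using only the standard library:

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) where X counts gene-set members among n genes drawn from N.

    K of the N background genes belong to the gene set.
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than exist, so no extra guards are needed
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# 2 of 3 listed genes fall in a 3-gene set, out of 10 background genes
print(hypergeom_pvalue(2, 3, 3, 10))  # (3*7 + 1*1) / 120, about 0.183
```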
17
Q

Why test enrichment in ranked lists?

A

Possible problems with the thresholded gene list test:
* No natural value for the threshold
* Different results at different threshold settings
* Possible loss of statistical power due to thresholding

18
Q

Describe

Minimum hypergeometric test

A
  1. Calculate the hypergeometric P-value at every threshold along the ranked list and keep the minimum
  2. Correct that minimum for the multiple thresholds tested
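
A minimal Python sketch of the two steps. Step 1 scans every cutoff of the ranked list and keeps the smallest hypergeometric P-value (the mHG score). For step 2 this sketch substitutes a simple permutation test, shuffling the ranking and counting how often a score at least as extreme appears; the published mHG method computes the corrected P-value exactly instead, which is faster and more precise. Gene names are hypothetical.

```python
import random
from math import comb

def hyper_tail(k, n, K, N):
    """Hypergeometric P(X >= k): n genes drawn from N, K of them in the gene set."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)

def min_hypergeometric(ranked, gene_set):
    """Step 1: smallest hypergeometric P-value over all cutoffs of a ranked list.

    Returns (mHG score, cutoff at which it was reached).
    """
    N = len(ranked)
    members = set(gene_set) & set(ranked)
    K = len(members)
    best_p, best_cut, k = 1.0, N, 0
    for n, gene in enumerate(ranked, start=1):
        if gene in members:
            k += 1
        p = hyper_tail(k, n, K, N)
        if p < best_p:
            best_p, best_cut = p, n
    return best_p, best_cut

def permutation_pvalue(ranked, gene_set, n_perm=1000, seed=0):
    """Step 2 (stand-in): correct the mHG score for scanning many cutoffs by permutation."""
    observed, _ = min_hypergeometric(ranked, gene_set)
    rng = random.Random(seed)
    shuffled = list(ranked)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        score, _ = min_hypergeometric(shuffled, gene_set)
        if score <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical ranked list of 20 genes; the 4 gene-set members sit at the top
ranked = [f"GENE_{i}" for i in range(1, 21)]
gene_set = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
print(min_hypergeometric(ranked, gene_set))  # mHG score = 1/4845 (about 2.1e-4), reached at cutoff 4
print(permutation_pvalue(ranked, gene_set))
```

With many gene sets, the corrected P-value of each set would still need correction across sets.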
19
Q

Describe

Bonferroni correction

A

Corrected P-value = M x original P-value (capped at 1)

M = number of annotations tested

Very stringent; can mask real enrichments (leads to false negatives)
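
A minimal Python sketch; the P-values are made up. With only M = 4 tests, a P-value of 0.04 already rises to 0.16 and no longer passes a 0.05 cutoff, which shows why the correction becomes so stringent when thousands of gene sets are tested.

```python
def bonferroni(pvalues):
    """Multiply each P-value by M, the number of tests, capping the result at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

print(bonferroni([0.001, 0.01, 0.04, 0.3]))  # [0.004, 0.04, 0.16, 1.0]
```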

20
Q

Define

FDR

A

False Discovery Rate
* Expected proportion of the reported enrichments that are false positives (due to random chance)
* Commonly controlled with the Benjamini-Hochberg correction
* The BH-adjusted P-value is often called a q-value

21
Q

Define

Benjamini-Hochberg

A

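Adjusted P-value = P-value x (# tests) / rank

rank = position of the P-value when all P-values are sorted in ascending order (smallest = 1); adjusted values are then capped at 1 and made non-decreasing with rank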
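
A minimal Python sketch of the Benjamini-Hochberg adjustment. Besides the P-value x (# tests) / rank step, the standard procedure walks from the largest P-value down and keeps a running minimum, so adjusted values never decrease as the raw P-value increases. The P-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P-value first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from rank m (largest P) down to rank 1, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.001, 0.01, 0.04, 0.3]))  # about [0.004, 0.02, 0.0533, 0.3]
print(benjamini_hochberg([0.01, 0.011]))  # [0.011, 0.011]: 0.01 x 2 / 1 = 0.02 is lowered to 0.011
```

The same four P-values fare much better here than under a Bonferroni correction, which is why BH is the usual choice for enrichment results.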

22
Q

Define

Enrichment map

A
  • Network-based visualization of pathway enrichment analysis
  • Nodes: gene sets reflecting pathways, processes
  • Edges: connect gene sets that share many genes
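
A minimal Python sketch of the edge rule, assuming the overlap coefficient |A ∩ B| / min(|A|, |B|) as the similarity measure (the Jaccard index is a common alternative) and an arbitrary cutoff of 0.5. In practice the nodes would be only the gene sets that passed the enrichment test; the sets here are hypothetical.

```python
from itertools import combinations

def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|); 0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def enrichment_map(gene_sets, cutoff=0.5):
    """Nodes are gene sets; an edge joins two sets whose overlap is at least `cutoff`."""
    nodes = sorted(gene_sets)
    edges = []
    for x, y in combinations(nodes, 2):
        score = overlap_coefficient(gene_sets[x], gene_sets[y])
        if score >= cutoff:
            edges.append((x, y, score))
    return nodes, edges

# Hypothetical enriched gene sets
gene_sets = {
    "SET_A": {"G1", "G2", "G3", "G4"},
    "SET_B": {"G2", "G3", "G4", "G5"},
    "SET_C": {"G8", "G9"},
}
nodes, edges = enrichment_map(gene_sets)
print(edges)  # [('SET_A', 'SET_B', 0.75)]
```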
23
Q

Describe

De Novo Subnetwork Constructions and Clustering

A
  • Map the list of altered elements onto a biological network
  • Identify new sub-network configurations linking the altered elements
  • Extract clusters from these new configurations and annotate them
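
The simplest version of this idea can be sketched in Python: keep only the network edges whose two ends are both altered, then report connected components as candidate clusters. Published methods score and search sub-networks far more carefully; the network, gene names, and minimum cluster size here are hypothetical. Each resulting cluster could then be annotated, for example with an enrichment test.

```python
from collections import defaultdict, deque

def altered_subnetworks(edges, altered, min_size=2):
    """Connected components of the network restricted to altered genes."""
    altered = set(altered)
    adj = defaultdict(set)
    for a, b in edges:
        if a in altered and b in altered:  # keep only edges between altered genes
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(altered):
        if start in seen:
            continue
        # Breadth-first search collects one connected component
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbor in adj[gene]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters

# Hypothetical interaction network and altered genes
network = [("G1", "G2"), ("G2", "G3"), ("G3", "G7"), ("G4", "G5"), ("G6", "G8")]
altered = ["G1", "G2", "G3", "G4", "G5", "G6"]
clusters = altered_subnetworks(network, altered)
print(clusters)  # [['G1', 'G2', 'G3'], ['G4', 'G5']]
```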
24
Q

Define

Pathway-based modeling

A
  • Map the list of altered elements onto biological pathways
  • Use the detailed interactions within pathways to model pathway activity