Introduction to pathway enrichment analysis Flashcards
What happens once you receive a “hit” from a screen?
- Hits typically come from genomics and proteomics studies
- Gather information about their enrichment in known pathways, complexes, and functions
Explain
Pathway and Network Analysis
- Any analysis involving pathway or network information
- Most commonly applied to interpret gene lists
- Most popular is pathway enrichment analysis
- Helps gain mechanistic insight into omics data
Define
Pathway
- Detailed, high-confidence consensus
- Biochemical reactions
- Small-scale, fewer genes
- Made from decades of literature
Define
Network
- Simplified cellular logic; noisy
- Large-scale, genome wide
- Constructed from omics data integration
What are the benefits of pathway data (vs. transcripts, proteins, SNPs)?
- Improves statistical power (fewer tests)
- More reproducible (gene expression signatures)
- Easier to interpret
- Identifies mechanisms
- Predicts new roles for genes by association
What is the workflow of pathway analysis?
- Collect genomics data
- Normalize and score (compute differential expression)
- Generate gene list
- Learn about underlying cellular mechanism using pathway and network analysis
Define
Gene lists
- Biological system: complex, pathway, physical interactors
- Similar gene function
- Similar cell or tissue location
- Chromosomal location
How to identify genes and proteins?
- IDs are unique, stable names or numbers
- Important to recognize the correct record type (gene, protein, RNA)
What are the main uses of identifier mapping?
- Searching for a gene name
- Link to related resources
- Identifier translation (proteins to genes)
- Merging data from different sources
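The identifier translation use above (proteins to genes) can be sketched as a simple lookup. This is a minimal illustration, not a real mapping service: the `PROT_...` and `GENE_...` identifiers, the `protein_to_gene` table, and the `translate_ids` helper are all made up for the example.

```python
# Hypothetical mapping table; real tables come from an identifier-mapping resource.
protein_to_gene = {
    "PROT_0001": "GENE_A",
    "PROT_0002": "GENE_A",  # two protein records can map to the same gene
    "PROT_0003": "GENE_B",
}

def translate_ids(ids, mapping):
    """Translate identifiers, returning (translated, unmapped).

    Order is preserved and duplicate targets are reported once.
    """
    translated, unmapped = [], []
    for identifier in ids:
        target = mapping.get(identifier)
        if target is None:
            unmapped.append(identifier)   # keep track of what could not be mapped
        elif target not in translated:
            translated.append(target)
    return translated, unmapped

genes, missing = translate_ids(
    ["PROT_0001", "PROT_0002", "PROT_0003", "PROT_9999"], protein_to_gene
)
print(genes)    # ['GENE_A', 'GENE_B']
print(missing)  # ['PROT_9999']
```

Reporting unmapped identifiers separately makes it visible when records are lost while merging data from different sources.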
What are some other annotations that can be added?
- GO terms for molecular function, cell location
- Chromosome position
- Disease association
- DNA, transcript, protein properties
- Interactions with other genes
Describe
Annotation sources
- Manual annotation by scientists (high quality, time-consuming)
- Electronic annotation (variable accuracy, lower quality)
What are the types of pathway/network analysis?
- Enrichment of fixed gene sets (what biological processes are altered in this cancer)
- De novo sub-network construction/clustering (are new pathways altered in this cancer)
- Pathway-based modeling (how are pathway activities altered in a particular patient)
Advantages/Disadvantages
Enrichment of fixed gene sets
- Advantages: easy to perform, good tools, good statistical model
- Disadvantage: many possible gene sets, with lots of overlap
- Disadvantage: treats pathways as bags of genes, which obscures regulatory relationships
- Most popular approach
Explain the two-class design for gene lists
An expression matrix (cases vs. controls) is reduced to either:
* Genes ranked by a differential statistic
* A gene list selected by a threshold on that statistic
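Both outcomes of the two-class design can be sketched in a few lines. The example below assumes Welch's t-statistic as the differential statistic and an arbitrary cutoff of 5.0; the card names neither, and the expression values are invented toy numbers.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(cases, controls):
    """Welch's t-statistic for one gene (cases vs. controls)."""
    var_term = stdev(cases) ** 2 / len(cases) + stdev(controls) ** 2 / len(controls)
    return (mean(cases) - mean(controls)) / sqrt(var_term)

# Toy expression matrix (made-up values): gene -> (case samples, control samples)
expression = {
    "GENE_A": ([8.1, 7.9, 8.4], [5.0, 5.2, 4.9]),
    "GENE_B": ([6.0, 6.1, 5.9], [6.0, 5.8, 6.2]),
    "GENE_C": ([3.1, 2.8, 3.0], [5.9, 6.1, 6.0]),
}

scores = {gene: welch_t(ca, co) for gene, (ca, co) in expression.items()}

# Option 1: every gene, ranked from most up-regulated to most down-regulated
ranked = sorted(scores, key=scores.get, reverse=True)

# Option 2: only genes passing a threshold (the cutoff is an assumption)
THRESHOLD = 5.0
gene_list = [gene for gene in ranked if abs(scores[gene]) >= THRESHOLD]

print(ranked)     # ['GENE_A', 'GENE_B', 'GENE_C']
print(gene_list)  # ['GENE_A', 'GENE_C']
```

The ranked list keeps every gene, while the thresholded list drops `GENE_B`; changing `THRESHOLD` changes the second result but not the first.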
How to perform a gene list enrichment test?
- Define gene list and background list
- Select gene sets (pathways) to test for enrichment
- Run enrichment tests (w/ Bonferroni/Benjamini-Hochberg corrections)
- Interpret enrichments
Describe
Outline of theory component
- Hypergeometric test for calculating enrichment P-values for gene lists
- Minimum hypergeometric test for computing enrichment P-values for ranked lists
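The hypergeometric test for a thresholded gene list can be sketched as a one-sided tail probability: the chance of seeing at least the observed overlap if the gene list were drawn at random from the background. The function and parameter names below are illustrative, and the numbers are a toy example.

```python
from math import comb

def hypergeom_enrichment_p(background_size, set_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from background_size genes, of which set_size belong to the gene set."""
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, min(set_size, list_size) + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, a pathway of 5 genes,
# a gene list of 4 genes, 3 of which fall in the pathway.
p = hypergeom_enrichment_p(background_size=20, set_size=5, list_size=4, overlap=3)
print(round(p, 4))  # 0.032
```

The background size matters: the same overlap gives a different P-value if the background is the whole genome rather than only the genes that were measured.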
Why test enrichment in ranked lists?
Possible problems with gene list test
* No natural value for threshold
* Different results at different threshold settings
* Possible loss of statistical power due to thresholding
Describe
Minimum hypergeometric test
- Calculate p-value at multiple thresholds
- Correct for multiple testing
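The two steps above can be sketched as follows: compute the hypergeometric tail at every cutoff of the ranked list, keep the minimum, then correct for having tried many cutoffs. The correction here is a label permutation, which is an assumption chosen for simplicity; published implementations may compute an exact P-value instead. All names are illustrative.

```python
import random
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b): n genes drawn from N, of which B belong to the gene set."""
    upper = min(B, n)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / comb(N, n)

def min_hypergeom_score(labels):
    """labels: 0/1 flags in rank order (1 = gene belongs to the gene set).
    Returns the smallest hypergeometric tail over all cutoffs n = 1..N."""
    N, B = len(labels), sum(labels)
    best, hits = 1.0, 0
    for n, flag in enumerate(labels, start=1):
        hits += flag
        best = min(best, hypergeom_tail(N, B, n, hits))
    return best

def permutation_p_value(labels, n_perm=1000, seed=0):
    """Correct the minimum score for multiple cutoffs by shuffling the ranking."""
    rng = random.Random(seed)
    observed = min_hypergeom_score(labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_hypergeom_score(shuffled) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)

# Toy ranked list of 12 genes; 4 set members, concentrated near the top.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
score = min_hypergeom_score(labels)
corrected = permutation_p_value(labels, n_perm=200)
```

The raw minimum score is not itself a valid P-value because it is the best of many tests, which is why the second step is needed.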
Describe
Bonferroni correction
Corrected P-value = M × original P-value (capped at 1)
M = number of annotations tested
Very stringent; can reject real enrichments (false negatives)
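As a small worked sketch of the formula, with made-up P-values and the result capped at 1 because a probability cannot exceed it:

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests M, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.01, 0.04, 0.5]      # toy P-values for M = 3 annotations
corrected = bonferroni(raw)  # approximately [0.03, 0.12, 1.0]
```

With thousands of annotations tested, M becomes large enough that only very small raw P-values survive, which is the stringency the card describes.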
Define
FDR
False Discovery Rate
* Expected proportion of the observed enrichments that are due to random chance
* Calculated using the Benjamini-Hochberg correction
* The adjusted P-value is often called the q-value
Define
Benjamini-Hochberg
Adjusted P-value = P-value × (number of tests) / rank
rank = position of the P-value when all P-values are sorted in ascending order
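A minimal sketch of the formula with toy P-values follows. It adds one step that the card does not mention but that standard implementations apply: a running minimum from the largest P-value down, so that adjusted values never decrease as the raw P-values increase.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
# ranks: 0.01 -> 1, 0.03 -> 2, 0.04 -> 3, 0.5 -> 4
# approximately [0.04, 0.0533, 0.0533, 0.5]
```

In this example the rank-2 value would be 0.03 × 4 / 2 = 0.06 by the formula alone; the running minimum lowers it to 0.0533, the adjusted value of the rank-3 test.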
Define
Enrichment map
- Network-based visualization of pathway enrichment analysis
- Nodes: gene sets reflecting pathways, processes
- Edges: sets sharing many common genes
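The edge rule can be sketched by scoring every pair of gene sets for shared genes and keeping the pairs above a cutoff. The Jaccard index and the 0.25 cutoff are assumptions for illustration (other overlap measures are also used), and the gene sets are toy data.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes in either set (sets must be non-empty)."""
    return len(a & b) / len(a | b)

def enrichment_map_edges(gene_sets, min_similarity=0.25):
    """Return (set_a, set_b, similarity) for every pair sharing enough genes."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_sets.items(), 2):
        similarity = jaccard(genes_a, genes_b)
        if similarity >= min_similarity:
            edges.append((name_a, name_b, similarity))
    return edges

# Toy enriched gene sets (nodes); names and members are invented.
gene_sets = {
    "Set 1": {"G1", "G2", "G3", "G4"},
    "Set 2": {"G3", "G4", "G5"},
    "Set 3": {"G9", "G10"},
}
edges = enrichment_map_edges(gene_sets)
print(edges)  # [('Set 1', 'Set 2', 0.4)]
```

Overlapping sets end up connected and cluster together in the map, which addresses the redundancy between gene sets noted for fixed gene set enrichment.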
Describe
De Novo Subnetwork Construction and Clustering
- Apply list of altered elements to biological network
- Identify new configurations
- Extract clusters from new configs; annotate them
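One very simple way to picture these steps: overlay the altered genes on an interaction network, keep only interactions between two altered genes, and report the connected groups as candidate modules. Real methods use more elaborate clustering; connected components are a stand-in chosen for brevity, and the network and gene names are toy data.

```python
def connected_modules(network_edges, altered):
    """Group altered genes that are linked to each other in the network."""
    # Keep only interactions where both partners are altered.
    neighbours = {gene: set() for gene in altered}
    for a, b in network_edges:
        if a in altered and b in altered:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, modules = set(), []
    for start in sorted(altered):        # sorted for a reproducible order
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                     # depth-first walk of one module
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(component)
    return modules

# Toy interaction network; gene X links the two groups but is not altered.
network = [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"), ("D", "E")]
modules = connected_modules(network, altered={"A", "B", "C", "D", "E"})
# two modules: {A, B, C} and {D, E}
```

Each resulting module would then be annotated, for example by running an enrichment test on its genes.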
Define
Pathway-based modeling
- Apply list of altered elements to biological pathways
- Use detailed interactions in pathways