Bioinformatics for Stratified Medicine Flashcards

Question

What are two ways in which biological pathways can be represented?

Answer 1

1. Pathways as collections of genes (gene sets)
2. The same gene can belong to multiple pathways

Answer 2

One way of inferring biological meaning from lists of genes: looking at their overlaps with known gene sets.

Answer 3

1. Perturbations at the single-gene level might not explain the whole picture
   - A single gene mutation might not be enough to perturb an entire pathway because of redundancy
   - Mutations of opposite effect might compensate for each other, so that the outcome of the pathway is not disrupted even if some of its genes are
2. Shift to a pathway-centered view of biological systems
3. Can be used to generate hypotheses about the phenomenon studied that can be taken forward for further evaluation

Answer 4

Given the overlap between a list of genes and a gene set of interest, what is the probability of obtaining the same or a greater overlap between the two by chance? Performing enrichment analysis

Answer 5

1. Experiment that generates a gene list
2. Meaningful categories (gene functions, pathways, etc.) - creates gene set
3. Association between genes and categories (annotation) - creates gene set
4. Methods to estimate which gene sets are significantly perturbed by an experiment (enrichment analysis)

Answer 6

Collection of many different resources (GO, pathways, published signatures, etc.) provided for the purpose of testing a list of genes against gene sets coming from different sources at once.

Answer 7

The same pathway can appear quite different according to the pathway resource used.

Answer 8

1. Overrepresentation analysis
   - Estimates the significance of the overlap between a list and a gene set
   - Choice of genes in the list based on an arbitrary threshold
   - Fisher’s test
2. Ranked enrichment analysis
   - Genes are sorted according to a meaningful metric
   - No arbitrary threshold needed

Answer 9

A collection of statistical methods for estimating how enriched a gene set is in a list of genes of interest (are genes from a gene set present in the gene list more frequently than would be expected by chance?).

Answer 10

- List of genes generated by an experiment
- In most cases genes are chosen according to a certain threshold:
  - P-value
  - Fold change
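
As a minimal sketch of how such a list might be derived, assume differential-expression results that give a P-value and a log2 fold change per gene. The gene names, values, cut-offs and the `select_genes` helper are all hypothetical and only serve the illustration.

```python
# Hypothetical differential-expression results: gene -> (p_value, log2_fold_change).
results = {
    "GENE_A": (0.001, 2.3),
    "GENE_B": (0.20, 1.8),
    "GENE_C": (0.03, -1.5),
    "GENE_D": (0.04, 0.4),
    "GENE_E": (0.0004, -2.1),
}


def select_genes(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes passing both the P-value and absolute fold-change cut-offs."""
    return sorted(
        gene
        for gene, (p_value, lfc) in results.items()
        if p_value < p_cutoff and abs(lfc) >= lfc_cutoff
    )


# The same experiment yields different gene lists under different thresholds.
gene_list = select_genes(results)
stricter_list = select_genes(results, p_cutoff=0.01, lfc_cutoff=2.0)
```

Changing the cut-offs changes which genes enter the list, which is why the threshold is described as arbitrary.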

Answer 11

Because a large overlap could be due to a larger number of genes in the gene list of interest. If for example our gene list contained all known genes, the overlap between that gene list and any pathway would be very high but this would not be particularly meaningful.

Answer 12

- Is the overlap between my list of genes and the gene set I am testing bigger than what I would get by randomly selecting the same number of genes?
- Usually computed using Fisher’s Exact Test, which takes:
  - Size of the list
  - Size of the gene set
  - Size of the overlap
  - Size of the universe (number of genes tested)
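
The four sizes above are enough to compute the one-sided test by hand. The sketch below uses the hypergeometric tail, which is equivalent to a one-sided Fisher's exact test; the function name `overlap_p_value` and the example numbers are assumptions made for illustration.

```python
from math import comb


def overlap_p_value(universe_size, gene_set_size, list_size, overlap):
    """Probability of an overlap at least as large as the observed one
    when list_size genes are drawn at random from the universe."""
    max_overlap = min(gene_set_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., max_overlap.
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes tested, a 100-gene pathway,
# a 200-gene list, and 10 genes shared between the two.
p = overlap_p_value(universe_size=20000, gene_set_size=100, list_size=200, overlap=10)
```

With these numbers only about one shared gene would be expected by chance, so an overlap of 10 gives a very small P-value. In practice a multiple-testing correction would normally follow when many gene sets are tested.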

Answer 13

The universe

```
Universe =
  Genome
  Only genes present on the array
  Only expressed genes
  Only genes for which pathway annotation is available
```

If, for example, we are running a small-scale transcriptomic experiment where only 100 genes are tested, no more than 100 genes could have come up as significant.

Answer 14

Overrepresentation analysis requires a list of genes to compute significance. The threshold used to select these genes has an impact on the results of the enrichment

Answer 15

Takes as an input a ranked (sorted) list of genes, ranked by:
- P-value
- Fold change
- Other metrics

Are the genes from a gene set overrepresented at the top of the list compared to randomly picked lists of genes of similar sizes?
- The enrichment is significant if the genes of the gene set are mainly located at the top of the ranked list
- The enrichment is not significant if the genes of the gene set are randomly scattered across the list

Answer 16

Gene Set Enrichment Analysis (GSEA)
Enrichment Score calculated by screening the list top-down:
- Increase statistic whenever gene in the ranked list belongs to gene set S
- Decrease statistic whenever gene in the ranked list does not belong to gene set S
- Score equals the maximum deviation from zero
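
The running-sum idea can be sketched as follows. This is a simplified, unweighted version (equal step sizes for every hit and every miss), not the full GSEA procedure, which weights hits by the ranking metric and assesses significance by permutation. The function name and the toy gene names are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("the gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a gene in the set, step down for a gene outside it.
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        # Keep the signed value with the largest deviation from zero.
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
top_heavy = enrichment_score(ranked, {"G1", "G2", "G3"})  # set clustered at the top
scattered = enrichment_score(ranked, {"G1", "G5", "G8"})  # set spread across the list
```

The set clustered at the top of the list reaches a score close to 1, while the scattered set stays much closer to zero.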

Answer 17

Another way of inferring biological meaning from gene lists, by looking at connections between genes:
- Establish links between genes
- Analyse the structure of the network

Answer 18

Can be represented by a graph composed of nodes and edges.

```
Nodes can represent biological entities
  Genes
  Proteins
  Variants
  Metabolites
```

```
Edges represent relationships between nodes
  Activation/Repression
  Physical interaction
  Binding
  Cleavage
  Co-expression
```

Edges can be directed, as in the case of activation (one molecule activates another but not vice versa), or undirected, as in the case of co-expression (if gene A is correlated with gene B then gene B is correlated with gene A).
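
One possible way to hold such a graph in code is an adjacency dictionary in which each edge carries its relationship label. The `Network` class and the node names below are hypothetical and only illustrate the directed/undirected distinction.

```python
class Network:
    """Minimal graph: nodes are entity names, edges carry a relationship label."""

    def __init__(self):
        self.neighbours = {}  # node -> {neighbour: relationship}

    def add_edge(self, source, target, relationship, directed=True):
        self.neighbours.setdefault(source, {})[target] = relationship
        self.neighbours.setdefault(target, {})  # register the target as a node
        if not directed:
            # An undirected edge is stored in both directions.
            self.neighbours[target][source] = relationship


net = Network()
net.add_edge("TF1", "GENE_A", "activation", directed=True)
net.add_edge("GENE_A", "GENE_B", "co-expression", directed=False)
```

Here `TF1` points to `GENE_A` but not the other way round, whereas the co-expression edge can be followed from either gene.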

Answer 19

Ingenuity: a commercial application for analysing complex omics datasets in a network framework.

Multiple modules available:
- Pathway/Function/Disease analysis (Fisher’s test)
- Network generation
- Upstream regulators

Knowledge is based on curated published results.

Answer 20

A link is created between two entities if there exists a published experiment that describes this relationship. The nature of the relationship is included in the network.

Answer 21

Certain recurrent patterns of biological networks like the presence of hubs.
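
A simple way to look for hubs is to count how many edges touch each node (its degree) and keep the nodes above a cut-off. The edge list, the `find_hubs` helper and the `min_degree` cut-off are assumptions for illustration; what counts as a hub depends on the network.

```python
from collections import Counter

# Hypothetical undirected edge list.
edges = [
    ("HUB", "A"),
    ("HUB", "B"),
    ("HUB", "C"),
    ("HUB", "D"),
    ("A", "B"),
    ("C", "E"),
]


def find_hubs(edges, min_degree):
    """Return nodes whose number of connections is at least min_degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d >= min_degree)


hubs = find_hubs(edges, min_degree=4)
```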

Answer 22

1. Supervised
2. Supervised multiclass pathway activity inference method:
   - For each pathway expression dataset, patterns of its constituent genes are summarised into pathway activity
   - Infer a feature as a weighted linear summation of expression of its constituent genes
   - Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regard to disease phenotypes
   - Classification is then performed on the resulting low-dimensional pathway activity profile
3. Enrichment analyses; however, typically a rank-based method is used.
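
The "weighted linear summation" step can be sketched as below. In the method described above the gene weights come from an optimisation model; here they are fixed by hand, and all names and values are hypothetical.

```python
def pathway_activity(expression, weights):
    """Weighted linear sum of the expression of a pathway's constituent genes."""
    return sum(weights[gene] * expression[gene] for gene in weights)


# Hypothetical weights for one pathway (not learned, chosen for illustration).
weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}

# Hypothetical expression values per sample.
samples = {
    "sample_1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "sample_2": {"GENE_A": 0.0, "GENE_B": 8.0, "GENE_C": 0.5},
}

# One activity value per sample replaces the three gene-level values.
activities = {name: pathway_activity(expr, weights) for name, expr in samples.items()}
```

Each sample is reduced to a single number per pathway, which is the low-dimensional profile a classifier would then be trained on.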

Answer 23

1. Comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls
2. Establish differences in RNA expression patterns across all tissue types
3. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes

Answer 24

1. A human protein functional interaction (FI) network constructed by combining curated and uncurated data sources using a machine learning technique
2. Modules derived from a highly reliable gene functional interaction network
3. Infer a feature as a weighted linear summation of expression of its constituent genes
4. By assigning gene co-expression values as weights for the FI network, network modules were discovered containing genes having similar expression patterns in a disease, and used as features to model disease heterogeneity
5. Survival curves among the high and low module expression groups were derived, and act as a proof of principle for using module 2 expression as a cross-platform prognostic signature.

Answer 25

Whole-genome gene expression profiling was performed on 42 biopsy samples (from the SAKK 19/05 trial) using Affymetrix exon arrays, and associations with the following endpoints were investigated: time-to-progression (TTP) under therapy, tumor-shrinkage (TS), and overall survival (OS). Gene set enrichment analysis (GSEA) was performed. GSEA revealed a significant enrichment of the angiogenesis-associated genes within the genes that associate with the TTP under BE therapy endpoint.