Omics experiments and algorithms Flashcards
Week 8 Lecture 3
1
Q
Timeseries experiments
A
- Take a cell or tissue sample
- Apply some change to the environment
- Take n samples at given time points
- Measure each sample
- Analyse how things changed over time
2
Q
Cell type experiments
A
- Take one or more tissue samples
- Extract similar cells by morphology or fluorescent tagging
- Measure each cell group (proteome, metabolome, transcriptome)
- Analyse how things are different between cells
3
Q
Spatial analysis experiments
A
- Take a tissue sample
- Carry out either in situ hybridisation with probes, or microdissection followed by sequencing/hybridisation
- Measure each sample
- Categorise where each sample came from
4
Q
Applications of dendrograms
A
- Phylogeny
- Clustering biological entities
5
Q
How do we measure distance?
A
- Number of substitutions
- Estimate the distance given observed differences and apply a nucleotide/amino acid substitution model
- Euclidean/Hamming/cosine distance between feature vectors
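A minimal sketch of these distance measures in plain Python. The function names are illustrative, and Jukes-Cantor is used here only as one example of a nucleotide substitution model; the card itself does not name a specific model.

```python
import math

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def jukes_cantor(p):
    """Estimated substitutions per site from the observed proportion p of
    differing sites (assumed example model; valid for p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Observed differences underestimate the true distance, so the
# corrected value is slightly larger than the raw proportion.
raw = hamming("GATTACA", "GACTATA") / 7
corrected = jukes_cantor(raw)
```

The corrected distance exceeds the raw proportion because some sites may have changed more than once.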
6
Q
Distance matrix
A
- An all-against-all matrix which catalogues all scores and measures how far apart all pairs of entities are. All scores on the diagonal must be zero.
- A distance measure can be used as is
- A similarity measure must be inverted in some way
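As a rough illustration, the all-against-all matrix can be built from any pairwise distance function. The `1 - s` inversion below is only one possible way to turn a similarity into a distance and assumes similarities scaled to [0, 1]; all names are hypothetical.

```python
def hamming(a, b):
    """Count of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(items, dist):
    """Symmetric all-against-all matrix with zeros on the diagonal."""
    n = len(items)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = float(dist(items[i], items[j]))
    return D

def similarity_to_distance(S):
    """Invert a similarity matrix, assuming scores lie in [0, 1]
    and identical entities score 1 (so the diagonal becomes 0)."""
    return [[1.0 - s for s in row] for row in S]

D = distance_matrix(["AAAA", "AAAT", "TTTT"], hamming)
```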
7
Q
Tree clustering algorithms
A
- Distance-based (UPGMA)
- Maximum parsimony trees
- Maximum likelihood trees
8
Q
What is UPGMA?
A
- Unweighted Pair-Group Method with Arithmetic mean
- Unweighted: All pairwise distances contribute equally
- Pair-group: Groups are combined in pairs
- Arithmetic mean: The distance between two groups is the mean of the pairwise distances between all their members
9
Q
How does UPGMA work?
A
- Form a cluster for each leaf node
- Find the two closest clusters, C1 and C2, using the average distance between clusters
- Merge C1, C2 into a single cluster C
- Form a node for C, connecting it to C1 and C2. Set the age of C as Davg(C1,C2)/2
- Eliminate the rows and columns for C1 and C2 in the distance matrix D, add a row/column for C and compute the average distances between clusters once again
- Iterate steps 2-5 until you reach a single cluster containing all leaves
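The steps above can be sketched in a few lines of Python. This is a simple illustrative version, not an optimised one: it scans all remaining pairs on every iteration, and the nested-tuple tree format `(left, right, age)` is an assumption made for the example.

```python
def upgma(D, labels):
    """Build a UPGMA tree from a symmetric distance matrix D.

    Leaves are labels; internal nodes are (left, right, age) tuples.
    """
    n = len(labels)
    # Step 1: one cluster per leaf, stored as id -> (subtree, size).
    clusters = {i: (labels[i], 1) for i in range(n)}
    dist = {(i, j): float(D[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Step 2: find the two closest clusters.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        # Step 3: merge them.
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        # Step 5: drop the old entries and add averaged distances for the
        # new cluster. Weighting by cluster size keeps every original
        # pairwise distance contributing equally.
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Step 4: new node with age Davg(C1, C2) / 2.
        clusters[next_id] = ((tree_a, tree_b, d / 2), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree

tree = upgma([[0, 2, 6], [2, 0, 6], [6, 6, 0]], ["A", "B", "C"])
```

Here A and B merge first at age 1, and the resulting cluster joins C at age 3, so every leaf sits at the same distance from the root.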
10
Q
UPGMA properties
A
- Time complexity O(n^2 log n)
- A unique tree
- A rooted tree
- An ultrametric tree (all the leaves are equidistant from the root)
11
Q
Node
A
A vertex which represents an entity that we wish to model that can have a defined relationship with other nodes
12
Q
Edge
A
A connection between two nodes that specifies some relationship between them
13
Q
Adjacency
A
Two nodes are adjacent if connected by an edge
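A small sketch of nodes, edges and adjacency using an adjacency-set representation. It assumes an undirected graph, and the class and gene names are made up for illustration.

```python
from collections import defaultdict

class Graph:
    """Undirected graph stored as node -> set of neighbouring nodes."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # An edge records a relationship in both directions.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def adjacent(self, u, v):
        # Two nodes are adjacent if an edge connects them.
        return v in self.adj.get(u, set())

g = Graph()
g.add_edge("geneA", "geneB")
g.add_edge("geneB", "geneC")
```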
14
Q
Typical experimental design
A
- Time-series transcriptomics
- Data pre-processing
- Inference methods
- Network inference
- Validation
- Modelling/simulations
15
Q
CLR algorithm
A
- Take all transcription data
- Calculate mutual information between expression levels of all pairs of genes
- Build MI matrix
- Calculate the z-score for each putative transcription factor and putative target
- Calculate the joint z-score
- Accept any z_i,j above a given threshold as indicating regulation
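A rough sketch of these steps in plain Python. Several details are assumptions not stated on the card: mutual information is estimated by simple equal-width binning, every gene is treated as both a putative regulator and a putative target, negative z-scores are clipped to zero, and the joint score is taken as sqrt(z_i^2 + z_j^2). All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def discretise(values, bins=3):
    """Equal-width binning of one gene's expression profile."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profiles fall into one bin
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """MI (in nats) between two discretised profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def clr(expression, threshold, bins=3):
    """expression: dict gene -> list of values. Returns {(g, h): joint z}."""
    genes = list(expression)
    binned = {g: discretise(expression[g], bins) for g in genes}
    # MI matrix, excluding self-comparisons.
    mi = {g: {h: mutual_information(binned[g], binned[h])
              for h in genes if h != g} for g in genes}

    def z(g, h):
        # z-score of MI(g, h) against all MI values involving g.
        background = list(mi[g].values())
        sd = pstdev(background)
        if sd == 0:
            return 0.0
        return max(0.0, (mi[g][h] - mean(background)) / sd)

    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            joint = math.sqrt(z(g, h) ** 2 + z(h, g) ** 2)
            if joint > threshold:
                edges[(g, h)] = joint
    return edges

# Toy data: a and b move together, c is independent of both.
toy = {"a": [0.0, 0.0, 1.0, 1.0],
       "b": [0.0, 0.0, 2.0, 2.0],
       "c": [0.0, 1.0, 0.0, 1.0]}
edges = clr(toy, threshold=1.0, bins=2)
```

On the toy data only the a-b pair passes the threshold. Real expression data would need far more samples and a more careful MI estimator.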