Graphical Models Flashcards
D-Separation (DAG) notion (Conditional Independence)
[Bayesian Networks]
blocked path => conditional independence
If all paths from A to B are blocked, then A is d-separated (conditionally independent) from B by C, and the joint distribution satisfies A independent of B | C.
D-Separation (DAG) (Conditional Independence)
[Bayesian Networks]
A path is blocked if it includes a node such that either
(1) The arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C (observed) or
(2) The arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C (observed)
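To make the two blocking rules concrete, here is a minimal sketch that tests whether a single path in a DAG is blocked by an observed set C. The `path_blocked` and `descendants` helpers and the dict-of-parents encoding are illustrative choices, not from the cards:

```python
def descendants(parents, node):
    """All descendants of node (children, grandchildren, ...)."""
    children = {n: {c for c, ps in parents.items() if n in ps} for n in parents}
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(parents, path, C):
    """True if the given path (a node sequence) is blocked by the observed set C."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        # Head-to-head iff both neighbouring edges point into the node.
        if prev in parents[node] and nxt in parents[node]:
            # Rule (2): a collider blocks unless it or a descendant is observed.
            if node not in C and not (descendants(parents, node) & C):
                return True
        # Rule (1): head-to-tail / tail-to-tail blocks if the node is observed.
        elif node in C:
            return True
    return False

# Chain a -> c -> b: the path is blocked once c is observed.
dag = {"a": set(), "c": {"a"}, "b": {"c"}}
print(path_blocked(dag, ["a", "c", "b"], C={"c"}))   # True
# Collider a -> c <- b: the path is blocked while c is unobserved.
dag2 = {"a": set(), "b": set(), "c": {"a", "b"}}
print(path_blocked(dag2, ["a", "c", "b"], C=set()))  # True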
Definition: Cliques
Fully connected subgraphs, i.e. every pair of nodes in the subgraph is joined by a link
Factorisation of the joint distribution
Define factors in the decomposition of the joint distribution
[Markov random fields]
Define factors in the decomposition of the joint distribution to be functions of the variables in the (maximal) cliques:
p(x) proportional to prod over maximal cliques c of psi_c(x_c)
Factorisation of the joint distribution (exact)
[Markov random fields]
p(x) = 1/Z prod over maximal cliques c of psi_c(x_c),
where Z = sum_x prod_c psi_c(x_c) is the partition function,
psi_c(x_c) = clique potential
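A minimal sketch of this exact normalisation, computing Z by brute-force enumeration for a tiny binary chain MRF. The pairwise potential and the clique structure are arbitrary illustrative choices:

```python
from itertools import product
import math

def psi(xa, xb, coupling=1.0):
    """Pairwise clique potential psi_c(x_c) = exp(-E(x_c)) (illustrative)."""
    return math.exp(coupling * xa * xb)   # favours agreeing neighbours

cliques = [(0, 1), (1, 2)]                # maximal cliques of the chain x0-x1-x2

def unnormalised(x):
    p = 1.0
    for a, b in cliques:
        p *= psi(x[a], x[b])
    return p

# Z = sum over every joint configuration of the product of clique potentials.
Z = sum(unnormalised(x) for x in product([-1, +1], repeat=3))

def p(x):
    return unnormalised(x) / Z            # exact, normalised joint

# Sanity check: the normalised distribution sums to 1.
assert abs(sum(p(x) for x in product([-1, +1], repeat=3)) - 1.0) < 1e-12
```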
Conditional Independence
[Markov random fields]
(1) A conditionally independent of B | C iff all paths from A to B pass through C (then all paths are blocked), or
(2) Remove all nodes in C from the graph. If there is still a path from A to B, then A conditionally independent of B | C does not hold; if no such path remains, it holds.
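A minimal sketch of rule (2), assuming the MRF is given as an adjacency dict and A, B are single nodes; the `separated` helper is an illustrative name:

```python
def separated(adj, a, b, C):
    """True iff a cannot reach b once the observed nodes C are removed."""
    if a in C or b in C:
        return True
    seen, stack = {a}, [a]
    while stack:
        for nb in adj[stack.pop()]:
            if nb == b:
                return False          # an unblocked path survives
            if nb not in C and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return True

# Chain x0 - x1 - x2: observing x1 separates x0 from x2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(separated(adj, 0, 2, C={1}))    # True
```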
Definition (Potentials as Energy Functions)
[Markov random fields]
psi_c(x_c) := exp(-E(x_c))
for some energy function E
Definition (Energy function: total energy)
[Markov random fields]
E(x,y) = -nu sum_i x_i y_i - beta sum_{(i,j)} x_i x_j + gamma sum_i x_i,
where the second sum runs over pairs (i,j) of neighbouring latent pixels
term 1 = latent-observed (strong correlation between observed and latent variables)
term 2 = latent-latent (smoothness prior over neighbouring latent pixels)
term 3 = bias (places a prior on the latent pixel values)
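A minimal sketch evaluating this total energy on a 2D grid of {-1,+1} pixels; the weight values and the `total_energy` name are illustrative assumptions:

```python
import itertools

def total_energy(x, y, nu=2.1, beta=1.0, gamma=0.2):
    """E(x, y) for 2D grids of {-1,+1} pixels (weights are illustrative)."""
    h, w = len(x), len(x[0])
    E = 0.0
    for i, j in itertools.product(range(h), range(w)):
        E -= nu * x[i][j] * y[i][j]            # term 1: latent-observed
        E += gamma * x[i][j]                   # term 3: bias on latent pixels
        if i + 1 < h:                          # term 2: latent-latent smoothness,
            E -= beta * x[i][j] * x[i + 1][j]  # counting each neighbour pair once
        if j + 1 < w:
            E -= beta * x[i][j] * x[i][j + 1]
    return E
```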
Iterated Conditional Modes (ICM) Algorithm
[Markov random fields]
(1) Initialise all x_i = y_i
(2) Pick any x_j and evaluate the total energy for both of its states:
(2a) E(x_{-j} with x_j = +1, y)
(2b) E(x_{-j} with x_j = -1, y)
(3) Set x_j to whichever state (+1 or -1) gives the lower energy
(4) Repeat until no x_j changes (a local minimum of the energy); see the sketch below
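A minimal ICM sketch under the energy above. Since flipping x_j only changes the terms that contain x_j, the code compares just those local terms rather than re-evaluating the total energy; the weights and helper names are illustrative:

```python
import itertools

def icm(y, nu=2.1, beta=1.0, gamma=0.2, sweeps=10):
    """Iterated conditional modes for binary image denoising (illustrative)."""
    h, w = len(y), len(y[0])
    x = [row[:] for row in y]                  # step (1): initialise x_i = y_i
    for _ in range(sweeps):                    # step (4): repeat
        changed = False
        for i, j in itertools.product(range(h), range(w)):
            def local(s):                      # local energy terms with x_j = s
                e = -nu * s * y[i][j] + gamma * s
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        e -= beta * s * x[ni][nj]
                return e
            best = min((+1, -1), key=local)    # steps (2)-(3): lower energy wins
            if best != x[i][j]:
                x[i][j], changed = best, True
        if not changed:                        # local minimum: nothing changed
            break
    return x
```

For example, `icm([[+1, -1], [+1, +1]])` returns a denoised copy of the observed image, flipping the isolated -1 pixel if the smoothness term outweighs the data term.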
Joint distribution
[Factor graphs]
The joint distribution is a product of factors:
p(x) = prod_s f_s(x_s)
x_s = subset of variables
f_s = factor: a non-negative function of the variables x_s
Directed -> MRF
[ Converting graphs]
(1) Moralisation
(1a) Add additional undirected links between all pairs of parents for each node in the graph
(1b) Drop arrows on original links
(2) Identify the (maximal) cliques
(3) Initialise all clique potentials to 1
(4) Take each conditional distribution factor from the directed graph and multiply it into one of the clique potentials whose clique contains all of its variables
Note: If a factor's variables lie in more than one maximal clique, we get to choose where it goes! (See the sketch below.)
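A minimal sketch of step (1), moralisation; the dict-of-parents encoding and the `moralise` name are illustrative:

```python
from itertools import combinations

def moralise(dag_parents):
    """Return the moral (undirected) graph as an adjacency dict."""
    adj = {n: set() for n in dag_parents}
    for child, parents in dag_parents.items():
        for p in parents:                      # (1b) drop arrows on original links
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents, 2):  # (1a) marry the parents
            adj[p].add(q)
            adj[q].add(p)
    return adj

# v-structure A -> C <- B: moralisation adds the undirected link A - B.
print(moralise({"A": set(), "B": set(), "C": {"A", "B"}}))
```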
MRF -> Factor Graph
[ Converting graphs]
(1) Take variable nodes from MRF
(2) Create additional factor nodes corresponding to the maximal cliques x_s
(3) The factors f_s(x_s) equal the clique potentials
(4) Add appropriate links
Important: Multiple factor graphs may correspond to the same undirected graph
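A minimal sketch of this conversion, assuming the MRF's maximal cliques and potentials have already been identified; the representation and names are illustrative:

```python
def mrf_to_factor_graph(maximal_cliques, potentials):
    """One factor node per maximal clique, linked to the clique's variables.

    maximal_cliques: list of variable tuples, e.g. [("x0", "x1"), ("x1", "x2")]
    potentials:      list of callables psi_c, one per clique (same order)
    """
    factors, links = {}, []
    for i, (scope, psi) in enumerate(zip(maximal_cliques, potentials)):
        name = f"f{i}"
        factors[name] = (scope, psi)            # (3) f_s(x_s) = psi_c(x_c)
        links += [(name, v) for v in scope]     # (4) add appropriate links
    return factors, links
```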
Directed Graphical Model -> Factor graph
[ Converting graphs]
(1) Take variable nodes from Bayesian network
(2) Create additional factor nodes corresponding to the conditional distributions
(3) Add appropriate links
Important: Not unique
Variable-to-factor message
[Exact inference in factor graphs]
mu_{x_m -> f_s}(x_m) = prod_{f_l in ne(x_m) \ f_s} mu_{f_l -> x_m}(x_m),
i.e. the product of the incoming messages from all of x_m's neighbouring factors except f_s
Factor-to-variable message
[Exact inference in factor graphs]
mu_{f_s -> x}(x) = sum_{x_1, ..., x_M} f_s(x, x_1, ..., x_M) prod_{x_m in ne(f_s) \ x} mu_{x_m -> f_s}(x_m),
i.e. sum out the factor's other variables, weighting by the incoming messages from all of f_s's neighbouring variables except x
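Putting both message equations together, here is a minimal sum-product sketch on the chain x0 - f_a - x1 - f_b - x2 with binary variables, checked against brute-force marginalisation; the factor tables are arbitrary illustrative choices:

```python
import numpy as np

fa = np.array([[1.0, 0.5], [0.5, 2.0]])   # f_a(x0, x1), arbitrary table
fb = np.array([[2.0, 1.0], [0.3, 1.0]])   # f_b(x1, x2), arbitrary table

# Leaf variables send the constant message 1 (an empty product).
mu_x0_fa = np.ones(2)                      # variable-to-factor messages
mu_x2_fb = np.ones(2)

# Factor-to-variable: sum out the factor's other variable, weighting by the
# incoming variable-to-factor message.
mu_fa_x1 = fa.T @ mu_x0_fa                 # sum_x0 f_a(x0, x1) mu_{x0 -> f_a}
mu_fb_x1 = fb @ mu_x2_fb                   # sum_x2 f_b(x1, x2) mu_{x2 -> f_b}

# Marginal of x1: product of all incoming factor-to-variable messages.
p_x1 = mu_fa_x1 * mu_fb_x1
p_x1 /= p_x1.sum()

# Brute-force check against p(x) proportional to f_a(x0, x1) f_b(x1, x2).
joint = np.einsum("ij,jk->ijk", fa, fb)
assert np.allclose(p_x1, joint.sum(axis=(0, 2)) / joint.sum())
```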