Kursusgang 11 (Graphical models) Flashcards
What is a graphical model?
A graphical model is a probabilistic model that represents the relationships between random variables using a graph, where nodes are random variables and edges are probabilistic relationships. The joint distribution over all of the random variables is decomposed into a product of factors each depending only on a subset of the variables.
Classes of graph:
* Bayesian networks, aka directed graphical models, expressing causal relationships.
* Markov random fields, aka undirected graphical models, expressing soft constraints.
* Factor graphs, into which both directed and undirected graphical models can be converted in order to solve inference problems.
What is a Bayesian network?
A Bayesian network is a type of probabilistic graphical model that uses a directed acyclic graph (DAG) to represent a set of random variables and their conditional dependencies. It provides a compact and efficient way to model the joint probability distribution of a system.
p(X_1, \dots, X_n) = \prod_{i=1}^n p(X_i \mid X_{pa(i)}),
where pa(i) are the parents of node i.
A node is conditionally independent of its non-descendants given its parents.
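The factorization above can be sketched for a hypothetical three-node chain A → B → C with binary variables; the conditional probability tables here are made-up numbers for illustration:

```python
# Hypothetical Bayesian network A -> B -> C with binary variables,
# illustrating p(A, B, C) = p(A) * p(B | A) * p(C | B).
p_A = {0: 0.6, 1: 0.4}                       # p(A)
p_B_given_A = {0: {0: 0.7, 1: 0.3},          # p(B | A=0)
               1: {0: 0.2, 1: 0.8}}          # p(B | A=1)
p_C_given_B = {0: {0: 0.9, 1: 0.1},          # p(C | B=0)
               1: {0: 0.5, 1: 0.5}}          # p(C | B=1)

def joint(a, b, c):
    """Joint probability as the product of each node's CPT entry."""
    return p_A[a] * p_B_given_A[a][b] * p_C_given_B[b][c]

# A valid joint distribution must sum to 1 over all configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # -> 1.0
```

Each node needs only a table conditioned on its parents, not on the full history of the chain.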
Why are Bayesian networks useful?
Graph structure supports:
* Modular representation of knowledge.
* Local, distributed algorithms for inference and learning.
* Intuitive (possibly causal) interpretation.
Factored representation may have exponentially fewer parameters than full joint P(X1, …, Xn) =>
* lower sample complexity (less data for learning)
* lower time complexity (less time for inference)
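The parameter saving can be made concrete with a small count. For a hypothetical five-node network over binary variables (the parent sets below are illustrative), a node with k parents needs 2^k free parameters, versus 2^n − 1 for the full joint:

```python
# Parameter counts for binary variables: a full joint over n variables
# needs 2**n - 1 free parameters, while a factored Bayesian network needs
# 2**|pa(i)| free parameters per node (one per parent configuration).
# Hypothetical 5-node network, given as node -> list of parents.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

n = len(parents)
full_joint = 2**n - 1                                   # 31
factored = sum(2**len(pa) for pa in parents.values())   # 1+2+2+4+2 = 11
print(full_joint, factored)  # -> 31 11
```

The gap widens exponentially as n grows, which is exactly the lower sample and time complexity claimed above.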
What is a Markov random field?
A Markov Random Field consists of:
Random Variables: Nodes in the graph represent the random variables.
Undirected Graph: Edges represent the direct dependencies or relationships between variables.
Local Markov Property: A node is conditionally independent of all other nodes given its neighbors.
The joint probability distribution of the variables is expressed using factors defined over cliques (fully connected subsets of nodes) in the graph.
What is a clique in a Markov random field?
A fully connected subset of nodes. Denote the set of variables in a clique by x_C.
What is a maximal clique in a Markov random field?
A maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique.
How is the joint distribution written in a Markov random field?
The joint distribution can be written as a product of potential functions ψ_C(x_C) over the maximal cliques of the graph,
p(x) = 1/Z * \prod_C ψ_C(x_C), where
Z is the partition function defined as
Z = \sum_x \prod_C ψ_C(x_C).
Since we consider only potential functions satisfying ψ_C(x_C) ≥ 0, this ensures p(x) ≥ 0.
Potential functions are arbitrary, nonnegative functions. They are usually given as exponentials of an energy function E (a Gibbs/Boltzmann distribution):
ψ_C(x_C) = exp(-E(x_C))
Large energy means low probability.
Small energy means high probability.
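A minimal sketch of these definitions, assuming a made-up MRF over three binary spins x_i ∈ {−1, +1} on a chain, with pairwise clique energy E(x_i, x_j) = −x_i x_j on each edge:

```python
import math
from itertools import product

# Toy MRF: three spins on a chain, pairwise energy E(x_i, x_j) = -x_i * x_j,
# so psi(x_i, x_j) = exp(x_i * x_j) favours agreeing neighbours.
edges = [(0, 1), (1, 2)]

def unnormalized(x):
    # Product of clique potentials psi_C = exp(-E_C).
    return math.exp(sum(x[i] * x[j] for i, j in edges))

states = list(product([-1, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)       # partition function
p = {x: unnormalized(x) / Z for x in states}
print(p[(1, 1, 1)] > p[(1, -1, 1)])  # -> True: low energy, high probability
```

Note that computing Z required summing over all 2^3 states; this exhaustive sum is what makes the partition function expensive for large graphs.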
What is a simple example of an application of a Markov random field?
De-noising a (slightly) noisy image.
Let the observed noisy image be described by an array of binary pixel values y_i ∈ {−1, +1}, where the index i = 1, . . . , D runs over all pixels. We suppose that the image is obtained by taking an unknown noise-free image, described by binary pixel values x_i ∈ {−1, +1}, and randomly flipping the sign of each pixel with some small probability, say 10%.
For the cliques {x_i, y_i}, which link each observed pixel to the corresponding noise-free pixel, we choose a very simple energy function −η x_i y_i, where η is a positive constant, so that the energy is lower when the two pixels have the same sign. The remaining cliques comprise pairs of variables {x_i, x_j} where i and j are indices of neighbouring pixels. Again, we want the energy to be lower when the pixels have the same sign than when they have opposite signs, so we choose an energy given by −β x_i x_j, where β is a positive constant.
Because a potential function is an arbitrary, nonnegative function over a maximal clique, we can simply add more energy terms. In this example, this allows us to add an extra term h x_i for each pixel i in the noise-free image. Such a term biases the model towards pixel values with one particular sign in preference to the other.
The complete energy function for the model then takes the form
E(x, y) = h \sum_i x_i − β \sum_{\{i,j\}} x_i x_j − η \sum_i x_i y_i,
which defines a joint distribution over x and y.
For the purposes of image restoration, we wish to find an image, x, having a high probability (ideally the maximum probability).
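One simple way to find such an image is iterated conditional modes (ICM): sweep over the pixels, greedily setting each x_i to whichever sign lowers the energy given its neighbours. The sketch below uses illustrative parameter values (h = 0, β = 1.0, η = 2.1) and a tiny 3×3 toy image with one flipped pixel:

```python
# A minimal ICM (iterated conditional modes) sketch for the denoising
# model above; h, beta, eta and the toy image are illustrative choices.
h, beta, eta = 0.0, 1.0, 2.1

def neighbours(i, j, H, W):
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= i + di < H and 0 <= j + dj < W:
            yield i + di, j + dj

def icm(y, sweeps=5):
    """Greedily flip each x_ij to whichever sign lowers the energy."""
    H, W = len(y), len(y[0])
    x = [row[:] for row in y]           # initialise x with the noisy image
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # The local energy depends only on the Markov blanket.
                nb = sum(x[a][b] for a, b in neighbours(i, j, H, W))
                # Energy contribution of x_ij = s is h*s - beta*s*nb - eta*s*y_ij.
                x[i][j] = min((+1, -1),
                              key=lambda s: h*s - beta*s*nb - eta*s*y[i][j])
    return x

noisy = [[1, 1, 1],
         [1, -1, 1],
         [1, 1, 1]]                      # one flipped pixel in the centre
print(icm(noisy))  # -> [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

The flipped centre pixel is restored because the four agreeing neighbours (weight β each) outweigh the data term η pulling it towards the noisy observation. ICM only finds a local minimum of the energy, not necessarily the global maximum-probability image.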
How can a directed graph be turned into an undirected graph?
By moralization: adding extra links between all pairs of parents of each node, and then dropping the arrows.
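Moralization can be sketched directly; the DAG representation below (node → list of parents) and the v-structure example are illustrative:

```python
from itertools import combinations

# Moralization: connect all pairs of parents of each node ("marry" them),
# then drop edge directions. The DAG is given as child -> list of parents.
def moralize(parents):
    edges = set()
    for child, pas in parents.items():
        for p in pas:                        # keep the original links...
            edges.add(frozenset((p, child)))
        for p, q in combinations(pas, 2):    # ...and marry the parents
            edges.add(frozenset((p, q)))
    return edges

# v-structure A -> C <- B: moralization adds the link A - B.
dag = {"A": [], "B": [], "C": ["A", "B"]}
print(sorted(sorted(e) for e in moralize(dag)))
# -> [['A', 'B'], ['A', 'C'], ['B', 'C']]
```

Marrying the parents is what guarantees that each conditional p(X_i | X_{pa(i)}) has a clique in the undirected graph containing all of its variables.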
What is a Markov blanket?
Bayesian:
The Markov blanket includes a node’s parents, children and the other parents of all of its children.
Markov random field:
The set of nodes a node is directly connected to, i.e. its neighbours.
What is a graphical model tree?
In the case of an undirected graph, a tree is defined as a graph in which there is one, and only one, path between any pair of nodes. Such graphs therefore do not have loops.
In the case of directed graphs, a tree is defined such that there is a single node, called the root, which has no parents, and all other nodes have one parent. If we convert a directed tree into an undirected graph, we see that the moralization step will not add any links as all nodes have at most one parent, and as a consequence the corresponding moralized graph will be an undirected tree.
What is a factor graph?
It is a bipartite graph with two kinds of node: one node for each variable and one node for each factor of the joint distribution. An edge connects a variable node to a factor node whenever the variable appears in that factor.
How are undirected graphs converted to factor graphs?
Create variable nodes corresponding to the nodes in the original undirected graph, and then create additional factor nodes corresponding to the maximal cliques. The factors f_s(x_s) are then set equal to the clique potentials. Note that there may be several different factor graphs that correspond to the same undirected graph.
How are directed graphs converted to factor graphs?
Create variable nodes in the factor graph corresponding to the nodes of the directed graph, and then create factor nodes corresponding to the conditional distributions, and then finally add the appropriate links. Note that there can be multiple factor graphs all of which correspond to the same directed graph.
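As a sketch, converting the hypothetical directed chain A → B → C gives one factor per conditional distribution; representing each factor by its scope makes the bipartite edge set explicit:

```python
# Factor graph for the directed chain A -> B -> C: each conditional
# distribution becomes one factor node (names here are illustrative).
factors = {
    "f1": ("A",),          # p(A)
    "f2": ("A", "B"),      # p(B | A)
    "f3": ("B", "C"),      # p(C | B)
}
# Bipartite edge list: one edge per (factor, variable-in-scope) pair.
edges = [(f, v) for f, scope in factors.items() for v in scope]
print(edges)
# -> [('f1', 'A'), ('f2', 'A'), ('f2', 'B'), ('f3', 'B'), ('f3', 'C')]
```

Edges only ever run between a factor and a variable, never variable-to-variable or factor-to-factor, which is what makes the graph bipartite.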
What is the sum-product algorithm?
The sum-product algorithm is an efficient message-passing algorithm used to perform inference on graphical models. It computes marginal probabilities or marginal functions of variables by propagating “messages” between nodes in a graph.
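A minimal sketch of sum-product on the chain factor graph x1 — f_a — x2 — f_b — x3 with binary variables (the factor tables are made-up numbers): leaf variables send the constant message 1, each factor sums out its other variable, and the marginal is the normalized product of the incoming messages.

```python
# Sum-product on the chain x1 - f_a - x2 - f_b - x3, binary variables.
f_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # f_a(x1, x2)
f_b = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}  # f_b(x2, x3)

# Leaf variables send the message 1; a factor sums out its other variable.
msg_a_to_x2 = [sum(f_a[(x1, x2)] * 1.0 for x1 in (0, 1)) for x2 in (0, 1)]
msg_b_to_x2 = [sum(f_b[(x2, x3)] * 1.0 for x3 in (0, 1)) for x2 in (0, 1)]

# The marginal of x2 is the normalized product of all incoming messages.
unnorm = [msg_a_to_x2[v] * msg_b_to_x2[v] for v in (0, 1)]
Z = sum(unnorm)
marginal_x2 = [u / Z for u in unnorm]
print([round(v, 6) for v in marginal_x2])  # -> [0.55, 0.45]
```

Brute-force summation over x1 and x3 would give the same marginal; the point of message passing is that on a tree it computes every marginal with cost linear in the number of nodes rather than exponential.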