Graphical models Flashcards
Describe the “Probabilistic Pipeline”
Build model -> Discover patterns -> Predict and explore -> Criticise model -> Build model
What is probabilistic inference?
Given latent variables and observed variables, probabilistic inference is learning about the unknown latent variables through the posterior distribution
What can we do if the marginal likelihood is intractable?
We can approximate it, e.g. using sampling and Markov chain Monte Carlo (MCMC).
What kind of graphs are used for:
- Bayesian networks: directed
- Markov random fields: undirected
- Factor graphs: bipartite
Why do we use graphical models?
- Simple way to visualise a model's structure
- Gives insights into properties of the model
- Can be used to design/motivate new models
- Can be used to express complex computations
What type of nodes do we have in Bayesian networks?
Shaded nodes: Observed random variables
Unshaded nodes: Latent random variables
What kind of dependence do we have in a Bayesian network if an arrow is drawn from node a to node b?
a is a parent of b, contributing the factor p(b|a) to the joint distribution
For Bayesian networks, split the nodes into three sets A, B, C. When can we say that A is conditionally independent of B given C?
If all paths from A to B are blocked (d-separation). A path is blocked if:
1. The arrows meet tail-to-tail or head-to-tail at a node and that node is in C, or
2. The arrows meet head-to-head at a node and neither the node nor any of its descendants is in C
In a Markov random field, how can we see if two variables are conditionally independent given the rest of the variables?
They aren't connected directly in the graph
What is a clique in Markov random fields?
A fully connected subgraph
What is the formula for factorizing the joint distribution in Markov random fields?
p(x) = (1/Z) prod_C phi_C(x_C)
Z is the normalizing constant (partition function)
C ranges over the maximal cliques
phi_C is the clique potential
What is the formula for the normalizing constant Z in Markov random fields, and what is the complexity of calculating it?
sum_x prod_C phi_C(x_C) = Z
If the model has M discrete nodes and each node has K states, the sum ranges over all K^M joint configurations, so the complexity is O(K^M)
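As an illustration, here is a brute-force computation of Z for a hypothetical 3-node chain MRF with K = 2 states per node; the pairwise potential phi(a, b) = exp(a*b) is an assumed toy choice, not anything from the flashcards:

```python
import math
from itertools import product

def phi(a, b):
    # clique potential for a pairwise clique (an assumed toy choice)
    return math.exp(a * b)

nodes = 3           # M = 3 nodes in a chain x1 - x2 - x3
states = [-1, +1]   # K = 2 states per node

# Z = sum over all K**M joint configurations of the product of potentials
Z = 0.0
for x1, x2, x3 in product(states, repeat=nodes):
    Z += phi(x1, x2) * phi(x2, x3)

print(Z)  # the loop body runs K**M = 8 times, illustrating the exponential cost
```

Even for this tiny chain the enumeration is exponential in M; this is exactly why Z is intractable for large models.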
When do the clique potentials have a probabilistic interpretation?
If we convert a directed graph to an MRF: each clique potential is then a product of conditional distributions.
What is the problem with factorizing using an MRF for continuous models?
We have to solve an integral for the normalizing constant Z, which is often intractable
How can we see which sets of variables are conditionally independent in an MRF?
For sets A, B, C: A and B are conditionally independent given C if all paths from A to B go through C.
How is the clique potential related to energy functions?
phi_C(x_C) = exp(-E(x_C)) for some energy function E.
What is the total energy in terms of the clique potentials?
total energy = sum_C E_C(x_C) = -sum_C log phi_C(x_C); this equals -log p(x) up to the additive constant log Z
Describe the MRF energy minimization model for image denoising
Let y_i be the observed (noisy) pixels and x_i the latent (clean) pixels, both taking values in {-1, +1}.
E(x_i, y_i) = -a x_i y_i
E(x_i, x_j) = -b x_i x_j for neighbouring pixels
E(x_i) = c x_i (bias term)
E(x, y) = -a sum_i x_i y_i - b sum_{i,j} x_i x_j + c sum_i x_i
Initialize each x_i to y_i, then repeatedly set each x_i to whichever of {-1, +1} minimizes the energy (iterated conditional modes).
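A minimal iterated-conditional-modes sketch of this recipe, assuming a 1D neighbour structure and hypothetical constants a, b, c (a 2D image works the same way with 4-neighbourhoods):

```python
# Hypothetical coupling constants: weak data term, strong smoothing, no bias.
a, b, c = 1.0, 2.0, 0.0

def local_energy(x, y, i):
    # Sum of the energy terms that involve pixel i; x[i], y[i] are in {-1, +1}.
    e = -a * x[i] * y[i] + c * x[i]
    if i > 0:
        e += -b * x[i] * x[i - 1]
    if i < len(x) - 1:
        e += -b * x[i] * x[i + 1]
    return e

def icm_denoise(y, sweeps=5):
    x = list(y)                    # initialize x_i to y_i
    for _ in range(sweeps):
        for i in range(len(x)):
            # Set x_i to whichever of {-1, +1} minimizes the local energy.
            x[i] = +1
            e_plus = local_energy(x, y, i)
            x[i] = -1
            e_minus = local_energy(x, y, i)
            x[i] = +1 if e_plus <= e_minus else -1
    return x

noisy = [+1, +1, -1, +1, +1, +1, -1, -1, -1, -1]  # one flipped pixel at index 2
print(icm_denoise(noisy))  # → [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
```

Each coordinate update can only lower the total energy, so the sweep converges to a local minimum; it is not guaranteed to find the global one.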
What does a link between a variable node and a factor node indicate in a factor graph?
That the factor depends on the variable
How do we convert a directed graph to an MRF?
- Add undirected links between all pairs of parents of each node ("moralization") and make the remaining links undirected
- Identify the maximal cliques
- Initialize each clique potential to 1
- Multiply each conditional distribution from the directed graph into one of the clique potentials
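The first step (moralization) can be sketched as follows, assuming the directed graph is given as a dict mapping each node to its list of parents (a hypothetical representation chosen for this example):

```python
def moralize(parents):
    """Return the undirected edge set of the moralized graph."""
    edges = set()
    for child, pa in parents.items():
        # Keep every original link, now treated as undirected.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry the parents": link all pairs of parents of each node.
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

# v-structure a -> c <- b: moralization adds the extra edge a - b
g = {"a": [], "b": [], "c": ["a", "b"]}
moral = moralize(g)
print(frozenset(("a", "b")) in moral)  # True: the parents a and b are married
```

The added a-b edge is what lets the clique {a, b, c} absorb the factor p(c|a, b) in the subsequent steps.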
Can a directed graph have more than one representation as a factor graph?
Yes
How can we create a factor graph from an MRF?
Add a factor node for each maximal clique
What is one advantage of factor graphs over MRF graphs?
They can remove cycles: a fully connected clique contains loops in the MRF, but can be represented by a single factor node connected to its variables
What is the goal of the message passing algorithm?
Finding the (marginal) posterior distribution
How do we pass messages from variable nodes to factor nodes?
Send the product of the incoming messages from all the other neighbouring factors
How do we send a message from a factor node to a variable node?
Multiply the factor by the incoming messages from all the other variables and marginalize those variables out:
sum over x_1, ..., x_{M-1} of f(x_1, x_2, ..., x_M) * (product of incoming messages)
How do we initialize leaf nodes in factor graphs?
If the node is a variable, initialize to 1, if it is a factor initialize to f(x)
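The two message rules and the leaf initialization can be sketched on the smallest non-trivial factor graph, the chain x1 -- f12 -- x2 with a unary leaf factor f1 on x1; all the table values below are hypothetical:

```python
# Binary variables with states indexed 0 and 1.
f1 = [0.9, 0.1]        # leaf factor on x1: its message initializes to f(x1)
f12 = [[0.8, 0.2],     # pairwise factor, f12[x1][x2]
       [0.2, 0.8]]

# Leaf variable x2 sends the constant message [1, 1] toward f12.
msg_x2_to_f12 = [1.0, 1.0]

# Variable-to-factor: product of the messages from all other neighbours.
# x1's only other neighbour is f1, so its message to f12 is just f1.
msg_x1_to_f12 = [f1[x1] for x1 in range(2)]

# Factor-to-variable: multiply incoming messages into the factor, then
# marginalize out every variable except the target.
msg_f12_to_x2 = [sum(f12[x1][x2] * msg_x1_to_f12[x1] for x1 in range(2))
                 for x2 in range(2)]

# The (unnormalized) marginal of a variable is the product of its incoming
# messages; normalizing gives the posterior marginal p(x2).
Z = sum(msg_f12_to_x2)
p_x2 = [m / Z for m in msg_f12_to_x2]
print(p_x2)  # → [0.74, 0.26]
```

On a tree, passing messages once in each direction along every edge yields the exact marginal for every variable this way.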
How can we use observed values in the message passing algorithm to say something about the posterior?
If v_hat are the observed values, h are the unobserved variables and x = (h, v) are all variables, then clamping v = v_hat in p(x) gives:
p(h, v = v_hat), which is proportional to the posterior p(h | v = v_hat)
Name some applications of inference in graphical models
Ranking, Computer Vision, Coding Theory, Linear Algebra, Signal Processing
If x_i, x_j are conditionally independent given everything else in an MRF (they don't have a link), can they appear in a common factor in the factorization?
No