Unit 4: Graphical Models and Bayesian Networks Flashcards
Major problem with decision trees, SVMs and NNs
How to handle uncertainty and unobserved variables, which require probabilistic reasoning.
E.g. how do we deal with uncertainty in our input variables, or quantify the confidence we have in the output?
Undirected graphical models
A.k.a. Markov Random Fields (MRFs) or Markov Networks.
They represent the relationships between variables using undirected edges. The independence properties between the variables can be easily read off the graph through simple graph separation (missing edges / arcs).
Directed graphical models
A.k.a. Bayesian network or Belief network (BNs).
They are directed acyclic graphs (DAGs).
They consist of a set of nodes, together with a set of directed edges.
Each directed edge is a link from one node to another, with the direction being important.
The edges must not form any directed cycles.
Moralisation
A Bayesian network can be converted into an MRF by connecting all the parents of each common child with edges ("marrying" the parents) and then dropping the directions on all edges.
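A minimal sketch of moralisation, assuming the DAG is given as a dict mapping each node to a list of its parents (an illustrative representation, not part of the original card):

```python
from itertools import combinations

def moralise(parents):
    """Return the undirected edge set of the moral graph of a DAG."""
    edges = set()
    for child, pars in parents.items():
        # Drop directions: keep an undirected edge from each parent to the child.
        for p in pars:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: connect every pair of parents of a common child.
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))
    return edges

# Example: the collider A -> C <- B moralises to the triangle A-B-C.
print({tuple(sorted(e)) for e in moralise({"A": [], "B": [], "C": ["A", "B"]})})
```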
Maximum a posteriori query
(MAP query)
“What is the most likely explanation for some evidence?”
I.e. what is the value of the node X that maximises the probability that we would have seen this evidence.
“What value of x maximises P(x | e)?”
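A minimal illustrative sketch (with invented numbers) of answering a MAP query once the posterior over X has been computed:

```python
# P(X = x | e) for a hypothetical query variable X (invented values).
posterior = {"flu": 0.6, "cold": 0.3, "allergy": 0.1}

# The MAP answer is the value of X with the highest posterior probability.
map_value = max(posterior, key=posterior.get)
print(map_value)   # "flu"
```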
Bayesian networks
(a.k.a. Belief networks)
Provide a graphical representation of a domain that offers an intuitive way to model conditional independence relationships and handle uncertainty.
They give a compact specification of the full joint probability distribution.
3 Components of Bayesian networks
- A finite set of variables.
- A set of directed edges between the nodes that together form an acyclic graph.
- A conditional probability distribution associated with each node P(Xᵢ | Parents( Xᵢ )). This is typically represented in a conditional probability table (CPT) indexed by values of the parent variables.
An uninstantiated node in a Bayesian Network
A node whose value is unknown.
Conditional independence
A and B are conditionally independent given C
if and only if
P(A ∩ B | C) = P(A | C) · P(B | C)
Denoted
( A ⟂ B ) | C
D-separation
Provides that:
If two variables, A & B, are d-separated given some evidence e, then they are conditionally independent given that evidence.
P(A | B, e) = P(A | e)
E.g. in the chain A → B → C, A and C are d-separated given B, so P(C | A, B) = P(C | B).
Chain rule
The joint probability distribution is the product of all the conditional probability distributions stored at the nodes of the graph.
P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
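A minimal sketch of the chain rule for a hypothetical network A → B → C with invented CPT values:

```python
# CPTs for the hypothetical network A -> B -> C (numbers are made up).
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}

def joint(a, b, c):
    # P(A, B, C) = P(A) * P(B | A) * P(C | B)
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

print(joint(True, True, False))   # 0.3 * 0.9 * 0.5 = 0.135
```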
Parents of node Xᵢ
The nodes that have edges into Xᵢ.
Hybrid networks
Networks that contain both discrete and continuous variables.
Clique
A clique of a graph is a subset of its nodes such that every two distinct nodes in the clique are adjacent.
Graph separation in an undirected graph
X_A is conditionally independent of X_C given X_B if there is no path from any a ∈ A to any c ∈ C after removing all variables in B.
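A minimal sketch of checking graph separation, assuming the undirected graph is given as a dict from each node to its set of neighbours (an illustrative representation):

```python
def separated(graph, A, C, B):
    """True if every path from A to C passes through B (i.e. A is separated from C by B)."""
    # Remove the nodes in B, then breadth-first search outwards from A.
    remaining = {n: nbrs - B for n, nbrs in graph.items() if n not in B}
    frontier = set(A) - B
    seen = set(frontier)
    while frontier:
        nxt = set().union(*(remaining.get(n, set()) for n in frontier))
        frontier = nxt - seen
        seen = seen | nxt
    return seen.isdisjoint(C)

# Chain a - b - c: a and c are separated given {b}, but not given the empty set.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(separated(graph, {"a"}, {"c"}, {"b"}), separated(graph, {"a"}, {"c"}, set()))
```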
Markov blanket (undirected graph)
The Markov blanket of a variable is precisely its neighbours in the graph.
Boltzmann Machine
Boltzmann machines learn a probability distribution from the input data so that new samples can be generated from the same distribution.
A Boltzmann machine is a fully connected graph with pairwise potentials on binary-valued nodes.
It has a hidden layer of nodes and a visible layer of nodes.
Restricted Boltzmann Machines
A class of Boltzmann machine with a single hidden layer and bipartite connectivity.
This means that every node in the visible layer is connected to every node in the hidden layer, but nodes within the same layer are not connected to each other.
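A minimal sketch of the standard RBM energy function E(v, h) = -aᵀv - bᵀh - vᵀWh, where the single weight matrix W only connects visible to hidden units (illustrative sizes and random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
W = rng.normal(size=(n_visible, n_hidden))   # visible-to-hidden weights only:
a = np.zeros(n_visible)                      # no visible-visible or
b = np.zeros(n_hidden)                       # hidden-hidden connections.

def energy(v, h):
    # E(v, h) = -a.v - b.h - v.W.h  (lower energy = higher probability)
    return -a @ v - b @ h - v @ W @ h

v = np.array([1, 0, 1, 1])   # binary visible units
h = np.array([0, 1, 1])      # binary hidden units
print(energy(v, h))
```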
Ising model
A mathematical model of ferromagnetism in statistical mechanics invented by Wilhelm Lenz (1920), who gave it as a problem to his student Ernst Ising.
Hopfield network
A fully connected Ising model with a symmetric weight matrix.
The main application of Hopfield networks is as an associative memory or content addressable memory.
We train the network on a set of fully observed bit vectors we wish to learn. The weights of the network and bias terms can be learned from the training data using (approximate) maximum likelihood.
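A minimal sketch of a Hopfield network used as an associative memory. For simplicity it uses the classic Hebbian outer-product rule rather than maximum likelihood; patterns are ±1 vectors:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: W is the average of the outer products of the patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)            # no self-connections
    return W / patterns.shape[0]

def recall(W, state, steps=10):
    """Iteratively update the state until it (hopefully) settles on a stored pattern."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

patterns = np.array([[1, -1, 1, -1, 1], [1, 1, -1, -1, 1]])
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, -1])  # corrupted copy of the first pattern
print(recall(W, noisy))               # recovers the first stored pattern
```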
Learning in fully observable Bayesian Networks
Known Structure
To learn the CPTs of a Bayesian network, we require a dataset D where D is made up of a set of cases or samples.
Each sample is a vector of values, one for each variable in the model.
The task is to find the model M that maximises P(D | M) (the probability of the data given the model).
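A minimal sketch of maximum-likelihood CPT estimation by counting, for a hypothetical binary variable X with a single binary parent Pa (illustrative data):

```python
from collections import Counter

# Fully observed dataset D: each case assigns a value to every variable.
D = [
    {"Pa": True,  "X": True},
    {"Pa": True,  "X": False},
    {"Pa": True,  "X": True},
    {"Pa": False, "X": False},
]

joint_counts = Counter((d["Pa"], d["X"]) for d in D)
parent_counts = Counter(d["Pa"] for d in D)

# Maximum-likelihood estimate of P(X = true | Pa = pa) is the relative frequency.
for pa in (True, False):
    print(pa, joint_counts[(pa, True)] / parent_counts[pa])
```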
Bayesian Correction
A correction factor made by adding 1 to the count in the numerator and a value m to the denominator, where m is the number of different values the variable can take on.
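A minimal sketch of the correction applied to a single CPT entry (names are illustrative):

```python
def corrected_estimate(count_x_and_parents, count_parents, m):
    # Plain maximum likelihood, count_x_and_parents / count_parents, gives
    # probability 0 to value combinations never seen in the data. Adding 1 to
    # the numerator and m (the number of possible values of X) to the
    # denominator avoids these zero estimates.
    return (count_x_and_parents + 1) / (count_parents + m)

# E.g. a binary X (m = 2) never observed true in 10 matching cases:
print(corrected_estimate(0, 10, 2))   # 1/12 instead of 0
```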
Inference by Stochastic Simulation
- Exact inference in a Bayesian network requires marginalising latent variables from the joint distribution.
- Marginalisation becomes quickly intractable in large networks.
- An alternative is to use approximate inference methods such as stochastic simulation (sampling).
Direct sampling
Used when there is no evidence instantiated in the network (see the sketch after this list):
- Sample the parent (root) nodes from their priors.
- Sample children from their conditional probabilities P(X | Parents(X)).
- Continue until no variable is left.
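A minimal sketch of direct (ancestral) sampling for a hypothetical two-node network Rain → WetGrass with invented CPT values:

```python
import random

def direct_sample():
    # Sample the parent first, then the child from P(WetGrass | Rain).
    rain = random.random() < 0.2
    wet_grass = random.random() < (0.9 if rain else 0.1)
    return rain, wet_grass

# Estimate P(WetGrass = true) from 10,000 samples.
samples = [direct_sample() for _ in range(10_000)]
print(sum(wg for _, wg in samples) / len(samples))   # ≈ 0.2*0.9 + 0.8*0.1 = 0.26
```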
Main assumption made by direct sampling
The approach assumes no evidence is available in the network, so it cannot be used to compute posterior probabilities.
(A solution is to use rejection sampling, sketched below.)
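A minimal sketch of rejection sampling for the same hypothetical Rain → WetGrass network as above: generate direct samples and discard those that contradict the evidence.

```python
import random

def direct_sample():
    rain = random.random() < 0.2
    wet_grass = random.random() < (0.9 if rain else 0.1)
    return rain, wet_grass

# Estimate P(Rain = true | WetGrass = true): keep only samples where the
# evidence WetGrass = true holds, then count how often Rain is true among them.
kept = [rain for rain, wet in (direct_sample() for _ in range(10_000)) if wet]
print(sum(kept) / len(kept))   # ≈ 0.18 / 0.26 ≈ 0.69
```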