Unit 4: Graphical Models and Bayesian Networks Flashcards

1
Q

Major problem with decision trees, SVMs and NNs

A

How to handle uncertainty and unobserved variables, which require probabilistic reasoning.

E.g. how do we deal with uncertainty in our input variables, and what confidence do we have in the output?

2
Q

Undirected graphical models

A

A.k.a. Markov Random Fields (MRFs) or Markov Networks.

They represent the relationships between variables using undirected edges. The independence properties between the variables can be easily read off the graph through simple graph separation (missing edges / arcs).

3
Q

Directed graphical models

A

A.k.a. Bayesian networks or Belief networks (BNs).

They are directed acyclic graphs (DAGs).

They consist of a set of nodes together with a set of directed edges.

Each directed edge is a link from one node to another with the direction being important.

The edges must not form any directed cycles.

4
Q

Moralisation

A

A Bayesian network can be converted into an MRF by connecting all the parents of a common child with edges and dropping the directions on the edges.
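
A minimal sketch of this conversion in Python (the graph encoding, function name and example network are my own, not from the cards):

    from itertools import combinations

    def moralise(dag):
        # dag maps each node to the set of its parents.
        # Returns the undirected (moral) graph as a set of frozenset edges.
        edges = set()
        for child, parents in dag.items():
            # Drop the direction on every parent -> child edge.
            for p in parents:
                edges.add(frozenset((p, child)))
            # "Marry" the parents: connect every pair of parents of a common child.
            for p1, p2 in combinations(parents, 2):
                edges.add(frozenset((p1, p2)))
        return edges

    # Example: A -> C <- B becomes the triangle A-B-C after moralisation.
    print(moralise({"A": set(), "B": set(), "C": {"A", "B"}}))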

5
Q

Maximum a posteriori query
(MAP query)

A

“What is the most likely explanation for some evidence?”

I.e. what is the value of the node X that is most probable given the evidence we have seen?

“What value of x maximises P(x | e)?”

6
Q

Bayesian networks

A

(a.k.a. Belief networks)

Provide a graphical representation of a domain that gives an intuitive way to model conditional independence relationships and handle uncertainty.

They provide a compact specification of the full joint probability distribution.

7
Q

3 Components of Bayesian networks

A
  • A finite set of variables.
  • A set of directed edges between the nodes, that form an acyclic graph.
  • A conditional probability distribution associated with each node, P(Xᵢ | Parents(Xᵢ)). This is typically represented in a conditional probability table (CPT) indexed by the values of the parent variables.
8
Q

An uninstantiated node in a Bayesian Network

A

A node whose value is unknown.

9
Q

Conditional independence

A

A and B are conditionally independent given C

if and only if

P(A ∩ B | C) = P(A | C) · P(B | C)

Denoted

( A ⟂ B ) | C
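
A small numerical illustration (my own numbers, not from the cards): if P(A | C) = 0.3 and P(B | C) = 0.5, then A and B being conditionally independent given C requires P(A ∩ B | C) = 0.3 × 0.5 = 0.15.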

10
Q

D-separation

A

States that:

If two variables, A and B, are d-separated given some evidence e, they are conditionally independent given the evidence.

P(A | B, e) = P(A | e)

11
Q

Chain rule

A

The joint probability distribution is the product of all the conditional probability distributions stored in the nodes of the graph.

P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
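
For example (an illustration, not from the cards), in the chain A → B → C the joint factorises as P(A, B, C) = P(A) · P(B | A) · P(C | B). A minimal Python sketch with made-up CPT numbers:

    # CPTs for the chain A -> B -> C (illustrative numbers only).
    P_A = {True: 0.2, False: 0.8}
    P_B_given_A = {True: {True: 0.9, False: 0.1}, False: {True: 0.3, False: 0.7}}
    P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.1, False: 0.9}}

    def joint(a, b, c):
        # Chain rule: P(A, B, C) = P(A) * P(B | A) * P(C | B)
        return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

    print(joint(True, True, False))  # 0.2 * 0.9 * 0.5 = 0.09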

12
Q

Parents of node Xᵢ

A

The nodes that have edges into Xᵢ.

13
Q

Hybrid networks

A

Networks that contain both discrete and continuous variables.

14
Q

Clique

A

A clique of a graph is a subset of its nodes such that every two distinct nodes in the clique are adjacent.

15
Q

Graph separation in an undirected graph

A

X_A is conditionally independent of X_C given X_B if there is no path from any a ∈ A to any c ∈ C after removing all the variables in B.
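
A minimal sketch of this test, assuming the networkx library is available (graph and node names are illustrative, not from the cards):

    import networkx as nx

    def separated(graph, A, B, C):
        # True if every a in A is disconnected from every c in C
        # once all nodes in B are removed from the undirected graph.
        pruned = graph.copy()
        pruned.remove_nodes_from(B)
        return not any(nx.has_path(pruned, a, c) for a in A for c in C)

    # Example: the chain a - b - c. Removing b separates a from c.
    G = nx.Graph([("a", "b"), ("b", "c")])
    print(separated(G, {"a"}, {"b"}, {"c"}))  # True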

16
Q

Markov blanket (undirected graph)

A

The Markov blanket of a variable is precisely its neighbours in the graph.

17
Q

Boltzmann Machine

A

Boltzmann machines learn a probability distribution from the input data in order to generate new samples from the same distribution.

It is a fully connected graph with pairwise potentials on binary-valued nodes.

They have a hidden layer of nodes and a visible layer of nodes.
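
A standard way to write these pairwise potentials (my own addition, not stated on the cards) is via an energy function over the binary states xᵢ:

E(x) = −( Σ_{i<j} wᵢⱼ xᵢ xⱼ + Σᵢ bᵢ xᵢ ),  with  P(x) ∝ exp( −E(x) )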

18
Q

Restricted Boltzmann Machines

A

A class of Boltzmann machine with a single hidden layer and a bipartite connection structure.

This means that every node in the visible layer is connected to every node in the hidden layer, but nodes within the same layer are not connected to each other.

19
Q

Ising model

A

A mathematical model of ferromagnetism in statistical mechanics invented by Wilhelm Lenz (1920), who gave it as a problem to his student Ernst Ising.

20
Q

Hopfield network

A

A fully connected Ising model with a symmetric weight matrix.

The main application of Hopfield networks is as an associative memory or content addressable memory.

We train the network on a set of fully observed bit vectors we wish to learn. The weights of the network and bias terms can be learned from the training data using (approximate) maximum likelihood.
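
A minimal sketch of training and recall (note: this uses the simpler Hebbian outer-product rule rather than the maximum-likelihood fit mentioned above; the patterns and names are illustrative):

    import numpy as np

    def train_hopfield(patterns):
        # Hebbian rule: W is the sum of outer products x xT over stored patterns,
        # with a zero diagonal so no unit feeds back into itself.
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for x in patterns:
            W += np.outer(x, x)
        np.fill_diagonal(W, 0)
        return W

    def recall(W, x, steps=5):
        # Repeatedly threshold the net input; for these patterns this
        # settles on a stored pattern within a few steps.
        for _ in range(steps):
            x = np.where(W @ x >= 0, 1, -1)
        return x

    patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                         [1, -1, 1, -1, 1, -1, 1, -1]])
    W = train_hopfield(patterns)
    probe = np.array([-1, 1, 1, 1, -1, -1, -1, -1])  # first pattern with one bit flipped
    print(recall(W, probe))  # recovers [ 1  1  1  1 -1 -1 -1 -1]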

21
Q

Learning in fully observable Bayesian Networks

Known Structure

A

To learn the CPTs of a Bayesian network, we require a dataset D where D is made up of a set of cases or samples.

Each sample is a vector of values, one for each variable in the model.

The task is to find the model M that maximises P(D | M) (the probability of the data given the model).
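
A minimal sketch of the counting approach for a single node (variable names and data are illustrative; this gives the raw maximum-likelihood estimate, before any Bayesian correction):

    from collections import Counter

    # Each sample assigns a value to every variable in the model.
    data = [
        {"Rain": True,  "Sprinkler": False, "WetGrass": True},
        {"Rain": False, "Sprinkler": True,  "WetGrass": True},
        {"Rain": False, "Sprinkler": False, "WetGrass": False},
        {"Rain": True,  "Sprinkler": False, "WetGrass": True},
    ]

    def estimate_cpt(data, child, parents):
        # P(child | parents) is estimated as count(child, parents) / count(parents).
        joint, marginal = Counter(), Counter()
        for sample in data:
            parent_vals = tuple(sample[p] for p in parents)
            joint[(sample[child], parent_vals)] += 1
            marginal[parent_vals] += 1
        return {key: n / marginal[key[1]] for key, n in joint.items()}

    print(estimate_cpt(data, "WetGrass", ["Rain", "Sprinkler"]))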

22
Q

Bayesian Correction

A

A correction factor applied by adding 1 to the count in the numerator and a value m to the denominator, where m is the number of different values the variable can take on.
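
For example (numbers are illustrative): if a binary variable (m = 2) was observed to be true in 3 out of 8 samples, the corrected estimate is (3 + 1) / (8 + 2) = 0.4 rather than the raw count of 3 / 8 = 0.375.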

23
Q

Inference by Stochastic Simulation

A
  • Exact inference in a Bayesian network requires marginalising latent variables from the joint distribution.
  • Marginalisation becomes quickly intractable in large networks.
  • An alternative is to use approximate inference methods.
24
Q

Direct sampling

where no evidence has been instantiated in the network

A
  • Sample the parent nodes first
  • Sample children from the conditional probabilities P(X | Parents(X))
  • Continue until no variable is left (see the sketch below)
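
A minimal sketch of this procedure on a small chain A → B → C (CPT numbers and names are illustrative):

    import random

    # Topological order and CPTs for the chain A -> B -> C (made-up numbers).
    order = ["A", "B", "C"]
    cpt = {
        "A": lambda s: 0.2,                     # P(A = true)
        "B": lambda s: 0.9 if s["A"] else 0.3,  # P(B = true | A)
        "C": lambda s: 0.5 if s["B"] else 0.1,  # P(C = true | B)
    }

    def direct_sample():
        # Sample each node in topological order, conditioning on its already-sampled parents.
        sample = {}
        for node in order:
            sample[node] = random.random() < cpt[node](sample)
        return sample

    print(direct_sample())  # e.g. {'A': False, 'B': False, 'C': False}
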
25
Q

Main assumption made by direct sampling

A

The approach assumes no evidence is available in the network. It cannot be used to compute posterior probabilities.

(A solution is to use rejection sampling.)
