Hopfield Nets and BMs Flashcards
What is a Hopfield Net?
- Networks of binary threshold units
- Feedback networks - each unit has connections to all other units except itself
- Connections are symmetric: w[ij], the weight on the connection between neuron i and neuron j, equals w[ji]
Are Hopfield Networks a type of FeedForward network?
No.
There is no obvious way of sorting the neurons from inputs to outputs, since every neuron is an input to all other neurons
How do we perform unit value updates?
A. Synchronous - all neurons change state simultaneously, based on the current state of all other neurons
B. Asynchronous - neurons update one after the other, each using the latest states of the others
Energy function of a Hopfield net
- Because all the connections are symmetric, it is possible to build a global energy function
According to this global energy function, each configuration (set of neuron states) of the network can be scored, i.e. assigned an energy
- It is possible to look for configurations of (possibly locally) minimal energy (in fact the whole space of weights is divided into basins of attraction, each one containing a minimum of the energy)
What is the global energy?
Sum of many local contributions, each depending on one connection weight and the binary states of the 2 neurons it connects:
E = -Σ(for all i) s[i]*b[i] - Σ(for all i<j) s[i]*s[j]*w[ij]
Energy gap (computing how the state of one neuron affects the global energy)
delta E[i] = E(y[i] = -1) - E(y[i] = 1) = 2*(b[i] + Σ(for all j) w[ij]*y[j])
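A minimal NumPy sketch of both quantities (illustrative code, not from the source; assumes (-1,1) states and a symmetric W with zero diagonal):

```python
import numpy as np

def energy(W, b, s):
    """Global energy: -sum_i s[i]*b[i] - sum_{i<j} s[i]*s[j]*w[ij].
    The 0.5 counts each i<j pair once (W symmetric, zero diagonal)."""
    return -s @ b - 0.5 * s @ W @ s

def energy_gap(W, b, s, i):
    """E(y[i] = -1) - E(y[i] = 1), given the states of the other units."""
    return 2 * (b[i] + W[i] @ s)
```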
What is learning in Hopfield nets?
Storing memories where memories are the energy minima
How can we help clean up incomplete or corrupted memories?
Binary threshold decision rule: set y[i] = 1 if Σ(for all j) w[ij]*y[j] + b[i] >= 0, and y[i] = -1 otherwise
Applying this rule repeatedly moves the state downhill in energy toward a nearby minimum, i.e. a stored memory
What’s the update rule for Hopfield nets for (-1,1) states?
delta w[ij] = learning rate * Σ(for all patterns p) s[i]^(p) * s[j]^(p)
What’s the update rule for Hopfield nets for (0,1) states?
delta w[ij] = 4 * learning rate * Σ(for all patterns p) (s[i]^(p) - 0.5) * (s[j]^(p) - 0.5)
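A minimal sketch tying the last few cards together (illustrative code, not from the source): Hebbian storage of (-1,1) patterns, then asynchronous binary threshold updates to clean up a corrupted pattern:

```python
import numpy as np

def store(patterns, lr=1.0):
    """delta w[ij] = lr * sum over patterns p of s[i]^(p) * s[j]^(p).
    patterns: (P, N) array with entries in {-1, 1}."""
    W = lr * patterns.T @ patterns
    np.fill_diagonal(W, 0)                    # no self-connections
    return W

def recall(W, s, b=None, steps=200, rng=np.random.default_rng(0)):
    """Asynchronous updates: one randomly chosen unit at a time applies
    the binary threshold rule, moving the state downhill in energy."""
    s = s.copy()
    b = np.zeros(len(s)) if b is None else b
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s + b[i] >= 0 else -1
    return s
```

Starting recall from a stored pattern with a few flipped bits should settle back into that pattern, provided P/N stays well below the limit discussed next.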
How many memories can be stored in a Hopfield net?
Up to about P/N = 0.14
P = num memories N = num neurons
Above this, the probability of failure increases drastically
Critical State - Rules of Thumb
- P/N > 0.14 - no minima are related to the learned patterns
- 0.14 > P/N > 0.05 - both learned and other minima; other minima tend to dominate
- 0.05 > P/N > 0 - both learned and other minima; learned minima dominate (lower energy)
What’s the problem with storing more memories than P/N = 0.14?
- Limited by spurious memories
- 2 nearby energy minima combine to make a new minimum in the wrong place
Capacity of a totally connected network with N binary threshold units: about 0.15N memories before memories start getting confused with one another
At N bits per memory, that’s a total of 0.15N^2 bits
However, storing the weights takes well over 0.15N^2 bits: after storing M memories, each weight is an integer in the range [-M, M], so the weights require N^2 * log2(2M+1) bits. Capacity grows only logarithmically with the bits spent on the weights, so we should be able to do better
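- As a quick check with the formulas above: for N = 100 units and M = 0.15N = 15 memories, the net stores 0.15 * 100^2 = 1,500 bits, while the weights take about 100^2 * log2(31) ≈ 49,500 bits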
How can we increase the capacity of Hopfield nets?
- Elizabeth Gardner
- Instead of trying to store vectors in one shot, cycle through the training set many times
- Use the perceptron convergence procedure to train each unit to have the correct state given the state of all the other units in the vector
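A rough sketch of that idea (my own illustration; details of Gardner's actual procedure differ): treat each unit as a perceptron that must output its own target state given all the other units, and cycle through the patterns many times:

```python
import numpy as np

def train_perceptron_style(patterns, epochs=100, lr=0.1):
    """patterns: (P, N) array with entries in {-1, 1}."""
    P, N = patterns.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        for s in patterns:
            for i in range(N):
                # If unit i's threshold decision is wrong given the
                # other units, nudge its incoming weights toward s[i].
                if (1 if W[i] @ s >= 0 else -1) != s[i]:
                    W[i] += lr * s[i] * s
                    W[i, i] = 0    # keep the diagonal at zero
    # Crude re-symmetrization so the result is a valid Hopfield net
    return 0.5 * (W + W.T)
```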
What is a Boltzmann Machine?
A network of symmetrically connected units that make stochastic decisions about whether to be on/off
- Hinton & Sejnowski, 1983
- Just replace binary threshold units with binary stochastic units in a Hopfield net so that it makes random biased decisions
- A neuron is switched on/off with a certain probability instead of deterministically
- Temperature controls the amount of noise
- Raising the noise level (temperature) is equivalent to decreasing the energy gaps between configurations
Probability that a unit turns on in Boltzmann machines
P(s[i] = 1) = 1 / (1 + e^(-delta(E[i]) / T))
When T = 0, it behaves like a binary threshold unit: the probability goes to 0 or 1 depending on the sign of delta E[i] (it’s deterministic)
As we lower T, the unit becomes more firmly on or off
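A minimal sketch of this stochastic unit (illustrative code, not from the source):

```python
import numpy as np

def p_on(delta_E, T):
    """P(s[i] = 1) = 1 / (1 + exp(-delta_E / T)). As T -> 0 this becomes
    a hard threshold on the sign of delta_E (deterministic limit); as T
    grows it flattens toward 0.5 (pure noise)."""
    if T == 0:
        return float(delta_E >= 0)
    return 1.0 / (1.0 + np.exp(-delta_E / T))

def sample_unit(delta_E, T, rng=np.random.default_rng()):
    """Switch the unit on with probability p_on, off otherwise."""
    return 1 if rng.random() < p_on(delta_E, T) else 0
```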
What is the difference between BMs and Hopfield Nets?
- In BMs the unit update rule (activation function) has a probabilistic component, so if we allow the network to run it settles into a given equilibrium state only with a certain probability
- Hopfield nets are deterministic: given an initial state, their equilibrium state is determined
Weaknesses of Hopfield nets and BMs
- All units are visible (they correspond to observable quantities)
- Nothing more than second-order interactions can be captured by Hopfield nets and BMs
Why use RBMs instead of general BMs?
- Learning is slow in BMs with many hidden layers, so restrict the BM to a single layer of feature detectors
- Many hidden layers can then be learned efficiently by composing RBMs, using the feature activations of one RBM as the training data for the next
What is an RBM?
- Smolensky, 1986
- Layer of visible units and layer of hidden units, but no visible-visible or hidden-hidden connections
Learning procedure of RBMs
- Contrastive divergence (approximates gradient descent) (Hinton, 2002)
- Starting with a data vector on the visible units, update all of the hidden units in parallel
- Update all of the visible units in parallel to get a ‘reconstruction’
- Update hidden units again
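A minimal CD-1 sketch for a binary RBM (illustrative NumPy code; using probabilities rather than samples for the reconstruction is a common simplification, and all names here are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, a, b, v0, lr=0.01, rng=np.random.default_rng(0)):
    """One contrastive divergence (CD-1) update for a binary RBM.
    W: (n_visible, n_hidden) weights; a, b: visible/hidden biases;
    v0: batch of data vectors, shape (batch, n_visible)."""
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Reconstruction: visible probabilities given the sampled hiddens
    pv1 = sigmoid(h0 @ W.T + a)
    # Negative phase: hidden probabilities given the reconstruction
    ph1 = sigmoid(pv1 @ W + b)
    # delta w[ij] proportional to <v[i]h[j]>_data - <v[i]h[j]>_reconstruction
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```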
Uses of RBMs
- Auto-encoders
- A single multi-layer generative model (a deep belief net) where each additional layer improves a lower bound on the probability that the model would generate the training data