Week 4: Multilayer Perceptrons and Backpropagation Flashcards
Feedforward Neural Networks
There is one input layer, one or more hidden layers, and one output layer. Each layer is an array of neurons, and consecutive layers are interconnected by links. Each link carries a connection weight. The connections are one-way, running forward from input to output.
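A minimal sketch of one forward pass through such a network, assuming NumPy, two inputs, one hidden layer of three neurons, and a tanh hidden activation; the layer sizes and random weights are illustrative, not from the cards:

```python
import numpy as np

# Illustrative sizes: 2 inputs -> 3 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))   # hidden-layer connection weights
b1 = np.zeros(3)                   # hidden-layer biases
W2 = rng.standard_normal((1, 3))   # output-layer connection weights
b2 = np.zeros(1)

def forward(x):
    """One-way (feedforward) pass: input -> hidden -> output."""
    hidden = np.tanh(W1 @ x + b1)  # each link contributes weight * input
    output = W2 @ hidden + b2      # linear output unit
    return output

print(forward(np.array([0.5, -1.0])))
```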
Activation/Transfer Function
Activation functions take various forms; their main purpose is to transform a neuron's net input into a desired range of outputs.
Symmetric Hard Limit (Signum) Transfer Function
f(n) = sgn(n)
Linear Transfer Function
f(n) = n
Symmetric Sigmoid Transfer Function
f(n) = 2/(1+e^{-2n}) - 1, which is equivalent to tanh(n)
Logarithmic Sigmoid Transfer Function
f(n) = 1/(1+e^{-n})
Radial Basis Transfer Function
f(n) = e^{-n^2}
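The five transfer functions above, written out as a short NumPy sketch; the function names follow MATLAB-style conventions (hardlims, purelin, tansig, logsig, radbas) and are my labels, not part of the cards:

```python
import numpy as np

def hardlims(n):   # symmetric hard limit (signum): f(n) = sgn(n)
    return np.sign(n)

def purelin(n):    # linear: f(n) = n
    return n

def tansig(n):     # symmetric sigmoid: 2/(1 + e^{-2n}) - 1, i.e. tanh(n)
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def logsig(n):     # logarithmic sigmoid: 1/(1 + e^{-n})
    return 1.0 / (1.0 + np.exp(-n))

def radbas(n):     # radial basis: e^{-n^2}
    return np.exp(-n ** 2)
```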
Backpropagation
This method propagates the output error backwards through the network to compute an effective error for each hidden unit, allowing all weights to be updated by gradient descent.
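A compact sketch of backpropagation on a one-hidden-layer network, assuming NumPy, a squared-error loss, tanh hidden units, and a linear output; the sizes, data, and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)
lr = 0.1  # gradient-descent step size

def train_step(x, target):
    global W1, b1, W2, b2
    # Forward pass.
    h = np.tanh(W1 @ x + b1)
    y = W2 @ h + b2                            # linear output
    # Backward pass: propagate the output error to each hidden unit.
    err_out = y - target                       # output error
    err_hid = (W2.T @ err_out) * (1 - h ** 2)  # effective error per hidden unit
    # Gradient-descent weight updates.
    W2 -= lr * np.outer(err_out, h); b2 -= lr * err_out
    W1 -= lr * np.outer(err_hid, x); b1 -= lr * err_hid
    return 0.5 * float(err_out @ err_out)

x, t = np.array([0.5, -1.0]), np.array([1.0])
for _ in range(100):
    loss = train_step(x, t)
print(loss)  # should shrink toward 0
```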
Learning Curves
These curves plot the training, validation, and test error as training progresses.
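A self-contained sketch of how such curves are typically recorded, using a toy one-parameter regression fitted by gradient descent; the data, model, and split are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 1-D regression data, split into training and validation sets.
x = rng.uniform(-1, 1, 40); t = 2 * x + 0.1 * rng.standard_normal(40)
x_tr, t_tr, x_va, t_va = x[:30], t[:30], x[30:], t[30:]

w, lr = 0.0, 0.1
train_err, val_err = [], []
for epoch in range(50):
    w -= lr * np.mean((w * x_tr - t_tr) * x_tr)      # gradient-descent step
    train_err.append(np.mean((w * x_tr - t_tr) ** 2))
    val_err.append(np.mean((w * x_va - t_va) ** 2))
# Plotting train_err and val_err against epoch gives the learning curves;
# a validation error that rises while training error keeps falling
# signals overfitting.
```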
Radial Basis Function Neural Networks
The input units use the linear transfer function. The hidden units use radial basis functions. The output units can use any activation function.
d_j = ||x - c_j||, where x is the input vector and c_j is the centre of basis function j. The centres c_j can be preset or determined by a training algorithm. The output of hidden node j is h(d_j).
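A minimal sketch of this hidden-layer computation, assuming NumPy, preset centres, and the Gaussian basis function defined further below; the centres and width are illustrative:

```python
import numpy as np

centres = np.array([[0.0, 0.0], [1.0, 1.0]])  # c_j, preset here for simplicity
sigma = 0.5

def rbf_hidden(x):
    # d_j = ||x - c_j|| for every centre, then h(d_j) per hidden node.
    d = np.linalg.norm(x - centres, axis=1)
    return np.exp(-d ** 2 / (2 * sigma ** 2))  # Gaussian basis function

print(rbf_hidden(np.array([0.5, 0.5])))
```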
Similarities with MLP Networks:
- Both are universal approximators
- Both are nonlinear feed-forward neural networks
Differences with MLP Networks:
- RBF networks have only 1 hidden layer, but MLP networks can have more than 1 hidden layer.
- RBF networks use basis functions that differ from the activation functions used in MLP networks.
- RBF networks compute the distance between input patterns and centres, while MLP networks compute the inner product of input patterns and weights.
- RBF networks are trained with a 2-phase algorithm (typically: determine the centres first, then the output weights; see the sketch after this list), but MLP networks are trained with a single-phase algorithm.
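A sketch of the 2-phase training idea under common assumptions: phase 1 chooses the centres (here, a random subset of the training inputs; k-means is another frequent choice), and phase 2 solves the output weights by linear least squares, since with fixed centres the output is linear in those weights. Neither specific choice is prescribed by the cards:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (50, 2))              # training inputs
t = np.sin(X[:, 0]) + X[:, 1]                # training targets
sigma = 0.5

# Phase 1: choose the centres (a random subset of training points here).
centres = X[rng.choice(len(X), size=8, replace=False)]

# Phase 2: with centres fixed, the hidden outputs are linear in the
# output weights, so those weights can be solved by least squares.
D = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)  # all d_j
H = np.exp(-D ** 2 / (2 * sigma ** 2))       # Gaussian hidden outputs
w, *_ = np.linalg.lstsq(np.c_[H, np.ones(len(H))], t, rcond=None)

pred = np.c_[H, np.ones(len(H))] @ w
print(np.mean((pred - t) ** 2))              # training error
```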
Basis Functions
These functions are used in the hidden nodes. For localised basis functions such as the Gaussian and the inverse multi-quadric, h(d_j) approaches 0 as d_j approaches infinity; the multi-quadric functions instead grow without bound as d_j increases.
Gaussian Function
h(d_j) = e^{-\frac{d_j^2}{2\sigma^2}}, \sigma > 0
Multi-quadric Function
h(d_j) = (d_j^2 + \sigma^2)^{0.5}, \sigma > 0
Generalised Multi-quadric Function
h(d_j) = (d_j^2 + \sigma^2)^{\beta}, \sigma>0, 0 < \beta < 1
Inverse Multi-quadric Function
h(d_j) = (d_j^2 + \sigma^2)^{-0.5}, \sigma > 0
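The four basis functions above as a NumPy sketch; the function names are mine:

```python
import numpy as np

def gaussian(d, sigma):                 # e^{-d^2 / (2 sigma^2)}, sigma > 0
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def multiquadric(d, sigma):             # (d^2 + sigma^2)^{0.5}, sigma > 0
    return (d ** 2 + sigma ** 2) ** 0.5

def gen_multiquadric(d, sigma, beta):   # (d^2 + sigma^2)^{beta}, 0 < beta < 1
    return (d ** 2 + sigma ** 2) ** beta

def inv_multiquadric(d, sigma):         # (d^2 + sigma^2)^{-0.5}, sigma > 0
    return (d ** 2 + sigma ** 2) ** -0.5
```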