Week 9 Flashcards
What is the perceptron?
A perceptron consists of a set of weighted connections, a neuron (incorporating the activation function), and an output axon.
The activation function is the Heaviside (threshold) function.
How does a perceptron learn?
Initialise weights & threshold
Present input and desired output
Calculate actual output of network
For each input, multiply the input x_i by its weight w_i
Sum the weighted inputs and pass through activation function
Adapt the weights:
If the output is correct: w(t+1) = w(t)
If the output is 0 but should be 1: w(t+1) = w(t) + x_i(t)
If the output is 1 but should be 0: w(t+1) = w(t) - x_i(t)
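A minimal sketch of this rule in Python (function and variable names are illustrative; with lr = 1.0 it is exactly the rule above):

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Simple error-correction learning for a perceptron.
    X: (n_samples, n_features) inputs; y: 0/1 targets."""
    w = np.zeros(X.shape[1])  # weights
    b = 0.0                   # threshold, folded in as a bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            out = 1 if np.dot(w, xi) + b > 0 else 0  # Heaviside activation
            if out == 0 and target == 1:    # output 0, should be 1: add the input
                w += lr * xi
                b += lr
            elif out == 1 and target == 0:  # output 1, should be 0: subtract the input
                w -= lr * xi
                b -= lr
            # correct output: weights stay as they are
    return w, b
```

Choosing lr below 1.0 gives the modified (slowed) version of learning on the next card.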
Modified version of learning
The weight update can include a learning-rate term a between 0.0 and 1.0 to slow learning. It multiplies the input, so w(t+1) = w(t) + a*x_i(t)
What is the Widrow-Hoff learning rule?
Weight updates are proportional to the error made
delta = desired output - actual output
w(t+1) = w(t) + a*delta*x(t)
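A short sketch of one delta-rule update (names are illustrative; w and x as NumPy arrays):

```python
import numpy as np

def widrow_hoff_update(w, x, target, actual, a=0.1):
    """Delta rule: the weight change is proportional to the error."""
    delta = target - actual   # error made on this pattern
    return w + a * delta * x  # w(t+1) = w(t) + a*delta*x(t)
```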
Limitations of the perceptron
A perceptron can only solve linearly separable problems (those where a straight line separates the two classes)
We cannot do this for XOR: no straight line separates its positive cases (0,1) and (1,0) from (0,0) and (1,1)
How do we address perceptron limitations?
Add a further layer to make a multi-layer perceptron (MLP):
Input layer
Hidden Layer
Output layer
Two stages of training
Feedforward
Backpropagation
Why use the sigmoid as the activation function?
Smoother response than the threshold function
The steepness of the curve is controlled by the parameter z
The derivative can be computed cheaply
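A small sketch showing why the derivative is cheap: it can be written in terms of the sigmoid's own output (z is the steepness parameter above):

```python
import numpy as np

def sigmoid(x, z=1.0):
    """Logistic sigmoid; a larger z gives a steeper curve."""
    return 1.0 / (1.0 + np.exp(-z * x))

def sigmoid_derivative(x, z=1.0):
    """Derivative reuses the forward value: z * s * (1 - s)."""
    s = sigmoid(x, z)
    return z * s * (1.0 - s)
```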
What are weights?
Variable-strength connections between units
Propagate signals from one unit to the next
The main learning component - weights are the main thing changed during learning
What is feedforward?
Initialise weights and thresholds to small random values
Present input and desired output
Calculate the actual output by propagating the input forward through the layers
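A sketch of one forward pass through a one-hidden-layer MLP (shapes and names are assumptions; biases omitted for brevity):

```python
import numpy as np

def feedforward(x, W_hidden, W_out):
    """Propagate an input forward through hidden and output layers.
    W_hidden: (n_hidden, n_inputs); W_out: (n_output, n_hidden)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    hidden = sigmoid(W_hidden @ x)    # hidden-layer activations
    output = sigmoid(W_out @ hidden)  # output-layer activations
    return hidden, output
```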
What is backpropagation?
Adapting the weights
Start from the output layer and work backwards
New weight = old weight + learning rate * (error for pattern p on node j) * (output signal, for pattern p, of the unit feeding into node j)
How do we compute error for different units?
For output units
Error = sigmoid derivative * (target output - actual output)
For hidden units
Error = sigmoid derivative * the weighted sum of the errors of the k units in the layer above
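A minimal sketch of both error computations, written in terms of unit outputs (shapes match the feedforward sketch above; names are illustrative):

```python
import numpy as np

def sigmoid_prime(o):
    """Sigmoid derivative expressed via the unit's output o."""
    return o * (1.0 - o)

def output_deltas(targets, outputs):
    """Output units: error uses (target - actual) directly."""
    return sigmoid_prime(outputs) * (targets - outputs)

def hidden_deltas(hidden_outputs, W_out, deltas_above):
    """Hidden units: weighted sum of the deltas of the k units
    in the layer above, scaled by the sigmoid derivative.
    W_out: (n_output, n_hidden), as in the feedforward sketch."""
    return sigmoid_prime(hidden_outputs) * (W_out.T @ deltas_above)
```

Each weight is then changed with the rule on the previous card: learning rate * delta of the receiving node * output of the sending unit.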
Two types of weight updating
Batch updating (faster for training)
All patterns are presented, errors are calculated, then the weights are updated
Online updating
The weights are updated after the presentation of each pattern.
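A sketch of the difference, assuming w is a NumPy array and grad(p, w) is a hypothetical helper returning the weight change for one pattern:

```python
def batch_update(patterns, w, lr, grad):
    """Present all patterns, accumulate the errors, then update once."""
    total = sum(grad(p, w) for p in patterns)
    return w + lr * total

def online_update(patterns, w, lr, grad):
    """Update the weights after each individual pattern."""
    for p in patterns:
        w = w + lr * grad(p, w)
    return w
```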
What is momentum?
Addition to the weight update function
Encourages the network to keep making large weight changes while its recent changes are large
Allows the network to avoid local minima in the early stages, as the momentum can carry it over small hills in the error surface
New weight = standard weight update + a*(w(t) - w(t-1))
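A sketch of the update with momentum (a is the momentum coefficient; base_update stands for the plain backprop weight change for this step):

```python
def momentum_step(w, w_prev, base_update, a=0.9):
    """Add a fraction of the previous weight change to this step,
    so large recent changes encourage further large changes."""
    return w + base_update + a * (w - w_prev)
```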
NN properties
Able to learn to relate input variables to required outputs, e.g. input car attributes and predict fuel consumption
Able to generalise between samples
Shows graceful degradation