Week 9 Flashcards
What is the perceptron?
Consists of a set of weighted connections, the neuron (incorporating the activation function) and an output axon.
The activation function is the Heaviside (threshold) function.
How does a perceptron learn?
Initialise weights and threshold
Present input and desired output
Calculate actual output of the network:
For each input, multiply the input data x_i by its weight w_i
Sum the weighted inputs and pass the result through the activation function
Adapt the weights:
If output correct: w_i(t+1) = w_i(t)
If output 0 but should be 1: w_i(t+1) = w_i(t) + x_i(t)
If output 1 but should be 0: w_i(t+1) = w_i(t) - x_i(t)
Modified version of learning
The weight update can use a learning rate a between 0.0 and 1.0 to slow learning. It multiplies the input data, so w_i(t+1) = w_i(t) + a*x_i(t) (see the sketch below).
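A minimal Python sketch of this training loop, assuming the threshold is folded into a bias term (all names are illustrative, not from the lectures):

```python
import numpy as np

def step(x):
    return 1 if x >= 0 else 0              # Heaviside / threshold activation

def train_perceptron(X, targets, a=0.1, epochs=20):
    w = np.zeros(X.shape[1])               # weights
    b = 0.0                                # bias stands in for the threshold
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = step(np.dot(w, x) + b)     # actual output
            if y == 0 and t == 1:          # output 0, should be 1: add a*x
                w, b = w + a * x, b + a
            elif y == 1 and t == 0:        # output 1, should be 0: subtract a*x
                w, b = w - a * x, b - a
            # correct output: weights left unchanged
    return w, b

# AND is linearly separable, so this converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w, b = train_perceptron(X, np.array([0, 0, 0, 1]))
```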
What is the Widrow-Hoff learning rule?
Weight updates are proportional to the error made:
delta = desired output - actual output
w(t+1) = w(t) + a * delta * x(t)
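A sketch of the same rule in code, assuming the unit's raw linear output is used during training (as in ADALINE); names are illustrative:

```python
import numpy as np

def train_delta(X, targets, a=0.1, epochs=50):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = np.dot(w, x) + b       # linear output used for the error
            delta = t - y              # desired output - actual output
            w += a * delta * x         # w(t+1) = w(t) + a * delta * x(t)
            b += a * delta
    return w, b
```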
Limitations of the perceptron
A perceptron can only solve linearly separable problems (those where a straight line separates the two classes).
XOR is not linearly separable, so a single perceptron cannot learn it (see the check below).
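A small illustrative check, purely for demonstration: a brute-force grid search over threshold-unit weights finds a separator for AND but none for XOR.

```python
import numpy as np

def separable(targets, grid=np.linspace(-2, 2, 41)):
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                y = (X @ np.array([w1, w2]) + b >= 0).astype(int)
                if np.array_equal(y, targets):
                    return True
    return False

print(separable(np.array([0, 0, 0, 1])))  # AND -> True
print(separable(np.array([0, 1, 1, 0])))  # XOR -> False
```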
How do we address perceptron limitations?
Add a further layer to make a multi-layer perceptron (MLP):
Input layer
Hidden Layer
Output layer
Two stages of training
Feed forward
Backpropagation
Why use the sigmoid as the activation function?
Smoother response than the threshold function
Steepness of the curve is controlled by a parameter z
Its derivative is easily computed from the function's own output
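A sketch of the sigmoid with steepness parameter z; note how the derivative reuses the forward value:

```python
import numpy as np

def sigmoid(x, z=1.0):
    return 1.0 / (1.0 + np.exp(-z * x))

def sigmoid_deriv(x, z=1.0):
    s = sigmoid(x, z)
    return z * s * (1.0 - s)   # d/dx sigmoid(z*x) = z * s * (1 - s)
```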
What are weights?
Variable strength connections between units
Propagate signals from one unit to the next
Main learning component: weights are the main thing changed during learning.
What is feedforward?
Initialise weights and thresholds to small random values
Present input and desired output
Calculate the actual output by propagating the input forward through each layer
What is backpropagation?
Adapting the weights
Start from the output layer and work backwards
New weight = old weight + learning rate * error signal for pattern p at unit j * output of the sending unit i, i.e. w_ji(t+1) = w_ji(t) + eta * delta_pj * o_pi
How do we compute the error (delta) for different units?
For output units:
delta = sigmoid derivative * (target output - actual output)
For hidden units:
delta = sigmoid derivative * the weighted sum of the deltas of the k units in the layer above
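A self-contained sketch of a one-hidden-layer MLP trained with these rules on XOR (batch updates; the layer sizes, learning rate, and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

W1, b1 = rng.uniform(-0.5, 0.5, (2, 3)), np.zeros(3)  # input -> hidden
W2, b2 = rng.uniform(-0.5, 0.5, (3, 1)), np.zeros(1)  # hidden -> output
eta = 0.5

for _ in range(10000):
    # Feedforward
    h = sigmoid(X @ W1 + b1)                 # hidden outputs
    y = sigmoid(h @ W2 + b2)                 # network outputs
    # Backpropagation, output layer first
    d_out = y * (1 - y) * (T - y)            # sigmoid' * (target - actual)
    d_hid = h * (1 - h) * (d_out @ W2.T)     # sigmoid' * weighted deltas above
    # Weight updates: eta * delta_j * o_i
    W2 += eta * h.T @ d_out;  b2 += eta * d_out.sum(axis=0)
    W1 += eta * X.T @ d_hid;  b1 += eta * d_hid.sum(axis=0)

print(np.round(y.ravel(), 2))                # typically approaches [0, 1, 1, 0]
```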
Two types of weight updating
Batch updating (faster for training)
All patterns are presented, errors are calculated, then the weights are updated
Online updating
The weights are updated after the presentation of each pattern.
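A sketch contrasting the two regimes, assuming a hypothetical helper grad(w, x, t) that returns the weight change for one pattern:

```python
def batch_epoch(w, patterns, eta, grad):
    # Present all patterns, accumulate the changes, then update once
    total = sum(grad(w, x, t) for x, t in patterns)
    return w + eta * total

def online_epoch(w, patterns, eta, grad):
    # Update the weights after each pattern is presented
    for x, t in patterns:
        w = w + eta * grad(w, x, t)
    return w
```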
What is momentum?
Addition to the weight update function
Encourages the network to make large weight changes if the current weight changes are large
Helps the network avoid local minima in the early stages, as momentum can carry it over small hills in the error surface
New weight change = standard weight update + alpha * (w(t) - w(t-1)), where alpha is the momentum coefficient
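A sketch of one update step with momentum; the caller keeps the previous weight change between steps (names illustrative):

```python
def momentum_step(w, base_update, prev_dw, alpha=0.9):
    # dw = standard weight update + alpha * (w(t) - w(t-1))
    dw = base_update + alpha * prev_dw
    return w + dw, dw   # new weights, plus the change to carry forward
```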
NN properties
Able to learn to relate input variables to a required output, e.g. input car attributes and predict fuel consumption
Is able to generalise between samples
Shows graceful degradation
Classification vs regression
Classification: the function learns to output a class (discrete)
Regression: the function learns to output a value (continuous)
Graceful degradation
In symbolic systems, the removal of one component usually results in total failure
Removing neurons from a NN will not cause failure, though it may reduce performance
This replicates our understanding of fault tolerance in the brain.
What is generalisation like in symbolic AI?
Symbolic systems are programmed rather than learnt, requiring explicit knowledge even when it is not available
They can operate as expert systems in constrained environments, but quickly fail outside them
General-purpose AI systems such as CYC are incredibly difficult to build.
Generalisation in NNs
A NN can learn common patterns in data
Can learn the distinctions between different classes of output
What makes A? What makes B?
Allows them to be noise tolerant
How would you program a computer to recognise characters symbolically?
Why is generalisation useful?
Infer properties of new objects/events
Recognise objects despite changes in orientation
How do we do classification with NNs?
A training set and a test set are needed
The test set is used to test generalisation
The training set consists of measurements plus a class label and is used for learning (a split sketch follows below).
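A simple train/test split sketch: the test set is held out to measure generalisation and is never used for weight updates (function name and fraction are illustrative):

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle before splitting
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]
```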
Data representation issues
Continuous data
Good for NNs
Requires normalisation for some activation functions (see the sketch after this card)
Integer type
Can be entered into a single input unit if ordinal, but is often better handled with the discrete-category approach below
Discrete categories
Each value gets a separate representation in the network, to avoid an implied ordering (which acts as noise that confuses the NN)
Two representations: field type and thermometer type
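A minimal min-max normalisation sketch for continuous inputs, mapping each feature into [0, 1] to suit sigmoid-style activations (assumes every column actually varies):

```python
import numpy as np

def normalise(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)   # per-feature min-max scaling into [0, 1]
```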
Discrete representation types in NNs
Field type
Thermometer type
What is field vs thermometer type representation?
Field type: each category is represented by a single active unit
Hatchback 1,0,0
Saloon 0,1,0
Estate 0,0,1
Thermometer type: units activate cumulatively with the category's position
Hatchback 1,0,0
Saloon 1,1,0
Estate 1,1,1
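A sketch of both encodings (the category list and function names are illustrative):

```python
import numpy as np

CATEGORIES = ["hatchback", "saloon", "estate"]

def field_encode(value):
    i = CATEGORIES.index(value)
    v = np.zeros(len(CATEGORIES))
    v[i] = 1                       # single unit on (one-hot)
    return v

def thermometer_encode(value):
    i = CATEGORIES.index(value)
    v = np.zeros(len(CATEGORIES))
    v[:i + 1] = 1                  # all units up to the category's position on
    return v

print(field_encode("saloon"))        # [0. 1. 0.]
print(thermometer_encode("saloon"))  # [1. 1. 0.]
```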