Week 8: The Perceptron - A Supervised Learning Algorithm: How is supervised learning used to perform pattern recognition in a perceptron? Flashcards
**Main goal of these flashcards:** Learn to describe how supervised learning is used to perform pattern recognition
in a perceptron
Rosenblatt’s Perceptron (2)
- Describes how a set of examples of stimuli and correct responses can be used to train an artificial neural network to respond correctly via changes in synaptic weights
- Learning is governed by the firing rates of the pre- and post-synaptic neurons and the correct post-synaptic firing rate (i.e., a teaching signal) => “supervised learning”
Rosenblatt’s Perceptron is an
important historical example: instructive in
understanding the aim of using neural networks
for pattern classification.
What is a teaching signal?
Tells the network what the correct output should be
The different types of learning rules (3)
- Unsupervised
- Supervised
- Reinforcement
What is unsupervised learning? (2)
There is no ‘teacher’ or feedback about right and wrong outputs
We just give a pattern to the network, apply the learning rule, and see what happens
Examples of unsupervised learning (3)
- Hebbian learning
- Competitive learning rule
- BCM Rule
Supervised learning is
providing a teaching signal
What is reinforcement learning?
Learning from occasional reward or punishment
Reinforcement vs supervised learning (2)
In RL, unlike SL, there is no teaching signal for every input-output combination
The network only gets occasional reward or punishment
Perceptron uses
standard artificial neurons with no dynamics
Simple Perceptron (4)
- One output neuron (o1) and two input neurons (x1 and x2)
- x1 and x2 each have an input weight: w1 = 1 and w2 = 1
- To get the output (the activity of the o1 neuron), you sum the input activity * input weight and pass it through a transfer function (here a step function)
- The output neuron is active only if both input neurons are active (so that the summed input reaches the threshold of 1.5) = the model performs the logical ‘AND’ function (see the sketch below)
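A minimal Python sketch of this AND perceptron (not part of the original flashcards; the weights w1 = w2 = 1 and threshold T = 1.5 are taken from the card above):

```python
def step(net, threshold=1.5):
    """Step transfer function: 1 if the net input reaches the threshold, else 0."""
    return 1 if net >= threshold else 0

def simple_perceptron(x1, x2, w1=1.0, w2=1.0):
    net = w1 * x1 + w2 * x2   # sum of input activity * input weight
    return step(net)          # pass through the step transfer function

# Truth table: only (1, 1) reaches the threshold, so this is logical AND
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", simple_perceptron(x1, x2))
```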
Diagram of Simple Perceptron
Diagram of Simple Perceptron Table = performs logical ‘AND’ function (3)
If neither input neuron is active, the output neuron is not active
If one input neuron is active and the other is not, the output neuron is not active
Only when both input neurons are active is the output neuron active
Rosenblatt produced a graph of all possible x1 and x2 combinations from the simple perceptron model
(4)
Dashed line is the decision boundary
Left of the line, o1 = 0
Right of the line, o1 = 1
We can write the equation of the line: x1 + x2 = 1.5, i.e., x2 = -x1 + 1.5 (rearranging the first equation)
In Rosenblatt’s Perceptron graph, the line separating o1 = 1 and o1 = 0 is where the net input equals the
threshold: w1x1 + w2x2 = T
In Rosenblatt’s Perceptron we can rearrange the threshold equation (w1x1 + w2x2 = T) into the equation of the line: (3)
x2 = -(w1/w2)x1 + T/w2
(i.e., the format of y = mx + c)
So far we had implicitly assumed w1 = w2 = 1
In Rosenblatt’s Perceptron, changing the weights will (2)
change the equation of the line!
x2 = -(w1/w2)x1 + T/w2
Changing the weights is our ….. and learning in Rosenblatt’s Perceptron will ….. (2)
mechanism to implement learning
update the position of the line/change decision boundary
In Rosenblatt’s Perceptron,
learning, by changing the weights, to classify two groups of input patterns
means finding weights so that the line separates the groups
An input group is simply a set of
activity patterns across the input neurons
Pattern classification on Rosenblatt’s Perceptron: example where we want to classify adults and children based on height and weight (6)
- x1 signals weight
- x2 signals height
- Each individual (child/adult) we test is one combination of the two input neurons (weight/height) = a pattern
- We train the network on many example patterns of individuals; each individual changes the input weights, placing a decision boundary that separates the two groups
- The decision boundary can be fuzzy, since there may be tall or heavy children and short or light adults, but overall the classifier separates children and adults based on height and weight
- After training this network on 100 individuals (50 adults / 50 children), if we present a 101st person (who has not been used to train the network), we measure performance by how well the network generalises (i.e., how well it classifies the 101st individual correctly) - see the sketch below
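A sketch of this example in Python, assuming hypothetical height/weight statistics and using the delta rule and the threshold trick explained later in these flashcards (all numbers are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 50 children and 50 adults, each one pattern
# (weight in kg, height in cm) across the two input neurons x1 and x2
children = rng.normal([30, 130], [8, 12], size=(50, 2))
adults = rng.normal([75, 172], [12, 10], size=(50, 2))
X = np.vstack([children, adults])
t = np.array([0] * 50 + [1] * 50)      # teaching signal: 0 = child, 1 = adult

# Threshold trick (see later card): extra input x0 = -1, so w0 plays the role of T
X = np.hstack([-np.ones((100, 1)), X])

w = rng.normal(0, 0.1, size=3)         # initial random weights
eps = 0.01                             # learning rate (epsilon)

for epoch in range(100):
    for k in rng.permutation(100):     # present training patterns in random order
        o = 1 if X[k] @ w >= 0 else 0  # step transfer function
        w += eps * (t[k] - o) * X[k]   # delta rule: only moves the boundary on errors

# Generalisation: classify a 101st individual never seen during training
new_person = np.array([-1, 68, 165])   # (x0, weight, height), hypothetical
print("adult" if new_person @ w >= 0 else "child")
```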
Learning or training in the perceptron model means
presenting example patterns; each example pattern changes the input weights
What is the training set?
The set of example patterns across the input neurons (e.g., (x1, x2) pairs representing weight and height)
Performance of pattern classification depends on generalisation, which is
giving the network a new, never-before-seen datapoint (x1, x2) and observing the output (the result of classification)
Performance
how well the perceptron neural network classifies new datapoints presented to it
A slightly more complex perceptron model now, which involves (6)
- 5 input neurons
- 3 output neurons (3 possible results of the pattern classification)
- Start with initial random weights
- The output here is n = 3, so (o1k, o2k, o3k)
- The input here is n = 5, so (x1k, x2k, x3k, x4k, x5k)
- The superscript k means pattern k and output k, which are the elements of the training set
Diagram of Complex Perceptron
How does the complex perceptron train/learn the network as we present example patterns that change the input weights?
Using the delta rule, which is an example of a supervised learning rule
How do we train the complex perceptron network; how does it learn using the delta rule? (6)
With 3 output and 5 input neurons, we have a set of patterns across the neurons (i.e., our training set)
Present a pattern k and find the corresponding output values (o1k, o2k, …)
That is, present pattern k; each output neuron computes the sum of input activity * weight, passes it through the transfer function, and is on or off based on the initial random weights
We know whether the network has classified pattern k correctly by using the delta rule, which gives the network a teaching signal (i.e., access to the correct target output for pattern k)
In the delta rule, the change in connection weights is made according to the difference between the target and the output we get for input pattern k: wij becomes wij + epsilon * xjk * (tik - oik)
Then present the next input pattern k -> k + 1 and update the weights again (see the sketch below)
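A minimal sketch of this training loop for the 5-input, 3-output network, assuming a step transfer function with threshold 0 and made-up training patterns (both are assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out = 5, 3
W = rng.normal(0, 0.1, size=(n_out, n_in))  # initial random weights w_ij

def transfer(net):
    # Assumed step transfer function with threshold 0
    return (net >= 0).astype(float)

def train_step(W, x_k, t_k, eps=0.1):
    """One delta-rule update for pattern k: x_k has the 5 input activities,
    t_k is the teaching signal (target activities of the 3 output neurons)."""
    o_k = transfer(W @ x_k)                # each output sums input activity * weight
    delta = t_k - o_k                      # error of each output neuron
    return W + eps * np.outer(delta, x_k)  # w_ij += eps * x_jk * (t_ik - o_ik)

# Hypothetical training set: present pattern k, update, then pattern k+1, ...
patterns = rng.integers(0, 2, size=(4, n_in)).astype(float)
targets = rng.integers(0, 2, size=(4, n_out)).astype(float)
for x_k, t_k in zip(patterns, targets):
    W = train_step(W, x_k, t_k)
```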
The term (tik - oik) in the delta rule is also known as
delta, which is the error made by output i for input pattern k
What the delta learning rule does in the perceptron model is to
change the input weights so as to reduce this error!
Delta rule numerical example in the complex perceptron model (5)
Specifically, look at the connection weight from x1 to output neuron o2
The output activity of o2, after the computation, is o2k = 0.2
The input activity x1k was 0.7
We also have the target t2k = 0.3, so we want the output activity of o2 to be 0.3 instead of 0.2
The delta learning rule takes this into account by changing the connection weight between x1 and o2: w12 becomes w12 + epsilon * 0.7 * (0.3 - 0.2), so the error is reduced! (see the check below)
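Checking the arithmetic in a snippet (the learning rate epsilon and the starting value of w12 are not given in the card, so those values here are hypothetical):

```python
eps = 0.1                      # hypothetical learning rate
w12 = 0.5                      # hypothetical current weight from x1 to o2
x1k, o2k, t2k = 0.7, 0.2, 0.3  # values from the card above

delta_w = eps * x1k * (t2k - o2k)  # 0.1 * 0.7 * 0.1 = 0.007
w12 += delta_w
print(w12)  # ~0.507: the weight grows, pushing o2's activity up toward the target
```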
In the delta learning rule, the change in connection weight is big if
the discrepancy between the target and the output activity is big
Rosenblatt’s Perceptron was more complex than our complex perceptron model (10)
- An artificial retina of inputs x
- A single output neuron that turns on or off (via the transfer function) depending on whether the pattern on the artificial retina is one to be detected
- Input patterns xk are presented
- Some patterns are to be detected, with target tk = 1, and some are not (they are foils)
- For each pattern, apply the delta learning rule
- After many presentations of the whole training set (in random order) it would find the best linear discriminator of targets from foils
- Note that a connection weight only changes if there is an error and if its input neuron is active
- If the target is bigger than the output, the weight increases
- If the target is smaller than the output, the weight decreases
- This delta rule changes the weights to reduce the error (see the sketch below)
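A sketch of this setup, assuming a small made-up retina, a step transfer function with threshold 0, and random target/foil patterns (all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

n_pixels = 20                                # artificial retina of inputs x
w = rng.normal(0, 0.1, size=n_pixels)        # weights to the single output neuron

# Hypothetical training set: half targets (tk = 1), half foils (tk = 0)
X = rng.integers(0, 2, size=(10, n_pixels)).astype(float)
t = np.array([1] * 5 + [0] * 5, dtype=float)

eps = 0.05
for epoch in range(50):
    for k in rng.permutation(10):            # whole training set, random order
        o = float(X[k] @ w >= 0)             # output on if pattern detected
        # A weight only changes if there is an error (t != o) AND its input
        # neuron is active: for x_jk = 0 the update below is zero for that weight
        w += eps * (t[k] - o) * X[k]
```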
Diagram of Rosenblatt’s Perceptron model:
The delta rule only learns weights, how do we find the value for the threshold T?
Just use T = 0 and add another input x0 = -1; then the weight from it, w0, can serve the same purpose (see the check below)
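A quick check of this trick in Python, using the AND perceptron's values (w1 = w2 = 1, T = 1.5) from earlier:

```python
import numpy as np

w1, w2, T = 1.0, 1.0, 1.5
x1, x2 = 1.0, 1.0

# Original form: output on if w1*x1 + w2*x2 >= T
original = w1 * x1 + w2 * x2 >= T

# Bias trick: append input x0 = -1 with weight w0 = T, then test net input >= 0
augmented = np.array([T, w1, w2]) @ np.array([-1.0, x1, x2]) >= 0

print(original, augmented)  # both True: the two formulations agree
```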
Perceptrons can have many output units (forming a single-layer neural network) –
each output is trained using the delta rule independently of the others
Minsky and Papert (1969) showed that perceptrons can perform
linear discrimination only and cannot solve ‘non-linearly separable problems’
What is a non-linearly separable problem? (Visual) - (5)
Graph of all possible input combinations x1, x2:
Say you want to learn x1 XOR x2 (exclusive or)
The output would be ON if x2 = 1 and x1 = 0, OR if x2 = 0 and x1 = 1
Can we place a line that divides this space such that the output is 1 on one side and 0 on the other?
This separation cannot be done linearly (see the sketch below)
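A brute-force illustration (my own sketch, not from the flashcards): searching a grid of weights and thresholds finds no single linear unit that computes XOR:

```python
import numpy as np
from itertools import product

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]  # ON iff exactly one input is active

grid = np.linspace(-2, 2, 41)
solved = False
for w1, w2, T in product(grid, grid, grid):
    outputs = [int(w1 * x1 + w2 * x2 >= T) for x1, x2 in patterns]
    if outputs == xor_targets:
        solved = True
        break
print("XOR solvable by one linear unit?", solved)  # False: no line separates the classes
```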
Rosenblatt’s Perceptron networks
perform linear discrimination, in which the decision boundary is always linear
But including hidden layers in the perceptron gives it more power
to solve these ‘non-linearly separable problems’ (see the sketch below)
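A sketch of how a hidden layer solves XOR, with hand-picked weights (illustrative only; these weights are not learned by the delta rule):

```python
def step(net, threshold):
    return 1 if net >= threshold else 0

def two_layer_xor(x1, x2):
    h1 = step(x1 + x2, 0.5)    # hidden unit 1 computes logical OR
    h2 = step(x1 + x2, 1.5)    # hidden unit 2 computes logical AND
    return step(h1 - h2, 0.5)  # output: OR and not AND  ->  XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", two_layer_xor(x1, x2))  # outputs 0, 1, 1, 0
```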