Lecture 2 Flashcards
Neural networks, connectionism, parallel distributed processing
Based on abstract view of the neuron
The connections determine the function of the network
Connections can be formed by learning and do not need to be programmed
Logic gates
Computers have electronic elements that implement 'logic gates', and with these you can build and run programs
McCulloch-Pitts neuron - assumptions
- The activity of the neuron is an 'all or none' process
a. Activation is either 0 or 1
- A certain fixed number of synapses must be excited within the period of latent addition in order to excite a neuron at any time
- The only significant delay within the nervous system is a synaptic delay
- The activity of any inhibitory synapse absolutely prevents excitation of the neuron at any time
a. This is no longer assumed in current networks; inhibition is now weighted
- The structure of the net does not change with time
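A minimal sketch (not from the lecture) of a unit that follows these assumptions: all-or-none output, a fixed excitation threshold, and absolute inhibition. The threshold of 2 is an illustrative choice that turns the unit into an AND gate, tying in with the logic gates card above.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    # any active inhibitory synapse absolutely prevents excitation
    if any(inhibitory):
        return 0
    # all-or-none output: fire only if enough excitatory synapses are active
    return 1 if sum(excitatory) >= threshold else 0

# with threshold 2 and two excitatory inputs the unit behaves as an AND gate
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts([x1, x2], inhibitory=[], threshold=2))
```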
McCulloch-Pitts neuron - 1 major thing they didn't take into account
Real neurons are very noisy
So computation in the brain is fault tolerant, which a standard computational model is not (there, a fault would simply produce an error)
i.e., the brain does not work like a Turing machine
so neural networks abstract strongly from the details of real neurons
neural networks abstract strongly from the details of real neurons
in what way do they differ
- Conductivity delays are neglected
- An output signal is discrete or a real-valued number
- Net input is calculated as the weighted sum of the input signals
- Net input is transformed into an output signal via a simple function
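A minimal sketch (my own, with made-up input signals and weights) of the abstract neuron these points describe: the net input is the weighted sum of the input signals, and a simple squashing function turns it into a real-valued output between 0 and 1.

```python
import math

def net_input(signals, weights):
    # net input = weighted sum of the input signals
    return sum(s * w for s, w in zip(signals, weights))

def sigmoid(net):
    # simple function turning net input into a real-valued output in (0, 1)
    return 1.0 / (1.0 + math.exp(-net))

signals = [0.5, 1.0, 0.0]       # made-up input signals
weights = [0.8, -0.4, 0.3]      # made-up connection weights
print(sigmoid(net_input(signals, weights)))   # output of the unit
```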
Error-correcting learning
form of supervised learning
Perceptron
Original perceptron had only 2 layers (input and output layer)
Limitations of the perceptron
- Only binary values
> remedied by the delta-rule
- Only 2 layers
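A minimal sketch of the delta-rule mentioned above, assuming its common Widrow-Hoff form with a linear, real-valued output; the learning rate, input pattern, and target are made up.

```python
def delta_rule_update(weights, inputs, target, lr=0.1):
    # real-valued (linear) output instead of a binary threshold
    output = sum(x * w for x, w in zip(inputs, weights))
    error = target - output
    # delta rule: weight change = learning rate x error x input activation
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(50):                                   # keep presenting one pattern
    weights = delta_rule_update(weights, [1.0, 0.5], target=0.8)
print(sum(x * w for x, w in zip([1.0, 0.5], weights)))   # close to the target 0.8
```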
Perceptron convergence theorem
If a pattern set can be represented by a two-layer perceptron, the perceptron learning rule will always be able to find some correct weights
o So if it can, it will.
o Does not say anything about how fast. Could be a slow process. But it will find it in the end.
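A minimal sketch of the theorem in action, assuming the classic perceptron learning rule, a made-up learning rate, and logical AND as a pattern set that a two-layer perceptron can represent; the rule keeps correcting mistakes until some set of correct weights is found.

```python
# Two-layer perceptron (inputs -> one binary output node) trained on AND,
# a pattern set it can represent, so the learning rule must find correct weights
patterns = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0.0, 0.0]
bias = 0.0
lr = 0.1

for epoch in range(100):                  # plenty for this tiny problem
    mistakes = 0
    for x, target in patterns:
        output = 1 if sum(xi * wi for xi, wi in zip(x, w)) + bias > 0 else 0
        error = target - output           # learn only from mistakes
        if error != 0:
            mistakes += 1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            bias += lr * error
    if mistakes == 0:                     # every pattern correct: weights found
        print("converged after", epoch + 1, "epochs:", w, bias)
        break
```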
needed for error backpropagation
- Algorithm to train perceptrons with more than 2 layers
- Preferably also one that used continuous and nonlinear activation rules
Characteristics of backpropagation
- Any number of layers
- only feedforward, no cycles
- uses continuous nodes
> activation between 0 and 1
- initial weights are random
- total error never increases
> gradient descent in error space
> so it goes down a little bit or stays the same
backprop trick
We have a node h in the hidden layer
We go to the error signal on the output layer that is calculated for each node
o Error = the difference between the target and the spontaneous output
We take all those errors in the output layer, weight each one by the connection from h to that output node, and add them up. This is the error we have for our hidden node
o Can be positive and negative
Not biologically plausible because axons only work in 1 direction.
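A small numeric sketch (all values made up) of the trick: the output-layer errors are passed back through h's outgoing connections and summed to give h's error, which can be positive or negative. The hidden activation x (1 – hidden activation) factor from the algorithm card below is left out here.

```python
# made-up example: hidden node h feeds two output nodes
output_errors = [0.4, -0.25]      # error signals computed at the output layer
weights_from_h = [0.9, 0.5]       # connections from h to those output nodes

# pass each output error back through its connection and add them up
error_at_h = sum(w * e for w, e in zip(weights_from_h, output_errors))
print(error_at_h)                 # 0.4*0.9 + (-0.25)*0.5 = 0.235
```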
Backpropagation algorithm in rules
- weight change = small constant x error x input activation
- for an output node, the error is
o error = (target activation – output activation) x output activation x (1 – output activation)
o you add this to do gradient descent
- for a hidden node, the error is
o error = weighted sum of to-node errors x hidden activation x (1 – hidden activation)
- weight change and momentum
o weight change = small constant x error x input activation + momentum constant x old weight change
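A minimal sketch that puts these rules together, assuming a tiny 2-input, 3-hidden-node, 1-output sigmoid network trained online on XOR; the learning rate, momentum constant, random seed, and epoch count are illustrative choices, and whether it ends up solving XOR depends on the random initial weights. Printing the total error every thousand epochs ties in with the gradient-descent characteristic above.

```python
import math, random

random.seed(1)

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

n_in, n_hid = 2, 3            # 2 inputs -> 3 hidden nodes -> 1 output node
lr, momentum = 0.5, 0.9       # "small constant" and momentum constant

# initial weights (and biases) are random, as in the characteristics card
w_ih = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b_h  = [random.uniform(-1, 1) for _ in range(n_hid)]
w_ho = [random.uniform(-1, 1) for _ in range(n_hid)]
b_o  = random.uniform(-1, 1)

# previous weight changes, kept around for the momentum term
dw_ih = [[0.0] * n_in for _ in range(n_hid)]
db_h  = [0.0] * n_hid
dw_ho = [0.0] * n_hid
db_o  = 0.0

patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR

def forward(x):
    hidden = [sigmoid(sum(w_ih[h][i] * x[i] for i in range(n_in)) + b_h[h])
              for h in range(n_hid)]
    output = sigmoid(sum(w_ho[h] * hidden[h] for h in range(n_hid)) + b_o)
    return hidden, output

for epoch in range(5000):
    total_error = 0.0
    for x, target in patterns:
        hidden, output = forward(x)
        total_error += (target - output) ** 2

        # output node: error = (target - output) x output x (1 - output)
        delta_o = (target - output) * output * (1 - output)
        # hidden node: error = weighted sum of to-node errors x hidden x (1 - hidden)
        delta_h = [delta_o * w_ho[h] * hidden[h] * (1 - hidden[h])
                   for h in range(n_hid)]

        # weight change = small constant x error x input activation
        #                 + momentum constant x old weight change
        for h in range(n_hid):
            dw_ho[h] = lr * delta_o * hidden[h] + momentum * dw_ho[h]
            w_ho[h] += dw_ho[h]
            db_h[h] = lr * delta_h[h] + momentum * db_h[h]
            b_h[h] += db_h[h]
            for i in range(n_in):
                dw_ih[h][i] = lr * delta_h[h] * x[i] + momentum * dw_ih[h][i]
                w_ih[h][i] += dw_ih[h][i]
        db_o = lr * delta_o + momentum * db_o
        b_o += db_o

    if epoch % 1000 == 0:
        print(f"epoch {epoch}: total error {total_error:.4f}")

for x, target in patterns:
    print(x, "target", target, "output", round(forward(x)[1], 2))
```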
Disadvantages of backprop
- learning is slow
- new learning will rapidly overwrite old representations unless they are interleaved with the new patterns
- this makes it hard to keep networks up to date with new information
- this also makes it very implausible as a psychological model of human memory
Advantages of backprop
- easy to use
a. few parameters
b. algorithm is easy to implement
- can be applied to a wide range of data
- very popular
- paved the way for deep learning