Lecture 2 Flashcards

1
Q

Neural networks, connectionism, parallel distributed processing

A

Based on an abstract view of the neuron

The connections determine the function of the network

Connections can be formed by learning and do not need to be programmed

2
Q

Logic gates

A

Computers have electronic elements that implement ‘logic gates’, and with these you can build and run programs

3
Q

McCulloch-Pitts neuron - assumptions

A
  1. The activity of the neuron is an ‘all or none’ process
    a. Activation is either 0 or 1 (see the code sketch after this list)
  2. A certain fixed number of synapses must be excited within the period of latent addition in order to excite a neuron at any time
  3. The only significant delay within the nervous system is synaptic delay
  4. The activity of any inhibitory synapse absolutely prevents excitation of the neuron at any time
    a. This assumption has since been dropped; in modern models inhibition is weighted like any other input
  5. The structure of the net does not change with time
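
To make these assumptions concrete, here is a minimal sketch of a McCulloch-Pitts unit; the function name, threshold value, and example inputs are illustrative choices, not from the lecture:

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Binary threshold unit following the assumptions above:
    'all or none' output, a fixed excitation threshold, and
    absolute inhibition by any active inhibitory synapse."""
    if any(inhibitory):                       # assumption 4: absolute inhibition
        return 0
    return 1 if sum(excitatory) >= threshold else 0   # assumptions 1 and 2

# Example: an AND-like unit that needs both excitatory inputs (threshold = 2)
print(mcculloch_pitts([1, 1], [], threshold=2))   # -> 1
print(mcculloch_pitts([1, 0], [], threshold=2))   # -> 0
print(mcculloch_pitts([1, 1], [1], threshold=2))  # -> 0, inhibition vetoes
```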
4
Q

McCulloch-Pitts neuron - one major thing they didn’t take into account

A

Real neurons are very noisy

Computation in the brain is therefore fault tolerant, which a strict computational model cannot be (noise there would simply produce an error)

i.e., the brain does not work like a Turing machine

So neural networks abstract strongly from the details of real neurons

5
Q

Neural networks abstract strongly from the details of real neurons

In what ways do they differ?

A
  1. Conductivity delays are neglected
  2. An output signal is either discrete or a real-valued number
  3. Net input is calculated as the weighted sum of the input signals
  4. Net input is transformed into an output signal via a simple function (see the sketch after this list)
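
A minimal sketch of such an abstract neuron; the choice of a sigmoid as the "simple function" and the example numbers are illustrative assumptions:

```python
import math

def abstract_neuron(inputs, weights):
    """Net input = weighted sum of the input signals (point 3),
    transformed into an output by a simple function (point 4)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))       # real-valued output (point 2)

print(abstract_neuron([0.5, 1.0], [0.8, -0.3]))
```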
6
Q

Error-correcting learning

A

A form of supervised learning: weights are adjusted on the basis of the error, i.e. the difference between the target output and the actual output

7
Q

Perceptron

A

The original perceptron had only two layers (an input layer and an output layer)

8
Q

Limitations of the perceptron

A
  1. Only binary values
    > remedied by the delta rule (see the sketch after this list)
  2. Only 2 layers
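
A rough sketch of the delta-rule update mentioned above; the learning rate and the example values are illustrative assumptions:

```python
def delta_rule_update(weights, inputs, target, output, lr=0.1):
    """Delta rule: change each weight in proportion to
    (target - output) * input, so real-valued (not only binary)
    activations can be used."""
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

print(delta_rule_update([0.2, -0.4], [1.0, 0.5], target=1.0, output=0.3))
```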
9
Q

Perceptron convergence theorem

A

If a pattern set can be represented by a two-layer perceptron, the perceptron learning rule will always be able to find some correct set of weights

So if the patterns can be represented, the rule will find the weights (a small demonstration follows below).
The theorem says nothing about how fast: it could be a slow process, but correct weights will be found in the end.
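
A minimal demonstration of the convergence theorem on a linearly separable pattern set (the AND function); the learning rate, initial weights, and number of epochs are illustrative assumptions:

```python
# Perceptron learning rule on the linearly separable AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for epoch in range(20):                       # plenty of passes to converge here
    for x, target in data:
        out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = target - out                    # error-correcting update
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err

print(w, b)   # some correct weights for AND, as the theorem promises
```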

10
Q

needed for error backpropagation

A
  1. An algorithm to train perceptrons with more than 2 layers
  2. Preferably one that also uses continuous and nonlinear activation rules
11
Q

Characteristics of backpropagation

A
  1. Any number of layers
  2. only feedforward, no cycles
  3. uses continuous nodes
    > activation between 0 and 1 (see the sketch after this list)
  4. initial weights are random
  5. total error never increases
    > gradient descent in error space
    > so the error goes down a little or stays the same
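
A minimal sketch of a feedforward pass with continuous nodes whose activation lies between 0 and 1 (a logistic/sigmoid function); the layer sizes and the random initial weights are illustrative assumptions:

```python
import math, random

def sigmoid(net):
    """Continuous activation squashed into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def feedforward(inputs, weight_layers):
    """Strictly feedforward: each layer feeds the next, no cycles."""
    activation = inputs
    for weights in weight_layers:                 # one weight matrix per layer
        activation = [sigmoid(sum(w * a for w, a in zip(row, activation)))
                      for row in weights]
    return activation

random.seed(0)                                    # initial weights are random
layers = [[[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],  # 2 -> 3
          [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]]  # 3 -> 1
print(feedforward([0.2, 0.9], layers))
```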
12
Q

backprop trick

A

We have a node h in the hidden layer

We look at the error signal on the output layer, which is calculated for each output node
  Error = the difference between the target and the actual output

We take all those output-layer errors, weight them by the connection weights from h, and add them up. This sum is the error signal for our hidden node (a small numeric illustration follows below)
  It can be positive or negative

Not biologically plausible, because axons only transmit in one direction.
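
A small numeric illustration of the trick; the weights and output-layer error values are made up for the example:

```python
# Node h in the hidden layer connects to two output nodes with these weights
weights_h_to_out = [0.5, -1.0]
# Error signals already computed for each output node (target minus output based)
output_errors = [0.4, 0.1]

# The trick: send the output errors backwards through the same weights and add them up
hidden_error_signal = sum(w * e for w, e in zip(weights_h_to_out, output_errors))
print(hidden_error_signal)   # 0.5*0.4 + (-1.0)*0.1 = 0.1 (can be positive or negative)
```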

13
Q

Backpropagation algorithm in rules

A
  1. weight change = small constant x error x input activation (a worked sketch of these rules follows this list)
  2. for an output node, the error is
    a. error = (target activation – output activation) x output activation x (1 – output activation)
    b. this change is added to the weights to perform gradient descent
  3. for a hidden node, the error is
    a. error = weighted sum of to-node errors x hidden activation x (1 – hidden activation)
  4. weight change with momentum
    a. weight change = small constant x error x input activation + momentum constant x old weight change
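
A minimal sketch of one training step applying these rules to a tiny 2-1-1 network; the input pattern, target, learning rate, momentum constant, and initial weights are all illustrative assumptions:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Tiny network: two inputs -> one hidden node -> one output node
x = [0.0, 1.0]                 # input pattern
target = 1.0
w_ih = [0.3, -0.2]             # input -> hidden weights
w_ho = 0.5                     # hidden -> output weight
lr, momentum = 0.25, 0.9       # small constant and momentum constant
old_dw_ih, old_dw_ho = [0.0, 0.0], 0.0

# Forward pass
h = sigmoid(sum(w * xi for w, xi in zip(w_ih, x)))
out = sigmoid(w_ho * h)

# Rule 2: output-node error
err_out = (target - out) * out * (1 - out)
# Rule 3: hidden-node error = weighted sum of to-node errors x h x (1 - h)
err_h = (w_ho * err_out) * h * (1 - h)

# Rules 1 and 4: weight change = lr x error x input activation + momentum x old change
dw_ho = lr * err_out * h + momentum * old_dw_ho
dw_ih = [lr * err_h * xi + momentum * od for xi, od in zip(x, old_dw_ih)]

w_ho += dw_ho
w_ih = [w + dw for w, dw in zip(w_ih, dw_ih)]
print(w_ih, w_ho)
```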
14
Q

Disadvantages backprop

A
  1. learning is slow
  2. new learning will rapidly overwrite old representations unless the old patterns are interleaved with the new ones
  3. this makes it hard to keep networks up to date with new information
  4. this also makes it very implausible as a psychological model of human memory
15
Q

advantages backprop

A
  1. easy to use
    a. few parameters
    b. algorithm is easy to implement
  2. can be applied to a wide range of data
  3. very popular
  4. paved the way for deep learning
16
Q

Gradient descent in error space (= following the steepest slope down), but…

A

i. it does not guarantee high performance

ii. it does not prevent getting stuck in local minima (unlike the perceptron, which will find a solution if one exists); see the sketch below

iii. the learning rule is complicated and learning tends to slow down
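
A minimal sketch of plain gradient descent getting stuck in a local minimum on a one-dimensional error surface; the error function, starting point, and learning rate are made-up illustrations:

```python
# One-dimensional "error surface" with a shallow local minimum near x ~ 0.93
# and a deeper (global) minimum near x ~ -1.06
def error(x):
    return x**4 - 2 * x**2 + 0.5 * x

def gradient(x):
    return 4 * x**3 - 4 * x + 0.5

x, lr = 1.5, 0.01               # start on the side of the shallow minimum
for _ in range(1000):
    x -= lr * gradient(x)       # always step down the steepest slope

print(round(x, 2), round(error(x), 2))    # ~0.93, ~-0.52: stuck in the local minimum
print(round(error(-1.06), 2))             # ~-1.51: the deeper minimum is never reached
```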