05 Neural Networks Flashcards

1
Q

The Learning Problem

A

Improve over task T with respect to performance measure P, based on experience E.

2
Q

Supervised learning

A

Training data includes the desired outputs (targets); the algorithm learns from these labelled examples so that it can produce the correct output for new, unseen inputs.
3
Q

Unsupervised learning

A

Training data does not include desired outputs; instead, the algorithm tries to identify similarities between the inputs, so that inputs which have something in common are categorised together.

4
Q

Reinforcement learning

A

The algorithm is told when the answer is wrong, but is not told how to correct it. The algorithm must balance exploration of the unknown environment with exploitation of immediate rewards to maximize long-term rewards.

5
Q

Evolutionary learning

A

Biological organisms adapt to improve their survival rates and chance of having offspring in their environment, using the idea of fitness (how good the current solution is).

6
Q

The Machine Learning Process

A
  1. Data Collection and Preparation
  2. Feature Selection and Extraction
  3. Algorithm Choice
  4. Parameters and Model Selection
  5. Training
  6. Evaluation
7
Q

We are born with about _____ neurons. A neuron may connect to as many as _____ other neurons

A

We are born with about 100 billion neurons. A neuron may connect to as many as 10,000 other neurons

8
Q

Hebb’s Rule

A
  • Strength of a synaptic connection is proportional to the correlation of two connected neurons.
  • If two neurons consistently fire simultaneously, the synaptic connection is strengthened (if they fire at different times, the strength is reduced).
  • “Cells that fire together, wire together.”
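
A minimal sketch of a Hebbian update under these assumptions: pre-synaptic activities x, post-synaptic activity y, and a learning rate eta; the names and values are illustrative, not from the lecture.

    import numpy as np

    def hebbian_update(w, x, y, eta=0.1):
        # Hebb's rule: the weight change is proportional to the correlation
        # of pre-synaptic activity x and post-synaptic activity y.
        return w + eta * np.outer(y, x)

    w = np.zeros((1, 2))
    x = np.array([1.0, 0.0])   # only the first input neuron fires
    y = np.array([1.0])        # the output neuron fires too
    print(hebbian_update(w, x, y))  # [[0.1, 0.0]]: the co-firing pair is strengthened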
9
Q

How realistic is McCulloch and Pitts Neurons Model?

A

Not very.

– Real neurons are much more complicated.

– Inputs to a real neuron are not necessarily summed linearly.

– Real neurons do not produce a single output response, but a SPIKE TRAIN (a sequence of pulses).

– Weights w_i can be positive or negative, whereas in biology connections are either excitatory OR inhibitory.
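
A sketch of a McCulloch-Pitts style unit, to make the simplifications concrete (linear summation, hard threshold, single binary output); the numbers are illustrative assumptions.

    import numpy as np

    def mcp_neuron(x, w, theta):
        # Linear summation of weighted inputs, then a hard threshold:
        # fire (output 1) only if the sum exceeds theta.
        h = np.dot(w, x)
        return 1 if h > theta else 0

    x = np.array([1, 1])
    w = np.array([0.5, 0.5])
    print(mcp_neuron(x, w, theta=0.75))  # 1: summed input 1.0 > 0.75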

10
Q

Neural Networks: Updating the weights

A

Aim: minimize the error at the output

11
Q

The learning rate η

A

η controls the size of the weight changes.

• Why not η = 1?
– Weights change a lot whenever the answer is wrong.
– Makes the network unstable.

• Small η
– Weights need to see the inputs more often before they change significantly.
– The network takes longer to learn.
– But, a more stable network.
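
A small sketch of how η scales a perceptron-style weight change Δw = η(t − y)x; the values are illustrative assumptions, not from the lecture.

    import numpy as np

    def weight_update(w, x, y, t, eta):
        # Weights only move when the output y differs from the target t,
        # and eta controls how far they move.
        return w + eta * (t - y) * x

    w = np.array([0.2, -0.4])
    x = np.array([1.0, 1.0])
    y, t = 0, 1  # wrong answer: output 0, target 1

    print(weight_update(w, x, y, t, eta=1.0))  # [1.2, 0.6]: big jump, unstable
    print(weight_update(w, x, y, t, eta=0.1))  # [0.3, -0.3]: small, stable step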

12
Q

Bias Input

A

• What happens when all the inputs to a neuron are zero?
– It doesn’t matter what the weights are,
– The only way that we can control whether the neuron fires or not is through the threshold.

• That’s why the threshold should be adjustable.
– Changing the threshold directly requires an extra parameter that we need to write code for.

• Instead, we add to each neuron an extra input with a fixed value (e.g. −1); the weight on this bias input then acts as an adjustable threshold, learned like any other weight.
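
A minimal sketch of the bias trick, assuming the conventional fixed extra input of −1; the data values are illustrative.

    import numpy as np

    X = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [1.0, 1.0]])  # three 2-D inputs, one per row

    # Append a constant -1 input to every example; its weight acts as
    # an adjustable threshold, so no separate threshold code is needed.
    X_bias = np.concatenate([X, -np.ones((X.shape[0], 1))], axis=1)
    print(X_bias)  # each row now ends in -1.0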

13
Q

A single layer perceptron can only learn _____ problems.

A

A single layer perceptron can only learn linearly separable problems.

Boolean AND function is linearly separable, whereas Boolean XOR function (and the parity problem in general) is not.
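
A quick demonstration under the usual perceptron training rule (with a −1 bias input, as above): accuracy on AND typically reaches 1.0 within a few epochs, while on XOR it can never exceed 0.75. The code is an illustrative sketch, not the lecture's.

    import numpy as np

    def perceptron_accuracy(X, t, eta=0.25, epochs=50, seed=0):
        # Append the -1 bias input, train with the perceptron rule,
        # and report the fraction of correctly classified examples.
        Xb = np.concatenate([X, -np.ones((len(X), 1))], axis=1)
        w = np.random.default_rng(seed).uniform(-1, 1, Xb.shape[1])
        for _ in range(epochs):
            for x, target in zip(Xb, t):
                y = 1 if np.dot(w, x) > 0 else 0
                w += eta * (target - y) * x
        return np.mean((Xb @ w > 0).astype(int) == t)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    print(perceptron_accuracy(X, np.array([0, 0, 0, 1])))  # AND: 1.0
    print(perceptron_accuracy(X, np.array([0, 1, 1, 0])))  # XOR: at most 0.75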

14
Q

In contrast to perceptrons, multilayer networks can learn not only multiple _______, but the boundaries may be _____.

A

In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may be nonlinear.

15
Q

Linear Models can only identify flat decision boundaries like ___

A

Linear Models can only identify flat decision boundaries like straight lines, planes, hyperplanes, …

16
Q

MLP

A

Multi-Layer Perceptron

17
Q

The multilayer network structure, or architecture, or topology, consists of ____

A

The multilayer network structure, or architecture, or topology, consists of an input layer, one or more hidden layers, and one output layer.

18
Q

A network with ____ layers of _____ is a three-layer network

A

A network with two layers of hidden units is a three-layer network

19
Q

Properties of the Multi-Layer Network

A
  • Layer n-1 is fully connected to layer n.
  • No connections within a single layer.
  • No direct connections between input and output layers.
  • Fully connected; all nodes in one layer connect to all nodes in the next layer.
  • Number of output units need not equal number of input units.
  • Number of hidden units per layer can be more or less than input or output units.
20
Q

What Do Each of The Layers Do?

A

  • First hidden layer: each unit computes a linear decision boundary (a hyperplane) in the input space.
  • Second hidden layer: combines those hyperplanes into convex regions.
  • Output layer: combines the convex regions into arbitrary, possibly non-convex, decision regions.
21
Q

How to learn Multi Layer Perceptrons?

A

Backpropagation

22
Q

Backpropagation

A
  1. Calculate the output errors
  2. Update last layer of weights
  3. Propagate error backward, update hidden weights
  4. Repeat step 3 until the first layer is reached
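
A compact sketch of one backpropagation step for a network with a single hidden layer and sigmoid activations throughout; the shapes, names, and squared-error loss are assumptions consistent with the surrounding cards (bias inputs omitted for brevity).

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, V, W, eta=0.1):
        # Forward pass.
        h = sigmoid(V @ x)   # hidden activations
        y = sigmoid(W @ h)   # network outputs

        # 1-2. Output error and last layer of weights;
        # y * (1 - y) is the sigmoid derivative.
        delta_o = (y - t) * y * (1 - y)
        W_new = W - eta * np.outer(delta_o, h)

        # 3-4. Propagate the error backward through W, update hidden weights.
        delta_h = (W.T @ delta_o) * h * (1 - h)
        V_new = V - eta * np.outer(delta_h, x)
        return V_new, W_new

    rng = np.random.default_rng(0)
    V = rng.uniform(-1, 1, (3, 2))  # input -> hidden weights
    W = rng.uniform(-1, 1, (1, 3))  # hidden -> output weights
    V, W = backprop_step(np.array([0.5, -0.2]), np.array([1.0]), V, W)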
23
Q

The backpropagation training algorithm uses the ____ technique to minimize the_____ between the desired and actual outputs.

A

The backpropagation training algorithm uses the gradient descent technique to minimize the mean square difference between the desired and actual outputs.

24
Q

MLP is trained by initially selecting ____ weights and then presenting all training data incrementally.

A

MLP is trained by initially selecting small random weights and then presenting all training data incrementally.

25
Q

Gradient Descent in MLP (figure and equation)

A

Weights are moved downhill on the error surface, in the direction of the negative gradient:

E = 1/2 Σ_k (t_k - y_k)²,   Δw = -η ∂E/∂w
26
Q

Update rules for MLP

A

With sigmoid activations (a standard formulation; notation may differ from the lecture):

Output layer: δ_k = (y_k - t_k) y_k (1 - y_k),   w_jk ← w_jk - η δ_k h_j
Hidden layer: δ_j = h_j (1 - h_j) Σ_k w_jk δ_k,   v_ij ← v_ij - η δ_j x_i
27
Q

MLP: What do we want in an activation function?

A
  • Differentiable
  • Nonlinear (more powerful)
  • Bounded range (for numerical stability)
28
Q

Sigmoidal function

A

Sigmoidal (logistic) function

g(a_i) = 1 / (1 + exp(-k a_i))
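
A short sketch including the derivative g′(a) = k·g(a)(1 − g(a)), which is what makes the sigmoid convenient for backpropagation; k = 1 is assumed as a default.

    import numpy as np

    def sigmoid(a, k=1.0):
        # Logistic function g(a) = 1 / (1 + exp(-k a)).
        return 1.0 / (1.0 + np.exp(-k * a))

    def sigmoid_deriv(a, k=1.0):
        # k * g * (1 - g): computable from the output alone.
        g = sigmoid(a, k)
        return k * g * (1 - g)

    print(sigmoid(0.0))        # 0.5: midrange output at a = 0
    print(sigmoid_deriv(0.0))  # 0.25: the slope is steepest at a = 0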

29
Q

MLP: Learning capacity for different number of layers

(with sigmoid activation function)

A

  • No hidden layer: only linear decision boundaries (hyperplanes).
  • One hidden layer: convex decision regions; with enough sigmoidal hidden units, any continuous function can be approximated.
  • Two hidden layers: arbitrary decision regions, which need not be convex.
30
Q

MLP: Selecting initial weight values

A
  • The MLP algorithm suggests that weights are initialized to small random numbers (< ±1), both positive and negative
  • The choice of initial weight values is important, as it decides the starting position in weight space - that is, how far away from the global minimum we start
  • The aim is to select weight values which produce midrange function signals (not saturated ones; see the sigmoid function)
  • Select weight values randomly from a uniform probability distribution
  • Normalise weight values so that the number of weighted connections per unit still produces a midrange function signal
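
One common way to realise the last two bullets is to scale a uniform draw by the fan-in, e.g. weights in ±1/√n for n inputs per unit; this 1/√n rule is a standard heuristic and an assumption here, not necessarily the lecture's exact prescription.

    import numpy as np

    def init_weights(n_in, n_out, rng=np.random.default_rng()):
        # Uniform in [-1/sqrt(n_in), +1/sqrt(n_in)]: with n_in inputs per
        # unit, the weighted sum stays at roughly unit scale, keeping the
        # sigmoid in its midrange rather than saturated.
        bound = 1.0 / np.sqrt(n_in)
        return rng.uniform(-bound, bound, size=(n_out, n_in))

    W = init_weights(n_in=100, n_out=10)
    print(W.min(), W.max())  # all within +/- 0.1 for 100 inputs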
31
Q

MLP: When should the weights be updated?

A

After all inputs are seen (batch)
• More accurate estimate of the gradient
• Converges to a local minimum faster (Jim doesn’t agree!)

After each input is seen (sequential)
• Simpler to program and most commonly used
• May escape from local minima (change the order of presentation)

• Both ways need many epochs - passes through the whole dataset
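
A sketch contrasting the two schedules; grad(w, x, t) is a hypothetical per-example gradient function, and the data are illustrative.

    import numpy as np

    def batch_epoch(w, X, T, grad, eta):
        # Batch: sum the gradient over ALL examples, then take one step
        # (a more accurate gradient estimate per step).
        g = sum(grad(w, x, t) for x, t in zip(X, T))
        return w - eta * g

    def sequential_epoch(w, X, T, grad, eta, rng=np.random.default_rng()):
        # Sequential: update after EACH example; shuffling the order of
        # presentation can help escape local minima.
        for i in rng.permutation(len(X)):
            w = w - eta * grad(w, X[i], T[i])
        return w

    # Hypothetical squared-error gradient for a single linear unit.
    grad = lambda w, x, t: (w @ x - t) * x
    X, T = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([1.0, 0.0])
    print(sequential_epoch(np.zeros(2), X, T, grad, eta=0.1))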

32
Q

MLP: Input Normalization

A
  • Stops the weights from getting unnecessarily large.
  • Treat each data dimension independently.
  • Each input variable should be processed so that the mean value is close to zero or at least very small when compared to the standard deviation.
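
A minimal sketch of per-dimension normalisation (the data values are illustrative):

    import numpy as np

    def normalise(X):
        # Treat each dimension (column) independently: subtract its mean
        # and divide by its standard deviation, so every input variable
        # ends up with mean ~0 relative to its spread.
        return (X - X.mean(axis=0)) / X.std(axis=0)

    X = np.array([[150.0, 0.2], [160.0, 0.4], [170.0, 0.9]])
    print(normalise(X).mean(axis=0))  # ~[0, 0]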
33
Q

MLP: Rule of thumb for amount of training data needed

A

10 times more data than the number of weights
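
For example, a fully connected 10-5-1 network (ignoring bias weights) has 10×5 + 5×1 = 55 weights, so this rule of thumb calls for roughly 550 training examples.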

34
Q

Overfitting

A

Overfitting occurs when a model begins to learn the bias of the training data rather than learning to generalize.

Overfitting generally occurs when a model is excessively complex in relation to the amount of data available.

A model which overfits the training data will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

When we fit the model, it cannot tell which regularities are relevant and which are caused by sampling error.
– So it fits both kinds of regularity.
– If the model is very flexible it can model the sampling error really well. This is not what we want.

35
Q

Solution to overfitting

A

(k-fold) Cross-validation

36
Q

Validation data

A

Data held back from training and used during learning to compare models and to decide when to stop training (catching overfitting before it harms generalisation); it is kept separate from the test set, which is used only for the final evaluation.
38
Q

k-fold cross validation

A
  • Divide all data into k sets
  • For i = 1…k:
    • Validate on data[i], test on data[i + 1], and train on the rest
  • Average the results
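
A sketch of the loop; train_and_score is a hypothetical callback that trains on the given training folds and returns a validation score.

    import numpy as np

    def k_fold_cv(X, T, k, train_and_score, rng=np.random.default_rng()):
        # Divide all data into k sets, validate on each in turn while
        # training on the rest, then average the k scores.
        folds = np.array_split(rng.permutation(len(X)), k)
        scores = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            scores.append(train_and_score(X[train], T[train], X[val], T[val]))
        return np.mean(scores)

    # Usage with a dummy scorer (illustrative only).
    X, T = np.arange(10).reshape(10, 1), np.arange(10)
    print(k_fold_cv(X, T, k=5, train_and_score=lambda *args: 1.0))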