05 Neural Networks Flashcards

1
Q

The Learning Problem

A

Improve over task T with respect to performance measure P, based on experience E.

2
Q

Supervised learning

A

Training data includes the desired outputs (targets); the algorithm learns from these labelled examples so that it can produce the correct output for new, unseen inputs.
3
Q

Unsupervised learning

A

Training data does not include desired outputs; instead, the algorithm tries to identify similarities between the inputs, so that inputs which have something in common are categorised together.

4
Q

Reinforcement learning

A

The algorithm is told when the answer is wrong, but is not told how to correct it. The algorithm must balance exploration of the unknown environment with exploitation of immediate rewards to maximize long-term rewards.

5
Q

Evolutionary learning

A

Biological organisms adapt to improve their survival rates and chance of having offspring in their environment, using the idea of fitness (how good the current solution is).

6
Q

The Machine Learning Process

A
  1. Data Collection and Preparation
  2. Feature Selection and Extraction
  3. Algorithm Choice
  4. Parameters and Model Selection
  5. Training
  6. Evaluation
7
Q

We are born with about _____ neurons. A neuron may connect to as many as _____ other neurons

A

We are born with about 100 billion neurons. A neuron may connect to as many as 10,000 other neurons

8
Q

Hebb’s Rule

A
  • Strength of a synaptic connection is proportional to the correlation of two connected neurons.
  • If two neurons consistently fire simultaneously, the synaptic connection is strengthened (if they fire at different times, the strength is reduced).
  • “Cells that fire together, wire together.”
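
A minimal sketch of a Hebbian update under these assumptions: pre-synaptic activities x, post-synaptic activity y, and a learning rate eta; the names and values are illustrative, not from the lecture.

    import numpy as np

    def hebbian_update(w, x, y, eta=0.1):
        # Hebb's rule: the weight change is proportional to the correlation
        # of pre-synaptic activity x and post-synaptic activity y.
        return w + eta * np.outer(y, x)

    w = np.zeros((1, 2))
    x = np.array([1.0, 0.0])   # only the first input neuron fires
    y = np.array([1.0])        # the output neuron fires too
    print(hebbian_update(w, x, y))  # [[0.1, 0.0]]: the co-firing pair is strengthened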
9
Q

How realistic is McCulloch and Pitts Neurons Model?

A

Not very.

– Real neurons are much more complicated.

– Inputs to a real neuron are not necessarily summed linearly.

– Real neurons do not produce a single output response, but a SPIKE TRAIN (a sequence of pulses).

– Weights w_i can be positive or negative, whereas in biology connections are either excitatory OR inhibitory.
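
A sketch of a McCulloch-Pitts style unit, to make the simplifications concrete (linear summation, hard threshold, single binary output); the numbers are illustrative assumptions.

    import numpy as np

    def mcp_neuron(x, w, theta):
        # Linear summation of weighted inputs, then a hard threshold:
        # fire (output 1) only if the sum exceeds theta.
        h = np.dot(w, x)
        return 1 if h > theta else 0

    x = np.array([1, 1])
    w = np.array([0.5, 0.5])
    print(mcp_neuron(x, w, theta=0.75))  # 1: summed input 1.0 > 0.75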

10
Q

Neural Networks: Updating the weights

A

Aim: minimize the error at the output

11
Q

The learning rate η

A

η controls the size of the weight changes.

• Why not η = 1?
– Weights change a lot whenever the answer is wrong.
– Makes the network unstable.

• Small η
– Weights need to see the inputs more often before they change significantly.
– The network takes longer to learn.
– But, a more stable network.
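
A small sketch of how η scales a perceptron-style weight change Δw = η(t − y)x; the values are illustrative assumptions, not from the lecture.

    import numpy as np

    def weight_update(w, x, y, t, eta):
        # Weights only move when the output y differs from the target t,
        # and eta controls how far they move.
        return w + eta * (t - y) * x

    w = np.array([0.2, -0.4])
    x = np.array([1.0, 1.0])
    y, t = 0, 1  # wrong answer: output 0, target 1

    print(weight_update(w, x, y, t, eta=1.0))  # [1.2, 0.6]: big jump, unstable
    print(weight_update(w, x, y, t, eta=0.1))  # [0.3, -0.3]: small, stable step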

12
Q

Bias Input

A

• What happens when all the inputs to a neuron are zero?
– It doesn’t matter what the weights are,
– The only way that we can control whether the neuron fires or not is through the threshold.

• That’s why the threshold should be adjustable.
– Changing the threshold directly requires an extra parameter that we need to write code for.

• Instead, we add to each neuron an extra input with a fixed value (e.g. −1); the weight on this bias input then acts as an adjustable threshold, learned like any other weight.
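
A minimal sketch of the bias trick, assuming the conventional fixed extra input of −1; the data values are illustrative.

    import numpy as np

    X = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [1.0, 1.0]])  # three 2-D inputs, one per row

    # Append a constant -1 input to every example; its weight acts as
    # an adjustable threshold, so no separate threshold code is needed.
    X_bias = np.concatenate([X, -np.ones((X.shape[0], 1))], axis=1)
    print(X_bias)  # each row now ends in -1.0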

13
Q

A single layer perceptron can only learn _____ problems.

A

A single layer perceptron can only learn linearly separable problems.

Boolean AND function is linearly separable, whereas Boolean XOR function (and the parity problem in general) is not.
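
A quick demonstration under the usual perceptron training rule (with a −1 bias input, as above): accuracy on AND typically reaches 1.0 within a few epochs, while on XOR it can never exceed 0.75. The code is an illustrative sketch, not the lecture's.

    import numpy as np

    def perceptron_accuracy(X, t, eta=0.25, epochs=50, seed=0):
        # Append the -1 bias input, train with the perceptron rule,
        # and report the fraction of correctly classified examples.
        Xb = np.concatenate([X, -np.ones((len(X), 1))], axis=1)
        w = np.random.default_rng(seed).uniform(-1, 1, Xb.shape[1])
        for _ in range(epochs):
            for x, target in zip(Xb, t):
                y = 1 if np.dot(w, x) > 0 else 0
                w += eta * (target - y) * x
        return np.mean((Xb @ w > 0).astype(int) == t)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    print(perceptron_accuracy(X, np.array([0, 0, 0, 1])))  # AND: 1.0
    print(perceptron_accuracy(X, np.array([0, 1, 1, 0])))  # XOR: at most 0.75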

14
Q

In contrast to perceptrons, multilayer networks can learn not only multiple _______, but the boundaries may be _____.

A

In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may be nonlinear.

15
Q

Linear Models can only identify flat decision boundaries like ___

A

Linear Models can only identify flat decision boundaries like straight lines, planes, hyperplanes, …

16
Q

MLP

A

Multi-Layer Perceptron

17
Q

The multilayer network structure, or architecture, or topology, consists of ____

A

The multilayer network structure, or architecture, or topology, consists of an input layer, one or more hidden layers, and one output layer.

18
Q

A network with ____ layers of _____ is a three-layer network

A

A network with two layers of hidden units is a three-layer network

19
Q

Properties of the Multi-Layer Network

A
  • Layer n-1 is fully connected to layer n.
  • No connections within a single layer.
  • No direct connections between input and output layers.
  • Fully connected; all nodes in one layer connect to all nodes in the next layer.
  • Number of output units need not equal number of input units.
  • Number of hidden units per layer can be more or less than input or output units.
20
Q

What Do Each of The Layers Do?

A

  • First hidden layer: each unit computes a linear decision boundary (a hyperplane) in the input space.
  • Second hidden layer: combines those hyperplanes into convex regions.
  • Output layer: combines the convex regions into arbitrary, possibly non-convex, decision regions.
21
Q

How to learn Multi Layer Perceptrons?

A

Backpropagation

22
Q

Backpropagation

A
  1. Calculate the output errors
  2. Update last layer of weights
  3. Propagate error backward, update hidden weights
  4. Repeat step 3 until the first layer is reached
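
A compact sketch of one backpropagation step for a network with a single hidden layer and sigmoid activations throughout; the shapes, names, and squared-error loss are assumptions consistent with the surrounding cards (bias inputs omitted for brevity).

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, V, W, eta=0.1):
        # Forward pass.
        h = sigmoid(V @ x)   # hidden activations
        y = sigmoid(W @ h)   # network outputs

        # 1-2. Output error and last layer of weights;
        # y * (1 - y) is the sigmoid derivative.
        delta_o = (y - t) * y * (1 - y)
        W_new = W - eta * np.outer(delta_o, h)

        # 3-4. Propagate the error backward through W, update hidden weights.
        delta_h = (W.T @ delta_o) * h * (1 - h)
        V_new = V - eta * np.outer(delta_h, x)
        return V_new, W_new

    rng = np.random.default_rng(0)
    V = rng.uniform(-1, 1, (3, 2))  # input -> hidden weights
    W = rng.uniform(-1, 1, (1, 3))  # hidden -> output weights
    V, W = backprop_step(np.array([0.5, -0.2]), np.array([1.0]), V, W)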
23
Q

The backpropagation training algorithm uses the ____ technique to minimize the_____ between the desired and actual outputs.

A

The backpropagation training algorithm uses the gradient descent technique to minimize the mean square difference between the desired and actual outputs.

24
Q

MLP is trained by initially selecting ____ weights and then presenting all training data incrementally.

A

MLP is trained by initially selecting small random weights and then presenting all training data incrementally.

25
Q

Gradient Descent in MLP (figure and equation)

A

Weights are moved downhill on the error surface, in the direction of the negative gradient:

E = 1/2 Σ_k (t_k - y_k)²,   Δw = -η ∂E/∂w
26
Q

Update rules for MLP

A

With sigmoid activations (a standard formulation; notation may differ from the lecture):

Output layer: δ_k = (y_k - t_k) y_k (1 - y_k),   w_jk ← w_jk - η δ_k h_j
Hidden layer: δ_j = h_j (1 - h_j) Σ_k w_jk δ_k,   v_ij ← v_ij - η δ_j x_i
27
Q

MLP: What do we want in an activation function?

A
  • Differentiable
  • Nonlinear (more powerful)
  • Bounded range (for numerical stability)
28
Q

Sigmoidal function

A

Sigmoidal (logistic) function

g(a_i) = 1 / (1 + exp(-k a_i))
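
A short sketch including the derivative g′(a) = k·g(a)(1 − g(a)), which is what makes the sigmoid convenient for backpropagation; k = 1 is assumed as a default.

    import numpy as np

    def sigmoid(a, k=1.0):
        # Logistic function g(a) = 1 / (1 + exp(-k a)).
        return 1.0 / (1.0 + np.exp(-k * a))

    def sigmoid_deriv(a, k=1.0):
        # k * g * (1 - g): computable from the output alone.
        g = sigmoid(a, k)
        return k * g * (1 - g)

    print(sigmoid(0.0))        # 0.5: midrange output at a = 0
    print(sigmoid_deriv(0.0))  # 0.25: the slope is steepest at a = 0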

29
Q

MLP: Learning capacity for different number of layers

(with sigmoid activation function)

A

  • No hidden layer: only linear decision boundaries (hyperplanes).
  • One hidden layer: convex decision regions; with enough sigmoidal hidden units, any continuous function can be approximated.
  • Two hidden layers: arbitrary decision regions, which need not be convex.
30
Q

MLP: Selecting initial weight values

A
  • The MLP algorithm suggests that weights are initialized to small random numbers (< ±1), both positive and negative
  • The choice of initial weight values is important, as it decides the starting position in weight space - that is, how far away from the global minimum we start
  • The aim is to select weight values which produce midrange function signals (not saturated ones; see the sigmoid function)
  • Select weight values randomly from a uniform probability distribution
  • Normalise weight values so that the number of weighted connections per unit still produces a midrange function signal
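
One common way to realise the last two bullets is to scale a uniform draw by the fan-in, e.g. weights in ±1/√n for n inputs per unit; this 1/√n rule is a standard heuristic and an assumption here, not necessarily the lecture's exact prescription.

    import numpy as np

    def init_weights(n_in, n_out, rng=np.random.default_rng()):
        # Uniform in [-1/sqrt(n_in), +1/sqrt(n_in)]: with n_in inputs per
        # unit, the weighted sum stays at roughly unit scale, keeping the
        # sigmoid in its midrange rather than saturated.
        bound = 1.0 / np.sqrt(n_in)
        return rng.uniform(-bound, bound, size=(n_out, n_in))

    W = init_weights(n_in=100, n_out=10)
    print(W.min(), W.max())  # all within +/- 0.1 for 100 inputs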
31
Q

MLP: When should the weights be updated?

A

After all inputs are seen (batch)
• More accurate estimate of the gradient
• Converges to a local minimum faster (Jim doesn’t agree!)

After each input is seen (sequential)
• Simpler to program and most commonly used
• May escape from local minima (change the order of presentation)

• Both ways need many epochs - passes through the whole dataset
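
A sketch contrasting the two schedules; grad(w, x, t) is a hypothetical per-example gradient function, and the data are illustrative.

    import numpy as np

    def batch_epoch(w, X, T, grad, eta):
        # Batch: sum the gradient over ALL examples, then take one step
        # (a more accurate gradient estimate per step).
        g = sum(grad(w, x, t) for x, t in zip(X, T))
        return w - eta * g

    def sequential_epoch(w, X, T, grad, eta, rng=np.random.default_rng()):
        # Sequential: update after EACH example; shuffling the order of
        # presentation can help escape local minima.
        for i in rng.permutation(len(X)):
            w = w - eta * grad(w, X[i], T[i])
        return w

    # Hypothetical squared-error gradient for a single linear unit.
    grad = lambda w, x, t: (w @ x - t) * x
    X, T = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([1.0, 0.0])
    print(sequential_epoch(np.zeros(2), X, T, grad, eta=0.1))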

32
Q

MLP: Input Normalization

A
  • Stops the weights from getting unnecessarily large.
  • Treat each data dimension independently.
  • Each input variable should be processed so that the mean value is close to zero or at least very small when compared to the standard deviation.
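
A minimal sketch of per-dimension normalisation (the data values are illustrative):

    import numpy as np

    def normalise(X):
        # Treat each dimension (column) independently: subtract its mean
        # and divide by its standard deviation, so every input variable
        # ends up with mean ~0 relative to its spread.
        return (X - X.mean(axis=0)) / X.std(axis=0)

    X = np.array([[150.0, 0.2], [160.0, 0.4], [170.0, 0.9]])
    print(normalise(X).mean(axis=0))  # ~[0, 0]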
33
Q

MLP: Rule of thumb for amount of training data needed

A

10 times more data than the number of weights
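
For example, a fully connected 10-5-1 network (ignoring bias weights) has 10×5 + 5×1 = 55 weights, so this rule of thumb calls for roughly 550 training examples.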

34
Q

Overfitting

A

Overfitting occurs when a model begins to learn the bias of the training data rather than learning to generalize.

Overfitting generally occurs when a model is excessively complex in relation to the amount of data available.

A model which overfits the training data will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

When we fit the model, it cannot tell which regularities are relevant and which are caused by sampling error.
– So it fits both kinds of regularity.
– If the model is very flexible it can model the sampling error really well. This is not what we want.

35
Q

Solution to overfitting

A

(k-fold) Cross-validation

36
Q

Validation data

A

Data held back from training and used during learning to compare models and to decide when to stop training (catching overfitting before it harms generalisation); it is kept separate from the test set, which is used only for the final evaluation.
38
Q

k-fold cross validation

A
  • Divide all data into k sets
  • For i = 1…k:
    • Validate on data[i], test on data[i + 1], and train on the rest
  • Average the results
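
A sketch of the loop; train_and_score is a hypothetical callback that trains on the given training folds and returns a validation score.

    import numpy as np

    def k_fold_cv(X, T, k, train_and_score, rng=np.random.default_rng()):
        # Divide all data into k sets, validate on each in turn while
        # training on the rest, then average the k scores.
        folds = np.array_split(rng.permutation(len(X)), k)
        scores = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            scores.append(train_and_score(X[train], T[train], X[val], T[val]))
        return np.mean(scores)

    # Usage with a dummy scorer (illustrative only).
    X, T = np.arange(10).reshape(10, 1), np.arange(10)
    print(k_fold_cv(X, T, k=5, train_and_score=lambda *args: 1.0))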