Week 9 - Neural Networks Flashcards
What are the two types of learning a neural network can do?
Supervised and unsupervised
What is supervised learning?
The output of the neural network is compared against the correct (desired) output, and the network adjusts itself based on the error between them. Overall, the data is labelled
What is unsupervised learning?
The data is not labelled. The network organises itself according to patterns in the data and no external “desired output” is provided
What does a perceptron consist of?
It consists of a set of weighted connections, the neuron (incorporating the activation function) and the output axon
How does a perceptron learn?
Initialise weights & threshold
Present the input and desired output
Calculate the actual output of the network:
For each input:
* Multiply the input data (xi) by its weight (wi).
* Sum the weighted inputs and pass through the activation function
Adapt the weights:
* If the output is correct: wi(t+1) = wi(t)
* If the output is 0 but should be 1: wi(t+1) = wi(t) + xi(t)
* If the output is 1 but should be 0: wi(t+1) = wi(t) - xi(t)
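A minimal Python sketch of this learning rule (the variable names and the step activation with an explicit threshold are assumptions for illustration, not part of the card):

```python
def perceptron_train(samples, targets, weights, threshold, epochs=10):
    # Perceptron learning rule: weights change only when the output is wrong
    for _ in range(epochs):
        for x, desired in zip(samples, targets):
            # Multiply each input xi by its weight wi, sum, and apply a step activation
            output = 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0
            if output == 0 and desired == 1:      # output 0, should be 1: add the inputs
                weights = [w + xi for w, xi in zip(weights, x)]
            elif output == 1 and desired == 0:    # output 1, should be 0: subtract the inputs
                weights = [w - xi for w, xi in zip(weights, x)]
            # if the output is correct, the weights are left unchanged
    return weights
```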
What can be added to the weight update to control the speed of learning?
A learning rate η - a decimal term between 0.0 and 1.0 that slows learning
What does the Widrow-Hoff Learning Rule give us?
Weight updates proportionate to the error made, giving Δ = desired output – actual output
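Combining the learning rate η with the Widrow-Hoff error term, the update can be written in the deck's own notation as wi(t+1) = wi(t) + η · Δ · xi(t), where Δ = desired output − actual output (this combined form is inferred from the two cards above rather than stated verbatim).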
Name a limitation of the perceptron.
Only linearly separable problems can be solved
What does linearly separable mean?
A straight line (or, in higher dimensions, a plane) can be drawn that separates the two classes; for example, AND is linearly separable but XOR is not
What are the three layers of an MLP (Multi-Layer Perceptron)?
Input, hidden, output
What are weights in terms of neural networks?
Variable strength connections between units that propagate signals from one unit to the next. They are the main component changed during learning
Describe the feedforward learning algorithm.
Initialise weights and thresholds to small random values.
Present input and desired output.
Calculate actual output by:
- Multiplying incoming signal by weight
- Pass this through sigmoid activation function
- Pass on this output to units in the next layer
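A minimal Python sketch of this forward pass for a single hidden layer (the structure, names and use of bias terms are assumptions for illustration):

```python
import math

def sigmoid(a):
    # Sigmoid activation: squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-a))

def layer_forward(inputs, weights, biases):
    # Each unit multiplies the incoming signals by its weights, sums them,
    # and passes the result through the sigmoid activation function
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

def mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    # Outputs of the hidden layer are passed on as inputs to the output layer
    hidden_out = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden_out, output_w, output_b)
```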
Describe the backpropagation learning algorithm.
Adapt the weights.
Start from the output layer and work backwards:
- New weight: w(t+1) = w(t) + learning rate η × error δpj (for pattern p on node j) × output signal opj (for pattern p on node j)
Compute the error δpj as follows:
- For output units: δpj = sigmoid derivative × (target output − actual output)
- For hidden units: δpj = sigmoid derivative × (weighted sum of the errors δpk of the k units in the layer above)
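A Python sketch of one such update for a network with one hidden layer and a single output unit (the names and learning rate value are assumptions; the sigmoid derivative is written as out * (1 - out)):

```python
def backprop_update(inputs, hidden_out, output_out, target,
                    hidden_w, output_w, eta=0.5):
    # Error for the output unit: sigmoid derivative * (target output - actual output)
    delta_out = output_out * (1 - output_out) * (target - output_out)

    # Error for each hidden unit: sigmoid derivative * weighted error from the layer above
    delta_hidden = [h * (1 - h) * (delta_out * w)
                    for h, w in zip(hidden_out, output_w)]

    # New weight = old weight + learning rate * error on the receiving node * signal on the connection
    new_output_w = [w + eta * delta_out * h for w, h in zip(output_w, hidden_out)]
    new_hidden_w = [[w + eta * d * x for w, x in zip(unit_w, inputs)]
                    for unit_w, d in zip(hidden_w, delta_hidden)]
    return new_hidden_w, new_output_w
```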
What are the two types of weight updating?
Batch and online
Describe batch weight updating.
All patterns are presented, errors are calculated, then the weights are updated
Describe online weight updating.
The weights are updated after the presentation of each pattern
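A runnable sketch contrasting the two schemes, using a single linear unit trained with the delta rule (a simplification for brevity; an MLP would use the backpropagation deltas instead):

```python
def delta_grad(weights, x, target):
    # Widrow-Hoff style error gradient for one pattern on a single linear unit
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output
    return [error * xi for xi in x]

def online_training(patterns, weights, eta=0.1, epochs=10):
    # Online: the weights are updated after the presentation of each pattern
    for _ in range(epochs):
        for x, target in patterns:
            grad = delta_grad(weights, x, target)
            weights = [w + eta * g for w, g in zip(weights, grad)]
    return weights

def batch_training(patterns, weights, eta=0.1, epochs=10):
    # Batch: all patterns are presented and their errors accumulated, then one update is made
    for _ in range(epochs):
        total = [0.0] * len(weights)
        for x, target in patterns:
            grad = delta_grad(weights, x, target)
            total = [t + g for t, g in zip(total, grad)]
        weights = [w + eta * t for w, t in zip(weights, total)]
    return weights
```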
What does momentum do in terms of weight updating?
Encourages the network to make large changes to weights when the recent weight changes have been large. This helps the network avoid local minima in the early stages of training, as the large steps can carry it over small hills in the error surface
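A common formulation (the momentum coefficient α and its typical value are assumptions, not stated on the card): the weight change at time t adds a fraction of the previous change, Δw(t) = η · δ · o + α · Δw(t−1) with α typically around 0.9, so consecutive changes in the same direction build up into larger steps.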
What are some differences between symbolic AI and connectionism?
Symbolic:
explicit representation (symbols)
expert system (IF-THEN) rules
enabling reasoning
knowledge programmed (by humans)
serial (fragile)
does not generalise (outside scope)
understandable/explainable
Connectionism:
implicit representation (numbers)
neural network (weighted graph)
enabling perception
knowledge learned (from data)
distributed (graceful degradation)
generalise (outside scope)
black-box
Name some properties of neural networks.
Able to learn to relate input variables to required output
Able to generalise between samples
Shows graceful degradation
Define classification.
The output of the function to learn is a class (discrete)
Define regression.
The output of the function to learn is a value (continuous)
What is graceful degradation?
In symbolic systems, the removal of one component of the system usually results in failure
Removal of neuron(s) from a neural network will reduce performance but will probably not result in overall failure, which reflects our understanding of fault tolerance in the brain (graceful degradation)
What is classification designed to do?
Group samples according to some known property
What are the two datasets required for a classification problem?
Training (consists of a set of measurements (inputs) and a class (output), used for learning) and testing (“unseen” examples to test generalisation)
What is an issue in representing continuous data in a neural network?
Can require normalisation depending on the activation function
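For instance, with a sigmoid activation the inputs are often rescaled into a small fixed range; a min-max rescaling sketch (the choice of range and the function name are assumptions):

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    # Rescale a list of continuous values linearly into [new_min, new_max]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]
```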
What is an issue in representing discrete data in a neural network?
Each value needs to have a separate representation in the network to avoid implied order bias (noise that can confuse a neural network)
What are the two representations of discrete data in a neural network?
Field-type and thermometer-type
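A sketch of the two encodings for a discrete value with n possible levels (the exact layout is an assumption; field-type corresponds to what is often called one-hot encoding):

```python
def field_encoding(value, n_levels):
    # Field-type: one separate input per possible value, only the matching one set to 1
    return [1 if i == value else 0 for i in range(n_levels)]

def thermometer_encoding(value, n_levels):
    # Thermometer-type: all inputs up to and including the value are set to 1
    return [1 if i <= value else 0 for i in range(n_levels)]

# Example for value 2 out of 5 levels:
# field_encoding(2, 5)       -> [0, 0, 1, 0, 0]
# thermometer_encoding(2, 5) -> [1, 1, 1, 0, 0]
```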
When does overfitting occur and how can it be overcome?
When the network is trained on a task for too long, it learns the noise in the training data as well as the common patterns, so performance on unseen examples is poor
It can be overcome by stopping the network learning earlier - early stopping
What is a common method for early stopping?
Cross-validation - have three data sets: training, testing and cross-validation; training stops when the error on the cross-validation set stops improving (starts to rise)
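A sketch of that stopping loop (train_one_epoch and validation_error are hypothetical callables standing in for the real training pass and cross-validation error measurement; a patience parameter is often added in practice):

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000):
    # train_one_epoch() performs one pass over the training set and returns the model;
    # validation_error(model) measures the error on the held-out cross-validation set
    best_model, best_error = None, float("inf")
    for epoch in range(max_epochs):
        model = train_one_epoch()
        error = validation_error(model)
        if error < best_error:
            best_model, best_error = model, error
        else:
            # Cross-validation error has stopped improving: halt training to avoid overfitting
            break
    return best_model
```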