ANN Flashcards
What is the difference between supervised ANNs and unsupervised ones?
Supervised = trained on labelled inputs; the weights are adjusted to fit the correct outputs (inductive learning)
Unsupervised = no labelled outputs; the weights are adjusted iteratively to satisfy some learning rule
What are the types of ANN topologies?
Feed forward: all connections (weights) point forwards
Recurrent: connections can point backwards, providing feedback
Single / multi-layered: the number of (hidden) layers between input and output
Partially / Fully connected: describes connectivity between nodes across layers
What are the different activation functions? Explain them.
Linear: output = c*input
Threshold: If weighted sum > 0, output = 1; else -1;
Sigmoid: a smooth, continuous version of the threshold function; output is bounded (0 to 1 for the logistic sigmoid, -1 to 1 for tanh)
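A minimal Python/NumPy sketch of these activation functions (the choice of c, and treating the -1-to-1 case as tanh, are assumptions for illustration):

```python
import numpy as np

def linear(x, c=1.0):
    # output = c * input
    return c * x

def threshold(x):
    # +1 if the weighted sum is positive, else -1
    return 1.0 if x > 0 else -1.0

def sigmoid(x):
    # logistic sigmoid: smooth version of the threshold, bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def tanh_act(x):
    # tanh variant, bounded between -1 and 1
    return np.tanh(x)
```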
When would you use ANNs?
When:
- Input data is high-dimensional and/or continuous
- Data is noisy
- Long training times are acceptable
- There is enough labelled training data
- The target function is unknown
- Explaining the result is not important
What are the 3 learning rules? Describe them.
1. Hebb’s rule: if two connected neurons are simultaneously active, strengthen the weight between them: weight(new) = weight(old) + x1·x2
2. Perceptron rule: weight(new) = weight(old) + η(t − o)x
   t = target output (0/1), o = actual output (0/1), x = neuron input, η = learning rate
   *** threshold activation only
3. Delta rule: weight(new) = weight(old) + η(t − o)x
   *** linear/continuous activation
   *** outputs can be anything
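A hedged sketch of the three rules written as plain weight-update functions (variable names and the learning-rate value are illustrative; Hebb’s rule is shown without a learning rate, as in the formula above):

```python
def hebb_update(w, x1, x2):
    # Hebb's rule: strengthen the weight when both connected neurons are active
    return w + x1 * x2

def perceptron_update(w, x, t, o, eta=0.1):
    # Perceptron rule: t and o are the 0/1 outputs of a threshold unit
    return w + eta * (t - o) * x

def delta_update(w, x, t, o, eta=0.1):
    # Delta rule: same form, but t and o may be any real values
    # (linear / continuous activation)
    return w + eta * (t - o) * x
```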
What are perceptrons?
Linear classifiers - a hyperplane (a subspace one dimension lower than the input space) that divides the space, classifying data points
If weighted sum + bias > the threshold, the perceptron outputs a 1
Else, 0.
*** outputs can only be 1 or 0/-1
What is the point of the bias term?
Allows the hyperplane to be shifted away from the origin, which speeds up learning - a bias is not always needed
What are some of the properties of perceptrons?
Can classify inputs as 0 or 1 => can simulate any linearly separable logic gate (AND, OR, NOT); XOR needs more than one layer
How does perceptron learning work?
Using the perceptron learning rule:
For each of the N training examples:
if o ≠ t:
Update the weights according to the learning rule so that the error is reduced
Stop when the error is acceptable or all N examples have been processed
Note: the perceptron rule adapts only the weights
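A minimal sketch of this loop in Python/NumPy, learning the OR function (the data, learning rate and epoch limit are illustrative assumptions):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # training inputs
T = np.array([0, 1, 1, 1])                        # target outputs (OR)

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for epoch in range(20):
    errors = 0
    for x, t in zip(X, T):
        o = 1 if np.dot(w, x) + b > 0 else 0      # threshold activation
        if o != t:
            w = w + eta * (t - o) * x             # perceptron learning rule
            b = b + eta * (t - o)                 # bias updated like a weight on a constant-1 input
            errors += 1
    if errors == 0:                               # stop when the error is acceptable
        break
```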
What is the fundamental basis of perceptron learning?
Error correction learning => adjust weights until o = t
What is the limitation of single perceptrons? What is the solution?
Can only classify linearly separable spaces
Solution: multi-layer perceptrons (MLPs) - layers of interconnected perceptrons
When do you use the delta learning rule?
When the range of output values we want to produce is continuous - i.e. with continuous activation functions
*** t and o do not have to be 1 or 0
How does delta learning work?
In the same way as perceptron learning, except that o is the continuous (unthresholded) output, so the weight change is proportional to the size of the error
DLR vs PLR - differences and similarities
Same: both learn through error correction
Different:
PLR can only work with threshold activation functions (outputs 0/1)
DLR can work with any differentiable activation function
What is a universal function approximator?
An ANN with (1) at least one hidden layer, (2) enough nodes and (3) a continuous, non-linear activation function can approximate any continuous function
What is the purpose of the delta learning rule?
To train units with continuous activation functions by gradient descent - the basis of backpropagation, which lets multi-layer networks classify non-linearly separable problems
How do you decide the type of perceptron organisation to use as well as the activation function?
Is the problem linearly separable?
- No: use an MLP
- Yes: is the output binary?
  - No: use a single perceptron + continuous activation function
  - Yes: use a single perceptron + threshold activation function
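The same decision flow written out as a small illustrative function (purely a sketch; the names are assumptions):

```python
def choose_architecture(linearly_separable: bool, binary_output: bool) -> str:
    # mirrors the decision tree above
    if not linearly_separable:
        return "multi-layer perceptron (MLP)"
    if binary_output:
        return "single perceptron + threshold activation"
    return "single perceptron + continuous activation"
```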
What is the weight/hypothesis space?
The mapping of weights to error - gradient descent aims to find minima in this error surface
How do we calculate the error for learning by error minimisation (the ∆ rule)?
E(w) = ½ Σ(t − o)² ==> the sum of squared errors, halved so the derivative comes out cleanly (a modified MSE)
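Differentiating this error for a single linear unit (o = w·x) gives the delta rule update; a short sketch of the derivation (the subscript d ranges over training examples):

```latex
E(\mathbf{w}) = \tfrac{1}{2}\sum_d (t_d - o_d)^2, \qquad o_d = \mathbf{w}\cdot\mathbf{x}_d

\frac{\partial E}{\partial w_i}
  = \sum_d (t_d - o_d)\,\frac{\partial}{\partial w_i}\bigl(t_d - \mathbf{w}\cdot\mathbf{x}_d\bigr)
  = -\sum_d (t_d - o_d)\,x_{i,d}

\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\sum_d (t_d - o_d)\,x_{i,d}
```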
What is gradient descent?
Iteratively change weights to reduce MSE
Can implement DLR at a neuron or network level.
Always moves us in the direction of the steepest decrease in error
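A minimal gradient-descent sketch for a single linear neuron, minimising the error above (the data, learning rate and step count are illustrative assumptions):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])  # inputs
T = np.array([0.5, 1.0, 1.5, 2.5])                               # targets

w = np.zeros(2)   # weights of a single linear neuron (o = w . x)
eta = 0.05        # learning rate

for step in range(100):
    O = X @ w                 # forward pass: linear outputs
    grad = -(T - O) @ X       # dE/dw = -sum_d (t_d - o_d) x_d
    w = w - eta * grad        # move in the direction of steepest error decrease
```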
What is backpropagation?
Process of feeding error back through the network to determine by how much we need to adjust weights.
What do we need for back-propagation to work?
1. A differentiable activation function
2. A non-linear activation function
What is the back propagation algorithm?
1. Forward pass
2. Calculate δi for the output neurons: δi = Oi(1 − Oi)(Ti − Oi)   (sigmoid activation)
3. Calculate the change in weights into the output nodes: ∆Whi = η · δi · xhi, where xhi is the output of the previous (hidden) node h
4. Calculate δh for the hidden neurons: δh = Oh(1 − Oh) Σi Whi · δi
5. Calculate the change in weights into the hidden nodes: ∆Wjh = η · δh · xjh, where xjh is the output of the previous layer (the network input for the first hidden layer)
6. Update the weights
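A compact backpropagation sketch for one hidden layer of sigmoid units, following the δ formulas above (the XOR data, layer sizes, learning rate and the constant-1 bias trick are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    # append a constant-1 column so the bias is learned like an ordinary weight
    return np.hstack([A, np.ones((A.shape[0], 1))])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

rng = np.random.default_rng(0)
W_ih = rng.normal(scale=0.5, size=(3, 3))   # (2 inputs + bias) -> 3 hidden units
W_ho = rng.normal(scale=0.5, size=(4, 1))   # (3 hidden + bias) -> 1 output unit
eta = 0.5

for epoch in range(10000):
    # forward pass
    Xb = add_bias(X)
    H = sigmoid(Xb @ W_ih)                   # hidden outputs O_h
    Hb = add_bias(H)
    O = sigmoid(Hb @ W_ho)                   # network outputs O_i

    # delta for output neurons: d_i = O_i (1 - O_i) (T_i - O_i)
    d_out = O * (1 - O) * (T - O)

    # delta for hidden neurons: d_h = O_h (1 - O_h) * sum_i W_hi d_i
    d_hid = H * (1 - H) * (d_out @ W_ho[:-1].T)   # drop the bias row of W_ho

    # weight changes: dW = eta * delta * (output of the previous layer)
    W_ho += eta * Hb.T @ d_out
    W_ih += eta * Xb.T @ d_hid
```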