ANN Flashcards
What is the difference between supervised ANNs and unsupervised ones?
Supervised = trained with labelled inputs; weights are adjusted to fit the correct outputs (inductive learning)
Unsupervised = no labelled targets; learning is done iteratively to satisfy some learning rule
What are the types of ANN topologies?
Feed forward: all weights point forwards (no cycles)
Recurrent: weights can also point backwards, providing immediate feedback
Single / multiple hidden layers
Partially / Fully connected: describes connectivity between nodes across layers
What are the different activation functions? Explain them.
Linear: output = c*input
Threshold: if the weighted sum > 0, output 1; else -1
Sigmoid: a continuous (smooth) version of the threshold with bounded output: between 0 and 1 for the logistic sigmoid, between -1 and 1 for the tanh variant
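A minimal Python sketch of these activations (function names are my own; the logistic sigmoid is bounded between 0 and 1, tanh between -1 and 1):

```python
import math

def linear(x, c=1.0):
    # Linear activation: the output is just a scaled copy of the input.
    return c * x

def threshold(weighted_sum):
    # Threshold activation: 1 if the weighted sum is positive, else -1.
    return 1 if weighted_sum > 0 else -1

def logistic_sigmoid(x):
    # Smooth, continuous version of the threshold, bounded between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Sigmoid-shaped variant bounded between -1 and 1.
    return math.tanh(x)

print(linear(2.0, c=0.5), threshold(-0.3), logistic_sigmoid(0.0), tanh(0.0))
# 1.0 -1 0.5 0.0
```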
When would you use ANNs?
When:
- Input data is high-dimensional and/or continuous
- Data is noisy
- Long training times are acceptable
- Enough labelled training data is available
- The form of the target function is unknown
- Explaining the result is not important
What are the 3 learning rules? Describe them.
- Hebb’s rule: if 2 connected neurons are simultaneously active, then weight(new) = weight(old) + x1*x2
- Perceptron rule: weight(new) = weight(old) + σ(t - o)x
  where t = target output (0 or 1), o = actual output (0 or 1), x = neuron input, σ = learning rate
  *** assumes a threshold activation
- Delta rule: weight(new) = weight(old) + σ(t - o)x
  *** assumes a linear / continuous activation
  *** t and o (outputs) can be any values
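A hedged sketch of the three single-step updates, using the card's symbols (σ = learning rate, t = target, o = actual output); the function names are my own:

```python
def hebb_update(w_old, x1, x2):
    # Hebb's rule: strengthen the weight when both connected neurons fire together.
    return w_old + x1 * x2

def perceptron_update(w_old, x, t, o, sigma=0.1):
    # Perceptron rule: t and o are 0/1 outputs of a threshold activation.
    return w_old + sigma * (t - o) * x

def delta_update(w_old, x, t, o, sigma=0.1):
    # Delta rule: same form as above, but t and o may be any real values
    # (linear / continuous activation).
    return w_old + sigma * (t - o) * x
```

Note that the perceptron and delta updates share the same formula; the difference lies in how o is produced and what values t and o may take.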
What are perceptrons?
Linear classifiers: a hyperplane (a subspace of one dimension lower than the input space) that divides the space, classifying data points
If weighted sum + bias > the boundary, perceptron outputs a 1
Else, 0.
*** outputs can only be 1 or 0/-1
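A minimal sketch of the decision rule (variable names are illustrative):

```python
def perceptron_output(weights, bias, inputs):
    # Fire (output 1) if the weighted sum plus bias crosses the boundary at 0.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0

print(perceptron_output([0.5, -0.5], 0.1, [1.0, 1.0]))  # 1
```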
What is the point of the bias term?
Speeds up learning by shifting the hyperplane away from the origin; a bias term is not always needed
What are some of the properties of perceptrons?
Can classify inputs as 0 or 1 => can simulate the linearly separable logic gates (AND, OR, NOT); XOR needs more than one perceptron
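For example, an AND gate can be simulated with hand-picked weights [1, 1] and bias -1.5 (values are my own illustration):

```python
def perceptron_output(weights, bias, inputs):
    # Threshold perceptron: output 1 if weighted sum + bias > 0, else 0.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, '->', perceptron_output([1.0, 1.0], -1.5, [a, b]))
# Only (1, 1) produces a 1, i.e. logical AND.
```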
How does perceptron learning work?
Using the perceptron learning rule:
For each of the N training examples:
if o ≠ t:
Update the weights according to the learning rule so that the error is reduced
Stop when the error is acceptable or all N examples have been processed
Note: the perceptron rule adapts the weights only
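A sketch of that loop in Python, assuming a threshold activation and treating the bias as a weight updated alongside the others; the dataset and learning rate are illustrative:

```python
def train_perceptron(data, n_inputs, sigma=0.1, epochs=100):
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            # Threshold activation: output 1 if weighted sum + bias > 0, else 0.
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if o != t:
                # Perceptron rule: nudge weights (and bias) towards the target.
                errors += 1
                w = [wi + sigma * (t - o) * xi for wi, xi in zip(w, x)]
                b = b + sigma * (t - o)
        if errors == 0:  # stop once every example is classified correctly
            break
    return w, b

# Learn the (linearly separable) AND function.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(and_data, n_inputs=2))
```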
What is the fundamental basis of perceptron learning?
Error correction learning => adjust weights until o = t
What is the limitation of single perceptrons? What is the solution?
Can only classify linearly separable spaces
Solution: multi-layer perceptrons (MLPs) = interconnected perceptrons
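A sketch of why layering helps: XOR is not linearly separable for a single perceptron, but XOR(a, b) = AND(OR(a, b), NAND(a, b)) can be built from three perceptrons with hand-picked (illustrative) weights:

```python
def step(weights, bias, inputs):
    # Single threshold perceptron.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor(a, b):
    h1 = step([1.0, 1.0], -0.5, [a, b])       # OR unit in the hidden layer
    h2 = step([-1.0, -1.0], 1.5, [a, b])      # NAND unit in the hidden layer
    return step([1.0, 1.0], -1.5, [h1, h2])   # AND of the two hidden outputs

print([xor(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 1, 1, 0]
```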
When should you use the delta learning rule?
When the range of output values we want to produce is continuous, which requires continuous activation functions
*** t and o do not have to be 1 or 0
How does delta learning work?
In the same way as perceptron learning, except that o comes from a continuous activation rather than a threshold
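A sketch of delta-rule training with a linear activation on a continuous target (data, learning rate, and epoch count are illustrative):

```python
def train_delta(data, n_inputs, sigma=0.05, epochs=200):
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            # Linear activation: o can be any real value.
            o = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Delta rule: same update form as the perceptron rule.
            w = [wi + sigma * (t - o) * xi for wi, xi in zip(w, x)]
            b = b + sigma * (t - o)
    return w, b

# Illustrative data drawn from t = 2x + 1; the learned weight and bias approach 2 and 1.
data = [([x], 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
print(train_delta(data, n_inputs=1))
```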
DLR vs PLR - differences and similarities
Same: both learn through error correction
Different:
PLR can only work with threshold activation functions (outputs 0 or 1)
DLR can work with any differentiable activation function
What is a universal function approximator?
An ANN with (1) at least one hidden layer, (2) enough nodes, and (3) a continuous non-linear activation function can approximate any continuous function
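A structural sketch of such a network: one hidden layer of sigmoid units feeding a linear output. The weights here are random and purely illustrative; the claim is that for a suitable number of hidden nodes and suitable weights, such a network can get arbitrarily close to any continuous function on a bounded domain:

```python
import math
import random

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    # One hidden layer of sigmoid units...
    hidden = [1.0 / (1.0 + math.exp(-(w * x + b)))
              for w, b in zip(hidden_w, hidden_b)]
    # ...combined by a single linear output unit.
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b

random.seed(0)
n_hidden = 8
hw = [random.uniform(-2, 2) for _ in range(n_hidden)]
hb = [random.uniform(-2, 2) for _ in range(n_hidden)]
ow = [random.uniform(-2, 2) for _ in range(n_hidden)]
print(mlp_forward(0.5, hw, hb, ow, 0.0))
```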