[2] Neural Networks Flashcards
What determines the output of a neuron?
It is the weighted sum of its inputs plus a bias, passed into an activation function
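A minimal sketch of this computation, assuming a sigmoid activation (any activation function could be substituted; the input, weight, and bias values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Weighted sum of the inputs plus a bias, passed through the activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.7])   # weights
b = 0.2                          # bias
print(neuron_output(x, w, b))
```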
Why are activation functions important?
They introduce non-linearity: without them, a stack of layers can only represent a linear function of the inputs
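A quick numpy sketch of why: with no activation function, two stacked linear layers are equivalent to a single linear layer (the matrices here are arbitrary illustrative values):

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [0.5, -1.0]])
W2 = np.array([[0.3, -0.2], [1.5, 0.7]])
x = np.array([2.0, -3.0])

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)
# ...collapse into one linear layer with weights W2 @ W1
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True
```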
What is a perceptron?
A special type of ANN with:
- Real-valued inputs
- Binary output
- Threshold activation function
How are perceptrons trained?
Adjust the weights in proportion to the error: if the true class is higher than the perceptron's output, increase the weights on the active inputs; if lower, decrease them. The bias weight acts as the (negative) threshold.
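A sketch of this update rule on the AND function, which is linearly separable (the learning rate and epoch count are arbitrary choices):

```python
import numpy as np

def predict(w, b, x):
    # Threshold activation: binary output
    return 1 if np.dot(w, x) + b > 0 else 0

# AND function: real-valued inputs, binary output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for x_i, y_i in zip(X, y):
        error = y_i - predict(w, b, x_i)  # +1 if output too low, -1 if too high
        w += lr * error * x_i             # raise/lower weights on active inputs
        b += lr * error                   # bias acts as the (negative) threshold

print([predict(w, b, x_i) for x_i in X])  # [0, 0, 0, 1]
```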
What idea limits the generalisability of perceptrons?
The Perceptron Convergence Theorem states that perceptron training converges if and only if the problem is linearly separable
Hence, they can't learn XOR, which is not linearly separable
What are the general approaches to updating weights?
- Online learning updates weights after every instance
- Offline learning updates weights after every epoch
- Batch learning updates weights after every batch of instances
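A schematic sketch of the three schedules on a toy one-weight least-squares problem (the data, learning rate, and batch size are all illustrative):

```python
import numpy as np

# Toy setup: fit y = w*x by minimising mean squared error
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X
grad = lambda w, x, y: np.mean(2 * (w * x - y) * x)  # d/dw of MSE on a batch

lr, epochs = 0.01, 5

# Online: update after every instance
w = 0.0
for _ in range(epochs):
    for x, y in zip(X, Y):
        w -= lr * grad(w, x, y)

# Offline: one update per epoch, over all instances
w = 0.0
for _ in range(epochs):
    w -= lr * grad(w, X, Y)

# Batch: update after every batch of instances
w = 0.0
for _ in range(epochs):
    for i in range(0, len(X), 2):              # batch size 2
        w -= lr * grad(w, X[i:i+2], Y[i:i+2])
```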
What algorithm is used to train neural networks?
Backpropagation:
[1] Calculate the predicted output using the current weights
[2] Calculate the error
[3] Update each weight in proportion to the gradient of the error with respect to that weight, i.e. how much changing that weight affects the error
Note: gradients are computed backwards, starting at the output layer and working towards the input
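A minimal sketch of these steps for a one-hidden-layer network with sigmoid activations and squared error, trained on XOR (the layer sizes, learning rate, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR data, which a network with a hidden layer *can* learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.uniform(-0.5, 0.5, (2, 4)), np.zeros(4)
W2, b2 = rng.uniform(-0.5, 0.5, (4, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    # [1] Forward pass: predicted output with the current weights
    H = sigmoid(X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    # [2] Error at the output
    err = P - Y
    # [3] Propagate gradients backwards: output layer first, then hidden
    d2 = err * P * (1 - P)            # gradient at output pre-activation
    d1 = (d2 @ W2.T) * H * (1 - H)    # gradient at hidden pre-activation
    W2 -= lr * H.T @ d2; b2 -= lr * d2.sum(axis=0)
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(axis=0)

print(P.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```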
What are some potential issues when using backpropagation?
Improper learning rate leads to divergence or slow convergence
Overfitting from training for too long, using too many weights, or using too few training instances
Local minima
How should variables be represented in an ANN?
Use a binary representation (i.e. one-hot encoding) for nominal variables
For numeric variables, consider scaling or standardisation
What are scaling and standardisation? When should each be used?
Scaling - map the numbers into [0,1]; use when the values are on a similar range
Standardisation - assume a normal distribution and transform the values to N(0,1); use when the values are more varied
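A sketch of both transforms, plus one-hot encoding for a nominal variable (all values are illustrative):

```python
import numpy as np

# Numeric variable on a similar range: scale into [0, 1]
x = np.array([12.0, 15.0, 11.0, 18.0, 14.0])
scaled = (x - x.min()) / (x.max() - x.min())

# More varied numeric variable: standardise to mean 0, std 1, i.e. N(0, 1)
standardised = (x - x.mean()) / x.std()

# Nominal variable: one-hot (binary) encoding
colours = ["red", "green", "blue", "green"]
categories = sorted(set(colours))
one_hot = np.array([[c == cat for cat in categories] for c in colours], dtype=float)

print(scaled, standardised, one_hot, sep="\n")
```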
What can happen if ANN weights aren’t set appropriately?
If they are all set to 0, the network will be symmetric, i.e. all the weights will change together, and so it won't train
If the weights are too large, activations will fall in the saturated part of the sigmoid where the gradient is shallow, and so training will be slow
How should ANN weights be set?
Using the fan-in factor, i.e. drawing uniformly at random between -1/sqrt(d) and 1/sqrt(d), where d is the number of inputs to the neuron
This ensures the variance of the weighted sum is approximately 1/3 (each weight has variance 1/(3d), and there are d of them)
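A sketch of this initialisation, plus a numerical check of the variance claim (assuming unit-variance inputs):

```python
import numpy as np

rng = np.random.default_rng(0)

def fan_in_init(d):
    # Uniform in [-1/sqrt(d), 1/sqrt(d)]; each weight has variance
    # (2/sqrt(d))^2 / 12 = 1/(3d), so a sum over d inputs has variance ~1/3
    limit = 1.0 / np.sqrt(d)
    return rng.uniform(-limit, limit, d)

d = 100
sums = [fan_in_init(d) @ rng.standard_normal(d) for _ in range(10000)]
print(np.var(sums))  # close to 1/3
```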
How can backpropagation be sped up?
With momentum, in which gradients from previous steps are used in addition to the current gradient
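A sketch of the momentum update (the decay factor alpha and learning rate are typical but arbitrary choices):

```python
def momentum_step(w, grad, velocity, lr=0.1, alpha=0.9):
    # Keep a fraction alpha of the previous update and add the new gradient step
    velocity = alpha * velocity - lr * grad
    return w + velocity, velocity

# Toy example: minimise f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, 2 * w, v)
print(w)  # close to 0
```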
How can weight matrices be visualised?
With Hinton diagrams, in which each weight is drawn as a square whose size reflects its magnitude; it is white if positive and black if negative
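A minimal matplotlib sketch following that convention (the grey background and area scaling are common but optional choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def hinton(W):
    fig, ax = plt.subplots()
    ax.set_facecolor("gray")
    max_mag = np.abs(W).max()
    for (i, j), w in np.ndenumerate(W):
        colour = "white" if w > 0 else "black"   # sign -> colour
        size = np.sqrt(abs(w) / max_mag)         # area proportional to magnitude
        ax.add_patch(plt.Rectangle((j - size / 2, i - size / 2),
                                   size, size, color=colour))
    ax.set_xlim(-1, W.shape[1]); ax.set_ylim(-1, W.shape[0])
    ax.invert_yaxis(); ax.set_aspect("equal")
    plt.show()

hinton(np.random.default_rng(0).standard_normal((6, 8)))
```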
What are the key principles of CNNs?
They automatically extract features from the input to produce a feature map
They are not fully connected - convolutions with shared weights are used instead
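A sketch of a single convolution producing a feature map from one shared kernel (the kernel here is an illustrative vertical-edge detector):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide one shared set of weights (the kernel) over the image;
    # every output position reuses the same weights
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    feature_map = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return feature_map

image = np.zeros((6, 6)); image[:, 3:] = 1.0   # image with a vertical edge
kernel = np.array([[-1.0, 1.0]])               # responds at the edge
print(conv2d(image, kernel))
```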