04 neural networks
artificial neurons
input signals = independent variables
fires when activated = activation function
nerve impulse = output value
perceptron
input layer connects directly to the output (no hidden layer), like linear regression
perceptron = calculates a weighted sum of the inputs plus a bias term
produces an output value
activation functions
threshold function
if x >= 0, output 1
if x < 0, output 0
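A minimal NumPy sketch of a perceptron with the threshold activation above (not from the notes); the weights and bias are hand-picked to compute logical AND and are purely illustrative.

```python
import numpy as np

def threshold(x):
    # output 1 if x >= 0, else 0
    return np.where(x >= 0, 1, 0)

def perceptron(inputs, weights, bias):
    # weighted sum of inputs plus bias, passed through the threshold activation
    return threshold(np.dot(inputs, weights) + bias)

# hand-picked weights/bias that implement logical AND
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron(x, w, b))  # [0 0 0 1]
```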
problems with perceptron
cannot separate non-linear patterns, e.g. the XOR problem (a hidden layer fixes this, see the feed-forward sketch below)
feed forward neural network
one or more nodes in the hidden layer(s)
- each node in one layer is linked to every node in the next layer
- more than one activation function can be used
- simulates combining multiple input signals in different ways, like a thinking process
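A sketch of a tiny feed-forward network with one hidden layer that solves XOR; the weights here are hand-picked for illustration rather than learned.

```python
import numpy as np

def threshold(x):
    return np.where(x >= 0, 1, 0)

def xor_network(x1, x2):
    h1 = threshold(x1 + x2 - 0.5)    # hidden node 1: acts like OR
    h2 = threshold(x1 + x2 - 1.5)    # hidden node 2: acts like AND
    return threshold(h1 - h2 - 0.5)  # output: OR and not AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```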
neuron
non-input nodes: hidden nodes and output nodes
can be used with:
- sigmoid
- rectified linear unit (ReLU)
- hyperbolic tangent (tanh)
to account for non-linearity
sigmoid
1 / (1 + e^(-x))
0 to 1
hyperbolic tangent
(e^x - e^(-x)) / (e^x + e^(-x))
-1 to 1
rectified linear unit
max(0, x)
0 to infinity
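The three activation functions above written out with NumPy (a small sketch, not from the notes).

```python
import numpy as np

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real number into the range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # passes positive values through, clips negatives to 0 (range 0 to infinity)
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```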
training artificial neural network
- initialise weights
- set epochs (how many rounds)
- forward propagation
- predict results ŷ
- compute errors (mean squared error or cross-entropy)
- back propagation (update weights) - repeat
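A minimal sketch of that loop for a single sigmoid neuron, assuming NumPy and a toy dataset; all names and numbers here are illustrative, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))               # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy labels

w = np.zeros(2)       # 1. initialise weights
b = 0.0
learning_rate = 0.1
epochs = 1000         # 2. set epochs (how many rounds)

for _ in range(epochs):
    z = X @ w + b                                # 3. forward propagation
    y_hat = 1.0 / (1.0 + np.exp(-z))             # 4. predicted results
    error = y_hat - y                            # 5. error term (cross-entropy loss with sigmoid)
    w -= learning_rate * X.T @ error / len(y)    # 6. back propagation:
    b -= learning_rate * error.mean()            #    update weights, repeat

print(((y_hat >= 0.5) == y).mean())              # training accuracy
```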
back propagation
- update weights to minimise loss/cost/error
- done with gradient descent
- plot the sum of squared errors (y-axis) against a parameter such as the intercept (x-axis); this forms a U-shaped curve
- by measuring the gradient we can find the point where the error is lowest and select that parameter value (gradient = 0 at the lowest point of the curve)
- the size of each step towards the next point is controlled by the learning rate
- if the learning rate is too small, training is slow
- if the learning rate is too high, it may overshoot the optimal point
- a maximum number of steps is usually configured, e.g. 1000
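A tiny sketch of gradient descent on a U-shaped curve, f(x) = (x - 3)^2, just to illustrate the learning-rate trade-off; the values are made up.

```python
def gradient_descent(learning_rate, steps=1000):
    x = 0.0
    for _ in range(steps):              # maximum number of steps, e.g. 1000
        gradient = 2 * (x - 3)          # slope of the curve at x
        x -= learning_rate * gradient   # step downhill
    return x

print(gradient_descent(0.001))  # ~2.6: too small, still creeping towards the minimum at 3
print(gradient_descent(0.1))    # ~3.0: converges to the lowest point
print(gradient_descent(1.05))   # huge magnitude: too large, overshoots and diverges
```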
overfitting
common in deep learning when there are high numbers of hidden nodes and layers
solutions:
- reduce model complexity
- dropout
- regularisation
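A sketch of dropout (the inverted-dropout formulation); keep_prob is an assumed parameter name. Regularisation (e.g. an L2 penalty) instead works by shrinking the weights.

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    if not training:
        return activations                    # no dropout at test time
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob     # rescale so the expected value is unchanged

hidden = np.array([0.5, 1.2, 0.3, 0.9])
print(dropout(hidden))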
tuning dataset
split data into training and testing sets
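A minimal sketch of such a split, assuming NumPy; the 80/20 ratio is just an example.

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle the row indices
    n_test = int(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```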
machine learning vs deep learning
ML
input -> feature engineering -> classification -> output
Deep learning
input -> feature extraction + classification -> output
padding
add 0s to the edges so that the output is the same size as the input
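A small sketch using NumPy's np.pad; a padding width of 1 is assumed here, which keeps the output of a 3x3 convolution the same size as the input.

```python
import numpy as np

image = np.arange(9).reshape(3, 3)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5): a 3x3 kernel sliding over this gives a 3x3 output again
print(padded)
```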