Week 3. Shallow Neural Networks Flashcards
NN vs. logistic regression
A shallow NN is essentially two logistic regressions applied one after another.
Parts of NN
Input layer - the feature values x, also written a[0] ("activations").
Hidden layer (its values are not seen in the training set, hence "hidden") - a[1]; with 4 nodes it is a 4-dimensional vector (a1[1] ... a4[1]).
Output layer - a[2]
Layers 1 and 2 each have their own parameters W and b.
Computing one NN node
One logistic regression for the first node:
z1[1] = w1[1]^T x + b1[1]
a1[1] = sigmoid(z1[1])
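A minimal numpy sketch of this single-node computation (the 3-feature input and the parameter values are just illustrative):

import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # input features a[0], shape (3, 1)
w1 = np.random.randn(3, 1) * 0.01     # weights of hidden node 1, shape (3, 1)
b1 = 0.0                              # bias of hidden node 1 (scalar)

z1 = w1.T @ x + b1                    # z1[1] = w1[1]^T x + b1[1], shape (1, 1)
a1 = 1 / (1 + np.exp(-z1))            # a1[1] = sigmoid(z1[1])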
Vectorised version for the whole NN
For a two-layer NN with 3 inputs and 4 hidden nodes:
Z[1] - (4,1) matrix (4 nodes in 1st layer)
W[1] - (4,3) matrix - 4 nodes and 3 input activations
X - (3,1) vector - 3 activations (also a[0])
b[1] - (4,1) matrix (4 nodes in 1st layer)
a[1] - (4,1) matrix (4 nodes in 1st layer) = sigmoid(Z[1])
Z[1] = W[1] * a[0] + b[1]
a[1] = sigmoid(Z[1])
To compute the 2nd layer - same thing but with different dimensions:
Z[2] - (1,1) matrix (1 node in 2nd layer)
W[2] - (1,4) matrix - 1 node and 4 input activations
a[1] - (4,1) vector - 4 activations from layer 1 (the input to this layer)
b[2] - (1,1) matrix (1 node in 2nd layer)
a[2] - (1,1) matrix (1 node in 2nd layer) = sigmoid(Z[2])
Z[2] = W[2] * a[1] + b[2]
a[2] = sigmoid(Z[2])
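A minimal sketch of this two-layer forward pass for a single example, with the shapes from the card above (random parameter values, purely illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)             # a[0], shape (3, 1)
W1 = np.random.randn(4, 3) * 0.01     # layer 1 weights, shape (4, 3)
b1 = np.zeros((4, 1))                 # layer 1 biases, shape (4, 1)
W2 = np.random.randn(1, 4) * 0.01     # layer 2 weights, shape (1, 4)
b2 = np.zeros((1, 1))                 # layer 2 bias, shape (1, 1)

Z1 = W1 @ x + b1                      # shape (4, 1)
A1 = sigmoid(Z1)                      # a[1], shape (4, 1)
Z2 = W2 @ A1 + b2                     # shape (1, 1)
A2 = sigmoid(Z2)                      # a[2], the prediction, shape (1, 1)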
Vectorizing across multiple examples x(i)
We need a[2](i) for every example i = 1..m. Stack the feature vectors x(1) ... x(m) as columns of a matrix X of shape (3, m); the same equations then process all examples at once, as sketched below.
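A sketch of the same forward pass vectorized over m examples: each x(i) becomes a column of X, and numpy broadcasting adds b to every column (self-contained, illustrative values):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                                 # number of training examples
X = np.random.randn(3, m)             # each column is one example x(i)
W1 = np.random.randn(4, 3) * 0.01
b1 = np.zeros((4, 1))
W2 = np.random.randn(1, 4) * 0.01
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1                      # shape (4, m), one column per example
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # shape (1, m)
A2 = sigmoid(Z2)                      # column i is the prediction for example i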
Derivatives of activation functions
Sigmoid a = g(z) = sigmoid(z) = 1 / (1 + e^(-z))
Derivative g'(z) = d/dz g(z) = a(1 - a)
Tanh g(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Derivative g'(z) = d/dz g(z) = 1 - (tanh(z))^2
ReLU g(z) = max(0,z)
Derivative g'(z) = 0 if z < 0, 1 if z > 0 (undefined at z = 0; in practice either value is used)
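A numpy sketch of these activations and their derivatives (the ReLU derivative at z = 0 is set to 0 here by convention):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1 - a)                # g'(z) = a(1 - a)

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2        # g'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)      # 0 for z <= 0, 1 for z > 0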
Random initialization
Parameters W need to be initialized randomly (if all weights start at zero, every hidden node computes the same function and they stay symmetric).
W[1] = np.random.randn(2, 2) * 0.001
Param b can be initialized to zero: b[1] = np.zeros((2, 1))
W should be small; otherwise z becomes large, the sigmoid (or tanh) saturates where its gradient is near zero, and learning becomes very slow (relevant when the output uses sigmoid for binary classification).
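A sketch of random initialization for the 3-input / 4-hidden / 1-output network above (the 0.01 scale is one common small constant; the exact value is a tuning choice):

import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # small random weights break symmetry between hidden nodes;
    # biases can safely start at zero
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(3, 4, 1)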