Week 3. Shallow Neural Networks Flashcards
NN vs. logistic regression
A shallow NN is essentially two logistic regressions applied one after another.
Parts of NN
Input layer - the feature values x, also written a[0] ("activations").
Hidden layer (its values are not seen in the training set, hence "hidden") - a[1]; with 4 nodes it is a 4-dimensional vector (a1[1] ... a4[1]).
Output layer - a[2]
Layers 1 and 2 each have their own parameters W and b.
Computing one NN node
One logistic regression for the first node:
z1[1] = w1[1]^T x + b1[1]
a1[1] = sigmoid(z1[1])
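A minimal numpy sketch of this single-node computation (the 3-feature input and the parameter values are just illustrative):

import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # input features a[0], shape (3, 1)
w1 = np.random.randn(3, 1) * 0.01     # weights of hidden node 1, shape (3, 1)
b1 = 0.0                              # bias of hidden node 1 (scalar)

z1 = w1.T @ x + b1                    # z1[1] = w1[1]^T x + b1[1], shape (1, 1)
a1 = 1 / (1 + np.exp(-z1))            # a1[1] = sigmoid(z1[1])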
Vectorised version for the whole NN
For a two-layer NN with 3 inputs and 4 hidden nodes:
Z[1] - (4,1) matrix (4 nodes in 1st layer)
W[1] - (4,3) matrix - 4 nodes and 3 input activations
X - (3,1) vector - 3 activations (also a[0])
b[1] - (4,1) matrix (4 nodes in 1st layer)
a[1] - (4,1) matrix (4 nodes in 1st layer) = sigmoid(Z[1])
Z[1] = W[1] * a[0] + b[1]
a[1] = sigmoid(Z[1])
To compute the 2nd layer - same thing but with different dimensions:
Z[2] - (1,1) matrix (1 node in 2nd layer)
W[2] - (1,4) matrix - 1 node and 4 input activations
a[1] - (4,1) vector - 4 activations from layer 1 (the input to this layer)
b[2] - (1,1) matrix (1 node in 2nd layer)
a[2] - (1,1) matrix (1 node in 2nd layer) = sigmoid(Z[2])
Z[2] = W[2] * a[1] + b[2]
a[2] = sigmoid(Z[2])
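A minimal sketch of this two-layer forward pass for a single example, with the shapes from the card above (random parameter values, purely illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)             # a[0], shape (3, 1)
W1 = np.random.randn(4, 3) * 0.01     # layer 1 weights, shape (4, 3)
b1 = np.zeros((4, 1))                 # layer 1 biases, shape (4, 1)
W2 = np.random.randn(1, 4) * 0.01     # layer 2 weights, shape (1, 4)
b2 = np.zeros((1, 1))                 # layer 2 bias, shape (1, 1)

Z1 = W1 @ x + b1                      # shape (4, 1)
A1 = sigmoid(Z1)                      # a[1], shape (4, 1)
Z2 = W2 @ A1 + b2                     # shape (1, 1)
A2 = sigmoid(Z2)                      # a[2], the prediction, shape (1, 1)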
Vectorizing across multiple examples x(i)
We need a[2](i) for every example i = 1..m. Stack the feature vectors x(1) ... x(m) as columns of a matrix X of shape (3, m); the same equations then process all examples at once, as sketched below.
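A sketch of the same forward pass vectorized over m examples: each x(i) becomes a column of X, and numpy broadcasting adds b to every column (self-contained, illustrative values):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                                 # number of training examples
X = np.random.randn(3, m)             # each column is one example x(i)
W1 = np.random.randn(4, 3) * 0.01
b1 = np.zeros((4, 1))
W2 = np.random.randn(1, 4) * 0.01
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1                      # shape (4, m), one column per example
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # shape (1, m)
A2 = sigmoid(Z2)                      # column i is the prediction for example i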
Derivatives of activation functions
Sigmoid a = g(z) = sigmoid(z) = 1 / (1 + e^(-z))
Derivative g'(z) = d/dz g(z) = a(1 - a)
Tanh g(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Derivative g'(z) = d/dz g(z) = 1 - (tanh(z))^2
ReLU g(z) = max(0,z)
Derivative g'(z) = 0 if z < 0, 1 if z > 0 (undefined at z = 0; in practice either value is used)
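A numpy sketch of these activations and their derivatives (the ReLU derivative at z = 0 is set to 0 here by convention):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1 - a)                # g'(z) = a(1 - a)

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2        # g'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z > 0).astype(float)      # 0 for z <= 0, 1 for z > 0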
Random initialization
Parameters W need to be initialized randomly (if all weights start at zero, every hidden node computes the same function and they stay symmetric).
W[1] = np.random.randn(2, 2) * 0.001
Param b can be initialized to zero: b[1] = np.zeros((2, 1))
W should be small; otherwise z becomes large, the sigmoid (or tanh) saturates where its gradient is near zero, and learning becomes very slow (relevant when the output uses sigmoid for binary classification).
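A sketch of random initialization for the 3-input / 4-hidden / 1-output network above (the 0.01 scale is one common small constant; the exact value is a tuning choice):

import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # small random weights break symmetry between hidden nodes;
    # biases can safely start at zero
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(3, 4, 1)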