Networks Flashcards
What are two alternatives to the ReLU activation function?
ELU (exponential linear unit) and leaky ReLU.
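A minimal NumPy sketch of both activations; the alpha values shown are common illustrative defaults, not mandated ones.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small negative slope instead of zero for x < 0
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth exponential curve for x < 0, saturating at -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))
```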
What is the difference between SGD with momentum and SGD with Nesterov momentum?
Nesterov momentum adds the velocity to the parameters before computing the gradient (a look-ahead step), whereas classical momentum evaluates the gradient at the current parameters.
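A sketch of the two update rules; the grad function below is a hypothetical stand-in (gradient of f(w) = w²/2), and lr and mu are the learning rate and momentum coefficient.

```python
import numpy as np

def grad(w):                       # hypothetical gradient of f(w) = w^2 / 2
    return w

w, v, lr, mu = np.array([1.0]), np.array([0.0]), 0.1, 0.9

# classical momentum: gradient evaluated at the current parameters
v = mu * v - lr * grad(w)
w = w + v

# Nesterov momentum: gradient evaluated at the look-ahead point w + mu * v
v = mu * v - lr * grad(w + mu * v)
w = w + v
```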
What is the idea behind Adagrad?
Adagrad adapts the learning rate per dimension: each gradient dimension is divided by the square root of its accumulated sum of squared gradients, so dimensions with large squared gradients get smaller effective learning rates.
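A minimal sketch of the Adagrad update, again assuming a hypothetical grad function and illustrative hyperparameters.

```python
import numpy as np

def grad(w):                       # hypothetical gradient of f(w) = w^2 / 2
    return w

w, cache, lr, eps = np.array([1.0]), np.array([0.0]), 0.1, 1e-8
for _ in range(10):
    g = grad(w)
    cache += g ** 2                        # accumulate squared gradients
    w -= lr * g / (np.sqrt(cache) + eps)   # large accumulated squares -> smaller steps
```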
What is the idea behind RMSProp?
Like Adagrad, but each dimension's learning rate is divided by the square root of an exponentially decaying running average of squared gradients, so the effective learning rate does not shrink monotonically.
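The same sketch with the accumulator replaced by a running average; the decay factor 0.9 is an illustrative choice.

```python
import numpy as np

def grad(w):                       # hypothetical gradient of f(w) = w^2 / 2
    return w

w, cache, lr, decay, eps = np.array([1.0]), np.array([0.0]), 0.01, 0.9, 1e-8
for _ in range(10):
    g = grad(w)
    cache = decay * cache + (1 - decay) * g ** 2   # running mean of squared gradients
    w -= lr * g / (np.sqrt(cache) + eps)
```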
What is the idea behind the Adam optimizer?
It combines SGD with momentum (a running mean of gradients) and RMSProp (a running mean of squared gradients).
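A compact sketch of the Adam update with the standard bias-correction terms; the hyperparameter values are the commonly cited defaults and the grad function is again a hypothetical stand-in.

```python
import numpy as np

def grad(w):                       # hypothetical gradient of f(w) = w^2 / 2
    return w

w = np.array([1.0])
m, v = np.zeros_like(w), np.zeros_like(w)
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
for t in range(1, 11):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g            # momentum-like first moment
    v = beta2 * v + (1 - beta2) * g ** 2       # RMSProp-like second moment
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```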
Name some regularizers
L1, L2, early stopping, dropout, max-norm constraints, data augmentation.
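Two of these in a minimal NumPy sketch; the 0.01 weight-decay factor and the 0.8 keep probability are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # weights
h = rng.normal(size=(5, 4))                  # activations

l2_penalty = 0.01 * np.sum(W ** 2)           # L2: added to the training loss

keep_prob = 0.8                              # inverted dropout at train time
mask = (rng.random(h.shape) < keep_prob) / keep_prob
h_dropped = h * mask
```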
Name two commonly used initialization methods for weights in a neural network
w ~ N(0, 1/sqrt(N)) (i.e., standard deviation 1/sqrt(N)), or Xavier: w ~ U(-1/sqrt(N), 1/sqrt(N)), where N is the number of inputs to the layer.
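A sketch of both initializations for a single weight matrix; the layer sizes are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 128

# scaled normal: standard deviation 1/sqrt(n_in)
W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

# Xavier-style uniform on [-1/sqrt(n_in), 1/sqrt(n_in)]
W2 = rng.uniform(-1.0 / np.sqrt(n_in), 1.0 / np.sqrt(n_in), size=(n_in, n_out))
```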
What is the formula for batch normalization?
z_hat = (z - mu) / sigma, z_new = gamma * z_hat + beta, where mu and sigma are the batch mean and standard deviation, and gamma and beta are learned parameters.
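A minimal training-time sketch of the formula above (inference uses running statistics instead of batch statistics); the eps term is a small constant for numerical stability.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    mu = z.mean(axis=0)                     # per-feature batch mean
    var = z.var(axis=0)                     # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)   # normalize
    return gamma * z_hat + beta             # scale and shift with learned parameters

z = np.random.randn(32, 8)                  # batch of 32, 8 features
out = batch_norm(z, gamma=np.ones(8), beta=np.zeros(8))
```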
What is the output size after a convolutional layer?
out = floor((in + 2*pad - filter_size) / stride) + 1
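The formula as a one-line helper, checked on a common case (a 32x32 input with a 5x5 filter and "same"-style padding of 2 keeps the spatial size).

```python
def conv_output_size(in_size, filter_size, pad=0, stride=1):
    return (in_size + 2 * pad - filter_size) // stride + 1

assert conv_output_size(32, 5, pad=2, stride=1) == 32
```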
Why are deep networks harder to train and how can we solve this?
- Vanishing gradient:
  - ReLU
  - Good initialization
  - Auxiliary classifiers
- Covariate shift:
  - Batch normalization
What is the idea behind the Inception network?
Each module applies several filter sizes in parallel and the network “learns” which filter size works best.
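A schematic sketch of an Inception-style module: parallel branches with different filter sizes whose outputs are concatenated along the channel axis. The branch function here is only a placeholder for a real convolution (it ignores filter_size and returns same-sized zero maps), so the shapes illustrate the idea rather than implement it.

```python
import numpy as np

def branch(x, filter_size):
    # placeholder for a conv with the given filter size and 'same' padding;
    # all branches keep the spatial size so their outputs can be concatenated
    return np.zeros((x.shape[0], 8, x.shape[2], x.shape[3]))

def inception_module(x):
    branches = [branch(x, k) for k in (1, 3, 5)]   # parallel filter sizes
    return np.concatenate(branches, axis=1)        # stack along the channel axis

x = np.zeros((2, 3, 32, 32))
print(inception_module(x).shape)                   # (2, 24, 32, 32)
```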