Neural Networks Flashcards
What activation function did the first neural network use?
Heaviside step function
What method is used for optimizing multi-layer neural networks?
Backpropagation
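Backpropagation is just the chain rule applied layer by layer. Below is a minimal NumPy sketch on the XOR problem; the layer sizes, learning rate, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-layer network trained on XOR with backpropagation.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)            # hidden activations
    out = sigmoid(h @ W2 + b2)          # network output
    # Backward pass: chain rule, layer by layer
    d_out = out - y                     # gradient of cross-entropy w.r.t. the output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)  # error signal propagated to the hidden layer
    # Gradient-descent updates
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())             # should be close to [0, 1, 1, 0] once training has converged
```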
Name some improvements to neural networks over the past 30 years
1) Better hardware
2) Deeper networks
3) Larger datasets
4) Other changes: better activation functions, different layer types…
How can we adapt gradient descent to work with very large training sets?
Stochastic gradient descent (use a random mini-batch from the training data and update the weights using only this batch).
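A minimal sketch of one epoch of mini-batch SGD on a linear model with squared error; the batch size, learning rate, and toy data are assumptions for illustration.

```python
import numpy as np

def sgd_epoch(X, y, w, lr=0.01, batch_size=32):
    """One epoch of mini-batch SGD on a linear model with squared-error loss."""
    idx = np.random.permutation(len(X))            # shuffle the training set
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]      # pick a random mini-batch
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient computed on the batch only
        w = w - lr * grad                          # update weights with the batch gradient
    return w

# Toy usage: fit y = 3x with a single weight
X = np.random.randn(256, 1)
y = 3 * X[:, 0]
w = np.zeros(1)
for _ in range(50):
    w = sgd_epoch(X, y, w)
print(w)   # should end up close to [3.]
```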
How does the step size (learning rate) influence the error for:
1) A very high learning rate
2) A high learning rate
3) A low learning rate
1) The error will increase rapidly.
2) The error will decrease rapidly in the beginning and then “flatten out”, never reaching the optimum.
3) The error will decrease slowly. (See the sketch below.)
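A toy sketch on the 1-D quadratic loss f(w) = w², with arbitrary step sizes, just to show the divergent, fast, and slow regimes.

```python
def run_gd(lr, steps=20, w0=5.0):
    """Gradient descent on f(w) = w^2 (gradient 2w), returning the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * 2 * w
        losses.append(w ** 2)
    return losses

print(run_gd(1.5)[:5])    # very high lr: loss grows every step (diverges)
print(run_gd(0.4)[:5])    # moderate lr:  loss drops rapidly
print(run_gd(0.01)[:5])   # low lr:       loss decreases slowly
```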
What is the vanishing gradient problem?
Activation functions like the sigmoid saturate for large positive/negative values of x, meaning the gradient is close to 0, so little learning signal reaches the earlier layers.
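A small sketch of why this happens: the sigmoid's derivative σ'(x) = σ(x)(1 - σ(x)) shrinks toward 0 as |x| grows (the sample inputs are arbitrary).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  gradient = {sigmoid_grad(x):.6f}")
# 0.25 at x = 0, but ~0.000045 at x = 10: almost no signal flows back
```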
What is the exploding gradient problem?
The gradient suddenly increases a lot, so the gradient descent algorithm can “jump” far away from the optimal solution.
How can we adapt gradient descent to fix the vanishing and exploding gradient problems?
1) We can use adaptive step sizes.
2) We can clip the gradient using thresholding or the L2 norm (see the sketch below).
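A minimal sketch of both clipping variants with NumPy; the threshold values are arbitrary.

```python
import numpy as np

def clip_by_value(grad, threshold=1.0):
    """Element-wise thresholding: each component is forced into [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, max_norm=1.0):
    """Rescale the whole gradient so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, -4.0])     # "exploding" gradient with norm 5
print(clip_by_value(g))       # [ 1. -1.]
print(clip_by_norm(g))        # [ 0.6 -0.8]  (rescaled to norm 1)
```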
What is the main advantage of the ReLU over the sigmoid activation function?
It doesn’t saturate for large positive values of x.
How can we deal with ReLU saturation for input values below 0?
We can use leaky ReLU, PReLU, or ELU instead.
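A sketch of ReLU and the variants named above (PReLU is like leaky ReLU except that the negative slope is learned); the slope and alpha values are common defaults, not requirements.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                    # zero output and zero gradient for x < 0

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)         # small non-zero slope for x < 0

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth, saturates to -alpha

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x), leaky_relu(x), elu(x), sep="\n")
```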
What types of problems is the mean squared error loss function most commonly used for?
Regression.
What output function do we usually use for the binary classification problem?
Sigmoid (or softmax with 2 outputs, one for “true” and one for “false”).
What output function do we usually use for Multi-class classification problems?
Softmax
What loss function do we usually use for classification?
Cross entropy
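A sketch of softmax followed by cross-entropy for a single multi-class example; the logits and label are made up.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -np.log(probs[true_class])

logits = np.array([2.0, 0.5, -1.0])   # raw network outputs for 3 classes
probs = softmax(logits)
print(probs)                          # roughly [0.79, 0.18, 0.04]
print(cross_entropy(probs, 0))        # about 0.24: small, the network favours the right class
```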
What is data augmentation?
We increase the training set by adding distorted, squeezed, tilted… versions of the original training examples.
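A minimal NumPy sketch of the idea; real pipelines usually rely on a library's transform utilities, and the transforms below are only examples.

```python
import numpy as np

def augment(image):
    """Return a few distorted copies of a (H, W) image array."""
    flipped = image[:, ::-1]                                # horizontal flip
    shifted = np.roll(image, shift=2, axis=1)               # small horizontal shift
    noisy = image + 0.05 * np.random.randn(*image.shape)    # mild pixel noise
    return [flipped, shifted, noisy]

image = np.random.rand(28, 28)              # stand-in for a training image
augmented = augment(image)
print(len(augmented), augmented[0].shape)   # 3 extra training examples, same shape
```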
What assumption do we make when using a CNN?
Nearby features (for example pixels) are dependent on each other.
How do we make sure that the output of a convolutional layer is the same size as the input?
Zero-pad the input.
What happens to the output size if we have stride = 2 in a convolutional layer? (When the input is zero-padded.)
The output is halved along each spatial dimension, so for a 2D input it is 1/4 of the input size.
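This follows from the standard output-size formula out = floor((in + 2·padding − kernel) / stride) + 1, sketched below for the zero-padded cases above.

```python
def conv_output_size(in_size, kernel=3, stride=1, padding=0):
    """Standard formula for the spatial output size of a convolutional layer."""
    return (in_size + 2 * padding - kernel) // stride + 1

# Zero padding of 1 keeps a 3x3 kernel "same size" at stride 1:
print(conv_output_size(32, kernel=3, stride=1, padding=1))   # 32
# With stride 2 each spatial dimension is halved, so a 2D output has 1/4 the elements:
print(conv_output_size(32, kernel=3, stride=2, padding=1))   # 16
```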
How does the kernel in a convolutional layer look if we use dilation = 2 and kernel size 3?
[X, –, X, –, X
–, –, –, –, –
X, –, X, –, X
–, –, –, –, –
X, –, X, –, X]
We only “look” at input values where there is an X: the kernel taps are spaced 2 positions apart, so a 3×3 kernel covers a 5×5 region.
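A small sketch that builds this footprint: with dilation d and kernel size k the kernel spans d·(k − 1) + 1 positions per dimension and touches only every d-th one.

```python
import numpy as np

def dilated_footprint(kernel_size=3, dilation=2):
    """Boolean mask of the input positions a dilated 2D kernel actually reads."""
    span = dilation * (kernel_size - 1) + 1     # 5 for size 3, dilation 2
    mask = np.zeros((span, span), dtype=bool)
    mask[::dilation, ::dilation] = True         # kernel taps every `dilation`-th position
    return mask

print(dilated_footprint().astype(int))
# [[1 0 1 0 1]
#  [0 0 0 0 0]
#  [1 0 1 0 1]
#  [0 0 0 0 0]
#  [1 0 1 0 1]]
```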
What is pooling in a convolutional network?
It is like downsampling. We usually use max-pooling, meaning that the maximum value in each local region is selected.
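A sketch of 2×2 max-pooling with stride 2 on a toy array (frameworks provide this as a built-in layer).

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2: keep the largest value in each 2x2 block."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5.  7.]
#  [13. 15.]]
```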
What is a residual net?
A residual net has “skip connections” that let the gradient “skip” layers during backpropagation, which alleviates vanishing gradients.
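A structural sketch of a skip connection with plain matrix layers; real residual blocks use convolutions and batch normalization, so this only shows the “add the input back” idea.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Two small layers whose output is added back to the input (the skip connection)."""
    out = relu(x @ W1)
    out = out @ W2
    return relu(out + x)    # '+ x' is the skip: gradients can flow straight through it

x = np.random.randn(4, 8)                 # a batch of 4 feature vectors of width 8
W1 = 0.1 * np.random.randn(8, 8)
W2 = 0.1 * np.random.randn(8, 8)
print(residual_block(x, W1, W2).shape)    # (4, 8): same shape as the input
```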
Name some famous neural networks
GoogLeNet, ResNet, AlexNet, VGG.