Neural Networks Flashcards

1
Q

What activation function did the first neural network use?

A

Heaviside step function
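A one-line reminder of what that activation computes (a sketch in Python, using the common convention that the step is 1 at x = 0):

```python
def heaviside_step(x):
    # Hard 0/1 threshold: fires (outputs 1) only for non-negative input.
    return 1.0 if x >= 0 else 0.0
```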

2
Q

What method is used for optimizing multi-layer neural networks?

A

Backpropagation

3
Q

Name some improvements to neural networks over the past 30 years.

A

1) Better hardware
2) Deeper networks
3) Larger datasets
4) Other changes: better activation functions, different layer types, …

4
Q

How can we adapt gradient descent to work with very large training sets?

A

Stochastic gradient descent: use a random batch (mini-batch) from the training data and update the weights using only this batch.
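A minimal sketch of one SGD step for a linear least-squares model, assuming numpy and hypothetical arrays X (inputs), y (targets), and w (weights):

```python
import numpy as np

def sgd_step(w, X, y, lr=0.01, batch_size=32):
    # Draw a random mini-batch from the training data.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error for a linear model X @ w.
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
    # Update the weights using only this batch.
    return w - lr * grad
```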

5
Q

How does the learning rate (training step size) influence the error for:

1) A very high learning rate
2) A high learning rate
3) A low learning rate

A

1) The error will increase rapidly.
2) The error will decrease rapidly in the beginning and then “flatten” out, never reaching the optimum.
3) The error will decrease slowly.

6
Q

What is the vanishing gradient problem?

A

Activation functions like the sigmoid saturate for large positive or negative values of x, meaning the gradient is close to 0, so earlier layers receive almost no gradient signal during backpropagation.
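A minimal numeric illustration, assuming numpy: the sigmoid's derivative σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25 and shrinks towards 0 as |x| grows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    # Prints 0.25, ~0.10, ~0.0066, ~0.000045: the gradient vanishes as the unit saturates.
    print(x, sigmoid_grad(x))
```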

7
Q

What is the exploding gradient problem?

A

The gradient suddenly becomes very large, so a gradient-descent update can “jump” far away from the optimal solution.

8
Q

How can we adapt gradient descent to fix the vanishing and exploding gradient problems?

A

1) We can use adaptive step sizes.

2) We can clip the gradient, either element-wise with a threshold or by rescaling its L2 norm (see the sketch below).
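A minimal sketch of both clipping variants, assuming a numpy gradient vector; the threshold values are arbitrary:

```python
import numpy as np

def clip_by_value(grad, threshold=1.0):
    # Element-wise thresholding: no component may exceed +/- threshold.
    return np.clip(grad, -threshold, threshold)

def clip_by_l2_norm(grad, max_norm=1.0):
    # If the gradient's L2 norm is too large, rescale it so the norm equals
    # max_norm; this bounds the step size while keeping the direction.
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```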

9
Q

What is the main advantage of ReLU over the sigmoid activation function?

A

It doesn’t saturate for large positive values of x, so its gradient does not vanish there.

10
Q

How can we deal with ReLU saturation (zero gradient) for input values below 0?

A

We can use leaky ReLU, PReLU, or ELU instead.
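A minimal sketch of those variants next to plain ReLU, assuming numpy; the slope/alpha values shown are common defaults, not values from the cards:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                             # gradient is exactly 0 for x < 0

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)                  # small fixed slope below 0

def prelu(x, slope):
    return np.where(x > 0, x, slope * x)                  # slope is a learned parameter

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth, saturates at -alpha
```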

11
Q

What types of problems is the mean squared error (MSE) loss function most commonly used for?

A

Regression.

12
Q

What output function do we usually use for binary classification problems?

A

Sigmoid (or softmax with 2 outputs, one for “true” and one for “false”).

13
Q

What output function do we usually use for multi-class classification problems?

A

Softmax
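A minimal, numerically stable softmax sketch, assuming a numpy vector of class scores (logits):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp.
    z = np.exp(logits - np.max(logits))
    # The outputs are positive and sum to 1, so they can be read as class probabilities.
    return z / np.sum(z)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```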

14
Q

What loss function do we usually use for classification?

A

Cross entropy
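A minimal sketch of the cross-entropy loss for a single example, assuming numpy, a one-hot target, and predicted class probabilities (for instance from a softmax output):

```python
import numpy as np

def cross_entropy(probs, target_one_hot, eps=1e-12):
    # L = -sum_k t_k * log(p_k); for a one-hot target only the true class contributes.
    return -np.sum(target_one_hot * np.log(probs + eps))

probs = np.array([0.66, 0.24, 0.10])
target = np.array([1.0, 0.0, 0.0])
print(cross_entropy(probs, target))  # ~0.42; the loss shrinks as the true class gets more probability
```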

15
Q

What is data augmentation?

A

We enlarge the training set by adding distorted, squeezed, tilted, … versions of the original examples.
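A minimal sketch of a few hand-rolled augmentations, assuming images are numpy arrays of shape (height, width, channels) with values in 0-255; real pipelines typically add rotations, crops, colour jitter, and so on:

```python
import numpy as np

def augment(image):
    # Return a few distorted copies of one training image.
    flipped = image[:, ::-1, :]                    # horizontal flip
    shifted = np.roll(image, shift=4, axis=1)      # small horizontal shift
    noisy = np.clip(image + np.random.normal(0.0, 5.0, image.shape), 0, 255)
    return [flipped, shifted, noisy]
```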

16
Q

What assumption do we make when using a CNN?

A

Nearby features (for example, neighbouring pixels) depend on each other.

17
Q

How do we make sure that the output of a convolutional layer is the same size as the input?

A

Zero-pad the input.
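A minimal worked check using the standard output-size formula out = (in + 2*padding − kernel) // stride + 1; with stride 1, zero-padding of (kernel − 1) / 2 keeps the output the same size as the input (the sizes below are illustrative):

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    # Standard formula for the spatial size of a convolution's output.
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, padding=0))  # 30: shrinks without padding
print(conv_output_size(32, kernel=3, padding=1))  # 32: "same" output with padding (3 - 1) / 2 = 1
```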

18
Q

What happens to the output size if we use stride = 2 in a convolutional layer (when the input is zero-padded)?

A

Each spatial dimension of the output is roughly halved, so a 2-D feature map ends up with about 1/4 as many values as the input.

19
Q

What does the kernel in a convolutional layer look like if we use dilation = 2 and kernel size 3?

A

The 3×3 kernel is spread over a 5×5 region of the input:

[X, –, X, –, X
–, –, –, –, –
X, –, X, –, X
–, –, –, –, –
X, –, X, –, X]

We only “look” at input values where the kernel has an X; the “–” positions are skipped.

20
Q

What is pooling in a convolutional network?

A

It is a form of downsampling. We usually use max pooling, meaning that the maximum value in each local region is selected.
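A minimal 2x2 max-pooling sketch, assuming a single-channel numpy feature map whose height and width are divisible by 2:

```python
import numpy as np

def max_pool_2x2(x):
    # Split the feature map into non-overlapping 2x2 blocks and keep each block's maximum.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))  # [[ 5  7]
                        #  [13 15]]
```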

21
Q

What is a residual net?

A

A residual net has “skip connections” that mainly allow the gradient to bypass (“skip”) layers during backpropagation, alleviating vanishing gradients.
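A minimal residual-block sketch in numpy: the block computes a residual F(x) and adds the input back through the skip connection. The two-layer structure and weight names are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # F(x): a small two-layer transformation of the input (shapes must match x).
    residual = relu(x @ W1) @ W2
    # Skip connection: output = F(x) + x, so the identity path lets gradients
    # bypass the transformation during backpropagation.
    return relu(residual + x)
```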

22
Q

Name some famous neural networks

A

GoogLeNet, ResNet, AlexNet, VGG.