Chapter 6 Crash Course In Multilayer Perceptrons Flashcards

1
Q

What’s a Perceptron? P 48

A

A Perceptron is a single-neuron model that was a precursor to larger neural networks.
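As a quick sketch (not from the book), a perceptron reduces to a weighted sum of the inputs plus a bias, passed through a step transfer function; the weights, bias, and inputs below are made-up values for illustration:

```python
import numpy as np

# A minimal sketch of a single perceptron: a weighted sum of the inputs
# plus a bias, passed through a step transfer function. The weights,
# bias, and inputs below are made-up values for illustration.
def perceptron(x, w, b):
    activation = np.dot(w, x) + b         # weighted sum plus bias
    return 1 if activation >= 0.0 else 0  # step function

x = np.array([1.0, 0.5])    # two input values
w = np.array([0.4, -0.2])   # one weight per input
b = 0.1                     # bias
print(perceptron(x, w, b))  # -> 1
```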

2
Q

What are hidden layers, and why are they called that? P 50

A

Layers after the input layer are called hidden layers because they are not directly exposed to the input.
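For illustration, a hedged sketch of a small multilayer perceptron in Keras (assuming TensorFlow's bundled Keras API; the layer sizes are arbitrary) shows where the hidden layers sit:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Hidden layer: connected to the 8 input features, but its outputs
# are internal to the network, hence "hidden".
model.add(Dense(10, input_dim=8, activation='relu'))
# A second hidden layer.
model.add(Dense(5, activation='relu'))
# Output layer: the only layer whose values are directly observed.
model.add(Dense(1, activation='sigmoid'))
```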

3
Q

What does the choice of activation function in the output layer depend on? Explain using examples P 50

A

The choice of activation function in the output layer is strongly constrained by the type of problem you are modeling.
For example: A regression problem may have a single output neuron with no activation function.
A binary classification problem may have a single output neuron and use a sigmoid activation function to output a value between 0 and 1, representing the probability of the primary class.
A multiclass classification problem may have multiple neurons in the output layer, one for each class. In this case a softmax activation function may be used to output a probability for each class value; selecting the output with the highest probability produces a crisp class prediction.
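For illustration, the three cases map onto Keras Dense output layers like this (a hedged sketch; the class count of 3 and the rest of each network are assumptions):

```python
from tensorflow.keras.layers import Dense

# Regression: a single neuron with no (i.e. linear) activation.
regression_output = Dense(1)

# Binary classification: one neuron, sigmoid squashes output to [0, 1].
binary_output = Dense(1, activation='sigmoid')

# Multiclass classification: one neuron per class, softmax turns the
# outputs into a probability distribution over the classes.
multiclass_output = Dense(3, activation='softmax')
```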

4
Q

What’s the output layer? P 50

A

The final layer of the network is called the output layer; it produces values in the format required by the problem.

5
Q

Neural networks require the input to be scaled in a consistent way. What are the ways of doing that? P 51

A

You can rescale the input to the range 0 to 1; this is called normalization. Another popular technique is standardization, which rescales each column so that its distribution has a mean of zero and a standard deviation of one.
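A minimal sketch of both approaches with scikit-learn; X stands in for any (n_samples, n_features) feature matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Normalization: rescale each column to the range [0, 1].
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: rescale each column to mean 0, standard deviation 1.
X_std = StandardScaler().fit_transform(X)
```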

6
Q

The classical and still preferred training algorithm for neural networks is called …. P 51

A

Stochastic gradient descent.
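As a hedged illustration of the idea, one stochastic gradient descent update for a single linear neuron might look like this (in a real network the gradients would come from backpropagation; all names and values here are illustrative):

```python
import numpy as np

def sgd_update(w, b, x, y, lr=0.01):
    y_hat = np.dot(w, x) + b  # prediction for this one row of data
    error = y_hat - y         # how far off the prediction was
    w = w - lr * error * x    # step the weights down the error gradient
    b = b - lr * error        # step the bias likewise
    return w, b
```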

7
Q

In stochastic gradient descent, one row of data is exposed to the network at a time as input. True/False P 51

A

True

8
Q

What’s one round of updating the network for the entire training dataset called? P 51

A

An epoch.

9
Q

The weights in the network can be updated from the errors calculated for EACH training example, and this is called … P 51

A

Online learning

10
Q

Which type of weight update method causes fast but also chaotic changes to the network? P 51

A

Online learning (updating after each example)

11
Q

When are the weights updated in batch learning? P 52

A

The errors are saved up across all of the training examples, and the network is updated once at the end. This is called batch learning.
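A hedged sketch contrasting the two schedules for a single linear neuron; gradient() is a stand-in for backpropagation:

```python
import numpy as np

def gradient(w, x, y):
    return (np.dot(w, x) - y) * x

def online_epoch(w, X, Y, lr=0.01):
    # Online learning: one update per training example.
    for x, y in zip(X, Y):
        w = w - lr * gradient(w, x, y)
    return w

def batch_epoch(w, X, Y, lr=0.01):
    # Batch learning: average the gradient over every example,
    # then make a single update at the end.
    g = np.mean([gradient(w, x, y) for x, y in zip(X, Y)], axis=0)
    return w - lr * g
```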

12
Q

Batch learning is more stable than online learning. True/False P 52

A

True

13
Q

Because datasets are so large and for reasons of computational efficiency, the size of the batch is often reduced to a smaller number (a mini-batch). True/False P 52

A

True
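A hedged sketch of the resulting mini-batch scheme, a middle ground between online and batch learning (the batch size of 32 is an illustrative choice; gradient() again stands in for backpropagation):

```python
import numpy as np

def gradient(w, x, y):
    return (np.dot(w, x) - y) * x

def minibatch_epoch(w, X, Y, lr=0.01, batch_size=32):
    for i in range(0, len(X), batch_size):
        xb, yb = X[i:i + batch_size], Y[i:i + batch_size]
        g = np.mean([gradient(w, x, y) for x, y in zip(xb, yb)], axis=0)
        w = w - lr * g  # one update per mini-batch
    return w
```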

14
Q

What is another name for learning rate and what does it do? P 52

A

The learning rate is also called the step size. It controls the size of the step, or change, made to the network weights for a given error.

15
Q

What is momentum? External

A

Momentum is a technique for damping oscillations in the weight updates. The gradient computed at each iteration can point in a very different direction from the last, so plain gradient descent can follow a slow zigzag path.
Momentum stabilizes this movement by blending each new gradient with the direction of previous updates.
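As a hedged sketch, a classic momentum update keeps a velocity term that accumulates past gradients (the lr and beta values are illustrative):

```python
import numpy as np

# The velocity v accumulates past gradients, so successive steps point
# in a more consistent direction and the zigzagging is damped.
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    v = beta * v - lr * grad  # blend previous direction with new gradient
    w = w + v                 # move along the smoothed direction
    return w, v
```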

16
Q

What is learning rate decay? P 52

A

Learning rate decay is used to decrease the learning rate over epochs, allowing the network to make large changes to the weights at the beginning and smaller fine-tuning changes later in the training schedule.
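As a hedged sketch, one common schedule divides the initial rate by a factor that grows over time (similar in form to the decay option in Keras's classic SGD optimizer, which applies it per update rather than per epoch); the lr0 and decay values are illustrative:

```python
def decayed_lr(lr0, decay, epoch):
    return lr0 / (1.0 + decay * epoch)

for epoch in range(5):
    print(round(decayed_lr(0.1, 0.5, epoch), 4))
# 0.1, 0.0667, 0.05, 0.04, 0.0333 -> big steps early, finer steps later
```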