Chapter 6 Crash Course In Multilayer Perceptrons Flashcards
What’s a Perceptron? P 48
A Perceptron is a single-neuron model that was a precursor to larger neural networks.
What are hidden layers, and why are they called that? P 50
Layers after the input layer are called hidden layers because they are not directly exposed to the input.
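A minimal Keras-style sketch of where hidden layers sit in a multilayer Perceptron (the layer sizes and the 8-feature input are illustrative assumptions, not from the text):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(8,)),                # input layer: 8 features (illustrative)
    layers.Dense(12, activation='relu'),    # hidden layer: not directly exposed to the input
    layers.Dense(8, activation='relu'),     # a second hidden layer
    layers.Dense(1, activation='sigmoid'),  # output layer (the final layer)
])
```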
What does the choice of activation function in the output layer depend on? Explain using examples P 50
The choice of activation function in the output layer is strongly constrained by the type of problem that you are modeling.
For example: A regression problem may have a single output neuron with no activation function.
A binary classification problem may have a single output neuron and use a sigmoid activation function to output a value between 0 and 1 that represents the probability of the primary class.
A multiclass classification problem may have multiple neurons in the output layer, one for each class. In this case a softmax activation function may be used to output a probability for each of the class values, and selecting the output with the highest probability produces a crisp class value.
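A minimal Keras-style sketch of these three output-layer choices (the 10-class count is an illustrative assumption):

```python
from tensorflow.keras.layers import Dense

regression_output = Dense(1)                         # no activation: raw real value
binary_output = Dense(1, activation='sigmoid')       # probability of the primary class
multiclass_output = Dense(10, activation='softmax')  # one probability per class
```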
What’s the output layer? P 50
The final layer is called the output layer. It is responsible for outputting values in the format required by the problem.
Neural networks require the input to be scaled in a consistent way. What are the ways of doing that? P 51
You can rescale the input to the range between 0 and 1, which is called normalization. Another popular technique is standardization, where each column is rescaled so that its distribution has a mean of zero and a standard deviation of one.
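A minimal NumPy sketch of both techniques on toy data (the values are illustrative):

```python
import numpy as np

X = np.array([[200.0, 0.1],
              [180.0, 0.5],
              [220.0, 0.3]])  # toy data; columns on very different scales

# Normalization: rescale each column to the range [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: rescale each column to mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

scikit-learn provides the same two transforms as MinMaxScaler and StandardScaler.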
The classical and still preferred training algorithm for neural networks is called …. P 51
Stochastic gradient descent.
In Stochastic gradient descent one row of data is exposed to the network at a time as input. True/False P 51
True
What’s One round of updating the network for the entire training dataset called? P 51
An epoch.
The weights in the network can be updated from the errors calculated for EACH training example and this is called … P 51
Online learning
Which type of weight-updating method causes fast but also chaotic changes to the network? P 51
Online learning (updating after each example)
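A minimal sketch of online SGD, assuming a hypothetical grad_fn(w, x, y) that returns the error gradient for a single training example:

```python
def train_online(w, X, y, grad_fn, learning_rate=0.01, epochs=10):
    """Online learning: the weights change after every single example."""
    for _ in range(epochs):            # one epoch = one pass over the dataset
        for xi, yi in zip(X, y):       # one row of data at a time
            w = w - learning_rate * grad_fn(w, xi, yi)  # immediate update
    return w
```

The per-example updates are what make online learning fast but chaotic: each row pulls the weights in its own direction.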
When are the weights updated in batch learning? P 52
The errors are saved up across all of the training examples and the weights are updated once at the end, that is, once per epoch. This is called batch learning.
Batch learning is more stable than online learning. True/False P 52
True
Because datasets are so large, and for reasons of computational efficiency, the batch size is often reduced to a smaller number of examples. True/False P 52
True
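A minimal sketch of batch and mini-batch updating, reusing the hypothetical grad_fn from the previous sketch:

```python
import numpy as np

def train_batched(w, X, y, grad_fn, learning_rate=0.01, batch_size=32):
    """One epoch of (mini-)batch learning: save up errors, update once per batch."""
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        # Average the per-example gradients across the batch.
        grad = np.mean([grad_fn(w, xi, yi) for xi, yi in zip(xb, yb)], axis=0)
        w = w - learning_rate * grad   # a single update per batch
    return w
```

Setting batch_size = len(X) gives classic batch learning (one update per epoch), while batch_size = 1 reduces to online learning; the smaller batch sizes in between are the common compromise described above.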
What is another name for learning rate and what does it do? P 52
The learning rate is also called the step size. It controls the size of the step or change made to the network weights for a given error.
What is momentum? External
Momentum is a technique that smooths the weight updates. The gradient computed at each iteration can point in a very different direction from the previous one, so plain gradient descent steps can trace a zigzag path, which makes training very slow.
Momentum stabilizes this movement by adding a fraction of the previous update to the current one.
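A minimal sketch of one SGD step with momentum (the 0.9 coefficient is a common default, not from the text):

```python
def momentum_step(w, velocity, gradient, learning_rate=0.01, momentum=0.9):
    """One update step of SGD with momentum.

    The velocity is a decaying sum of past gradients, so successive steps
    point in a more consistent direction instead of zigzagging.
    """
    velocity = momentum * velocity - learning_rate * gradient
    return w + velocity, velocity
```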