Chapter 17 Lift Performance With Learning Rate Schedules Flashcards
What benefits does adapting the learning rate have? P 118
Adapting the learning rate for your stochastic gradient descent optimization procedure can increase performance and reduce training time.
Two popular and easy to use learning rate schedules are ____ P 118
- Decrease the learning rate gradually based on the epoch. (Time-based learning rate schedule)
- Decrease the learning rate using punctuated large drops at specific epochs. (Drop-Based Learning Rate Schedule)
Keras has a time-based learning rate schedule built in. The stochastic gradient descent optimization algorithm implementation in the SGD class has an argument called ____. This argument is used in the time-based learning rate decay schedule equation, which is: ____ P 119
decay; LearningRate = InitialLearningRate * 1 / (1 + decay * epoch)
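A minimal sketch of the equation in code (the learning_rate and decay values are illustrative assumptions, not taken from the text):

# time-based decay: the learning rate shrinks as the epoch number grows
learning_rate = 0.1   # assumed initial learning rate
decay = 0.01          # assumed value of the SGD decay argument
for epoch in range(3):
    decayed = learning_rate * 1 / (1 + decay * epoch)
    print(epoch, round(decayed, 5))   # 0.1, 0.09901, 0.09804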
What is the “nesterov” parameter in the SGD class in Keras? External
nesterov: boolean. Whether to apply Nesterov momentum. Defaults to False.
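A minimal sketch of enabling it (the optimizer values are illustrative assumptions):

from keras.optimizers import SGD

# Nesterov momentum is off by default; pass nesterov=True to enable it
sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=True)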
In Keras, metric values are displayed during ____ and logged to the ____ object returned by it. They are also returned by ____. External
fit() , History, model.evaluate()
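A minimal sketch of where these values appear, assuming a compiled model and data X, Y as in the code examples further down:

hist = model.fit(X, Y, validation_split=0.33, epochs=10, verbose=1)  # metrics printed during fit()
print(hist.history["loss"])     # per-epoch values logged in the returned History object
scores = model.evaluate(X, Y)   # metric values are also returned by evaluate()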
It can be a good idea to use momentum when using an adaptive learning rate. True/False P 120
True
For example, we may have an initial learning rate of 0.1 and drop it by a factor of 0.5 every 10 epochs. The first 10 epochs of training would use a value of 0.1; in the next 10 epochs a learning rate of 0.05 would be used, and so on.
A popular learning rate schedule used with deep learning models is to systematically drop the learning rate at specific times during training. Often this method is implemented by dropping the learning rate by ____ every fixed number of epochs. P 122
half
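A minimal sketch of that schedule, using the illustrative values from the example above (0.1 initial rate, halved every 10 epochs):

# drop the learning rate by half every 10 epochs
initial_lrate, drop, epochs_drop = 0.1, 0.5, 10
for epoch in (0, 9, 10, 19, 20):
    lrate = initial_lrate * drop ** (epoch // epochs_drop)
    print(epoch, lrate)   # epochs 0-9 use 0.1, 10-19 use 0.05, 20-29 use 0.025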
We can implement Drop-Based Learning Rate Schedule in Keras using the ____ callback, when fitting the model. P 122
LearningRateScheduler
Time-based learning rate decay is configured when setting up ____, via the ____ parameter. Drop-based learning rate decay is configured when ____, via the ____ parameter. External
SGD, “decay”, fitting the model, “callbacks”
The LearningRateScheduler callback allows us to define a function to call that takes the ____ as an argument and returns the ____ to use in stochastic gradient descent. When used, the learning rate specified by stochastic gradient descent is ____. P 122
epoch number, learning rate, ignored
What is the formula used for drop-based (step) learning rate decay, and what is the decay function conventionally named? P 122
LearningRate = InitialLearningRate * DropRate ^ floor((1 + Epoch) / EpochDrop)
step_decay()
where InitialLearningRate is the learning rate at the beginning of the run, EpochDrop is how often (in epochs) the learning rate is dropped, and DropRate is how much to drop the learning rate each time it is dropped.
What does the below code do? P 123
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler

# learning rate schedule: drop by half every 10 epochs
def step_decay(epoch):
    initial_lrate = 0.1
    drop = 0.5
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

# create model
model = Sequential()
model.add(Dense(34, input_dim=34, kernel_initializer="normal", activation="relu"))
model.add(Dense(1, kernel_initializer="normal", activation="sigmoid"))

# Compile model
epochs = 50
learning_rate = 0.1
momentum = 0.9
sgd = SGD(learning_rate=learning_rate, momentum=momentum, decay=0, nesterov=False)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])

# learning schedule callback
lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]

# Fit the model
hist = model.fit(X, Y, validation_split=0.33, callbacks=callbacks_list,
                 epochs=epochs, batch_size=28, verbose=1)
It uses drop-based learning rate decay while training the model.
Note that the learning rate set in SGD is ignored; the LearningRateScheduler callback overrides it each epoch.
What does the below code do? P 122
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# create model
model = Sequential()
model.add(Dense(34, input_dim=34, kernel_initializer="normal", activation="relu"))
model.add(Dense(1, kernel_initializer="normal", activation="sigmoid"))

# Compile model with time-based learning rate decay
epochs = 50
learning_rate = 0.1
decay_rate = learning_rate / epochs
momentum = 0.8
sgd = SGD(learning_rate=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])

# Fit the model
hist = model.fit(X, Y, validation_split=0.33, epochs=epochs, batch_size=28, verbose=1)
It uses time-based learning rate decay during training, via the decay argument of the SGD optimizer.
Why is it a good idea to increase the initial learning rate when using learning rate schedules? P 124
Because the learning rate will decrease over the run, start with a larger value to decrease from. A larger learning rate makes much larger changes to the weights, at least at the beginning, and you still benefit from the fine-tuning of smaller rates later.
Why is it a good idea to use large momentum when using learning rate schedules? P 124
Using a larger momentum value helps the optimization algorithm continue to make updates in the right direction as the learning rate shrinks to small values.
0.9 is the usual momentum; I used 0.99 in the example and it adversely affected performance.