ANN Lecture 3 - Backpropagation and Gradient Descent Flashcards
Error Surface
The error surface visualizes the loss as a function of the network's parameters. The aim is to find
the global minimum of the error surface!
Finding the global minimum of the error surface
Computing the whole error surface is usually infeasible, since there are far too many parameter combinations.
-> Gradient Descent:
Evaluate the error surface only at the current combination of weights and find out which way is downhill.
Derivative
The derivative of a function is itself a function that describes the slope of the original function at every point.
Partial Derivative
We call a derivative a partial derivative if the function we differentiate depends on more than one variable, but we differentiate with respect to only one of them.
Gradient
The gradient is the vector of all partial derivatives of the function.
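Not from the lecture, but a minimal sketch of the idea: each partial derivative can be approximated numerically by nudging one variable while holding the others fixed (the helper name, the step size and the example function are my own choices).

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at point x via central differences.

    Each entry is a partial derivative: one variable is nudged at a time
    while all others stay fixed.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example: f(x, y) = x^2 * y  ->  gradient (2xy, x^2)
f = lambda v: v[0] ** 2 * v[1]
print(numerical_gradient(f, np.array([3.0, 2.0])))  # approx. [12., 9.]
```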
Jacobian Matrix
The Jacobian matrix is the generalization of the gradient to a function that maps multiple variables onto multiple output dimensions: it collects the partial derivative of every output with respect to every input.
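A small worked example (my own, using the common convention that row i holds the partial derivatives of output i with respect to every input):

```latex
f(x, y) = \begin{pmatrix} x y \\ x + y \end{pmatrix}
\quad\Rightarrow\quad
J_f(x, y) = \begin{pmatrix}
\partial f_1 / \partial x & \partial f_1 / \partial y \\
\partial f_2 / \partial x & \partial f_2 / \partial y
\end{pmatrix}
= \begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix}
```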
Gradient Descent
If we can calculate the gradient, we can just walk a bit in the opposite direction (because the gradient points uphill). Step by step, this leads us to a minimum.
Gradient Descent Rule
Last layer (weights between layer k and output layer k+1):
Gradient =
-(Target_(k+1) - Output_(k+1)) * Sigma'(Drive_(k+1)) * Activation_k
Every other layer (weights between layers k-1 and k):
Gradient =
Error_(k+1) * Weights_(k+1) * Sigma'(Drive_k) * Activation_(k-1)
where Error_(k+1) is the error signal backpropagated from layer k+1, i.e. its gradient term before the multiplication with the activation.
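A minimal NumPy sketch of these two rules for a toy two-layer sigmoid network with squared-error loss (layer sizes, variable names and the 1/2-MSE convention are my own assumptions, not taken from the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Tiny two-layer network with made-up sizes and random values.
rng = np.random.default_rng(0)
a0 = rng.normal(size=3)            # input activation
W1 = rng.normal(size=(4, 3))       # hidden-layer weights
W2 = rng.normal(size=(2, 4))       # output-layer weights
target = np.array([0.0, 1.0])

# Forward pass: drive = weighted sum, activation = sigma(drive)
drive1 = W1 @ a0
a1 = sigmoid(drive1)
drive2 = W2 @ a1
a2 = sigmoid(drive2)               # network output

# Last layer: -(target - output) * sigma'(drive), times previous activation
delta2 = -(target - a2) * sigmoid_prime(drive2)
grad_W2 = np.outer(delta2, a1)

# Hidden layer: propagate the error back through the next layer's weights
delta1 = (W2.T @ delta2) * sigmoid_prime(drive1)
grad_W1 = np.outer(delta1, a0)
```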
Gradient Descent Parameter Update
New Parameters =
Old Parameters - Learning Rate * Gradients
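As a short illustration of this update rule (all numbers are made up):

```python
import numpy as np

learning_rate = 0.01                  # assumed value, a typical choice
old_params = np.array([0.5, -1.2])    # current parameters
gradients = np.array([0.3, -0.8])     # gradient of the loss w.r.t. the parameters

new_params = old_params - learning_rate * gradients
```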
Full Batch Gradient Descent
New Parameters =
Old Parameters - Learning Rate * 1/N * Sum of all Gradients
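A minimal sketch of the full-batch version, assuming the per-example gradients are already stacked row-wise (the numbers are illustrative):

```python
import numpy as np

learning_rate = 0.1
params = np.array([0.5, -1.2])
per_sample_grads = np.array([[0.3, -0.8],    # gradient for example 1
                             [0.1,  0.4],    # gradient for example 2
                             [0.2, -0.2]])   # gradient for example 3 (N = 3)

# Average the gradients over all N examples, then take a single step.
params = params - learning_rate * per_sample_grads.mean(axis=0)
```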
Non Convex Error Surface
- A non-convex function has multiple so-called critical points
- Optimization can get stuck at a local minimum or a saddle point, because the slope there is zero
Solution:
-> Mini Batch Gradient Descent
-> Stochastic Gradient Descent
Full Batch Gradient Descent (Pros & Cons)
Always minimizing the same error surface
+ Gradients show a clear direction
+ Guaranteed to converge to a solution
- Gets stuck once it reaches a local minimum
- Slow or even infeasible for huge data sets, since every update needs the gradients of all examples
Stochastic/Mini Batch Gradient Descent (Pros & Cons)
The error surface you minimize changes for each batch
- Gradient can differ heavily for each update
- Not guaranteed to converge to a solution
+ Has a chance of escaping local minima of the full error surface
+ Faster
Gradient Descent Algorithm
1. Initialize the parameters
2. Chunk your data into batches of the chosen batch size
3. For all batches:
   a. Feed the data of one batch through the network
   b. Calculate the gradient for the resulting loss function
   c. Update the parameters
After you're done with all batches, go back to step 2.
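A minimal runnable sketch of this loop, using a plain linear model with a mean-squared-error loss so the gradient in step b can be written by hand (data set, batch size, learning rate and number of epochs are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # toy data set
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

params = np.zeros(3)                           # 1. initialize parameters
learning_rate = 0.05
batch_size = 32

for epoch in range(10):                        # one epoch = one pass over all batches
    indices = rng.permutation(len(X))          # 2. chunk shuffled data into batches
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        X_b, y_b = X[batch], y[batch]

        # a. feed the batch through the (here: linear) model
        predictions = X_b @ params
        # b. gradient of the mean squared error w.r.t. the parameters
        gradient = 2.0 * X_b.T @ (predictions - y_b) / len(batch)
        # c. update the parameters
        params = params - learning_rate * gradient
    # after all batches: go back to step 2 for the next epoch

print(params)   # approaches [2.0, -1.0, 0.5]
```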
Training Step & Epoch
Training Step:
One update of the parameters using a single batch
Epoch:
One update of the parameters for every batch, i.e. one pass over the whole data set (e.g. with 1,000 examples and a batch size of 100, one epoch consists of 10 training steps)