ANN Lecture 3 - Backpropagation and Gradient Descent Flashcards

1
Q

Error Surface

A

The error surface visualizes the loss as a function of the parameters. The aim is to find
the global minimum of the error surface!

2
Q

Finding the global minimum of the error surface

A

The full error surface is usually not computable.
-> Gradient Descent:
Evaluate the error surface at only one combination of weights and find out which way is downhill.

3
Q

Derivative

A

The derivative of a function is itself a function that describes the slope of the original function at every point.
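
A minimal sketch (not from the lecture) of checking the slope at one point numerically; the example function is my own choice:

def numerical_derivative(f, x, h=1e-5):
    # Central difference: approximate slope of f at the point x.
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: the derivative of f(x) = x**2 is 2x, so at x = 3 we expect ~6.
print(numerical_derivative(lambda x: x**2, 3.0))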

4
Q

Partial Derivative

A

We call a derivative a partial derivative if the function we differentiate depends on more than one variable, but we differentiate it with respect to only one of those variables.

5
Q

Gradient

A

The gradient is the vector of all partial derivatives of a function.
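
A small sketch (own NumPy example): the gradient collects one partial derivative per parameter, here approximated with finite differences on a toy loss:

import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # One partial derivative per parameter, collected into a vector.
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += h
        w_minus[i] -= h
        grad[i] = (f(w_plus) - f(w_minus)) / (2.0 * h)
    return grad

loss = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2   # toy loss
print(numerical_gradient(loss, np.array([0.0, 0.0])))    # ~[-2., 4.]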

6
Q

Jacobian Matrix

A

The Jacobian matrix is the generalization of the gradient to a function that maps multiple variables onto multiple output dimensions; it collects the partial derivatives of every output with respect to every input.
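
A sketch (own NumPy example) of a Jacobian for a function mapping two inputs to two outputs; each row holds the partial derivatives of one output with respect to all inputs:

import numpy as np

def f(x):
    # Maps two inputs to two outputs.
    return np.array([x[0] * x[1], x[0] + 3.0 * x[1]])

def numerical_jacobian(f, x, h=1e-5):
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[j] += h
        x_minus[j] -= h
        J[:, j] = (f(x_plus) - f(x_minus)) / (2.0 * h)
    return J

print(numerical_jacobian(f, np.array([2.0, 5.0])))
# Expected analytically: [[5., 2.], [1., 3.]]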

7
Q

Gradient Descent

A

If we can calculate the gradient, we can simply walk a bit in the opposite direction (because the gradient points uphill). Step by step, this leads us to a minimum.
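
A minimal sketch (own example) of this idea in one dimension: repeatedly step against the derivative of a toy loss; the learning rate 0.1 is an arbitrary choice:

def loss(w):          # toy error surface with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):      # derivative of the toy loss
    return 2.0 * (w - 3.0)

w = 0.0               # starting point
for step in range(50):
    w = w - 0.1 * gradient(w)   # walk a bit in the opposite (downhill) direction
print(w, loss(w))     # w is close to 3.0, the minimum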

8
Q

Gradient Descent Rule

A

Last Layer:
Gradient =
-(Target_{k+1} - Output_{k+1}) * Sigma'(Drive_{k+1}) * Activation_k

Otherwise:
Gradient =
Error_{k+1} * Weights_{k+1} * Sigma'(Drive_k) * Activation_{k-1}
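
A sketch of these two rules for a tiny two-layer sigmoid network (own NumPy example, squared-error loss, no bias terms); variable names follow the card but are my own choices:

import numpy as np

def sigma(x):                      # sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

def sigma_prime(x):                # derivative of the sigmoid w.r.t. its drive
    s = sigma(x)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x      = rng.normal(size=3)        # activation of the input layer
target = np.array([0.0, 1.0])
W1 = rng.normal(size=(4, 3))       # weights of the hidden layer
W2 = rng.normal(size=(2, 4))       # weights of the output layer

# Forward pass: drive, then activation, for each layer.
drive1 = W1 @ x
act1   = sigma(drive1)
drive2 = W2 @ act1
output = sigma(drive2)

# Last layer: -(Target - Output) * Sigma'(Drive), times the previous activation.
delta2  = -(target - output) * sigma_prime(drive2)
grad_W2 = np.outer(delta2, act1)

# Otherwise: backpropagated error * weights * Sigma'(Drive), times the previous activation.
delta1  = (W2.T @ delta2) * sigma_prime(drive1)
grad_W1 = np.outer(delta1, x)

print(grad_W2.shape, grad_W1.shape)   # (2, 4) and (4, 3), matching W2 and W1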

9
Q

Gradient Descent Parameter Update

A

New Parameters = Old Parameters - Learning Rate * Gradients
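
In code (own sketch, assuming NumPy arrays and an arbitrary learning rate), the update is a single vectorized line:

import numpy as np

learning_rate = 0.01                               # arbitrary example value
old_parameters = np.array([0.5, -1.2, 3.0])        # some current weights
gradients      = np.array([0.2,  0.4, -1.0])       # dLoss/dParameter, e.g. from backpropagation

new_parameters = old_parameters - learning_rate * gradients
print(new_parameters)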

10
Q

Full Batch Gradient Descent

A

New Parameters = Old Parameters - Learning Rate * 1/N * Sum of all Gradients
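
A sketch of one full-batch update (own example: a linear model with squared-error loss); the per-example gradients are averaged over all N samples before a single update:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))          # N = 100 examples, 2 features
y = X @ np.array([2.0, -1.0])          # targets from a known linear rule
w = np.zeros(2)                        # parameters to learn
learning_rate = 0.1

for epoch in range(200):
    predictions = X @ w
    # Gradient of 0.5 * (y - prediction)^2, averaged over all N examples.
    full_batch_gradient = -(y - predictions) @ X / len(X)
    w = w - learning_rate * full_batch_gradient

print(w)   # close to [2., -1.]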

11
Q

Non Convex Error Surface

A
  • A non-convex function has multiple so-called critical points.
  • Optimization can get stuck at a local minimum or a saddle point, because the slope there is zero.

Solution:
-> Mini-Batch Gradient Descent
-> Stochastic Gradient Descent

12
Q

Full Batch Gradient Descent (Pros & Cons)

A

Always minimizing the same error surface
+ Gradients show a clear direction
+ Guaranteed to converge to a solution
- If a local minimum is reached, it gets stuck there
- Slow or even infeasible for huge data sets

13
Q

Stochastic/Mini Batch Gradient Descent (Pros & Cons)

A

The error surface you minimize changes with each batch
- Gradients can differ heavily from update to update
- Not guaranteed to converge to a solution
+ Has a chance of escaping local minima of the full error surface
+ Faster, since each update only uses a subset of the data

14
Q

Gradient Descent Algorithm

A
  1. Initialize the parameters
  2. Chunk your data into batches of the chosen batch size
  3. For all batches:
    a. Feed the data of one batch through the network
    b. Calculate the gradient for the resulting loss function
    c. Update the parameters
  4. After you're done with all batches, go back to 2.
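
A sketch of this loop (own example, not the lecture's code): a linear model trained with mini-batch gradient descent; data, batch size and learning rate are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# 1. Initialize the parameters
w = np.zeros(3)
learning_rate, batch_size = 0.05, 32

for epoch in range(20):                          # 4. repeat over the re-chunked data
    # 2. Chunk the data into batches of the chosen batch size
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):   # 3. for all batches:
        batch = order[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # 3a. feed the batch through the "network" (here: a linear model)
        predictions = Xb @ w
        # 3b. gradient of the squared-error loss on this batch
        gradient = -(yb - predictions) @ Xb / len(Xb)
        # 3c. update the parameters
        w = w - learning_rate * gradient

print(w)   # close to [1., -2., 0.5]
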
15
Q

Training Step & Epoch

A

Training Step:
Update of the parameters for one batch
Epoch:
Update of the parameters for all batches (one full pass over the data set)
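
A small worked example (own numbers): with 1000 training samples and a batch size of 32, one epoch consists of 32 training steps, i.e. 32 parameter updates.

import math

n_samples, batch_size = 1000, 32
steps_per_epoch = math.ceil(n_samples / batch_size)   # 31 full batches + 1 partial batch
print(steps_per_epoch)                                # 32 training steps per epoch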

16
Q

Momentum

A

Adding the update from the last step to the current one.

-> Optimization can overcome local minima
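
A sketch of the idea (own example; the momentum factor 0.9 and the toy gradient are arbitrary choices): part of the previous update is carried over into the current one.

import numpy as np

learning_rate, momentum = 0.01, 0.9
parameters = np.array([0.5, -1.2])
velocity   = np.zeros_like(parameters)     # remembers the previous update

def toy_gradient(p):
    # Stand-in for a gradient computed by backpropagation.
    return 2.0 * p

for step in range(100):
    velocity   = momentum * velocity - learning_rate * toy_gradient(parameters)
    parameters = parameters + velocity     # current update = new step + carried-over step

print(parameters)   # driven towards the minimum at [0., 0.]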

17
Q

Training & Validation Data

A

Training Data:
Used to train the network, i.e. the parameters are updated on this data
Validation Data:
Used to check how well the model generalizes (usually after each epoch); the network is not trained on this data
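
A sketch of the split (own example): the parameters are only updated on the training portion, while the validation portion is merely evaluated after each epoch.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.5, -0.5])

# Hold out 20% of the data as validation data; train on the rest.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_val,   y_val   = X[split:], y[split:]

w, learning_rate = np.zeros(2), 0.1
for epoch in range(10):
    gradient = -(y_train - X_train @ w) @ X_train / len(X_train)
    w = w - learning_rate * gradient                       # training: parameters change
    val_loss = 0.5 * np.mean((y_val - X_val @ w) ** 2)     # validation: only evaluated
    print(f"epoch {epoch}: validation loss {val_loss:.4f}")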