C3 Flashcards
types of decision regions
- network with a single node -> separates the input space with just one line (a half-plane decision region)
- one-hidden layer network -> realizes a convex region: each hidden node realizes one of the lines bounding the region
- two-hidden layer network -> realizes unions of convex regions (e.g. the union of three convex regions)
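The one-hidden-layer case can be sketched concretely: three threshold units each realize one bounding line of a triangle, and the output unit fires only when all three agree (the triangle, weights, and thresholds below are illustrative choices, not from the card):

```python
import numpy as np

def step(z):
    # hard threshold: 1 if z >= 0, else 0
    return (z >= 0).astype(int)

# Hidden layer: three threshold units, one per bounding line of the
# triangle with vertices (0,0), (1,0), (0,1).
W_hidden = np.array([[ 1.0,  0.0],   # fires when x >= 0
                     [ 0.0,  1.0],   # fires when y >= 0
                     [-1.0, -1.0]])  # fires when x + y <= 1
b_hidden = np.array([0.0, 0.0, 1.0])

def in_convex_region(x):
    h = step(W_hidden @ x + b_hidden)               # which side of each line?
    return int(step(np.array([h.sum() - 2.5]))[0])  # AND: all three must fire

print(in_convex_region(np.array([0.2, 0.2])))  # inside the triangle  -> 1
print(in_convex_region(np.array([1.0, 1.0])))  # outside the triangle -> 0
```

The output unit with threshold 2.5 is simply an AND gate over the three half-planes, which is why one hidden layer suffices for a convex region.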
how to train multi-layer networks?
replace the sign function by its smooth approximation and use the gradient descent algorithm to find weights that minimize the error
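For instance, tanh can serve as the smooth approximation of sign (picking tanh here is an illustrative choice): it agrees with sign for large |z| but has a nonzero derivative everywhere, so gradients can flow:

```python
import numpy as np

# sign has zero gradient almost everywhere, so it cannot be trained by
# gradient descent; tanh is a smooth surrogate that approaches sign(z)
# for large |z| while staying differentiable.
z = np.array([-5.0, -0.5, 0.5, 5.0])
print(np.sign(z))    # hard decisions: -1 or +1
print(np.tanh(z))    # close to sign for large |z|, smooth near 0

# derivative of tanh: 1 - tanh(z)^2, strictly positive everywhere
grad = 1 - np.tanh(z) ** 2
```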
weight update rule
gradient descent method: walk in the direction yielding the maximum decrease of the network error E
Δw_ji = −η · 𝜕E / 𝜕w_ji
w_ji ← w_ji + Δw_ji
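A minimal numeric sketch of the rule on a toy error E(w) = (w − 3)², whose gradient is 2(w − 3); the learning rate value is an arbitrary illustrative choice:

```python
# Gradient descent on E(w) = (w - 3)^2: the rule Δw = -η ∂E/∂w
# repeatedly steps downhill toward the minimum at w = 3.
eta = 0.1   # learning rate (illustrative value)
w = 0.0
for _ in range(100):
    grad = 2 * (w - 3)   # ∂E/∂w
    w += -eta * grad     # Δw = -η ∂E/∂w ; w ← w + Δw
print(round(w, 4))  # -> 3.0
```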
backpropagation algorithm
the algorithm searches for weight values that minimize the total error of the network
consists of the repeated application of these two phases:
- forward pass: the network is activated on one example; the activations of all hidden nodes and the error of each neuron of the output layer are computed
- backward pass: the network error is used for updating the weights. Starting at the output layer, the error is propagated backwards through the network, layer by layer, with the help of the generalized delta rule. Finally, all weights are updated.
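The two phases can be sketched on a tiny 2-2-1 sigmoid network trained with the squared error; the network size, learning rate, and the OR task below are illustrative assumptions, not from the card:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)   # output layer
eta = 0.5                                       # learning rate (assumed)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)  # OR targets

for epoch in range(5000):
    for x, t in zip(X, T):
        # forward pass: hidden activations and output of the network
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # backward pass: generalized delta rule, output layer first
        delta_out = (y - t) * y * (1 - y)             # output-layer deltas
        delta_hid = (W2.T @ delta_out) * h * (1 - h)  # propagated back
        # finally, all weights are updated: Δw = -η · δ · activation
        W2 -= eta * np.outer(delta_out, h); b2 -= eta * delta_out
        W1 -= eta * np.outer(delta_hid, x); b1 -= eta * delta_hid

preds = [int(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)[0] > 0.5) for x in X]
print(preds)
```

This is the online variant (one update per example); the batch variants below only change how many examples feed each update.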
3 update strategies
- full batch mode: weights are updated after all the inputs are processed
- (mini) batch mode: weights are updated after a small random sample of inputs is processed (Stochastic Gradient Descent)
- online mode: weights are updated after each single input is processed
advantages Stochastic Gradient Descent
- additional randomness helps to avoid local minima
- huge savings of CPU time
- easy to execute on GPU cards
stopping criteria
- total mean squared error change: backpropagation is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small
- generalization-based criterion: after each epoch the network is tested for generalization on a separate set of examples (validation set). If the generalization performance is adequate, stop (early stopping: avoids overfitting)
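The early-stopping criterion can be sketched as a training loop that halts when the validation error stops improving; `train_one_epoch`, `validation_error`, and the `patience` parameter are hypothetical helpers, not from the card:

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    # stop once the validation error has not improved for `patience` epochs
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()                 # one pass over the training set
        err = validation_error()          # test generalization on held-out data
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                         # no improvement: stop early
    return best_err
```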
3 common error functions with corresponding activation functions of the output layer
- linear => SSE (sum of squared errors) (regression)
- logistic => cross-entropy (binary)
- softmax => cross-entropy + softmax (multiclass)
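The three pairings, written out as loss computations (a minimal sketch; the function names are illustrative):

```python
import numpy as np

def sse(y, t):
    # linear output, regression: sum of squared errors
    return float(((y - t) ** 2).sum())

def binary_cross_entropy(y, t):
    # logistic (sigmoid) output, binary classification
    return float(-(t * np.log(y) + (1 - t) * np.log(1 - y)).sum())

def softmax(z):
    e = np.exp(z - z.max())   # shift by max for numerical stability
    return e / e.sum()

def softmax_cross_entropy(z, t):
    # softmax output, multiclass classification (t is one-hot)
    return float(-(t * np.log(softmax(z))).sum())
```

Each pairing is chosen so that the output-layer delta simplifies to (y − t), which keeps the backward pass cheap.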