Multilayer Perceptron Flashcards
What is the shape of the decision boundary for a single layer perceptron?
Linear decision boundary
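Concretely (a standard formulation, with weight vector w and bias b), the boundary is the hyperplane where the weighted input sum crosses the threshold:

```latex
\{\, \mathbf{x} \;:\; \mathbf{w}^\top \mathbf{x} + b = 0 \,\}
```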
What is a limitation of the single-layer perceptron that is overcome by using a multi-layer perceptron?
A single-layer perceptron can only solve linearly separable problems; an MLP can also solve non-linearly-separable problems such as XOR.
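As a small illustration, here is a hand-wired sketch (not a trained network; weights and thresholds chosen by hand): two step-activation hidden units, one acting as OR and one as AND, are enough to express XOR.

```python
import numpy as np

def step(z):                     # threshold activation
    return (z > 0).astype(int)

def xor_mlp(x):
    # Hidden layer: unit 1 computes OR, unit 2 computes AND.
    h = step(x @ np.array([[1, 1], [1, 1]]) + np.array([-0.5, -1.5]))
    # Output: OR(x1,x2) AND NOT AND(x1,x2) == XOR(x1,x2).
    return step(h @ np.array([1, -1]) - 0.5)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_mlp(X))                # -> [0 1 1 0]
```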
How is an MLP trained?
Using backpropagation
Differentiate between a single-layer and a multi-layer perceptron.
A Multi-Layer Perceptron (MLP) contains one or more hidden layers (in addition to one input and one output layer). While a single-layer perceptron can only learn linear functions, a multi-layer perceptron can also learn non-linear functions.
What is the weight update rule for the gradient descent method?
“Walk” in the direction yielding the maximum decrease of the network error E; this direction is the opposite of the gradient of E.
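In symbols, a standard formulation of the update (η is the learning rate):

```latex
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
```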
What is the Delta rule?
The delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It is a special case of the more general backpropagation algorithm.
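In its common textbook form (η the learning rate, t the target, y the neuron's output, x_i the i-th input):

```latex
\Delta w_i = \eta \, (t - y) \, x_i
```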
Briefly describe how backpropagation works.
- Computing the output of the network and the corresponding error,
- Computing the contribution of each weight to the error,
- Adjusting the weights accordingly (in proportion to their contribution to the error).
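A minimal NumPy sketch of these three steps for a one-hidden-layer sigmoid network on XOR (the learning rate, layer sizes, and epoch count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
t = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
eta = 0.5

for epoch in range(5000):
    # 1. Forward pass: network output and error E.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    E = 0.5 * np.sum((y - t) ** 2)

    # 2. Backward pass: each weight's contribution to the error.
    d2 = (y - t) * y * (1 - y)            # dE/d(pre-activation), output layer
    d1 = (d2 @ W2.T) * h * (1 - h)        # dE/d(pre-activation), hidden layer

    # 3. Adjust the weights in proportion to their contribution.
    W2 -= eta * h.T @ d2;  b2 -= eta * d2.sum(axis=0)
    W1 -= eta * X.T @ d1;  b1 -= eta * d1.sum(axis=0)

print(np.round(y.ravel(), 2))   # should approach [0, 1, 1, 0] (depends on init)
```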
What are the three update strategies for backpropagation?
- Full-batch mode (all inputs at once; conceptually “correct”): weights are updated after all the inputs are processed.
- Mini-batch mode (a small, random sample of inputs; “approximate”): weights are updated after a small random sample of inputs is processed (Stochastic Gradient Descent).
- On-line mode (one input at a time): weights are updated after processing each single input.
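A sketch of the three schedules; grad(W, X, t) is a hypothetical stand-in for a function returning dE/dW on the given (subset of the) data, since only the update schedules matter here:

```python
import numpy as np

def full_batch(W, X, t, grad, eta, epochs):
    for _ in range(epochs):
        W -= eta * grad(W, X, t)                    # one update per epoch
    return W

def mini_batch(W, X, t, grad, eta, epochs, batch_size=32):
    for _ in range(epochs):
        idx = np.random.permutation(len(X))
        for s in range(0, len(X), batch_size):
            b = idx[s:s + batch_size]               # small random sample (SGD)
            W -= eta * grad(W, X[b], t[b])
    return W

def online(W, X, t, grad, eta, epochs):
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):     # one input at a time
            W -= eta * grad(W, X[i:i + 1], t[i:i + 1])
    return W
```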
What is the advantage of using Stochastic Gradient Descent?
- Additional randomness helps to avoid local minima
- Huge savings in CPU time
- Easy to execute on GPU cards
- “Approximated gradient” works almost the same as “exact gradient” (almost the same convergence rate)
Give examples of a stopping criterion and an EARLY stopping criterion
- Total mean squared error change: backprop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small.
- Generalization-based criterion: after each epoch the NN is tested for generalization using a different set of examples (the validation set). If the generalization performance is adequate, then stop. (Early Stopping)
Early Stopping: stop training as soon as the error on the validation set increases.
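A minimal early-stopping loop; train_one_epoch, validation_error, and model.copy() are hypothetical placeholders for the actual training and evaluation code:

```python
def train_with_early_stopping(model, patience=5, max_epochs=1000):
    best_err, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)            # one pass over the training set
        err = validation_error(model)     # error on the held-out validation set
        if err < best_err:
            best_err, best_state, bad_epochs = err, model.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:    # validation error keeps increasing
                break                     # -> stop early
    return best_state                     # weights with the best validation error
```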
What happens when there are too many or too few hidden units? How do you solve this?
- Too few hidden units may prevent the network from adequately learning the data and the underlying concept (underfitting).
- Too many hidden units lead to overfitting.
- We can solve this by choosing the optimal number of hidden units via a cross-validation scheme, as sketched below.
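One possible such scheme, sketched with scikit-learn (assuming features X and labels y are already loaded; the candidate sizes are arbitrary example values):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def pick_hidden_units(X, y, candidates=(2, 4, 8, 16, 32, 64)):
    scores = {}
    for h in candidates:
        clf = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000)
        scores[h] = cross_val_score(clf, X, y, cv=5).mean()
    return max(scores, key=scores.get)   # size with the best mean CV score
```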
What kind of activation and error functions would you use for:
- Regression problems
- Binary Classification problems
- Multiclass classification
- For regression problems, use linear outputs and the sum-squared-error function.
- For binary classification problems, use a logistic output unit and minimize the cross-entropy function.
- For multi-class classification problems, use the softmax activation function and minimize the cross-entropy function.
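A NumPy sketch of the three pairings (the pre-activations z and the targets are made-up example values):

```python
import numpy as np

z = np.array([1.0, -0.5, 2.0])             # made-up pre-activations (logits)

# Regression: linear (identity) outputs + sum-squared error
y = z
t = np.array([0.8, 0.0, 1.5])              # made-up regression targets
sse = 0.5 * np.sum((y - t) ** 2)

# Binary classification: logistic output unit + cross-entropy
p = 1.0 / (1.0 + np.exp(-z[0]))            # single logistic unit
t_bin = 1.0
bce = -(t_bin * np.log(p) + (1 - t_bin) * np.log(1 - p))

# Multi-class classification: softmax outputs + cross-entropy
probs = np.exp(z - z.max())
probs /= probs.sum()
ce = -np.log(probs[2])                     # cross-entropy for true class 2

print(sse, bce, ce)
```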