Multilayer Perceptron Flashcards

1
Q

What is the shape of the decision boundary of a single-layer perceptron?

A

A linear decision boundary (a hyperplane in the input space).

2
Q

What is a limitation of a single-layer perceptron that is overcome by using a multi-layer perceptron?

A

A single-layer perceptron can only separate linearly separable classes; an MLP can also solve non-linearly-separable problems such as XOR (see the sketch below).
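To make this concrete, here is a minimal sketch (assuming NumPy) of a two-layer perceptron with hand-picked weights and step activations that computes XOR, which no single-layer perceptron can represent:

    import numpy as np

    def step(z):
        # Heaviside step activation: 1 if z > 0, else 0
        return (z > 0).astype(int)

    def xor_mlp(x):
        # Hidden layer: unit 1 computes OR, unit 2 computes AND
        W1 = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
        b1 = np.array([-0.5, -1.5])
        h = step(x @ W1 + b1)
        # Output unit: OR AND (NOT AND), i.e. XOR
        w2 = np.array([1.0, -1.0])
        return step(h @ w2 - 0.5)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    print(xor_mlp(X))   # -> [0 1 1 0]

These particular weights are just one of many valid choices; the point is that the hidden layer remaps the inputs so that the output unit only has to draw a single line.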

3
Q

How is an MLP trained?

A

Using the backpropagation algorithm (gradient descent on the network error).

4
Q

Differentiate between a single-layer and a multi-layer perceptron.

A

A Multi-Layer Perceptron (MLP) contains one or more hidden layers (in addition to one input and one output layer). While a single-layer perceptron can only learn linear functions, a multi-layer perceptron can also learn non-linear functions.

5
Q

What is the weight update rule for the gradient descent method?

A

“Walk” in the direction yielding the maximum decrease of the network error E; this direction is the opposite of the gradient of E. Each weight is therefore updated as

w_new = w_old − η · ∂E/∂w

where η > 0 is the learning rate.
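A minimal sketch of this rule on a made-up one-dimensional error function E(w) = (w − 3)², with an illustrative fixed learning rate:

    # Gradient descent on a toy error E(w) = (w - 3)**2 (illustrative only)
    def grad_E(w):
        return 2 * (w - 3)          # dE/dw

    w, eta = 0.0, 0.1               # initial weight, learning rate
    for _ in range(50):
        w -= eta * grad_E(w)        # step opposite to the gradient
    print(w)                        # converges towards the minimum at w = 3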

6
Q

What is the Delta rule?

A

The delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It is a special case of the more general backpropagation algorithm.
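Stated concretely for a single linear unit with output y = w · x, target t and learning rate η, the update is Δw_i = η (t − y) x_i. A minimal NumPy sketch on made-up data:

    import numpy as np

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # toy inputs
    t = np.array([1.0, 2.0, 3.0])                       # toy targets
    w, eta = np.zeros(2), 0.1

    for epoch in range(200):
        for x_i, t_i in zip(X, t):
            y_i = w @ x_i                  # output of the linear unit
            w += eta * (t_i - y_i) * x_i   # delta rule update
    print(w)   # approaches [2, 1], which fits the toy targets exactly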

7
Q

Briefly describe the working of backpropagation.

A

– Computing the output of the network and the corresponding error,

– Computing the contribution of each weight to the error,

– Adjusting the weights accordingly (in proportion to their contribution to the error); see the sketch below.
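A minimal NumPy sketch of these three steps for a one-hidden-layer sigmoid network trained on XOR (layer sizes, learning rate and seed are illustrative choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    eta = 0.5

    for _ in range(5000):
        # 1. Compute the output of the network and the error
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # 2. Compute each weight's contribution to the error
        dY = (Y - T) * Y * (1 - Y)      # output-layer delta
        dH = (dY @ W2.T) * H * (1 - H)  # hidden-layer delta
        # 3. Adjust the weights according to those contributions
        W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
        W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

    print(Y.round(2))   # typically approaches [[0], [1], [1], [0]]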

8
Q

What are the three update strategies for backpropagation?

A

Full-batch mode (all inputs at once; conceptually “correct”): weights are updated after all the inputs have been processed.

Batch (mini-batch) mode (a small, random sample of inputs; “approximate”): weights are updated after each small random sample of inputs has been processed (Stochastic Gradient Descent).

On-line mode (one input at a time): weights are updated after processing each single input.

All three loops are contrasted in the sketch below.
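A sketch of the three strategies for a toy linear model (data, batch size and learning rate are made up; a real implementation would also reshuffle the data each epoch):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # toy inputs
    t = X @ np.array([1.0, -2.0, 0.5])     # toy targets

    def grad(w, Xb, tb):
        # Gradient of the mean squared error on a batch
        return 2 * Xb.T @ (Xb @ w - tb) / len(tb)

    eta, epochs = 0.05, 20

    # Full-batch mode: one update per epoch, using all inputs
    w = np.zeros(3)
    for _ in range(epochs):
        w -= eta * grad(w, X, t)

    # Mini-batch mode (SGD): one update per small random sample
    w = np.zeros(3)
    for _ in range(epochs):
        for i in range(0, len(X), 10):     # batches of 10
            w -= eta * grad(w, X[i:i+10], t[i:i+10])

    # On-line mode: one update per single input
    w = np.zeros(3)
    for _ in range(epochs):
        for i in range(len(X)):
            w -= eta * grad(w, X[i:i+1], t[i:i+1])

    print(w)   # each strategy approaches [1, -2, 0.5]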

9
Q

What is the advantage of using Stochastic Gradient Descent?

A
  • The additional randomness helps to avoid local minima
  • Huge savings in CPU time
  • Easy to execute on GPU cards
  • The “approximate” gradient works almost as well as the “exact” gradient (almost the same convergence rate)
10
Q

Give examples of a stopping criterion and an EARLY stopping criterion.

A

– Total mean squared error change: backpropagation is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small.

– Generalization-based criterion: after each epoch the network is tested for generalization using a different set of examples (the validation set); if the generalization performance is adequate, stop.

Early stopping: stop training as soon as the error on the validation set increases (see the sketch below).
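A sketch of the early-stopping loop on a made-up noisy regression problem (the strict “stop at the first increase” shown here is often softened with a patience parameter in practice):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy noisy data, split into training and validation sets
    X = rng.normal(size=(200, 5))
    t = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
    X_tr, t_tr, X_val, t_val = X[:150], t[:150], X[150:], t[150:]

    w, eta = np.zeros(5), 0.01
    best_err, best_w = np.inf, w.copy()

    for epoch in range(500):
        # One epoch of gradient descent on the training set
        w -= eta * 2 * X_tr.T @ (X_tr @ w - t_tr) / len(t_tr)
        # Test generalization on the validation set
        val_err = np.mean((X_val @ w - t_val) ** 2)
        if val_err < best_err:
            best_err, best_w = val_err, w.copy()
        else:
            break   # early stopping: validation error increased

    print(epoch, best_err)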

11
Q

What happens when there are too many or too few hidden units? How do you solve this?

A
  • Too few hidden units may prevent the network from adequately learning the data and the underlying concept.
  • Too many hidden units lead to overfitting.
  • We can solve this by choosing the optimal number of hidden units with a cross-validation scheme (see the sketch below).
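For instance, the number of hidden units could be selected by cross-validation with scikit-learn (the dataset and candidate sizes below are arbitrary illustrations):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    # 5-fold cross-validation over candidate hidden-layer sizes
    search = GridSearchCV(
        MLPClassifier(max_iter=2000, random_state=0),
        param_grid={"hidden_layer_sizes": [(2,), (4,), (16,), (64,)]},
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_)   # size with the best cross-validated score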
12
Q

What kind of activation and error functions would you use for:

  1. Regression problems
  2. Binary Classification problems
  3. Multiclass classification
A
  1. For regression problems, use linear outputs and the sum-squared-error function.
  2. For binary classification problems, use a logistic output unit and minimize the cross-entropy function.
  3. For multi-class classification problems, use the softmax activation function and minimize the cross-entropy function (all three pairings are sketched below).
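A NumPy sketch of the three output/loss pairings (z is the network's pre-activation output, t the target; all values are made-up examples):

    import numpy as np

    def sse(y, t):                  # regression: linear output + SSE
        return np.sum((y - t) ** 2)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def binary_xent(z, t):          # binary: logistic output + cross-entropy
        y = sigmoid(z)
        return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

    def softmax(z):
        e = np.exp(z - z.max())     # shift for numerical stability
        return e / e.sum()

    def multiclass_xent(z, t):      # multi-class: softmax + cross-entropy
        return -np.sum(t * np.log(softmax(z)))

    print(sse(np.array([1.2, 0.8]), np.array([1.0, 1.0])))
    print(binary_xent(np.array([2.0]), np.array([1.0])))
    print(multiclass_xent(np.array([2.0, 0.5, -1.0]), np.array([1.0, 0.0, 0.0])))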