Basics of Learning Flashcards
Perceptron
A perceptron is an FFNN composed of MCP neurons with a step or signum activation function
and a threshold T
What is the counterpart of the biological stimulus for a perceptron?
The input pattern
For what task can a perceptron be used?
Classification of patterns
How is the threshold managed together with the weights?
The threshold is considered as an additional weight of the neuron with a virtual constant input equal to -1
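A minimal Python sketch of this convention (numpy assumed; the numbers and names are illustrative, not from the course material):

```python
import numpy as np

def augment(x):
    """Append the virtual constant input -1 so the threshold
    becomes just another weight of the neuron."""
    return np.append(x, -1.0)

w = np.array([0.4, -0.2])   # example input weights
T = 0.1                     # example threshold
w_aug = np.append(w, T)     # threshold stored as the last weight

x = np.array([1.0, 0.5])
activation = np.dot(w_aug, augment(x))   # equals w.x - T
```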
Decision boundary: what is it and how is it oriented?
The locus of input values for which the action potential is 0. It is orthogonal to the weight vector
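A short worked derivation of the orthogonality claim (notation assumed: w weight vector, T threshold):

```latex
\text{Boundary: } \{\mathbf{x} : \mathbf{w}\cdot\mathbf{x} - T = 0\}.
\quad \mathbf{w}\cdot\mathbf{x}_1 - T = 0,\;\; \mathbf{w}\cdot\mathbf{x}_2 - T = 0
\;\Rightarrow\; \mathbf{w}\cdot(\mathbf{x}_1 - \mathbf{x}_2) = 0
```

so w is orthogonal to every direction lying inside the boundary.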
Rosenblatt perceptron learning rule
An incremental procedure: we start from an initial weight vector, then
1) the weight vector is iteratively updated using an on-line strategy
2) each pattern k in the training set contributes to the weight-increment vector by means of the error signal
3) one iteration of the iterative procedure requires the evaluation of all R patterns
dw = eta * (t_k - y_k) * x_k   (t_k target, y_k perceptron output for pattern k, eta learning rate)
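A minimal Python sketch of the rule with on-line updates (numpy assumed; patterns already augmented with the -1 virtual input, targets in {-1, +1}):

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, max_epochs=100):
    """X: (R, n) augmented patterns, t: targets in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):          # one iteration = all R patterns
        errors = 0
        for x_k, t_k in zip(X, t):
            y_k = np.sign(np.dot(w, x_k))    # step/signum output
            if y_k != t_k:                   # update only on misclassification
                w += eta * (t_k - y_k) * x_k
                errors += 1
        if errors == 0:                      # converged (linearly separable data)
            break
    return w
```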
When does the Rosenblatt perceptron learning rule correct the weight vector?
If and only if a misclassification occurs
What kind of problems can the perceptron solve?
Only the linearly separable ones: a hyperplane must exist that completely separates the two classes of patterns
How to extend basic perceptron?
- continuous output
- non-linear continuous activation function
- smooth transition near 0
How to pass from continuous output to categorical?
Softmax network (or manual thresholding in simple cases)
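A minimal Python sketch of the softmax-then-argmax step (numpy assumed; the example outputs are illustrative):

```python
import numpy as np

def softmax(u):
    """Map continuous outputs u to class probabilities (numerically stable form)."""
    z = u - np.max(u)            # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

u = np.array([2.0, 0.5, -1.0])   # continuous network outputs (example)
p = softmax(u)                   # probabilities summing to 1
label = int(np.argmax(p))        # categorical decision
```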
Error function with C1 activation functions
E = 1/2 * sum_k sum_i (t_i - u_i)^2, summed over the training patterns k and the output units i (t_i target, u_i network output)
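A minimal Python sketch of this sum-of-squares error (numpy assumed; array names are illustrative):

```python
import numpy as np

def sse(targets, outputs):
    """E = 1/2 * sum over patterns k and output units i of (t_i - u_i)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)
```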
Why use the square in the error function?
It makes the error positive and penalizes large errors more
Gradient descent
It is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point
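A minimal Python sketch of the idea (numpy assumed; the example function and step size are illustrative):

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.01, n_steps=1000):
    """Take steps proportional to the negative gradient at the current point."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w -= eta * grad(w)       # step against the gradient
    return w

# example: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=[0.0])   # approaches 3
```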
what is the learning rate?
It modulates the amplitude of the gradient vector in gradient descent
Delta rule update formula
dw_ij = eta * (t_i - u_i) * f'(P_i) * x_j   (eta learning rate)
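A minimal Python sketch of one delta-rule step for a single-layer network (numpy assumed; f and f_prime are the activation and its derivative, supplied by the caller):

```python
import numpy as np

def delta_rule_update(w, x, t, eta, f, f_prime):
    """One on-line delta-rule step: dw_ij = eta * (t_i - u_i) * f'(P_i) * x_j."""
    P = w @ x                       # action potentials, one per output unit
    u = f(P)                        # continuous outputs
    dw = eta * np.outer((t - u) * f_prime(P), x)
    return w + dw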
Derivative of logsig
f(1-f)
derivative of tanh
1-f^2
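A minimal Python sketch of both derivatives, expressed through the activation value itself (numpy assumed):

```python
import numpy as np

def logsig(P):
    return 1.0 / (1.0 + np.exp(-P))

def logsig_prime(P):
    f = logsig(P)
    return f * (1.0 - f)          # derivative written in terms of f

def tanh_prime(P):
    f = np.tanh(P)
    return 1.0 - f ** 2           # derivative written in terms of f
```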
What does it mean to “reinforce the learning”?
Feeding the network several times with the same training set of patterns
Problem of local minima: how to solve it?
If we start near a local minimum we may end up there instead of at the global minimum. Since the result depends on the starting guess, starting from a range of different initial weight sets increases our chances of finding the global minimum
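A minimal Python sketch of this restart heuristic (numpy assumed; train and error_of are hypothetical placeholders for the actual training routine and error evaluation):

```python
import numpy as np

def train_with_restarts(train, error_of, n_restarts=10, n_weights=20, seed=0):
    """Run training from several random initial weight sets, keep the best run."""
    rng = np.random.default_rng(seed)
    best_w, best_err = None, np.inf
    for _ in range(n_restarts):
        w0 = rng.uniform(-0.1, 0.1, size=n_weights)   # small random init
        w = train(w0)
        err = error_of(w)
        if err < best_err:
            best_w, best_err = w, err
    return best_w
```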
What is an important problem of linear activation functions (apart from linearity)?
They are suitable for continuous output but may leave the parameters unbounded
How to initialize weights? Why?
Randomly, in a small range around zero, because the sigmoid function can easily saturate for large weight values
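A minimal Python sketch (numpy assumed; the range and layer sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_hidden = 4, 8
W = rng.uniform(-0.1, 0.1, size=(n_hidden, n_inputs))  # small values around zero
```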
Type of weight update?
On-line, batch, mini-batch
on-line updating
each pattern error contributes sequentially to the weight updating.
The search in weight space is more stochastic, which helps avoid local minima
batch updating
implies that all the pattern errors are accumulated before the weights are updated.
Small pattern errors can be smoothed out, so it is less sensitive to noise
mini-batch updating
Use of a subset S of the overall training dataset
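A minimal Python sketch contrasting the three modes over one epoch (numpy assumed; grad_on_batch is a hypothetical helper returning dE/dw on a subset of patterns):

```python
import numpy as np

def epoch(w, X, T, eta, grad_on_batch, mode="mini-batch", batch_size=32):
    """One pass through the training set with the chosen update mode."""
    idx = np.random.permutation(len(X))
    if mode == "batch":
        w = w - eta * grad_on_batch(w, X, T)               # one update per epoch
    elif mode == "on-line":
        for i in idx:                                      # one update per pattern
            w = w - eta * grad_on_batch(w, X[i:i+1], T[i:i+1])
    else:                                                  # mini-batch
        for s in range(0, len(X), batch_size):
            b = idx[s:s+batch_size]                        # subset S of the data
            w = w - eta * grad_on_batch(w, X[b], T[b])
    return w
```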
possible stop criteria
1)maximum number of iterations
2)euclidean norm of the gradient vector less than a predefined threshold
3)error function less than a predefined threshold
4)hybrid criterion
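A minimal Python sketch of a hybrid stop test combining the criteria above (numpy assumed; the thresholds are illustrative):

```python
import numpy as np

def should_stop(it, g, E, max_iter=10000, grad_tol=1e-5, err_tol=1e-4):
    """Stop on iteration budget, small gradient norm, or small error,
    whichever occurs first (hybrid criterion)."""
    return (it >= max_iter
            or np.linalg.norm(g) < grad_tol
            or E < err_tol)
```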
Regularization factor
norm(w)^2: keeps the weights as small as possible. It is scaled by a regularization rate and added to the error function
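A minimal Python sketch of how the term enters the error and its gradient (numpy assumed; lam is the regularization rate):

```python
import numpy as np

def regularized_error(E_data, w, lam=1e-3):
    """Total error = data error + lam * ||w||^2 (keeps weights small)."""
    return E_data + lam * np.dot(w, w)

def regularized_gradient(grad_data, w, lam=1e-3):
    """The L2 term adds 2 * lam * w to the gradient of the data error."""
    return grad_data + 2.0 * lam * w
```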
heuristic rules for training data
1)training data should be representative for the target task
2) avoid many examples of one type at the expense of another
3) if one class of pattern is easy to learn, having a large number of patterns from that class in the training set will only slow down the overall learning process
4) rescale the input values (zero-mean and unit-std normalization, see the sketch below)
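A minimal Python sketch of rule 4, zero-mean and unit-std rescaling per input feature (numpy assumed):

```python
import numpy as np

def standardize(X_train):
    """Rescale inputs to zero mean and unit standard deviation, per feature."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # avoid division by zero on constant features
    return (X_train - mu) / sigma, mu, sigma
```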
How to prevent under-fitting
The network must have a sufficient number of hidden units. Use a convergence threshold on the error
How to prevent over-fitting
Avoid too many layers and units
Superimpose additional noise on the training patterns
Training can be stopped before convergence
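A minimal Python sketch of stopping before convergence by watching a held-out validation error (train_one_epoch and val_error are hypothetical placeholders):

```python
import numpy as np

def train_early_stopping(w, train_one_epoch, val_error, patience=5, max_epochs=500):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_w, best_err, waited = w, np.inf, 0
    for _ in range(max_epochs):
        w = train_one_epoch(w)
        err = val_error(w)
        if err < best_err:
            best_w, best_err, waited = w, err, 0
        else:
            waited += 1
            if waited >= patience:      # stopped before full convergence
                break
    return best_w
```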