Basics of Learning Flashcards

1
Q

Perceptron

A

A perceptron, i.e. an FFNN composed of MCP neurons with a step or signum activation function
and a threshold T

2
Q

What corresponds to the biological stimulus for a perceptron?

A

The input pattern

3
Q

For what task can a perceptron be used?

A

Classification of patterns

4
Q

How is the threshold managed together with the weights?

A

The threshold is treated as an additional weight of the neuron with a constant virtual input equal to -1

5
Q

Decision boundary: what is it and what is its orientation?

A

The locus of input values for which the action potential is 0. It is orthogonal to the weight vector

6
Q

Rosenblatt perceptron learning rule

A

An incremental procedure: we start from an initial weight vector, then
1) the weight vector is iteratively updated using an on-line strategy
2) each pattern k in the training set contributes to the weight increment vector by means of the error signal
3) one iteration of the procedure requires the evaluation of all R patterns
dw_j = eta*(t^k - u^k)*x_j^k
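
A minimal Python sketch of one Rosenblatt epoch, assuming bipolar targets in {-1, +1}, a signum activation, and the threshold folded into the weights via a constant -1 input column (all names are illustrative):

  import numpy as np

  def signum(p):
      return np.where(p >= 0, 1.0, -1.0)

  def rosenblatt_epoch(w, X, t, eta=0.1):
      # X: (R, n) patterns, already extended with a constant -1 column
      # for the threshold; t: (R,) bipolar targets in {-1, +1}
      for x_k, t_k in zip(X, t):
          u_k = signum(w @ x_k)            # output for pattern k
          w = w + eta * (t_k - u_k) * x_k  # nonzero only on misclassification
      return w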

7
Q

When does the Rosenblatt perceptron learning rule correct the weight vector?

A

If and only if a misclassification occurs

8
Q

What kind of problems can the perceptron solve?

A

Only the linearly separable ones: there must exist a hyperplane that completely separates the two pattern classes

9
Q

How can the basic perceptron be extended?

A
  1. continuous output
  2. non-linear continuous activation function
  3. smooth transition near 0

10
Q

How to pass from a continuous output to a categorical one?

A

Softmax network (or manual thresholding in simple cases)
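
A minimal sketch of the softmax step, assuming the maximum is subtracted before exponentiation for numerical stability (a common convention, not stated in the card):

  import numpy as np

  def softmax(u):
      # map continuous network outputs u to class probabilities
      e = np.exp(u - np.max(u))      # shift for numerical stability
      return e / e.sum()

  probs = softmax(np.array([2.0, 1.0, 0.1]))
  label = int(np.argmax(probs))      # categorical decision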

11
Q

Error function with C1 activation functions

A

E = 1/2 * sum_k sum_i (t_i^k - u_i^k)^2
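
A one-line check of this sum-of-squared-errors form, where the 1/2 factor is the usual convention that cancels the 2 arising in the derivative (an assumption consistent with the delta rule in card 15):

  import numpy as np

  def sse(T, U):
      # T: (R, n) targets t, U: (R, n) network outputs u
      return 0.5 * np.sum((T - U) ** 2)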

12
Q

Why use the square in the error function?

A

It makes the error positive and penalizes large errors more

13
Q

Gradient descent

A

It is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point
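
A minimal gradient-descent sketch on a generic differentiable function, assuming a fixed learning rate eta and a fixed number of steps (illustrative choices):

  import numpy as np

  def gradient_descent(grad, w0, eta=0.1, steps=100):
      # step proportional to the negative of the gradient at the current point
      w = np.asarray(w0, dtype=float)
      for _ in range(steps):
          w -= eta * grad(w)
      return w

  # example: minimize f(w) = ||w||^2, whose gradient is 2w
  w_min = gradient_descent(lambda w: 2 * w, [3.0, -4.0])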

14
Q

What is the learning rate?

A

It scales the amplitude of the step taken along the negative gradient in gradient descent

15
Q

Delta rule update formula

A

dw_ij = eta*(t_i - u_i)*f'(P_i)*x_j
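
A minimal sketch of one delta-rule update for a single layer, assuming a logsig activation so that f'(P) = f(P)*(1 - f(P)) as in the next card (names are illustrative):

  import numpy as np

  def logsig(p):
      return 1.0 / (1.0 + np.exp(-p))

  def delta_rule_step(W, x, t, eta=0.1):
      # W: (n_out, n_in) weights, x: (n_in,) input, t: (n_out,) targets
      P = W @ x                  # action potentials P_i
      u = logsig(P)              # outputs u_i
      dW = eta * np.outer((t - u) * u * (1 - u), x)  # (t_i-u_i)*f'(P_i)*x_j
      return W + dW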

16
Q

Derivative of logsig

A

f(1-f)

17
Q

Derivative of tanh

A

1-f^2
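
A quick numerical check of both identities, comparing f(1-f) and 1-f^2 against central finite differences (a sanity-check sketch, not part of the cards):

  import numpy as np

  x, h = 0.7, 1e-6
  f = 1.0 / (1.0 + np.exp(-x))                                # logsig
  fd = (1/(1+np.exp(-(x+h))) - 1/(1+np.exp(-(x-h)))) / (2*h)
  assert abs(f * (1 - f) - fd) < 1e-6                         # f(1-f)

  g = np.tanh(x)
  gd = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
  assert abs((1 - g**2) - gd) < 1e-6                          # 1-f^2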

18
Q

What does it mean to “reinforce the learning”?

A

Feeding the network several times with the same training set of patterns

19
Q

The problem of local minima: how can it be solved?

A

If we start near a local minimum we may end up there instead of at the global minimum. The outcome depends on the starting guess, so starting from a range of different initial weight sets increases our chances of finding the global minimum

20
Q

What is an important problem of linear activation functions (apart from linearity)?

A

They are suitable for continuous output but may leave the parameters unbounded (the weights can grow without limit)

21
Q

How to initialize weights? Why?

A

Randomly, in a small range around zero, because the sigmoid function can easily saturate for large weight values
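
A minimal initialization sketch, assuming a uniform range of +/-0.1 (the exact range is an assumption; the card only says "small"):

  import numpy as np

  rng = np.random.default_rng(0)
  n_out, n_in = 3, 5
  W = rng.uniform(-0.1, 0.1, size=(n_out, n_in))  # small values around zero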

22
Q

Types of weight update?

A

On-line, batch, mini-batch

23
Q

On-line updating

A

Each pattern error contributes sequentially to the weight update.
The search in weight space is more stochastic, which helps avoid local minima

24
Q

Batch updating

A

All the pattern errors are accumulated before the weights are updated.
Small pattern errors can be smoothed out, so the update is less sensitive to them

25
Q

Mini-batch updating

A

Each update uses a subset S of the overall training dataset
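
A minimal sketch of mini-batch updating, assuming per-epoch shuffling and a hypothetical grad callback returning dE/dW on a subset (batch size is illustrative):

  import numpy as np

  def minibatch_epoch(W, X, T, grad, eta=0.1, batch=16):
      # X: (R, n) patterns, T: (R, m) targets
      idx = np.random.permutation(len(X))  # shuffle once per epoch
      for s in range(0, len(X), batch):
          S = idx[s:s + batch]             # subset S of the training set
          W -= eta * grad(W, X[S], T[S])   # one update per mini-batch
      return W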

26
Q

Possible stop criteria

A

1) maximum number of iterations
2) Euclidean norm of the gradient vector below a predefined threshold
3) error function below a predefined threshold
4) a hybrid criterion combining the above; see the sketch after this list
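
A minimal sketch of such a hybrid stop check, with all thresholds as illustrative assumptions:

  import numpy as np

  def should_stop(it, grad, E, max_it=1000, g_tol=1e-4, e_tol=1e-3):
      # hybrid criterion: stop as soon as any single criterion fires
      return (it >= max_it
              or np.linalg.norm(grad) < g_tol  # gradient norm threshold
              or E < e_tol)                    # error function threshold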

27
Q

Regularization factor

A

norm(w)^2: it keeps the weights as small as possible. It is scaled by a regularization rate and added to the error function
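
A minimal sketch of the regularized error, assuming the sum-of-squared-errors E from card 11 and writing the regularization rate as lam (symbol choice is an assumption):

  import numpy as np

  def regularized_error(T, U, W, lam=0.01):
      E = 0.5 * np.sum((T - U) ** 2)   # data term (card 11)
      return E + lam * np.sum(W ** 2)  # plus the scaled norm(w)^2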

28
Q

Heuristic rules for training data

A

1) training data should be representative of the target task
2) avoid many examples of one type at the expense of another
3) if one class of patterns is easy to learn, having a large number of patterns from that class in the training set will only slow down the overall learning process
4) rescale input values (zero-mean and standard-deviation normalization); see the sketch after this list
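
A minimal sketch of the rescaling in rule 4, assuming per-feature (column-wise) normalization and non-constant features:

  import numpy as np

  def zscore(X):
      # rescale each input feature to zero mean and unit standard deviation
      return (X - X.mean(axis=0)) / X.std(axis=0)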

29
Q

How to prevent under-fitting

A

The network must have a sufficient number of hidden units. Use a convergence threshold

30
Q

How to prevent over-fitting

A

Avoid too many layers and units.
Superimpose additional noise on the training patterns (see the sketch below).
Training can be stopped before convergence
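
A minimal sketch of the noise-injection idea, assuming zero-mean Gaussian noise of small amplitude (distribution and scale are assumptions):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.uniform(-1, 1, size=(100, 4))              # stand-in training patterns
  X_noisy = X + rng.normal(0.0, 0.05, size=X.shape)  # small zero-mean jitter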