Neural Networks Flashcards
1
Q
Why artificial neural networks?
A
- biological inspiration: the human brain
- one goal is to reproduce the brain and understand how it works
- reproduce phenomena and biological data
- understand the general computational principles used by the brain
- the focus of ML is to reproduce some of its functions
2
Q
Different models and learning?
A
- supervised learning
- classification, regression, time series
- unsupervised learning
- clustering, data mining, self-organizing maps
- different neural network models for different computational/learning needs
- models differ in:
- network topology
- function computed by a single neuron
- training algorithm
- how training proceeds
3
Q
When to use a Neural Network?
A
- high-dimensional input, discrete/real valued
- discrete/real valued output
- the data may be noisy
- target function unknown
- long training times are acceptable, but evaluation of the learned function must be fast
- final solution does not need to be understood by humans
4
Q
Single neuron - Perceptron
A
- weighted sum of the inputs followed by a step function (sketch below)
- any Boolean function can be implemented by a combination of perceptrons, but not by a single perceptron (e.g. XOR is not linearly separable)
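A minimal sketch of such a unit, assuming NumPy; the function names `step` and `perceptron_output` are my own:

```python
import numpy as np

def step(z):
    # hard threshold: +1 if the weighted sum is non-negative, -1 otherwise
    return 1.0 if z >= 0 else -1.0

def perceptron_output(w, x):
    # weighted sum of the inputs followed by the step function
    # (a bias can be handled by appending a constant 1 to x)
    return step(np.dot(w, x))
```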
5
Q
Perceptron learning algorithm
A
- if the samples in R^n are linearly separable, the algorithm terminates in a finite number of steps
- initialize the weights randomly
- learning rate η ≥ 0
- targets t ∈ {-1, +1}
- training samples of the form (x, t)
- repeat:
- randomly select one of the training samples
- if the output o = sign(w·x) != target t
- w = w + η(t - o)x (see the training-loop sketch below)
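A hedged sketch of this loop, assuming targets in {-1, +1} and NumPy; `train_perceptron`, `eta` and the `max_epochs` safeguard are my own additions, not part of the card:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """X: (n_samples, n_features) inputs; t: targets in {-1, +1}."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])        # random initialization
    for _ in range(max_epochs):
        errors = 0
        for i in rng.permutation(len(X)):               # pick samples in random order
            o = 1.0 if np.dot(w, X[i]) >= 0 else -1.0   # o = sign(w . x)
            if o != t[i]:                               # update only on mistakes
                w += eta * (t[i] - o) * X[i]            # w = w + eta (t - o) x
                errors += 1
        if errors == 0:                                 # all samples classified correctly
            break
    return w
```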
6
Q
Learning rate
A
- controls the size of each learning step
- a small value makes learning more stable
- prevents the weight vector from undergoing too "sharp" changes
7
Q
Is the perceptron differentiable?
A
- no, the hard threshold is not differentiable
- to make the perceptron differentiable, a sigmoid must be used instead of the step function (see the sketch below)
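A small sketch of the sigmoid (logistic) unit that replaces the hard threshold (function names are mine); its derivative σ(z)(1 - σ(z)) is what later makes gradient-based training possible:

```python
import numpy as np

def sigmoid(z):
    # smooth, differentiable alternative to the step function
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # used later by backpropagation
```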
8
Q
Multilayer Neural Networks
A
- composed of several connected units
- compute non-linear functions
- different types of units:
- input units: input variables
- output units: output variables
- hidden units: encode correlations among input and output variables
- weights are defined on the connections between units (forward-pass sketch below)
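A minimal forward pass with one hidden layer of sigmoid units, assuming NumPy; `forward`, `W1` and `W2` are my own names:

```python
import numpy as np

def forward(x, W1, W2):
    """x: input vector; W1: input-to-hidden weights; W2: hidden-to-output weights."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # hidden unit activations (sigmoid)
    o = 1.0 / (1.0 + np.exp(-(W2 @ h)))   # output unit activations (sigmoid)
    return h, o
```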
9
Q
What is the Delta rule?
A
- weight update rule (different from the Perceptron rule)
- allows obtaining a best-fit solution that approximates the target
- exploits gradient descent to explore the hypothesis space
- minimizes an error function
- no hard threshold (the unit output is linear)
- start from a random w and update it in the direction opposite to the gradient (formulas below)
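The usual formulas behind this, as a sketch using the standard squared-error definition; the symbols t_d, o_d and D are my notation:

```latex
% Squared error over the training set D (t_d: target, o_d = \vec{w} \cdot \vec{x}_d: linear output)
E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
% Update in the direction opposite to the gradient (delta rule)
\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{i,d}
```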
10
Q
How does the gradient descent algorithm work?
A
- the weights are initialized with random values
- until convergence:
- for each sample in the dataset
- we compute the output by feeding the input to the neuron (o = w·x)
- we calculate and accumulate the update η(t - o)x for each weight with respect to the target
- after the pass over the dataset, we update the weights (see the sketch below)
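A sketch of this loop for a single linear unit, in the batch form where updates are accumulated and applied once per pass (NumPy; all names are mine):

```python
import numpy as np

def gradient_descent(X, t, eta=0.01, epochs=1000):
    """Single linear unit trained with the delta rule, o = w . x."""
    w = np.random.default_rng(0).normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        delta_w = np.zeros_like(w)
        for x_i, t_i in zip(X, t):
            o = np.dot(w, x_i)                 # compute the output
            delta_w += eta * (t_i - o) * x_i   # accumulate eta (t - o) x
        w += delta_w                           # apply the accumulated update
    return w
```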
11
Q
Differences between batch, stochastic and mini-batch gradient descent?
A
- batch
- whole dataset used for each update
- computationally efficient
- gradient more stable
- feedback on performance only after a long time
- costly memory-wise (all training samples)
- stochastic
- one sample at a time
- immediate feedback on performance
- computationally more expensive
- gradient can be noisy
- mini-batch
- a subset of samples per update
- middle ground: tries to keep the advantages of both while reducing their disadvantages (see the sketch below)
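A hedged sketch showing how the three schedules differ only in how many samples feed each update (NumPy; the `batch_size` convention is my own):

```python
import numpy as np

def train(X, t, eta=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD, 1 -> stochastic GD, k -> mini-batch GD."""
    n, d = X.shape
    bs = n if batch_size is None else batch_size
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, bs):
            batch = idx[start:start + bs]
            o = X[batch] @ w                     # outputs for the current batch
            grad = (t[batch] - o) @ X[batch]     # accumulated (t - o) x over the batch
            w += eta * grad / len(batch)         # one (averaged) update per batch
    return w
```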
12
Q
Backprop algorithm for multilayer perceptron
A
- the weights are initialized with random values
- until convergence:
- for each sample in the dataset
- we compute the vectors of hidden and output unit activations (forward pass)
- we calculate and accumulate the update for each input-to-hidden weight with respect to the target
- we calculate and accumulate the update for each hidden-to-output weight with respect to the target
- after the pass over the dataset, we update the weights (see the sketch below)
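A compact sketch of one such pass for a single hidden layer with sigmoid units and squared error, accumulating the updates as described above (NumPy; all names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_epoch(X, T, W1, W2, eta=0.1):
    """X: inputs, T: targets; W1: input-to-hidden, W2: hidden-to-output weights."""
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x)                       # hidden unit vector
        o = sigmoid(W2 @ h)                       # output unit vector
        delta_o = (t - o) * o * (1 - o)           # output-layer error term
        delta_h = (W2.T @ delta_o) * h * (1 - h)  # error propagated back to the hidden layer
        dW2 += eta * np.outer(delta_o, h)         # accumulate hidden-to-output updates
        dW1 += eta * np.outer(delta_h, x)         # accumulate input-to-hidden updates
    return W1 + dW1, W2 + dW2                     # apply the updates once per pass
```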
13
Q
Training problems in multi-layer networks
A
- choice of the network topology determines the hypothesis space
- number of hidden units determines the complexity of the hypothesis space
- choice of the descent step (learning rate) can be crucial for the convergence
- training is generally slow
- output computation is fast
- many local minima may be present
- reaching the global minimum is difficult
14
Q
How could one try to avoid local minima?
A
- momentum -> a term added to the weight update that imposes a form of inertia on the system (see the sketch below)
- stochastic training -> noise can help escape local minima
- multiple NN training -> same data, different initializations, the best-performing one is selected (on a validation set). Or an ensemble of NNs: the prediction is the (weighted) average of the individual predictions
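A sketch of the momentum idea from the first point: part of the previous update is carried over into the current one (NumPy; `alpha` is my name for the momentum coefficient, often around 0.9):

```python
import numpy as np

def momentum_step(w, grad, velocity, eta=0.01, alpha=0.9):
    """One weight update with momentum: velocity keeps a fraction of the previous step."""
    velocity = alpha * velocity - eta * grad   # inertia term plus current gradient step
    w = w + velocity
    return w, velocity
```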