Neural Networks Flashcards

1
Q

Why artificial neural networks?

A
  • biological inspiration, human brain
  • to reproduce the brain
    • the goal is to understand how it works
    • reproduce phenomena and biological data
  • understand the general computational principles used by the brain
    • reproduce some of its functions
    • focus of ML
2
Q

Different models and learning?

A
  • supervised learning
    • classification, regression, time series
  • unsupervised learning
    • clustering, data mining, self-organizing maps
  • different neural network models
    • different computational/learning needs
    • network topology
    • function computed by a single neuron
    • training algorithm
    • how training proceeds
3
Q

When to use a Neural Network?

A
  • high-dimensional input, discrete/real valued
  • discrete/real valued output
  • training data may be noisy
  • the form of the target function is unknown
  • long training times are acceptable, but fast evaluation of the learned function is required
  • final solution does not need to be understood by humans
4
Q

Single neuron - Perceptron

A
  • weighted sum of inputs and step function
  • any Boolean function can be implemented as a combination of
    Perceptrons
    • but not by a single perceptron (e.g., XOR is not linearly separable)
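
A minimal sketch of both claims (plain Python; the hand-picked weight values are illustrative, not from the card):

    import numpy as np

    def perceptron(x, w, b):
        # weighted sum of inputs followed by a hard threshold (step function)
        return 1 if np.dot(w, x) + b > 0 else 0

    # a single perceptron can implement AND (linearly separable)...
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        assert perceptron(x, w=np.array([1.0, 1.0]), b=-1.5) == (x[0] and x[1])

    # ...but XOR needs a combination: XOR(a, b) = AND(OR(a, b), NAND(a, b))
    def xor(x):
        or_  = perceptron(x, w=np.array([1.0, 1.0]), b=-0.5)
        nand = perceptron(x, w=np.array([-1.0, -1.0]), b=1.5)
        return perceptron((or_, nand), w=np.array([1.0, 1.0]), b=-1.5)

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        assert xor(x) == (x[0] ^ x[1])
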
5
Q

Perceptron learning algorithm

A
  • if the samples are linearly separable in R^n
    • the algorithm terminates in a finite number of steps
  • initialize the weights randomly
    • learning rate η > 0
    • targets t ∈ {-1, +1}
    • training samples are pairs (x, t)
  • repeat
    • select one of the training samples at random
    • compute the output o = sign(w * x); if o ≠ t
      • w = w + η(t - o)x
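
A minimal sketch of this loop (the AND dataset, learning rate, and seed are illustrative choices, not from the card):

    import numpy as np

    rng = np.random.default_rng(0)

    # linearly separable samples (x, t) with targets t in {-1, +1};
    # a constant 1 is appended to each x so the bias is learned as a weight
    X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
    T = np.array([-1, -1, -1, 1])   # AND with +/-1 targets

    w = rng.normal(size=3)          # initialize the weights randomly
    eta = 0.1                       # learning rate

    while not np.all(np.sign(X @ w) == T):   # terminates: data is separable
        i = rng.integers(len(X))             # pick a random training sample
        o = np.sign(w @ X[i])
        if o != T[i]:
            w = w + eta * (T[i] - o) * X[i]  # perceptron update rule
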
6
Q

Learning rate

A
  • the step size of each weight update
  • a small value makes learning more stable
    • prevents the weight vector from undergoing too "sharp" changes
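
A toy illustration on E(w) = w^2 (an assumed example, not from the card): a small step shrinks w steadily, while a step that is too large makes w overshoot and diverge.

    def descend(eta, steps=20):
        w = 1.0
        for _ in range(steps):
            w = w - eta * 2 * w   # gradient of E(w) = w^2 is 2w
        return w

    print(descend(eta=0.1))   # close to 0: stable convergence
    print(descend(eta=1.1))   # huge magnitude: too "sharp", diverges
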
7
Q

Is the perceptron differentiable?

A
  • No, the hard threshold is not differentiable

* To make the perceptron differentiable, a sigmoid must be used instead of the step function
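
A short sketch of the sigmoid and the closed form of its derivative, which is the property gradient-based training relies on:

    import numpy as np

    def sigmoid(x):
        # smooth, differentiable replacement for the hard threshold
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_prime(x):
        # convenient closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
        s = sigmoid(x)
        return s * (1.0 - s)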

8
Q

Multilayer Neural Networks

A
  • composed of many connected units
    • computes a non-linear function
  • different types of units
    • input units: the input variables
    • output units: the output variables
    • hidden units: encode correlations between input and output variables
  • weights are defined on the connections between units
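
A minimal forward pass for one hidden layer (plain NumPy; the layer sizes and the choice of sigmoid units are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 3, 4, 2

    # weights are defined on the connections between consecutive layers
    W_ih = rng.normal(size=(n_hidden, n_in))    # input  -> hidden
    W_ho = rng.normal(size=(n_out, n_hidden))   # hidden -> output

    x = rng.normal(size=n_in)    # input variables
    h = sigmoid(W_ih @ x)        # hidden units
    y = sigmoid(W_ho @ h)        # output variables
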
9
Q

What is the Delta rule?

A
  • weight update rule (different from the Perceptron rule)
    • yields a best-fit solution approximating the target even when the samples are not linearly separable
  • exploits gradient descent to explore the hypothesis space
    • minimizes an error function
    • uses a linear unit with no hard threshold
  • start from a random w and update it in the direction opposite to the gradient
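
Spelled out for a linear unit o = w * x: the error over the training set is E(w) = 1/2 Σ (t - o)^2, its gradient components are ∂E/∂w_i = -Σ (t - o)x_i, so a step against the gradient gives the delta rule update w_i = w_i + η Σ (t - o)x_i.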
10
Q

How does the gradient descent algorithm work?

A
  • the weights are initialized with random values
  • until convergence:
    • for each sample in the dataset
      • we compute the output by feeding the input to the neuron (o = w * x)
      • we calculate and accumulate the update η(t - o)x for each weight with respect to the target
    • we update the weights
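
A minimal sketch of this loop for a single linear unit (plain NumPy; the toy dataset, learning rate, and fixed epoch budget are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    X = rng.normal(size=(20, 3))            # toy dataset
    true_w = np.array([2.0, -1.0, 0.5])
    T = X @ true_w                          # targets of a known linear function

    w = rng.normal(size=3)                  # weights initialized at random
    eta = 0.01

    for epoch in range(500):                # "until convergence" (fixed budget here)
        delta_w = np.zeros_like(w)
        for x, t in zip(X, T):              # for each sample in the dataset
            o = w @ x                       # output of the neuron, o = w * x
            delta_w += eta * (t - o) * x    # accumulate the update η(t - o)x
        w = w + delta_w                     # update the weights once per pass

    print(np.round(w, 3))                   # approaches true_w
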
11
Q

Differences between batch, stochastic and mini-batch gradient descent?

A
  • batch
    • uses the whole dataset for each update
    • computationally efficient
    • gradient estimate is more stable
    • feedback on performance arrives only after a long time
    • costly memory-wise (all training samples must be kept available)
  • stochastic
    • updates on each single sample
    • immediate feedback on performance
    • computationally expensive
    • gradient can be noisy
  • mini-batch
    • updates on a subset of the samples
    • a middle ground between the two: tries to keep their advantages while reducing their disadvantages
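
A sketch showing that the three variants differ only in how many samples feed each update (the batch_size and toy data are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))   # toy dataset

    def batches(X, batch_size):
        # yields shuffled mini-batches; batch_size=1 gives stochastic
        # updates, batch_size=len(X) gives batch gradient descent
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            yield X[idx[start:start + batch_size]]

    for mb in batches(X, batch_size=16):
        pass   # compute the gradient on mb only, then update the weights
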
12
Q

Backprop algorithm for multilayer perceptron

A
  • the weights are initialized with random values
  • until convergence:
    • for each sample in the dataset
      • we compute the hidden and output unit activations (forward pass)
      • we calculate and accumulate the update for each hidden-to-output (h-o) weight with respect to the target
      • we calculate and accumulate the update for each input-to-hidden (i-h) weight, propagating the error backwards
    • we update the weights
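
A compact sketch of one such pass for a single hidden layer with sigmoid units and squared error (plain NumPy; the sizes, toy data, and batch-style accumulation are illustrative choices, and biases are omitted for brevity):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))             # toy inputs
    T = rng.uniform(size=(50, 2))            # toy targets in (0, 1)

    n_in, n_hid, n_out = 3, 4, 2
    W_ih = rng.normal(size=(n_hid, n_in))    # input  -> hidden weights
    W_ho = rng.normal(size=(n_out, n_hid))   # hidden -> output weights
    eta = 0.05

    for epoch in range(100):                          # until convergence
        dW_ih = np.zeros_like(W_ih)
        dW_ho = np.zeros_like(W_ho)
        for x, t in zip(X, T):                        # for each sample
            # forward pass: hidden and output unit activations
            h = sigmoid(W_ih @ x)
            o = sigmoid(W_ho @ h)
            # backward pass: output error terms first...
            delta_o = (t - o) * o * (1 - o)
            # ...then propagate them back to the hidden units
            delta_h = (W_ho.T @ delta_o) * h * (1 - h)
            # accumulate the updates for both weight layers
            dW_ho += eta * np.outer(delta_o, h)
            dW_ih += eta * np.outer(delta_h, x)
        W_ih += dW_ih                                 # update the weights
        W_ho += dW_ho
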
13
Q

Training problems in multi-layer networks

A
  • choice of the network topology determines the hypothesis space
    • number of hidden units determines the complexity of the hypothesis space
  • choice of the descent step (learning rate) can be crucial for the convergence
  • training is generally slow
    • output computation is fast
  • many local minima may be present
    • making it difficult to reach the global minimum
14
Q

How could one try to avoid local minima?

A
  • momentum -> adds a term to the weight update that imposes a form of inertia on the system
  • stochastic training -> the noise in the gradient can help escape local minima
  • training multiple NNs -> same data, different initializations; the best-performing one is selected (via validation). Alternatively, an ensemble of NNs whose prediction is the (weighted) average of the individual predictions
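
A sketch of the momentum term alone (the quadratic stand-in gradient and the coefficients are hypothetical): a fraction of the previous update is blended into the current one, giving the trajectory inertia.

    import numpy as np

    eta, alpha = 0.05, 0.9         # learning rate and momentum coefficient
    w = np.zeros(3)
    velocity = np.zeros_like(w)    # remembers the previous weight update

    def grad(w):                   # stand-in for the real gradient of the error
        return 2 * w - 1

    for step in range(100):
        # inertia: the new update is the gradient step plus a fraction
        # of the previous update, which helps roll through local minima
        velocity = alpha * velocity - eta * grad(w)
        w = w + velocity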