ANN Flashcards

1
Q

What is the difference between supervised ANNs and unsupervised ones?

A

Supervised = trained with labelled inputs; the weights are adjusted to fit the correct outputs (inductive learning)

Unsupervised = no labelled outputs; the weights are adjusted iteratively to satisfy some learning rule

2
Q

What are the types of ANN topologies?

A

Feed-forward: all connections point forwards (no cycles)

Recurrent: connections can point backwards, providing immediate feedback

Single / multiple [hidden] layered: the number of hidden layers

Partially / fully connected: the degree of connectivity between nodes across layers

3
Q

What are the different activation functions? Explain them.

A

Linear: output = c · input
Threshold: if the weighted sum > 0, output = 1; else −1
Sigmoid: a continuous version of the threshold; bounds the output between 0 and 1 (the tanh variant is bounded between −1 and 1)
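
A minimal sketch of the three in Python (the slope c, the threshold at zero, and the logistic form of the sigmoid are assumptions matching the definitions above):

```python
import math

def linear(x, c=1.0):
    # output proportional to the input
    return c * x

def threshold(x):
    # hard decision at 0: outputs +1 or -1
    return 1 if x > 0 else -1

def sigmoid(x):
    # smooth version of the threshold; output bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```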

4
Q

When would you use ANNs?

A

When:

  1. The input data is high-dimensional and continuous
  2. The data is noisy
  3. Long training times are acceptable
  4. Enough labelled training data is available
  5. The target function is unknown
  6. Explaining the result is not important

5
Q

What are the 3 learning rules? Describe them.

A
  1. Hebb’s rule: if two connected neurons are simultaneously active, strengthen the link: weight(new) = weight(old) + x1·x2
  2. Perceptron rule: weight(new) = weight(old) + η(t − o)x
    t = target output (0/1)
    o = actual output (0/1)
    x = neuron input
    η = learning rate
    *** threshold activation
  3. Delta rule: weight(new) = weight(old) + η(t − o)x
    *** linear/continuous activation
    *** outputs can be anything
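
A rough sketch of each rule as a single update step in Python (vector weights, η as the learning rate; the function names are my own):

```python
def hebb_update(w, x1, x2):
    # strengthen the connection when both neurons fire together
    return w + x1 * x2

def perceptron_update(w, x, t, o, eta=0.1):
    # t, o in {0, 1}: no change when the output is already correct
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

def delta_update(w, x, t, o, eta=0.1):
    # same form, but t and o may be any real values (continuous activation)
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
```
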
6
Q

What are perceptrons?

A

Linear classifiers: a hyperplane (a subspace of one dimension lower than the input space) divides the space, classifying data points on either side

If the weighted sum + bias > the boundary, the perceptron outputs a 1; else, 0.

*** outputs can only be 1 or 0/-1
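
A short sketch of the decision in Python (the 0/1 output convention and a boundary at 0 are assumptions):

```python
def perceptron(x, w, b):
    # fire iff the weighted sum plus bias crosses the boundary
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0
```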

7
Q

What is the point of the bias term?

A

Shifts the hyperplane away from the origin, which can speed up learning; a bias is not always needed

8
Q

What are some of the properties of perceptrons?

A

Can classify inputs as 0 or 1 => can simulate any linearly separable logic gate (AND, OR, NOT, NAND, NOR); XOR is not linearly separable, so a single perceptron cannot implement it

9
Q

How does perceptron learning work?

A

Using the perceptron learning rule:
For each of the N training examples:
  if o ≠ t:
    update the weights according to the learning rule such that the error is reduced
Stop when the error is acceptable or i = N
Note: the perceptron rule adapts weights only
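
A minimal sketch of this loop in Python, training an AND gate (the data set, learning rate, and epoch limit are illustrative choices):

```python
def train_perceptron(data, eta=0.1, max_epochs=100):
    # data: list of (inputs, target) pairs, targets in {0, 1}
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in data:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if o != t:  # update only on misclassification
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                b += eta * (t - o)
                errors += 1
        if errors == 0:  # error acceptable: every example correct
            break
    return w, b

# AND is linearly separable, so the loop converges
w, b = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
```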

10
Q

What is the fundamental basis of perceptron learning?

A

Error correction learning => adjust weights until o = t

11
Q

What is the limitation of single perceptrons? What is the solution?

A

Can only classify linearly separable spaces

Solution: multi-layer perceptrons (MLPs) = interconnected layers of perceptrons

12
Q

When do you use the delta learning rule?

A

When the range of values we want to produce is continuous, i.e. with continuous activation functions

*** t and o do not have to be 1 or 0

13
Q

How does delta learning work?

A

In the same way as perceptron learning: iterate over the training examples and update the weights on error, but with a continuous activation function, so the update performs gradient descent on the error

14
Q

DLR vs PLR - differences and similarities

A

Same: both learn through error correction

Different:

PLR only works with threshold activation functions (0/1 outputs)

DLR can work with any differentiable activation function

15
Q

What is a universal function approximator?

A

An ANN with (1) at least one hidden layer, (2) enough nodes, and (3) a continuous non-linear activation function can approximate any continuous function

16
Q

What is the purpose of the delta learning rule?

A

To classify problems that are not linearly separable (when generalised to multi-layer networks trained by gradient descent)

17
Q

How do you decide the type of perceptron organisation to use as well as the activation function?

A
Is the problem linearly separable?
  No: use an MLP.
  Yes: is the output binary?
    No: use a single perceptron + continuous activation function.
    Yes: use a single perceptron + threshold activation function.
18
Q

What is the weight/hypothesis space?

A

The mapping from weight values to error; gradient descent aims to find minima in this landscape

19
Q

How do we calculate error for learning by error minimisation for the ∆ rule?

A

E(w) = ½ Σ (t − o)² ==> a modified MSE (the ½ cancels when the derivative is taken)
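
For example (illustrative values), computed in Python:

```python
def error(targets, outputs):
    # E(w) = 1/2 * sum of squared differences (t - o)
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

error([1.0, 0.0, 1.0], [0.8, 0.3, 0.9])  # 0.5 * (0.04 + 0.09 + 0.01) = 0.07
```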

20
Q

What is gradient descent?

A

Iteratively change the weights to reduce the error E(w)

Can implement the DLR at a neuron or network level

Always moves in the direction of the steepest decrease of the error (the negative gradient)
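
A sketch of batch gradient descent with the delta rule on a single linear neuron (the data shape, η, and epoch count are assumptions):

```python
def gradient_descent(data, eta=0.01, epochs=1000):
    # data: list of (inputs, target); linear activation, o = w . x
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n  # accumulates -dE/dw over the whole batch
        for x, t in data:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i, xi in enumerate(x):
                grad[i] += (t - o) * xi
        # step along the steepest decrease of E(w)
        w = [wi + eta * gi for wi, gi in zip(w, grad)]
    return w
```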

21
Q

What is backpropagation?

A

The process of feeding the error back through the network to determine how much each weight needs to be adjusted.

22
Q

What do we need for back-propagation to work?

A
  1. A differentiable activation function
  2. A non-linear activation function

23
Q

What is the back propagation algorithm?

A
  1. Forward pass
  2. Calculate δi for output neurons
    [δi = Oi · (1 − Oi) · (Ti − Oi)] - sigmoid
  3. Calculate the change in weights to output nodes
    [∆Whi = η · δi · xhi] - where xhi is the output of hidden node h feeding output node i
  4. Calculate δh for hidden neurons
    [δh = Oh · (1 − Oh) · ΣWhi · δi]
  5. Calculate the change in weights to hidden nodes
    [∆Wjh = η · δh · xjh] - where xjh is the input feeding hidden node h
  6. Update weights
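
Putting the six steps together: a minimal sketch for one hidden layer of sigmoid neurons trained on XOR (network size, η, epoch count, and the random seed are illustrative; biases are folded in as an always-on input, and convergence from a given initialisation is not guaranteed):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(eta=0.5, epochs=10000, hidden=2):
    random.seed(0)
    # Wh: hidden weights (two inputs + bias); Wo: output weights (hidden + bias)
    Wh = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    Wo = [random.uniform(-1, 1) for _ in range(hidden + 1)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for (x1, x2), t in data:
            xs = (x1, x2, 1.0)                          # 1. forward pass
            h = [sigmoid(sum(w * x for w, x in zip(row, xs))) for row in Wh]
            ho = h + [1.0]
            o = sigmoid(sum(w * x for w, x in zip(Wo, ho)))
            d_o = o * (1 - o) * (t - o)                 # 2. delta for the output neuron
            d_h = [h[j] * (1 - h[j]) * Wo[j] * d_o      # 4. deltas for hidden neurons
                   for j in range(hidden)]
            for i in range(hidden + 1):                 # 3./6. update output weights
                Wo[i] += eta * d_o * ho[i]
            for j in range(hidden):                     # 5./6. update hidden weights
                for i in range(3):
                    Wh[j][i] += eta * d_h[j] * xs[i]
    return Wh, Wo
```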