5: Learning Flashcards

1
Q

What is NETTalk?

A

NETtalk is Sejnowski and Rosenberg's (1987) neural network that learned to pronounce English text: given a window of letters, it outputs the corresponding phonemes, learning from examples by backpropagation. A classic demonstration that a neural network can learn a task that is hard to capture with explicit rules.

2
Q

What is a neural network?

A

A computational system of many simple, interconnected processing units ("neurons") operating in parallel. Knowledge is stored in the connection weights between units and is acquired by learning from examples rather than by explicit programming.

3
Q

What is the PDP model?

A

Parallel Distributed Processing: the framework (Rumelhart, McClelland and colleagues) that models cognition as many simple units processing in parallel, with knowledge distributed across connection weights. Essentially the neural network (connectionist) approach to cognition.

4
Q

What is symbolic AI?

A

AI built on the explicit manipulation of symbols according to formal rules (e.g. logic and rule-based systems): intelligence is treated as serial rule-following over symbolic representations.

5
Q

How do neural networks differ from symbolic AI?

A

Symbolic AI manipulates explicit symbols serially using hand-written formal rules. Neural networks process in parallel across many simple units, store knowledge as distributed connection weights, and learn from examples rather than being explicitly programmed.

6
Q

What are some advantages of symbolic-based AI systems?

A

– A symbolic algorithm can execute anything expressed as following a
sequence of formal rules.
– Large amounts of memorised information can be copied and retrieved
accurately ad infinitum.
– Information processing is relatively fast and highly accurate.

7
Q

What are some disadvantages of symbolic-based AI systems?

A

– Maybe not everything can be feasibly expressed as following a sequence of
formal rules, e.g. the Chinese Room argument, intractable solution searches,
and the problem of meaning.
– Symbolic retrieval of memories can be brittle, being all-or-none.
– Many real-world situations are novel and so require adaptation rather than
fast pre-set actions, e.g. everyday situations.

8
Q

Of symbolic and neural network AI systems, which is more similar to the organisation of the brain? How? Comment on the simplicity of individual neurons.

A

Neural networks. They are modelled on the organisation of neurons in the brain and allow parallel rather than serial processing. The brain's individual processing units are much simpler and slower than a computer's, yet in many areas its computation is better, suggesting that the brain's parallel organisation is superior.

9
Q

What constitutes a neural network?

A

A collection of interconnected neurons (or units). Some receive environmental input and some of the others give output to the environment.

10
Q

What are hidden units? What are they aka?

A

Neurons/units in neural networks that neither receive input directly from the environment nor send output directly to it; they connect only to other units. Also known as internal units.

11
Q

How are neurons modelled artificially in neural networks?

A

Binary threshold unit (BTU): compute the excitation as the weighted sum of the inputs; if the excitation exceeds a certain threshold, the neuron is "excited" and becomes active. When active, the neuron outputs 1 rather than 0.

12
Q

What is the formula for calculating the output of an artificial neuron (BTU)?

A

out_j = g(Σ_i w_ij · in_i − Θ_j); g(x) = 1 where x > 0; g(x) = 0 where x <= 0

g(x) is the activation function, here a step function ("stepping" at 0)

Θ_j is the threshold of the jth threshold unit (each unit has its own Θ)

w_ij is the weight of the ith input to the jth threshold unit

in_i is the ith input to the jth threshold unit
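A minimal Python sketch of this formula (illustrative, not from the course materials; the function name is mine):

```python
def btu_output(inputs, weights, theta):
    # Binary threshold unit: excitation = sum_i(w_i * in_i) - theta,
    # output 1 if the excitation is above 0, else 0 (step activation).
    excitation = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1 if excitation > 0 else 0

# A unit with threshold 0.5 and both weights 0.6 behaves as an OR gate:
assert btu_output([0, 0], [0.6, 0.6], 0.5) == 0
assert btu_output([1, 0], [0.6, 0.6], 0.5) == 1
assert btu_output([1, 1], [0.6, 0.6], 0.5) == 1
```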

13
Q

What is an activation function?

A

A normalising function that defines the output of a neuron given the calculated activation from the inputs to the neuron and their weights as part of a threshold unit.

14
Q

Name and describe 3 activation functions.

A
  1. Step function
    - outputs 1 once the activation exceeds a certain threshold, 0 otherwise
  2. Sigmoid
    - output follows the sigmoid curve
    - g(x) = 1 / (1 + exp(-x))
  3. Rectified Linear Unit (ReLU)
    - output is 0 up to a threshold, as with the step function, then increases linearly with further increases in activation
    - e.g. with threshold of 0:
    when x <= 0, g(x) = 0
    when x > 0, g(x) = x
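The three functions written out as a sketch in Python (function names are mine):

```python
import math

def step(x):
    # Step function: 1 once the activation passes the threshold (here 0).
    return 1 if x > 0 else 0

def sigmoid(x):
    # Sigmoid: g(x) = 1 / (1 + exp(-x)), a smooth output in (0, 1).
    return 1 / (1 + math.exp(-x))

def relu(x):
    # Rectified Linear Unit: 0 up to the threshold (0), then linear in x.
    return x if x > 0 else 0.0
```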
15
Q

What is Feedforward Architecture?

A

An architecture in which connections only run forward, from input towards output, with no feedback loops: each unit's output feeds only into units in later layers.

16
Q

What is supervised learning?

A

Learning from examples for which the correct (target) outputs are given: the network's weights are adjusted to reduce the difference between its outputs and the targets.

17
Q

What is recurrent architecture?

A

An architecture containing feedback connections (loops): units can connect back to earlier layers or to themselves, giving the network an internal state (context) that persists between inputs.

18
Q

What are network layers in neural networks?

A

Groups of units at the same depth in the network: an input layer receiving input from the environment, an output layer giving output to the environment, and any hidden layers in between. Signals pass from layer to layer.

19
Q

What is the difference between lateral and feedforward connections?

A

Lateral connections link units within the same layer; feedforward connections link units in one layer to units in a later layer.

20
Q

For a feedforward-based neural network of n layers, how many are hidden?

A

n - 2. The input and output layers can be "seen" from the environment, while all other layers connect only to other layers, so they are hidden.

21
Q

What is Strictly Layered Architecture?

A

A neural network system in which there are no lateral connections and each neuron may only connect to others in adjacent layers.

22
Q

What does it mean for a network to be “fully connected”?

A

Every permitted connection is present: each neuron connects to all the other neurons that the network's architecture allows it to connect to.

23
Q

What is the concept of Feedforward Pass?

A

The way input patterns pass through a feedforward network layer by layer, in series; within each layer, the signal is propagated in parallel, reaching all neurons in the layer simultaneously (from the previous layer or the input).

24
Q

What is the concept of generalisation?

A

The ability of a trained network to produce sensible outputs for inputs it has never seen, by capturing the underlying regularities of the training data rather than memorising the training patterns.

25
Q

How does sensibility apply to generalisation?

A

???

26
Q

When is generalisation useful in real-world applications?

A

Where:

  • the relationship between input and output is unknown
  • there is little available data
  • the data contain noise
27
Q

What is underfitting?

A

When the model an AI system builds from data is too simple to capture the variance in the data, so it fits both the training data and new data poorly.

28
Q

What is overfitting?

A

When the model an AI system builds from data is too complex: it pays too much attention to noise and incidental detail in the training data, missing the true underlying patterns, and so generalises poorly.

29
Q

What is model complexity?

A

Roughly, the representational capacity of a model, e.g. the number of free parameters (weights and units) in a neural network. More complex models can fit more intricate patterns but are more prone to overfitting; less complex models risk underfitting.

30
Q

What is pruning?

A

Removing irrelevant neurons that have no effect from a neural network to make it less complex.

31
Q

What is growing?

A

Systematically and repeatedly adding neurons to a neural network by some approach or algorithm for as long as doing so remains beneficial.

32
Q

What is an error function?

A

A function measuring the discrepancy between the network's outputs and the target outputs over the training patterns, e.g. LMS error. Learning proceeds by minimising it.

33
Q

What is weight decay?

A

A regularisation technique that shrinks the weights towards zero during training, discouraging large weights and hence overly complex models, which improves generalisation.

34
Q

How do you implement weight decay to regularise the function?

A

Add a penalty term to the error function, e.g. E' = E + (λ/2) Σ_i w_i². Each weight's gradient then gains an extra λw_i, so every update shrinks (decays) the weights slightly as well as correcting the error.
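A sketch of one such update step (illustrative; the coefficient λ appears as `lam` and the function name is mine):

```python
def decay_update(weights, grads, alpha=0.1, lam=0.01):
    # The penalty (lam / 2) * sum_i(w_i^2) adds lam * w_i to each weight's
    # gradient, so every update also shrinks the weights towards zero.
    return [w - alpha * (g + lam * w) for w, g in zip(weights, grads)]

# With a zero error gradient, the weights simply decay a little each step:
w = decay_update([1.0, -2.0], [0.0, 0.0])
# w is now approximately [0.999, -1.998]
```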

35
Q

What is validation with respect to neural networks?

A

Estimating how well a trained network generalises by measuring its error on data held out from training (a validation set).

36
Q

How do you perform validation with neural networks?

A

Split the available data into a training set and a validation set. Train on the training set only, and periodically measure the error on the validation set; use the validation error to compare models or to decide when to stop training.

37
Q

What is early stopping?

A

Stopping training when the validation error starts to rise, even though the training error is still falling, since further training would overfit the training data.

38
Q

What is generalisation error?

A

The error a network makes on unseen data (data not used in training), typically estimated on a held-out validation or test set.

39
Q

What does a small generalisation error suggest? Why?

A

That the network has captured the underlying regularity in the data rather than memorising the training patterns, because it performs well even on inputs it has never seen.

40
Q

How can you find a good neural generaliser?

A

Train several networks of varying complexity (and/or with regularisation such as weight decay or early stopping), compare them on held-out validation data, and select the one with the lowest validation error.

41
Q

What is a bias unit? Why are they used?

A

An extra input to a neuron, fixed at 1, whose weight plays the role of the (negative of the) threshold Θ. The threshold thereby becomes an ordinary learnable weight, so the neuron's output depends only on inputs and weights, allowing adaptation that yields greater flexibility in learning.

42
Q

How do you implement an AND gate with a neuron?

A

Make Θ = 1 (e.g. a bias unit with weight −1 and value +1) and g(x) = 1 when x > 0 and 0 when x <= 0.

Make both input weights 0.6: a single true input gives excitation 0.6 − 1 < 0, while both true give 1.2 − 1 > 0, so the neuron outputs 1 only when both inputs are true.

43
Q

How do you implement an OR gate with a neuron?

A

Make Θ = 0.5 and g(x) = 1 when x > 0 and 0 when x <= 0.

Three inputs; the first is a bias unit with weight −0.5 and value +1, implementing the threshold Θ. Make both input weights 0.6, i.e. bigger than Θ, so if either or both inputs are true the neuron gives +1.

44
Q

How do you implement a NOT gate with a neuron?

A

Make Θ < 0 (e.g. a bias unit with weight +0.5 and value +1) and g(x) = 1 when x > 0 and 0 when x <= 0.

Two inputs; the first is the bias unit. Give the actual input a negative weight, e.g. −1: input 0 gives excitation +0.5 so output 1; input 1 gives −0.5 so output 0.
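These gate constructions can be checked with a small step unit whose first input is the bias (an illustrative sketch; the AND weights are one workable choice, not necessarily the course's):

```python
def unit(inputs, weights):
    # First input is a bias fixed at +1; its weight plays the role of -Θ.
    excitation = sum(w * x for w, x in zip(weights, [1] + inputs))
    return 1 if excitation > 0 else 0

OR  = [-0.5, 0.6, 0.6]   # fires if at least one input is on
AND = [-1.0, 0.6, 0.6]   # fires only if both inputs are on
NOT = [0.5, -1.0]        # fires only if its single input is off

for a in (0, 1):
    assert unit([a], NOT) == 1 - a
    for b in (0, 1):
        assert unit([a, b], OR) == (1 if a or b else 0)
        assert unit([a, b], AND) == (1 if a and b else 0)
```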

45
Q

What is an input space?

A

The space of all possible input patterns to a neuron (or network): with n inputs it is n-dimensional, with one axis per input, and each input pattern is a point in it.

46
Q

What is a hyperplane?

A

A flat subspace with one dimension fewer than the space containing it: a line in 2D, a plane in 3D. For a threshold unit, the set of inputs giving zero excitation forms a hyperplane in input space.

47
Q

What is linear separation?

A

Dividing the points of input space into two classes using a single hyperplane. A single threshold unit can only compute functions whose classes are linearly separable, which is why it cannot compute XOR.

48
Q

What is Excitation Algebra?

A

???

49
Q

What is the Zero Excitation Line?

A

The line (more generally, hyperplane) in input space on which the excitation is exactly zero, i.e. Σ_i w_i · in_i = Θ. It forms the boundary between inputs giving output 1 and those giving output 0.

50
Q

How many dimensions are in an input space for a neuron with n inputs?

A

The input space here will be n-dimensional.

51
Q

How can you implement XOR with neurons in neural networks?

A

XOR is not linearly separable, so a single unit cannot compute it; a hidden unit is needed. One approach (a 2-1-1 architecture with direct input-to-output connections): a hidden unit computes AND of the inputs, and the output unit computes OR of the inputs with strong inhibition from the AND unit, so it fires only when exactly one input is true.
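One possible weight assignment (my own illustrative choice, reusing the convention that the first input of each unit is a bias fixed at +1):

```python
def step_unit(inputs, weights):
    # Step unit; first input is a bias fixed at +1 (its weight acts as -Θ).
    excitation = sum(w * x for w, x in zip(weights, [1] + inputs))
    return 1 if excitation > 0 else 0

def xor(a, b):
    # Hidden unit computes AND of the two inputs (bias weight -1.0).
    h = step_unit([a, b], [-1.0, 0.6, 0.6])
    # Output unit computes OR of the inputs but is strongly inhibited by
    # the AND unit, so it fires only when exactly one input is on.
    return step_unit([a, b, h], [-0.5, 0.6, 0.6, -1.2])

assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```

Note that with direct input-to-output connections this 2-1-1 network is not strictly layered.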

52
Q

What does it mean for a neural network to have a 2-1-1 architecture?

A

Its first layer has 2 nodes, its second has 1 node, and its third has 1 node.

53
Q

What is a normal vector?

A

A vector perpendicular to a line or (hyper)plane. The weight vector of a threshold unit is normal to its zero-excitation hyperplane and points towards the side of input space where the unit outputs 1.

54
Q

What is the idea of Universal Computation?

A

Any system that can represent the logical elements that make up computers (AND, OR, NOT, etc.) can form any logical expression that a digital computer can; this is true of neural nets. But neural nets can do more: an output is given for every analogue input, not just digital binary ones, so an infinite number of I/O mappings can be captured in a finite number of weights (and neurons). Feedforward networks are capable of any I/O mapping; recurrent networks of any I/S/O mapping (S being state: recurrent networks carry context, since neurons can connect back to themselves).

55
Q

True/false: neurons can’t represent all logical expressions that a digital computer can using a 2-layer architecture

A

False. They can: they can express logical gates such as AND, OR, and NOT, and then build up any logical expression from them.

56
Q

True/false: a neural net can’t do more than represent logical expressions.

Why?

A

False. An output is given for every analogue input, not just the digital binary values.

57
Q

True/false: In neural networks, an infinite number of I/O associations can be stored using a finite number of weights. Why?

A

True.

Because the units compute continuous functions of their (analogue) inputs: every point in the continuous input space is mapped to an output, so a finite set of weights defines outputs for infinitely many possible inputs.

58
Q

What is a Perceptron?

A

A single-layer network of step-activation (threshold) units, each giving a binary response, i.e. 0 or 1.

59
Q

What is the Learning Algorithm for perceptrons?

A

For each training pattern: if the output is correct, do nothing; if the output is 0 but the target is 1, add the input vector to the weight vector; if the output is 1 but the target is 0, subtract it. Equivalently, Δw_i = α(t_p − out_p) · in_ip. Cycle through the patterns until all are classified correctly.
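The rule can be sketched as a small training loop (illustrative; the function name and the OR dataset are my own):

```python
def train_perceptron(patterns, alpha=0.1, epochs=100):
    # Perceptron learning rule: w_i += alpha * (target - output) * in_i.
    # The first component of every input vector is a bias fixed at 1.
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        errors = 0
        for x, t in patterns:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if out != t:
                errors += 1
                w = [wi + alpha * (t - out) * xi for wi, xi in zip(w, x)]
        if errors == 0:      # converged (guaranteed if a solution exists)
            break
    return w

# Learn OR (each input is prefixed with a bias value of 1):
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(data)
for x, t in data:
    assert (1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0) == t
```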

60
Q

What is the Convergence theorem for the perceptron learning algorithm?

A

That learning will converge in finite time if a solution exists.

61
Q

What is Backpropagation?

A

A supervised learning algorithm for multi-layer networks: do a feedforward pass, compare the outputs with the targets, then propagate error derivatives backwards through the network layer by layer using the chain rule, giving an error gradient for every weight; the weights are then updated by gradient descent.

aka backprop and BP

62
Q

How do you calculate the output for a single output unit?

A

out_p = F(in_p, w)

in_p = input vector for a pattern p, out_p = output for in_p, w = weight state

63
Q

What is an input vector?

A

The vector of values presented to the network's input units for a single pattern p, written in_p.

64
Q

What is a pattern?

A

A single training example: an input vector together with (in supervised learning) its target output.

65
Q

What is a weight state?

A

The complete set of current values of all the weights in the network; a single point in weight space.

66
Q

What is LMS error?

A

Least Mean Squared error: (half) the sum, over all patterns, of the squared difference between output and target. It is the error function minimised during learning here.

67
Q

How do you calculate the error for a single output unit?

A

E = 0.5 · Σ_p [(out_p – t_p) ^ 2]

out_p = the output for a pattern p, t_p = target for pattern p

The 1/2 is there to make differentiation easy.

68
Q

What is weight space?

A

The space with one dimension per weight in the network; each possible weight state is a point in it.

69
Q

What is error-weight space?

A

Weight space with one extra dimension added for the error E, so each weight state together with its error is a point in it.

70
Q

What is an error-weight surface?

A

The surface obtained by plotting the error E over all weight states in error-weight space; learning corresponds to moving downhill on this surface.

71
Q

What is an error-weight surface like near a local minimum?

A

In 2D, a series of elliptical contours representing error values; in 3D, an elliptical bowl.

72
Q

What is Steepest Gradient Descent?

A

An optimisation method that minimises the error by repeatedly taking a step in the direction of steepest descent, i.e. opposite to the gradient of the error with respect to the weights.

73
Q

What is hill-climbing?

A

Iteratively making small changes and keeping those that improve the objective, moving "uphill" towards better solutions; gradient descent is the downhill analogue on the error surface.

74
Q

How does Steepest Gradient Descent work?

A

Compute the gradient of the error with respect to each weight, change each weight by a small step in the opposite direction (Δw = −α · δE/δw), and repeat until the error stops decreasing.

75
Q

Where do gradients arise from?

A

From differentiating the error function with respect to each weight: the gradient says how the error changes as that weight changes.

77
Q

How do you calculate the gradient between 2 x values?

A

m = ΔE/Δx, since E plays the role of y on the graph.

Between 2 x values, x and x + Δx,
m = ΔE/Δx = [E(x + Δx) - E(x)] / Δx

Since E(x) = x ^ 2,
m = ΔE/Δx = [E(x + Δx) - E(x)] / Δx
= [(x + Δx) ^ 2 – (x ^ 2)] / Δx
= [x ^ 2 + 2x·Δx + Δx ^ 2 – x ^ 2] / Δx
= 2x + Δx, which → 2x as Δx → 0

So the gradient at any point is 2x.
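The finite-difference calculation above can be checked numerically (a quick illustrative sketch):

```python
def E(x):
    return x ** 2

def gradient(x, dx=1e-6):
    # m = [E(x + dx) - E(x)] / dx, which approaches 2x as dx -> 0
    return (E(x + dx) - E(x)) / dx

assert abs(gradient(3.0) - 6.0) < 1e-4    # close to 2x = 6
assert abs(gradient(-2.0) + 4.0) < 1e-4   # close to 2x = -4
```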

78
Q

Why is the gradient always in the direction of the error?

A

The gradient δE/δx gives the direction in which the error E grows fastest, so moving along the gradient always increases the error.

79
Q

Why do we move in the direction opposite to the gradient on the error surface to correct the error?

A

Because the gradient points uphill, towards increasing error; stepping in the opposite direction decreases the error, moving us towards a minimum.

80
Q

What is the learning rate? How is it notated?

A

A coefficient, notated α (alpha), that scales the size of each corrective weight change: Δw = −α · δE/δw. Too small and learning is slow; too large and learning can overshoot and oscillate.

81
Q

What is x equivalent to on the error surface?

A

The neural weights

82
Q

How do you find the corrective step from the learning rate and gradient?

A

Δx_t = –α · (dE/dx)_t, then x_(t+1) = x_t + Δx_t
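Applied to the toy error E(x) = x², where dE/dx = 2x, the update rule can be sketched as:

```python
def descend(x, alpha=0.1, steps=50):
    # x(t+1) = x(t) - alpha * dE/dx, with dE/dx = 2x for E(x) = x^2
    for _ in range(steps):
        x -= alpha * (2 * x)
    return x

assert abs(descend(5.0)) < 1e-3   # converges towards the minimum at x = 0
```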

83
Q

What is gradient descent?

A

The general method of minimising a function by repeatedly adjusting its parameters in the direction opposite to the gradient; in neural networks, adjusting the weights to reduce the error.

84
Q

What is single layer gradient descent?

A

Gradient descent applied to a single-layer feedforward network: each output unit's incoming weights are adjusted from the error gradient, and each output unit can be treated separately.

85
Q

Why can you do gradient descent for each output separately in single layer feedforward networks?

A

Each weight leads from one input to one output unit, so changing a weight connected to output unit A does not affect output unit B.

86
Q

How do you define LMS error for Single layer gradient descent?

A

E = Σ_p E_p, where E_p = 0.5 · (out_p – t_p) ^ 2, i.e. E = 0.5 · Σ_p (out_p – t_p) ^ 2

87
Q

How do you calculate the corrective change for a weight to reduce the error on a weight-error surface?

A

Δw_i = –α · δE/δw_i

88
Q

How can you find error-weight gradients for weights wi leading to that output unit and then subsequently use these gradients to perform gradient descent?

A

δE/δw_i = Σ_p (δE_p/δout_p) · (δout_p/δex_p) · (δex_p/δw_i)

= Σ_p (out_p – t_p) · out_p(1 – out_p) · in_ip

(ex_p is the excitation for pattern p; out_p(1 – out_p) is the derivative of the sigmoid activation)
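For a single sigmoid output unit this gradient yields a complete training loop; a sketch (the OR data, learning rate, and step count are my own choices):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def train_unit(patterns, alpha=1.0, steps=20000):
    # Gradient descent on E = 0.5 * sum_p (out_p - t_p)^2 for one sigmoid
    # unit; the first component of each input is a bias fixed at 1.
    w = [0.0] * len(patterns[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in patterns:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            delta = (out - t) * out * (1 - out)   # (out_p - t_p) * out_p(1 - out_p)
            grad = [g + delta * xi for g, xi in zip(grad, x)]
        w = [wi - alpha * g for wi, g in zip(w, grad)]
    return w

# Learn OR (bias input of 1 prepended to each pattern):
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_unit(data)
outs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data]
assert outs[0] < 0.5 and all(o > 0.5 for o in outs[1:])
```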

89
Q

How can you compute a suggested weight change for backprop?

A

Suggested change for the ith weight: Δw_i = –α · δE/δw_i

δE/δw_i = Σ_p (out_p – t_p) · out_p(1 – out_p) · in_ip

90
Q

How do you find weights to output unit k in a single or multi layer?

A

δE/δw_jk = Σ_p (out_kp – t_kp) · out_kp(1 – out_kp) · out_jp

Note: for a single layer, out_jp = in_jp

91
Q

How do you find weights to hidden unit j in the final or only hidden layer (single or multiple hidden layers)?

A

δE/δw_ij = Σ_k Σ_p (out_kp – t_kp) · out_kp(1 – out_kp) · w_jk · out_jp(1 – out_jp) · out_ip

Note: if unit i is an input unit then out_ip = in_ip

92
Q

How do you find weights to hidden unit i in the penultimate hidden layer?

A

δE/δw_ui = Σ_k Σ_p (out_kp – t_kp) · out_kp(1 – out_kp) · w_jk · out_jp(1 – out_jp) · w_ij · out_ip(1 – out_ip) · out_up

Note: if unit u is an input unit then out_up = in_up

93
Q

How does the far-right out term in the error derivatives change for layers further back in a neural network?

A

For layers further back, the far-R.H.S. out_up is replaced by w_ui · out_up(1 – out_up) times the output
from the previous layer, out_tp, and so on.

94
Q

How does the far-right out term in the error derivatives change for weights into the 1st hidden layer of a neural network?

A

The far R.H.S. out will be the in from the input unit in this case

95
Q

What is Multi-layer Training?

A

Training networks with one or more hidden layers: since hidden units have no explicit targets of their own, the error derivatives for their weights are obtained by propagating derivatives backwards through the layers (backpropagation).

96
Q

Why is the error-weight surface in the shape of a trough?

A

???

97
Q

True/false: error-weight surface is almost a quadratic bowl for non-linear sigmoid activation functions near a minimum – is a quadratic bowl for linear activation functions.

A

True

98
Q

What is summation? Why does it lead to complex surface features?

A

???

99
Q

What is momentum and why is it used to aid gradient descent?

A

Analogous to physical momentum: the weight change keeps moving in the same direction as previous changes until overcome by a large opposing gradient (a large error in the other direction).

100
Q

How do you calculate the weight change with momentum?

A

Δw_ij(t) = –α · δE(t)/δw_ij + β · Δw_ij(t – 1)

β is the momentum coefficient, between 0 and 1

t indexes the update steps; step t comes immediately after step t – 1.
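A sketch of the update on the toy surface E(x) = x² (coefficients are my own illustrative choices):

```python
def descend_with_momentum(x, alpha=0.02, beta=0.9, steps=200):
    # dx(t) = -alpha * dE/dx + beta * dx(t-1), with dE/dx = 2x here
    dx_prev = 0.0
    for _ in range(steps):
        dx = -alpha * (2 * x) + beta * dx_prev
        x += dx
        dx_prev = dx
    return x

assert abs(descend_with_momentum(5.0)) < 1e-2   # settles near the minimum
```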

101
Q

How does momentum help, especially on plateaus?

A

– Makes bigger transitions when gradients point consistently in one direction.
– Simulates the ball accelerating down a constant incline or down a hill.
– Reduces time for learning when gradients are shallow, e.g. on plateaus.

102
Q

How does momentum help on ravines?

A

– Adding a component that points in the previous transition direction damps
oscillations on ravines – as long as momentum coefficient < 1.
– Can speed up travel along the ravine bottom as it does on plateaus.

103
Q

How does momentum help with local minima?

A

– May possibly allow gradient descent to shoot over shallow local minima.
– But could also cause gradient descent to shoot over global minimum.
– A momentum coefficient that will allow learning to shoot over local minima
and not the global minimum may not exist.
– In any case, the optimal momentum setting is not known a priori.
– So momentum does not really overcome local minima other than by luck.

104
Q

How is Steepest gradient descent used?

A

Steepest gradient descent is used to guide the learning from random initial
weight states to weight states providing outputs closer to the given targets
given suitable neural topologies.

105
Q

Is back propagation supervised learning? Why?

A

Yes. There are explicit supervised target output values.

106
Q

What is the Ravine Problem?

A

A long, narrow valley in the error-weight surface: steep walls on either side and a shallow floor along its length. Plain gradient descent oscillates across the steep walls while making only slow progress along the floor.

107
Q

Why is the Ravine Problem prevalent?

A

???

108
Q

How does the Ravine Problem arise?

A

???