Principles of deep learning in artificial networks Flashcards

1
Q

Deep learning approach (1)

A

Learn from experience (machine learning):

  • No formal rules of transformations
  • No ‘knowledge base’
  • No logical inference
2
Q

Deep learning approach (2)

A

Process inputs through a hierarchy of concepts:

  • Each concept defined by its relationship to simpler concepts
  • So, build complicated concepts out of simpler concepts
3
Q

Course goal (1)

A

Explore the relationship between cognitive science and AI

4
Q

Course goal (2)

A

Focus on deep learning in artificial machine learning networks and comparison to biological systems

  • Which biological processes do deep networks imitate?
  • What is missing in artificial networks?
  • What might make AI/machine learning more like biological intelligence/learning?
5
Q

Course goal (3)

A

Become familiar with the use of AI in cognitive science research

6
Q

Course goal (4)

A

Build some deep learning networks to do human-like tasks

7
Q

Why deep learning?

A

AI has made great advances in tasks that are:
- Described by formal mathematical rules
- Relatively simple for computers
- Difficult for humans
AI has been less effective in tasks that are:
- Hard or impossible to describe using formal mathematical rules
- BUT easy for humans to perform (intuitive or automatic)
Deep learning approaches the latter by simulating neural computation

8
Q

Representation & features

Machine learning performance depends on the …

A

representation of the case to be classified

what information the computer is given about the situation

9
Q

Representation & features

Each piece of input information is known as a …

A

feature
(The same feature can be represented in different formats, and it is often easy to convert between them. The chosen format strongly affects the difficulty of the task.)

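As a toy illustration of how the chosen feature format affects task difficulty (this example is invented, not from the source): points inside versus outside a circle are awkward to separate from raw Cartesian coordinates, but trivial once converted to a radius feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: classify 2-D points as inside vs outside a circle of radius 1.
pts = rng.uniform(-2, 2, size=(200, 2))
labels = np.hypot(pts[:, 0], pts[:, 1]) < 1.0

# Cartesian format: no single threshold on x or y separates the classes.
# Converted format: the radius feature separates them with one threshold.
radius = np.hypot(pts[:, 0], pts[:, 1])
predicted = radius < 1.0

print((predicted == labels).all())  # True: same information, easier format
```

The information content is identical in both formats; only the ease of learning the classification rule changes.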
10
Q

Representation in deep networks

A
  1. Useful features may need to be transformed or extracted first.
  2. So deep networks have multiple representations -> each is built from an earlier representation
  3. This can: transform features to a different format before learning their links to the output AND extract complex features from simpler features
  4. Essentially multiple steps in a program:
    - Each layer can be seen as the computer’s memory state after executing a set of instructions.
    - Deeper networks execute more instructions in sequence.
  5. Just like a computer program, the individual steps are generally very simple.
    - Complex outcomes emerge from interactions between many simple steps
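A minimal sketch of the layers-as-program-steps idea, with made-up instructions (the specific steps are illustrative, not from the source): each "layer" is one trivial operation, and the state after each layer is like a program's memory state mid-execution.

```python
# Three trivial "layers" (hypothetical instructions):
def double(x):
    return x * 2

def add_one(x):
    return x + 1

def square(x):
    return x ** 2

state = 3                        # the input
for layer in (double, add_one, square):
    state = layer(state)         # deeper networks run more steps in sequence

print(state)  # → 49: each step is trivial; the composite mapping is not
```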
11
Q

What is a deep network?

A

A learning network that transforms or extracts features using:

  • Multiple nonlinear processing units
  • Arranged in multiple layers with:
  • Hierarchical organisation
  • Different levels of representation and abstraction
12
Q

20th century view of object recognition

A
  1. Builds a representation of local image features
  2. Builds a representation of larger-scale shapes and surfaces
  3. Matches shapes and surfaces with stored object representations -> recognition
13
Q

Why nonlinear functions?

A

Any operation that can be done with only linear functions of the input can be straightforwardly described by formal mathematical rules, so it is not a good use for deep networks.

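This can be checked directly: stacking linear layers is exactly equivalent to a single linear layer with combined weights, so depth adds nothing without a nonlinearity (a sketch with arbitrary random weights).

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))    # weights of a first linear "layer"
W2 = rng.normal(size=(2, 4))    # weights of a second linear "layer"
x = rng.normal(size=3)          # an arbitrary input

deep = W2 @ (W1 @ x)            # two stacked linear layers...
shallow = (W2 @ W1) @ x         # ...equal one layer with weights W2 @ W1

print(np.allclose(deep, shallow))  # True: no extra expressive power
```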
14
Q

Name the complex nonlinear function with four operations or processing steps

A

Filter, threshold, pool and normalize

15
Q

Name 1 issue which arises with ReLU

A

It has no maximum output, while a biological neuron does have a maximum firing rate

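A quick sketch of the issue, plus one common remedy (clipping the output, as in "ReLU6"-style units; the ceiling value here is illustrative):

```python
import numpy as np

def relu(x):
    # Standard ReLU: negative inputs become zero, positive pass through.
    return np.maximum(0.0, x)

def clipped_relu(x, ceiling=6.0):
    # Adding a ceiling gives a maximum output, loosely analogous to a
    # biological neuron's maximum firing rate.
    return np.minimum(relu(x), ceiling)

print(relu(1e6))          # → 1000000.0: no maximum output
print(clipped_relu(1e6))  # → 6.0: output saturates at the ceiling
```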
16
Q

What does the filter operation do especially?

A

The response of each unit depends on several neighbouring inputs. So the units after filtering respond to a certain area of the input image, and the activation of neighbouring units will often be similar. After several filter steps, each integrating inputs over an area, each unit will respond very similarly to an extensive area of the input. So neighbouring units are representing very similar information.
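A back-of-envelope sketch of this growth, assuming stride-1 filters (the formula and widths are illustrative): each filter of width w lets a unit see w − 1 additional input positions, so receptive fields widen with every layer.

```python
def receptive_field(num_layers, filter_width):
    # With stride-1 filters, each layer adds (filter_width - 1) positions.
    return 1 + num_layers * (filter_width - 1)

for depth in (1, 2, 5, 10):
    print(depth, receptive_field(depth, 3))
# After ten 3-wide filters, each unit already integrates 21 input positions,
# so neighbouring units see heavily overlapping areas of the input and
# carry very similar information.
```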

17
Q

What does the pooling operation do?

A

Downsamples the units to improve computational efficiency. Discards some data in favour of computational efficiency.

18
Q

The threshold and pool operations use …

A

max functions. That is why, by the pool stage, we have a mean activation above zero and an arbitrary range.

19
Q

What does the normalisation operation do?

A

It linearly scales the data so that each feature map’s responses, across all images, have a mean activation of zero.

20
Q

Why is normalisation important? Name 4 reasons.

A
  1. Machine learning generally assumes that data reflects measurements of independent and identically distributed (IID) variables. Normalisation forces identical distributions.
  2. If the activation function depends on whether a unit’s response is above or below zero, then with zero-mean inputs and zero-mean filters about half of the units will be active and half inactive. This even split of activation is a very efficient way to store information in a network of limited size.
  3. Having the same range for all feature maps and layers means the same maximum threshold in the activation function can be used throughout the network.
  4. As a result of these and other technical considerations, both training speed and final classification accuracy are far better after normalisation.
21
Q

Filter/convolve:

A

determine how well each group of nearby pixels matches each of a group of filters

22
Q

Threshold/rectify:

A

introduces a nonlinearity by setting negative activations of units to zero (and maybe set a maximum activation)

23
Q

Pool:

A

Downsample the units to improve computational efficiency

24
Q

Normalise:

A

Rescale responses of each feature map to have mean zero and standard deviation one, so each feature map contributes similarly to classification
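The four operations from the preceding cards can be sketched end to end on a toy image (the image, filter values, and sizes here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))      # toy greyscale "image"
kernel = np.array([[1.0, -1.0]])     # a tiny hypothetical edge filter

def filter_op(img, k):
    # 1. Filter/convolve: match each pixel neighbourhood against the filter.
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def threshold_op(x):
    # 2. Threshold/rectify: set negative activations to zero (ReLU).
    return np.maximum(0.0, x)

def pool_op(x):
    # 3. Pool: downsample by taking the max of each 2x2 block.
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def normalise_op(x):
    # 4. Normalise: rescale to mean zero and standard deviation one.
    return (x - x.mean()) / x.std()

fmap = normalise_op(pool_op(threshold_op(filter_op(image, kernel))))
print(fmap.shape)                  # → (4, 3): downsampled feature map
print(fmap.mean(), fmap.std())     # approximately 0.0 and 1.0
```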

25
Q

As we get higher up the network, these filters get harder to understand in two important ways. Name them.

A
  1. The filter shape crosses multiple independent feature maps. An edge detector applied to an image is easy enough to conceptualise, but such a high-dimensional filter is much harder to picture.
  2. The input feature maps become more abstract. It gets very hard to conceptualise what feature is represented.
26
Q

Name the reasons why shared weights are used in artificial deep networks (and which do not apply in biological deep networks)?

A

Filters generally have a single set of weights for all positions in the feature map because:

  1. If a feature is useful to compute at one position, it is probably also useful at another position.
  2. The filter values are weights that need to be learned. It is very computationally demanding to do this if the set is too large.
  3. The convolution operation is a very fast matrix function. If the same filter is not applied at every position, this fast operation cannot be used.
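A rough parameter count shows the scale of the computational argument (the image and filter sizes below are illustrative, not from the source):

```python
positions = 224 * 224                  # units in one feature map
filter_weights = 3 * 3                 # one 3x3 filter

shared = filter_weights                # one set of weights for every position
unshared = filter_weights * positions  # a separate filter at each position

print(shared, unshared)  # → 9 451584: sharing shrinks the learning problem
                         # for this map from ~450k weights to just 9
```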
27
Q

The softmax operation …

A

The weights through our network transform each input image into a ‘score’ for each category, reflecting the match between the top layer’s pattern of activation and the patterns produced by previous examples of that category. This score must then be converted to a probability that the input image falls into each category. This is almost always done with the ‘softmax’ function.
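A minimal softmax sketch (the max-subtraction is a standard numerical-stability trick and does not change the result; the scores are hypothetical):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # stability: shift so max score is 0
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])       # hypothetical per-category scores
probs = softmax(scores)

print(probs.argmax())           # → 0: highest score gets highest probability
print(round(probs.sum(), 6))    # → 1.0: scores become valid probabilities
```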

28
Q

Filter structures are targets for machine learning …

A

The convolution filters are the main links between different layers of our network, so they effectively form the weights of connections between the nodes in a neural network. Here, the nodes are pixels in a feature map, and the connections between them are filters. So, to learn the weights of connections, the network learns the structure of the filters.