Lecture 3 Flashcards

1
Q

What is invariance in machine learning?

A

When some change is made to a data point, the target output is not changed. Therefore the network output should be the same.

Invariance in machine learning is a property of a model that ensures its output remains the same when the input is transformed in a way that doesn’t change the task the model was designed for.

2
Q

In what contexts is it important to consider invariance?

A
  • Image recognition (classification: does the image contain a red square? It doesn’t matter where in the image the square is)
  • Text recognition (scaling / enlarging the text shouldn’t change the answer)
  • Speech (invariance under speed: talking at a different rate shouldn’t change the meaning of the sentence)
  • Computational chemistry (asking whether a drug can be used to treat a disease should give the same answer regardless of the molecule’s position in space)
3
Q

What types of invariance are there?

A
  • Translational
  • Rotational
  • Reflection
  • Scaling
  • Swapping (permutation)
4
Q

What ways can we ensure that a model has the necessary invariances?

A

- Augment the training set with transformed examples [not clever, but easy]
- Extract features that are themselves invariant
- Build the invariance into the network structure (this leads to CNNs) [clever, but hard]

5
Q

Give an example of augmenting the training set with transformed examples.

A

Rotating/tilting images

6
Q

For invariance, if we apply some transformation T to the inputs xi, what happens to the output ti?

A

The output ti is unchanged.

7
Q

How can the network learn this invariance?

A

If we augment the dataset: D -> D’

The augmented dataset D’ contains the original input-output pairs plus the transformed inputs paired with the same (unchanged) outputs. Applying one transformation to every example doubles the size of the dataset, giving us more data to train the model on.
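As a minimal sketch of this idea (assuming the inputs are images stored as a NumPy array, and a horizontal flip as the transformation T):

```python
import numpy as np

def augment_with_flips(X, t):
    """Augment D -> D': add a horizontally flipped copy of each
    input; the targets are unchanged because of the invariance."""
    X_flipped = X[:, :, ::-1]                  # flip each image left-right
    X_aug = np.concatenate([X, X_flipped], axis=0)
    t_aug = np.concatenate([t, t], axis=0)     # same targets for transformed inputs
    return X_aug, t_aug

# Toy dataset: 4 images of 8x8 pixels with binary labels
X = np.random.rand(4, 8, 8)
t = np.array([0, 1, 1, 0])
X_aug, t_aug = augment_with_flips(X, t)
print(X_aug.shape)  # (8, 8, 8): the dataset has doubled in size
```

The targets are simply copied, which is exactly the statement that the transformation does not change the output.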

8
Q

What are the advantages and disadvantages of augmenting the dataset?

A

Advantages
- Very straightforward to implement
- Can produce a score ???

Disadvantages
- Makes the training set much bigger, so training is more expensive
- The invariance is only learned approximately (exactly only in the limit of infinite data)

9
Q

How can we enforce invariance exactly?

A

Make the features that are input into the network themselves invariant to the transformation.

This is harder, but more robust (it guarantees the symmetry exactly).

10
Q

What is an example of invariant features that can be used?

A

We explored the example of the energy of two atoms. Each atom has a position in space, represented by three coordinates, and the energy of the pair is the output.

We could naively use the six coordinates as features. However, these are not invariant to the relevant operations: the energy is invariant to translation, rotation and permutation of the atoms (i.e. swapping atom 1 for atom 2).

The energy can therefore only depend on the norm of the relative vector between the two atoms (the interatomic distance). This also reduces the number of features from 6 to 1. A single feature now contains all of the information we need and is invariant to all of the required transformations.
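This invariance can be checked numerically; a small sketch, with the particular shift and rotation chosen purely for illustration:

```python
import numpy as np

def distance_feature(r1, r2):
    """The single invariant feature: the norm of the relative vector."""
    return np.linalg.norm(r1 - r2)

r1 = np.array([0.0, 0.0, 0.0])
r2 = np.array([1.0, 2.0, 2.0])
d = distance_feature(r1, r2)                                # 3.0

shift = np.array([5.0, -1.0, 2.0])
d_translated = distance_feature(r1 + shift, r2 + shift)     # translation
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])        # rotation about z
d_rotated = distance_feature(R @ r1, R @ r2)
d_swapped = distance_feature(r2, r1)                        # permutation of atoms
print(d, d_translated, d_rotated, d_swapped)                # all equal 3.0
```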

11
Q

What does symmetry refer to?

A

In machine learning, “symmetry” refers to a transformation under which an object remains unchanged, such as rotation, translation or reflection. Models can learn more efficiently by leveraging these symmetries in the data.

12
Q

Describe the difference between symmetry and invariance.

A

In machine learning, “symmetry” refers to a property of the data itself: certain transformations can be applied without changing its underlying structure. “Invariance” describes a model’s ability to produce the same output regardless of those transformations being applied to the input. Essentially, symmetry is a characteristic of the data, while invariance is the desired property of a model that leverages that symmetry.

13
Q

What kind of neural network architecture can we build that gives invariant predictions?

A

Convolutional neural networks (CNNs) account for translational invariance in the inputs.

In a deep NN, the outputs from one layer can be thought of as the features input into the next layer - they have this modularity.

14
Q

How are images processed in the brain?

A

Hierarchically

We start off by collecting low-level information on small scales and build our way up, passing information up the hierarchy. It is transformed into higher-level (and larger-scale) information.

This is similar to a deep neural network.

15
Q

What is the problem when using a fully-connected deep neural network for processing images?

A

It would require an enormous number of weights.

E.g. camera pictures can have 2M pixels, each with a corresponding RGB value, giving 6M input features. If we had 2000 nodes in the first hidden layer, there would be around 4B weights per colour channel (around 12B in total).

High-quality image recognition with a fully-connected deep neural network is therefore unfeasible. We can help matters by noting that these networks do not take spatial correlation into account.
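The arithmetic behind these figures, multiplied out:

```python
pixels = 2_000_000            # ~2 MP camera image
hidden = 2_000                # nodes in the first hidden layer
weights = pixels * hidden     # one weight per (pixel, hidden node) pair
print(weights)                # 4,000,000,000

# Counting each of the three RGB channels as a separate input triples this:
weights_rgb = pixels * 3 * hidden
print(weights_rgb)            # 12,000,000,000
```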

16
Q

Why should spatial correlation be considered in image processing?

A

Parts of an image that are closer together are more likely to be similar to each other than those that are far apart. We want to combine information from input features that are correlated with each other spatially.

17
Q

What do convolutional neural networks (CNNs) do and how do they achieve this?

A

CNNs mimic the processing of images in the human brain.

They consist of a feature-learning part and a classification part. Feature learning involves two new kinds of layers: convolutional and pooling layers. From these we get a much smaller set of outputs representing the image, which are fed into standard fully-connected layers for classification.

18
Q

What are the two different layers of a CNN and what do they do?

A
  • Convolutional layers - search for visual elements in groups of spatially correlated pixels
  • Pooling layers - combine information from nearby pixels to create a smaller image

In this order usually

19
Q

What does a convolutional unit / kernel do?

A

It looks at small groups of input features (pixels). It has a receptive field: the specific area of the input image that the unit can “see” and use to calculate its output.

The units scan over the whole image, combining the inputs within their receptive fields at each position.

20
Q

What is the output of a convolutional unit?

A

A smaller image

21
Q

What is the size of a convolutional unit?

A

(n x m): the number of input units it is connected to.

22
Q

What is the stride s?

A

The number of pixels the convolutional unit moves between successive applications. A stride of 1 visits every position; a larger stride skips positions and produces a smaller output.
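A small illustrative helper (an assumption for this card, not from the lecture) showing how the stride controls how many positions the unit visits along one axis:

```python
def n_positions(image_width, kernel_width, stride):
    """Number of horizontal positions a kernel of width `kernel_width`
    visits on an image of width `image_width`, moving `stride` pixels
    between applications."""
    return (image_width - kernel_width) // stride + 1

print(n_positions(8, 3, 1))  # 6: the unit stops at every position
print(n_positions(8, 3, 2))  # 3: it skips every other position
```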

23
Q

How many convolutional units in a layer can you have?

A

Many, each one scans for “something different” and produces a separate output image.

24
Q

What letter is used to represent the convolution kernel?

A

W - weights of the convolutional kernel

25
Q

How do you determine the output of a convolutional unit?

A
  • Determine the size of the output
  • Multiply and sum: “scan” the unit over the input image, multiplying the weights element-wise with each image patch and summing the results. This is NOT matrix multiplication; it is element-wise, multiplying the convolutional unit by a portion of the image matrix each time.
  • Add a bias: add/subtract the bias from each element of the result. The bias is an extra weight.
  • Apply the activation function
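The four steps can be sketched in NumPy; the ReLU activation and the averaging kernel here are illustrative assumptions:

```python
import numpy as np

def conv_unit(X, W, b, stride=1):
    """Apply one convolutional unit: scan W over X, multiply
    element-wise with each patch and sum, add a bias, then apply
    an activation function (ReLU assumed here)."""
    n, m = W.shape
    H, Wd = X.shape
    out_h = (H - n) // stride + 1       # step 1: size of the output
    out_w = (Wd - m) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = X[i*stride:i*stride + n, j*stride:j*stride + m]
            out[i, j] = np.sum(W * patch) + b   # steps 2-3: element-wise multiply-sum, add bias
    return np.maximum(out, 0)                   # step 4: activation function

X = np.arange(16.0).reshape(4, 4)   # a toy 4x4 "image"
W = np.ones((3, 3)) / 9.0           # a 3x3 averaging kernel (illustrative)
out = conv_unit(X, W, b=0.0)
print(out.shape)                    # (2, 2): a smaller image
```
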
26
Q

What do convolutional units act as?

A

Feature detectors

27
Q

How is the bias represented?

A

b, an extra weight that is added to the weighted sum before the activation function is applied.

28
Q

What are the weights of a convolutional unit tuned to detect horizontal lines (3x3)?

A

0 0 0
1 1 1
0 0 0

29
Q

If a convolutional unit is tuned to detect horizontal lines, what are the expected outputs (following bias and activation function)?

A

If a horizontal line is present, the output is 1 and the unit will “fire” in response to this input. The output is a 1x1 image with element 1. The unit only produces a non-zero output when presented with a horizontal line.

If a horizontal line is not present, the output is 0 and the unit will not “fire”.
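A sketch of this detector in NumPy, assuming a bias of -2 so that only a complete horizontal line survives the ReLU:

```python
import numpy as np

W = np.array([[0., 0., 0.],
              [1., 1., 1.],
              [0., 0., 0.]])   # horizontal-line detector from the previous card
b = -2.0                       # bias chosen so only a full line fires (assumption)

def fire(patch):
    # Element-wise multiply-sum, add bias, ReLU activation
    return max(np.sum(W * patch) + b, 0.0)

horizontal = np.array([[0., 0., 0.],
                       [1., 1., 1.],
                       [0., 0., 0.]])
vertical = np.array([[0., 1., 0.],
                     [0., 1., 0.],
                     [0., 1., 0.]])
print(fire(horizontal), fire(vertical))  # 1.0 0.0: only the horizontal line fires
```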

30
Q

When we apply a convolutional unit to a portion of the image, W * X1, what are we doing?

A

Convolving the image with the convolutional unit (CU).

31
Q

How can you use multiple convolutional units to detect more complex patterns?

A

Eg want to detect a cross

We can have two convolutional units, one for a diagonal line sloping down and one for a diagonal line sloping upwards. Where both fire, there is an intersection of the lines and a cross is present.

32
Q

How do we decide which convolutional units to use / which features are important?

A

We as humans do not, the network learns what the important features are in the images presented to it - ie it will learn convolutional units based on features.

33
Q

How are the features that are detected by given convolutional units determined?

A

By the weights, which are set during learning.

34
Q

What is an advantage of CNNs over fully-connected networks?

A

The memory cost is smaller, as the number of weights is very small compared to a fully-connected network.

Invariance: all parts of the image are treated equally, so if part of the image is translated, it will still be recognised in the same way.

The convolutional units also tell you what the ML model has identified as important, i.e. what kinds of features were detected and mattered.

35
Q

What was one of the first CNNs used for handwriting?

A

LeNet (Yann LeCun and co-workers), used to read handwritten digits such as postal codes.

36
Q

Convolutional units scan over the entire image - what does this mean for parts of the image?

A

All parts of the input image are treated equally.

If part of the image is translated, it will still be recognised the same way (invariance).

37
Q

What can a single convolutional layer detect and what is the output?

A

A single convolutional layer detects simple features from the inputs. The outputs of convolutional units are smaller images.

38
Q

What happens to the output of single convolutional layers?

A

These outputs can be fed into more convolutional layers, which put simple features together to make more complex ones.

Each convolutional unit scans over the image to produce a new image, each looking for a different (basic) feature. We can stack up these images to produce more complex ones.

Eg simple straight lines - build up to corners

39
Q

Why can we pool information?

A

If the input data is correlated spatially, some of the information is redundant. Nearby points will be very similar to each other. We don’t need to keep all of this data and we can compress the image without losing a lot of information.

We combine (pool) sets of nearby pixels into one.

40
Q

Describe the pooling unit.

A

The pooling unit has no weights; it combines nearby pixels (units) in a predetermined way to make a smaller image. For example, with max pooling the pixel with the maximum value in a subset is used to represent that subset; in this way, a 2x2 square is combined into a single value. We could also use average pooling. No weights are involved in these operations.
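A minimal NumPy sketch of 2x2 max pooling:

```python
import numpy as np

def max_pool(X, size=2):
    """Replace each non-overlapping size x size block by its maximum
    value. Note there are no weights involved."""
    H, W = X.shape
    out = X[:H - H % size, :W - W % size]               # crop to a multiple of `size`
    out = out.reshape(H // size, size, W // size, size)
    return out.max(axis=(1, 3))                         # max within each block

X = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 0., 5., 6.],
              [1., 2., 7., 8.]])
P = max_pool(X)
print(P)   # [[4. 2.]
           #  [2. 8.]] - a quarter of the original pixels
```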

41
Q

Describe the structure of a CNN.

A

CNNs are comprised of several layers, which may be fully-connected, convolutional or pooling layers.

Typically there is a feature-learning section consisting of convolutional and pooling layers; in a way, this can be considered pre-processing. At the end of it we have a stack of images produced by convolving and pooling, i.e. a much smaller feature space (a much lower-dimensional set of features).

This is followed by fully-connected layers used for classification. These output probabilities, which determine the prediction.
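The shrinking feature space can be illustrated by tracking the image size through a hypothetical feature-learning stack (all layer sizes here are illustrative assumptions):

```python
def conv_out(size, kernel, stride=1):
    """Output width of a convolutional layer along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output width after non-overlapping pooling along one axis."""
    return size // pool

s = 28                    # e.g. a 28x28 greyscale input image (784 features)
s = conv_out(s, 5)        # conv layer, 5x5 kernels -> 24x24
s = pool_out(s)           # 2x2 max pooling         -> 12x12
s = conv_out(s, 5)        # conv layer, 5x5 kernels -> 8x8
s = pool_out(s)           # 2x2 max pooling         -> 4x4
features = s * s * 16     # assuming 16 units in the last conv layer
print(features)           # 256 features fed to the fully-connected classifier
```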

42
Q

What can be said of the dimensionality of CNNs?

A

High dimensional input –> much lower dimensional output

43
Q

What are some examples of what CNNs are used for?

A
  • Mostly image recognition
  • Handwriting detection - classifying written letters/numbers
  • Facial recognition - user authentication
  • Object detection - eg detection and classification of obstacles by self-driving cars
  • Computational chemistry - drug discovery, atomistic simulation
  • Board games - AlphaGo (2014)
44
Q

Describe AlexNet

A

One of the first modern CNNs (2012), using GPUs in training. It popularised CNNs for image classification and achieved around 15% (top-5) error.

The first layer detected edges, lines and gradients of colour. The layers were split across different GPUs (with different GPUs looking for different features); this specialisation emerged on its own rather than being an imposed constraint.

The second layer then detects more complex features (corners). The complexity of the features detected builds up with each layer, eventually distinguishing fine-grained categories such as dog breeds.

45
Q

What improved the error of AlexNet?

A

GoogLeNet - 22 layers, achieving around 6% error, which is comparable to human performance. It utilised inception modules.

46
Q

What does the use of lots of deep layers mean?

A

Lots of weights which can lead to overfitting.

47
Q

What is a CNN used for chemical systems?