ML Code Flashcards

1
Q

What usually makes up a large fraction of the code needed to perform machine learning?

A

Data handling.

Before we can train a model, we need to get the data into a form which is compatible with the training and testing.

2
Q

Why do we train and test in batches?

A

Training models on the full dataset all at once would take too long.

3
Q

What sample is used at each iteration of training?

A

Different random sample.

4
Q

What is the main aim of building a model?

A

Make predictions on unseen data.

Classification or regression.

5
Q

What is generalisation?

A

The model’s ability to make predictions on new, unseen data.

6
Q

How do we read in a dataset?

A

df = pd.read_csv("file.csv")

7
Q

For image-based data, how do you create the training data?

A

train_data = torch.tensor(train_df.iloc[:, 1:].to_numpy(dtype='float32')/255).reshape(-1, 1, 28, 28)

8
Q

For image-based data, how do you create the training labels?

A

train_labels = torch.tensor(train_df['label'].values)

9
Q

What is true of the training data and training labels shape?

A

The first dimension of the training data (the number of samples) is the same as the number of training labels.
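
For example, a quick check (assuming the train_data and train_labels tensors from the previous cards):

assert train_data.shape[0] == train_labels.shape[0]   # same number of samples
print(train_data.shape)    # e.g. torch.Size([60000, 1, 28, 28])
print(train_labels.shape)  # e.g. torch.Size([60000])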

10
Q

How do we create the dataset in the form of tensors?

A

train_dataset = TensorDataset(train_data, train_labels)

11
Q

What is the basic process for conducting data handling and training?

A

[See flashcard]
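
A minimal sketch of the typical flow, pieced together from the other cards in this deck (the file name is a placeholder):

df = pd.read_csv("train.csv")                                      # 1. read the raw data
data = torch.tensor(df.iloc[:, 1:].to_numpy(dtype='float32')/255).reshape(-1, 1, 28, 28)
labels = torch.tensor(df['label'].values)                          # 2. convert to tensors
dataset = TensorDataset(data, labels)                              # 3. wrap as a dataset
loader = DataLoader(dataset, batch_size=64, shuffle=True)          # 4. batch with a data loader
model = Net()                                                      # 5. define and instantiate the model
criterion = nn.NLLLoss()                                           # 6. choose a loss function and optimiser
optimizer = optim.Adam(model.parameters(), lr=0.001)
# 7. training loop: forward pass, loss, backward propagation, optimiser step
# 8. testing loop: evaluate accuracy on the test data loader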

12
Q

How do you create a training data loader?

A

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

13
Q

How do you create the testing data loader?

A

test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False)

14
Q

How do we generate a subsample?

A

Enumerate.

examples = enumerate(train_loader)

We can use the data loaders as iterators. So we can go through each batch of data and look at that at a time.

15
Q

What does enumerate do?

A

Enumerate is a built-in function in python that allows you to keep track of the number of iterations (loops) in a loop.

16
Q

What can we do with the generated subsamples?

A

Look at one to investigate.

batch_idx, (example_data, example_targets) = next(examples)

Can look at shape and type of each. Keep asking for the next batch of data.

17
Q

How do we plot images from the data?

A

Iterate through the data loader, to look at a batch of images and targets. Creates an array of plots to display the images and shows the target label.

Using ax.imshow() which is part of pyplot.

fig, axs = plt.subplots(nrows=2, ncols=4, figsize=(10, 5))
axs = axs.flatten()
for ax, image, label in zip(axs, example_data, example_targets):
    ax.set_axis_off()
    ax.imshow(image[0], cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)

18
Q

What do we do after we have investigated the initial tensor data?

A

Flatten the images, so that each is a 1D tensor.

19
Q

What is torch?

A

The library for PyTorch

20
Q

What does data type = torch.float32 mean?

A

They are torch tensors and the data are stored as floating point numbers.

21
Q

Why is it important that the test and training sets come from the same source?

A

If there are systematic differences, it would be hard to teach a model to deal with that. The model needs to have seen similar examples in order to learn how to interpret the images.

22
Q

What is the torch.nn library?

A

The neural network library.

23
Q

For the basic, fully connected neural network, what should you do?

A

Flatten all of the images, to turn it into 1D data.

image_size = train_data.shape[1:]

To investigate the size:
input_layer_size = np.prod(image_size)

Each image (28x28 pixels) will then be represented by 784 numbers.

24
Q

How do we define a model which is trained to convert image pixel values to labels?

A

[See flash card]
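
A minimal sketch consistent with the later cards (the hidden-layer size of 500 is an assumption):

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(28*28, 500),   # input layer -> hidden layer
            nn.ReLU(),
            nn.Linear(500, 10),      # hidden layer -> one output per class
        )

    def forward(self, x):
        x = x.view(-1, 28*28)                          # flatten each 28x28 image to 784 numbers
        x = self.classifier(x)
        return nn.functional.log_softmax(x, dim=1)     # log probabilities for nn.NLLLoss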

25
Q

What is the Net class?

A

The neural network class

26
Q

What do all neural network models inherit from?

A

The base class nn.Module

This has all the methods and attributes we need already defined. It will do all of this under the hood.

27
Q

What do we need to do in the initialisation of the model?

A

Define the actual neural network.

self.classifier = nn.Sequential( ... )

The input layer (data) is connected to a hidden layer (array of numbers), connected by weights.

28
Q

What do you do with nn.Sequential()?

A

Pass it a sequence of layers, it builds up a graph on the layers that you pass it.

It passes the data through one layer at a time.

This is a mathematical, matrix multiplication architecture.

29
Q

How do you define a fully connected linear layer?

A

nn.Linear()

30
Q

What do the outputs represent in classification?

A

How likely the image is to belong to the particular class.

The higher number = more likely to be a member of that class.

Larger = more confident in its answer.

31
Q

When does a model acquire meaning?

A

When the weights have been trained - it learns to do the classification task.

32
Q

What can you say about the model when it is first initialised?

A

All the weights and biases are random.

33
Q

Why do we need self when declaring the classifier?

A

If we don’t, it will be lost forever. The self. means we are storing the classifier inside the object whenever it is created, so we will always have access to it.

34
Q

What else needs to be defined in the model?

A

A forward method.

This is what happens when you give the model some data. This is how it knows what to do.

35
Q

What arguments does forward() take?

A

Self and the data we will pass into it.

It does some commands on that data to pass it through the model.

36
Q

For the simple neural network, what does the forward method do?

A

[See flashcard]

If we pass images, we need to reshape them so that instead of 28x28 pixels, we flatten it.

Using the x.view command. It doesn’t change the memory.
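
A hedged one-line example of that reshape step (x is a batch of 28x28 images):

x = x.view(-1, 28*28)   # same underlying memory, now shape (batch_size, 784)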

37
Q

Why do we need to use x.view()?

A

To reshape the data so that it is now flattened. Just have 1 dimension.

38
Q

What do we do with the flattened data?

A

Pass it to the classifier we previously defined.

39
Q

What is the final step of the forward method?

A

To turn these numbers into probabilities, we use the softmax function.

[See flashcard]

This operation turns the numbers into log probabilities, so that the corresponding probabilities all add up to 1.
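
As a hedged example, the last line of the forward method might look like this (x is the classifier output):

return nn.functional.log_softmax(x, dim=1)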

40
Q

What do we do once the model is defined?

A

Create an instance of the model

model = Net()

This is now an object in memory, the model we will train.

41
Q

Once the model is defined, what do we do?

A

We take a batch of training data using the enumerate command.

Pass the data into the model.

42
Q

What does the enumerate command output?

A

The index, the data and the label.

43
Q

What do we do once we have passed data into the model?

A

Find the predictions for a given image.

Use torch.max()

44
Q

How do we utilise torch.max()?

A

Ask for the maximum of the output (given by the model) along the first dimension.

This is the dimension with 10 numbers, ie what is the maximum of the 10 numbers.

It gives two outputs - the maximum value and the position (index) at which it occurred. We are interested in the position.

We want the predicted class with the highest probability.
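
A hedged example (output has shape [batch_size, 10]):

max_values, predicted = torch.max(output.data, 1)   # predicted holds the class index with the largest output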

45
Q

How can we visualise the data?

A

Confusion matrix

This tells us how many of the predictions are correct.

46
Q

What is the code for a confusion matrix?

A

disp = metrics.ConfusionMatrixDisplay.from_predictions(target, predicted)

47
Q

What is the ideal confusion matrix?

A

All numbers on the diagonal are large, and all others are zero.

This would be a perfect model.

48
Q

What do we next need to do?

A

Train the model

49
Q

What do we set up for training the model?

A

A loss function.

Some way of modifying the function based on this score.

50
Q

Which loss do we do for classification?

A

Cross entropy loss / negative log likelihood loss function

nn.NLLLoss()

Classification task - we have already converted the numbers to (log) probabilities.

51
Q

Which optimiser do we use?

A

The Adam optimiser.

optim.Adam()

52
Q

What does the loss function do?

A

Tells us how good the output of the model is compared to what we want it to be.

We need to know how to reward the model for good predictions.

53
Q

Explain how the optimiser is used.

A

It takes the outcome of the criterion to optimise the model. We need something to use that information and change the model in some way.

It calculates all of the gradients in the back propagation step. Figures out how to change the weights of the models in order to make a better output.

54
Q

What is optim.Adam?

A

An adaptive optimiser that is a good algorithm for training a model.

55
Q

What arguments does optim.Adam() take?

A

The model parameters (the model instance has a parameters() method, which gives all parameters, eg weights).

The learning rate.
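
For example (the learning rate value here is an assumption):

optimizer = optim.Adam(model.parameters(), lr=0.001)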

56
Q

If the learning rate is big, what can be said of the changes to the model at each iteration?

A

If it is big, it will make big changes.

if it is small it will make small changes.

You want to balance this.

57
Q

How do we train the model?

A

Iterate through the training data loader.

In each iteration we have a set of 64 images and 64 target labels.

We switch the model into training mode.

We tell the optimiser to zero all the gradients (in case it has accumulated gradients already). This is part of the back propagation step.

Then we pass our data through the model. We get 10 numbers for each image as the output.

Put the 10 numbers into the criterion function, with the labels. It produces a score for how good the model did on that set of training data.

We then do the backward propagation step on this score. It figures out how to change the weights to make the model better.

Then we optimise using step, updating the weights to try and make it better.

[See flashcard]
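
A minimal sketch of one pass over the training loader, following the steps above (criterion and optimizer are the loss function and optimiser from the earlier cards):

model.train()                            # switch to training mode
for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()                # zero any accumulated gradients
    output = model(data)                 # forward pass: 10 numbers for each image
    loss = criterion(output, target)     # score the predictions against the labels
    loss.backward()                      # back propagation: compute the gradients
    optimizer.step()                     # update the weights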

58
Q

What do you put at the start of a cell to determine how long it takes a cell to run?

A

%%time

59
Q

What do we do once we have a trained model?

A

We want to test it.

Iterate through the test data.

60
Q

How do we carry out testing?

A

Turn the model to evaluation mode.

Disable gradient calculations.

Iterate over the data loader. Pass the data to the model.

Get the maximum of the numbers for each image, taking the second output.

Book keeping parts - keep track of how many are true and false in a particular batch.
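
A hedged sketch of these steps:

model.eval()                                 # evaluation mode
correct, total = 0, 0
with torch.no_grad():                        # disable gradient calculations
    for data, target in test_loader:
        output = model(data)
        _, pred = torch.max(output, 1)       # keep the second output (the predicted class index)
        correct += pred.eq(target.view_as(pred)).sum().item()   # book keeping
        total += target.shape[0]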

61
Q

What does pred.eq(target.data.view_as(pred)).sum() tell us?

A

How many of the predictions are equal to the targets in the test dataset.

Gives 1 if the prediction is equal to target and 0 if not.

62
Q

After running the testing, what do we do?

A

Obtain a fraction of how many of the predictions were correct.

63
Q

In the final confusion matrix, what can we add?

A

normalize='true' in the brackets

We can see what fraction of each category it gets correct.

fig = plt.gcf()
fig.set_figheight(8)
fig.set_figwidth(10)

64
Q

What different things can you do to play around with the model?

A
  • Add in different types of layers
  • Make the size of the hidden layers bigger or smaller
  • Change the optimiser
  • Change the learning rate
  • Change the architecture
  • Use different activation functions
  • Train for multiple epochs
65
Q

What activation functions can you use? [Lecture 1]

A

Linear activation function: weighted sum of all inputs and bias

Can add a non-linear activation function to increase the complexity of the model eg ReLU()

66
Q

What is an epoch?

A

One complete pass of the entire training dataset through a learning algorithm

67
Q

What other layers can you add to the sequential function?

A

nn.BatchNorm1d(500)

nn.LeakyReLU(0.2, inplace=True)

nn.Dropout(0.5)

68
Q

How do you print the loss at each step?

A

Within for loop

print(loss.detach())

69
Q

What are tensors?

A

Fundamental objects used to handle inputs, outputs and model parameters in PyTorch.

A tensor is a structure that assumes multilinear relationships.

70
Q

What do tensors behave similarly to?

A

Numpy arrays

They have a few extra features which help in machine learning applications.
- They can be faster for calculations (especially when using GPUs)
- They are optimised for automatic differentiation (required for back propagation)

71
Q

How do you create a tensor from a list?

A

Initialise a tensor from a list - like arrays
torch.tensor(list)

72
Q

How do you turn a numpy array into a tensor?

A

torch.tensor(array)

73
Q

How do you create an array from a tensor?

A

np.asarray(tensor)

74
Q

How do you create tensors using torch commands?

A

eg shape = (3,2)

rand_tensor = torch.rand(shape)

ones_tensor = torch.ones(shape)

zeros_tensor = torch.zeros(shape)

constant_tensor = torch.full(shape, 4.0)

identity_tensor = torch.eye(shape[0])
1 on diagonal and 0 everywhere else

75
Q

How do you create a tensor from another tensor?

A

_like

x_ones = torch.ones_like(tensor)

Can add in
dtype = torch.float to override the datatype

76
Q

What attributes are useful to ensuring tensors are compatible with each other and with the models in pytorch?

A

.shape
.dtype

.device tells you what device the tensor is stored on eg the cpu

77
Q

What data type is typically used in models?

A

torch.float32

78
Q

Describe the shape of this tensor torch.ones(2,2).reshape(1,2,2)

A

The .reshape(1,2,2) operation changes the shape while keeping the number of elements the same. The new shape is (1,2,2), meaning:

1: A new batch-like or singleton dimension.
2: The number of rows (same as before).
2: The number of columns (same as before).

The same data but wrapped in an extra dimension.

79
Q

What command do we use to reshape the data without changing its data?

A

.view

80
Q

What does the code tensor_1.view(-1, 4) do?

A

.view() is used to reshape a tensor without changing its data.
-1 is a special value that tells PyTorch to automatically infer that dimension based on the number of elements.
4 means that each new row should have exactly 4 columns.

tensor_1 is not actually modified.

81
Q

What does the code tensor_1.data.view_as(tensor_2) do?

A

Reshapes tensor_1 to have the same shape as tensor_2.

82
Q

What does tensor_2.eq(tensor_1.data.view_as(tensor_2)) do?

A

.eq() (short for equal) performs an element-wise comparison between tensor_2 and tensor_1.data.view_as(tensor_2).

83
Q

How do we add a new dimension in pytorch?

A

.unsqueeze(dim) adds a new dimension at the specified position (dim).

eg .unsqueeze(0) adds a new dimension at the first position.

84
Q

What does tensor.squeeze() do?

A

.squeeze() removes dimensions with size 1 from a tensor.

85
Q

How do we access elements from a tensor?

A

Standard numpy-like indexing and slicing.

First row - tensor[0]
First column - tensor[:, 0]
Last column - tensor[:, -1]

86
Q

How can you join tensors?

A

Use torch.cat() to concatenate a sequence of tensors along a given dimension

eg t1 = torch.cat([tensor, tensor, tensor], dim=0)

87
Q

How do you carry out matrix multiplication of tensors?

A

tensor @ tensor.T

OR

tensor.matmul(tensor.T)

OR

y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)

88
Q

How do you find the element-wise product of tensors?

A

tensor * tensor

OR

tensor.mul(tensor)

OR

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

89
Q

Describe single-element tensors.

A

If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using item()

agg = tensor.sum()
agg_item = agg.item()

90
Q

How do we find the minimum of a column?

A

values,indices = tensor_squared.min(dim=0)

91
Q

How do we find the minimum of a row?

A

values,indices = tensor_squared.min(dim=1)

92
Q

What are in-place operations?

A

Operations that store the result into the operand.

They are denoted by a _ suffix.

eg x.copy_(y), x.t_() will change x

93
Q

What happens in this case?

tensor1 = tensor

A

Defining a tensor as equal to another means they share the same memory location, so changes to one affect the other.

94
Q

How do you create a definitive copy of a tensor, rather than pointing to the same place in memory?

A

tensor2 = tensor.clone()

95
Q

What is the most frequently used algorithm when training neural networks?

A

Back propagation.

In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

96
Q

How do we compute the gradients of the loss function with respect to the given parameter (used to adjust parameters) in PyTorch?

A

torch.autograd

It supports automatic computation of gradient for any computational graph.

97
Q

What parameter is used to say that we want to know gradients with respect to weights and biases?

A

requires_grad=True

98
Q

In one line, what is the output of the neural network and expected outputs?

A

loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

Where z is a single layer of NN coded directly with tensors,
z = torch.matmul(x, w)+b

In this network, w and b are parameters, which we need to optimise.

99
Q

How do we optimise the parameters, w and b of the network?

A

We need to be able to compute the gradients of loss functions with respect to those variables.

In order to do that, we set the requires_grad property of those tensors

100
Q

What property of a tensor stores a reference to the backward propagation function?

A

grad_fn

You can find the gradient function for z via z.grad_fn and for the loss via loss.grad_fn

101
Q

How do we optimise the weights of parameters in the neural network?

A

We need to compute the derivatives of the loss function with respect to the parameters, namely we need dloss/dw and dloss/db under some fixed values of x and y.

To compute those derivatives, we call loss.backward() and then retrieve the values from w.grad and b.grad
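
For example (continuing the single-layer example from the earlier cards):

loss.backward()   # compute dloss/dw and dloss/db
print(w.grad)     # gradient of the loss with respect to w
print(b.grad)     # gradient of the loss with respect to b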

102
Q

What happens if we keep calculating the backpropagation?

What do we therefore need to do?

A

The gradient value will grow.

The gradients can be reset to start again

b.grad = None
w.grad = None

103
Q

What happens by default to tensors with requires_grad=True?

A

They track their computational history and support gradient computation.

However, there are some cases when we do not need to do that.

104
Q

When would we not need to track computational gradients with requires_grad=True?

A

When we have trained the model and just want to apply it to some input data.

We only want to do forward computations through the network.

105
Q

How can we stop tracking gradient computations?

A

Surround our computation code with torch.no_grad()

with torch.no_grad():
    # code

Alternatively, we could use the detach() method on the tensor
z_det = z.detach()

106
Q

Why might you want to disable gradient tracking?

A
  • To mark some parameters in the neural network as frozen parameters. This is a very common scenario for fine-tuning a pretrained network.
  • To speed up computations when you are only doing forward pass, because computations on tensors that do not track gradients would be more efficient.
107
Q

What does PyTorch do with the gradients when we perform backward propagation?

A

It accumulates the gradients.

To compute the proper gradients, you need to zero out the grad property before. In real-life training, an optimiser helps us to do this.

108
Q

Why is it ideal to have our dataset code decoupled from our model training code?

A

Readability and modularity.

109
Q

What data primitives does PyTorch provide to allow you to use pre-loaded datasets as well as your own data?

A

torch.utils.data.DataLoader

torch.utils.data.Dataset

110
Q

What does Dataset do?

A

Stores the samples and their corresponding labels.

111
Q

What does DataLoader do?

A

Wraps an iterable around the Dataset, to enable easy access to the samples.

112
Q

How do we visualise some samples in our training data?

A

We use matplotlib, indexing datasets manually like a list.

113
Q

If we load our data from pandas, we can extract the labels and pixels. How do we make this into a PyTorch dataset?

A

Using TensorDataset.

tensor_dataset = TensorDataset(test_images,test_labels)

114
Q

How do we plot an example of the tensor dataset?

A

img, label = tensor_dataset[0]
plt.imshow(img[0], cmap='Greys_r')
plt.title(label.item());

Where image and label match the data of what was passed into TensorDataset

115
Q

How does the Dataset retrieve our dataset’s features and labels?

A

Retrieves the dataset's features and labels one sample at a time.

While training a model, we typically want to pass samples in “mini batches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.

116
Q

While training a model, we typically want to pass samples in “mini batches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.

What can we use to abstract this complexity into an easy API?

A

DataLoader

dataloader = DataLoader(tensor_dataset, batch_size=64, shuffle=True)

Now we can iterate through the dataset as needed.

117
Q

When iterating through a DataLoader, what does each iteration return?

A

A batch of train_features and train_labels (containing 64 features and labels respectively, as determined at creation of DataLoader).

118
Q

What does shuffle=True do?

A

After we iterate over all batches, the data is shuffled.

119
Q

How do we iterate to the next DataLoader iteration?

A

train_features, train_labels = next(iter(dataloader))

120
Q

What is the first thing we need to do once we load data from a pandas data frame?

A

First convert it to tensors

Convert pixel values in data frame to a numpy array, then pass this to torch.tensor.
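
A hedged example (assuming the data frame layout used earlier, with the label in the first column):

images = torch.tensor(train_df.iloc[:, 1:].to_numpy(dtype='float32') / 255)
labels = torch.tensor(train_df['label'].values)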

121
Q

How do we get the number from a tensor, rather than having the tensor itself?

A

tensor.item()

122
Q

How do we build a dataset out of the tensor data?

A

TensorDataset()

123
Q

What does nn eg in nn.Module represent?

A

The neural network library

124
Q

What are neural networks comprised of?

A

Layers/modules that perform operations on data.

A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

125
Q

How do we define a neural network with Linear and ReLU activations?

A

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
126
Q

How do we create an instance of the defined model?

A

model = NeuralNetwork()

127
Q

How do we print the structure of the defined model?

A

print(model)

128
Q

How do we use the model?

A

Pass it the input data. This executes the model’s forward, along with some background operations.

129
Q

What dimension of output does calling the model on the input return?

A

eg 10 classes

Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class.

130
Q

How can we get the prediction probabilities?

A

Passing it through an instance of the nn.Softmax module

131
Q

When is the flatten function particularly useful?

A

For image data

132
Q

What do we pass to nn.Sequential()?

A

Passing a sequence of layers and functions you want the nn to perform one after the other.

133
Q

For image data, what is the input value to the sequence of layers in nn.Sequential()?

A

The number of inputs is the number of pixels eg 28*28

134
Q

For image data, what is the output value of the sequence of layers in nn.Sequential()?

A

The number of classes possible in the prediction.

135
Q

What does adding non-linear activation functions such as nn.ReLU() do?

A

Increases the complexity and the ability of the model.

They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

136
Q

When do models acquire meaning?

A

only when we do the training.

137
Q

How do we train the model?

A

Create random data to pass through the model

138
Q

How do we get the prediction probabilities from the output of the model?

A

Pass it through the nn.Softmax function

logits = model(X)
pred_probs = nn.Softmax(dim=1)(logits)

139
Q

Of the (eg 10) probabilities output of nn.Softmax, how do we determine what the model predicted?

A

Using .argmax(1)

y_pred = pred_probs.argmax(1)

Predicted class - y_pred.item()

140
Q

What does the loss function show?

A

How well the model is performing.

141
Q

What does the nn.Flatten layer do?

A

Converts each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).

142
Q

What does the last linear layer of the neural network return?

A

Logits - these are raw values in the range -infinity to infinity

143
Q

Once logits are returned from the last linear layer of the neural network, where do we put them?

A

We pass them to the nn.Softmax module.

144
Q

What does the nn.Softmax module do to logits?

A

Scales the values between 0 and 1, representing the model’s predicted probabilities for each class.

145
Q

In softmax = nn.Softmax(dim=1), what does dim=1 mean?

A

The dimension along which the values must sum to 1.

146
Q

What do we need to pass to the NLLLoss function?

A

We need to provide log probabilities. This can be done more efficiently with the LogSoftmax module

147
Q

What is an alternative to using the NLLLoss function and the LogSoftmax module?

A

A better approach is to take the model output logits and use the CrossEntropyLoss()

loss_fcn = nn.CrossEntropyLoss()
loss_fcn(logits,labels)

148
Q

What does parameterised mean?

A

There are associated weights and biases that are optimised during training.

149
Q

What does subclassing nn.Module allow for parameters?

A

It automatically tracks all fields defined inside the model object, and makes all parameters accessible using the model’s parameters() or named_parameters() methods.

150
Q

How do you save a model?

A

A common way is to serialise the internal state dictionary (containing the model parameters).

torch.save(model.state_dict(), "model.pth")

151
Q

How do you load a model?

A

It includes re-creating the model structure and loading the state dictionary into it.

model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth", weights_only=False))

152
Q

Why are images less ideal for fully connected layers?

A

The images were turned into 1D tensors which caused the loss of the relative position of pixels.

We may expect neighbouring pixels to be related to each other more than ones that are far apart.

153
Q

Describe a 2D convolutional neural layer.

A

A set of kernels are convolved with the input to produce the output.

154
Q

What do convolutional neural networks allow for images?

A

They are a natural way of revealing interesting features in images.

This means that they only extract features that are likely to be important for a machine learning application.

155
Q

In a convolutional layer, what does the learning?

A

The kernels themselves learn during the training process.

ie the values of each element of the kernels are individual parameters to be trained.

156
Q

What parameters does the convolutional layer have?

A
  • Padding
  • Stride

These affect how the kernel is used and the shape of the output.

157
Q

What are other important layers for convolutional neural networks, in addition to the convolutional layers?

A
  • Max pooling
  • Flatten
158
Q

What does max pooling achieve?

A

Max pooling effectively shrinks down an image by taking the maximum value over a specified range of pixels.

Alternative - average pooling.

159
Q

What does flatten achieve?

A

Flatten reduces the dimensionality, in a similar way to reshaping.

This is required to connect a convolutional layer to a fully connected 1D layer, such as the output layer of the network.

160
Q

How do you define a convolutional network?

A

class CNN(nn.Module):
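
A minimal sketch (the layer sizes are assumptions, chosen to match the examples elsewhere in these cards):

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, 16, 3),          # 1 input channel (monochrome), 16 kernels of size 3x3
            nn.MaxPool2d(2),              # shrink the feature maps
            nn.ReLU(),
            nn.Flatten(),                 # connect to the fully connected output layer
            nn.Linear(16 * 13 * 13, 10),  # 10 outputs, one per class
        )

    def forward(self, x):
        return self.network(x)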

161
Q

When we have the first convolutional layer, how many input channels do we need for images?

A

1 if dealing with monochrome images

3 if dealing with RGB images

162
Q

What is the default for Conv2d in terms of padding and stride?

A

No padding
Stride = 1

163
Q

What two additional layers do convolutional networks include?

A

nn.Conv2d - this is used for the first layer, we have to respect the dimensions of the data being input. This layer requires we have a channel - for monochrome images we have 1 channel.

nn.MaxPool2d

164
Q

How do you define the loss function for a classification problem?

A

loss_fn = nn.CrossEntropyLoss()

loss_fn(outputs,labels)

165
Q

What does the nn.CrossEntropyLoss() do?

A

It takes logits as the input and performs log_softmax and then NLLLoss()

166
Q

What does “functional version” of softmax refer to?

A

It is a function that directly computes the softmax. The object doesn’t need to be instantiated and it can be used in line.

167
Q

How do we import the functional version of softmax?

A

import torch.nn.functional as F

168
Q

What do we pass into the training function?

A

def train(model, loss_fn, optimizer):

  • Passing in the model, the loss function and the optimiser
169
Q

What is a difference between enumerate and iterate?

A

Enumerate has a built-in way to keep track of which batch we are on; we would have to do this manually for iterate.

170
Q

Each time we iterate through a training batch, what do we do?

A
  • Compute the prediction error (get the predictions and pass to loss function)
  • Zero the gradients, carry out backwards propagation, use the optimiser to update the weights
  • We can print out how it is going eg plot the loss
171
Q

What is a key difference between the training and test functions?

A

In the test function, we do not carry out back propagation
- it is set to evaluation mode
- we turn off the gradient calculation using with torch.no_grad():

The function only takes in the model and loss function (no optimiser)

def test(model, loss_fn):

172
Q

How can we see how many parameters are used in a model?

A

n_param = 0
for parameter in model.parameters():
    n_param += np.prod(parameter.shape)
n_param

173
Q

How do the number of parameters in a CNN compare to a fully connected model?

A

Convolutional layers have fewer parameters than a fully connected neural network. This is due to weight sharing.

We can get far better performance for images (or natural data) where there is correlation of particular elements of data.

174
Q

When are we more likely to experience overfitting?

A

When training complex models (large number of free parameters) with a limited data set.

175
Q

What does overfitting mean?

A

The model will be able to fit the training data very well (potentially perfectly), but in the process learns a model that is very specific to the training data.

When we are asked to make predictions on unseen data (eg testing dataset) the accuracy will be poor as the model is not generalisable.

At the extreme end, the model may just be effectively “memorising” the training data, and may not have any predictive power at all.

176
Q

How can we detect overfitting?

A

Comparing the training data loss with the testing data loss.

If the training data loss continues to improve with every training loop, but the testing loss does not improve (or gets worse) then overfitting may be responsible.

177
Q

What are some strategies to prevent overfitting?

A
  • Use a larger training set
  • Use a smaller network
  • Weight sharing (as in CNNs)
  • Using dropout layers
  • Data normalisation
  • Data augmentation
  • Early stopping
  • Transfer learning
  • Model averaging
  • Weight decay
  • Batch normalisation
178
Q

Why is data normalisation a strategy for preventing overfitting?

A

Neural networks generally work best with values that follow a Normal distribution ie have an average of zero and a standard deviation of one.

179
Q

What is data augmentation?

A

Applying a transformation to the training data to improve its variability - artificially increasing the training dataset size

180
Q

What does early stopping achieve?

A

Stop the model when the test loss stagnates rather than continuing any further.

181
Q

What is transfer learning?

A

Using a pre-trained model for part of your neural network and then just re-training the last few layers with your data.

182
Q

How do we achieve batch normalisation?

A

Adding BatchNorm1d or BatchNorm2d layers to ensure inputs are close to a normal distribution

183
Q

Discuss how effective it is to collect a larger training set.

A

It may be impractical or expensive in practice.

184
Q

Discuss how effective it is to use a smaller network.

A

It means that we need to restart training, rather than use what we already know about hyper parameters and appropriate weights.

185
Q

Discuss how effective it is to carry out transfer learning.

A

We use pre-trained weights of a different model as part of the neural network. The architectures and weights of this were trained using a larger dataset and trained to solve a different image classification problem.

Nevertheless, transfer learning allows us to leverage information from larger data sets with low computational cost.

You would typically just train the last few layers with the training dataset while keeping the rest of the weights fixed.

186
Q

How do we refer to a dataset used to check for overfitting?

A

It is no longer a pure test dataset, instead it is called a validation dataset.

187
Q

What is a difference between convolutional neural networks and the fully connected layers we saw previously?

A

  • A CNN doesn’t require data reshaping (eg flatten layers)
  • When presented with image data, we can just pass it into the CNN
  • In real life you may need to process the data more, but not for the images we will see

188
Q

How do we define the training function for CNNs?

A

def train(model, train_loader, test_loader, batch_size=20, num_epochs=1, learn_rate=0.001, weight_decay=0):

189
Q

What does nn.Conv2d() take as arguments?

A

Three compulsory units
- Number of channels in: greyscale images = 1
- Number of channels out: this is how many kernels we will load
- Kernel size, eg pass 3 for a 3x3 kernel

190
Q

What does nn.Flatten() achieve?

A

This architectural part makes sure the convolutional output can be connected to the final fully connected layer, so that we end up with 10 outputs.

191
Q

In which part of the code do we account for overfitting?

A

The training block

192
Q

How do we plot the learning curves?

A

In the train function, we can plot at the end of the training process.

Need to keep track of epochs, losses, val_losses and val_acc throughout

193
Q

What other function can we create?

A

def get_accuracy(model, test_loader, criterion):

It must be in evaluation mode.

194
Q

What is the code for the get_accuracy function?

A

def get_accuracy(model, test_loader, criterion):
    correct = 0
    total = 0
    loss = 0
    model.eval()  #*******#
    for imgs, labels in test_loader:
        output = model(imgs)
        loss += criterion(output, labels).item()
        pred = output.max(1, keepdim=True)[1]  # get the index of the max logit
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
    return loss / total, correct / total

195
Q

What plots can you make?

A
  • Plotting loss over epoch number
  • Plotting accuracy over epoch number
196
Q

What is data normalisation?

A

Scaling the input features of a neural network, so that all features are scaled similarly (similar means and standard deviations).

This makes the training problem easier.

eg scaling so that there is a mean of 0 and standard deviation of 1
eg scaling so that things are in the range [0, 1]

197
Q

How do we normalise the data?

A

train_mean = train_data.mean()
train_std = train_data.std()
norm = transforms.Normalize(train_mean, train_std)

train_data_norm = norm(train_data)
test_data_norm = norm(test_data)

This transform subtracts the mean value from each pixel, and divides the result by the standard deviation.

We then pass this data to TensorDataset, create DataLoaders and pass these parameters to an instantiation of the model.

198
Q

Why is data augmentation useful?

A

While it is often expensive to gather more data, we can often programmatically “generate” more data points from our existing data set.

199
Q

What are common ways of obtaining new (image) data?

A
  • Flipping each image horizontally or vertically (won’t work for digit recognition but may for other tasks)
  • Shifting each pixel a little to the left or right
  • Rotating the images
  • Scaling images up or down
  • Adding noise to the image
  • Can have a combination of these approaches
200
Q

Programatically, how could we apply rotations/translations/scaling to the training images?

A

transform=transforms.RandomAffine(XXX)
train_data_trans = transform(train_data)

Then create another tensor dataset.
train_dataset = TensorDataset(train_data_trans,train_labels)

eg rotate by up to 25 degrees, translations of up to 5% of the image size, scaling from 80% to 110% of the original size
transform=transforms.RandomAffine(25, translate=(0.05,0.05),scale=(0.8,1.1),)

201
Q

Discuss weight decay.

A

Weight decay is a technique that prevents overfitting. It penalises large weights.

We want to avoid large weights, because large weights mean that the prediction relies a lot on the content of one pixel, or on one unit. Intuitively, it does not make sense that classification should rely heavily on one, or a few pixels.

Mathematically, we penalise large weights by adding an extra term to the loss function.
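
In symbols, a hedged sketch of the L2 (weight decay) penalty, where lambda is the weight_decay hyperparameter:

L_{\text{total}} = L_{\text{data}} + \lambda \sum_i w_i^2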

202
Q

How is weight decay achieved in PyTorch?

A

Weight decay can be done automatically inside an optimiser. The parameter weight_decay of optim.Adam and most other optimisers uses L^2 regularisation.

The value of the weight_decay parameter is another tuneable hyper parameter.

train(model, train_loader_aug, test_loader, num_epochs=50,weight_decay=1e-3)

203
Q

Discuss the dropout method.

A

Another way to prevent overfitting is to build many models, then average their predictions at test time. Each model might have a different set of initial weights.

Dropout randomly zeros out a portion of neurons during each training iteration. This has the effect of preventing weights from being overly dependent on each other. Weights are encouraged to be “more independent” of one another.

We only drop out neurons during training; at test time we use the entire set of weights. This means that the training and test behaviour of dropout layers is different.

204
Q

How do we incorporate dropout into our model?

A

We add a nn.Dropout2d(X) layer to our nn.Sequential

eg
nn.Conv2d(1, 16, 3),
nn.MaxPool2d(2),
nn.Dropout2d(0.5),
nn.ReLU()

205
Q

How do we incorporate normalisation into our model?

A

We add a nn.BatchNorm2d(X) layer to our nn.Sequential

eg
nn.Conv2d(1, 16, 3),
nn.BatchNorm2d(16),
nn.MaxPool2d(2),
nn.Dropout2d(0.5),
nn.ReLU()

206
Q

What is the task of sentiment analysis?

A

To identify the sentiment of a particular bit of text.

Eg deciding if an app review is positive or negative from the written words.

The machine learning model has to learn something about language, and the meaning of particular sentences.

207
Q

Discuss the differences between text data and numerical data.

A
  • Text is made of characters and strings, whereas neural networks deal with numbers and matrix operations
  • Text can be different lengths, whereas the data before were all composed of equally sized 1D or 2D numbers
208
Q

How do we deal with text data?

A

We need to convert the text into numbers.

This is typically done in two stages.
- The text is broken up into either individual characters or individual words and symbols
- Each possible character or word is assigned a number (basically a lookup table or substitution code)
In this way, a sequence of characters or words can be converted to numbers where each number represents a particular possibility.

209
Q

What piece of code do we need to turn text into separate words and symbols?

A

tokenizer = get_tokenizer('basic_english')

210
Q

What does this piece of code do - tokenizer('I ate my sandwich at my desk!')?

A

Separates out the sentence into an array of words and punctuation.

It also makes everything lowercase.

211
Q

What piece of code do we use to ensure randomisation doesn’t change the outputs?

A

torch.manual_seed(99)
np.random.seed(99)

212
Q

How do you split text dataframe data into a training and test set?

A

_, tweet_sub_df = train_test_split(tweet_df, test_size=0.1)

train_test_split splits tweet_df into two parts, here 90% and 10%.

_ (underscore) is used to ignore the first returned value (the 90% part).

tweet_sub_df stores 10% of the dataset.

train_df,test_df = train_test_split(tweet_sub_df,test_size=0.1)

Double split
- The first split selects a random subset (10%) of the full dataset.
- The second split divides this subset into train (90%) and test (10%).

213
Q

How do we build a vocabulary up?

A

By passing all of the tokens from these tweets to build_vocab_from_iterator

from torchtext.vocab import build_vocab_from_iterator

def yield_tokens(df):
    for n, row in df.iterrows():
        yield tokenizer(row[1])

vocab = build_vocab_from_iterator(yield_tokens(train_df), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

214
Q

What do the references to <unk> tell the vocabulary builder?

A

To put all unknown tokens as a value of zero.

ie tokens not in the vocabulary will be given a value of zero by the vocab object

215
Q

What can we do with the vocab object once created?

A

vocab(tokenizer('I ate my sandwich at my desk!'))

returns the corresponding numbers of the tokens

216
Q

How do we investigate the corresponding number of a token and vice versa?

A

vocab.get_stoi()['yes']

vocab.get_itos()[0]

217
Q

What kind of vector do we build once we have got the corresponding numbers of our tokens?

A

The standard approach is then to build a vector for each token or piece of text where the length of the vector is equal to the number of possible tokens.

For a given token, all elements of the vector are zero except at the index which corresponds to the position of that token in the vocabulary.

def make_vectors(text):
    indexes = vocab(tokenizer(text))
    vectors = torch.zeros(len(vocab), len(indexes))
    for n, ind in enumerate(indexes):
        vectors[ind, n] = 1
    return vectors

text_vectors = make_vectors(text)

Can then investigate .shape and .argmax(0)

218
Q

How can we combine vectors of individual tokens?

A

If we sum them, we get a single vector for each piece of text that counts how many times each token appears.

This text data can be passed to a neural network model.

219
Q

What is the size of the input layer of a model for text data?

A

It would need to be large - equal to the total number of possible tokens.

219
Q

What would the input of the neural network be for text data?

A

It would be a long vector - which is mostly zeros but has the count of each possible token in a given piece of text.

220
Q

How do we create a function and class to turn a text dataset into numerical vectors?

A
  • Define a function text_2_vec
  • Define a class CustomTextDataset

def text_2_vec(text):
    return make_vectors(text).sum(1)

class CustomTextDataset(Dataset):
    def __init__(self, labels, text):
        self.labels = labels
        self.text = text

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        label = self.labels[idx]
        text = self.text[idx]
        vec = text_2_vec(text)
        return label, vec
221
Q

How do we prepare text data for machine learning models by converting it into a numerical format, using text_2_vec(t)?

A

train_vectors = torch.Tensor(len(train_texts), len(vocab))
for n, t in enumerate(train_texts):
    train_vectors[n, :] = text_2_vec(t)

This code creates an empty tensor for vectorised text. The text is converted into numerical vectors and stores them in train_vectors.

222
Q

How do you create a training and testing dataset based on the numerical text values?

A

train_dataset = CustomTextDataset(train_labels,train_texts)
test_dataset = CustomTextDataset(test_labels,test_texts)
test_dataset[0][1].shape

Using the previously defined CustomTextDataset class

223
Q

What is the first layer of the neural network model, that takes the text vector as an input, referred to as?

A

Embedding

There are better and quicker ways of doing the embedding eg nn.Embedding

224
Q

What is a common way to do the embedding for language models?

A

Use a pre-trained embedding layer.

GloVe - global vectors for word representation

GloVe embedding is an example of unsupervised learning, where the vector representation for words is learnt from a body of text by looking at the co-location of different words. The GloVe model learns which words are closely related and which are not. This enables the algorithm to place words in a multidimensional representation, so that similar words are close together and different words are far apart.

225
Q

What kind of GloVe layer will we use?

A

A layer pre-trained on millions of documents that has learned an efficient embedding for English text.

226
Q

How do we obtain a pre-trained GloVe model?

A

glove = torch.load('glove6B_20000.pth')
n_dim = glove.dim

The dimension specifies how many dimensions are used to encode the tokens and max_vectors is how many tokens to include.

227
Q

How do we get a 50-long vector representation of the token “yes”?

A

glove.get_vecs_by_tokens('yes')

228
Q

What is returned if we pass a word not in the pre-trained vocabulary?

A

It will return the zero-vector

229
Q

The glove object contains a list of all the trained words.

How do we see if the glove object contains a word?

A

'hello' in glove.stoi

230
Q

Why is the choice of tokeniser important?

A

Eg in some tokenisers, capitalisation is lost, which may be important

231
Q

How does our neural network model for Glove differ?

A

The number of inputs is n_dim which is glove.dim

There are two outputs - we were looking at if tweets were positive or negative.

232
Q

How does the model using glove embedding compare?

A

It is quicker to run, however the training loss and test loss and accuracy performance is a bit worse.

This could be improved by using a larger GloVe model, eg we only use 20,000 words and 50 embedding dimensions.

Additionally, the model using pre-trained GloVe embedding has far fewer trainable weights than the previous model.

233
Q

There is a pytorch layer that can handle the embedding for you. It uses the glove model to turn a tensor of word indexes into embedded vectors, what is the code for this?

A

train_texts = train_df[1].values
print(train_texts[3])
emb = nn.Embedding.from_pretrained(glove.vectors)
inds = torch.tensor([vocab[t] for t in tokenizer(train_texts[3])]).reshape(1,-1,)
print(inds)
emb(inds).shape

234
Q

What is the issue with reducing each tweet to a single vector?

A

It reduces the amount of information that was available to train the network.

It would be better to pass each word in a sentence and to maintain the order of the words so that the model may learn to interpret the meaning of the texts.

Recurrent neural networks (RNNs) are well suited to this task due to their ability to maintain a state which remembers the context of new words.

235
Q

Why would we restrict the number of train articles?

A

To reduce train time

236
Q

One issue with text sentiment analysis is that you may have pieces of text that are different lengths. How do we deal with this?

A

There is no consistent size that we could choose for the tensors.

The simplistic ways involve padding and truncating tensors so that they are all the same length.

237
Q

What number do we pad the ends with

A

-1

The padding makes sure we can put the indexes from many texts into a single tensor.

-1 was chosen as it is easy to identify which indices are from the padding later (none of the natural words are assigned -1)

238
Q

What do we do following padding?

A

Truncate down to the desired length
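
A hedged sketch of the pad-then-truncate idea (inds is assumed to be a Python list of token indexes for one text, and max_len is an assumed target length):

max_len = 50
padded = inds + [-1] * max(0, max_len - len(inds))   # pad short texts with -1
padded = padded[:max_len]                            # truncate long texts to max_len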

239
Q

Why might you need to offset the labels by 1?

A

eg

np.unique(train_df[0].values) shows that the labels are the numbers 1-4; we may want to subtract 1 from these so that they are 0-indexed. This allows the labels to work as indexes in the classification task.

240
Q

Why can we not pass padded data straight to the embedder?

A

It does not like the -1s, we need to use .clip

data.clip(min=0)[0]

This means anything below 0 gets clipped to 0.

241
Q

For text data, how do we pass data to the embedder?

A

embed_batch = nn.Embedding.from_pretrained(glove.vectors)
embed_batch(data.clip(min=0)).shape

242
Q

What do RNN modules contain?

A

Hidden layers that modify, and are modified by, the update function as each element in the sequence is passed through.

243
Q

Describe the code for the RNN layer.

A

rnn_layer = nn.RNN(input_size=n_dim, hidden_size=50, batch_first=True)

244
Q

What is the input format of the RNN layer and when?

A

[batch_size, seq_len, repr_dim]

When batch_first is true

245
Q

If you don’t specify the initial hidden state, h0, what is assumed?

A

It is 0s.

h0 = torch.zeros(1,text_emb.shape[0],hidden_size)

246
Q

What are the outputs of RNN?

A

out – the full history of the hidden state

last_hidden is the hidden state after all elements of the sequence have been passed through – we can take this and pass it to the fully connected layers to do the classification task

out, last_hidden = rnn_layer(text_emb)

could also pass in h0

247
Q

What does the output variable contain?

A

The concatenation of all of the output units for each word (ie at each time point).

248
Q

Which part of the output variable are we concerned about?

A

We only care about the output at the final time point. We can extract like:

out[0,-1,:]

For the last_hidden output, the dimensions are in a different order (batch is the second dimension), so index the 0th position of that dimension instead

last_hidden[:,0,:]

249
Q

What kind of data do we want to look at?

A

Meaningful data, ie the elements whose input indexes were 0 or above (not the -1 padding).

(torch.arange(data.shape[1])*(data[0]>=0)).argmax()

argmax gives us the index of the final valid output, ie the bit we want.

250
Q

How do we define a model for text data using RNN?

A
  • self.emb = nn.Embedding
  • self.rnn = nn.RNN
  • self.fc = nn.Sequential

class TextRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()

        self.emb = nn.Embedding.from_pretrained(glove.vectors)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(hidden_size, 50),
            nn.Dropout(0.2),
            nn.Linear(50, num_classes)
        )
251
Q

What do we include in the forward function defined in the model?

A
  • Finding the index of the last non-zero input
  • Apply the embedding
  • Forward propagate through the RNN
  • Get the last valid output
  • Propagate through the fc layers to the output
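
A hedged sketch of these steps, matching the class defined in the previous card (padding is assumed to be -1, and the last-valid-output indexing follows the earlier cards):

def forward(self, x):
    # index of the last non-padding token in each sequence
    last = (torch.arange(x.shape[1]) * (x >= 0)).argmax(dim=1)
    emb = self.emb(x.clip(min=0))                    # apply the embedding
    out, _ = self.rnn(emb)                           # forward propagate through the RNN
    out = out[torch.arange(x.shape[0]), last, :]     # get the last valid output
    return self.fc(out)                              # propagate through the fc layers
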
252
Q

What are more powerful versions of RNN?

A
  • Long short-term memory (LSTM)
  • Gated-recurrent unit (GRU)

They both aim to overcome the vanishing gradients problem

253
Q

What is the layer and implementation for LSTM?

A

lstm_layer = nn.LSTM(input_size=n_dim, hidden_size=50, batch_first=True)

254
Q

What is a difference with LSTM?

A

LSTM keeps track of both a hidden state and a cell state, so it has an extra set of weights to initialise.

h0 = torch.zeros(1, text_emb.shape[0], 50)

c0 = torch.zeros(1, text_emb.shape[0], 50)

out, last_hidden = lstm_layer(text_emb, (h0, c0))

255
Q

What are autoencoders?

A

Adapted forms of neural networks which are generally tasked with reproducing the input

  • The input and output are the same for a perfect autoencoder
256
Q

What is a characteristic of the autoencoder’s architecture?

A

There is a hidden layer with fewer dimensions than the input, creating an information bottleneck.

257
Q

What are the two parts of the autoencoder?

A

The encoder and the decoder, with the code section in between (latent representation).

258
Q

Why can the autoencoder be considered a lossy compression algorithm?

A

You could use the trained encoder to reduce some data to a smaller representation, and send this somewhere else where the trained decoder is used to retrieve the original data.

It is lossy because in practice the reconstruction is not perfect.

259
Q

What is the difference for programming an autoencoder in pytorch compared to other models we have seen?

A

It is similar but we need to create an information bottleneck and train it using the same data as the input and output.

260
Q

What error function do we use for the autoencoder?

A

The loss function is required to measure the fidelity of the reconstruction and so we will use the mean squared error of the input and output tensors.

261
Q

What does the ConvAutoencoder class contain?

A

A self.encoder and self.decoder part.

self.encoder - a network to connect the batch of images to the latent space. The final layer outputs latent_dim values, which will be the size of the bottleneck.

self.decoder - is a network to connect the latent space to the image reconstructions.
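
A minimal sketch (the layer sizes are assumptions; latent_dim is the size of the bottleneck):

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),              # (1, 28, 28) -> (16, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> (16, 14, 14)
            nn.Flatten(),
            nn.Linear(16 * 14 * 14, latent_dim),         # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 14 * 14),
            nn.Unflatten(1, (16, 14, 14)),
            nn.UpsamplingBilinear2d(scale_factor=2),     # back up to 28x28
            nn.Conv2d(16, 1, 3, padding=1),              # reconstruct a 1-channel image
        )

    def forward(self, x):
        z = self.encoder(x)        # latent representation
        return self.decoder(z)     # reconstruction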

262
Q

What is different about the loss calculation for the autoencoder?

A

Now we don’t have labels (as we did in classification) - we compare the reconstructions.

For this, we use the MSE loss function.

nn.MSELoss()

263
Q

What kinds of layers are good to use for image data?

A

Convolutional layers

264
Q

Name any differences in the encoder and decoder networks?

A

self.encoder has an nn.Flatten() layer

self.decoder has an nn.Unflatten() layer - use a square number of outputs in the preceding linear layer so that it can be unflattened into a rectangular (2D) shape here

265
Q

What is nn.UpsamplingBilinear2d?

A

An optional layer - kind of like an inverse of max pool which was used to reduce the data

266
Q

How does the training differ for an autoencoder?