60 min Blitz Flashcards

1
Q

create an uninitialized matrix with dim 5x3

A

torch.empty(5,3)

2
Q

create a random matrix with dim 5x3

A

torch.rand(5, 3)

3
Q

create a zeros matrix with a given dtype

A

torch.zeros(5, 3, dtype=torch.long)

4
Q

construct a tensor directly from data

A

torch.tensor([5.5, 3])

5
Q

create a tensor based on an existing tensor

A
x = x.new_ones(5, 3, dtype=torch.double)    # new_* methods take in sizes
x = torch.randn_like(x, dtype=torch.float)  # result has the same size as x
6
Q

get size of matrix

A

x.size()

# returns torch.Size([5, 3])

7
Q

addition operations (three syntaxes)

A

x + y

torch.add(x, y, out=result)

y.add_(x)  # in-place

8
Q

resize/reshape tensor

A
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
9
Q

get the value of one element tensor

A

x.item()

10
Q

convert a torch tensor to a numpy array and vice versa

A

# torch Tensor -> numpy array
b = a.numpy()

# numpy array -> torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)

11
Q

move tensors in and out of GPU

A

if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings: .to("cuda")
    z = x + y

12
Q

package that provides automatic differentiation for all operations on Tensors

A

autograd
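A minimal sketch of autograd in action (the values follow the blitz tutorial's own example):

```
import torch

x = torch.ones(2, 2, requires_grad=True)  # start tracking operations on x
y = x + 2                                 # y now has a grad_fn
out = (y * y * 3).mean()

out.backward()                            # compute d(out)/dx automatically
print(x.grad)                             # tensor([[4.5, 4.5], [4.5, 4.5]])
```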

13
Q

track all operations on a torch.Tensor

A

set its attribute .requires_grad to True
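For example (a minimal sketch; both forms work):

```
import torch

x = torch.ones(2, 2, requires_grad=True)  # track from creation
a = torch.randn(2, 2)
a.requires_grad_(True)                    # or flip the flag in-place later
```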

14
Q

compute all the gradients automatically

A

call .backward()

15
Q

attribute which has the gradient

A

.grad

16
Q

stop a tensor from tracking history

A

call .detach()
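A small sketch of the effect:

```
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach()           # same data, but detached from the graph
print(y.requires_grad)   # False
print(x.eq(y).all())     # tensor(True) - contents are identical
```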

17
Q

when is wrapping a code block in with torch.no_grad(): helpful

A

when evaluating a model
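A minimal sketch of the pattern (the nn.Linear stands in for a real model):

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)       # a stand-in model for illustration
x = torch.randn(8, 4)

model.eval()                  # also switches layers like dropout to eval mode
with torch.no_grad():         # no graph is recorded: saves memory and compute
    outputs = model(x)
print(outputs.requires_grad)  # False - nothing is tracked inside no_grad
```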

18
Q

attribute that references the Function that created the Tensor

A

.grad_fn

19
Q

example conv network

A
input 32x32
C1 : feature maps 6@28x28
S2: 6@14x14
C3: 16@10x10
S4: 16@5x5
C5: 120
F6: 84
output 10
20
Q

typical training procedure

A
  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
21
Q

does torch.nn support a single sample

A

no, it only supports a mini-batch of samples

22
Q

input of nn.Conv2d

A

4D Tensor of nSamples x nChannels x Height x Width
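A quick sketch of the expected shape (the layer parameters here are illustrative):

```
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 6, 5)          # in_channels=3, out_channels=6, kernel=5
batch = torch.randn(4, 3, 32, 32)  # nSamples x nChannels x Height x Width
print(conv(batch).shape)           # torch.Size([4, 6, 28, 28])

single = torch.randn(3, 32, 32)         # a lone sample is only 3D...
print(conv(single.unsqueeze(0)).shape)  # ...add a fake batch dim: [1, 6, 28, 28]
```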

23
Q

add a fake batch dimension to a single sample

A

input.unsqueeze(0)

24
Q

calculate simple mean squared error between input and target

A
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)

25
Q

which Tensors will have their .grad Tensor accumulated with the gradient

A

Tensors with requires_grad=True
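A small sketch of the accumulation behaviour (names illustrative): gradients add up across backward calls until cleared.

```
import torch

w = torch.ones(3, requires_grad=True)
(w * 2).sum().backward()
print(w.grad)   # tensor([2., 2., 2.])
(w * 2).sum().backward()
print(w.grad)   # tensor([4., 4., 4.]) - accumulated, not replaced
w.grad.zero_()  # optimizer.zero_grad() / net.zero_grad() clears this for you
```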
26
Q

inspect the chain of Functions that computed the loss

A

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
27
Q

backpropagate the errors

A

loss.backward()

# conv1.bias.grad before backward:
# tensor([0., 0., 0., 0., 0., 0.])
# conv1.bias.grad after backward:
# tensor([-0.0205, 0.0088, 0.0135, 0.0123, 0.0098, -0.0036])
28
Q

update rule of Stochastic Gradient Descent (SGD)

A

weight = weight - learning_rate * gradient
29
Q

implement SGD

A

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
30
Q

implement other update rules such as SGD, Nesterov-SGD, Adam, RMSProp

A

torch.optim
31
Q

create an optimizer

A

optimizer = optim.SGD(net.parameters(), lr=0.01)
32
Q

boilerplate code in the training loop: calculate loss, backpropagate, optimize

A

```
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()        # does the update
```
33
Q

load a dataset and iterate over it in mini-batches

A

torchvision.datasets and torch.utils.data.DataLoader
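For example, the CIFAR-10 loader used later in the blitz tutorial (batch size and normalization values are the tutorial's):

```
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
```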
34
Q

get some random training images

A

dataiter = iter(trainloader)
images, labels = next(dataiter)
35
Q

define a convolutional nn

A

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```
36
Q

define loss function and optimizer

A

```
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
37
Q

train the network

A

```
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```
38
Q

evaluate performance

A

```
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
```
39
Q

training on GPU

A

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
net.to(device)
inputs, labels = inputs.to(device), labels.to(device)
40
Q

tutorial for data parallelism

A

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
41
Q

RNN embedding example

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/12_4_hello_rnn_emb.py
42
Q

seq2seq example

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/14_1_seq2seq.py
43
Q

seq2seq with attention example

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/14_2_seq2seq_att.py