PyTorch 60-Minute Blitz Flashcards

1
Q

create an empty matrix with dim 5,3

A

torch.empty(5,3)

2
Q

create random matrix with dim 5,3

A

torch.rand(5, 3)

3
Q

create a zeros matrix with a given dtype

A

torch.zeros(5,3,dtype=torch.long)

4
Q

construct a tensor directly from data

A

torch.tensor([5.5, 3])

5
Q

create a tensor based on an existing tensor

A
x = x.new_ones(5, 3, dtype=torch.double)
x = torch.randn_like(x, dtype=torch.float)
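For context, a minimal sketch assuming some tensor x already exists (created any way you like):

x = torch.tensor([5.5, 3])                  # any existing tensor
x = x.new_ones(5, 3, dtype=torch.double)    # new_* methods reuse properties of x unless overridden
x = torch.randn_like(x, dtype=torch.float)  # same size as x, overrides the dtype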
6
Q

get size of matrix

A

x.size()

torch.Size([5, 3])

7
Q

addition operations (several syntaxes)

A

x + y

torch.add(x, y, out=result)
y.add_(x)   # in-place: adds x to y

8
Q

resize/reshape tensor

A
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
9
Q

get the value of a one-element tensor as a Python number

A

x.item()
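A small illustrative example (the printed value is hypothetical):

x = torch.randn(1)
print(x)         # e.g. tensor([0.4550])
print(x.item())  # 0.4550 as a plain Python number; works only for one-element tensors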

10
Q

convert a torch tensor to a numpy array and vice versa

A

b = a.numpy()            # torch tensor -> numpy array

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)  # numpy array -> torch tensor

11
Q

move tensors in and out of GPU

A

if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings: .to("cuda")
    z = x + y

12
Q

package that provides automatic differentiation for all operations on Tensors

A

autograd

13
Q

track all operations on a torch.Tensor

A

set its attribute .requires_grad as True
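For example (a minimal sketch):

x = torch.ones(2, 2, requires_grad=True)  # operations on x are now tracked
y = x + 2                                 # y was created by an operation, so it has a grad_fn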

14
Q

compute all the gradients automatically

A

call .backward()
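A minimal sketch of a full forward/backward pass:

x = torch.ones(2, 2, requires_grad=True)
out = (x * x).sum()   # a scalar built from x
out.backward()        # computes d(out)/dx
print(x.grad)         # tensor([[2., 2.], [2., 2.]])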

15
Q

attribute which has the gradient

A

.grad

16
Q

stop a tensor from tracking history

A

call .detach()
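For example, assuming x was created with requires_grad=True:

y = x.detach()           # same values as x, but detached from the computation history
print(y.requires_grad)   # False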

17
Q

when is wrapping a code block in with torch.no_grad(): helpful?

A

when evaluating a model: it may have trainable parameters with requires_grad=True, but the gradients are not needed
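A minimal sketch, where net and inputs are placeholder names:

with torch.no_grad():
    outputs = net(inputs)   # no graph is built, which saves memory during evaluation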

18
Q

attribute that references the Function that created the Tensor

A

.grad_fn
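For example:

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
print(y.grad_fn)   # something like <AddBackward0 object at 0x...>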

19
Q

example conv network

A
input 32x32
C1 : feature maps 6@28x28
S2: 6@14x14
C3: 16@10x10
S4: 16@5x5
C5: 120
F6: 84
output 10
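A minimal PyTorch sketch of this LeNet-style architecture, assuming a single-channel 32x32 input (class and variable names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)        # C1: 6 feature maps of 28x28
        self.conv2 = nn.Conv2d(6, 16, 5)       # C3: 16 feature maps of 10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # C5: 120
        self.fc2 = nn.Linear(120, 84)          # F6: 84
        self.fc3 = nn.Linear(84, 10)           # output: 10
    def forward(self, x):                      # x: (N, 1, 32, 32)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # S2: 6@14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # S4: 16@5x5
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

out = LeNet()(torch.randn(1, 1, 32, 32))   # out.shape == (1, 10)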
20
Q

typical training procedure

A
  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
21
Q

does torch.nn support a single sample

A

no, it only supports a mini-batch of samples

22
Q

input of nn.Conv2d

A

a 4D Tensor of nSamples x nChannels x Height x Width

23
Q

add a fake batch dimension to a single sample

A

input.unsqueeze(0)
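A minimal sketch combining cards 22 and 23 (names are illustrative):

import torch
import torch.nn as nn

img = torch.randn(3, 32, 32)     # a single 3-channel image, no batch dimension
batch = img.unsqueeze(0)         # shape becomes (1, 3, 32, 32): nSamples x nChannels x Height x Width
out = nn.Conv2d(3, 6, 5)(batch)  # nn.Conv2d now accepts it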

24
Q

calculate simple mean squared error between input and target

A
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
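A self-contained version, where output stands in for the network output (a hypothetical 1x10 tensor):

import torch
import torch.nn as nn

output = torch.randn(1, 10)           # stand-in for net(input)
target = torch.randn(10).view(1, -1)  # dummy target with the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)                           # a scalar tensor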

25
Q

which Tensors will have their .grad Tensor accumulated with the gradient

A

Tensors with requires_grad=True

26
Q

follow loss backwards through the graph using its .grad_fn attribute

A

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
27
Q

backpropagate the errors

A

loss.backward()

# example output:
# conv1.bias.grad before backward: tensor([0., 0., 0., 0., 0., 0.])
# conv1.bias.grad after backward:  tensor([-0.0205,  0.0088,  0.0135,  0.0123,  0.0098, -0.0036])

28
Q

Stochastic Gradient Descent (SGD)

A

weight = weight - learning_rate * gradient

29
Q

implement SGD

A

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

30
Q

implement various update rules such as SGD, Nesterov-SGD, Adam, RMSProp

A

torch.optim

31
Q

create an optimizer

A

optimizer = optim.SGD(net.parameters(), lr=0.01)

32
Q

boilerplate code in the training loop:

  • zero the gradient buffers
  • calculate the loss
  • backpropagate the loss
  • step the optimizer
A
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
33
Q

load image data and iterate over it in mini-batches

A

torchvision.datasets and torch.utils.data.DataLoader
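A minimal sketch of loading CIFAR-10 this way, as in the blitz tutorial (path and batch size are illustrative):

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)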

34
Q

get some random training images

A

dataiter = iter(trainloader)

images, labels = next(dataiter)

35
Q

define a convolutional nn

A
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

36
Q

define loss function and optimizer

A
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
37
Q

train the network

A

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

38
Q

evaluate performance

A
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

39
Q

training on GPU

A

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

40
Q

tutorial for data parallelism

A

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

41
Q

rnn embedding

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/12_4_hello_rnn_emb.py

42
Q

seq2seq

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/14_1_seq2seq.py

43
Q

seq2seq_att

A

https://github.com/hunkim/PyTorchZeroToAll/blob/master/14_2_seq2seq_att.py