Lesson 1-3 Flashcards
Is ML a black box?
No. Interpretable ML: you can visualize gradients and activations.
Does deep learning need too much data?
No. With transfer learning you can share/use pre-trained nets.
What does a Union[…] mean in a function signature?
The argument can be any one of the listed types.
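A minimal sketch of what that looks like (plain Python typing; the resize function and its parameter are made up for illustration):

from typing import Union, Tuple

def resize(size: Union[int, Tuple[int, int]]) -> None:
    # size may be an int (square) or a (height, width) tuple
    ...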
Why do you need to make all the images the same shape and size?
Because the GPU processes a whole batch as a single tensor, so every image in the batch has to have the same dimensions.
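A quick illustration in plain PyTorch: a batch is one stacked tensor, which only works when every image has the same shape.

import torch

imgs = [torch.rand(3, 224, 224) for _ in range(8)]  # eight same-size RGB images
batch = torch.stack(imgs)  # shape (8, 3, 224, 224): one tensor the GPU can chew through
# stacking a (3, 200, 300) image with the others would raise a size-mismatch error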
What size usually works?
square with size=224 :-)
What are the resnets?
Pre-trained models; the number (34 or 50) is the layer count, so they come in different sizes. Start with the smaller one. Trained on ~1.5 million ImageNet pictures, they ship with pre-trained weights, so you start with a model that already knows how to recognize 1,000 categories.
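A hedged sketch of loading one (via torchvision rather than fastai; older torchvision versions take pretrained=True instead of the weights= argument):

import torchvision.models as models

model = models.resnet34(weights="IMAGENET1K_V1")  # 34 layers, ImageNet weights
# models.resnet50(weights="IMAGENET1K_V1") is the bigger sibling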
What is transfer learning?
Take a model that already does something well and fine-tune it so it does your task REALLY well, with perhaps 1/100th (or even 1/1000th) of the data and training time.
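The usual recipe, sketched in plain PyTorch (the fc attribute is torchvision's ResNet head; the 10-class output is a made-up example):

import torch.nn as nn
import torchvision.models as models

model = models.resnet34(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False  # freeze the pre-trained body
model.fc = nn.Linear(model.fc.in_features, 10)  # fresh head for your 10 classes; trainable by default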
What is one cycle learning?
Better, faster training; from Leslie Smith's work on the 1cycle policy and super-convergence (the learning rate ramps up and then anneals back down over a single cycle).
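PyTorch has a built-in scheduler for this; a minimal sketch (the model and optimizer are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.1, epochs=5, steps_per_epoch=100)
# call sched.step() after every optimizer step: the LR warms up, then anneals back down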
What does unfreeze do?
Without unfreeze, fitting only updates the final layers, leaving the early pre-trained layers untouched; this makes training fast and helps avoid overfitting. Unfreezing means fitting all layers.
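In fastai that's learn.freeze() / learn.unfreeze(); under the hood it comes down to toggling requires_grad, roughly like this (fastai's freeze actually keeps the head trainable):

def freeze(model):
    for p in model.parameters():
        p.requires_grad = False  # no gradients computed, so no updates

def unfreeze(model):
    for p in model.parameters():
        p.requires_grad = True  # every layer gets trained again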
Paper on Understanding CNN
Visualizing and Understanding Convolutional Networks; Matthew Zeiler and Rob Fergus
Why run prod (“inference”) on CPU instead of GPU?
Because in production you typically handle one request at a time, so there is no big batch of work for the GPU to parallelize; CPUs are cheaper and easier to scale.
Why does train loss < valid loss not mean overfitting?
Train loss below valid loss is normal and expected. As long as the validation error rate keeps improving as you train, YOU ARE NOT OVERFITTING.
What does the doc function do?
It's a fastai function that shows the HTML docs for whatever you pass it.
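Usage (fastai v1; assuming the usual star import from the lessons):

from fastai.vision import *
doc(ImageDataBunch.from_folder)  # pops up the rendered docs for that method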
What do you do with unbalanced data?
Jeremy says: nothing, it always works anyway. lol. However, you could try oversampling the under-represented class.
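A common oversampling trick in PyTorch is WeightedRandomSampler; a hedged sketch with made-up labels:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 1])  # class 1 is under-represented
weights = 1.0 / torch.bincount(labels).float()[labels]  # rare class drawn more often
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
ds = TensorDataset(torch.randn(5, 3), labels)
dl = DataLoader(ds, batch_size=2, sampler=sampler)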
What does _ mean at the end of a pytorch function?
The operation happens in place: it mutates the tensor instead of returning a new one.
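For example:

import torch

x = torch.zeros(3)
y = x.add(1)  # out-of-place: x is unchanged, y is a new tensor
x.add_(1)  # in-place: x itself is now tensor([1., 1., 1.])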
How do you create a tensor of ones in PyTorch?
x = torch.ones(n, 2)  # n rows, 2 columns, all ones
How do you make a column in a tensor random?
x[:,0].uniform_()  # fills column 0 in place with draws from Uniform(0, 1)
What is gradient descent in PyTorch?
Reconstructed from the lesson notebook (the toy data and mse setup are assumed so the snippet runs on its own):

import torch
import torch.nn as nn

n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1., 1.)
y = x @ torch.tensor([3., 2.]) + torch.rand(n)  # toy targets: 3*x0 + 2 plus noise

def mse(y, y_hat): return ((y - y_hat) ** 2).mean()

a = nn.Parameter(torch.tensor([-1., 1.]))

def update():
    y_hat = x @ a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)  # step against the gradient, in place
        a.grad.zero_()  # clear the gradient so it doesn't accumulate

lr = 1e-1
for t in range(100): update()