Lesson 1-3 Flashcards
Is ML a black box?
No. Interpretable ML: visualize gradients and activations
Does deep learning need too much data?
No. Transfer learning; share/use pre-trained nets
What does Union[…] mean in a function signature?
The argument can be any one of the listed types
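A minimal sketch of Union in practice (the `halve` function is a made-up example, not from the lesson):

```python
from typing import Union

# Union[int, float] means the argument may be an int OR a float.
def halve(x: Union[int, float]) -> float:
    return x / 2

print(halve(4))    # works with an int
print(halve(3.0))  # works with a float
```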
Why do you need to make all the images the same shape and size?
Because the GPU processes a whole batch at once: the images get stacked into a single tensor, which requires them all to have the same shape
What size usually works?
square with size=224 :-)
What are the resnets?
Pre-trained CNNs: resnet34 and resnet50 (the number is the layer count). Different sizes; start with the smaller one. Trained on ~1.5 million ImageNet pictures, so the weights come pre-trained: you start with a model that already knows how to recognize 1000 categories.
What is transfer learning?
Take a model that already does something well, and make it learn how to do your thing REALLY well. Train with 1/100th or less of the data and time (maybe thousands of times less).
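A minimal transfer-learning sketch in plain PyTorch. The tiny `backbone` here is a stand-in for a real pre-trained network like torchvision's resnet34; the layer sizes are arbitrary:

```python
import torch.nn as nn

# Stand-in for a pre-trained backbone ending in a 1000-class head.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1000))

# Transfer learning step 1: swap the 1000-class head for one
# matching OUR task (say, 5 categories).
backbone[-1] = nn.Linear(16, 5)

# Step 2: freeze everything except the new head, so only it trains at first.
for p in backbone[:-1].parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(trainable)  # only the new head's weights and biases: 16*5 + 5 = 85
```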
What is one cycle learning?
Better, faster; recent paper (TODO: look this up)
What does unfreeze do?
Without unfreeze, fitting updates only the final (newly added) layers, leaving the initial pre-trained layers untouched. This makes training very fast and helps avoid overfitting. Unfreeze makes all layers trainable.
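Under the hood, freezing/unfreezing is just toggling `requires_grad` on parameters; fastai's `learn.unfreeze()` amounts to roughly this (toy model, arbitrary sizes):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# "Frozen": gradients stop for every layer except the last.
for p in model[:-1].parameters():
    p.requires_grad = False

# "Unfreeze": every layer trains again.
for p in model.parameters():
    p.requires_grad = True

print(all(p.requires_grad for p in model.parameters()))  # True
```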
Paper on Understanding CNN
Visualizing and Understanding Convolutional Networks; Rob Fergus, Matthew Zeiler
Why run prod (“inference”) on CPU instead of GPU?
b/c in prod you usually handle one request at a time; GPUs only pay off when you batch many inputs together, so a CPU is simpler and cheap enough
Why does train loss < valid loss not mean overfitting?
Train loss below valid loss is normal for a well-fit model. As long as the validation error rate continues to decrease, YOU ARE NOT OVERFITTING
What does the doc function do?
It's a fastai function that shows the HTML docs for whatever you pass it
What do you do with unbalanced data?
Jeremy says: nothing, it always works. lol. However, you could try oversampling the under-represented class.
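One way to oversample in PyTorch is `WeightedRandomSampler`, weighting each example by the inverse of its class frequency (the 90/10 toy labels below are made up):

```python
import torch
from torch.utils.data import WeightedRandomSampler

labels = torch.tensor([0] * 90 + [1] * 10)   # 90/10 class imbalance
class_counts = torch.bincount(labels)         # tensor([90, 10])

# Each sample's weight is 1/count of its class, so rare-class
# examples are drawn ~9x more often.
weights = 1.0 / class_counts[labels].float()

sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                replacement=True)
# Pass sampler to a DataLoader via DataLoader(ds, sampler=sampler).
```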
What does _ mean at the end of a pytorch function?
The operation happens in place (the tensor is mutated instead of a new one being returned)
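A small illustration of the in-place vs out-of-place convention:

```python
import torch

x = torch.ones(3)
x.add_(1)      # trailing underscore: modifies x in place -> [2., 2., 2.]
y = x.add(1)   # no underscore: returns a new tensor, x is unchanged
print(x, y)
```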
How do you create a tensor of ones in PyTorch?
x = torch.ones(n,2)
How do you make a column of a tensor random?
x[:,0].uniform_()
What is gradient descent in pytorch?
a = nn.Parameter(a)

def update():
    y_hat = x @ a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)
        a.grad.zero_()

lr = 1e-1
for t in range(100): update()

(x, y, mse, and the initial a come from earlier in the lesson notebook.)
What is the difference between gradient descent and SGD?
SGD runs on mini-batches: instead of computing the gradient over the entire dataset, each step uses a batch of data chosen at random (shuffled without replacement, so every image is still seen once per epoch)
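A sketch of the mini-batching itself, shuffling without replacement so one epoch covers every example exactly once (sizes are arbitrary):

```python
import torch

n, bs = 100, 32
x = torch.randn(n, 2)  # toy dataset

# One epoch: shuffle indices WITHOUT replacement, then walk
# through them in chunks of batch size bs.
perm = torch.randperm(n)
seen = []
for i in range(0, n, bs):
    batch_idx = perm[i:i + bs]
    batch = x[batch_idx]        # this mini-batch would feed one SGD step
    seen.extend(batch_idx.tolist())

# Every example appeared exactly once this epoch.
assert sorted(seen) == list(range(n))
```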
How do we make sure we don’t overfit?
Not with parsimonious models! With regularization. And use a validation set!
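A minimal held-out validation split, done by hand (the 1000-example dataset and 20% split are made-up numbers):

```python
import random

random.seed(0)
indices = list(range(1000))
random.shuffle(indices)

# Hold out 20% as a validation set; train on the rest and watch
# validation metrics to catch overfitting.
valid_idx = indices[:200]
train_idx = indices[200:]
```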
How does Python typing work?
def greeting(name: str) -> str:
    return 'Hello ' + name
What is CamVid?
A labeled street-scene dataset with per-pixel segmentation masks, used for image segmentation