Ch5 Image Classification End-of-Chapter Questions Flashcards
Why do we first resize images to a large size on the CPU, and then to a smaller size on the GPU?
We first need all the images to be the same size so that they can be collated into tensors and passed to the GPU, but we don't want to lose much information, so we resize to a large size to keep as much of the original image as possible.
This initial resize leaves a spare margin, so that the later augmentation transforms and the final resize do not create empty zones, which would not teach the model anything.
Example: A transform rotating an image by 45 degrees would fill corner regions of the new bounds with empty space.
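A minimal sketch of this presizing strategy with a fastai DataBlock (assuming images labelled by their parent folder; 460 and 224 are the typical sizes used in the chapter):
~~~
from fastai.vision.all import *

# item_tfms runs per image on the CPU; batch_tfms runs per batch on the GPU
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(460),                                # large resize on the CPU
    batch_tfms=aug_transforms(size=224, min_scale=0.75))  # augment + final resize on the GPU
dls = dblock.dataloaders(path)  # `path` points at the image folder
~~~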
What are two ways in which data is most commonly provided for most deep learning datasets?
- A table of data, e.g. a CSV file
- Items of data in individual files, organized into folders or with filenames that describe those items (fastai provides factory methods for both layouts; see the sketch below)
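A hedged sketch of loading each layout with fastai (the paths and filenames here are hypothetical placeholders):
~~~
from fastai.vision.all import *

path = Path('my_dataset')  # hypothetical dataset root

# Layout 1: a table -- a CSV with one row per item (filename and label columns)
dls_csv = ImageDataLoaders.from_csv(
    path, csv_fname='labels.csv', folder='images',
    valid_pct=0.2, item_tfms=Resize(224))

# Layout 2: individual files -- here labelled by their parent folder's name
dls_folder = ImageDataLoaders.from_folder(
    path, valid_pct=0.2, item_tfms=Resize(224))
~~~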
Give two examples of ways that image transformations can degrade the quality of the data.
We can lose resolution after resizing and we can introduce empty zones by rotating after reducing the image to final size.
What method does fastai provide to view the data in DataLoaders?
show_batch method
dls.show_batch(nrows=2, ncols=3)
What method does fastai provide to help you debug a DataBlock?
summary method
dblk.summary('filepath')
Should you hold off on training a model until you have thoroughly cleaned your data?
No, you should train a model as soon as you can because the incorrect predictions from the model can help you clean the data more efficiently.
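For example, after a quick first training run, fastai's interpretation tools surface the items the model gets most wrong, which are often mislabelled or low-quality data. A sketch, assuming `dls` from a DataBlock like the one above (older fastai versions name the learner factory cnn_learner rather than vision_learner):
~~~
from fastai.vision.all import *

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)  # a quick, imperfect model is enough to start

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(5, nrows=1)  # highest-loss items are good cleaning candidates
~~~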
What are the two pieces that are combined into cross-entropy loss in PyTorch?
Softmax (applied as log softmax) and negative log likelihood.
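A small sketch verifying that PyTorch's cross-entropy is log softmax followed by negative log likelihood (the activations and targets below are made up):
~~~
import torch
import torch.nn.functional as F

acts  = torch.randn(4, 3)            # raw activations: 4 items, 3 categories
targs = torch.tensor([0, 2, 1, 1])   # integer class labels

manual  = F.nll_loss(F.log_softmax(acts, dim=1), targs)
builtin = F.cross_entropy(acts, targs)
print(torch.isclose(manual, builtin))  # tensor(True)
~~~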
What are two properties of activations that softmax ensures? Why is this important?
It ensures that the activations are all between 0 and 1, and that the activations across all categories sum to 1. The raw activation values have no meaning by themselves; they only express the relative confidence of the input being in one category versus another. What we care about is which activation is higher and by how much, and once they sum to 1 we can read each activation as the probability of the input belonging to that category.
Softmax also uses an exponential, so if one activation is slightly bigger than the others, that difference gets amplified; this is useful when we really want the classifier to pick one category as its prediction.
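A quick demonstration of both properties, using made-up activations for a single item with three categories:
~~~
import torch

acts = torch.tensor([[0.6, 2.1, -1.5]])
sm = torch.softmax(acts, dim=1)
print(sm)             # every value is between 0 and 1
print(sm.sum(dim=1))  # tensor([1.]) -- the values sum to 1
# exp() amplifies differences: 2.1 is only modestly larger than 0.6,
# but its softmax share dominates the other two.
~~~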
When might you want your activations to not have the two properties that softmax ensures?
When you want your model to be able to tell you that it is not sure, e.g. when it comes across an image of a category it was not trained on.
Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?
The function we used to calculate the loss for a binary target was:
~~~
import torch

def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()
~~~
Here torch.where returns 1-predictions where targets==1 and predictions otherwise. We would need multiple conditions to handle more than two categories, and we can't do that with torch.where, which only takes a single condition as an argument.
What is the value of log(-2)? Why?
It is undefined, because the logarithm is only defined for positive numbers.
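In PyTorch this shows up as NaN rather than an error:
~~~
import torch

print(torch.log(torch.tensor(-2.0)))  # tensor(nan) -- log is undefined for negative inputs
~~~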
What are two good rules of thumb for picking a learning rate from the learning rate finder?
- Find the learning rate where the loss is at its minimum, then use a rate one order of magnitude smaller, i.e. divide it by 10.
- Pick the last point where the loss was still clearly decreasing (this is somewhat subjective); the point where the curve is steepest is a common proxy. A sketch of both approaches follows.
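A sketch of using the learning rate finder in fastai, assuming `dls` from earlier. The suggestion functions and return value of lr_find have changed across fastai versions, so treat the unpacking below as illustrative:
~~~
from fastai.vision.all import *

learn = vision_learner(dls, resnet34, metrics=error_rate)
lrs = learn.lr_find(suggest_funcs=(minimum, steep))  # plots loss vs. learning rate
# `minimum` suggests roughly one-tenth of the rate at the lowest loss;
# `steep` suggests the point where the loss is falling fastest.
learn.fine_tune(2, base_lr=lrs.steep)
~~~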
What two steps does the fine_tune method do?
- Trains the randomly added layers (the new head) for one epoch, with all other layers frozen
- Unfreezes all the layers and trains them for the number of epochs requested (a simplified equivalent is sketched below)
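A simplified sketch of the equivalent manual steps (`learn` is assumed from earlier; the real fine_tune also adjusts the learning rates between the two phases):
~~~
learn.freeze()              # only the randomly added head is trainable
learn.fit_one_cycle(1)      # step 1: train the new layers for one epoch
learn.unfreeze()            # step 2: make every layer trainable again
learn.fit_one_cycle(4, lr_max=slice(1e-6, 1e-4))  # train the whole model
~~~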
What are discriminative learning rates?
Learning rates that are different depending on the depth of the layer. Use a lower learning rate for the early layers of the neural network and a higher learning rate for the later layers, particularly the randomly added layers.
How is a Python slice object interpreted when passed as a learning rate to fastai?
The first value will be the learning rate for the earliest layers of the neural network, and the second value will be the learning rate for the final layer. The layers in between will have learning rates that are multiplicatively equidistant throughout that range.
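For example, with the unfrozen `learn` from the previous sketch (the values are illustrative):
~~~
# earliest layer groups get 1e-6, the final group gets 1e-4, and the groups in
# between get multiplicatively spaced rates across that range
learn.fit_one_cycle(12, lr_max=slice(1e-6, 1e-4))
~~~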