ConvNeXt Flashcards
Explain in a few words the MixUp augmentation algorithm
The user sets the hyperparameter alpha.
At each training step:
Sample 2 random images (and their labels) from the dataset.
Sample lambda from a Beta(alpha, alpha) distribution.
Mix the images and the labels using lambda.
Give the equation for mixing the images and the labels once lambda is drawn
x_mix = lambda * x_1 + (1 - lambda) * x_2
y_mix = lambda * y_1 + (1 - lambda) * y_2
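A minimal sketch of one MixUp step in PyTorch; mixing the batch with a shuffled copy of itself is a common implementation shortcut, and the names (mixup_batch, etc.) are illustrative rather than from the paper:

```python
import torch

def mixup_batch(x, y, alpha=0.2):
    # x: images (B, C, H, W); y: one-hot labels (B, num_classes)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))           # pair each example with a random partner
    x_mix = lam * x + (1 - lam) * x[perm]      # mix the images
    y_mix = lam * y + (1 - lam) * y[perm]      # mix the labels the same way
    return x_mix, y_mix
```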
In the MixUp paper, what is the range of alpha values they recommend
Alpha between 0.1 and 0.4.
For larger alpha values they observed underfitting.
They did not report findings for smaller alpha values.
Explain the need for label smoothing in a multiclass problem
In a multiclass problem, people usually use the cross-entropy loss.
When trained with cross-entropy against hard one-hot targets, a model can become 'overconfident': it keeps pushing the logit of the correct class ever further above the others.
Explain how label smoothing keeps the model from becoming 'overconfident'
In label smoothing, the target probability of the correct class is lowered to, for example, 0.9 (other values are possible), and the remaining mass is spread over the other classes. To match such soft targets, the model keeps the unnormalised logits closer to each other instead of driving one of them towards infinity.
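A minimal sketch of building smoothed targets, assuming the common convention of taking epsilon off the correct class and spreading it uniformly over the others (epsilon = 0.1 reproduces the 0.9 above); the function name is illustrative:

```python
import torch

def smooth_targets(labels, num_classes, epsilon=0.1):
    # labels: integer class indices of shape (B,)
    off_value = epsilon / (num_classes - 1)            # mass given to each wrong class
    targets = torch.full((labels.size(0), num_classes), off_value)
    targets.scatter_(1, labels.unsqueeze(1), 1.0 - epsilon)  # correct class gets 1 - epsilon
    return targets
```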
What are unnormalised logits
The raw outputs of the last layer of the neural network, before any normalisation.
How does one normalise the logits
Apply softmax to the logits: p_i = exp(z_i) / sum_j exp(z_j).
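A numerically stable version in NumPy (subtracting the max does not change the result, because softmax is shift-invariant):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # stabilise: exp of large logits would overflow
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()    # probabilities that sum to 1
```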
What is the difference between a convolution operator and a depthwise convolution operator
A standard convolution with f filters has every filter applied across all the channels (and over the whole image).
In depthwise convolution, f = c, where c is the number of channels, and each filter works on exactly 1 channel.
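In PyTorch both operators are instances of nn.Conv2d; setting groups equal to the number of channels turns it into a depthwise convolution. A sketch with an illustrative channel count:

```python
import torch.nn as nn

c = 64  # number of channels (illustrative)

# Standard convolution: each of the c filters spans all c input channels
standard = nn.Conv2d(c, c, kernel_size=3, padding=1)

# Depthwise convolution: groups=c, so each filter sees exactly 1 channel
depthwise = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)
```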
What is the problem with the normal convolution in a computational sense
Each kernel looks at all the channels, which has been shown to be largely redundant. In addition, since the cost scales with c_in * c_out, the amount of computation grows quadratically with the channel count.
Describe the depthwise separable convolution
First apply a depthwise convolution: the same number of filters as channels, each filter looking at only 1 channel.
Then apply a pointwise convolution: a 1x1 convolution that mixes information across all the channels.
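A minimal depthwise separable block in PyTorch (the class name is illustrative):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # Depthwise: one k x k filter per input channel (groups=c_in)
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=k,
                                   padding=k // 2, groups=c_in)
        # Pointwise: 1x1 convolution that mixes information across channels
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```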
What is the computational cost of a normal convolution
h * w * c_in * c_out * k^2
What is the computational cost of a depthwise separable convolution
h * w * c_in * (c_out + k^2)
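A quick sanity check of the two cost formulas with illustrative sizes:

```python
h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # illustrative sizes

normal = h * w * c_in * c_out * k ** 2        # standard convolution
separable = h * w * c_in * (c_out + k ** 2)   # depthwise + pointwise

print(normal / separable)  # ~8.4: close to k^2 = 9, since c_out >> k^2
```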
What is the difference in I/O between a normal convolution and a depthwise separable convolution with an expansion layer
The I/O (memory traffic) of a normal convolution with c_in = c_out = c is roughly:
2 * h * w * c (input and output activations) + k^2 * c^2 (weights)
The I/O of a depthwise separable block with an expansion factor of 6 is roughly:
14 * h * w * c (activations: feature maps of c, 6c, 6c and c channels) + 12 * c^2 + 6 * k^2 * c (weights)
Its activation traffic is therefore roughly 7 times bigger (14 * h * w * c vs 2 * h * w * c).
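A sketch of the I/O accounting behind those numbers, assuming c_in = c_out = c and an expansion factor of 6 (the factor implied by the 14c and 12c^2 terms):

```python
h, w, c, k = 56, 56, 128, 3  # illustrative sizes

# Normal convolution: input map + output map + weights
io_normal = 2 * h * w * c + k ** 2 * c ** 2

# Expanded separable block: feature maps of c, 6c, 6c and c channels,
# plus weights for the 1x1 expansion (6c^2), the depthwise (6 * k^2 * c)
# and the 1x1 projection (6c^2)
io_separable = 14 * h * w * c + 12 * c ** 2 + 6 * k ** 2 * c

print(io_separable / io_normal)  # ~6 here; the 14c-vs-2c activation terms alone give 7
```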
What are FLOPs
FLoating-point OPerations
What are FLOPS
FLoating-point OPerations per Second