ConvNeXt Flashcards

1
Q

Explain in a few words the MixUp augmentation algorithm

A

The user sets the parameter alpha.

At each training step:
Take 2 random images from the dataset.
Sample lambda from the Beta(alpha, alpha) distribution.
Mix the images and the labels using lambda.
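
A minimal sketch of one MixUp step in PyTorch-style code. This is the common variant that mixes a batch with a shuffled copy of itself rather than drawing two independent images; the function names are illustrative:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    """Mix a batch of images with a shuffled copy of itself."""
    lam = np.random.beta(alpha, alpha)        # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))          # pair each sample with another random sample
    x_mix = lam * x + (1 - lam) * x[perm]     # mix the images
    return x_mix, y, y[perm], lam             # keep both labels and lambda for the loss

def mixup_loss(criterion, pred, y1, y2, lam):
    """Mixed-label loss: lam * CE(pred, y1) + (1 - lam) * CE(pred, y2)."""
    return lam * criterion(pred, y1) + (1 - lam) * criterion(pred, y2)
```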

2
Q

Give the equation for mixing the images and the labels once lambda is drawn

A

x_mix = lambda * x_1 + (1 - lambda) * x_2
y_mix = lambda * y_1 + (1 - lambda) * y_2

3
Q

In the MixUp paper, what range of alpha values do they recommend

A

Alpha between 0.1 and 0.4.
For bigger alpha values they found underfitting.
They did not mention smaller alpha values.

4
Q

Explain the need for label smoothing in a multiclass problem

A

In a multiclass problem, people usually use the cross-entropy loss.
With hard one-hot targets, cross-entropy keeps pushing the logit of the correct class far above all the others, so the trained model can become 'overconfident'.

5
Q

Explain how label smoothing keeps the model from becoming 'overconfident'

A

In label smoothing, the target probability of the correct class is reduced (to 0.9, for example; the amount of smoothing is a hyperparameter) and the remaining mass is spread over the other classes. To fit such soft targets, the model has to keep the unnormalised logits closer to each other instead of driving one of them far above the rest.
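
A small illustration of how the targets change, assuming 5 classes and a smoothing factor of 0.1 (the standard "spread eps/K over all classes" formulation; the numbers are only an example):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Replace a one-hot target with (1 - eps) on the true class plus eps/K spread over all K classes."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

print(smooth_labels(np.array([0.0, 0.0, 1.0, 0.0, 0.0])))   # [0.02 0.02 0.92 0.02 0.02]
```

Recent PyTorch versions expose the same idea directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).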

6
Q

What are unnormalised logits

A

The raw outputs of the network's last layer, before any softmax normalisation is applied.

7
Q

How does one normalise the logits

A

Apply a softmax to the logits.
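
For reference, a numerically stable softmax sketch in plain NumPy:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtracting the max avoids overflow and does not change the result
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.66, 0.24, 0.10]
```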

8
Q

What is the difference between a convolution operator and a depthwise convolution operator

A

In a normal convolution with f filters, every filter is applied across all the channels (and the whole image).
In a depthwise convolution, f must equal c, where c is the number of channels, and each filter operates on only 1 channel.
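
In PyTorch the difference comes down to the groups argument; a minimal sketch with an assumed channel count:

```python
import torch.nn as nn

c = 64
normal_conv    = nn.Conv2d(c, c, kernel_size=3, padding=1)            # every filter sees all c input channels
depthwise_conv = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)  # one filter per channel, each sees only its own channel
```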

9
Q

What is the problem with the normal convolution in a computational sense

A

Each kernel looks at all the channels, which has been shown to be redundant. In addition, the amount of computation grows quadratically with the number of channels.

10
Q

Describe the depthwise separable convolution

A

First apply a depthwise convolution: the number of filters equals the number of channels, and each filter looks at only 1 channel.
Then apply a pointwise convolution: a 1x1 convolution that mixes information across all the channels.
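
A sketch of how the two steps might be composed in PyTorch (channel sizes and names are illustrative):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # depthwise: one kxk filter per input channel
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=k, padding=k // 2, groups=c_in)
        # pointwise: 1x1 conv that mixes information across channels
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```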

11
Q

What is the computational cost of a normal convolution

A

h * w * c_in * c_out * k^2

12
Q

What is the computational cost of a separable depthwise convolution

A

h * w * c_in * (c_out + k^2)
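
A quick sanity check of the two cost formulas with made-up sizes (h = w = 56, c_in = c_out = 128, k = 3):

```python
h = w = 56
c_in = c_out = 128
k = 3

normal    = h * w * c_in * c_out * k**2     # ~4.6e8 multiply-accumulates
separable = h * w * c_in * (c_out + k**2)   # ~5.5e7 multiply-accumulates
print(normal / separable)                    # ~8.4x fewer operations for the separable version
```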

13
Q

What is the difference in I/O between a normal convolution and a depthwise separable convolution with an expansion layer

A

I/O (memory access) of a normal convolution is roughly h * w * 2c for the feature maps, plus k^2 * c^2 for the weights.
I/O of a depthwise separable block with a 6x expansion layer is roughly h * w * 14c for the feature maps (c + 6c + 6c + c for the input, expanded, depthwise-output, and output maps), plus about 12c^2 + 6 * k^2 * c for the weights.
The dominant feature-map access is therefore roughly 7 times bigger.
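
Plugging a number into the dominant feature-map terms (c = 128 is just an example):

```python
c = 128
normal_io    = 2 * c    # input + output feature maps per pixel:  256
separable_io = 14 * c   # c + 6c + 6c + c feature maps per pixel: 1792
print(separable_io / normal_io)   # 7.0
```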

14
Q

What are FLOPs

A

FLoating-point OPerations

15
Q

What are FLOPS

A

FLoating-point OPerations per Second

16
Q

What happens when reducing the network's size with depthwise separable convs followed by expansion layers

A

We get increased memory access and therefore increased latency.

17
Q

What is the idea in MobileNet to combat the accuracy drop from using DW separable convs instead of normal convs

A

To add expansion layers: a 1x1 convolution that expands the channels to 6 times the input width before the depthwise convolution.
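
A rough sketch of the MobileNetV2-style inverted residual block this refers to, assuming an expansion factor of 6 and omitting BatchNorm and stride handling (names and details are illustrative):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, c, k=3, expansion=6):
        super().__init__()
        hidden = c * expansion
        self.expand    = nn.Conv2d(c, hidden, kernel_size=1)                                      # 1x1 expansion to 6c channels
        self.depthwise = nn.Conv2d(hidden, hidden, kernel_size=k, padding=k // 2, groups=hidden)  # depthwise conv on the wide representation
        self.project   = nn.Conv2d(hidden, c, kernel_size=1)                                      # 1x1 projection back to c channels
        self.act = nn.ReLU6()

    def forward(self, x):
        # no activation after the projection, and a residual connection around the block
        return x + self.project(self.act(self.depthwise(self.act(self.expand(x)))))
```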

18
Q

What is the idea that PConv exploits

A

Feature maps are highly redundant across channels, so PConv can apply the convolution to only a fraction of the channels (e.g. 1/4) and still capture the spatial information.

19
Q

How does PConv work

A

It is a normal 2D convolution applied to only part of the channels; the remaining channels are passed through unchanged.
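
A minimal PyTorch sketch of a partial convolution, assuming the convolved fraction is 1/4 of the channels (the slicing-based implementation is just one way to write it):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, c, k=3, ratio=0.25):
        super().__init__()
        self.c_conv = int(c * ratio)   # channels that actually go through the conv
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, kernel_size=k, padding=k // 2)

    def forward(self, x):
        x1, x2 = x[:, :self.c_conv], x[:, self.c_conv:]   # split: first part is convolved, the rest is untouched
        return torch.cat([self.conv(x1), x2], dim=1)      # identity on the remaining 3/4 of the channels
```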

20
Q

What is the block of FasterNet

A

A PConv followed by 2 pointwise (1x1) convs, one for expansion and one for reduction.
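
A sketch of such a block, reusing the PConv module sketched in the previous card; the expansion ratio of 2, the residual connection, and the exact BN/activation placement are assumptions:

```python
import torch.nn as nn

class FasterNetBlock(nn.Module):
    def __init__(self, c, expansion=2):
        super().__init__()
        hidden = c * expansion
        self.pconv = PConv(c)                            # spatial mixing on a fraction of the channels
        self.pw1 = nn.Conv2d(c, hidden, kernel_size=1)   # 1x1 expansion
        self.bn = nn.BatchNorm2d(hidden)                 # the single BN in the block
        self.act = nn.ReLU()                             # the single activation in the block
        self.pw2 = nn.Conv2d(hidden, c, kernel_size=1)   # 1x1 reduction

    def forward(self, x):
        return x + self.pw2(self.act(self.bn(self.pw1(self.pconv(x)))))
```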

21
Q

What were their reasons for putting BN and ‘activation’ only once in each block

A

Overuse of activations was shown in MobileNet to destroy information in low-dimensional spaces, so the block uses only one activation. BN was chosen because it can be merged into the adjacent conv2d at inference time.
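
A sketch of why BN can be merged into the preceding conv at inference time: once its statistics are frozen, BN is just an affine, per-channel transform, so it can be folded into the conv's weights and bias. The helper below is an assumed illustration, not part of any card:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d that follows a Conv2d into the conv's weights and bias."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per-output-channel scale: gamma / sqrt(var + eps)
    conv.weight.data *= scale.reshape(-1, 1, 1, 1)            # scale each output filter
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    conv.bias = nn.Parameter((bias - bn.running_mean) * scale + bn.bias)
    return conv
```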