ConvNeXt Flashcards

1
Q

Explain in a few words the MixUp augmentation algorithm

A

The user sets the parameter alpha.

At each training step:
Take 2 random images from the dataset.
Sample lambda from the Beta(alpha, alpha) distribution.
Mix the images and the labels using lambda.
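
A minimal sketch of one MixUp step in PyTorch-style code. This is the common variant that mixes a batch with a shuffled copy of itself rather than drawing two independent images; the function names are illustrative:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    """Mix a batch of images with a shuffled copy of itself."""
    lam = np.random.beta(alpha, alpha)        # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))          # pair each sample with another random sample
    x_mix = lam * x + (1 - lam) * x[perm]     # mix the images
    return x_mix, y, y[perm], lam             # keep both labels and lambda for the loss

def mixup_loss(criterion, pred, y1, y2, lam):
    """Mixed-label loss: lam * CE(pred, y1) + (1 - lam) * CE(pred, y2)."""
    return lam * criterion(pred, y1) + (1 - lam) * criterion(pred, y2)
```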

2
Q

Give the equation for mixing the images and the labels once lambda is drawn

A

x_mix = lambda * x_1 + (1 - lambda) * x_2
y_mix = lambda * y_1 + (1 - lambda) * y_2

3
Q

In the MixUp paper, what range of alpha values do they recommend

A

Alpha between 0.1 and 0.4.
For bigger alpha values they found underfitting.
They did not mention smaller alpha values.

4
Q

Explain the need for label smoothing in a multiclass problem

A

In a multiclass problem, people usually use the cross-entropy loss.
With hard one-hot targets, cross-entropy keeps pushing the logit of the correct class far above all the others, so the trained model can become 'overconfident'.

5
Q

Explain how label smoothing keeps the model from becoming 'overconfident'

A

In label smoothing, the target probability of the correct class is reduced (to 0.9, for example; the amount of smoothing is a hyperparameter) and the remaining mass is spread over the other classes. To fit such soft targets, the model has to keep the unnormalised logits closer to each other instead of driving one of them far above the rest.
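
A small illustration of how the targets change, assuming 5 classes and a smoothing factor of 0.1 (the standard "spread eps/K over all classes" formulation; the numbers are only an example):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Replace a one-hot target with (1 - eps) on the true class plus eps/K spread over all K classes."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

print(smooth_labels(np.array([0.0, 0.0, 1.0, 0.0, 0.0])))   # [0.02 0.02 0.92 0.02 0.02]
```

Recent PyTorch versions expose the same idea directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).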

6
Q

What are unnormalised logits

A

The raw outputs of the network's last layer, before any softmax normalisation is applied.

7
Q

How does one normalise the logits

A

Apply a softmax to the logits.
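
For reference, a numerically stable softmax sketch in plain NumPy:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtracting the max avoids overflow and does not change the result
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.66, 0.24, 0.10]
```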

8
Q

What is the difference between a convolution operator and a depthwise convolution operator

A

In a normal convolution with f filters, every filter is applied across all the channels (and the whole image).
In a depthwise convolution, f must equal c, where c is the number of channels, and each filter operates on only 1 channel.
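
In PyTorch the difference comes down to the groups argument; a minimal sketch with an assumed channel count:

```python
import torch.nn as nn

c = 64
normal_conv    = nn.Conv2d(c, c, kernel_size=3, padding=1)            # every filter sees all c input channels
depthwise_conv = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)  # one filter per channel, each sees only its own channel
```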

9
Q

What is the problem with the normal convolution in a computational sense

A

Each kernel looks at all the channels, which has been shown to be redundant. In addition, the amount of computation grows quadratically with the number of channels.

10
Q

Describe the depthwise separable convolution

A

First apply a depthwise convolution: the number of filters equals the number of channels, and each filter looks at only 1 channel.
Then apply a pointwise convolution: a 1x1 convolution that mixes information across all the channels.
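
A sketch of how the two steps might be composed in PyTorch (channel sizes and names are illustrative):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # depthwise: one kxk filter per input channel
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=k, padding=k // 2, groups=c_in)
        # pointwise: 1x1 conv that mixes information across channels
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```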

11
Q

What is the computational cost of a normal convolution

A

h * w * c_in * c_out * k^2

12
Q

What is the computational cost of a separable depthwise convolution

A

h * w * c_in * (c_out + k^2)
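
A quick sanity check of the two cost formulas with made-up sizes (h = w = 56, c_in = c_out = 128, k = 3):

```python
h = w = 56
c_in = c_out = 128
k = 3

normal    = h * w * c_in * c_out * k**2     # ~4.6e8 multiply-accumulates
separable = h * w * c_in * (c_out + k**2)   # ~5.5e7 multiply-accumulates
print(normal / separable)                    # ~8.4x fewer operations for the separable version
```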

13
Q

What is the difference in I/O between a normal convolution and a depthwise separable convolution with an expansion layer

A

I/O (memory access) of a normal convolution is roughly h * w * 2c for the feature maps, plus k^2 * c^2 for the weights.
I/O of a depthwise separable block with a 6x expansion layer is roughly h * w * 14c for the feature maps (c + 6c + 6c + c for the input, expanded, depthwise-output, and output maps), plus about 12c^2 + 6 * k^2 * c for the weights.
The dominant feature-map access is therefore roughly 7 times bigger.
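
Plugging a number into the dominant feature-map terms (c = 128 is just an example):

```python
c = 128
normal_io    = 2 * c    # input + output feature maps per pixel:  256
separable_io = 14 * c   # c + 6c + 6c + c feature maps per pixel: 1792
print(separable_io / normal_io)   # 7.0
```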

14
Q

What are FLOPs

A

FLoating-point OPerations

15
Q

What are FLOPS

A

FLoating-point OPerations per Second

16
Q

What happens when reducing the network's size with depthwise separable convs followed by expansion layers

A

We get increased memory access and therefore increased latency.

17
Q

What is the idea in MobileNet to combat the accuracy drop from using DW separable convs instead of normal convs

A

To add expansion layers: a 1x1 convolution that expands the channels to 6 times the input width before the depthwise convolution.
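
A rough sketch of the MobileNetV2-style inverted residual block this refers to, assuming an expansion factor of 6 and omitting BatchNorm and stride handling (names and details are illustrative):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, c, k=3, expansion=6):
        super().__init__()
        hidden = c * expansion
        self.expand    = nn.Conv2d(c, hidden, kernel_size=1)                                      # 1x1 expansion to 6c channels
        self.depthwise = nn.Conv2d(hidden, hidden, kernel_size=k, padding=k // 2, groups=hidden)  # depthwise conv on the wide representation
        self.project   = nn.Conv2d(hidden, c, kernel_size=1)                                      # 1x1 projection back to c channels
        self.act = nn.ReLU6()

    def forward(self, x):
        # no activation after the projection, and a residual connection around the block
        return x + self.project(self.act(self.depthwise(self.act(self.expand(x)))))
```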

18
Q

What is the idea that PConv exploits

A

Feature maps are highly redundant across channels, so PConv can apply the convolution to only a fraction of the channels (e.g. 1/4) and still capture the spatial information.

19
Q

How does PConv work

A

It is a normal 2D convolution applied to only part of the channels; the remaining channels are passed through unchanged.
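
A minimal PyTorch sketch of a partial convolution, assuming the convolved fraction is 1/4 of the channels (the slicing-based implementation is just one way to write it):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, c, k=3, ratio=0.25):
        super().__init__()
        self.c_conv = int(c * ratio)   # channels that actually go through the conv
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, kernel_size=k, padding=k // 2)

    def forward(self, x):
        x1, x2 = x[:, :self.c_conv], x[:, self.c_conv:]   # split: first part is convolved, the rest is untouched
        return torch.cat([self.conv(x1), x2], dim=1)      # identity on the remaining 3/4 of the channels
```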

20
Q

What is the block of FasterNet

A

A PConv followed by 2 pointwise (1x1) convs, one for expansion and one for reduction.
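
A sketch of such a block, reusing the PConv module sketched in the previous card; the expansion ratio of 2, the residual connection, and the exact BN/activation placement are assumptions:

```python
import torch.nn as nn

class FasterNetBlock(nn.Module):
    def __init__(self, c, expansion=2):
        super().__init__()
        hidden = c * expansion
        self.pconv = PConv(c)                            # spatial mixing on a fraction of the channels
        self.pw1 = nn.Conv2d(c, hidden, kernel_size=1)   # 1x1 expansion
        self.bn = nn.BatchNorm2d(hidden)                 # the single BN in the block
        self.act = nn.ReLU()                             # the single activation in the block
        self.pw2 = nn.Conv2d(hidden, c, kernel_size=1)   # 1x1 reduction

    def forward(self, x):
        return x + self.pw2(self.act(self.bn(self.pw1(self.pconv(x)))))
```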

21
Q

What were their reasons for putting BN and ‘activation’ only once in each block

A

Overuse of activations was shown in MobileNet to destroy information in low-dimensional spaces, so the block uses only one activation. BN was chosen because it can be merged into the adjacent conv2d at inference time.
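
A sketch of why BN can be merged into the preceding conv at inference time: once its statistics are frozen, BN is just an affine, per-channel transform, so it can be folded into the conv's weights and bias. The helper below is an assumed illustration, not part of any card:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d that follows a Conv2d into the conv's weights and bias."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per-output-channel scale: gamma / sqrt(var + eps)
    conv.weight.data *= scale.reshape(-1, 1, 1, 1)            # scale each output filter
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    conv.bias = nn.Parameter((bias - bn.running_mean) * scale + bn.bias)
    return conv
```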