Dropout Flashcards
Overview
What is dropout in machine learning?
Dropout is a regularization technique used to prevent overfitting by reducing the network's reliance on any individual neuron (it deliberately makes training harder for the network)
Overview
What does dropout do during training?
During training, dropout randomly sets a fraction of the neurons’ outputs to zero.
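A minimal NumPy sketch of the training-time behavior (the function name and the 0.5 drop rate are illustrative assumptions, not from the card):

```python
import numpy as np

def dropout_train(activations, drop_rate=0.5, rng=np.random.default_rng(0)):
    """Standard dropout during training: zero a random fraction of the outputs."""
    keep_prob = 1.0 - drop_rate
    # Bernoulli mask: 1 with probability keep_prob, 0 with probability drop_rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask   # dropped neurons output exactly 0

a = np.array([0.8, 1.2, 0.5, 2.0])
print(dropout_train(a))         # some entries are zeroed, the rest pass through unchanged
```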
Overview
What benefit does dropout provide in terms of features?
Dropout encourages the learning of more robust and generalizable features
Overview
What happens to the dropout layer during inference?
During inference, the dropout layer is deactivated.
Overview
How are neuron outputs affected during inference in dropout?
During inference, all neurons are used, but their outputs are scaled down by the keep probability (the fraction of neurons that were kept active during training).
e.g. if on average 2 out of 3 neurons were active during training, the next layer learned its weights expecting 2 inputs, not 3. Scaling each of the 3 outputs by 2/3 at inference keeps their total magnitude the same as it was during training.
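A sketch of that inference-time scaling for standard (non-inverted) dropout, using the 2-out-of-3 numbers from the card:

```python
import numpy as np

keep_prob = 2.0 / 3.0            # on average 2 of the 3 neurons were active in training

def dropout_inference(activations, keep_prob):
    """Standard dropout at inference: every neuron fires, outputs are scaled by keep_prob."""
    return activations * keep_prob

a = np.array([1.0, 1.0, 1.0])    # all 3 neurons fire at inference
scaled = dropout_inference(a, keep_prob)
print(scaled, scaled.sum())      # total is about 2, the magnitude the next layer saw in training
```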
Forward phase of dropout
What is the effect of output masking in dropout during forward propagation?
The outputs of the selected neurons are set to zero, effectively dropping them out from the network for that particular forward pass.
Forward phase of dropout
What happens to the modified outputs from non-dropped out neurons during forward propagation?
The modified outputs from the non-dropped out neurons are propagated forward to the next layer in the network.
Forward phase of dropout
What happens to the dropped neurons on the next iteration?
On the next iteration a new set of neurons is dropped, chosen at random across the network according to the dropout probability, so a different subnetwork is trained each time.
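A tiny sketch (hypothetical values) showing that a fresh mask is drawn on every iteration:

```python
import numpy as np

rng = np.random.default_rng(42)
keep_prob = 0.5

for step in range(3):
    # a new Bernoulli mask is sampled on each forward pass, so a different subnetwork trains
    mask = rng.random(4) < keep_prob
    print(f"step {step}: active neuron indices = {np.flatnonzero(mask)}")
```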
Backward phase of dropout
What happens to the gradients/weights of the dropped neurons during backpropagation?
The backward pass starts with the modified outputs from the previous layer. Since some neurons were dropped out during the forward pass, their gradients are also set to zero.
(Gradient Masking)
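A minimal sketch of gradient masking, assuming the mask from the forward pass is cached and reused in the backward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5

# forward pass: apply and remember the mask
x = np.array([0.8, 1.2, 0.5, 2.0])
mask = rng.random(x.shape) < keep_prob
out = x * mask

# backward pass: the same mask zeroes the gradients of the dropped neurons
grad_out = np.ones_like(out)   # gradient arriving from the next layer
grad_x = grad_out * mask       # dropped neurons receive zero gradient (and no weight update)
print(grad_x)
```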
Why is extra scaling necessary in the forward pass during dropout?
To ensure that the expected contribution of each neuron remains consistent between training and inference phases.
e.g. if dropout caused 1 out of 3 neurons not to fire during training (and each neuron has a maximum output of 1), the activation function learned to expect a maximum total input of 2, since at any given point only two neurons feed into it. When all 3 neurons are used during inference, the total input could reach 3. To keep the magnitude the same as during training, the outputs are scaled by 2/3 (called the keep probability).
What is the concept of inverted dropout during training?
All nodes that were not dropped out are scaled up by the inverse of the keep probability.
This scaling means that if 1 out of 3 neurons is dropped out, we multiply the outputs of the remaining 2 by 3/2. So if each neuron had a maximum output of 1 before, it now outputs up to 1.5, and the next layer's activation function learns on a total magnitude of 3.
At inference time, when all three neurons are in use (no dropout), the input does not need scaling, since the activation function already learned its values at a magnitude of 3 (the non-dropped-out magnitude).
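A sketch of inverted dropout with the 2/3 keep probability from the example above (the function name is an illustrative assumption):

```python
import numpy as np

def inverted_dropout(activations, keep_prob, training, rng=np.random.default_rng(42)):
    """Inverted dropout: scale the survivors up during training, do nothing at inference."""
    if not training:
        return activations                       # inference: no mask, no scaling
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob        # survivors scaled by 1/keep_prob (3/2 here)

a = np.array([1.0, 1.0, 1.0])
print(inverted_dropout(a, keep_prob=2/3, training=True))   # survivors become 1.5, dropped ones 0
print(inverted_dropout(a, keep_prob=2/3, training=False))  # unchanged at inference
```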
Why is scaling applied during training in inverted dropout?
To compensate for the dropped neurons, keeping the total contribution of the remaining neurons similar to what it would be without dropout.
How does the network behave at inference time in inverted dropout?
It works the same as if dropout wasn’t present, requiring no scaling.
What problem arises when applying standard dropout to each activation of a convolutional feature map before a 1 × 1 convolution layer?
It leads to increased training time without effectively preventing overfitting, mainly due to spatial correlation among feature map activations in fully convolutional networks.
In a CNN, what happens to the gradient contribution of certain neurons when dropout is applied?
Some neurons contribute nothing because they were dropped, but in an image the neighboring activations still exist, and despite the holes dropout creates, the remaining “pixels” carry enough information to overcome the missing data (because pixels are strongly correlated).
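A rough toy sketch (made-up 8x8 feature map, not from the cards) of why the holes matter little when neighboring activations are strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5

# a smooth, strongly correlated "feature map": neighboring values are nearly identical
feature_map = np.add.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))

# per-activation dropout punches random holes in it
mask = rng.random(feature_map.shape) < keep_prob
dropped = feature_map * mask

# the surviving neighbors still summarize the map well, so little information is lost
print(f"mean of surviving activations: {dropped[mask].mean():.3f}")
print(f"mean of the full feature map:  {feature_map.mean():.3f}")
```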