Convolutional Neural Networks Flashcards
A convolutional layer consists of filters, what do these filters do?
Each layer is made up of a set of filters, and each filter extracts a particular feature from the input, such as edges. The output of a convolutional layer is a set of feature maps (also called activation maps), one per filter.
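A minimal sketch of this idea (the flashcards don't name a framework, so PyTorch is used here purely for illustration):

```python
import torch
import torch.nn as nn

# A convolutional layer with 16 filters of size 3x3 over a 3-channel (RGB) input.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
feature_maps = conv(x)          # one feature map per filter
print(feature_maps.shape)       # torch.Size([1, 16, 30, 30])
```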
Why do many convolutional layers end with ReLU activation functions?
The purpose of activation functions is mainly to add non-linearity to the network, which would otherwise just be a linear model: a convolutional layer by itself is linear, exactly like a fully connected layer.
ReLU is a popular choice because it is cheap to compute and its gradient is simply 0 or 1, which keeps backpropagation straightforward and helps avoid vanishing gradients.
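A tiny sketch of what ReLU does to the values in a feature map (again assuming PyTorch just for illustration):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([[-1.5, 0.0, 2.3]])
print(relu(x))   # tensor([[0.0000, 0.0000, 2.3000]]) -- negatives are clipped to zero
```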
What is the advantage of multiple convolutional layers?
Convolutional layers detect patterns in images, such as edges, through their filters. Stacking multiple convolutional layers lets later layers detect patterns within the patterns found by earlier layers, so simple features like edges can be combined into more complex structures such as corners, textures and object parts.
What does a max pooling layer do?
A max pooling layer takes in feature maps (often the output of a convolutional layer) and outputs a version with reduced width and height. The depth (number of channels) stays the same; the depth comes from the number of filters learned in the preceding convolutional layers, not from the pooling itself.
The layer looks at one window of the input at a time, keeps only the maximum value in that window, and then moves on by the defined stride, reducing the spatial dimensions that way.
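A minimal sketch of max pooling, with PyTorch assumed only for illustration:

```python
import torch
import torch.nn as nn

# A 2x2 window with stride 2 halves the width and height; the channel count is unchanged.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

feature_maps = torch.randn(1, 16, 28, 28)
pooled = pool(feature_maps)
print(pooled.shape)   # torch.Size([1, 16, 14, 14])
```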
What does a Fully connected layer do?
The job of the fully connected layer is to compress the information in its input (the flattened feature maps) into a feature vector with one entry per class.
100 classes → a 1×100 vector; 10 classes → a 1×10 vector.
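A short sketch of this mapping, with the input size 16×14×14 chosen arbitrarily for illustration (PyTorch assumed, as above):

```python
import torch
import torch.nn as nn

# Flattened feature maps in, one score per class out (here: 10 classes).
fc = nn.Linear(in_features=16 * 14 * 14, out_features=10)

flattened = torch.randn(1, 16 * 14 * 14)
class_scores = fc(flattened)
print(class_scores.shape)   # torch.Size([1, 10])
```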
What are dropout layers?
Dropout layers turn off each node in a layer with some probability p during training. This gives all nodes a chance to contribute to classifying different images and reduces the likelihood that a few heavily weighted nodes dominate the process, which helps prevent overfitting.
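A minimal sketch of dropout behaviour in training versus inference mode (PyTorch assumed purely for illustration):

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)   # each element is zeroed with probability 0.5
x = torch.ones(1, 10)

dropout.train()               # dropout is only active in training mode
print(dropout(x))             # roughly half the entries are zero; survivors are scaled by 1/(1-p)

dropout.eval()                # at inference time dropout does nothing
print(dropout(x))
```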
Why do you end many CNNs with Softmax functions?
Softmax turns the feature vector received from the FC layer into a probability distribution over the classes. The class with the highest probability is then taken as the prediction for the given image.
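A small sketch of softmax applied to raw class scores (PyTorch assumed for illustration):

```python
import torch
import torch.nn.functional as F

class_scores = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores from the FC layer
probs = F.softmax(class_scores, dim=1)            # non-negative values that sum to 1
print(probs)                                      # roughly [[0.79, 0.18, 0.04]]
print(probs.argmax(dim=1))                        # most likely class: tensor([0])
```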
When is cross entropy a good choice for a loss function?
When you are working with classification tasks, where the network outputs a score or probability per class and the target is a class label; cross entropy strongly penalises confident wrong predictions.
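A minimal sketch of a cross-entropy loss for a single image (PyTorch assumed for illustration; its `CrossEntropyLoss` expects raw scores, not softmax outputs):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                 # combines log-softmax with negative log-likelihood

class_scores = torch.tensor([[2.0, 0.5, -1.0]])   # raw, unnormalised scores for one image
target = torch.tensor([0])                        # index of the correct class
loss = criterion(class_scores, target)
print(loss.item())                                # small here, since class 0 already has the highest score
```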
Why do you use zero-padding?
Without padding, the spatial dimensions shrink after every convolution, so a deep network would quickly run out of resolution, and pixels at the image border are covered by fewer filter positions than pixels in the centre. Zero-padding keeps the spatial dimensions under control and lets the filters make full use of the border pixels.
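A short sketch comparing output sizes with and without zero-padding (PyTorch assumed for illustration; output size = (input + 2·padding − kernel) / stride + 1):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# Without padding: output size = (32 - 3) / 1 + 1 = 30
no_pad = nn.Conv2d(3, 16, kernel_size=3, padding=0)
print(no_pad(x).shape)     # torch.Size([1, 16, 30, 30])

# With one pixel of zero-padding: output size = (32 + 2*1 - 3) / 1 + 1 = 32
zero_pad = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(zero_pad(x).shape)   # torch.Size([1, 16, 32, 32])
```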
What’s a good rule of thumb for learning rate decay?
A common rule of thumb is decay = alpha_init / num_epochs, i.e. the initial learning rate divided by the total number of training epochs.
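A small sketch of how this rule of thumb might be used, assuming a time-based decay schedule of the form lr = alpha_init / (1 + decay · epoch); the numbers and the schedule itself are illustrative assumptions, not from the card:

```python
# Hypothetical numbers: an initial learning rate of 0.1 over 50 epochs.
alpha_init = 0.1
num_epochs = 50
decay = alpha_init / num_epochs             # the rule of thumb from the card

for epoch in range(num_epochs):
    lr = alpha_init / (1 + decay * epoch)   # assumed time-based decay schedule
    # ... train one epoch with learning rate `lr` ...
```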