CNN Flashcards
Name four components that are typically contained in a CNN block.
Convolutional Layer: Performs feature extraction using filters (kernels).
Activation Function: Introduces non-linearity (e.g., ReLU).
Pooling Layer: Downsamples the feature maps to reduce spatial dimensions.
Batch Normalization Layer: Normalizes activations to stabilize training.
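Name two advantages of a convolutional layer compared to a fully connected layer.
Parameter sharing: the same small kernel is reused at every position, so the number of weights is far smaller and does not grow with the input size, which makes high-dimensional inputs such as image data practical.
Local connectivity: each output depends only on a local neighborhood, which preserves the spatial structure of the input and lets the layer detect a pattern wherever it appears (translation equivariance).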
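A rough way to see why convolutional layers make high-dimensional image inputs practical is to count learnable parameters. The sketch below is plain Python; the layer sizes (a 32x32 RGB input mapped to 16 output channels of the same size) are hypothetical values chosen for illustration, and the helper names are made up for this example.

```python
def conv2d_params(in_ch, out_ch, k):
    """Learnable parameters of a k x k conv layer: shared weights plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch


def dense_params(in_features, out_features):
    """Learnable parameters of a fully connected layer: full weight matrix plus biases."""
    return in_features * out_features + out_features


# Hypothetical example: a 32x32 RGB image mapped to 16 feature maps of the same size.
H, W, C_IN, C_OUT = 32, 32, 3, 16
conv = conv2d_params(C_IN, C_OUT, k=3)            # independent of H and W
dense = dense_params(H * W * C_IN, H * W * C_OUT)  # grows with the image size
print(conv)   # 448
print(dense)  # 50348032
```

The convolution's count depends only on the kernel size and channel counts, while the fully connected count grows with the number of pixels.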
Explain what a feature map is.
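A feature map is the 2D output obtained by sliding one convolutional filter over the input (or over the previous layer's feature maps); each value indicates how strongly the filter's learned pattern is present at that spatial location. A layer with N filters produces N feature maps.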
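The following plain-Python sketch shows how one filter produces one feature map. It assumes stride 1, no padding and a single input channel; the toy image and the hand-made edge kernel are illustrative (in a trained CNN the kernel weights are learned), and `feature_map` is a hypothetical helper name.

```python
def feature_map(image, kernel):
    """Slide one kernel over a 2D input (stride 1, no padding) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the input patch under it.
            s = sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            row.append(s)
        fmap.append(row)
    return fmap


# Toy 4x4 input with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-made vertical-edge kernel (in a CNN these weights are learned).
kernel = [
    [-1, 1],
    [-1, 1],
]
print(feature_map(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The large values in the middle column mark where the edge pattern occurs in the input.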
Name two advantages of deep networks compared to a network with one wide layer
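Hierarchical features: successive layers compose simple features (e.g., edges) into increasingly complex ones (textures, parts, objects), which makes learning complex features more efficient.
Parameter efficiency: a deep network can represent complex functions with far fewer parameters than a single wide layer would need for the same function, which also tends to generalize better.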
Why are pooling layers used in neural networks? Name at least two reasons. How many learnable parameters does a max-pooling layer have?
Pooling layers are used (1) to reduce the spatial dimensions of the feature maps (downsampling), which lowers computation and the number of parameters in subsequent layers, and (2) to create a degree of spatial invariance by summarizing local features, so the network can recognize patterns despite small shifts in their location in the input. A max-pooling layer has no learnable parameters.
Name two pooling methods
Max pooling: takes the maximum value within each pooling window.
Average pooling: takes the mean of the values within each pooling window.
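A minimal sketch of both pooling methods on a single 2D feature map, assuming 2x2 windows with stride 2 (a common setting, not the only choice). `pool2d` is a hypothetical helper; note that it has no weights to learn.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Pool a 2D feature map (list of rows) with size x size windows.

    Pooling has no learnable parameters: each output is a fixed summary of its window.
    """
    def mean(vals):
        return sum(vals) / len(vals)

    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    summarize = max if mode == "max" else mean

    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(summarize(window))
        out.append(row)
    return out


fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 4],
    [0, 2, 8, 6],
    [2, 0, 6, 4],
]
print(pool2d(fmap, mode="max"))  # [[7, 4], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[4.0, 2.5], [1.0, 6.0]]
```

Both calls halve each spatial dimension (4x4 to 2x2).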
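Name one use case of 1x1 convolutions.
One use case of 1x1 convolutions is the “bottleneck” layer (e.g., in the Inception module of GoogLeNet): a 1x1 convolution reduces the number of channels before an expensive 3x3 or 5x5 convolution, which cuts computation and parameters.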
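A 1x1 convolution is a per-pixel linear map across channels. The plain-Python sketch below assumes a [height][width][channels] nested-list layout and omits bias and activation; the input values, weights and the `conv1x1` name are made up for illustration.

```python
def conv1x1(x, weights):
    """Apply a 1x1 convolution to x of shape [H][W][C_in].

    weights has shape [C_out][C_in]; at every pixel the output channels are a
    linear combination of that pixel's input channels (no spatial mixing).
    Bias and activation are omitted to keep the sketch short.
    """
    return [
        [[sum(w_c * v for w_c, v in zip(w_row, pixel)) for w_row in weights] for pixel in row]
        for row in x
    ]


# Hypothetical 2x2 input with 4 channels, reduced to 2 channels.
x = [
    [[1, 0, 2, 1], [0, 1, 1, 0]],
    [[3, 1, 0, 0], [1, 1, 1, 1]],
]
w = [
    [1, 1, 0, 0],   # output channel 0: sum of input channels 0 and 1
    [0, 0, 1, -1],  # output channel 1: channel 2 minus channel 3
]
y = conv1x1(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y)  # [[[1, 1], [1, 1]], [[4, 0], [2, 0]]]
```

The spatial size stays 2x2 while the channel count drops from 4 to 2, which is exactly what a bottleneck layer exploits.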
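Draw and explain the GoogLeNet Inception module.
A building block that applies parallel branches of different sizes (1x1, 3x3 and 5x5 convolutions and 3x3 max pooling) to the same input and concatenates their outputs along the channel dimension to capture multi-scale features efficiently. 1x1 bottleneck convolutions reduce the number of channels before the 3x3 and 5x5 convolutions (and after max pooling), which lowers the computational cost and the number of parameters.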
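To see why the bottleneck saves computation, count multiplications for one 5x5 branch with and without a 1x1 reduction. The sizes (28x28x192 input, 16 bottleneck channels, 32 output filters) are illustrative assumptions, "same" padding is assumed so the spatial size is preserved, and biases are ignored.

```python
def conv_mults(h, w, in_ch, out_ch, k):
    """Multiplications for a k x k 'same' convolution producing an h x w x out_ch output."""
    return h * w * out_ch * k * k * in_ch


# Illustrative sizes (an assumption for this example).
H, W, C_IN, C_OUT, C_MID = 28, 28, 192, 32, 16

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_mults(H, W, C_IN, C_OUT, k=5)
# 1x1 convolution down to 16 channels, then the 5x5 convolution on those.
bottleneck = conv_mults(H, W, C_IN, C_MID, k=1) + conv_mults(H, W, C_MID, C_OUT, k=5)
print(direct)      # 120422400
print(bottleneck)  # 12443648
print(round(direct / bottleneck, 1))  # 9.7
```

With these sizes the bottleneck version needs roughly ten times fewer multiplications for the same output shape.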
Draw and label the naive Inception module and the Inception module with dimension reduction from the GoogLeNet architecture.
Naive: Input -> (1x1 conv; 3x3 conv; 5x5 conv; 3x3 max pooling) -> filter concatenation -> Output
With dimension reduction (bottleneck): Input -> (1x1 conv; 1x1 conv -> 3x3 conv; 1x1 conv -> 5x5 conv; 3x3 max pooling -> 1x1 conv) -> filter concatenation -> Output
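The two variants can also be compared by output channels and convolution weights. This sketch uses illustrative channel counts (192 input channels; 64, 128 and 32 filters for the 1x1, 3x3 and 5x5 branches; 96 and 16 reduction channels; 32 pooling-projection channels), assumes "same" padding so all branches keep the spatial size, and ignores biases. `inception_summary` is a hypothetical helper.

```python
def inception_summary(c_in, n1x1, n3x3, n5x5, reduce3=None, reduce5=None, pool_proj=None):
    """Return (output_channels, conv_weights) for an Inception module.

    With reduce3/reduce5/pool_proj left as None this is the naive module; otherwise
    1x1 bottleneck convolutions are inserted. Biases are ignored for brevity.
    """
    weights = c_in * n1x1                        # 1x1 branch
    if reduce3 is None:
        weights += 3 * 3 * c_in * n3x3           # 3x3 conv directly on the input
    else:
        weights += c_in * reduce3 + 3 * 3 * reduce3 * n3x3
    if reduce5 is None:
        weights += 5 * 5 * c_in * n5x5           # 5x5 conv directly on the input
    else:
        weights += c_in * reduce5 + 5 * 5 * reduce5 * n5x5
    if pool_proj is None:
        pool_out = c_in                          # 3x3 max pooling keeps every input channel
    else:
        weights += c_in * pool_proj              # 1x1 projection after pooling
        pool_out = pool_proj
    # Branch outputs share the spatial size and are concatenated along channels.
    return n1x1 + n3x3 + n5x5 + pool_out, weights


print(inception_summary(192, 64, 128, 32))              # (416, 387072)
print(inception_summary(192, 64, 128, 32, 96, 16, 32))  # (256, 163328)
```

In the naive variant the pooling branch passes all input channels through, so the output depth keeps growing when modules are stacked; the 1x1 projections keep both the depth and the weight count in check.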
Name a network architecture or building block that leverages 1x1 convolutions.
The Inception module from GoogLeNet.
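Explain how the Inception module is modified for the I3D architecture. Name one advantage that emerges from this.
In I3D (Inflated 3D ConvNet), the 2D Inception module is "inflated" into 3D: every convolution filter and pooling kernel gets an additional temporal dimension (NxN filters become NxNxN), so the module processes spatio-temporal volumes of video frames instead of single images. One advantage is that the network captures motion as well as appearance, so it recognizes actions and activities in videos more effectively; another is that ImageNet-pretrained 2D weights can be reused by repeating each 2D filter N times along the time axis and rescaling it by 1/N.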
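The weight-inflation idea can be sketched in plain Python: a 2D kernel is repeated T times along the time axis and divided by T, so on a "boring" video made of one repeated frame the 3D filter gives the same response as the original 2D filter. The toy frame, kernel and helper names are illustrative assumptions, and the sketch covers only this initialization trick, not the full network.

```python
def inflate(kernel2d, t):
    """Inflate a 2D kernel to 3D by repeating it t times along time and dividing by t."""
    return [[[v / t for v in row] for row in kernel2d] for _ in range(t)]


def conv2d_at(frame, kernel, i, j):
    """Response of a 2D kernel at output position (i, j) (stride 1, no padding)."""
    return sum(kernel[a][b] * frame[i + a][j + b]
               for a in range(len(kernel)) for b in range(len(kernel[0])))


def conv3d_at(clip, kernel3d, s, i, j):
    """Response of a 3D kernel at time step s and spatial position (i, j)."""
    return sum(conv2d_at(clip[s + d], kernel3d[d], i, j) for d in range(len(kernel3d)))


frame = [[1, 2, 0], [0, 1, 3], [2, 2, 1]]
k2d = [[1, -1], [2, 0]]
T = 3
clip = [frame] * T    # a "boring" video: the same frame repeated T times
k3d = inflate(k2d, T)
# The two responses are equal up to floating-point rounding.
print(conv2d_at(frame, k2d, 0, 0), conv3d_at(clip, k3d, 0, 0, 0))
```

The rescaling by 1/T is what keeps the response unchanged; without it the 3D filter would output T times the 2D value.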
What were the changes in VGG compared to AlexNet?
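VGG uses a much deeper network (16 to 19 weight layers vs. 8 in AlexNet) built only from small 3x3 convolutions instead of AlexNet's large 11x11 and 5x5 filters, with a simple, uniform design: stages of stacked 3x3 convolutions followed by 2x2 max pooling. Two stacked 3x3 convolutions cover the same receptive field as one 5x5 (three cover a 7x7) with fewer parameters and more non-linearities, which led to improved accuracy and better generalization.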
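A short plain-Python check of the receptive-field argument, assuming stride 1, the same channel count C in and out, and ignoring biases; C = 64 is an arbitrary illustrative value and the helper names are hypothetical.

```python
def stacked_3x3(n, channels):
    """Receptive field and weight count of n stacked 3x3 convs (stride 1, C channels in and out)."""
    receptive_field = 1 + 2 * n          # each 3x3 layer widens the field by 2 pixels
    weights = n * 3 * 3 * channels * channels
    return receptive_field, weights


def single_kxk(k, channels):
    """Receptive field and weight count of one k x k conv with C channels in and out."""
    return k, k * k * channels * channels


C = 64  # illustrative channel count
print(stacked_3x3(2, C), single_kxk(5, C))  # (5, 73728) (5, 102400)
print(stacked_3x3(3, C), single_kxk(7, C))  # (7, 110592) (7, 200704)
```

The stacked version matches the receptive field with 18C^2 instead of 25C^2 weights (27C^2 instead of 49C^2 for the 7x7 case) and inserts an extra non-linearity between layers.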
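Draw and explain a basic ResNet building block. Name two advantages of ResNet blocks over plain serial conv blocks.
x -> 3x3 conv -> BN -> ReLU -> 3x3 conv -> BN -> (+ x via identity shortcut) -> ReLU -> output
A basic ResNet block consists of a stack of convolutional layers F and an identity shortcut connection. The block adds the original input to the output of the convolutional layers, y = F(x) + x, so the layers only have to learn the residual F(x) = H(x) - x (the difference between the desired output H(x) and the input).
Easier optimization: gradients flow directly through the identity shortcut, which mitigates vanishing gradients and the degradation problem, so much deeper networks can be trained.
Easy identity mapping: if a block is not needed, it only has to push F(x) toward zero instead of fitting an identity with a stack of nonlinear layers, so extra depth is less likely to hurt accuracy.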
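A minimal plain-Python sketch of the residual idea y = relu(F(x) + x). To keep it short, two small dense layers stand in for the two 3x3 convolutions and batch normalization is left out; these simplifications and the helper names are assumptions, not the original ResNet layer definitions.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x).

    Dense layers stand in for the two 3x3 convolutions (and BN is omitted) so the
    shortcut logic stays visible; w2 must map back to len(x) for the addition.
    """
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut: add x back


x = [1.0, 2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]
# With all residual weights at zero the block is exactly the identity (for non-negative x).
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 0.5]
```

Because the shortcut carries x unchanged, setting the residual weights to zero yields the identity mapping; a plain stack of layers would have to learn that mapping explicitly.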
What are residual blocks? What are they good for?
Residual blocks are building blocks that use shortcut connections so that their layers learn the residual (difference) between input and output. This makes training easier and enables much deeper neural networks to be trained.
Name three properties of the ResNet architecture that were not present in the AlexNet architecture.
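Shortcut (skip) connections, a much deeper architecture (up to 152 layers on ImageNet vs. 8 in AlexNet), residual learning (blocks learn F(x) = H(x) - x instead of H(x) directly).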