Understanding CNNs Flashcards
What is a CNN?
A type of deep neural network designed for visual data, inspired by the human visual system.
Key Applications of CNNs
Used in facial recognition, object detection, medical imaging, self-driving cars, etc.
What is Convolution?
A mathematical operation applying a small matrix (kernel) to an image to extract features.
What do CNN filters detect?
Low-level features such as edges and textures in early layers, and higher-level patterns such as object parts in deeper layers.
How does convolution work?
A kernel slides over an image, performing element-wise multiplication and summation.
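A minimal NumPy sketch of this sliding-window operation (the function name convolve2d and the Sobel-style kernel are illustrative; like most deep learning frameworks, it computes cross-correlation, i.e., the kernel is not flipped):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, taking the element-wise
    product and sum at each position (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
print(convolve2d(image, sobel_x).shape)  # (3, 3): a 3×3 kernel shrinks a 5×5 input
```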
What is stride in CNNs?
The step size by which the kernel moves across the input image.
What is padding in CNNs?
padding='valid': No padding; the output size shrinks.
padding='same': Pads the input so the output keeps the same spatial size.
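A quick illustration of both modes, assuming TensorFlow/Keras (the shapes follow the valid/same definitions above):

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 1))  # batch, height, width, channels

valid = tf.keras.layers.Conv2D(16, 3, strides=1, padding='valid')(x)
same = tf.keras.layers.Conv2D(16, 3, strides=1, padding='same')(x)

print(valid.shape)  # (1, 30, 30, 16): output shrinks
print(same.shape)   # (1, 32, 32, 16): zero-padding keeps the size
```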
Layers in a CNN
Convolutional Layer: Extracts feature maps from images.
Pooling Layer: Reduces spatial size (downsampling).
Activation Function: Introduces non-linearity (e.g., ReLU).
Fully Connected Layer: Final classification.
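A minimal Keras sketch wiring these four layer types together (layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, activation='relu'),  # convolution + activation
    layers.MaxPooling2D(2),                   # pooling (downsampling)
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),   # fully connected classifier
])
model.summary()
```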
Pooling in CNNs
Purpose: Reduces dimensionality while preserving important features.
Types: Max pooling (common) and average pooling.
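A toy NumPy comparison of the two pooling types on a 4×4 input with 2×2 windows and stride 2:

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 8, 1],
              [3, 4, 5, 6]], dtype=float)

# Split into 2×2 blocks, then reduce each block to one value
blocks = x.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(blocks.max(axis=(2, 3)))   # max pooling:     [[6. 4.] [7. 8.]]
print(blocks.mean(axis=(2, 3)))  # average pooling: [[3.75 2.25] [4. 5.]]
```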
Fully Connected Layer
Flattens the feature map into a vector for classification.
Applies an activation such as softmax to produce class probabilities.
How do CNNs learn?
Process images with filters to extract patterns.
Use a loss function to measure prediction error.
Use backpropagation to adjust filter weights.
Use optimization algorithms like SGD and Adam.
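A sketch of this training setup in Keras; it assumes the model defined earlier, and x_train / y_train are hypothetical placeholders for images and integer labels:

```python
import tensorflow as tf

# Assumes `model` is a Keras CNN like the sketch above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # or SGD
    loss='sparse_categorical_crossentropy',  # measures prediction error
    metrics=['accuracy'],
)
# model.fit(x_train, y_train, epochs=5)  # fit() runs backprop and the optimizer
```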
What is Backpropagation?
A method to compute the gradient of the loss function and adjust weights accordingly.
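A one-weight toy example using TensorFlow's GradientTape (the numbers are illustrative): the gradient of the loss is computed, then the weight is moved against it.

```python
import tensorflow as tf

w = tf.Variable(2.0)       # a single "weight"
x, y_true = 3.0, 15.0      # toy data: target relation is y = 5 * x

with tf.GradientTape() as tape:
    y_pred = w * x
    loss = (y_pred - y_true) ** 2   # squared-error loss

grad = tape.gradient(loss, w)       # dloss/dw, computed by backpropagation
w.assign_sub(0.01 * grad)           # gradient-descent update
print(grad.numpy(), w.numpy())      # -54.0, 2.54
```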
Example: A Simple CNN
Input: 32×32 image (1 channel)
Conv Layer 1: 3×3 kernel, 16 filters → (30×30×16)
Conv Layer 2: 3×3 kernel, 32 filters → (28×28×32)
Fully Connected Layer: 25,088 input units (28×28×32) → 10 output classes.
How are the total parameters calculated?
Breakdown (bias terms excluded):
Conv Layer 1: 3×3×1×16 = 144 learnable parameters.
Conv Layer 2: 3×3×16×32 = 4,608 learnable parameters.
Fully Connected Layer: 25,088×10 = 250,880 learnable parameters.
Total = 255,632 parameters.
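A quick verification in Keras (biases are disabled so the count matches the hand calculation above; with default biases Keras would report 255,690 instead):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, use_bias=False),  # 3*3*1*16  = 144
    layers.Conv2D(32, 3, use_bias=False),  # 3*3*16*32 = 4,608
    layers.Flatten(),                      # 28*28*32  = 25,088 units
    layers.Dense(10, use_bias=False),      # 25,088*10 = 250,880
])
print(model.count_params())  # 255632
```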
Who developed LeNet?
Yann LeCun.
What was LeNet used for?
Recognizing handwritten MNIST digits.
Key Features of LeNet
Alternates between convolutional and subsampling (average-pooling) layers.
Uses local receptive fields with shared weights.
Ends with fully connected layers and softmax.
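A LeNet-5-style sketch in modern Keras; the layer sizes follow the classic architecture, but details such as activations and padding differ from the 1998 paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

lenet = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, 5, activation='tanh'),    # 28×28×6
    layers.AveragePooling2D(2),                # 14×14×6
    layers.Conv2D(16, 5, activation='tanh'),   # 10×10×16
    layers.AveragePooling2D(2),                # 5×5×16
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),
    layers.Dense(84, activation='tanh'),
    layers.Dense(10, activation='softmax'),    # class probabilities
])
```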
Why is Batch Normalization needed?
Mitigates vanishing/exploding gradients.
Stabilizes training.
Reduces sensitivity to weight initialization.
How does Batch Normalization work?
Normalizes each layer's activations to zero mean and unit variance over the mini-batch, then rescales and shifts them with learnable parameters.
Acts as a regularizer, reducing overfitting.
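One common placement in Keras (a convention, not the only option): convolution, then batch normalization, then the activation.

```python
import tensorflow as tf
from tensorflow.keras import layers

conv_bn_block = tf.keras.Sequential([
    layers.Conv2D(32, 3, use_bias=False),  # BN's learnable shift makes the bias redundant
    layers.BatchNormalization(),           # normalize activations over the mini-batch
    layers.ReLU(),
])
```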
How to improve CNN generalization?
Spatial Dropout: Randomly drops entire feature maps during training, preventing over-reliance on specific features.
Hyperparameter optimization: Adjusting learning rates, batch sizes, etc.
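A sketch of spatial dropout in Keras; the rate of 0.5 is illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 8, 8, 4))        # batch, height, width, channels
drop = layers.SpatialDropout2D(rate=0.5)  # zeroes whole feature maps, not single pixels
y = drop(x, training=True)                # only active during training
```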
Key similarities and differences between backpropagation and regularization
Backpropagation aims to minimize loss by updating weights based on error gradients.
Regularization modifies the weight update process to improve generalization and prevent overfitting.
Both affect weight updates.
Both are iterative processes.
Regularization can be incorporated into backpropagation (e.g., L2 regularization adds a penalty term to the weight updates).
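For example, in Keras an L2 penalty can be attached to a layer's weights, and the framework folds the penalty's gradient into each backpropagation update (the coefficient 1e-4 is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# The L2 penalty on this layer's kernel adds a term proportional to the
# weights themselves to every gradient update (weight decay).
layer = layers.Conv2D(16, 3, kernel_regularizer=regularizers.l2(1e-4))
```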
Feature map size formula
(input size − kernel size + 2 × padding) / stride + 1
Pooling size formula
input size / pool size (assuming the stride equals the pool size)
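Both formulas as small Python helpers, checked against the example CNN above:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """(input - kernel + 2 * padding) / stride + 1"""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def pool_output_size(input_size, pool_size):
    """Assumes the stride equals the pool size (non-overlapping windows)."""
    return input_size // pool_size

print(conv_output_size(32, 3))  # 30, matching Conv Layer 1 in the example above
print(pool_output_size(28, 2))  # 14
```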