Week 6 (CNN Cheatsheet) Flashcards
What is the size of the output feature map when applying K filters of size F × F to an input of size I × I with C channels? (Think about how filters and channels interact in convolution layers.)
The output feature map size is O × O × K. Explanation: When K filters of size F × F are applied to an input of size I × I with C channels, each filter produces one O × O activation map, and stacking the K maps gives an output of O × O × K, where O is determined by the input size, filter size, padding, and stride. For example, a 32 × 32 × 3 input convolved with K = 16 filters of size 5 × 5 at stride 1 and no padding yields a 28 × 28 × 16 output.
What does the stride S represent in a convolutional operation? (Consider how the filter scans the input image.)
The stride S denotes the number of pixels the filter moves after each operation. Explanation: In convolutional and pooling operations, the stride S indicates how many pixels the filter shifts after each application, affecting the size of the output feature map.
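To make the effect of the stride concrete, here is a minimal sketch in plain Python (I, F, and S are assumed example values) that enumerates the positions a filter visits along one dimension; the count of positions matches the output size formula given later in this deck:

```python
I, F, S = 7, 3, 2  # assumed example input size, filter size, and stride

# With stride S, the filter's left edge visits 0, S, 2S, ...
# for as long as the whole filter still fits inside the input.
positions = list(range(0, I - F + 1, S))
print(positions)       # [0, 2, 4]
print(len(positions))  # 3, matching (I - F)/S + 1 with no padding
```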
How does the size of a filter affect the convolution operation in CNNs? (Remember the relationship between filter size, input size, and output dimensions.)
The size of a filter determines the extent of the input each convolution sees and, with it, the dimensions of the output feature map. Explanation: A filter applied to an input with C channels is itself a volume of size F × F × C; each placement of this volume produces one output value, so the filter size, together with stride and padding, determines the dimensions of the output feature map.
What is zero-padding in the context of convolutional neural networks? (Think about how padding affects the dimensions of the input.)
Zero-padding is the process of adding P zeroes to each side of the input boundaries. Explanation: Zero-padding helps maintain the spatial dimensions of the input after convolution, allowing for better feature extraction and avoiding the loss of information at the edges.
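A minimal sketch of zero-padding with NumPy (the 4 × 4 input and P = 1 are assumed example values):

```python
import numpy as np

x = np.arange(16).reshape(4, 4)  # example 4 x 4 input
P = 1                            # zeros added to each side

# np.pad's default 'constant' mode surrounds the array with zeros,
# growing each dimension from I to I + 2P (here, 4 -> 6).
x_padded = np.pad(x, P)
print(x_padded.shape)  # (6, 6)
```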
What are the three modes of zero-padding?
(Consider how each mode affects the output size of the feature map.)
The three modes of zero-padding are Valid, Same, and Full.
Explanation:
- Valid: no padding (P = 0); the output shrinks, and border positions where the filter would not fully fit are dropped.
- Same: padding chosen so that the output size equals the input size.
- Full: maximum padding (F - 1 zeros per side), so the filter overlaps the input at every possible position and sees it end to end.
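To make the three modes concrete, a small sketch (assuming stride S = 1, where the formulas simplify) compares the output sizes they produce:

```python
I, F = 6, 3  # assumed example input size and filter size

# At stride 1:
#   Valid: P = 0, so the output shrinks to I - F + 1
#   Same:  padding keeps the output equal to the input size I
#   Full:  F - 1 zeros per side, so the output grows to I + F - 1
valid = I - F + 1
same = I
full = I + F - 1
print(valid, same, full)  # 4 6 8
```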
How is the output size O of a feature map calculated in a convolutional layer?
(Remember the variables: I for input size, F for filter size, P for padding, and S for stride.)
O = (I - F + P_start + P_end) / S + 1
Explanation: This formula gives the output size in terms of the input size I, filter size F, stride S, and the zeros P_start and P_end added at each boundary, which are all critical hyperparameters of convolutional neural networks.
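A direct transcription of this formula into Python (a minimal sketch; the sample values are assumptions for illustration):

```python
def output_size(I, F, S, P_start=0, P_end=0):
    """Output size O = (I - F + P_start + P_end) / S + 1."""
    return (I - F + P_start + P_end) // S + 1

# 32 x 32 input, 5 x 5 filter, stride 1, no padding -> 28
print(output_size(32, 5, 1))        # 28
# Same padding at stride 1 needs P_start + P_end = F - 1 = 4
print(output_size(32, 5, 1, 2, 2))  # 32
```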
What is zero-padding in the context of convolutional neural networks?
(Think about how the input size changes after applying a convolutional filter.)
Zero-padding is the process of adding zeros around the border of an input image to control the spatial dimensions of the output feature map after convolution.
Explanation: Zero-padding helps maintain the spatial dimensions of the input when applying convolutional layers. It allows for better control over the output size and can help preserve important features near the edges of the input.
Why is zero-padding used in convolutional neural networks? (Consider the effects of convolution on the edges of an image.)
Zero-padding is used to prevent the reduction of the spatial dimensions of the feature maps and to allow for better feature extraction near the edges of the input.
Explanation: Without zero-padding, applying a convolutional filter would reduce the size of the output feature map, potentially losing important information at the edges. Zero-padding helps retain the original size and improves the network’s ability to learn from edge features.
How does zero-padding affect the number of parameters in a convolutional layer?
(Focus on what parameters are involved in convolutional layers.)
Zero-padding does not directly affect the number of parameters in a convolutional layer; it only influences the output size of the feature maps. Explanation: The number of parameters in a convolutional layer is determined by the filter size, number of filters, and the number of input channels, not by zero-padding. However, zero-padding can influence the overall architecture and performance of the network.
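Assuming the standard convention of one bias per filter, the parameter count is (F × F × C + 1) × K, which this short sketch makes explicit (the layer sizes are assumed example values):

```python
def conv_params(F, C, K):
    """Weights F*F*C per filter, plus one bias, times K filters."""
    return (F * F * C + 1) * K

# 16 filters of size 5 x 5 over a 3-channel input:
print(conv_params(5, 3, 16))  # 1216 -- unchanged by any amount of zero-padding
```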
What is the purpose of the ReLU activation function in deep learning? (Think about how it affects the output when the input is negative or positive.)
To introduce non-linearity into the model and help it learn complex patterns.
Explanation: ReLU (Rectified Linear Unit) is defined as g(z) = max(0, z), which means it outputs zero for any negative input and the input itself for positive values. This allows the model to learn complex relationships in the data by introducing non-linearity.
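A minimal NumPy sketch of g(z) = max(0, z):

```python
import numpy as np

def relu(z):
    """ReLU: 0 for negative inputs, the input itself for positive inputs."""
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```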
What is the difference between image classification and object detection in deep learning? (Consider what additional information is provided in object detection compared to classification.)
Image classification predicts the presence of an object, while object detection identifies the object and its location in the image. Explanation: Image classification assigns a label to an entire image, predicting the probability of the object being present. In contrast, object detection not only predicts the object but also provides bounding boxes indicating where the object is located within the image.
What are the two main types of detection methods in object detection? (Think about what each method focuses on detecting in an image.)
Bounding box detection and landmark detection. Explanation: Bounding box detection identifies the location of objects using rectangular boxes, while landmark detection focuses on identifying specific points or features of complex shapes within the image.
What is One Shot Learning in the context of image verification? (Think about how it relates to learning from few examples.)
One Shot Learning is an algorithm that learns a similarity function using a limited training set to compare images.
Explanation: One Shot Learning allows a model to verify if two images are of the same person by learning from only one example of each class, making it efficient for tasks like face verification.
What is the purpose of a Siamese Network?
(Consider how it processes input images.)
A Siamese Network encodes images to quantify the differences between them.
Explanation: Siamese Networks are designed to take two input images and produce a representation that helps in determining how similar or different they are, which is crucial for tasks like face recognition.
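A conceptual NumPy sketch of the Siamese idea (the random encoder weights are a stand-in for a trained CNN; everything here is an illustrative assumption, not a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))  # stand-in for a trained CNN encoder

def encode(image):
    """Shared encoder f: both inputs pass through the SAME weights."""
    return W @ image.ravel()

def difference(img1, img2):
    """d(x1, x2) = ||f(x1) - f(x2)||: small when the images are similar."""
    return np.linalg.norm(encode(img1) - encode(img2))

a, b = rng.standard_normal((28, 28)), rng.standard_normal((28, 28))
print(difference(a, a) < difference(a, b))  # True: an image is closest to itself
```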
How is the triplet loss function defined in image embedding? (Remember the roles of anchor, positive, and negative images.)
The triplet loss function is defined as ℓ(A,P,N) = max(d(A,P) - d(A,N) + α, 0).
Explanation: The triplet loss function helps train models by ensuring that the distance between an anchor and a positive image (same class) is smaller than the distance between the anchor and a negative image (different class) by at least a margin α.
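Transcribing ℓ(A, P, N) = max(d(A, P) - d(A, N) + α, 0) directly into NumPy (the embeddings and margin below are assumed example values):

```python
import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) with Euclidean distance d."""
    d_ap = np.linalg.norm(a - p)  # anchor-to-positive distance (same class)
    d_an = np.linalg.norm(a - n)  # anchor-to-negative distance (different class)
    return max(d_ap - d_an + alpha, 0.0)

a = np.array([0.0, 0.0])  # anchor embedding
p = np.array([0.1, 0.0])  # positive: close to the anchor
n = np.array([1.0, 1.0])  # negative: far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```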