Week 6 (CNN Cheatsheet) Flashcards
What is the size of the output feature map when applying K filters of size F × F to an input of size I × I with C channels? (Think about how filters and channels interact in convolution layers.)
The output feature map size is O × O × K. Explanation: When K filters of size F × F are applied to an input of size I × I with C channels, each filter produces one O × O activation map, and stacking the K maps gives an output of O × O × K, where O is determined by the input size, filter size, padding, and stride. For example, a 32 × 32 × 3 input convolved with K = 16 filters of size 5 × 5 at stride 1 and no padding yields a 28 × 28 × 16 output.
What does the stride S represent in a convolutional operation? (Consider how the filter scans the input image.)
The stride S denotes the number of pixels the filter moves after each operation. Explanation: In convolutional and pooling operations, the stride S indicates how many pixels the filter shifts after each application, affecting the size of the output feature map.
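To make the effect of the stride concrete, here is a minimal sketch in plain Python (I, F, and S are assumed example values) that enumerates the positions a filter visits along one dimension; the count of positions matches the output size formula given later in this deck:

```python
I, F, S = 7, 3, 2  # assumed example input size, filter size, and stride

# With stride S, the filter's left edge visits 0, S, 2S, ...
# for as long as the whole filter still fits inside the input.
positions = list(range(0, I - F + 1, S))
print(positions)       # [0, 2, 4]
print(len(positions))  # 3, matching (I - F)/S + 1 with no padding
```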
How does the size of a filter affect the convolution operation in CNNs? (Remember the relationship between filter size, input size, and output dimensions.)
The size of a filter determines the extent of the input each convolution sees and, with it, the dimensions of the output feature map. Explanation: A filter applied to an input with C channels is itself a volume of size F × F × C; each placement of this volume produces one output value, so the filter size, together with stride and padding, determines the dimensions of the output feature map.
What is zero-padding in the context of convolutional neural networks? (Think about how padding affects the dimensions of the input.)
Zero-padding is the process of adding P zeroes to each side of the input boundaries. Explanation: Zero-padding helps maintain the spatial dimensions of the input after convolution, allowing for better feature extraction and avoiding the loss of information at the edges.
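A minimal sketch of zero-padding with NumPy (the 4 × 4 input and P = 1 are assumed example values):

```python
import numpy as np

x = np.arange(16).reshape(4, 4)  # example 4 x 4 input
P = 1                            # zeros added to each side

# np.pad's default 'constant' mode surrounds the array with zeros,
# growing each dimension from I to I + 2P (here, 4 -> 6).
x_padded = np.pad(x, P)
print(x_padded.shape)  # (6, 6)
```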
What are the three modes of zero-padding?
(Consider how each mode affects the output size of the feature map.)
The three modes of zero-padding are Valid, Same, and Full.
Explanation:
- Valid: no padding (P = 0); the output shrinks, and border positions where the filter would not fully fit are dropped.
- Same: padding chosen so that the output size equals the input size.
- Full: maximum padding (F - 1 zeros per side), so the filter overlaps the input at every possible position and sees it end to end.
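To make the three modes concrete, a small sketch (assuming stride S = 1, where the formulas simplify) compares the output sizes they produce:

```python
I, F = 6, 3  # assumed example input size and filter size

# At stride 1:
#   Valid: P = 0, so the output shrinks to I - F + 1
#   Same:  padding keeps the output equal to the input size I
#   Full:  F - 1 zeros per side, so the output grows to I + F - 1
valid = I - F + 1
same = I
full = I + F - 1
print(valid, same, full)  # 4 6 8
```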
How is the output size O of a feature map calculated in a convolutional layer?
(Remember the variables: I for input size, F for filter size, P for padding, and S for stride.)
O = (I - F + P_start + P_end) / S + 1
Explanation: This formula gives the output size in terms of the input size I, filter size F, stride S, and the zeros P_start and P_end added at each boundary, which are all critical hyperparameters of convolutional neural networks.
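A direct transcription of this formula into Python (a minimal sketch; the sample values are assumptions for illustration):

```python
def output_size(I, F, S, P_start=0, P_end=0):
    """Output size O = (I - F + P_start + P_end) / S + 1."""
    return (I - F + P_start + P_end) // S + 1

# 32 x 32 input, 5 x 5 filter, stride 1, no padding -> 28
print(output_size(32, 5, 1))        # 28
# Same padding at stride 1 needs P_start + P_end = F - 1 = 4
print(output_size(32, 5, 1, 2, 2))  # 32
```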
What is zero-padding in the context of convolutional neural networks?
(Think about how the input size changes after applying a convolutional filter.)
Zero-padding is the process of adding zeros around the border of an input image to control the spatial dimensions of the output feature map after convolution.
Explanation: Zero-padding helps maintain the spatial dimensions of the input when applying convolutional layers. It allows for better control over the output size and can help preserve important features near the edges of the input.
Why is zero-padding used in convolutional neural networks? (Consider the effects of convolution on the edges of an image.)
Zero-padding is used to prevent the reduction of the spatial dimensions of the feature maps and to allow for better feature extraction near the edges of the input.
Explanation: Without zero-padding, applying a convolutional filter would reduce the size of the output feature map, potentially losing important information at the edges. Zero-padding helps retain the original size and improves the network’s ability to learn from edge features.
How does zero-padding affect the number of parameters in a convolutional layer?
(Focus on what parameters are involved in convolutional layers.)
Zero-padding does not directly affect the number of parameters in a convolutional layer; it only influences the output size of the feature maps. Explanation: The number of parameters in a convolutional layer is determined by the filter size, number of filters, and the number of input channels, not by zero-padding. However, zero-padding can influence the overall architecture and performance of the network.
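Assuming the standard convention of one bias per filter, the parameter count is (F × F × C + 1) × K, which this short sketch makes explicit (the layer sizes are assumed example values):

```python
def conv_params(F, C, K):
    """Weights F*F*C per filter, plus one bias, times K filters."""
    return (F * F * C + 1) * K

# 16 filters of size 5 x 5 over a 3-channel input:
print(conv_params(5, 3, 16))  # 1216 -- unchanged by any amount of zero-padding
```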
What is the purpose of the ReLU activation function in deep learning? (Think about how it affects the output when the input is negative or positive.)
To introduce non-linearity into the model and help it learn complex patterns.
Explanation: ReLU (Rectified Linear Unit) is defined as g(z) = max(0, z), which means it outputs zero for any negative input and the input itself for positive values. This allows the model to learn complex relationships in the data by introducing non-linearity.
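A minimal NumPy sketch of g(z) = max(0, z):

```python
import numpy as np

def relu(z):
    """ReLU: 0 for negative inputs, the input itself for positive inputs."""
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```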
What is the difference between image classification and object detection in deep learning? (Consider what additional information is provided in object detection compared to classification.)
Image classification predicts the presence of an object, while object detection identifies the object and its location in the image. Explanation: Image classification assigns a label to an entire image, predicting the probability of the object being present. In contrast, object detection not only predicts the object but also provides bounding boxes indicating where the object is located within the image.
What are the two main types of detection methods in object detection? (Think about what each method focuses on detecting in an image.)
Bounding box detection and landmark detection. Explanation: Bounding box detection identifies the location of objects using rectangular boxes, while landmark detection focuses on identifying specific points or features of complex shapes within the image.
What is One Shot Learning in the context of image verification? (Think about how it relates to learning from few examples.)
One Shot Learning is an algorithm that learns a similarity function using a limited training set to compare images.
Explanation: One Shot Learning allows a model to verify if two images are of the same person by learning from only one example of each class, making it efficient for tasks like face verification.
What is the purpose of a Siamese Network?
(Consider how it processes input images.)
A Siamese Network encodes images to quantify the differences between them.
Explanation: Siamese Networks are designed to take two input images and produce a representation that helps in determining how similar or different they are, which is crucial for tasks like face recognition.
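A conceptual NumPy sketch of the Siamese idea (the random encoder weights are a stand-in for a trained CNN; everything here is an illustrative assumption, not a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))  # stand-in for a trained CNN encoder

def encode(image):
    """Shared encoder f: both inputs pass through the SAME weights."""
    return W @ image.ravel()

def difference(img1, img2):
    """d(x1, x2) = ||f(x1) - f(x2)||: small when the images are similar."""
    return np.linalg.norm(encode(img1) - encode(img2))

a, b = rng.standard_normal((28, 28)), rng.standard_normal((28, 28))
print(difference(a, a) < difference(a, b))  # True: an image is closest to itself
```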
How is the triplet loss function defined in image embedding? (Remember the roles of anchor, positive, and negative images.)
The triplet loss function is defined as ℓ(A,P,N) = max(d(A,P) - d(A,N) + α, 0).
Explanation: The triplet loss function helps train models by ensuring that the distance between an anchor and a positive image (same class) is smaller than the distance between the anchor and a negative image (different class) by at least a margin α.
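Transcribing ℓ(A, P, N) = max(d(A, P) - d(A, N) + α, 0) directly into NumPy (the embeddings and margin below are assumed example values):

```python
import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) with Euclidean distance d."""
    d_ap = np.linalg.norm(a - p)  # anchor-to-positive distance (same class)
    d_an = np.linalg.norm(a - n)  # anchor-to-negative distance (different class)
    return max(d_ap - d_an + alpha, 0.0)

a = np.array([0.0, 0.0])  # anchor embedding
p = np.array([0.1, 0.0])  # positive: close to the anchor
n = np.array([1.0, 1.0])  # negative: far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```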