Lecture 7 - CNNs Flashcards
What is image classification in machine learning?
- Image classification involves analyzing an image (represented as pixel data with color values) to assign it a label.
- Example: determining whether a picture contains a cat or not.
What is the typical data structure for an image in image classification?
- Each image is represented as a 3D array
- dimensions height × width × number of color channels
- e.g., a 64×64 image with 3 color channels results in 64×64×3 datapoints per image
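The 64×64×3 example above can be checked with a minimal NumPy sketch (the image here is a hypothetical all-zeros placeholder):

```python
import numpy as np

# A hypothetical 64x64 RGB image: height x width x number of color channels
image = np.zeros((64, 64, 3), dtype=np.uint8)

print(image.shape)  # (64, 64, 3)
print(image.size)   # 12288 datapoints per image (64 * 64 * 3)
```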
What is object detection, and how does it differ from image classification?
- Object detection involves identifying objects within an image, determining how many objects are present, and specifying their locations.
- Image classification, by contrast, only assigns a label to the whole image; it does not count objects or locate them.
What is neural style transfer?
Neural style transfer combines a source image with a style from another image, transferring the style onto the source image while preserving its content.
How do larger images affect normal neural network design?
Larger images have more datapoints, leading to a greater number of weights in the network. This increases computational complexity and memory requirements.
Why are regular neural networks not ideal for image classification tasks with large images?
- Regular neural networks require a very high number of weights for large images, making them computationally infeasible.
- Specialized architectures, like convolutional neural networks (CNNs), are used to manage this complexity.
What is edge detection in image processing?
- Edge detection identifies areas in an image where the intensity (brightness) changes sharply.
- These areas often represent object boundaries or transitions from one color to another.
How does a filter (kernel) work in edge detection?
- A filter scans across the image by sliding over the input grid and performs a convolution operation to compute an output.
- This operation detects changes in intensity, indicating edges.
If you apply a 3×3 filter on a 6×6 image, what will be the size of the output?
- The output will be a 4×4 matrix.
- In general, applying an f×f filter reduces each output dimension by f - 1 (here 6 - 3 + 1 = 4).
Does the orientation of the input image affect the result of edge detection?
No. Flipping the input image reverses the sign of the filter's response (a light-to-dark transition gives the opposite sign of a dark-to-light one), but the edge is still detected, since the magnitude of the intensity change is the same.
What is the mathematical operation performed during convolution?
The filter and a corresponding section of the input image are multiplied element-wise, and the resulting values are summed to produce one output value.
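This multiply-and-sum can be sketched in NumPy (a toy implementation with made-up inputs; CNNs typically skip the kernel flip of textbook convolution, which is what this loop does):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid (no padding) convolution as used in CNNs:
    slide the kernel, multiply element-wise, and sum."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 input
kernel = np.ones((3, 3))                          # toy 3x3 filter

print(conv2d_valid(image, kernel).shape)  # (4, 4)
```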
What is the primary purpose of using convolution in edge detection?
Convolution helps extract meaningful patterns, such as edges, from the input image, facilitating feature extraction in downstream computer vision tasks.
What does a vertical filter detect in an image?
- A vertical filter detects vertical edges by emphasizing intensity differences between the left and right sides of the filter.
- If one side is much brighter, an edge is detected.
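A toy NumPy demonstration (synthetic image, standard vertical-edge filter): a bright left half meeting a dark right half should produce large outputs only at the boundary.

```python
import numpy as np

# 6x6 image: bright left half, dark right half -> vertical edge in the middle
image = np.zeros((6, 6))
image[:, :3] = 10.0

# Vertical edge filter: compares the left column against the right column
vf = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]], dtype=float)

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * vf)

print(out[0])  # [ 0. 30. 30.  0.] -- the edge lights up in the middle columns
```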
How does a horizontal filter detect edges?
- A horizontal filter detects horizontal edges by comparing brightness between the top and bottom parts of the filter.
- It is the transposed version of a vertical filter.
What is a Sobel filter, and why is it useful?
- A Sobel filter is an edge detection filter that gives more weight to the central row (or column) of the image section being analyzed, making it more robust to noise.
- It works well for detecting faint edges.
What is the purpose of a Scharr filter?
A Scharr filter is a fine-tuned version of the Sobel filter that detects edges with even greater precision.
What are the two main advantages of using convolution in image processing?
- Parameter/Weight sharing: The filter size is fixed, reducing the number of weights significantly.
- Local information: Convolution captures local patterns by taking into account the spatial relationship of neighboring pixels.
Why is the filter size considered a hyperparameter in convolutional neural networks?
- The filter size determines the receptive field and affects the output size.
- It is typically chosen as an odd number (e.g., 3×3 or 5×5) to ensure proper centering.
What is the purpose of padding in convolutional neural networks?
Padding prevents the output from shrinking after each convolution, enabling the network to go deeper while preserving the original image size.
How does padding help in edge detection tasks?
Padding ensures that pixels on the borders of the image are used as frequently as those in the center, allowing the network to accurately detect information near the edges.
How is padding typically applied to an image?
Padding adds extra rows and columns (usually filled with zeros) around the original image to maintain the desired output size.
What is the difference in output size between convolution with and without padding?
- Without padding: (n - f + 1) × (n - f + 1)
- With padding p: (n + 2p - f + 1) × (n + 2p - f + 1)
- The output size remains the same as the input if the padding is chosen appropriately.
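Both formulas can be checked with toy numbers in NumPy (n = 6, f = 3, p = 1 are example values, not from the lecture):

```python
import numpy as np

n, f, p = 6, 3, 1            # input size, filter size, padding

image = np.ones((n, n))
padded = np.pad(image, p)    # zero padding on all four sides

out_valid = n - f + 1        # without padding
out_same = n + 2 * p - f + 1 # with padding p

print(padded.shape)          # (8, 8)
print(out_valid, out_same)   # 4 6 -> padding p=1 preserves the 6x6 size
```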
What are the two main benefits of using padding in CNNs?
- It allows the filter to operate at the edges, ensuring that all pixels are considered equally.
- It maintains the image size after convolution, making it easier to design deeper networks.
How is the required padding size determined for padding?
- p = (f-1)/2
- f = filter size
- this will give an nxn output
- This formula works best when the filter size is an odd number, ensuring an integer padding value.
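A quick sketch of the formula for a few odd filter sizes (n = 6 is an arbitrary example):

```python
# Padding needed for a "same"-sized output, from the card's formula p = (f - 1) / 2
def same_padding(f):
    return (f - 1) // 2  # an integer whenever f is odd

n = 6
for f in (3, 5, 7):
    p = same_padding(f)
    print(f, p, n + 2 * p - f + 1)  # the output stays n x n (6) in every case
```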
What is the primary output difference between valid convolution and same convolution?
- Valid convolution (no padding): The output size is smaller than the input.
- Same convolution (zero padding): The output size is the same as the input due to zero padding.
What is stride in the context of convolutional neural networks?
- Stride refers to how much the filter moves across the input image during convolution.
- It determines the step size of the filter movement both horizontally and vertically.
How does stride affect computation and time in convolutional neural networks?
- Increasing the stride reduces the number of computations by skipping pixels, which saves time and computational power.
- This is particularly effective for high-resolution images.
What happens when the stride is set to 1 in convolution?
When the stride is 1, the filter moves one pixel at a time in both horizontal and vertical directions, resulting in maximum overlap between adjacent filter positions.
How is the output size calculated when using stride in convolution?
- The output size is determined by the formula:
- ⌊(n + 2p - f) / s⌋ + 1, where s is the stride; the division is rounded down when the filter does not fit evenly.
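The formula can be wrapped in a small helper (the test values are made-up examples):

```python
from math import floor

def conv_output_size(n, f, p=0, s=1):
    """Output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(7, 3, p=0, s=2))  # 3  (stride 2 halves the coverage)
print(conv_output_size(6, 3, p=1, s=1))  # 6  ("same" convolution)
```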
Why are 3×3 filters commonly used in convolutional neural networks?
- They are the smallest possible filters that can capture information in four directions (up, down, left, right).
- Stacking multiple smaller filters can achieve the same effect as using a larger filter (e.g., 5×5 or 7×7).
What are the benefits of using multiple smaller filters instead of a single large filter?
- Smaller filters lead to deeper networks with more non-linearities, which improves the network’s ability to learn complex patterns.
- They result in fewer parameters, helping with regularization and reducing overfitting.
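The parameter savings can be counted directly (per input/output channel, ignoring biases): two stacked 3×3 layers cover the same 5×5 receptive field with fewer weights.

```python
# Parameters for the same 5x5 receptive field:
stacked_3x3 = 2 * 3 * 3  # two stacked 3x3 filters
single_5x5 = 5 * 5       # one 5x5 filter
print(stacked_3x3, single_5x5)  # 18 25

# Same comparison for a 7x7 receptive field via three 3x3 layers:
print(3 * 3 * 3, 7 * 7)  # 27 49
```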
How are RGB images represented in convolutional neural networks?
- RGB images are represented as three input channels: red, green, and blue.
- Each pixel’s color is determined by its RGB values.
Why can’t RGB channels be treated separately in convolutional operations?
The RGB channels are highly correlated, so they need to be processed together to capture meaningful features across all color channels.
What happens when a filter is applied to an RGB image?
A filter (e.g., 3×3×3) is applied across the spatial dimensions of the RGB image, performing a dot product between the filter weights and the overlapping region of the input to produce a single output value.
What is the result of applying a convolutional filter on an RGB image?
The resulting feature map is a 2D grid (e.g., 4×4 in size, depending on the input size and filter parameters) that summarizes the detected features.
How does the convolution operation proceed across an RGB image?
The filter slides across the image spatially, performing the dot product at each position to compute the corresponding output value in the feature map.
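A minimal NumPy sketch of this (random toy inputs): a 3×3×3 filter on a 6×6×3 image yields a single 2D feature map, because each dot product spans all three channels at once.

```python
import numpy as np

rgb = np.random.rand(6, 6, 3)     # toy 6x6 RGB image
kernel = np.random.rand(3, 3, 3)  # 3x3x3 filter spans all three channels

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # one dot product across height, width, AND all channels
        out[i, j] = np.sum(rgb[i:i + 3, j:j + 3, :] * kernel)

print(out.shape)  # (4, 4) -- a single 2D feature map
```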
What happens when multiple filters are applied in a convolutional neural network?
- Multiple filters produce a set of output feature maps.
- These output feature maps collectively form a new 3D tensor, which is the input for the next layer.
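Extending the single-filter case to a sketch with several filters (random toy data; 8 filters is an arbitrary choice) shows how the stacked maps form a new 3D tensor:

```python
import numpy as np

rgb = np.random.rand(6, 6, 3)
num_filters = 8
filters = np.random.rand(num_filters, 3, 3, 3)  # 8 filters, each 3x3x3

maps = np.zeros((4, 4, num_filters))
for k in range(num_filters):
    for i in range(4):
        for j in range(4):
            maps[i, j, k] = np.sum(rgb[i:i + 3, j:j + 3, :] * filters[k])

print(maps.shape)  # (4, 4, 8) -- a new 3D tensor, the input to the next layer
```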
What does a convolutional neural network learn during training?
The network learns the weights of the filters through backpropagation, enabling it to detect various features in the input data.
How do the image size and the number of filters change as a CNN gets deeper?
- The spatial dimensions of the image decrease over time
- The number of filters (and thus the depth of the output tensor) increases.