From Object Detection to Instance Segmentation Flashcards
What are the levels of computer vision tasks?
Low-level (e.g., edge detection), mid-level (e.g., segmentation), and high-level tasks (e.g., object detection).
What is the difference between semantic and instance segmentation?
Semantic segmentation assigns class labels to each pixel, while instance segmentation also distinguishes between different objects of the same class.
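The difference is easiest to see on a toy label map. The sketch below is illustrative only: the 4x6 "image", the two cat masks, and the helper names `to_semantic` and `to_instance_ids` are assumptions made up for this card, not part of any library.

```python
# Toy 4x6 image with two cats (class 1) on background (class 0).
CAT = 1

instance_masks = [
    # first cat, top-left corner
    [[1, 1, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]],
    # second cat, bottom-right corner
    [[0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1, 1]],
]
instance_classes = [CAT, CAT]


def to_semantic(masks, classes):
    """Collapse per-instance masks into one per-pixel class map."""
    h, w = len(masks[0]), len(masks[0][0])
    semantic = [[0] * w for _ in range(h)]
    for mask, cls in zip(masks, classes):
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    semantic[y][x] = cls
    return semantic


def to_instance_ids(masks):
    """Give each pixel the 1-based id of the instance covering it (0 = background)."""
    h, w = len(masks[0]), len(masks[0][0])
    ids = [[0] * w for _ in range(h)]
    for idx, mask in enumerate(masks, start=1):
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    ids[y][x] = idx
    return ids


semantic_map = to_semantic(instance_masks, instance_classes)
instance_map = to_instance_ids(instance_masks)
```

In `semantic_map` both cats carry the same value (the class), so they cannot be told apart; in `instance_map` each cat has its own id.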
What is the goal of R-CNN?
To detect and classify objects in an image by drawing bounding boxes and assigning labels.
What are the steps involved in R-CNN?
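Generate region proposals (Selective Search)
Warp each proposal to the fixed input size the CNN expects
Extract features from each warped proposal using a CNN
Classify each proposal using SVMs
Refine bounding boxes with a regressor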
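The final step, bounding-box refinement, is commonly expressed as four predicted offsets applied to a proposal. The sketch below assumes the usual R-CNN-family parameterization (center shifts scaled by the proposal size, width and height scaled in log space); the function name `apply_box_deltas` and the example numbers are illustrative.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal (cx, cy, w, h) with predicted offsets (tx, ty, tw, th)."""
    cx, cy, w, h = proposal
    tx, ty, tw, th = deltas
    new_cx = cx + w * tx        # shift the center by a fraction of the width
    new_cy = cy + h * ty        # shift the center by a fraction of the height
    new_w = w * math.exp(tw)    # rescale the width in log space
    new_h = h * math.exp(th)    # rescale the height in log space
    return (new_cx, new_cy, new_w, new_h)


# Zero offsets leave the proposal unchanged.
unchanged = apply_box_deltas((50, 50, 20, 10), (0, 0, 0, 0))

# Move right by half a width, up by a fifth of a height, and double the width.
refined = apply_box_deltas((50, 50, 20, 10), (0.5, -0.2, math.log(2), 0))
```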
What are the main drawbacks of R-CNN?
It is computationally expensive because each of the roughly 2000 region proposals per image needs its own CNN forward pass, and Selective Search is a fixed, non-learnable algorithm.
How does Fast R-CNN improve on R-CNN?
It runs the CNN once per image to produce a shared feature map, then uses RoI pooling to extract features for each region proposal from that map, which avoids a separate CNN pass per proposal.
What is RoI Pooling?
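A method for extracting fixed-size feature maps from regions of arbitrary size, typically by dividing each region of the shared feature map into a fixed grid and max-pooling each cell.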
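A minimal pure-Python sketch of RoI max pooling on a single-channel feature map follows. It assumes the RoI is already given in feature-map coordinates with end-exclusive corners, and it uses simple integer bin edges; the function name `roi_max_pool` is illustrative, and real implementations differ in how they round the bins.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1) into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges; every bin covers at least one row.
        ys = y0 + (i * roi_h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# 6x6 feature map whose value at (y, x) is 6 * y + x.
fmap = [[6 * y + x for x in range(6)] for y in range(6)]

wide = roi_max_pool(fmap, (0, 0, 6, 4), 2, 2)    # 6 wide, 4 tall region
small = roi_max_pool(fmap, (1, 2, 4, 5), 2, 2)   # 3 wide, 3 tall region
```

Both regions have different sizes, yet each comes out as a 2x2 grid, which is what lets the later layers expect a fixed input shape.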
What are the advantages and disadvantages of Fast R-CNN?
Pros: Faster than R-CNN by sharing computations.
Cons: Still relies on slow Selective Search for proposals.
What is the innovation in Faster R-CNN?
It introduces a Region Proposal Network (RPN) to generate region proposals directly from feature maps.
How does the RPN work?
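It slides a small network over the feature map and, at each location, places anchors of different scales and aspect ratios. For each anchor it predicts an objectness score and box offsets; during training, anchors are labeled positive or negative by their overlap (IoU) with ground-truth boxes.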
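Anchor generation and overlap-based labeling can be sketched in a few lines. This is a simplified illustration, not a full RPN: the helper names are made up, the 0.7 and 0.3 IoU thresholds are the values commonly quoted for Faster R-CNN training and are treated here as assumptions, and the extra rule that marks the best-matching anchor for each ground-truth box as positive is left out.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Anchors (x0, y0, x1, y1) centered at (cx, cy); ratio = height / width."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # keeps the area close to s * s
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Training label: 1 = object, 0 = background, -1 = ignored."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1


# Two scales times three aspect ratios gives six anchors at one location.
anchors = make_anchors(50, 50, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
labels = [label_anchor(a, [(34, 34, 66, 66)]) for a in anchors]
```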
What happens after the RPN stage?
RoI pooling extracts fixed-size features for each proposal, and the Fast R-CNN detection head classifies each proposal into an object class and refines its bounding box.
What is U-Net?
A convolutional neural network architecture for semantic segmentation that pairs a downsampling (contracting) path with an upsampling (expanding) path, linked by skip connections.
What are the benefits of U-Net?
It preserves location information through its skip connections, and it supports variable-sized inputs because it has no dense layers.
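The resolution bookkeeping behind these benefits can be traced with a deliberately stripped-down sketch. It assumes a single level and leaves out the convolutions entirely, so it is not a U-Net, only the down/up/concatenate skeleton; the function names are illustrative, and the input height and width must be even.

```python
def max_pool2(channel):
    """2x2 max pooling with stride 2 on one 2D channel (even height and width)."""
    return [[max(channel[y][x], channel[y][x + 1],
                 channel[y + 1][x], channel[y + 1][x + 1])
             for x in range(0, len(channel[0]), 2)]
            for y in range(0, len(channel), 2)]


def upsample2(channel):
    """Nearest-neighbor 2x upsampling of one 2D channel."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def tiny_u(channel):
    """One-level 'U' without convolutions: down, up, then concatenate the skip."""
    skip = channel                     # full-resolution features kept as-is
    bottleneck = max_pool2(channel)    # downsampling path: half resolution
    decoded = upsample2(bottleneck)    # upsampling path: back to input size
    return [skip, decoded]             # channel-wise concatenation


# Nothing depends on a fixed input size, so two different sizes both work.
out_a = tiny_u([[y * 6 + x for x in range(6)] for y in range(4)])   # 4x6 input
out_b = tiny_u([[1, 2], [3, 4]])                                    # 2x2 input
```

The `decoded` channel has lost fine detail to pooling, while the `skip` channel still holds the exact per-pixel values, which is the location information the upsampling path would otherwise lack.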
What is Mask R-CNN?
An extension of Faster R-CNN that adds a mask prediction branch for each RoI, enabling pixel-level segmentation of object instances.
What is the output of Mask R-CNN?
Bounding boxes, class labels, and binary masks for each object instance.
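One way to picture these three outputs together is a small record per detected instance. The `InstancePrediction` container and the `mask_to_box` helper below are hypothetical names for illustration, not a library type or API; the box here is derived from the mask only to show how the two relate, whereas the network predicts them with separate branches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstancePrediction:            # hypothetical container, not a library type
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive
    label: str                       # predicted class
    mask: List[List[int]]            # full-image binary mask, 1 = object pixel


def mask_to_box(mask):
    """Tightest end-exclusive box around the 1-pixels of a binary mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


cat_mask = [[0, 0, 0, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]

pred = InstancePrediction(box=mask_to_box(cat_mask), label="cat", mask=cat_mask)
```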