From Object Detection to Instance Segmentation Flashcards

Question 1

Q

What are the levels of computer vision tasks?

Answer

A

Low-level (e.g., edge detection), mid-level (e.g., segmentation), and high-level tasks (e.g., object detection).

Question 2

Q

What is the difference between semantic and instance segmentation?

Answer

A

Semantic segmentation assigns class labels to each pixel, while instance segmentation also distinguishes between different objects of the same class.
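The distinction can be illustrated with two toy label maps. This is a minimal sketch with made-up data: the arrays, the class id 1 ("cat"), and the instance ids are assumptions chosen for illustration only.

```python
import numpy as np

# Semantic map: every pixel carries a class id (0 = background, 1 = cat).
# Both cats share the same value, so they cannot be told apart.
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: the same foreground pixels, but each object has its own id.
instance = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Count non-background values in each map.
num_classes_present = len(np.unique(semantic)) - 1
num_instances = len(np.unique(instance)) - 1
```

Here the semantic map reports one foreground class, while the instance map separates the two objects of that class.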

Question 3

Q

What is the goal of R-CNN?

Answer

A

To detect and classify objects in an image by drawing bounding boxes and assigning labels.

Question 4

Q

What are the steps involved in R-CNN?

Answer

A

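Generate about 2,000 region proposals per image (Selective Search)

Warp each proposal to the fixed input size the CNN expects

Extract features from each warped proposal using a CNN

Classify each proposal using class-specific linear SVMs

Refine bounding boxes with bounding-box regression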
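The final refinement step can be sketched as applying predicted offsets to a proposal. This is a minimal sketch of the center/size parameterization commonly used for R-CNN box regression; the function name `apply_box_deltas` and the example numbers are assumptions for illustration, and in a real detector the deltas come from a learned regressor.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Shift and rescale a proposal box.

    proposal: (center_x, center_y, width, height)
    deltas:   (dx, dy, dw, dh) as predicted by a box regressor
    """
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    new_x = px + pw * dx          # center shift is scaled by the box width
    new_y = py + ph * dy          # center shift is scaled by the box height
    new_w = pw * math.exp(dw)     # size change is predicted in log space
    new_h = ph * math.exp(dh)
    return (new_x, new_y, new_w, new_h)

# Zero deltas leave the proposal unchanged.
unchanged = apply_box_deltas((50.0, 50.0, 20.0, 10.0), (0.0, 0.0, 0.0, 0.0))
# A positive dx moves the box right by a fraction of its width.
shifted = apply_box_deltas((50.0, 50.0, 20.0, 10.0), (0.1, -0.2, 0.0, 0.0))
```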

Question 5

Q

What are the main drawbacks of R-CNN?

Answer

A

It’s computationally expensive because it needs about 2,000 CNN forward passes per image (one per region proposal), and Selective Search is a fixed, non-learnable proposal method.

Question 6

Q

How does Fast R-CNN improve on R-CNN?

Answer

A

It runs the CNN once per image to create a shared feature map, then uses RoI pooling to extract a fixed-size feature for each region proposal from that map, avoiding a separate CNN pass per proposal.

Question 7

Q

What is RoI Pooling?

Answer

A

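A method to extract a fixed-size feature map from an arbitrarily sized region of interest: the region is projected onto the CNN feature map, divided into a fixed grid of bins, and each bin is max-pooled.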
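A minimal NumPy sketch of the idea on a single-channel feature map. The function name `roi_max_pool`, the integer bin edges, and the toy inputs are assumptions for illustration; it also assumes the region is at least as large as the output grid, and it is not any library's implementation.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a (H, W) feature map into a fixed-size grid.

    roi is (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out_h, out_w = output_size
    assert h >= out_h and w >= out_w, "region must cover the output grid"

    # Integer bin edges that split the region into out_h x out_w bins.
    ys = [(i * h) // out_h for i in range(out_h + 1)]
    xs = [(j * w) // out_w for j in range(out_w + 1)]

    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Keep only the strongest activation in each bin.
            pooled[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled

fm = np.arange(16).reshape(4, 4)
full = roi_max_pool(fm, (0, 0, 4, 4))      # 4x4 region -> 2x2
partial = roi_max_pool(fm, (1, 0, 4, 3))   # 3x3 region -> 2x2
```

Both regions differ in size, yet each yields a 2x2 output, which is what lets the later layers accept any proposal.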

Question 8

Q

What are the advantages and disadvantages of Fast R-CNN?

Answer

A

Pros: Faster than R-CNN by sharing computations.
Cons: Still relies on slow Selective Search for proposals.

Question 9

Q

What is the innovation in Faster R-CNN?

Answer

A

It introduces a Region Proposal Network (RPN) to generate region proposals directly from feature maps.

Question 10

Q

How does the RPN work?

Answer

A

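It slides a small network over the feature map. At each location it considers anchor boxes of different scales and aspect ratios and predicts an objectness score and box offsets for each. During training, anchors are labeled positive or negative based on their overlap (IoU) with ground-truth boxes.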
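A minimal sketch of anchor generation and IoU-based labeling at one location. The function names are hypothetical; the scales, aspect ratios, and the 0.7 / 0.3 thresholds are assumed here as commonly cited Faster R-CNN defaults. The extra rule that marks the best-matching anchor for each ground-truth box as positive is omitted.

```python
def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchors centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area near s*s while setting height / width = r.
            w = s / r ** 0.5
            h = s * r ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """1 = object, 0 = background, -1 = ignored during training."""
    best = max(iou(anchor, g) for g in gt_boxes)
    if best >= pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1

anchors = make_anchors(0.0, 0.0)
gt = (-64.0, -64.0, 64.0, 64.0)   # made-up ground-truth box
labels = [label_anchor(a, [gt]) for a in anchors]
```

With three scales and three ratios, each location contributes nine anchors; only the one that closely matches the ground-truth box is labeled positive in this toy case.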

Question 11

Q

What happens after the RPN stage?

Answer

A

RoI pooling extracts features for each proposal, and the Fast R-CNN detection head classifies them into object classes and refines their bounding boxes.

Question 12

Q

What is U-Net?

Answer

A

A convolutional neural network architecture for semantic segmentation that combines downsampling and upsampling paths.

Question 13

Q

What are the benefits of U-Net?

Answer

A

It preserves location information through skip connections from the downsampling path to the upsampling path, and it supports variable-sized inputs because it has no dense (fully connected) layers.
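Both benefits can be seen in a shape-only sketch. This is a toy under stated assumptions: the helper names are hypothetical, max pooling and nearest-neighbor upsampling stand in for U-Net's learned convolutions, and inputs are assumed to have even height and width.

```python
import numpy as np

def down(x):
    """2x2 max pooling on a (C, H, W) array; halves H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def up(x):
    """Nearest-neighbor upsampling; doubles H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet_shapes(x):
    skip = x                      # full-resolution features kept for later
    bottleneck = down(x)          # coarse features, location detail lost
    upsampled = up(bottleneck)    # back to the input resolution
    # Skip connection: concatenate along the channel axis so the output
    # sees both coarse context and fine spatial detail.
    return np.concatenate([skip, upsampled], axis=0)

# No layer depends on a fixed H or W, so different input sizes both work.
out_small = tiny_unet_shapes(np.zeros((3, 8, 8)))
out_large = tiny_unet_shapes(np.zeros((3, 16, 12)))
```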

Question 14

Q

What is Mask R-CNN?

Answer

A

An extension of Faster R-CNN that adds a mask prediction branch for each RoI, enabling pixel-level segmentation of object instances.

Question 15

Q

What is the output of Mask R-CNN?

Answer

A

Bounding boxes, class labels, and binary masks for each object instance.
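As a rough picture of what one detection might look like, here is a sketch with made-up values. The dictionary keys, the label, and the 0.5 threshold used to binarize the per-pixel mask probabilities are assumptions for illustration, not a specific library's output format.

```python
import numpy as np

# Hypothetical per-pixel mask probabilities for one detected instance.
soft_mask = np.array([
    [0.1, 0.8],
    [0.6, 0.2],
])

detection = {
    "box": (10, 20, 50, 80),      # x0, y0, x1, y1 in pixels
    "label": "cat",               # predicted class
    "mask": soft_mask >= 0.5,     # binary mask for this instance only
}
```

A full result would be a list of such entries, one per detected object instance.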