Detection Flashcards
Name two region proposal approaches used in object detection and explain both.
Selective Search: Generates candidate object regions by over-segmenting the image and iteratively merging neighboring segments based on similarity in color, texture, size, and fill. This produces a comparatively small set of region proposals (about 2,000 per image in R-CNN).
Region Proposal Network (RPN): A learned, convolutional method introduced with Faster R-CNN that proposes object regions and predicts objectness scores from shared feature maps. It is integrated into the object detection pipeline, so proposals and detections can be trained end-to-end.
Region Proposal Networks (RPN) are an essential part of detection algorithms such as Faster R-CNN and Mask R-CNN. Explain the RPN architecture and outline how it is trained.
The RPN is a fully convolutional network. Its input is the convolutional feature map produced by a larger backbone network. A 3x3 convolutional layer slides over this feature map, and its output feeds two sibling 1x1 convolutional layers: a classification layer that predicts the objectness of each anchor at that position (the probability that it contains an object), and a regression layer that predicts the bounding box coordinates as offsets relative to each anchor.
The RPN is trained end-to-end with a multi-task loss that combines a classification loss (objectness) and a regression loss (box offsets). Anchors are labeled as positive or negative according to their IoU with the ground-truth boxes.
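To make the training targets concrete, here is a minimal sketch of how anchors could be labeled before computing the classification loss. The thresholds 0.7 and 0.3 are the ones reported for Faster R-CNN; the function names `iou` and `label_anchors` are hypothetical, and the sketch leaves out the additional rule that the best-matching anchor for each ground-truth box is also treated as positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = object, 0 = background, -1 = ignored."""
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground-truth box.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous overlap: excluded from the loss
    return labels


# Illustrative boxes only.
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]
print(label_anchors(anchors, gt_boxes))
```

Only anchors labeled 1 contribute to the regression loss, while anchors labeled 1 or 0 contribute to the classification loss.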
Apart from supervised proposal mechanisms like RPN, there also exist unsupervised methods. Name two unsupervised region proposal mechanisms and briefly explain the idea behind them.
Sliding Window: a small window is moved across the image and each position is evaluated for objectness.
Selective Search: Segments the image into multiple regions and combines them based on their similarity to generate object proposals.
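The sliding window idea can be sketched in a few lines. This is only an illustration with a single window size and a hypothetical helper name `sliding_windows`. A practical detector would repeat it over several scales and aspect ratios and score each window for objectness.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x1, y1, x2, y2) boxes for a window moved across the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Illustrative sizes only: an 8x6 image, a 4x4 window, stride 2.
boxes = list(sliding_windows(img_w=8, img_h=6, win_w=4, win_h=4, stride=2))
for box in boxes:
    print(box)
```

The number of windows grows quickly with image size and the number of scales, which is one reason methods that produce fewer candidates, such as Selective Search, were preferred.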
Many recent detection algorithms (YOLOv3, SSD, RetinaNet) do not utilize region proposals. What are these methods called, and what basic idea is used to make bounding box predictions?
Single-shot (one-stage) detection. The basic idea is to use convolutional layers to predict object classes and bounding box offsets for the anchors simultaneously, in a single pass and without a separate region proposal step.
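As an illustration of what "bounding box offsets for the anchors" means, the sketch below decodes predicted offsets into a box using the center-size parameterization common in the R-CNN family. Individual single-shot detectors differ in the details (SSD scales the offsets, and YOLOv3 predicts the center differently), so treat this as one assumed scheme; `decode_box` is a hypothetical name.

```python
import math


def decode_box(anchor, offsets):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor (cx, cy, w, h)."""
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = offsets
    cx = cx_a + tx * w_a      # shift the center by a fraction of the anchor width
    cy = cy_a + ty * h_a      # shift the center by a fraction of the anchor height
    w = w_a * math.exp(tw)    # scale the width (log-space offset)
    h = h_a * math.exp(th)    # scale the height (log-space offset)
    return (cx, cy, w, h)


# Zero offsets return the anchor itself.
print(decode_box((10, 10, 4, 2), (0, 0, 0, 0)))
```

Predicting offsets relative to an anchor, rather than absolute coordinates, keeps the regression targets small and similar in scale across anchors of different sizes.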
The region proposal network is used to extract object proposals in the Faster R-CNN architecture. Given that the number of predefined anchors is k and the input feature map size is CxWxH (C: channels, W: width, H: height), how many proposals can be obtained?
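Number of proposals = k * W * H (one per anchor at each spatial position, independent of C and before any filtering such as non-maximum suppression).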
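A quick way to check the count is to enumerate one anchor per (position, anchor index) pair. The values k = 9 and a 60 x 40 feature map are illustrative assumptions, and `enumerate_anchors` is a hypothetical helper.

```python
from itertools import product


def enumerate_anchors(k, width, height):
    """List every (x, y, anchor_index) triple on a width x height feature map."""
    return list(product(range(width), range(height), range(k)))


anchors = enumerate_anchors(k=9, width=60, height=40)
print(len(anchors))  # equals k * W * H = 9 * 60 * 40 = 21600
```

The channel dimension C does not appear in the count because it only determines the size of the feature vector at each position, not the number of positions.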
Explain the improvements of Faster R-CNN with regard to R-CNN and Fast R-CNN.
Faster R-CNN replaces Selective Search with an RPN that shares convolutional features with the detection network. This enables end-to-end learning and makes proposal generation much faster and more efficient.
Name 2 differences in the object detection pipeline of R-CNN and SSD.
Region Proposal:
R-CNN uses Selective Search for region proposals. SSD has no proposal step; it places fixed default (anchor) boxes of various aspect ratios and scales at every location of several feature maps.
Training Approach:
R-CNN trains its stages separately (CNN fine-tuning, SVM classifiers, and bounding box regressors); the Selective Search proposals themselves are not learned.
SSD trains the entire network as a single-shot detector, optimizing for both object class scores and bounding box regressions simultaneously.
Write the formulas for precision, recall, and F1, and state their application.
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2PR / (P + R)
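These metrics are used to evaluate classification tasks in computer vision, especially binary classification. In object detection, a prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a chosen threshold.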
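The three formulas translate directly into code. In this sketch, returning 0.0 when a denominator is zero is an assumed convention, and `precision_recall_f1` is a hypothetical name.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # assumed convention when both precision and recall are zero
    return precision, recall, f1


# Illustrative counts: 8 correct detections, 2 false alarms, 8 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)
```

With these counts precision is high (few false alarms) while recall is lower (many missed objects), and F1 sits between the two, closer to the lower value.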