Object detection Flashcards
What's the problem with moving a sliding window over an entire image and using a classification network for object detection?
It is too computationally expensive to run a classifier at every pixel position (and scale…)
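To get a feel for the cost, the sketch below counts how many classifier evaluations a naive sliding window would need. The image size, window sizes and stride are hypothetical values chosen only for illustration.

```python
def count_windows(image_w, image_h, window, stride=1):
    """Number of positions a square window can take inside the image."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down


# Hypothetical setup: a 640x480 image, three window sizes, stride of 1 pixel.
total = sum(count_windows(640, 480, w) for w in (64, 128, 256))
print(total)  # classifier forward passes needed for a single image
```

Every counted position is one full forward pass of the classifier, which is why region proposals are used to cut the number of candidates.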
What is selective search?
We over-segment an image, and then combine regions with similar features to create region proposals.
What is the R-CNN method?
Uses selective search for region proposals, an SVM for classification and linear regression for localization. Both the SVM and the regressor operate on CNN features.
What is the Fast R-CNN method?
Uses selective search for region proposals and a single CNN for both classification and localization.
What is a RoI pooling layer?
Converts the part of the convolutional feature map under a region proposal into a fixed-size output. It is needed because region proposals can be of arbitrary size.
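A minimal sketch of the idea in plain Python, assuming a single-channel feature map stored as a list of lists and a region given as end-exclusive integer coordinates. The function name and the floor/ceil binning are illustrative choices, not a reference implementation.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1) into an out_h x out_w grid.

    Assumes the region lies inside the feature map and is at least 1x1.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceil for the end,
        # so every bin covers at least one row.
        r0 = y0 + (i * h) // out_h
        r1 = max(y0 + ((i + 1) * h + out_h - 1) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = max(x0 + ((j + 1) * w + out_w - 1) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # a 4x4 region
print(roi_max_pool(fmap, (1, 0, 4, 3)))  # a 3x3 region, same output shape
```

Both calls return a 2x2 grid even though the regions differ in size, which is what lets fixed-size fully connected layers follow.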
What loss does the Fast R-CNN method use?
It uses cross-entropy for classification and a smooth L1 loss for localization.
How does Fast R-CNN predict location?
The location is predicted as offsets from the original region proposal.
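As a rough illustration, the sketch below applies predicted offsets to a proposal. It assumes the parameterization commonly used in the R-CNN family (centre shifts scaled by the proposal size, log-space width and height changes); the card itself does not specify this, and the function name is hypothetical.

```python
import math


def decode_box(proposal, offsets):
    """Apply offsets (tx, ty, tw, th) to a proposal given as (cx, cy, w, h)."""
    cx, cy, w, h = proposal
    tx, ty, tw, th = offsets
    return (cx + tx * w,        # shift centre by a fraction of the width
            cy + ty * h,        # shift centre by a fraction of the height
            w * math.exp(tw),   # rescale width
            h * math.exp(th))   # rescale height


# Zero offsets leave the proposal unchanged.
print(decode_box((50, 50, 20, 10), (0, 0, 0, 0)))
```

Predicting offsets rather than absolute coordinates keeps the regression targets small and independent of where the proposal sits in the image.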
What is the smooth L1 loss?
It is quadratic (0.5x^2) for |x| < 1 and linear (|x| - 0.5) otherwise.
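The definition can be written down directly. This is a scalar sketch with the switch point fixed at 1; in practice the loss would be summed over the four box coordinates.

```python
def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1; the two pieces meet at 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


for x in (0.0, 0.5, 1.0, 2.0, -3.0):
    print(x, smooth_l1(x))
```

The linear tail makes the loss less sensitive to outliers than L2, while the quadratic part keeps the gradient smooth around zero.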
How are the training batches for Fast R-CNN defined?
2 images, 64 region proposals per image, 25% foreground, 75% background.
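A small sketch of how one image's share of the batch could be drawn. It assumes the proposals have already been split into foreground and background lists; how that split is made is not covered by the card. The function name and fallback behaviour are illustrative.

```python
import random


def sample_rois(foreground, background, rois_per_image=64, fg_fraction=0.25,
                rng=random):
    """Pick up to 25% foreground proposals and fill the rest with background."""
    n_fg = min(int(rois_per_image * fg_fraction), len(foreground))
    n_bg = min(rois_per_image - n_fg, len(background))
    return rng.sample(foreground, n_fg), rng.sample(background, n_bg)


# Hypothetical proposal ids for one image.
fg, bg = sample_rois(list(range(100)), list(range(100, 1000)))
print(len(fg), len(bg))
```

If an image has fewer foreground proposals than the quota, this sketch simply fills the gap with extra background proposals.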
What is the main difference between Fast R-CNN and Faster R-CNN?
Faster R-CNN replaces selective search with a proposal CNN. The proposal and detection networks share feature maps.
How does the Region Proposal Network work in Faster R-CNN?
It slides a 3x3 window over the convolutional feature map and, at each position, proposes several bounding boxes relative to anchors with different scales and aspect ratios.
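The anchors at one sliding-window position can be sketched as follows. The three scales and three aspect ratios used as defaults are an assumption here (the card gives no numbers), and the function name is hypothetical.

```python
def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchors centred on (cx, cy).

    Each anchor has area scale**2 and ratio = height / width.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / r ** 0.5
            h = s * r ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(300, 200)
print(len(anchors))  # one anchor per (scale, ratio) pair
```

For each anchor the network would then output an objectness score and four box offsets, so the anchors act as reference boxes rather than final proposals.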
What is Mask R-CNN?
It extends Faster R-CNN with a segmentation step that predicts a mask for each detected object.