Lecture 12 - Detection Flashcards

1
Q

What is object detection?

A

Object detection is the process of identifying and locating objects within an image, typically providing a bounding box and a class label for each detected object.

2
Q

Explain the sliding window approach in object detection.

A

The sliding window approach involves scanning the image with a fixed-size window at different scales and positions, applying a classifier to each window to detect objects.
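
As a minimal sketch of the scanning step at a single scale, the generator below enumerates window coordinates over an image of a given size. The function name, the square window, and the `stride` parameter are illustrative assumptions, not something the lecture prescribes; a classifier would then be applied to each yielded window.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive

def sliding_windows(width: int, height: int, win: int, stride: int) -> Iterator[Box]:
    """Yield every square window of side `win` that fits inside the image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, x + win, y + win)

# Hypothetical 8x6 image scanned with a 4x4 window and a stride of 2.
windows = list(sliding_windows(8, 6, win=4, stride=2))
```

A smaller stride gives denser coverage at the cost of more classifier evaluations.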

3
Q

What is the Viola-Jones face detector?

A

The Viola-Jones face detector is a real-time object detection framework that uses integral images for fast evaluation of Haar-like features, boosting (AdaBoost) for feature selection, and a cascade of classifiers that rejects most non-face windows early, keeping detection fast while maintaining a high detection rate.

4
Q

Describe the generalized Hough transform.

A

The generalized Hough transform is a method for detecting arbitrary shapes by mapping edge points in the image to a parameter space and identifying peaks in the accumulator array that correspond to shape instances.

5
Q

What are region-based methods in deep learning for object detection?

A

Region-based methods, such as R-CNN, Fast R-CNN, and Faster R-CNN, involve generating region proposals, extracting features from these regions, and classifying them to detect objects.

6
Q

Explain the YOLO (You Only Look Once) detection method.

A

YOLO is a real-time object detection method that divides the input image into a grid, with each grid cell predicting bounding boxes, confidence scores, and class probabilities for objects whose centers fall within the cell.
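
To make the "center falls within the cell" rule concrete, here is a small sketch that finds the grid cell responsible for a ground-truth box. The function name and the grid size `S=7` (the value used in the original YOLO paper) are assumptions for illustration; the card itself does not fix them.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return (row, col, x_off, y_off) for the cell containing the box center.

    box is (x1, y1, x2, y2) in pixels; offsets are the center position
    relative to the cell's top-left corner, in units of one cell.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    gx, gy = cx / img_w * S, cy / img_h * S   # center in grid units
    col = min(int(gx), S - 1)                 # clamp centers on the right/bottom edge
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row

# Hypothetical box in a 448x448 image: center (200, 300), cells are 64 px wide.
row, col, x_off, y_off = responsible_cell((100, 200, 300, 400), 448, 448)
```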

7
Q

What is a Feature Pyramid Network (FPN)?

A

An FPN is a deep learning architecture that builds a feature pyramid from a single input image by combining features from different layers of a convolutional network through a top-down pathway with lateral connections, enabling object detection at multiple scales.

8
Q

Describe instance segmentation.

A

Instance segmentation is a task that combines object detection and semantic segmentation, providing a pixel-wise mask for each detected object, distinguishing between different instances of the same class.

9
Q

What is pose estimation?

A

Pose estimation involves detecting and predicting the spatial configuration of an object’s key points, such as joints in a human body, often used for applications like action recognition and augmented reality.

10
Q

Explain 3D object detection.

A

3D object detection involves identifying and localizing objects in three-dimensional space, often providing 3D bounding boxes or poses, and is used in applications like autonomous driving and robotics.

11
Q

What is the purpose of non-maximum suppression in object detection?

A

Non-maximum suppression removes redundant bounding boxes for the same object: it selects the box with the highest confidence score, discards the other boxes whose overlap with it, measured by intersection over union (IoU), exceeds a threshold, and repeats on the boxes that remain.
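
The procedure can be sketched as a greedy loop over boxes sorted by score. This is a minimal, unoptimized illustration; the function names, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions rather than values taken from the lecture.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one separate box (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In a multi-class detector this loop is typically run separately for each class.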

12
Q

Describe the concept of anchor boxes in object detection.

A

Anchor boxes are predefined bounding boxes of different scales and aspect ratios used in region proposal networks (RPN) to detect objects at multiple scales and locations within an image.
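
As an illustration, the sketch below generates the anchors centered at one feature-map location. The function name is hypothetical, and the three scales and three aspect ratios are an assumed configuration in the spirit of the Faster R-CNN paper; each anchor keeps an area of roughly `scale**2` while its width-to-height ratio varies.

```python
import math

def make_anchors(cx, cy, scales, ratios):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    ratios are width / height; every anchor has area scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# Assumed configuration: 3 scales x 3 aspect ratios = 9 anchors per location.
anchors = make_anchors(0, 0, scales=[128, 256, 512], ratios=[0.5, 1, 2])
```

The network then predicts, for each anchor, an objectness score and offsets that refine the anchor into a proposal.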

13
Q

Write the formula for the confidence score in YOLO.

A
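
The answer to this card did not survive in this copy. As a hedged reconstruction, the definition given in the original YOLO paper is the probability that a box contains an object multiplied by the IoU between the predicted box and the ground truth; the lecture's notation may differ.

```latex
\text{Confidence} = \Pr(\text{Object}) \times \mathrm{IoU}^{\text{truth}}_{\text{pred}}
```

At test time this is multiplied by the conditional class probability to give a class-specific score:

```latex
\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IoU}^{\text{truth}}_{\text{pred}}
  = \Pr(\text{Class}_i) \times \mathrm{IoU}^{\text{truth}}_{\text{pred}}
```
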
14
Q

Provide the formula for the weighted error in AdaBoost.

A
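
The answer to this card did not survive in this copy. A standard form, given here as a reconstruction under assumed notation (sample weights $w_i^{(t)}$ at round $t$, weak classifier $h_t$, labels $y_i$, $N$ training samples), is the weight-normalized fraction of misclassified samples:

```latex
\epsilon_t = \frac{\sum_{i=1}^{N} w_i^{(t)} \, \mathbb{1}\!\left[ h_t(x_i) \neq y_i \right]}{\sum_{i=1}^{N} w_i^{(t)}}
```

If the weights are already normalized to sum to one, the denominator is 1 and can be dropped.
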
15
Q

What is the formula for updating weights in AdaBoost?

A
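
The answer to this card did not survive in this copy. A standard form, given here as a reconstruction under assumed notation (labels and weak-classifier outputs $y_i, h_t(x_i) \in \{-1, +1\}$, weighted error $\epsilon_t$ of $h_t$, normalizer $Z_t$), is:

```latex
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
\qquad
w_i^{(t+1)} = \frac{w_i^{(t)} \exp\!\left( -\alpha_t \, y_i \, h_t(x_i) \right)}{Z_t},
\qquad
Z_t = \sum_{j=1}^{N} w_j^{(t)} \exp\!\left( -\alpha_t \, y_j \, h_t(x_j) \right)
```

Misclassified samples ($y_i h_t(x_i) = -1$) gain weight and correctly classified ones lose weight. The Viola-Jones paper writes an equivalent update in terms of $\beta_t = \epsilon_t / (1 - \epsilon_t)$, so the lecture's notation may differ.
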
16
Q

How does the sliding window approach handle different object scales?

A

The sliding window approach handles different object scales by either resizing the input image repeatedly (an image pyramid) and scanning each level with a fixed-size window, or resizing the window itself, so that objects of various sizes can be detected.
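
A minimal sketch of the image-pyramid variant: the function below lists the image sizes to scan, shrinking by a constant factor until the image is smaller than the detection window. The function name, the scale factor, and the minimum size are illustrative assumptions.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=24):
    """Return the (width, height) of each pyramid level, largest first.

    Levels stop once either side drops below min_size, the side of the
    fixed detection window.
    """
    sizes = []
    w, h = width, height
    while w >= min_size and h >= min_size:
        sizes.append((w, h))
        w, h = int(w / scale), int(h / scale)
    return sizes

# Hypothetical 96x72 image, halved at each level, scanned with a 24x24 window.
levels = pyramid_sizes(96, 72, scale=2.0, min_size=24)
```

A 24x24 window on a level shrunk by a factor of 2 covers a 48x48 region of the original image, which is how the fixed window reaches larger objects.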

17
Q

What are integral images and how are they used in the Viola-Jones detector?

A

Integral images are data structures that allow for rapid computation of the sum of pixel values in rectangular regions. They are used in the Viola-Jones detector to quickly compute Haar-like features for object detection.
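
The idea can be sketched in a few lines of plain Python. The integral image is padded with a leading row and column of zeros so that any rectangle sum needs only four lookups; the function names and the two-rectangle feature layout are assumptions for illustration.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img[0:r][0:c]; ii has one extra row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_haar(ii, top, left, h, w):
    """Left half minus right half of an h x w region (w assumed even)."""
    half = w // 2
    left_sum = rect_sum(ii, top, left, top + h, left + half)
    right_sum = rect_sum(ii, top, left + half, top + h, left + 2 * half)
    return left_sum - right_sum

img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
ii = integral_image(img)
```

Because every rectangle sum costs the same four lookups, the cost of a Haar-like feature does not depend on its size.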

18
Q

Explain the role of boosting in the Viola-Jones face detector.

A

Boosting in the Viola-Jones face detector is used to select and combine weak classifiers (features) into a strong classifier, improving detection accuracy by focusing on the most informative features.
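
One round of the reweighting that drives this selection can be sketched as follows, using the common AdaBoost formulation with labels in {-1, +1}. The function name and the toy data are assumptions, and the Viola-Jones paper states the update in an equivalent but differently parameterized form.

```python
import math

def adaboost_round(weights, predictions, labels):
    """Return (error, alpha, new_weights) for one weak classifier.

    predictions and labels take values in {-1, +1}; the weighted error
    is assumed to lie strictly between 0 and 1.
    """
    total = sum(weights)
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y) / total
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (y * p == -1) are scaled up, correct ones down.
    raw = [w * math.exp(-alpha * y * p)
           for w, p, y in zip(weights, predictions, labels)]
    z = sum(raw)
    return err, alpha, [w / z for w in raw]

# Four equally weighted samples; the weak classifier gets the last one wrong.
err, alpha, new_w = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

After the update the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.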

19
Q

How does the R-CNN method generate region proposals?

A

R-CNN generates region proposals using selective search, which starts from an over-segmentation of the image and hierarchically merges similar regions, combining ideas from segmentation and exhaustive search to propose roughly 2,000 candidate bounding boxes per image.

20
Q

What is the main advantage of Faster R-CNN over R-CNN and Fast R-CNN?

A

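Faster R-CNN integrates region proposal generation into the network using a Region Proposal Network (RPN) that shares convolutional features with the detector, significantly reducing computation time compared with an external proposal method such as selective search and enabling end-to-end training.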

21
Q

Describe the YOLO detection pipeline.

A

The YOLO detection pipeline divides the input image into a grid, with each grid cell predicting bounding boxes, confidence scores, and class probabilities. Non-maximum suppression is then applied to remove redundant boxes.

22
Q

What is the role of the feature pyramid network (FPN) in object detection?

A

The FPN creates a pyramid of feature maps at different scales from a single input image, allowing the network to detect objects at multiple scales and improving detection accuracy for small objects.

23
Q

How does Mask R-CNN extend Faster R-CNN for instance segmentation?

A

Mask R-CNN extends Faster R-CNN by adding a branch that outputs a binary mask for each detected object, allowing for pixel-level segmentation in addition to bounding box detection.

24
Q

What are the main challenges in 3D object detection?

A

Main challenges in 3D object detection include handling occlusions, varying lighting conditions, and accurately estimating the object’s position, orientation, and scale in three-dimensional space.

25
Q

How does the generalized Hough transform handle arbitrary shapes?

A

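The generalized Hough transform describes the shape with a lookup table (the R-table) that stores, for each edge gradient orientation, the displacement vectors from boundary points to a reference point. Each edge point in the image then votes for possible reference-point locations in a parameter space, and peaks in the accumulator array indicate instances of the shape. Because the table is built from the shape itself rather than from an analytic equation, arbitrary shapes can be detected.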
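
A translation-only sketch of the voting scheme: a lookup table (often called an R-table) stores, per quantized edge orientation, the offsets from template edge points to a reference point, and image edge points then vote for candidate reference locations. The function names and the `(x, y, orientation_bin)` edge format are assumptions; rotation and scale, which would add dimensions to the accumulator, are left out.

```python
from collections import Counter, defaultdict

def build_r_table(template_edges, reference):
    """Map each orientation bin to offsets from edge point to reference point."""
    table = defaultdict(list)
    rx, ry = reference
    for x, y, o in template_edges:
        table[o].append((rx - x, ry - y))
    return table

def vote(image_edges, table):
    """Accumulate votes for reference-point locations."""
    acc = Counter()
    for x, y, o in image_edges:
        for dx, dy in table.get(o, []):
            acc[(x + dx, y + dy)] += 1
    return acc

# Hypothetical template: four edge points around reference point (1, 1).
template = [(0, 0, 0), (2, 0, 1), (2, 2, 2), (0, 2, 3)]
table = build_r_table(template, reference=(1, 1))

# The same shape shifted by (5, 3) in the image.
image_edges = [(x + 5, y + 3, o) for x, y, o in template]
peak, votes = vote(image_edges, table).most_common(1)[0]
```

The accumulator peak lands at the shifted reference point, which is how the method localizes the shape.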

26
Q

Explain the concept of region proposal networks (RPN) in Faster R-CNN.

A

RPNs generate region proposals by sliding a small network over the feature map, predicting bounding boxes and objectness scores at each location, integrating region proposal generation into the detection network.

27
Q

What is the purpose of non-maximum suppression in YOLO?

A

Non-maximum suppression in YOLO is used to eliminate redundant bounding boxes by selecting the box with the highest confidence score and discarding others with a high overlap (IoU), ensuring a single detection per object.