Robot Learning Flashcards

Question 1

Q

What is epipolar geometry?

Answer

A

Epipolar geometry is the intrinsic projective geometry between two views. It is
independent of scene structure, and only depends on the cameras’ internal parameters
and relative pose.

Question 2

Q

What is the fundamental matrix in relation to the geometry of two views?

Answer

A

The fundamental matrix F encapsulates the intrinsic geometry of the two views. It is a 3×3 matrix of rank 2. If a point X in 3-space is imaged as x in the first view and as x′ in the second, then the image points satisfy the relation x′^T F x = 0.
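
The relation x′^T F x = 0 can be checked numerically. The sketch below is only an illustration: the intrinsics K, rotation R, translation t, and the 3D point X are made-up values, and it assumes both cameras share K, so that P = K[I | 0], P′ = K[R | t], and F = K^-T [t]_x R K^-1.

```python
import numpy as np

def skew(v):
    """Return the skew-symmetric matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical intrinsics and relative pose, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.1])

# Camera matrices P = K[I | 0] and P' = K[R | t].
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])

# Fundamental matrix for this camera pair.
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

# Project one (made-up) homogeneous space point into both views.
X = np.array([0.3, -0.2, 4.0, 1.0])
x1 = P1 @ X
x2 = P2 @ X

residual = x2 @ F @ x1                      # should vanish up to rounding
rank_F = np.linalg.matrix_rank(F, tol=1e-10)  # explicit tolerance for the zero singular value
```

Here `residual` is zero up to floating-point error and `rank_F` is 2, matching the two properties stated on the card.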

Question 3

Q

What is the relation between the corresponding image points x and x′?

Answer

A

The image points x and x′, the space point X, and the camera centres are coplanar. Denote this plane as π. The rays back-projected from x and x′ intersect at X, and both rays lie in π.

Question 4

Q

How is the corresponding point x′ constrained?

Answer

A

The plane π is determined by the baseline and the ray defined by x. The ray corresponding to the (unknown) point x′ lies in π, hence x′ lies on the line of intersection l′ of π with the second image plane.

Question 5

Q

What is the epipolar line corresponding to x?

Answer

A

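The epipolar line corresponding to x is the line l′ in which the plane π (the plane through the baseline and the ray defined by x) intersects the second image plane. It is the image in the second view of the ray back-projected from x, and the corresponding point x′ must lie on it.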

Question 6

Q

What is the benefit of utilizing epipolar geometry for a stereo correspondence algorithm?

Answer

A

The search for the point corresponding to x is restricted to the epipolar line l′ instead of covering the entire image plane.
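
A minimal sketch of this restriction, using the standard relation l′ = F x. The matrix F below is an assumption: it is the fundamental matrix of an idealized rectified pair (identity intrinsics, pure translation along the x axis), for which every epipolar line is a horizontal scanline. The helper names are hypothetical.

```python
import numpy as np

# Assumed F for an idealized rectified pair: F = [t]_x with t = (1, 0, 0).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def epipolar_line(F, x):
    """Return l' = F x as (a, b, c) with a*u + b*v + c = 0, scaled so a^2 + b^2 = 1."""
    l = F @ np.append(x, 1.0)
    return l / np.hypot(l[0], l[1])

def distance_to_line(l, x):
    """Perpendicular distance from the image point x = (u, v) to the normalized line l."""
    return abs(l @ np.append(x, 1.0))

x = np.array([120.0, 45.0])                 # point in the first image
l_prime = epipolar_line(F, x)               # its epipolar line in the second image

# Candidate matches: one on the same scanline, one five pixels below it.
on_line = distance_to_line(l_prime, np.array([97.0, 45.0]))
off_line = distance_to_line(l_prime, np.array([97.0, 50.0]))
```

A matcher only needs to score candidates whose distance to `l_prime` is close to zero, which turns a 2D search into a 1D search along the line.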

Question 7

Q

What is the epipole?

Answer

A

The point of intersection of the line joining the camera centres (the baseline) with the image plane. Equivalently, the epipole is the image in one view of the camera centre of the other view. It is also the vanishing point of the baseline (translation) direction.
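
Algebraically, the epipoles are the null vectors of the fundamental matrix: F e = 0 and F^T e′ = 0. The sketch below recovers them with an SVD. It assumes a made-up pure translation t with identity intrinsics and no rotation, so that F = [t]_x and both epipoles coincide with the translation direction; `null_vector` is a hypothetical helper.

```python
import numpy as np

def null_vector(M):
    """Return the unit vector v minimizing |M v| (right singular vector of the smallest singular value)."""
    _, _, vt = np.linalg.svd(M)
    return vt[-1]

# Assumed translation between the two camera centres (illustrative values).
t = np.array([0.2, 0.1, 1.0])

# With identity intrinsics and no rotation, F = [t]_x.
F = np.array([[0.0, -t[2], t[1]],
              [t[2], 0.0, -t[0]],
              [-t[1], t[0], 0.0]])

e = null_vector(F)           # epipole in the first image:  F e = 0
e_prime = null_vector(F.T)   # epipole in the second image: F^T e' = 0

# Normalize the homogeneous vectors so the last coordinate is 1.
e = e / e[2]
e_prime = e_prime / e_prime[2]
```

Both epipoles come out as (0.2, 0.1, 1), the image of the translation direction, which is consistent with the vanishing-point description on the card.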

Question 8

Q

What is the epipolar plane?

Answer

A

A plane containing the baseline (the line joining the camera centres). There is a one-parameter family (a pencil) of epipolar planes.

Question 9

Q

What is the epipolar line?

Answer

A

The intersection of an epipolar plane with the image plane. All epipolar lines in an image pass through the epipole.

Question 10

Q

What are deformable parts models (DPMs)?

Answer

A

Object detection systems that use a sliding-window approach, in which a classifier is run at evenly spaced locations over the entire image.
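
The sliding-window scan can be sketched as a generator of evenly spaced window positions. The image size, window size, and stride below are arbitrary illustration values, and the classifier itself is left out.

```python
def sliding_windows(height, width, window, stride):
    """Yield (top, left, bottom, right) for every window that fits fully inside the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (top, left, top + window, left + window)

# Illustrative values: a 100x200 image scanned with 50x50 windows every 25 pixels.
windows = list(sliding_windows(height=100, width=200, window=50, stride=25))
num_windows = len(windows)  # one classifier evaluation per window
```

With these values the classifier would run 21 times (3 rows of 7 windows), and the count grows quickly with smaller strides or multiple window scales.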

Question 11

Q

What are region proposal methods?

Answer

A

Methods that first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. R-CNN is an example.

Question 12

Q

What’s the key weakness of region proposal methods?

Answer

A

The pipelines are slow and hard to optimize because each component must be trained separately.

Question 13

Q

What architecture does the YOLO paper (“You Only Look Once” by Redmon, Divvala, Girshick, and Farhadi) present?

Answer

A

It proposes a unified architecture in which a single CNN predicts both the bounding boxes and the associated class probabilities. YOLO trains on full images and directly optimizes detection performance.

Question 14

Q

What are the main strengths proposed by the YOLO paper?

Answer

A

The architecture is fast, learns generalizable representations of objects, and reasons globally about the whole image, unlike region proposal methods and DPMs.

Question 15

Q

What are the main weaknesses with the YOLO architecture?

Answer

A

YOLO lags behind state-of-the-art detection systems in accuracy and struggles to localize some objects precisely, especially small ones.

Question 16

Q

Describe YOLO’s architectural design

Answer

A

YOLO divides the image into an S×S grid and predicts B bounding boxes per grid cell, together with a confidence score for each box. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each bounding box consists of 5 predictions: x, y, w, h, and confidence. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).
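
A small sketch of the resulting output size and of the cell-assignment rule. The values S = 7, B = 2, C = 20 are the ones the YOLO paper uses for PASCAL VOC; the function names and the normalized-coordinate convention are assumptions made for this example.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

def output_shape(S, B, C):
    """Each cell predicts B boxes of 5 numbers (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)

def responsible_cell(cx, cy, S):
    """Return (row, col) of the grid cell containing an object centre.

    cx and cy are assumed to be normalized to [0, 1] relative to image width and height.
    """
    col = min(int(cx * S), S - 1)  # clamp so that cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col

shape = output_shape(S, B, C)          # (7, 7, 30)
cell = responsible_cell(0.5, 0.3, S)   # (2, 3)
```

An object centred at 50% of the width and 30% of the height is assigned to row 2, column 3 of the 7×7 grid, and the full prediction is a 7×7×30 tensor.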

Question 17

Q

What does confidence mean for the YOLO architecture?

Answer

A

Confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is. Formally, confidence is defined as Pr(Object) × IOU^truth_pred, where IOU^truth_pred is the intersection over union between the predicted box and the ground truth.
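
A minimal sketch of the intersection over union and of the resulting confidence target. Boxes are given here as corner coordinates (x1, y1, x2, y2), which is an assumption made for readability (YOLO itself predicts center, width, and height), and `confidence_target` is a hypothetical helper name.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence_target(object_present, pred_box, truth_box):
    """Pr(Object) * IOU: zero if the cell contains no object, otherwise the IoU."""
    return iou(pred_box, truth_box) if object_present else 0.0

pred = (0.0, 0.0, 2.0, 2.0)
truth = (1.0, 1.0, 3.0, 3.0)
with_object = confidence_target(True, pred, truth)      # overlap 1, union 7
without_object = confidence_target(False, pred, truth)
```

The two 2×2 boxes overlap in a unit square, so the IoU is 1/7; with no object in the cell the target confidence is 0.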

Question 18

Q

Instance Segmentation

Answer

A

Combines object detection and semantic segmentation to correctly detect all objects in an image while precisely segmenting each instance.

Question 19

Q

Object Detection

Answer

A

Classifying and localizing individual objects using a bounding box.

Question 20

Q

Semantic Segmentation

Answer

A

Computer vision task of classifying each pixel of an image into a fixed set of categories without differentiating object instances.

Question 21

Q

How can context be added when using a sliding-window approach for semantic segmentation?

Answer

A

Extract a regional patch around each center pixel and feed the patch into a CNN to classify that center pixel. However, this is computationally very expensive and inefficient.
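
The patch extraction step can be sketched as follows. The zero padding at the image border and the function name are assumptions made for this example, and the CNN that would classify each patch is left out.

```python
import numpy as np

def extract_patch(image, row, col, size):
    """Return the size x size patch centred on pixel (row, col); size must be odd.

    The image is zero-padded so that border pixels also get a full patch (an assumption).
    """
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    # Pixel (row, col) of the original sits at (row + half, col + half) in the padded image.
    return padded[row:row + size, col:col + size]

image = np.arange(25).reshape(5, 5)      # toy single-channel image
patch = extract_patch(image, 2, 2, 3)    # patch around the centre pixel (value 12)
corner = extract_patch(image, 0, 0, 3)   # patch around the top-left pixel

# One patch, and hence one CNN forward pass, per pixel.
num_patches = image.shape[0] * image.shape[1]
```

Even this 5×5 toy image needs 25 forward passes, and neighbouring patches overlap almost entirely, which illustrates why the approach is so costly.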

(21 cards)