Robot Learning Flashcards

1
Q

What is epipolar geometry?

A

Epipolar geometry is the intrinsic projective geometry between two views. It is
independent of scene structure, and only depends on the cameras’ internal parameters
and relative pose.

2
Q

What is the fundamental matrix in relation to the geometry of two views?

A

The fundamental matrix F encapsulates this intrinsic geometry. It is a 3×3 matrix of rank 2. If a point X in 3-space is imaged as x in the first view and as x′ in the second, then the image points satisfy the relation x′ᵀ F x = 0.
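The relation above can be checked numerically. The sketch below is illustrative only: the calibration matrix K, rotation R, translation t, and the 3D point X are made-up values, and it assumes cameras P = K [I | 0] and P′ = K [R | t], for which F = K⁻ᵀ [t]× R K⁻¹.

```python
import numpy as np

def skew(v):
    """Return [v]_x, the matrix with skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical calibration and relative pose (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.0, 0.1])

# Assumed camera pair: P = K [I | 0] and P' = K [R | t].
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])

# Fundamental matrix for this pair: F = K^-T [t]_x R K^-1.
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

# Project one homogeneous 3D point into both views.
X = np.array([0.3, -0.2, 4.0, 1.0])
x1 = P1 @ X
x2 = P2 @ X

# Epipolar constraint: x'^T F x should vanish up to rounding error.
residual = x2 @ F @ x1

# Rank 2 shows up as a smallest singular value that is numerically zero.
singular_values = np.linalg.svd(F, compute_uv=False)
print(residual, singular_values)
```

Note that F is built only from K, R, and t, so the residual vanishes for any choice of X, which is the sense in which the geometry is independent of scene structure.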

3
Q

What is the relation between the corresponding image points x and x′?

A

The image points x and x′, the space point X, and the camera centres are coplanar. Denote this plane as π. The rays back-projected from x and x′ intersect at X, so both rays lie in π.

4
Q

How is the corresponding point x′ constrained?

A

The plane π is determined by the baseline and the ray defined by x. The ray corresponding to the (unknown) point x′ lies in π, hence x′ lies on the line of intersection l′ of π with the second image plane.

5
Q

What is the epipolar line corresponding to x?

A

It is the line l′ in which the plane π intersects the second image plane. Equivalently, l′ is the image in the second view of the ray back-projected from x. The corresponding point x′ must lie on l′.

6
Q

What is the benefit of utilizing epipolar geometry for a stereo correspondence algorithm?

A

The search for the point corresponding to x is restricted to the epipolar line l′, so the algorithm does not need to search the entire image plane.
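A minimal sketch of this restricted search, under stated assumptions: the matrix F below is an assumed fundamental matrix for a rectified pair (pure horizontal translation), the candidate points are made up, and the helper names `epipolar_line` and `candidates_near_line` are hypothetical. The epipolar line is computed as l′ = F x, and only candidates within a pixel tolerance of that line are kept.

```python
import numpy as np

# Assumed fundamental matrix of a rectified stereo pair
# (identical cameras, pure translation along the x axis).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def epipolar_line(F, x):
    """Return l' = F x, scaled so that |l' . x'| is a distance in pixels."""
    line = F @ np.array([x[0], x[1], 1.0])
    return line / np.hypot(line[0], line[1])

def candidates_near_line(line, points, tol=1.0):
    """Keep the candidate points that lie within tol pixels of the line."""
    points_h = np.hstack([points, np.ones((len(points), 1))])
    distances = np.abs(points_h @ line)
    return points[distances <= tol]

x = (120.0, 85.0)  # point in the first image
candidates = np.array([[40.0, 85.2],   # made-up features in the second image
                       [300.0, 84.7],
                       [90.0, 130.0],
                       [210.0, 20.0]])

line = epipolar_line(F, x)
matches = candidates_near_line(line, candidates)
print(matches)
```

For this rectified F the epipolar line is simply the image row y′ = 85, so only the first two candidates survive and the matcher has a one-dimensional search left to do.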

7
Q

What is the epipole?

A

The point of intersection of the line joining the camera centres (the baseline) with the image plane. Equivalently, the epipole is the image in one view of the camera centre of the other view. It is also the vanishing point of the baseline (translation) direction.
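Because every epipolar line passes through the epipole, the epipoles are the null vectors of F: F e = 0 and e′ᵀ F = 0. The sketch below recovers them with an SVD. It assumes normalized cameras P = [I | 0] and P′ = [R | t] (so K is the identity), and R and t are made-up values.

```python
import numpy as np

def skew(v):
    """Return [v]_x, the matrix with skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical relative pose (illustrative values only).
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.1])

# With K = I the fundamental matrix reduces to F = [t]_x R.
F = skew(t) @ R

U, S, Vt = np.linalg.svd(F)
e = Vt[-1]           # right null vector: epipole in the first image
e_prime = U[:, -1]   # left null vector: epipole in the second image

# e' is the image of the first camera centre in view 2, i.e. parallel to t.
# e is the image of the second camera centre in view 1, i.e. parallel to R^T t.
print(e, e_prime)
```

The null vectors are only defined up to scale and sign, which is why the comparison with t and Rᵀt is a parallelism check rather than an equality.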

8
Q

What is the epipolar plane?

A

A plane containing the baseline (the line joining the camera centres). The epipolar planes form a one-parameter family (a pencil) of planes around the baseline.

9
Q

What is the epipolar line?

A

The intersection of an epipolar plane with the image plane. All epipolar lines in an image intersect at the epipole.

10
Q

What are deformable parts models (DPM)?

A

Detection systems that use a sliding window approach, where a classifier is run at evenly spaced locations over the entire image.
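The sliding window scan itself can be sketched as follows. This only illustrates running a scorer at evenly spaced locations; it is not a DPM implementation. The window size, stride, toy image, and the `score_window` stand-in classifier are all assumptions made for the example.

```python
import numpy as np

def sliding_windows(image_h, image_w, win=64, stride=32):
    """Yield (top, left, bottom, right) windows at evenly spaced locations."""
    for top in range(0, image_h - win + 1, stride):
        for left in range(0, image_w - win + 1, stride):
            yield top, left, top + win, left + win

def score_window(patch):
    """Hypothetical classifier: mean intensity stands in for a real score."""
    return float(patch.mean())

# Toy image with one bright square acting as the "object".
image = np.zeros((128, 192))
image[32:96, 64:128] = 1.0

# Run the scorer on every window and keep the best one.
scored = [(score_window(image[t:b, l:r]), (t, l, b, r))
          for t, l, b, r in sliding_windows(*image.shape)]
best_score, best_box = max(scored)
print(len(scored), best_score, best_box)
```

Even on this small image the scorer runs 15 times, and a real detector would repeat the scan over several scales, which is where the cost of the approach comes from.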

11
Q

What are region proposal methods?

A

Methods that first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. R-CNN is an example.
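One common way to eliminate duplicate detections is greedy non-maximum suppression (NMS). The card does not name a specific technique, so treat the sketch below as an assumption about that post-processing step. Boxes are (x_min, y_min, x_max, y_max), and the threshold and sample values are made up.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = ((boxes[rest, 2] - boxes[rest, 0])
                     * (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + area_rest - inter)
        # Only boxes that do not overlap box i too much stay in the queue.
        order = rest[iou <= iou_threshold]
    return keep

boxes = np.array([[10.0, 10.0, 110.0, 110.0],
                  [12.0, 12.0, 112.0, 112.0],    # near-duplicate of the first
                  [200.0, 200.0, 300.0, 300.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed and only boxes 0 and 2 remain.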

12
Q

What’s the key weakness of region proposal methods?

A

The pipelines are slow and hard to optimize because each component must be trained separately.

13
Q

What architecture does the YOLO ("You Only Look Once") paper by Redmon, Divvala, Girshick, and Farhadi present?

A

It proposes a unified architecture in which a single CNN predicts both the bounding boxes and the associated class probabilities. YOLO trains on full images and directly optimizes detection performance.

14
Q

What are the main strengths proposed by the YOLO paper?

A

The architecture is fast, learns generalizable representations of objects, and reasons globally about the image, unlike sliding window methods (such as DPM) and region proposal methods, which only see local regions.

15
Q

What are the main weaknesses of the YOLO architecture?

A

YOLO lags behind state-of-the-art detection systems in accuracy and struggles to localize some objects precisely, especially small ones.

16
Q

Describe YOLO’s architectural design

A

YOLO divides the image into an S×S grid and predicts B bounding boxes per grid cell, together with a confidence score for each box. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each bounding box consists of 5 predictions: x, y, w, h, and confidence. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).
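A small sketch of the resulting output layout and of how an object is assigned to a cell. S = 7, B = 2, and C = 20 are the values the YOLO paper uses for PASCAL VOC. The helper `responsible_cell`, the 448×448 image size, and the sample object centre are assumptions for illustration.

```python
# Grid size, boxes per cell, and number of classes (PASCAL VOC setting).
S, B, C = 7, 2, 20

# Each cell predicts B boxes of 5 values (x, y, w, h, confidence) plus
# C conditional class probabilities, giving an S x S x (B*5 + C) tensor.
output_shape = (S, S, B * 5 + C)

def responsible_cell(cx, cy, img_w, img_h, grid=S):
    """Return (row, col) of the grid cell containing the centre (cx, cy)."""
    col = min(int(cx / img_w * grid), grid - 1)  # clamp centres on the border
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col

# Hypothetical object centred at pixel (300, 100) in a 448 x 448 image.
row, col = responsible_cell(cx=300.0, cy=100.0, img_w=448, img_h=448)
print(output_shape, (row, col))
```

With these values the prediction is a 7×7×30 tensor, and the sample object is assigned to the cell in row 1, column 4.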

17
Q

What does confidence mean for the YOLO architecture?

A

Confidence scores reflect how confident the model is that the box contains an object, and also how accurate it thinks the predicted box is. Formally, confidence is defined as Pr(Object) × IOU(pred, truth), where IOU is the intersection over union between the predicted box and the ground truth. If no object exists in the cell, the confidence should be zero.
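The IOU term can be sketched as below. Boxes are assumed to be given as (x_min, y_min, x_max, y_max) corners, which is a convention chosen for this example (YOLO itself predicts centre, width, and height), and the sample boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred = (50.0, 50.0, 150.0, 150.0)     # hypothetical predicted box
truth = (100.0, 100.0, 200.0, 200.0)  # hypothetical ground-truth box

# Target confidence: Pr(Object) * IOU. Pr(Object) is 1 when an object is
# present in the cell and 0 otherwise.
pr_object = 1.0
confidence_target = pr_object * iou(pred, truth)
print(confidence_target)
```

Here the boxes share a 50×50 region out of a union of 17500 square pixels, so the target confidence is 1/7.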

18
Q

Instance Segmentation

A

Combines object detection and semantic segmentation to correctly detect all objects in an image while precisely segmenting each instance.

19
Q

Object Detection

A

Classifying and localizing individual objects using a bounding box.

20
Q

Semantic Segmentation

A

Computer vision task of classifying each pixel of an image into a fixed set of categories without differentiating object instances.

21
Q

How can context be added when using a sliding window approach for semantic segmentation?

A

Extract a regional patch around each center pixel, and feed the patch into a CNN to classify that center pixel. However, this is computationally very expensive, because overlapping patches are processed independently and shared features are not reused.
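The patch extraction step can be sketched as follows. The patch size, the reflect padding at the image border, and the helper name `extract_patches` are assumptions, and the CNN that would classify each patch is left out.

```python
import numpy as np

def extract_patches(image, patch=5):
    """Return one (patch x patch) window per pixel, centred on that pixel."""
    pad = patch // 2
    # Pad the border so that edge pixels also get a full-sized patch.
    padded = np.pad(image, pad, mode="reflect")
    h, w = image.shape
    patches = np.empty((h, w, patch, patch), dtype=image.dtype)
    for r in range(h):
        for c in range(w):
            patches[r, c] = padded[r:r + patch, c:c + patch]
    return patches

image = np.arange(36, dtype=float).reshape(6, 6)  # toy single-channel image
patches = extract_patches(image)

# Every pixel now carries its own 5 x 5 context window, and a classifier
# would be run once per pixel on that window.
print(patches.shape, patches.size / image.size)
```

Even for 5×5 patches the data handed to the classifier is 25 times the size of the image, and neighbouring patches are almost identical, which illustrates why the approach is inefficient.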