03-05 - VSLAM, BA, SfM Flashcards

Question 1

Q

What is the general bundle block adjustment problem & how are BA problems usually solved?

Answer

A

Non-linear error-minimization problem
$x + \text{correction} = \lambda P X$
We try to minimize the reprojection error.

Numerically (after linearization, solved iteratively, e.g. with Gauss-Newton or Levenberg-Marquardt) we end up with:
observations + corrections = A * unknowns
where the unknowns can be split into the 3D points and the 6D camera parameters, and A is split into C (observations x points) and B (observations x images).
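
A minimal sketch of the quantity being minimized, assuming a pinhole camera with $P = K[R|t]$. The intrinsics `K`, the pose and the two 3D points are made-up illustration values, and `project` / `reprojection_residuals` are hypothetical helper names.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points X with intrinsics K and pose (R, t)."""
    Xc = X @ R.T + t                    # world -> camera coordinates
    uvw = Xc @ K.T                      # homogeneous pixel coordinates (lambda * x)
    return uvw[:, :2] / uvw[:, 2:3]     # divide out the scale factor lambda

def reprojection_residuals(K, R, t, X, observations):
    """Corrections between the observed and the predicted image points."""
    return (observations - project(K, R, t, X)).ravel()

# Assumed example values, not taken from the notes
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
X = np.array([[0.0, 0.0, 5.0], [1.0, -1.0, 4.0]])

obs = project(K, R, t, X)
obs_noisy = obs + np.array([[1.0, 0.0], [0.0, -2.0]])   # simulated measurement errors
cost = np.sum(reprojection_residuals(K, R, t, X, obs_noisy) ** 2)
print(cost)
```

BA varies `R`, `t` and `X` for all cameras and points at once until this sum of squares is as small as possible.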

Question 2

Q

What is a gauge-freedom, where does it usually appear in BA problems and how can it be fixed?

Answer

A

It means the solution is not unique: several sets of parameters explain the observations equally well.
It usually appears if we do not have any known control points.
To fix it: add constraints and priors, control points, loop closure.
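
A small sketch of why the solution is not unique without control points: scaling, rotating and shifting the whole scene (points and camera together) leaves every image observation unchanged. All numbers, and the helper names `project` and `rot_z`, are assumptions for illustration; normalized image coordinates are used for brevity.

```python
import numpy as np

def project(R, t, X):
    """Normalized image coordinates of Nx3 points X for a camera pose (R, t)."""
    Xc = X @ R.T + t
    return Xc[:, :2] / Xc[:, 2:3]

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Assumed camera and scene
R, t = rot_z(0.1), np.array([0.2, -0.1, 0.5])
X = np.array([[0.0, 0.0, 4.0], [1.0, 0.5, 5.0], [-1.0, 1.0, 6.0]])

# Arbitrary similarity transform of the world: X' = s * R_g X + t_g
s, R_g, t_g = 2.5, rot_z(0.7), np.array([3.0, -2.0, 1.0])
X_new = s * X @ R_g.T + t_g
R_new = R @ R_g.T                 # camera rotated along with the scene
t_new = s * t - R_new @ t_g       # camera translation adapted accordingly

same = np.allclose(project(R, t, X), project(R_new, t_new, X_new))
print(same)
```

Fixing some coordinates (control points) or adding priors removes this freedom.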

Question 3

Q

How do gross errors affect BA problems?

Answer

A

Since we minimize the sum of squared errors, a single outlier can ruin the whole result. It is very important to remove outliers beforehand.
Outliers can, e.g., be caused by wrong feature matching.
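
A one-dimensional stand-in for the effect, using made-up measurements: the least-squares estimate of a constant is the mean, and one gross error pulls it far off. The median is shown only as an example of a more robust estimator, not as what a BA pipeline would use.

```python
import numpy as np

# Hypothetical repeated measurements of the same quantity
measurements = np.array([10.0, 10.1, 9.9, 10.0, 10.0])
with_outlier = np.append(measurements, 100.0)   # one gross error

ls_clean = measurements.mean()        # least-squares estimate without the outlier
ls_outlier = with_outlier.mean()      # least-squares estimate with the outlier
robust = np.median(with_outlier)      # robust alternative for comparison

print(ls_clean, ls_outlier, robust)
```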

Question 4

Q

When and why do we need a sparse solver for BA?

Answer

A

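When we have large data sets, to save computation time (so mostly when using global BA). Each observation involves only one camera and one point, so the system matrix is mostly zeros and a sparse solver can exploit that.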
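
To get a feeling for how sparse the problem is, the sketch below builds the non-zero pattern of the BA Jacobian for a hypothetical setup (10 cameras with 6 parameters each, 200 points, every point seen in every image). The function name and the sizes are assumptions.

```python
import numpy as np

def ba_jacobian_pattern(n_cams, n_points, observations):
    """Boolean non-zero pattern of the BA Jacobian.

    observations: list of (camera_index, point_index); each gives 2 rows (u, v).
    Columns: 6 parameters per camera first, then 3 coordinates per point.
    """
    n_cols = 6 * n_cams + 3 * n_points
    pattern = np.zeros((2 * len(observations), n_cols), dtype=bool)
    for k, (cam, pt) in enumerate(observations):
        rows = slice(2 * k, 2 * k + 2)
        pattern[rows, 6 * cam:6 * cam + 6] = True    # camera block
        start = 6 * n_cams + 3 * pt
        pattern[rows, start:start + 3] = True        # point block
    return pattern

n_cams, n_points = 10, 200
obs = [(c, p) for c in range(n_cams) for p in range(n_points)]
pattern = ba_jacobian_pattern(n_cams, n_points, obs)
fill = pattern.mean()      # fraction of non-zero entries
print(pattern.shape, fill)
```

Only 9 entries per row are non-zero, no matter how many cameras and points are added, which is why dense solvers waste most of their work here.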

Question 5

Q

How and why do we derive Jacobians for BA problems?

Answer

A

The Jacobian contains the partial derivatives of the reprojection error with respect to the parameters being optimized.

The Jacobian matrix provides information about how changes in the parameters affect the reprojection error, which is crucial for finding the optimal parameter values that minimize the error.

It can be derived analytically or approximated numerically.
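
A sketch comparing both ways for the simplest case, the projection of a 3D point given in camera coordinates. The focal length `F`, the test point and the function names are assumptions; a full BA Jacobian would also contain the derivatives with respect to the camera pose.

```python
import numpy as np

F = 500.0  # assumed focal length in pixels

def project(X):
    """Pinhole projection of a point X = (x, y, z) in camera coordinates."""
    return F * X[:2] / X[2]

def jacobian_analytic(X):
    """Derivative of project() with respect to X, derived by hand."""
    x, y, z = X
    return F * np.array([[1.0 / z, 0.0, -x / z**2],
                         [0.0, 1.0 / z, -y / z**2]])

def jacobian_numeric(fun, X, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    J = np.zeros((fun(X).size, X.size))
    for i in range(X.size):
        d = np.zeros_like(X)
        d[i] = eps
        J[:, i] = (fun(X + d) - fun(X - d)) / (2 * eps)
    return J

X = np.array([0.5, -0.3, 4.0])
J_a = jacobian_analytic(X)
J_n = jacobian_numeric(project, X)
print(np.abs(J_a - J_n).max())
```

The numeric version is handy for checking a hand-derived Jacobian; the analytic one is usually faster and more accurate.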

Question 6

Q

What defines visual odometry?

Answer

A

Motion estimation with the help of visual input.
It may use local optimization but not global optimization.

Question 7

Q

Why do VO solutions tend to drift?

Answer

A

The error accumulates over time: every relative motion estimate has a small error, and the estimates are chained together.
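
A toy 2D dead-reckoning sketch of this accumulation. As a simplifying assumption the per-step error is a small constant heading bias (real VO errors are noisy, not constant); the function name and all values are made up.

```python
import numpy as np

def integrate(n_steps, step_length, heading_bias):
    """Chain n_steps relative motions; each step adds heading_bias to the heading."""
    positions = np.zeros((n_steps + 1, 2))
    heading = 0.0
    for k in range(n_steps):
        heading += heading_bias
        step = step_length * np.array([np.cos(heading), np.sin(heading)])
        positions[k + 1] = positions[k] + step
    return positions

truth = integrate(100, 1.0, 0.0)         # straight line along x
estimate = integrate(100, 1.0, 0.002)    # tiny error in every relative motion
drift = np.linalg.norm(estimate - truth, axis=1)
print(drift[10], drift[50], drift[100])
```

The position error keeps growing even though every single step is almost correct.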

Question 8

Q

Which kinds of correspondences can be used in VO?

Answer

A

2D-2D - reprojection error
3D-2D - reprojection error
3D-3D - 3D point difference
3D-3D has the disadvantage that the computed 3D points are uncertain. On the other hand, stereo has the advantage over monocular that the scale factor is known, so there is no scale drift. Local BA should be used no matter which method is chosen.

Question 9

Q

How is the essential matrix estimated from consecutive frames?

Answer

A

5-point method
8-point method (Longuet-Higgins): E is vectorized and all 8 point pairs are stacked, each pair giving one equation p_2^T * E * p_1 = 0
Now we have an Ax = 0 problem where we want to find x (which is E)
We are not interested in the trivial solution where E is all zeros, so we do not use, e.g., Gaussian elimination, but SVD to solve it
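
A sketch of the 8-point idea on synthetic, noise-free data, assuming normalized (calibrated) homogeneous image points. The function name, the test motion and the random points are assumptions; the final step that forces two equal singular values and one zero is a common addition, not something stated in the notes.

```python
import numpy as np

def essential_eight_point(p1, p2):
    """Estimate E from N >= 8 pairs of Nx3 homogeneous points with p2^T E p1 = 0."""
    # Row k is kron(p2_k, p1_k), so that row @ E.ravel() equals p2_k^T E p1_k
    A = np.stack([np.kron(b, a) for a, b in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)       # right singular vector of the smallest singular value
    # Project onto the set of valid essential matrices
    U, S, Vt = np.linalg.svd(E)
    sigma = (S[0] + S[1]) / 2
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Synthetic scene: 12 points in front of camera 1, assumed motion (R, t) to camera 2
rng = np.random.default_rng(0)
X1 = rng.uniform(-1, 1, (12, 3)) + np.array([0.0, 0.0, 5.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])
X2 = X1 @ R.T + t
p1 = X1 / X1[:, 2:3]
p2 = X2 / X2[:, 2:3]

E = essential_eight_point(p1, p2)
residuals = np.abs(np.einsum("ni,ij,nj->n", p2, E, p1))   # p2^T E p1 for every pair
print(residuals.max())
```

E is only determined up to scale and sign, which is why the SVD solution with unit norm is good enough.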

Question 10

Q

How is relative motion computed from the essential matrix?

Answer

A

Factorize E via SVD to get R and t.
This gives four candidate solutions; the correct one is the one for which the z coordinate of the triangulated points is positive.

Question 11

Q

When should robust estimation (e.g. RANSAC) be used in VO?

Answer

A

Whenever the matches may contain outliers: robust estimation removes them to prepare for BA.
Causes for outliers can be image noise, occlusion, blur and changes in viewpoint/illumination that the mathematical model of feature descriptors does not account for.
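
A sketch of the RANSAC loop itself. To keep it short it fits a 2D line instead of a camera motion, which is a stand-in chosen for illustration; the data, the threshold and the function name are assumptions.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Return the inlier mask of the best line found by RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Draw a minimal sample (2 points define a line)
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue
        # 2. Build the model and measure the distance of every point to it
        normal = np.array([-direction[1], direction[0]]) / norm
        dist = np.abs((points - p) @ normal)
        # 3. Keep the model with the largest consensus set
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

x = np.linspace(0, 10, 20)
inlier_pts = np.column_stack([x, 2 * x + 1])                # points on y = 2x + 1
outlier_pts = np.array([[1.0, 15.0], [3.0, -4.0], [5.0, 25.0],
                        [7.0, 2.0], [9.0, 40.0]])           # gross errors
points = np.vstack([inlier_pts, outlier_pts])
mask = ransac_line(points)
print(mask.sum())
```

In VO the minimal sample would be, e.g., 5 or 8 correspondences and the model the essential matrix, but the loop has the same structure.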

Question 12

Q

What is loop detection and closure?

Answer

A

Loop detection: recognizing after some time that the same features are observed again. Loop closure: using this to correct the map so that the loop is actually closed.

Question 13

Q

Explain the Graph-SLAM approach.

Answer

A

The graph represents the problem: every node is a pose of the robot during mapping (the states). The edges correspond to spatial constraints between the poses (relative transforms, but very uncertain). So an edge between two consecutive nodes corresponds to the odometry measurement. An edge exists if the robot either moves from one pose to the other or observes the same part of the environment from both poses.

Even though we see the same thing, we are not in the same position with the camera yet, so we need to find that last transform to be able to close the loop:

$X_i^{-1}X_j$, where $X_i$ is the transformation from the origin to $x_i$ and $X_i^{-1}$ is the inverse transformation.
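
A sketch of the relative transform $X_i^{-1}X_j$ for planar poses, written as 3x3 homogeneous matrices. The two poses and the helper name `se2` are made-up examples.

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous 2D transform from the origin to the pose (x, y, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s, c, y],
                     [0.0, 0.0, 1.0]])

# Two assumed poses: both facing +y, pose j one unit further along y
X_i = se2(1.0, 2.0, np.pi / 2)
X_j = se2(1.0, 3.0, np.pi / 2)

# Pose j expressed in the frame of pose i
Z_ij = np.linalg.inv(X_i) @ X_j
print(np.round(Z_ij, 6))
```

Seen from pose i, pose j is one unit straight ahead with the same heading. In Graph-SLAM the difference between such a predicted transform and the measured one is the error of that edge.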

Question 14

Q

Appearance based SLAM vs feature based SLAM

Answer

A

Appearance based:
- uses intensity information of all pixels
- computationally heavy, less accurate
- global

Feature based:
- uses only salient and repeatable features across images
- fast, accurate, requires the ability to match across frames
- local

Question 15

Q

What is the purpose of front-end and back-end in SLAM?

Answer

A

Making the system applicable in real-time (VO in front end, BA in backend)

Question 16

Q

How can we reconstruct 3d geometry from uncalibrated cameras?

Answer

A

Structure from Motion (SfM)

Question 17

Q

SfM ambiguities?

Answer

A

Projective: preserves intersections & tangency
Affine: preserves parallelism and volume ratios
Similarity: preserves angles and length ratios
Euclidean: preserves lengths

Question 18

Q

How can we use orthographic projection approximations in SfM?

Answer

A

If the projection is close enough to orthographic, computation is cheaper because we only have 12 DoF (affine) instead of 15 DoF (projective).

Question 19

Q

Affine SfM pipeline?
How is a zero-skew constraint introduced in SfM reconstruction?

Answer

A

We assume no vanishing points and orthographic projection.
Now, given m images of n fixed 3D points, we use the mn correspondences x to estimate the m projection matrices A, the m translation vectors b and the n 3D points X.

1: Simplify by centering (removing b)
2: Construct the measurement matrix D (x = AX for each point, stacked to make a 2m (cameras) x n (points) matrix)
measurement = motion x shape
3: Factorize D to get the motion and shape matrices: do SVD, keep only the most important information, i.e. the 3 largest singular values (think principal component analysis)

Problem: the solution is not unique. We can eliminate the affine ambiguity by requiring the image axes to be perpendicular and of unit length.
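
The three steps can be sketched on synthetic data as follows. The random affine cameras and points, the function name and the way the singular values are split between the two factors are assumptions; the metric upgrade (perpendicular, unit-length image axes) is not included.

```python
import numpy as np

def affine_factorization(x):
    """x: (m, n, 2) image points of n points in m views.

    Returns the measurement matrix D (2m x n), motion (2m x 3) and shape (3 x n).
    """
    m, n, _ = x.shape
    centered = x - x.mean(axis=1, keepdims=True)          # step 1: removes b
    D = centered.transpose(0, 2, 1).reshape(2 * m, n)     # step 2: stack u/v rows per view
    U, S, Vt = np.linalg.svd(D, full_matrices=False)      # step 3: rank-3 factorization
    motion = U[:, :3] * np.sqrt(S[:3])
    shape = np.sqrt(S[:3])[:, None] * Vt[:3]
    return D, motion, shape

# Synthetic data: m random affine cameras (A, b) observing n random 3D points
rng = np.random.default_rng(1)
m, n = 4, 10
X = rng.normal(size=(3, n))
x = np.stack([(rng.normal(size=(2, 3)) @ X).T + rng.normal(size=2) for _ in range(m)])

D, motion, shape = affine_factorization(x)
print(np.abs(motion @ shape - D).max())

# The ambiguity: any invertible 3x3 matrix Q gives an equally valid factorization
Q = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]])
print(np.allclose((motion @ Q) @ (np.linalg.inv(Q) @ shape), D))
```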

Question 20

Q

How do we deal with missing data?

Answer

A

One approach:

Find a dense subblock of the measurement matrix
Do reconstruction
Add data

Question 21

Q

2D-2D pipeline

Answer

A

Capture a frame, extract and match features
Find the essential matrix, e.g. with the Longuet-Higgins 8-point algorithm
Factorize E via SVD to get R and t
Find the correct solution out of the four by checking for which one the z coordinate is positive
Compute the relative scale (use the absolute distance between the 3D points; there will always be scale drift)
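
A sketch of the "factorize E, then pick one of four" steps, assuming normalized image points and the convention X2 = R X1 + t. The function names, the test motion and the single test point are assumptions; a real pipeline would check many points instead of one.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates encoded in E (t has unit length)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:       # make sure both factors are proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    t = U[:, 2]
    return [(R, sign * t) for R in (U @ W @ Vt, U @ W.T @ Vt) for sign in (1.0, -1.0)]

def depths(R, t, p1, p2):
    """Depths (lambda1, lambda2) solving lambda2 * p2 = lambda1 * R @ p1 + t."""
    A = np.column_stack([R @ p1, -p2])
    lam, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return lam

def select_pose(E, p1, p2):
    """Pick the candidate that puts the point in front of both cameras."""
    for R, t in decompose_essential(E):
        if np.all(depths(R, t, p1, p2) > 0):
            return R, t
    return None

# Assumed ground-truth motion and one 3D point to build a test case
c, s = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([1.0, 0.2, 0.1])
t_x = np.array([[0.0, -t_true[2], t_true[1]],
                [t_true[2], 0.0, -t_true[0]],
                [-t_true[1], t_true[0], 0.0]])
E = t_x @ R_true
X = np.array([0.3, -0.2, 5.0])
X2 = R_true @ X + t_true
p1, p2 = X / X[2], X2 / X2[2]

R_est, t_est = select_pose(E, p1, p2)
print(np.round(t_est, 4))
```

The recovered translation has unit length, so the true scale is lost, which is why the pipeline needs the separate relative-scale step.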

Question 22

Q

3d-3d

Answer

A

Needs stereo vision (to triangulate 3D points)
Minimum of 3 non-collinear correspondences
Find the transformation that minimizes the sum of 3D distances, for which we use the Kabsch algorithm
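
A sketch of the Kabsch algorithm on noise-free synthetic points. The test rotation, translation and points are made up, and the function returns R and t such that Q ≈ R P + t.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing sum ||R P_i + t - Q_i||^2 for Nx3 point sets."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                   # 3x3 cross-covariance of the centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guards against returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Assumed ground truth: rotation of 0.3 rad about z plus a translation
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R_true.T + t_true

R_est, t_est = kabsch(P, Q)
print(np.round(R_est, 4), np.round(t_est, 4))
```

With noisy triangulated points the result is the least-squares fit, so it is usually combined with outlier removal.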

Question 23

Q

3d-2d

Answer

A

Minimizes the reprojection error
Works for stereo and monocular cases
Depending on which PnP (perspective-n-point) algorithm is used, the requirements differ (i.e. how many point pairs are needed, but 3 is typically the minimum)