Geometry Flashcards
Why should we remove pincushion and barrel distortion before processing?
The geometry used in this class assumes a pinhole camera model, which has no distortion.
Briefly describe Zhang's method for camera calibration
- Take at least 3 images of a known planar pattern (checkerboard).
- Use these images to calculate homographies.
- Estimate K from these homographies.
- Use iterative, non-linear optimization to refine the full set of intrinsic parameters.
If we estimate the camera pose from the K matrix and corresponding world/image points mathematically, the rotation part will typically not be a true rotation matrix. How can we fix this?
We can use the SVD to find the closest true rotation matrix in the Frobenius norm.
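A minimal numpy sketch of this projection (the rotation and the noise perturbation below are made up for illustration):

```python
import numpy as np

def closest_rotation(M):
    """Project M onto SO(3): the closest true rotation in the Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # We got a reflection; flip the sign of the last column of U.
        U[:, -1] *= -1
        R = U @ Vt
    return R

# Perturb a true rotation slightly, then project it back.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
noise = 0.01 * np.random.default_rng(0).standard_normal((3, 3))
R_fixed = closest_rotation(R_true + noise)
```

The result is orthonormal with determinant +1, unlike the perturbed input.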
What is the difference between PnP methods and iterative methods for camera pose estimation from 3D data?
PnP methods are non-iterative, faster, and need fewer point correspondences.
What do we need to know to extract 3D information from single view cameras?
We need objects/regions with a known 3D structure, such as planar regions, parallel lines, horizontal surfaces, or vertical structures.
What is the vanishing point of a line?
Infinite lines in the real world have finite extent in the image and vanish at a point. This is the vanishing point.
How are the vanishing points of parallel lines related?
Parallel lines (lines with the same 3D direction) share the same vanishing point.
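A small numpy check of this (the camera K, the direction d, and the line positions are all made up): two parallel 3D lines are projected with a pinhole camera P = K[I|0], and their image lines intersect at K d, the vanishing point of direction d.

```python
import numpy as np

K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
d = np.array([0.3, -0.2, 1.0])   # common 3D direction (d_z != 0)

def img_line(P0):
    # Image of the 3D line P0 + s*d: the join (cross product) of two projected points.
    a = K @ P0
    b = K @ (P0 + 5.0 * d)
    return np.cross(a, b)

l1 = img_line(np.array([0.0, 0.0, 4.0]))
l2 = img_line(np.array([1.0, 1.0, 6.0]))
v = np.cross(l1, l2)        # intersection of the two image lines
v = v / v[2]
v_expected = K @ d          # vanishing point = K d (for d_z != 0)
v_expected = v_expected / v_expected[2]
```

Both image lines meet at K d even though the 3D lines never intersect.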
What is the vanishing line, and how is this related to vanishing points
The vanishing line is where planes disappear in the image (Like the horizon). All vanishing points for lines in the plane intersect the vanishing line of that plane.
If the vanishing line of the horizontal plane is straight and runs through the center of the image, how is the camera rotated?
The camera is straight and level.
Which parameters determine the epipolar plane for two view geometry?
The positions of the two camera centers and the observed 3D point
What are the epipoles?
The epipoles are the points where the baseline intersects the image planes
What is the Q matrix used for in stereo geometry?
The Q matrix is used to reproject points in the image to the world: it maps pixel coordinates and disparity (u, v, d) to 3D world coordinates.
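A sketch of this reprojection for an ideal rectified rig (this simplified Q assumes identical principal points in both cameras and ignores OpenCV's sign conventions; f, cx, cy, and the baseline B are made-up values):

```python
import numpy as np

# Assumed ideal rectified stereo rig: focal length f (pixels),
# principal point (cx, cy), baseline B (meters).
f, cx, cy, B = 500.0, 320.0, 240.0, 0.1
Q = np.array([[1, 0, 0,   -cx],
              [0, 1, 0,   -cy],
              [0, 0, 0,     f],
              [0, 0, 1 / B, 0]])

def reproject(u, v, d):
    """Map pixel coordinates and disparity to 3D via Q (homogeneous divide)."""
    X = Q @ np.array([u, v, d, 1.0])
    return X[:3] / X[3]

reproject(420, 290, 10)   # -> [1.0, 0.5, 5.0]: a point 5 m in front of the camera
```

Note how Z = fB/d falls out of the last two rows: depth is inversely proportional to disparity.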
What is the DSI (Disparity Space Image) used in stereo processing?
The DSI is a mapping R^3 -> R: it takes pixel coordinates (u, v) and a disparity d as input, and its output indicates how well the two images match at that disparity.
What is E, the essential matrix, in two view geometry?
The essential matrix relates a point in the first normalized image plane to an epipolar line in the second normalized image plane.
How many point correspondences do we need to estimate E, the essential matrix?
At least 5
What is F, the fundamental matrix, in two view geometry?
The fundamental matrix relates a point in one image to an (epipolar) line in the second image plane
How many point correspondences do we need to estimate F, the fundamental matrix?
At least 7, but 8 is often used.
Describe the 8-point algorithm for computing F, the fundamental matrix.
- Normalize the points using similarity transforms.
- Build A from the point correspondences.
- Take the SVD of A and extract F' from the right singular vector with the smallest singular value.
- Compute the SVD of F'.
- Set the smallest singular value to 0 to enforce rank 2 (a true fundamental matrix).
- Denormalize F.
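The steps above can be sketched in numpy (F' here is the rank-3 estimate before the rank-2 enforcement):

```python
import numpy as np

def normalize(pts):
    """Similarity transform: zero mean, average distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph, T

def eight_point(x1, x2):
    """Estimate F from >= 8 point correspondences (N x 2 arrays)."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # One row per correspondence, from the constraint x2^T F x1 = 0.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)  # smallest right singular vector
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0                                   # enforce rank 2
    F = U @ np.diag(s) @ Vt
    return T2.T @ F @ T1                       # denormalize
```

With noise-free correspondences the recovered F satisfies the epipolar constraint exactly (up to floating point).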
What is the fundamental difference between the 8-point and 7-point algorithms for computing F?
The 7-point algorithm gives a 2-D null space of A. F can be found as a linear combination of the two basis vectors under the constraint det(F) = 0. This constraint gives rise to a cubic polynomial, which yields 1 or 3 solutions for F.
What is the theory behind two-view triangulation, and why can't we use this directly in practice?
Theoretically, two image points can be back-projected into the world and we can determine the intersection of the two lines formed. In practical applications, noise will usually result in these two lines not intersecting.
What is the problem with minimizing geometric error in triangulation, and what linear alternatives do we have?
Minimizing the geometric error will usually not minimize the reprojection error, and the method doesn't naturally extend to more than two cameras.
We can instead minimize the algebraic error using a linear least-squares approach.
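The linear least-squares (DLT) triangulation can be sketched as follows (numpy; P1 and P2 are 3x4 projection matrices, the test values are made up):

```python
import numpy as np

def triangulate_dlt(P1, P2, u1, u2):
    """Linear triangulation: minimizes the algebraic error, not the
    reprojection error, but extends trivially to more than two views
    (just stack two more rows per extra camera)."""
    A = np.vstack([u1[0] * P1[2] - P1[0],
                   u1[1] * P1[2] - P1[1],
                   u2[0] * P2[2] - P2[0],
                   u2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]   # null vector of A (homogeneous solution)
    return X[:3] / X[3]
```

Each image point contributes two linear equations in the homogeneous world point X; the SVD finds the least-squares solution.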
How do we reduce the non-linear reprojection minimization problem of triangulation from 3 parameters of X, the point in the world, to 1 parameter?
We use the fact that points on an epipolar line can be described with 1 parameter, and minimize the distances from u and u' to their respective epipolar lines.
Can we determine the camera matrices P and P’ from the fundamental matrix F?
Yes, but only up to projective ambiguity. By adding knowledge of the scene or restricting yourself to calibrated cameras this can be restricted to affine or metric ambiguity.
Can we recover the Pose of cameras from the essential matrix E?
Yes, but only up to scale for t, the translation.
What is the cheirality constraint?
When recovering the camera pose from E, we get 4 solutions (with t up to scale), but only one of these solutions places the reconstructed scene in front of both cameras. This is the cheirality constraint.
Describe the algorithm for visual odometry
- Capture img_k+1.
- Use feature matching to estimate E between img_k+1 and img_k.
- Decompose E into R and t.
- Calculate ||t_(k,k+1)|| from ||t_(k-1,k)|| and rescale t.
- Calculate the pose of camera k+1 relative to camera 0.
How can we treat the scalability problem in visual odometry?
We can set ||t_(0,1)|| = 1 and calculate each t_(k,k+1) relative to t_(k-1,k).
What is the difference between visual odometry in a 3D scene and in a planar scene?
The 3D case estimates the epipolar geometry and uses E to calculate the pose. The planar case estimates a homography, H, and uses this to estimate the pose.
What is the trifocal tensor?
The trifocal tensor is the algebraic representation of three view geometry
Describe sequential SfM (sequential structure from motion)
- Initialize motion from two images (as described in two view geometry)
- Initialize 3D structure with triangulation
- For each new view calculate the projection matrix, refine and extend the 3D structure
What is bundle adjustment in relation to SfM (Structure from Motion)?
A non-linear method that refines structure and motion by minimizing the squared reprojection error
What can we do to deal with the potentially extreme number of parameters in bundle adjustment for SfM?
- Compute bundle adjustment only on a subset and add missing views/points based on the result
- Divide view/points into several subsets, perform bundle adjustment for each and merge them.
What are the advantages of multiple view depth calculations over stereo view?
- Multiple views can be used to verify correspondences
- Can make reconstruction more robust to occlusion
- Can be used to infer free space and volumes.
Describe the plane sweep algorithm
- Map each target image to the reference image using each plane depth
- Compute the similarity for each pixel for each depth (Using Zero Mean Normalized Cross Correlation on a small patch around the pixel)
- Choose the best-fitting depth for each pixel.
What can we do to avoid distortions and get better matching during plane sweep?
Choose another plane normal (ground normal…)
What is a voxel?
3D “Pixel”
Describe space carving
- Create an initial volume (a cube) of voxels.
- Project each voxel into the images.
- Remove the voxel if it is not photo-consistent.
- Continue until convergence.
What is PnP and what does it do?
PnP is the Perspective-n-Point problem: it estimates the camera pose (position and orientation in the world frame) from n 3D-2D point correspondences and known intrinsics.
What is the epipolar plane?
The plane defined by the two camera centers and a point X in 3D.
What is the baseline?
The line defined by the two camera centers.
What is an epipolar line?
The line defined by the epipolar plane intersecting the normalised image plane.
What is an epipole?
Where the baseline intersects the two image planes.
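Numerically, both epipoles drop out of the SVD of F as its null vectors; a numpy sketch (the K, R, t values used to build F below are made up):

```python
import numpy as np

def epipoles(F):
    """e (image 1) spans the right null space of F, e' (image 2) the left.
    Returned as homogeneous 3-vectors (an epipole can lie at infinity)."""
    e = np.linalg.svd(F)[2][-1]          # F e = 0
    e_prime = np.linalg.svd(F.T)[2][-1]  # F^T e' = 0
    return e, e_prime

# Build F from a made-up calibrated two-view setup: F = K^-T [t]x R K^-1.
K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.0, 0.2])
tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
F = np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K)

e, e_prime = epipoles(F)
```

All epipolar lines l = F x pass through e', and all l' = F^T x' pass through e.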
How can we create a sparse set of stereo matches?
Match the best feature points
How can we create a dense set of stereo matches?
Search along a subset of the epipolar line with a matching window and pick the best match; do this for all pixels.
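A brute-force sketch of dense matching on a rectified pair, where the epipolar lines are image rows (numpy; SAD is used as the similarity cost, and the window size and disparity range are made up):

```python
import numpy as np

def disparity_map(left, right, max_d, w=2):
    """Brute-force SAD block matching along horizontal scanlines.
    Assumes a rectified pair: x_right = x_left - d for disparity d >= 0."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for v in range(w, H - w):
        for u in range(w + max_d, W - w):
            patch = left[v - w:v + w + 1, u - w:u + w + 1]
            costs = [np.abs(patch - right[v - w:v + w + 1,
                                          u - d - w:u - d + w + 1]).sum()
                     for d in range(max_d + 1)]
            disp[v, u] = int(np.argmin(costs))   # best match along the row
    return disp
```

Real implementations vectorize this and use costs such as ZNCC or census, but the structure (per-pixel search over disparities along the epipolar line) is the same.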
What error are we often trying to minimize with non-linear triangulation?
The reprojection error
What is the formula for calculating the essential matrix E?
E = [t]_x R, where t is the translation vector from camera 2 to camera 1 (expressed in camera 2's frame), [t]_x is its skew-symmetric cross-product matrix, and R is the rotation from camera 1 to camera 2.
How are F and E related?
F = K'^(-T) * E * K^(-1), where K and K' are the intrinsic matrices of camera 1 and camera 2 (K' = K when both cameras share intrinsics).
In two view geometry, how can we calculate the epipolar line in image 2 for a given point x in Image 1?
l' = Fx
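The last three cards fit together in a few lines of numpy (the K, R, t values are made up): build E = [t]x R, form F from it, and check that a corresponding point lies on its epipolar line.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Made-up two-view setup: X2 = R X1 + t in camera frames, shared K.
K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.2, 0.1])

E = skew(t) @ R                                 # E = [t]x R
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)   # F = K^-T E K^-1

X = np.array([0.5, -0.3, 6.0])                  # a world point seen in both views
x1 = K @ X; x1 /= x1[2]
x2 = K @ (R @ X + t); x2 /= x2[2]

l2 = F @ x1                                     # epipolar line l' = Fx in image 2
dist = abs(l2 @ x2) / np.hypot(l2[0], l2[1])    # point-to-line distance in pixels
```

For noise-free data the distance is zero: x2 lies exactly on the epipolar line of x1.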
Explain briefly how we can estimate the relative pose between two perspective cameras from a pair of overlapping images.
- Estimate E
- Calculate R and t using the SVD
- This gives 4 solutions, but only 1 where the scene is in front of both cameras (the cheirality constraint). The correct solution can be found by triangulating at least one point, preferably several.
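A numpy sketch of this recovery (x1 and x2 are normalized image points; the scene values in the test are made up):

```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation in normalized coordinates.
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def pose_from_E(E, x1, x2):
    """Enumerate the 4 candidate poses from E; the cheirality check
    (triangulated point in front of both cameras) picks the right one."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U      # E is only defined up to sign,
    if np.linalg.det(Vt) < 0: Vt = -Vt   # so we can force proper rotations
    W = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 1]])
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):    # t only recoverable up to scale
            X = triangulate(P1, np.column_stack([R, t]), x1, x2)
            if X[2] > 0 and (R @ X + t)[2] > 0:   # cheirality constraint
                return R, t
```

In practice several triangulated points vote on the candidate, to be robust against outlier matches.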
What do we mean by structure from motion?
Structure from motion is an algorithm for reconstructing the 3D structure and the camera projection matrices from multiple views of a static scene.
Explain the main principles of how we can estimate dense depth maps from multiple images using plane sweep.
- Map each image to a reference image for several different depths.
- Compute the similarity between each image and the reference image using ZNCC (zero-mean normalized cross-correlation).
- Combine the results from each view by summing the scores.
- Choose the depth that fits best.
(The depth-plane normal doesn't have to align with the reference camera's z-axis. It can, for example, be the ground normal.)
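The warp in the first step is a plane-induced homography. A numpy sketch, assuming the convention X2 = R X1 + t and the sweep plane n^T X = d in the reference camera frame (all numeric values below are made up):

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography mapping reference-image pixels to the target image,
    induced by the plane n^T X = d in the reference camera frame."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([0.2, 0.1, 0.05])
n, d = np.array([0.0, 0.0, 1.0]), 5.0   # fronto-parallel sweep plane at depth 5

H = plane_homography(K, R, t, n, d)
X = np.array([1.0, 0.4, 5.0])           # a 3D point on the plane (n.X = d)
x1 = K @ X; x1 /= x1[2]                 # its pixel in the reference view
x2 = K @ (R @ X + t); x2 /= x2[2]       # its pixel in the target view
x2_pred = H @ x1; x2_pred /= x2_pred[2] # homography prediction
```

Points on the sweep plane map exactly; points off the plane land in the wrong place, which is what the per-depth similarity score detects.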
If an infinite line doesn’t have a vanishing point, what do we know about the orientation of the line?
The line is parallel to the image plane (or equivalently, perpendicular to the optical axis).
Assume that you have a straight and level camera. A vertical structure reaches exactly up to the vanishing line of the horizontal plane in the image. What is the real height of the vertical structure?
The same as the height of the camera.
Assume we have a straight and level camera. In the image plane, there are two vertical infinite lines without a vanishing point. We then tilt the camera upwards. What happens to the lines regarding vanishing points, and what happens to the horizontal vanishing line?
The lines will get a vanishing point above the image center. The horizontal vanishing line will shift downwards in the image.
Name at least one bundle adjustment library
GTSAM, RealityCapture, SBA, Ceres, Bundler…
Describe how to estimate the camera pose from a single view of a planar world scene with known H and K.
- u = K[r1, r2, r3, t]X; with the plane at world Z = 0 this reduces to H = K[r1, r2, t]
- [r1, r2, t] = K^-1 * H = M (up to scale)
- [r1, r2, t] = +/- lambda * M, with lambda chosen so that r1 and r2 have unit norm
- r3 = +/-(r1 x r2)
- The sign of r3 is chosen so that det(R) = 1
- Choose the overall sign that puts the camera on the correct side of the scene
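These steps can be sketched in numpy (the K, R, t values in the test are made up; the sign choice here simply requires t_z > 0, i.e. the plane in front of the camera):

```python
import numpy as np

def pose_from_homography(K, H):
    """Recover R, t from H = s * K [r1 r2 t], for a world plane at Z = 0."""
    M = np.linalg.inv(K) @ H
    # lambda: make r1 and r2 (the first two columns) unit norm on average.
    lam = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    M = lam * M
    if M[2, 2] < 0:            # require t_z > 0: plane in front of the camera
        M = -M
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])  # r3 = r1 x r2, det(R) > 0
    U, _, Vt = np.linalg.svd(R)  # project onto the closest true rotation
    return U @ Vt, t
```

With a noisy H the SVD projection step matters: the columns of M are only approximately orthonormal, so R must be snapped back onto SO(3).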