Geometry Flashcards
Why should we remove pincushion and barrel distortion before processing?
The geometry used in this class assumes a pinhole camera model, which has no distortion.
Briefly describe Zhang's method for camera calibration
- Take at least 3 images of a known planar pattern (checkerboard).
- Use these images to calculate homographies.
- Estimate K from these homographies.
- Use iterative, non-linear optimization to refine the full set of intrinsic parameters.
If we estimate the camera pose from the K matrix and corresponding world/image points mathematically, the rotation part will typically not be a true rotation matrix. How can we fix this?
We can use the SVD to find the closest true rotation matrix in the Frobenius norm.
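A minimal numpy sketch of this projection (the rotation and the noise perturbation below are made up for illustration):

```python
import numpy as np

def closest_rotation(M):
    """Project M onto SO(3): the closest true rotation in the Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # We got a reflection; flip the sign of the last column of U.
        U[:, -1] *= -1
        R = U @ Vt
    return R

# Perturb a true rotation slightly, then project it back.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
noise = 0.01 * np.random.default_rng(0).standard_normal((3, 3))
R_fixed = closest_rotation(R_true + noise)
```

The result is orthonormal with determinant +1, unlike the perturbed input.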
What is the difference between PnP methods and iterative methods for camera pose estimation from 3D data?
PnP methods are non-iterative, faster, and need fewer point correspondences.
What do we need to know to extract 3D information from single view cameras?
We need objects/regions with a known 3D structure, such as planar regions, parallel lines, horizontal surfaces, or vertical structures.
What is the vanishing point of a line?
Infinite lines in the real world have finite extent in the image and vanish at a point. This is the vanishing point.
How are the vanishing points of parallel lines related?
Parallel lines (lines with the same 3D direction) share the same vanishing point.
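A small numpy check of this (the camera K, the direction d, and the line positions are all made up): two parallel 3D lines are projected with a pinhole camera P = K[I|0], and their image lines intersect at K d, the vanishing point of direction d.

```python
import numpy as np

K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
d = np.array([0.3, -0.2, 1.0])   # common 3D direction (d_z != 0)

def img_line(P0):
    # Image of the 3D line P0 + s*d: the join (cross product) of two projected points.
    a = K @ P0
    b = K @ (P0 + 5.0 * d)
    return np.cross(a, b)

l1 = img_line(np.array([0.0, 0.0, 4.0]))
l2 = img_line(np.array([1.0, 1.0, 6.0]))
v = np.cross(l1, l2)        # intersection of the two image lines
v = v / v[2]
v_expected = K @ d          # vanishing point = K d (for d_z != 0)
v_expected = v_expected / v_expected[2]
```

Both image lines meet at K d even though the 3D lines never intersect.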
What is the vanishing line, and how is this related to vanishing points
The vanishing line is where planes disappear in the image (Like the horizon). All vanishing points for lines in the plane intersect the vanishing line of that plane.
If the vanishing line of the horizontal plane is straight and runs through the center of the image, how is the camera rotated?
The camera is straight and level.
Which parameters determine the epipolar plane for two view geometry?
The positions of the two camera centers and the observed 3D point
What are the epipoles?
The epipoles are the points where the baseline intersects the image planes
What is the Q matrix used for in stereo geometry?
The Q matrix is used to reproject points in the image to the world: it maps pixel coordinates and disparity (u, v, d) to 3D world coordinates.
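A sketch of this reprojection for an ideal rectified rig (this simplified Q assumes identical principal points in both cameras and ignores OpenCV's sign conventions; f, cx, cy, and the baseline B are made-up values):

```python
import numpy as np

# Assumed ideal rectified stereo rig: focal length f (pixels),
# principal point (cx, cy), baseline B (meters).
f, cx, cy, B = 500.0, 320.0, 240.0, 0.1
Q = np.array([[1, 0, 0,   -cx],
              [0, 1, 0,   -cy],
              [0, 0, 0,     f],
              [0, 0, 1 / B, 0]])

def reproject(u, v, d):
    """Map pixel coordinates and disparity to 3D via Q (homogeneous divide)."""
    X = Q @ np.array([u, v, d, 1.0])
    return X[:3] / X[3]

reproject(420, 290, 10)   # -> [1.0, 0.5, 5.0]: a point 5 m in front of the camera
```

Note how Z = fB/d falls out of the last two rows: depth is inversely proportional to disparity.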
What is the DSI (Disparity Space Image) used in stereo processing?
The DSI is a mapping R^3 -> R: it takes pixel coordinates (u, v) and a disparity d as input, and its output indicates how well the two images match at that disparity.
What is E, the essential matrix, in two view geometry?
The essential matrix relates a point in the first normalized image plane to an epipolar line in the second normalized image plane.
How many point correspondences do we need to estimate E, the essential matrix?
At least 5
What is F, the fundamental matrix, in two view geometry?
The fundamental matrix relates a point in one image to an (epipolar) line in the second image plane
How many point correspondences do we need to estimate F, the fundamental matrix?
At least 7, but 8 is often used.
Describe the 8-point algorithm for computing F, the fundamental matrix.
- Normalize the points using similarity transforms.
- Build A from the point correspondences.
- Take the SVD of A and extract F' from the right singular vector with the smallest singular value.
- Compute the SVD of F'.
- Set the smallest singular value to 0 to enforce rank 2 (a true fundamental matrix).
- Denormalize F.
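The steps above can be sketched in numpy (F' here is the rank-3 estimate before the rank-2 enforcement):

```python
import numpy as np

def normalize(pts):
    """Similarity transform: zero mean, average distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph, T

def eight_point(x1, x2):
    """Estimate F from >= 8 point correspondences (N x 2 arrays)."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # One row per correspondence, from the constraint x2^T F x1 = 0.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)  # smallest right singular vector
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0                                   # enforce rank 2
    F = U @ np.diag(s) @ Vt
    return T2.T @ F @ T1                       # denormalize
```

With noise-free correspondences the recovered F satisfies the epipolar constraint exactly (up to floating point).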
What is the fundamental difference between the 8-point and 7-point algorithms for computing F?
The 7-point algorithm gives a 2-D null space of A. F can be found as a linear combination of the two basis vectors under the constraint det(F) = 0. This constraint gives rise to a cubic polynomial, which yields 1 or 3 solutions for F.
What is the theory behind two-view triangulation, and why can't we use this directly in practice?
Theoretically, two image points can be back-projected into the world and we can determine the intersection of the two lines formed. In practical applications, noise will usually result in these two lines not intersecting.
What is the problem with minimizing geometric error in triangulation, and what linear alternatives do we have?
Minimizing the geometric error will usually not minimize the reprojection error, and the method doesn't naturally extend to more than two cameras.
We can instead minimize the algebraic error using a linear least-squares approach.
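The linear least-squares (DLT) triangulation can be sketched as follows (numpy; P1 and P2 are 3x4 projection matrices, the test values are made up):

```python
import numpy as np

def triangulate_dlt(P1, P2, u1, u2):
    """Linear triangulation: minimizes the algebraic error, not the
    reprojection error, but extends trivially to more than two views
    (just stack two more rows per extra camera)."""
    A = np.vstack([u1[0] * P1[2] - P1[0],
                   u1[1] * P1[2] - P1[1],
                   u2[0] * P2[2] - P2[0],
                   u2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]   # null vector of A (homogeneous solution)
    return X[:3] / X[3]
```

Each image point contributes two linear equations in the homogeneous world point X; the SVD finds the least-squares solution.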
How do we reduce the non-linear reprojection minimization problem of triangulation from 3 parameters of X, the point in the world, to 1 parameter?
We use the fact that points on an epipolar line can be described with 1 parameter, and minimize the distances from u and u' to their respective epipolar lines.
Can we determine the camera matrices P and P’ from the fundamental matrix F?
Yes, but only up to projective ambiguity. By adding knowledge of the scene or restricting yourself to calibrated cameras this can be restricted to affine or metric ambiguity.
Can we recover the Pose of cameras from the essential matrix E?
Yes, but only up to scale for t, the translation.
What is the cheirality constraint?
When recovering the camera pose from E, we get 4 solutions (with t up to scale), but only one of these solutions places the reconstructed scene in front of both cameras. This is the cheirality constraint.
Describe the algorithm for visual odometry
- Capture img_k+1.
- Use feature matching to estimate E between img_k+1 and img_k.
- Decompose E into R and t.
- Calculate ||t_(k,k+1)|| from ||t_(k-1,k)|| and rescale t.
- Calculate the pose of camera k+1 relative to camera 0.
How can we treat the scalability problem in visual odometry?
We can set ||t_(0,1)|| = 1 and calculate each t_(k,k+1) relative to t_(k-1,k).
What is the difference between visual odometry in a 3D scene and in a planar scene?
The 3D case estimates the epipolar geometry and uses E to calculate the pose. The planar case estimates a homography, H, and uses this to estimate the pose.
What is the trifocal tensor?
The trifocal tensor is the algebraic representation of three view geometry
Describe sequential SfM (sequential structure from motion)
- Initialize motion from two images (as described in two view geometry)
- Initialize 3D structure with triangulation
- For each new view calculate the projection matrix, refine and extend the 3D structure
What is bundle adjustment in relation to SfM (Structure from Motion)?
A non-linear method that refines structure and motion by minimizing the squared reprojection error
What can we do to deal with the potentially extreme number of parameters in bundle adjustment for SfM?
- Compute bundle adjustment only on a subset and add missing views/points based on the result
- Divide view/points into several subsets, perform bundle adjustment for each and merge them.
What are the advantages of multiple view depth calculations over stereo view?
- Multiple views can be used to verify correspondences
- Can make reconstruction more robust to occlusion
- Can be used to infer free space and volumes.
Describe the plane sweep algorithm
- Map each target image to the reference image using each plane depth
- Compute the similarity for each pixel for each depth (Using Zero Mean Normalized Cross Correlation on a small patch around the pixel)
- Choose the best-fitting depth for each pixel.
What can we do to avoid distortions and get better matching during plane sweep?
Choose another plane normal (ground normal…)
What is a voxel?
3D “Pixel”
Describe space carving
- Create an initial volume (a cube) of voxels.
- Project each voxel into the images.
- Remove the voxel if it is not photo-consistent.
- Continue until convergence.
What is PnP and what does it do?
PnP is the Perspective-n-Point problem: it estimates the camera pose (position and orientation in the world frame) from n 3D-2D point correspondences and known intrinsics.
What is the epipolar plane?
The plane defined by the two camera centers and a point X in 3D.
What is the baseline?
The line defined by the two camera centers.
What is an epipolar line?
The line defined by the epipolar plane intersecting the normalised image plane.
What is an epipole?
Where the baseline intersects the two image planes.
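Numerically, both epipoles drop out of the SVD of F as its null vectors; a numpy sketch (the K, R, t values used to build F below are made up):

```python
import numpy as np

def epipoles(F):
    """e (image 1) spans the right null space of F, e' (image 2) the left.
    Returned as homogeneous 3-vectors (an epipole can lie at infinity)."""
    e = np.linalg.svd(F)[2][-1]          # F e = 0
    e_prime = np.linalg.svd(F.T)[2][-1]  # F^T e' = 0
    return e, e_prime

# Build F from a made-up calibrated two-view setup: F = K^-T [t]x R K^-1.
K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.0, 0.2])
tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
F = np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K)

e, e_prime = epipoles(F)
```

All epipolar lines l = F x pass through e', and all l' = F^T x' pass through e.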
How can we create a sparse set of stereo matches?
Match the best feature points
How can we create a dense set of stereo matches?
Search along a subset of the epipolar line with a matching window and pick the best match; do this for all pixels.
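A brute-force sketch of dense matching on a rectified pair, where the epipolar lines are image rows (numpy; SAD is used as the similarity cost, and the window size and disparity range are made up):

```python
import numpy as np

def disparity_map(left, right, max_d, w=2):
    """Brute-force SAD block matching along horizontal scanlines.
    Assumes a rectified pair: x_right = x_left - d for disparity d >= 0."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for v in range(w, H - w):
        for u in range(w + max_d, W - w):
            patch = left[v - w:v + w + 1, u - w:u + w + 1]
            costs = [np.abs(patch - right[v - w:v + w + 1,
                                          u - d - w:u - d + w + 1]).sum()
                     for d in range(max_d + 1)]
            disp[v, u] = int(np.argmin(costs))   # best match along the row
    return disp
```

Real implementations vectorize this and use costs such as ZNCC or census, but the structure (per-pixel search over disparities along the epipolar line) is the same.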
What error are we often trying to minimize with non-linear triangulation?
The reprojection error
What is the formula for calculating the essential matrix E?
E = [t]_x R, where t is the translation vector from camera 2 to camera 1 (expressed in camera 2's frame), [t]_x is its skew-symmetric cross-product matrix, and R is the rotation from camera 1 to camera 2.
How are F and E related?
F = K'^(-T) * E * K^(-1), where K and K' are the intrinsic matrices of camera 1 and camera 2 (K' = K when both cameras share intrinsics).
In two view geometry, how can we calculate the epipolar line in image 2 for a given point x in Image 1?
l' = Fx
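The last three cards fit together in a few lines of numpy (the K, R, t values are made up): build E = [t]x R, form F from it, and check that a corresponding point lies on its epipolar line.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Made-up two-view setup: X2 = R X1 + t in camera frames, shared K.
K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.2, 0.1])

E = skew(t) @ R                                 # E = [t]x R
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)   # F = K^-T E K^-1

X = np.array([0.5, -0.3, 6.0])                  # a world point seen in both views
x1 = K @ X; x1 /= x1[2]
x2 = K @ (R @ X + t); x2 /= x2[2]

l2 = F @ x1                                     # epipolar line l' = Fx in image 2
dist = abs(l2 @ x2) / np.hypot(l2[0], l2[1])    # point-to-line distance in pixels
```

For noise-free data the distance is zero: x2 lies exactly on the epipolar line of x1.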
Explain briefly how we can estimate the relative pose between two perspective cameras from a pair of overlapping images.
- Estimate E
- Calculate R and t using the SVD
- This gives 4 solutions, but only 1 where the scene is in front of both cameras (the cheirality constraint). The correct solution can be found by triangulating at least one point, preferably several.
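A numpy sketch of this recovery (x1 and x2 are normalized image points; the scene values in the test are made up):

```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation in normalized coordinates.
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def pose_from_E(E, x1, x2):
    """Enumerate the 4 candidate poses from E; the cheirality check
    (triangulated point in front of both cameras) picks the right one."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U      # E is only defined up to sign,
    if np.linalg.det(Vt) < 0: Vt = -Vt   # so we can force proper rotations
    W = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 1]])
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):    # t only recoverable up to scale
            X = triangulate(P1, np.column_stack([R, t]), x1, x2)
            if X[2] > 0 and (R @ X + t)[2] > 0:   # cheirality constraint
                return R, t
```

In practice several triangulated points vote on the candidate, to be robust against outlier matches.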
What do we mean by structure from motion?
Structure from motion is an algorithm for reconstructing the 3D structure and the camera projection matrices from multiple views of a static scene.
Explain the main principles of how we can estimate dense depth maps from multiple images using plane sweep.
- Map each image to a reference image for several different depths.
- Compute the similarity between each image and the reference image using ZNCC (zero-mean normalized cross-correlation).
- Combine the results from each view by summing the scores.
- Choose the depth that fits best.
(The depth-plane normal doesn't have to align with the reference camera's z-axis. It can, for example, be the ground normal.)
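The warp in the first step is a plane-induced homography. A numpy sketch, assuming the convention X2 = R X1 + t and the sweep plane n^T X = d in the reference camera frame (all numeric values below are made up):

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography mapping reference-image pixels to the target image,
    induced by the plane n^T X = d in the reference camera frame."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([0.2, 0.1, 0.05])
n, d = np.array([0.0, 0.0, 1.0]), 5.0   # fronto-parallel sweep plane at depth 5

H = plane_homography(K, R, t, n, d)
X = np.array([1.0, 0.4, 5.0])           # a 3D point on the plane (n.X = d)
x1 = K @ X; x1 /= x1[2]                 # its pixel in the reference view
x2 = K @ (R @ X + t); x2 /= x2[2]       # its pixel in the target view
x2_pred = H @ x1; x2_pred /= x2_pred[2] # homography prediction
```

Points on the sweep plane map exactly; points off the plane land in the wrong place, which is what the per-depth similarity score detects.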
If an infinite line doesn’t have a vanishing point, what do we know about the orientation of the line?
The line is parallel to the image plane (or equivalently, perpendicular to the optical axis).
Assume that you have a straight and level camera. A vertical structure reaches exactly up to the vanishing line of the horizontal plane in the image. What is the real height of the vertical structure?
The same as the height of the camera.
Assume we have a straight and level camera. In the image plane, there are two vertical infinite lines without a vanishing point. We then tilt the camera upwards. What happens to the lines regarding vanishing points, and what happens to the horizontal vanishing line?
The lines will get a vanishing point above the image center. The horizontal vanishing line will shift downwards in the image.
Name at least one bundle adjustment library
GTSAM, RealityCapture, SBA, Ceres, Bundler…
Describe how to estimate the camera pose from a single view of a planar world scene with known H and K.
- u = K[r1, r2, r3, t]X; with the plane at world Z = 0 this reduces to H = K[r1, r2, t]
- [r1, r2, t] = K^-1 * H = M (up to scale)
- [r1, r2, t] = +/- lambda * M, with lambda chosen so that r1 and r2 have unit norm
- r3 = +/-(r1 x r2)
- The sign of r3 is chosen so that det(R) = 1
- Choose the overall sign that puts the camera on the correct side of the scene
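These steps can be sketched in numpy (the K, R, t values in the test are made up; the sign choice here simply requires t_z > 0, i.e. the plane in front of the camera):

```python
import numpy as np

def pose_from_homography(K, H):
    """Recover R, t from H = s * K [r1 r2 t], for a world plane at Z = 0."""
    M = np.linalg.inv(K) @ H
    # lambda: make r1 and r2 (the first two columns) unit norm on average.
    lam = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    M = lam * M
    if M[2, 2] < 0:            # require t_z > 0: plane in front of the camera
        M = -M
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])  # r3 = r1 x r2, det(R) > 0
    U, _, Vt = np.linalg.svd(R)  # project onto the closest true rotation
    return U @ Vt, t
```

With a noisy H the SVD projection step matters: the columns of M are only approximately orthonormal, so R must be snapped back onto SO(3).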