Stereo Basics and Epipolar Geometry - Week 7/8 Flashcards
What is the goal of stereo / epipolar geometry?
Recovery of a 3D structure
What is the problem with single-view geometry for stereo imaging?
Recovery of structure from one image is inherently ambiguous
What visual cues can lead to retrieving 3D geometry from 2D image?
Shading
Texture
Focus
Perspective
Motion
What is stereo vision?
Given several images of the same object or scene, compute a representation of its 3D shape
Narrower definition:
Given a calibrated binocular stereo pair, fuse it to produce a depth image
What is triangulation?
Gives a reconstruction in 3D space as an intersection of two rays
Requires:
- Camera pose (calibration)
- Point correspondence
What is the focal length for a pinhole camera?
The distance between the Image Plane and the Center of projection
What two equations map a scene point (X, Y, Z), measured from the centre of projection, to the point (x, y) on the image plane, given focal length f?
x = f·X/Z
y = f·Y/Z
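A tiny worked example of the pinhole projection equations, with made-up numbers (not from the lectures):

```python
def project(X, Y, Z, f):
    """Project a scene point (X, Y, Z) onto the image plane of a
    pinhole camera with focal length f (Z along the optic axis)."""
    return (f * X / Z, f * Y / Z)

# A point 2 m in front of the camera, 0.5 m right and 0.25 m up, f = 0.05 m:
x, y = project(0.5, 0.25, 2.0, 0.05)
print(x, y)  # 0.0125 0.00625 — image-plane coordinates shrink with depth Z
```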
What is the baseline in a stereo system?
The distance between the centre of projections of the two images
What is the definition of disparity?
Displacement between conjugate (corresponding) points in left and right images
What is the formula to calculate the Z depth of the image at a point in two stereo images given baseline b, focal length f and the disparity between the two points (xl - xr)?
Z = b*f/(xl - xr)
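The depth formula above as a one-line function, with illustrative numbers of my own choosing:

```python
def depth_from_disparity(b, f, xl, xr):
    """Z = b*f / (xl - xr) for the simple parallel-axis stereo rig.
    b: baseline, f: focal length, xl/xr: conjugate x-coordinates
    in the left and right images (same units as f)."""
    disparity = xl - xr
    return b * f / disparity

# Baseline 0.1 m, focal length 0.05 m, disparity 0.001 m:
Z = depth_from_disparity(0.1, 0.05, 0.0105, 0.0095)
print(Z)  # 5.0 — a larger disparity would mean a closer point
```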
What are the components of stereo analysis?
Find correspondences
- Conjugate pairs of points
- Potentially hard - lots of pairs
Reconstruction
- Calculate scene coordinates (X, Y, Z)
- Easy once you have done…
Calibration
- Calculate parameters of cameras (b, f, …)
What is the epipolar constraint?
The match for a given (xl, yl) must lie on the line yr = yl
(For the simple system given in the lectures)
What makes edges good places to match for correspondences?
- They correspond to significant structure
- Small number of points to match (there aren't usually too many of them - combinatorics)
- Can use image features (polarity, direction) to verify matches
- They can be located accurately (Canny - sub-pixel localisation)
- Multi-scale location (coarse-to-fine search)
Problems with matching edges for correspondence?
Image gradients at corresponding points may not be equally high
- Shadows, occlusions, illumination differences
Horizontal edges are difficult to match
- Match points are poorly localised along epipolar lines
- Not all significant structure lies on the edge
- Features of similar (near) magnitude may not be reliable for matching
- Near-horizontal edges do not provide good localisation
What are interest operators?
Locally distinct points
Edge matches could be obtained at neighbouring points along an edge
“Interest” operators seek isolated discrete points
Moravec operator
DoG, LoG
Harris corner detection
What is the Moravec operator?
For a region (e.g. 5x5 pixels), calculate the sums of squared differences sum((I(i,j) - I(i+1,j))²), sum((I(i,j) - I(i-1,j+1))²), sum((I(i,j) - I(i,j+1))²), sum((I(i,j) - I(i+1,j+1))²) (squaring is needed so that positive and negative differences don't cancel)
Output the minimum of the 4 values above.
suppress non-maxima of the filter output
- Isolate local maxima to get distinct points
Find points where intensity is varying quickly
- Taking minimum eliminates edges as candidates
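A minimal sketch of the Moravec response at one window position, using sums of squared differences in the four shift directions and taking the minimum. The image, window position, and shift set are my own toy choices:

```python
def moravec_response(img, r0, c0, size=5):
    """Interest measure at (r0, c0): the minimum, over four shifts,
    of the SSD between the window and its shifted copy."""
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    responses = []
    for dr, dc in shifts:
        ssd = 0.0
        for r in range(r0, r0 + size):
            for c in range(c0, c0 + size):
                ssd += (img[r][c] - img[r + dr][c + dc]) ** 2
        responses.append(ssd)
    return min(responses)

# On a uniform region every shift gives SSD 0, so the response is 0;
# an edge still scores 0 in the along-edge direction (min eliminates it),
# while a corner-like point changes in all four directions.
flat = [[10.0] * 8 for _ in range(8)]
print(moravec_response(flat, 1, 1))  # 0.0
```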
Do the two cameras for stereo imaging need to have parallel optical axes?
No
Why is the epipolar constraint useful?
It reduces finding a point's correspondence to a 1D search problem along conjugate epipolar lines
What is the baseline?
Line joining the camera centres.
What is the epipole?
Point of intersection of the baseline with the image plane
What is the epipolar plane?
Plane containing the baseline and the world point
What is the epipolar line?
Intersection of the epipolar plane with the image plane
What is the significance of the epipolar line of a point P?
Potential matches for a point p on an epipolar line in image 1 must lie on the corresponding epipolar line in image 2
How are coordinate systems related?
Rotation matrices Rl and Rr giving the orientations of each of the camera coordinate systems relative to the scene coordinate system
Translation vectors Tl and Tr between the camera origins and the scene origins
Why do the projected vectors in stereo reconstruction normally not collide?
Because of measurement inaccuracies
Need to find the mid-point of the shortest segment between the two rays (the segment joining the closest points on each)
What does it mean for a camera rig to be calibrated?
It means we know how to translate and rotate camera reference frame 1 to get camera reference frame 2
How is the rotation mathematically represented in stereo geometry?
A 3x3 matrix
What does the cross product do?
Takes two vectors and returns a third vector that’s perpendicular to both inputs
What is the essential matrix?
Relates corresponding image points between both cameras, given the rotation and translation.
If we observe a point in one image, its position in the other image is constrained to lie on the epipolar line defined by the essential matrix
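A numerical sanity check of the epipolar constraint x₂ᵀ E x₁ = 0, where E = [t]ₓ R is built from the rotation and translation between the cameras. The rig below (no rotation, pure sideways translation) and the scene point are my own toy values:

```python
def skew(t):
    """Cross-product (skew-symmetric) matrix: skew(t) applied to v gives t x v."""
    tx, ty, tz = t
    return [[0, -tz, ty],
            [tz, 0, -tx],
            [-ty, tx, 0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # no rotation between the cameras
t = [-0.1, 0.0, 0.0]                    # baseline of 0.1 along x
E = matmul(skew(t), R)                  # essential matrix E = [t]x R

P1 = [0.5, 0.25, 2.0]                   # scene point in camera-1 coordinates
P2 = [P1[i] + t[i] for i in range(3)]   # same point in camera-2 coordinates
x1 = [P1[0] / P1[2], P1[1] / P1[2], 1]  # normalised image points
x2 = [P2[0] / P2[2], P2[1] / P2[2], 1]

Ex1 = matvec(E, x1)                     # epipolar line of x1 in image 2
residual = sum(x2[i] * Ex1[i] for i in range(3))
print(residual)  # ~0: x2 lies on the epipolar line E x1
```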
What is rectifying?
Transforming (warping) two images so that their image planes are parallel
- Epipoles should be at infinity
- Epipolar lines should be horizontal
How is rectification achieved?
By re-projecting image planes onto a common plane parallel to the line between optical centres
What are the 4 main steps to stereo reconstruction?
- Calibrate Cameras
- Rectify Images
- Compute Disparity
- Estimate Depth
What are the hard constraints of epipolar geometry?
That the corresponding point to p in image 1 must lie on the corresponding epipolar line in image 2
What are the soft constraints of epipolar geometry?
Parts of features that indicate they are similar:
- Similarity
- Uniqueness
- Ordering
What are the assumptions made to find matches in the image pair?
Most scene points visible from both views
Image regions for the matches are similar in appearance
How do dense correspondence searches work?
For each pixel in the first image:
- Find corresponding epipolar line in the right image
- Examine all pixels on the epipolar line and pick the best match (e.g. SSD, correlation)
- Triangulate the matches to get depth information
Last stage is easiest when epipolar lines are scanlines -> rectify the images first
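A sketch of the window search on rectified images: for a pixel in the left image, slide a window along the same scanline in the right image and keep the disparity with the lowest SSD. The tiny synthetic images and window size are illustrative only:

```python
def best_disparity(left, right, row, col, half=1, max_disp=4):
    """Return the disparity d minimising the SSD between a (2*half+1)^2
    window around (row, col) in `left` and (row, col - d) in `right`.
    Assumes the images are rectified, so matches share the scanline."""
    best_d, best_ssd = 0, float("inf")
    for d in range(max_disp + 1):
        if col - d - half < 0:        # window would fall off the image
            break
        ssd = 0.0
        for r in range(row - half, row + half + 1):
            for c in range(col - half, col + half + 1):
                ssd += (left[r][c] - right[r][c - d]) ** 2
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d

# Right image is the left image shifted 2 pixels, so expect disparity 2:
left = [[0, 0, 0, 9, 0, 0, 0, 0] for _ in range(5)]
right = [[0, 9, 0, 0, 0, 0, 0, 0] for _ in range(5)]
print(best_disparity(left, right, 2, 3))  # 2
```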
What is the effect of the window size in a window search (example of dense correspondence search)?
Want a window large enough to have sufficient intensity variation, yet small enough to contain only pixels with about the same disparity
What is sparse correspondence search?
Restrict search to sparse set of dedicated features
Rather than pixel values (or lists of pixel values) use feature descriptor and an associated feature distance.
Can still narrow search further using epipolar geometry
Dense vs Sparse correspondence search comparison?
Sparse
- Efficiency
- Can have more reliable feature matches, less sensitive to illumination than raw pixels.
- Have to know enough to pick good features
- Sparse information
Dense
- Simple process
- More depth estimates, can be useful for surface reconstruction
- Breaks down in textureless regions anyway
- Raw pixel distances can be brittle
- Not good with very different viewpoints
What are the difficulties caused by the similarity constraint?
Un-textured surfaces
- Can't differentiate one patch of all-white pixels from another
Occlusions
- Feature may not be visible from both images
What is the ordering soft constraint?
Points on the same surface (opaque object) will be in same order in both views
What are the possible sources of error for stereo?
Low-contrast / textureless image regions
Occlusions
Camera calibration errors
Violations of brightness constancy (e.g. specular reflections)
Large motions
What are some applications of stereo imaging?
Depth for segmentation
(Could find edges in the disparity map and combine them with image edges to enhance the contours found)
View interpolation (creating a sequence of synthetic images that represent a smooth transition from one view of a scene to another)
Virtual viewpoint video (a form of user-centred virtual reality that allows viewers to freely select the viewing position and angle)
What are the parameters of a camera?
Extrinsic parameters:
- Rotation matrix R (3x3) (3 free parameters)
- Translation Vector (Tx, Ty, Tz)
Intrinsic parameters:
- Relate pixel coordinates to image coordinates
- Pixel size (sx, sy): pixels may not be square
- Origin offset (dx, dy): pixel origin may not be on optic axis
- Focal length, f
- Not totally independent (Need dx, dy, f, sx/sy)
What is stereo calibration?
Need to know these camera parameters:
- R, T and f to calculate triangulation
- dx, dy, f, sx/sy to calculate image coordinates from pixel coordinates
Can calculate the second parameters if we know the scene coordinates of sufficient image points
Calibration using target image:
- Accurately measured feature positions
- Reliable location on images
What is a target image for calibration?
An image used to provide sufficient scene coordinates of image points to calculate camera parameters
What are the compromises for calibration algorithms?
- Accuracy of parameter estimation
- Robustness of parameter estimation
- Complexity of calculation
- Least squares … non-linear optimisations
- Engineering requirement of target
- Points on a plane
- Points throughout 3D volume