Week 7 and 8 - Stereo and Epipolar geometry Flashcards
What is the goal of stereo geometry
Recovering 3D structure from 2D images
Structure and depth are inherently ambiguous from single views
What visual cues does an image give us for 3D recovery
shading
texture
focus
perspective
motion
What is stereo vision
Ability of the brain to perceive depth by processing two slightly different images captured by each eye
Based on this, we use a calibrated binocular stereo pair of images
How do we estimate a scene shape using triangulation
Triangulation following the lines from the two image planes to intersect at the real life scene point
Gives reconstruction as intersection of two rays
What does triangulation require us to know
1) Camera pose (calibration - camera parameters)
2) point correspondence - matching of image points
What is the focal length
Distance from the centre of projection O to the image plane
What is (X,Y,Z)
Scene coordinates
What is (x,y,z)
Image coordinates
What is P
The point in the scene and the corresponding point in the projected image
What is the centre of projection
optical/camera centre
refers to the point in space from which the camera’s perspective projection emanates
What is Z
Distance of the camera (O) to the real world point P
same as
Depth (distance from viewer to the point on the object)
What is baseline
The distance between the left and right camera centres (center of projection)
Where is the scene origin placed in the general camera system diagram
centre of the baseline (same y as camera centres)
What are xl and xr
The difference along the x axis between each camera centre and the projected point in the image
(c0 and pl)
(c1 and pr)
What is pl and pr
The point p projected in the left and right images
What is the formula for Z
Z = bf / (xl - xr)
What is disparity
xl - xr
Displacement (along x axis) between conjugate (corresponding) points in left and right images
How does disparity change with object distance to camera
Objects closer to camera → higher disparity → brighter in disparity map
Objects further from camera → lower disparity → darker in disparity map
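The depth formula Z = bf / (xl - xr) can be applied directly to a disparity map; a minimal sketch with made-up baseline, focal length and disparities:

```python
import numpy as np

# Illustrative numbers: baseline b in metres, focal length f in pixels,
# and a tiny 2x2 disparity map (xl - xr) in pixels.
b = 0.1
f = 700.0
disparity = np.array([[35.0, 70.0],
                      [14.0,  7.0]])

# Z = b*f / (xl - xr): depth is inversely proportional to disparity,
# so close objects (large disparity) appear bright in a disparity map.
depth = b * f / disparity
```

With these numbers the 70-pixel disparity maps to a depth of 1 m and the 7-pixel disparity to 10 m, matching the rule that closer objects have higher disparity.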
Why is it important to correctly match points in the left and right image plane
We pass a ray from each camera centre through pl and pr; if these points are matched incorrectly, the rays will intersect at the completely wrong point P
What are the 3 components of stereo analysis
- Find correspondences
- conjugate pairs of points
- use interesting points
- Reconstruction
- calculate scene coordinates (X,Y,Z)
- Easy, once you have done calibration
- Calibration
- Calculate parameters of cameras (eg b, f…)
What do we assume about finding correspondences in the two images
Assume most scene points are visible in both views
Assume corresponding points are similar
What is the Epipolar constraint
Without it, each candidate point has a potentially large 2D search space
The epipolar geometry limits where points from one view can be imaged in the other
so matching becomes a 1D search along the epipolar line
(still lots of possible matches)
What are the advantages of using edges for correspondence
They correspond to a significant feature
There aren't usually too many of them
We can use image feature (polarity, direction) to verify matches
We can locate them accurately
Multi-scale location (coarse to fine search)
What are the disadvantages of using edges for correspondence
Not all significant structures lie on edges
Edge magnitude features may not be reliable for matching - gradients at corresponding points can be different due to illumination difference
Near-horizontal edges do not provide good localisation - every point along the edge will match with every other point
What detector can we use for finding edges for correspondence
canny detector
What detector can we use for finding interest points
- Moravec operator
- Harris Corners
- LoG (DoG)
What is the Moravec Operator
Non-linear filter
computed over some neighbourhood
compares a window of pixels with shifted copies of itself
output value is the minimum over the shifts (eliminates edges, which vary in only one direction)
suppresses non-maxima
finds points where intensity varies quickly in all directions
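A minimal sketch of the Moravec idea in NumPy (the window size and shift directions here are illustrative choices, not prescribed by the notes):

```python
import numpy as np

def moravec(img, window=3, shifts=((1, 0), (0, 1), (1, 1), (1, -1))):
    """Moravec interest measure: minimum SSD between a window and
    shifted copies of itself.  Taking the minimum over directions
    eliminates edges, which have zero variation along one direction."""
    h, w = img.shape
    r = window // 2
    score = np.zeros_like(img, dtype=float)
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            patch = img[y-r:y+r+1, x-r:x+r+1].astype(float)
            ssds = []
            for dy, dx in shifts:
                shifted = img[y+dy-r:y+dy+r+1, x+dx-r:x+dx+r+1].astype(float)
                ssds.append(((patch - shifted) ** 2).sum())
            score[y, x] = min(ssds)  # min over shifts suppresses edges
    return score
```

On a synthetic corner image the score is high at the corner, while flat regions and straight edges score zero (an edge point matches a copy shifted along the edge). Non-maximum suppression would then be applied to the score map.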
What assumption do we make about the cameras
they are calibrated
(know extrinsic parameters relating their poses)
Do we normally have the idealised parallel stereo system in real life
No, that is rare
usually the camera axes are not parallel; the cameras are usually at completely different orientations
Correspondence is much harder to find
How do we constrain the correspondence in the non parallel camera axis situation
Again using epipolar lines
the epipolar plane through a candidate point gives one epipolar line in each image
reduces the problem to a 1D search space
What is the epipolar plane
plane containing baseline(connecting two camera centres) and world point p
An epipolar plane intersects with the left and right image planes in epipolar lines
What is the epipole
point of intersection between baseline and image plane
what is the epipolar line
intersection of epipolar plane with the image plane
All epipolar lines intersect at the epipole
where do the Potential matches for p lie
on the corresponding epipolar line l’
and vice versa for p’ on l
What are the 4 main features in generalised stereo geometry
-cameras are at arbitrary orientations (image planes are not parallel)
-the separation between the optical centres (the baseline) is not parallel to the image planes
-camera coordinate systems are different from each other and from the scene coordinate system
- coordinate systems are related by:
- rotation matrices and translation vectors, which define each camera's origin and coordinate system in relation to the scene system
What are Rr, Rl and Tr, Tl and fl, fr
The rotation matrices, translation vectors and focal lengths respectively for the right and left cameras
What is pl (or pr)
Vector from the origin Ol to the point pl in the image plane, can be multiplied by some scalar value al to reach real world point P
what can be said about fl and fr
they are not necessarily the same
What can we write pl and pr in terms of
pl = (xl,yl,fl)
the coordinates of pl in the image and the focal length of camera to image plane
What is Pl and Pr
Pl = al pl
(how to reach real world point P)
al, ar are scalar values
What is P’l and P’r
P’l = Tl + alRlpl
P’r = Tr + arRrpr
Where to P’l and P’r intersect
Where
Tl + alRlpl = Tr + arRrpr
real world point P
Why is it not simple to mathematically solve for P
because of common measurement inaccuracies
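Because of those inaccuracies the two rays generally do not meet exactly, so one common fix (a sketch of the midpoint method, not necessarily the course's exact approach) is to solve for al and ar in the least-squares sense and take the midpoint of the closest approach:

```python
import numpy as np

def triangulate_midpoint(Tl, Rl, pl, Tr, Rr, pr):
    """Find scalars al, ar minimising |(Tl + al*Rl@pl) - (Tr + ar*Rr@pr)|
    and return the midpoint between the two closest ray points."""
    dl = Rl @ pl                      # direction of the left ray in world frame
    dr = Rr @ pr                      # direction of the right ray
    # Solve [dl, -dr] [al, ar]^T ~= Tr - Tl in the least-squares sense
    A = np.column_stack([dl, -dr])
    b = Tr - Tl
    (al, ar), *_ = np.linalg.lstsq(A, b, rcond=None)
    Pl = Tl + al * dl                 # closest point on the left ray
    Pr = Tr + ar * dr                 # closest point on the right ray
    return 0.5 * (Pl + Pr)
```

With noise-free inputs the rays intersect exactly and the midpoint is the true world point P; with noisy correspondences it is a reasonable compromise between the two rays.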
What is the Essential matrix
E = [Tx]R
It relates corresponding image points between cameras using translation and rotation
How can we write X’ and X using the essential matrix
from:
X’ . (T x RX) = 0
X’ . ([Tx] x RX) = 0
We can write
X’^T EX =0
How can we use the essential matrix to solve the parallel camera system
R = I
T = [-d, 0, 0]
(d = the separation between the camera centres along x)
E = [Tx]R
p’^T Ep = 0
leads us to y = y’ (image of any point must lie along the same horizontal line)
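Expanding p'^T E p for this parallel case shows why the constraint collapses to y = y'; a small numeric check (d and f are illustrative values):

```python
import numpy as np

d, f = 0.5, 1.0   # illustrative camera separation and focal length
# With R = I and T = [-d, 0, 0], E = [Tx]R is just the skew matrix of T
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, d  ],
              [0.0, -d,  0.0]])

def epipolar_residual(p, p_prime):
    """p'^T E p for image vectors p = (x, y, f); expands to d*f*(y' - y)."""
    return p_prime @ E @ p

# Points on the same scanline (y = y') satisfy the constraint...
same = epipolar_residual(np.array([0.2, 0.3, f]), np.array([0.1, 0.3, f]))
# ...points on different scanlines do not.
diff = epipolar_residual(np.array([0.2, 0.3, f]), np.array([0.1, 0.4, f]))
```

The residual is d·f·(y' - y), so it is zero exactly when the two image points lie on the same horizontal line, regardless of their x disparity.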
What is Rectification
Knowing Rr and Rl means we can transform (warp) the images so that the image planes are parallel
Means the search space is now just along the parallel epipolar lines
What does rectification mean about the epipoles
epipoles are at infinity
What does rectification mean about the epipolar lines
epipolar lines are parallel to the horizontal image axis
what does the epipolar constraint make faster
the search for correspondences
What are the 4 main steps of stereo reconstruction
-calibrate cameras
-rectify images
-compute disparity
-estimate depth
What are the soft constraints of correspondences
similarity
uniqueness
ordering
(help further reduce the possible matches)
To find matches in the image pair, what do we assume
-most scene points are visible from both views
-image regions for the matches are similar in appearance
What is the Dense Correspondence Search
For each pixel in first image:
-find corresponding epipolar line in other image
-examine all pixels on line and pick best match (eg SSD)
-triangulate the matches to get depth information
When is dense correspondence search easiest
When epipolar lines are scanlines
-> rectify images first
What is window-based correspondence search
Slide a window along epipolar lines to find the corresponding pixel
localises the search
What is the effect of window size on correspondence search
Want it to be large enough to have sufficient intensity variation
small enough to contain only pixels with the same(ish) disparity
large window -> image appears blotchy and blurred
small window -> fine-grained noise
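A minimal window-based matcher along one rectified scanline, using SSD as the match score (the window radius and test images are illustrative):

```python
import numpy as np

def match_scanline(left, right, y, x, r=2):
    """Find the column in `right` whose (2r+1)x(2r+1) window on scanline y
    best matches (lowest SSD) the window around (y, x) in `left`.
    Assumes rectified images, so the search is 1-D along row y."""
    template = left[y-r:y+r+1, x-r:x+r+1].astype(float)
    best_x, best_ssd = None, np.inf
    for xr in range(r, right.shape[1] - r):
        window = right[y-r:y+r+1, xr-r:xr+r+1].astype(float)
        ssd = ((template - window) ** 2).sum()
        if ssd < best_ssd:
            best_x, best_ssd = xr, ssd
    return best_x
```

On a synthetic pair where the right image is the left shifted by 3 columns, a point at xl = 10 matches at xr = 7, i.e. a disparity of xl - xr = 3.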
What is sparse correspondence search
Restrict search to sparse set of detected features
(dense, finding correspondence for every pixel -> was too noisy)
Only use a small set of important pixels
use feature descriptor and associated feature distance
(epipolar constraint still applies : only search along particular epipolar line)
Dense adv disadv
-simple process
-more depth estimates, can be useful for surface reconstruction
But
-breaks down in textureless regions
-raw pixels can be brittle
-not good with very different viewpoints
Sparse adv disadv
-efficient
-can have more reliable feature matches (less sensitive to illumination)
- but you have to know how to pick good features
Difficulties in similarity constraint
-Textureless regions in an image lack distinct features or patterns
-occlusions
- Flat or homogeneous regions contain pixels with similar intensity values and little or no gradient information
Hard to match pixels in these areas
why can raw pixel distances be brittle
sensitive to noise, illumination, perspective
not robust
What is the ordering constraint
points on same surface (opaque object) will be in the same order in both views
violated by transparent objects
What are possible sources of error
-low contrast/ textureless
-occlusions
-camera calibration errors
-violations of brightness constancy (specular reflections)
-large motions
What are 3 main applications of 3D scene reconstructions
Depth for segmentation
View Interpolation
- synthesizing new views of a scene from existing views captured by a stereo camera setup or multiple cameras
Virtual Viewpoint video
- allows viewers to interactively navigate and explore a scene from different viewpoints in real-time (3D virtual tours)
What are Intrinsic camera parameters
-Relate pixel coordinates to image coordinates
-Pixel size (sx, sy): pixels may not be square
-Origin offset (dx, dy): pixel origin may not be on optic axis
We just need the ratio between the pixel grid and the image plane
-Focal length, f.
-Not totally independent. (Need dx, dy, f, sx/sy )
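These parameters are usually collected into an intrinsic matrix; a sketch with made-up values, using the standard pinhole-camera layout for K (the notes list the parameters but not this matrix form):

```python
import numpy as np

# Made-up intrinsics: focal length f (metres), pixel sizes sx, sy
# (metres per pixel), principal-point offset (dx, dy) in pixels.
f, sx, sy = 0.008, 1e-5, 1e-5
dx, dy = 320.0, 240.0

# Only the ratios f/sx and f/sy enter the projection, echoing the
# "not totally independent" point above (need dx, dy, f, sx/sy).
K = np.array([[f / sx, 0.0,    dx],
              [0.0,    f / sy, dy],
              [0.0,    0.0,    1.0]])

# Project a camera-frame point (X, Y, Z) into pixel coordinates
P = np.array([0.1, -0.05, 2.0])
u, v, w = K @ P
px, py = u / w, v / w
```

Dividing by the third homogeneous coordinate w (= Z) performs the perspective division, so px and py are the final pixel coordinates.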
What are extrinsic camera parameters
- Rotation matrix R (3X3) (3 free parameters)
- Translation vector (Tx, Ty, Tz)
What is a calibration target
checkerboard pattern used to calibrate the camera
What can we do with an uncalibrated stereo system
- Calibration is necessary to determine absolute 3D positions
- We can determine relative 3D positions (up to a scale factor) without calibration
- If at least 8 point correspondences are known, that is sufficient: the camera parameters can be estimated
(Human stereopsis just uses relativity in this way)
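The 8-correspondence claim is the classic eight-point algorithm; a minimal sketch on synthetic noise-free data (the pose, points, and random seed are all invented for the demo):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [Tx]."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def eight_point(x1, x2):
    """Estimate E (up to scale) from >= 8 normalised correspondences
    satisfying x2^T E x1 = 0.  Each pair gives one linear equation in
    the 9 entries of E; the solution is the null vector of the stack."""
    A = np.array([np.outer(p2, p1).ravel() for p1, p2 in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)      # right null vector of A

# Synthetic ground truth: small rotation about y, translation T
theta = 0.05
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
T = np.array([0.3, 0.0, 0.05])

rng = np.random.default_rng(1)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))   # 3-D scene points
x1 = X / X[:, 2:3]                   # normalised image coords, camera 1
X2 = X @ R.T + T                     # the same points in camera-2 frame
x2 = X2 / X2[:, 2:3]                 # normalised image coords, camera 2

E = eight_point(x1, x2)
residuals = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1))
```

The recovered E matches skew(T) @ R up to an overall scale (and sign), which is exactly the "relative positions up to a scale factor" limitation mentioned above.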
What is the tradeoff in calibration systems
Accuracy versus how long the calibration takes