Lecture 2 Flashcards
What does a camera do in terms of dimensions?
It projects points in 3D space onto a 2D image, as described by the general projection matrix.
What is the general projection matrix? What is involved in each part?
u ≡ K[R|t]x
where R is a 3×3 rotation matrix, t is a 3×1 translation vector, and K is the calibration matrix:
K = [fx  s  cx]
    [ 0  fy cy]
    [ 0  0   1]
fx and fy are the focal lengths (often equal), cx and cy are the centre of projection, and s is the pixel skew (often 0).
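The projection above can be sketched numerically. The intrinsics and pose below (fx = fy = 800, centre (320, 240), a camera 5 units from the target) are made-up values for illustration, not numbers from the lecture:

```python
import numpy as np

# Assumed intrinsics: fx = fy = 800 pixels, principal point (320, 240), zero skew.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                        # camera axes aligned with the world
t = np.array([[0.0], [0.0], [5.0]])  # world origin 5 units in front of the camera

x = np.array([[1.0], [0.5], [0.0], [1.0]])  # homogeneous 3D point on the target plane

u_h = K @ np.hstack([R, t]) @ x      # homogeneous image coordinates
u = (u_h[:2] / u_h[2]).ravel()       # divide out the third component
# u is now the pixel location (480, 320)
```

Dividing by the third homogeneous component is what makes the ≡ in u ≡ K[R|t]x an equivalence up to scale rather than an equality.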
How can we do camera calibration? What is useful about using 2D objects?
We need easy-to-find points in the 3D world with known coordinates. A commonly used object is a 2D checkerboard: its black and white squares have easy-to-recognise corners, and we know the side length of the squares. We place the origin at one of the internal corners, with Z going into the target, so the X-Y plane is the calibration target itself. We also need to choose a unit for our measurements; commonly one centimetre is 1 unit. By using a 2D object in 3D space we remove the Z axis from our equations, because Z is always 0, simplifying them.
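Building the table of known 3D target coordinates can be sketched as follows; the 7×6 board size and 2.5 cm square length are assumed example values:

```python
import numpy as np

# Hypothetical board: 7x6 internal corners, 2.5 cm squares, 1 unit = 1 cm.
cols, rows, square = 7, 6, 2.5

# One (X, Y, 0) point per internal corner; origin at the first corner,
# Z is always 0 because the target is flat.
obj_points = np.zeros((rows * cols, 3))
obj_points[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
```

Every image of the board reuses this same list of 3D points; only the detected 2D corner locations change.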
What is the openCV corner finding method? What are some problems with it? How are the calibration targets usually sized?
Threshold the image to black and white, look for black and white quadrilaterals, link them into a checkerboard, and then refine the corners to sub-pixel accuracy. This lets us map the 2D corners to the 3D target points. The method needs a view of all the corners to work. Calibration targets are usually odd-sized (e.g. 7×6); this helps orient the object, since without it multiple answers would be generated.
What variables do we get during camera calibration? What are we trying to get?
A set of 3D points and corresponding 2D points, related by the camera calibration matrix. We are trying to find the calibration matrix, but we get the rotation and translation as a bonus.
What is the algorithm overview for Zhang’s calibration method?
Each image gives us a homography with 9 values (8 independent values/degrees of freedom). Six of these are the pose of the camera, leaving two constraints on K per image; since K has 5 unknowns, we need at least three images. For each image: 1. find the homography; 2. derive two constraints on K. Then, using all images' constraints: 3. estimate K. Finally: 4. estimate R and t for each image; 5. refine the estimate, adding in lens distortion.
What is a homography? How do we get one in camera calibration?
A homography is a relation between two images of the same plane. In camera calibration we get one by placing a 2D object in the 3D world and comparing its image to what the 2D object should look like. This works because the 2D object removes the Z axis from the 3D points.
How does the direct linear transform relate to finding a homography?
It gives us three linear equations per point in the entries of the homography matrix, but they are not independent, so we use only the first two. We need at least four points because H has eight degrees of freedom.
How do we solve our homogenous equations?
From the direct linear transform, the homography equations take the form Ah = 0. h = 0 is a solution but not a useful one; instead note that Ah = 0 = 0h, so h behaves like an eigenvector of A with eigenvalue 0 (strictly, an eigenvector of AᵀA, since A is not square). We can then use singular value decomposition to find the singular vector for the smallest singular value (approximately 0).
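The DLT-plus-SVD recipe can be sketched end to end. This version skips the normalisation step (covered below) and checks itself against a known homography, a pure scaling by 2:

```python
import numpy as np

def dlt_homography(x, u):
    """Estimate H with u ~ H x from >= 4 point pairs (unnormalised sketch)."""
    A = []
    for (X, Y), (U, V) in zip(x, u):
        # Two independent linear equations per correspondence.
        A.append([-X, -Y, -1, 0, 0, 0, U * X, U * Y, U])
        A.append([0, 0, 0, -X, -Y, -1, V * X, V * Y, V])
    # h is the right singular vector for the smallest singular value of A.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)

# Check on a known answer: scaling the unit square by 2.
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
H = dlt_homography(src, dst)
H = H / H[2, 2]  # fix the scale (and sign) ambiguity
```

With exactly four points A is 8×9, so its null space is one-dimensional and the SVD recovers h up to scale.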
What is an eigenvector and eigenvalue?
An eigenvector is a vector that after a transformation has only been multiplied by a scalar value, the eigenvalue.
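A quick numerical illustration (the matrix is an arbitrary example):

```python
import numpy as np

# For an eigenvector v with eigenvalue lambda: A v = lambda v,
# i.e. the transformation only scales v.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(A)
v = vecs[:, 0]                        # eigenvector paired with vals[0]
assert np.allclose(A @ v, vals[0] * v)
```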
Why do we need to normalise transforms?
Entries of A vary hugely in size: u and v are typically in the thousands, while x and y depend on the world units. This means changing some entries has large effects while others need large changes to have any effect, which makes the solution numerically unstable.
How do we normalise transforms?
Apply a translation and scale to both the image points and the world points (u′ = Tᵤu, x̃′ = Tₓx̃) so that each set has mean 0 and mean length √2. After solving for H′ in the normalised coordinates, we undo the normalisation with H = Tᵤ⁻¹H′Tₓ.
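Building one such normalising transform can be sketched like this; the point values are made-up pixel coordinates:

```python
import numpy as np

def normalising_transform(pts):
    """3x3 transform moving the points' mean to 0 and their mean length to sqrt(2)."""
    pts = np.asarray(pts, float)
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - mean, axis=1).mean()
    # Scale first and translate by the scaled mean, as one matrix.
    return np.array([[scale, 0.0, -scale * mean[0]],
                     [0.0, scale, -scale * mean[1]],
                     [0.0, 0.0, 1.0]])

pts = np.array([[1000.0, 500.0], [1200.0, 700.0], [900.0, 800.0], [1100.0, 600.0]])
T = normalising_transform(pts)
pts_n = (T @ np.c_[pts, np.ones(len(pts))].T).T[:, :2]  # apply to homogeneous points
```

The same construction is applied separately to the image points (giving Tᵤ) and the world points (giving Tₓ).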
How do we derive constraints from the homography H?
H is equivalent, up to a scale factor c, to the camera matrix times [r1, r2, t], meaning its columns are cKr1, cKr2, cKt.
r1 and r2 are orthogonal unit vectors, and c is an unknown scale factor. This gives us two constraints on K from each H. As we know H, the constraints are on B = K⁻ᵀK⁻¹, a symmetric matrix, and are of the form
h1ᵀBh2 = 0 and h1ᵀBh1 = h2ᵀBh2
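The two constraints can be verified numerically. The intrinsics and pose below are made-up example values; since h1 = cKr1 and h2 = cKr2, the K's cancel inside h1ᵀBh2 and leave r1ᵀr2:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
c, s = np.cos(0.3), np.sin(0.3)          # any valid rotation works
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.5, -0.2, 4.0])

H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # columns Kr1, Kr2, Kt
B = np.linalg.inv(K).T @ np.linalg.inv(K)        # B = K^-T K^-1
h1, h2 = H[:, 0], H[:, 1]

# r1, r2 orthonormal  =>  h1' B h2 = 0  and  h1' B h1 = h2' B h2
```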
How do we estimate K?
Each calibration image gives us a homography, and hence two constraints on B; we need three or more images to get the six or more equations required. K can then be recovered from B.
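One way to see that K is recoverable from B: since B = K⁻ᵀK⁻¹ and K⁻ᵀ is lower triangular, a Cholesky factorisation reads K back out. A sketch with an assumed K (the estimation of B itself is omitted):

```python
import numpy as np

# Assumed ground-truth intrinsics, including a small skew.
K_true = np.array([[800.0,   2.0, 320.0],
                   [  0.0, 780.0, 240.0],
                   [  0.0,   0.0,   1.0]])
B = np.linalg.inv(K_true).T @ np.linalg.inv(K_true)

# B = L L^T with L = K^-T lower triangular, so K = inv(L)^T.
L = np.linalg.cholesky(B)
K_rec = np.linalg.inv(L).T
K_rec /= K_rec[2, 2]          # fix the overall scale so K[2,2] = 1
```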
What is the reprojection error?
The difference between the observed image location of a point and the location predicted by projecting its 3D point through the estimated camera model. Minimising it over all points is what the final refinement step does.
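A minimal sketch of computing it as an RMS distance; the camera parameters and the single test point are the same made-up example values used earlier, so the error here is zero:

```python
import numpy as np

def reprojection_error(K, R, t, X_world, u_observed):
    """RMS distance between observed 2D points and the model's predictions."""
    P = K @ np.hstack([R, t.reshape(3, 1)])
    X_h = np.c_[X_world, np.ones(len(X_world))]     # homogeneous 3D points
    u_h = (P @ X_h.T).T
    u_proj = u_h[:, :2] / u_h[:, 2:3]               # back to pixel coordinates
    return np.sqrt(np.mean(np.sum((u_proj - u_observed) ** 2, axis=1)))

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
err = reprojection_error(K, np.eye(3), np.array([0.0, 0.0, 5.0]),
                         np.array([[1.0, 0.5, 0.0]]),
                         np.array([[480.0, 320.0]]))
```

With real detections the error is non-zero, and minimising it is the usual refinement objective.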