2. Image Transformation Flashcards
What is an image?
An image is a 2D projection of the 3D world.
How do you model a perspective projection with a pinhole camera?
Assuming a virtual image:
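A sketch of the formula, assuming the notation of the homogeneous-coordinates card (focal length f', scene point (x, y, z), image point (x', y')):

```latex
x' = f' \, \frac{x}{z}, \qquad y' = f' \, \frac{y}{z}
```

With a virtual image plane in front of the pinhole there is no sign flip; a real image plane behind the pinhole gives the same magnitudes with negated signs.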
Important: THE PINHOLE IS NOT INFINITELY SMALL, so one point of an object projects onto a small area, not a single point
What are homogeneous coordinates and why do we need them? How do we convert them back to non-homogeneous coords?
- If we project 3D into 2D (assuming a pinhole camera), we do something like:
(x, y, z) -> (x’, y’) where x’ = (f’ * x) / z and y’ = (f’ * y) / z
This is a non-linear transformation (because of the division by z), so we want to make it linear (easier computation, just matrix multiplications). To do that, we use homogeneous vectors, where we append a value of 1 to increase the dimension by one.
To convert back, divide by the last coordinate (the third one for a 2D image point) and drop it.
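The conversion in both directions can be sketched in a few lines. The function names below are made up for illustration, and plain Python tuples are used to keep the example dependency-free (NumPy would be the usual choice):

```python
def to_homogeneous(point):
    """Append a 1 to a Cartesian point, e.g. (x, y) -> (x, y, 1)."""
    return tuple(point) + (1.0,)


def from_homogeneous(point):
    """Divide by the last coordinate and drop it, e.g. (x, y, w) -> (x/w, y/w)."""
    *coords, w = point
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return tuple(c / w for c in coords)


def project_pinhole(point_3d, f):
    """Pinhole projection written as a 3x4 matrix times a homogeneous 4-vector."""
    P = [
        [f, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, 1, 0],
    ]
    X = to_homogeneous(point_3d)
    # Linear step: one matrix-vector product.
    x_h = tuple(sum(P[r][c] * X[c] for c in range(4)) for r in range(3))
    # Non-linear step: the division by z happens only here.
    return from_homogeneous(x_h)


print(project_pinhole((2.0, 4.0, 10.0), f=5.0))  # (1.0, 2.0)
```

The division by z is deferred to the very end, so every step before it can be chained as matrix multiplications.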
What process do we follow to project an object onto a 2D plane using a pinhole camera?
It is a two-step process:
Extrinsic camera transformation takes world into camera coordinates.
Intrinsic camera transformation describes the image formation process.
Explain the extrinsic transformation in detail:
- what is it?
- Idea behind it
- Formulas
It is a transformation of coordinates from world coords into camera coords. We first shift the origin from the world center to the camera center (subtract c), and then apply the rotation of the camera so that the world axes are aligned with the camera axes.
XC = R(XW - c)
- XC - coords of the object in the camera frame
- XW - coords of the object in the world frame
- c - coords of the camera center in the world frame
- R - rotation matrix
- this is in non-homogeneous coords!!!!
Now, to do it in homogeneous coords:
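Expanding XC = R(XW - c) = R XW - R c, one way to write the same relation as a single 4x4 matrix is sketched below. The shorthand t = -Rc is an assumption that matches the (R, t, 0, 1) matrix mentioned in the projection-matrix card:

```latex
\begin{pmatrix} X_C \\ 1 \end{pmatrix}
=
\begin{pmatrix} R & -R\,c \\ 0^\top & 1 \end{pmatrix}
\begin{pmatrix} X_W \\ 1 \end{pmatrix},
\qquad t = -R\,c
```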
How many parameters does the extrinsic transformation need? Which ones?
6 parameters needed
3 for position (x, y, z) of the camera (c)
3 for the rotation of the camera (R), one angle per axis of the 3-axis system
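A minimal sketch of how the six parameters feed into XC = R(XW - c). The helper names and the composition order Rz * Ry * Rx are assumptions made for illustration; other angle conventions are equally valid:

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rotation_matrix(rx, ry, rz):
    """Rotation built from three angles in radians, composed as Rz * Ry * Rx (assumed order)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    Ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    Rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(Rz, matmul(Ry, Rx))


def world_to_camera(X_w, R, c):
    """Apply XC = R (XW - c): shift by the camera center, then rotate."""
    d = [xw - ci for xw, ci in zip(X_w, c)]
    return [sum(R[r][k] * d[k] for k in range(3)) for r in range(3)]


# Six extrinsic parameters: camera center c (3 values) and three rotation angles.
c = (1.0, 2.0, 3.0)
R = rotation_matrix(0.0, 0.0, math.pi / 2)  # 90 degrees about the z-axis

X_c = world_to_camera((2.0, 2.0, 3.0), R, c)
print([round(v, 6) for v in X_c])
```

Here XW - c = (1, 0, 0), and the 90 degree rotation about z maps it onto the y-axis, up to floating-point rounding.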
What’s the difference between perspective projection and orthogonal projection (in the matrix)?
Orthogonal: the 1 in the last row is in the fourth column, so z gets omitted (no division by z)
Perspective: the 1 in the last row is in the third column, so we divide by z
Why is image data sometimes not enough to understand the world?
- low resolution
- sensor noise
- …
What is the principal axis?
It is a line from the camera center perpendicular to the image plane (right angle)
What is the normalized camera coordinate system?
It is a coordinate system whose origin is the camera center and whose z-axis is the principal axis (the line perpendicular to the image plane)
What is a principal point?
It is the point p where the principal axis intersects the image plane. It is also the origin of the normalized coordinate system.
What is the difference between camera coordinate system and image coordinate system?
The camera coordinate system has its origin (center) at the principal point, while the image coordinate system has its origin at a corner (bottom left or upper left; think of how frontends usually handle image coords)
How to account for Principal Point offset in the Calibration Matrix?
With the pinhole camera, the transformation from 3D into 2D multiplied (X, Y, Z, 1) by the K matrix (diag(f, f, 1) with a zero 4th column appended). The problem is that we need to account for the principal point offset so that our origin is in the corner and not in the center. How do we do that?
We modify our K matrix (calibration matrix) and add these offsets (px, py) in the third column. The product then contains the terms Z * px and Z * py, but Z cancels out when we divide by Z (going from homogeneous to non-homogeneous coords), leaving f * X / Z + px and f * Y / Z + py.
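The cancellation of Z can be checked numerically. This is a small sketch with a hypothetical function name and made-up numbers, not a full camera model:

```python
def project_with_offset(X_cam, f, px, py):
    """Project a camera-frame point (X, Y, Z) with K = [[f, 0, px], [0, f, py], [0, 0, 1]]."""
    K = [
        [f, 0, px],
        [0, f, py],
        [0, 0, 1],
    ]
    # K with a zero 4th column applied to (X, Y, Z, 1) equals K applied to (X, Y, Z).
    u, v, w = (sum(K[r][k] * X_cam[k] for k in range(3)) for r in range(3))
    # u = f*X + Z*px, v = f*Y + Z*py, w = Z; dividing by w removes the Z from the offset.
    return (u / w, v / w)


print(project_with_offset((2.0, 4.0, 10.0), f=5.0, px=3.0, py=1.5))  # (4.0, 3.5)
```

Without the offset the same point lands at (1.0, 2.0), so the result is exactly that point shifted by (px, py).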
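How do camera and world frame relate (+ formula)?
XC = R(XW - c): subtract the camera center c from the world point, then rotate by R (the extrinsic transformation from above).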
How do we solve a problem of units (meter -> pixel)?
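By multiplying our calibration matrix from the left with diag(Mx, My, 1), where Mx and My are the pixel magnification factors (pixels per meter in x and y). The size of one pixel is 1/Mx by 1/My.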
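As a rough sketch of the unit conversion, the product diag(Mx, My, 1) * K can be written out directly. The function name and all numbers below are hypothetical (a 5 mm lens with 10 micrometer square pixels):

```python
def calibration_matrix_pixels(f, mx, my, px, py):
    """diag(mx, my, 1) times [[f, 0, px], [0, f, py], [0, 0, 1]], written out entry by entry."""
    return [
        [mx * f, 0.0, mx * px],
        [0.0, my * f, my * py],
        [0.0, 0.0, 1.0],
    ]


# f, px, py in meters; mx, my in pixels per meter (so one pixel is 1/mx by 1/my meters).
K = calibration_matrix_pixels(f=0.005, mx=100_000, my=100_000, px=0.0032, py=0.0024)

for row in K:
    print([round(v, 3) for v in row])
```

The focal length comes out at about 500 pixels and the principal point at about (320, 240) pixels, so every entry of K is now in pixel units.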
What is a calibration matrix?
It is a 3x3 matrix that contains intrinsic parameters:
- principal point coords
- focal length
- pixel magnification factors
- skew (not in formula) for non-rectangular pixels
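As a sketch, assuming the symbols f (focal length), (p_x, p_y) (principal point), m_x and m_y (pixel magnification factors) and s (skew, a symbol introduced here for illustration), the matrix is commonly written as:

```latex
K =
\begin{pmatrix}
m_x f & s & m_x p_x \\
0 & m_y f & m_y p_y \\
0 & 0 & 1
\end{pmatrix}
```

For rectangular pixels s = 0, which is the form obtained from diag(Mx, My, 1) times the metric calibration matrix.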
Explain the perspective projection pipeline (with the formula)
XI - coords of the object in the image (2D)
1st step: Extrinsic, convert from world coords into camera coords using the rotation and translation params
2nd step: Intrinsic, convert from camera coords into image coords using the calibration matrix with a zero 4th column appended.
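Putting the two steps together, and assuming t = -Rc as in the extrinsic transformation XC = R(XW - c), the pipeline can be sketched as:

```latex
X_I \;\sim\; K \, [\, I \mid 0 \,]
\begin{pmatrix} R & -R\,c \\ 0^\top & 1 \end{pmatrix}
\begin{pmatrix} X_W \\ 1 \end{pmatrix}
```

Here ~ means equality up to scale: X_I is a homogeneous 3-vector, and dividing by its last coordinate gives the 2D image coords.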
What is a projection matrix? Explain it
The projection matrix combines the extrinsic and intrinsic transformations into one 3x4 matrix that transforms world coordinates into image coordinates (homogeneous).
It takes the calibration matrix K with a zero 4th column appended and multiplies it with the 4x4 R-t matrix (R, t, 0, 1). This can be simplified into P = K [R | t].
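The construction can be sketched as follows, assuming t = -Rc from the extrinsic transformation. The helper names and all numbers are illustrative only:

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def projection_matrix(K, R, c):
    """Build the 3x4 matrix P = (K with a zero 4th column) * [[R, t], [0, 1]], t = -R c."""
    t = [-sum(R[r][k] * c[k] for k in range(3)) for r in range(3)]
    K_ext = [row + [0.0] for row in K]  # 3x4: K with a zero 4th column
    Rt = [R[r] + [t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]  # 4x4
    return matmul(K_ext, Rt)


def project(P, X_w):
    """Map a 3D world point to 2D image coords: multiply by P, then divide by the last coordinate."""
    X_h = list(X_w) + [1.0]
    u, v, w = (sum(P[r][k] * X_h[k] for k in range(4)) for r in range(3))
    return (u / w, v / w)


K = [[5.0, 0.0, 3.0], [0.0, 5.0, 1.5], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # camera axes aligned with the world
c = [1.0, 1.0, -10.0]  # camera center in the world frame

P = projection_matrix(K, R, c)
print(project(P, (3.0, 5.0, 0.0)))  # (4.0, 3.5)
```

The world point is (2, 4, 10) in the camera frame, so the result matches applying K to that camera-frame point directly.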
What is an orthographic projection?
- When an object is infinitely far away from the pinhole, the rays fall orthogonally onto the image plane (as if there were no pinhole). Depth can be omitted, as well as the focal length.
What is the projection matrix of the parallel projection?
1 0 0 0
0 1 0 0
0 0 0 1
This transforms (x, y, z, 1) into (x, y, 1), i.e. (x, y) in non-homogeneous coords.
Z is just omitted. Pay attention to the position of the 1 in the last row! If the 1 were in the third column instead, we would need to divide by z to get non-homogeneous coords.
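The effect of moving the 1 in the last row can be seen side by side in a short sketch. The perspective matrix below assumes f = 1 for simplicity, and the names are made up for illustration:

```python
def apply(P, X):
    """Multiply a 3x4 matrix with a homogeneous 4-vector, then divide by the last coordinate."""
    u, v, w = (sum(P[r][k] * X[k] for k in range(4)) for r in range(3))
    return (u / w, v / w)


ORTHOGRAPHIC = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],  # last row picks the constant 1: w = 1, z is dropped
]

PERSPECTIVE = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],  # last row picks z: w = z, so we divide by depth
]

X = (2.0, 4.0, 10.0, 1.0)
print(apply(ORTHOGRAPHIC, X))  # (2.0, 4.0)
print(apply(PERSPECTIVE, X))   # (0.2, 0.4)
```

Changing z in X leaves the orthographic result untouched, while the perspective result shrinks as z grows.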