final exam fixed Flashcards
- What is the objective of multiple view geometry?
To understand the 3D structure of a scene given multiple images taken from different perspectives.
- What is the difference between 3D reconstruction and Structure from Motion in multiple view geometry?
3D Reconstruction (Stereo Vision): Assumes known intrinsic (K) and extrinsic (R, T) parameters and recovers the 3D scene using two cameras. Structure from Motion (SfM): Recovers the 3D scene structure and the camera poses simultaneously from multiple images/views (K might be given).
- In stereo vision with parallel cameras, what is the meaning of the following equation? Z = bf/(u1−u2)
Depth (Z) = (baseline (b) × focal length (f)) / disparity (u1−u2), where b is the distance between the cameras, f is the focal length, and the disparity is the difference in image x-coordinates of the same point between the two images.
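The formula can be tried out directly. The sketch below is a minimal illustration, assuming a rectified parallel pair, a focal length expressed in pixels, and a baseline in metres; the function name `stereo_depth` and the example numbers are made up for this card.

```python
def stereo_depth(u1, u2, baseline, focal_length):
    """Depth Z = b*f / (u1 - u2) for a parallel stereo pair.

    u1, u2: x-coordinates (pixels) of the same point in the two images.
    baseline: distance between the cameras (metres, an assumed unit).
    focal_length: focal length in pixels (assumed).
    """
    disparity = u1 - u2
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return baseline * focal_length / disparity


# Hypothetical example: a 5-pixel disparity with b = 0.1 m and f = 500 px.
z = stereo_depth(u1=330.0, u2=325.0, baseline=0.1, focal_length=500.0)
print(f"depth: {z:.2f} m")
```

A larger disparity with the same b and f gives a smaller depth, which is why nearby points shift more between the two images than distant ones.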
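- In stereo vision, what is the potential issue of a small baseline?
A small baseline limits the depth resolution: disparities become small, so a small error in the measured disparity produces a large error in the estimated depth, especially for distant points.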
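To see the effect numerically, the sketch below applies Z = bf/(u1−u2) twice: once with the true disparity and once with a disparity that is one pixel too small. The function name `depth_error`, the focal length of 500 px, the depth of 10 m, and the two baselines are assumptions chosen only for illustration.

```python
def depth_error(baseline, focal_length, true_depth, disparity_error=1.0):
    """Depth overestimate when the measured disparity is too small by disparity_error pixels."""
    true_disparity = baseline * focal_length / true_depth
    measured_disparity = true_disparity - disparity_error
    if measured_disparity <= 0:
        raise ValueError("disparity error exceeds the true disparity")
    measured_depth = baseline * focal_length / measured_disparity
    return measured_depth - true_depth


# Same scene point, same one-pixel error, two hypothetical baselines.
for b in (0.1, 0.5):
    err = depth_error(baseline=b, focal_length=500.0, true_depth=10.0)
    print(f"baseline {b} m: depth error {err:.2f} m")
```

Under these assumed numbers, the shorter baseline yields the larger depth error for the same pixel error.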
- What is the goal of triangulation?
To estimate the 3D coordinates of a point given its 2D projections in multiple images and the corresponding camera poses.
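One simple way to triangulate from two views is the midpoint method: back-project each 2D observation to a ray and take the midpoint of the shortest segment between the two rays. This is only one of several triangulation techniques and is not necessarily the one used in the course. The sketch assumes the camera centers and ray directions are already expressed in a common world frame; all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    # Ray parameters that minimize the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]


# Hypothetical setup: two cameras 1 m apart observing the point (0.5, 0.2, 4).
c1, c2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
target = [0.5, 0.2, 4.0]
d1 = [x - c for x, c in zip(target, c1)]
d2 = [x - c for x, c in zip(target, c2)]
point = triangulate_midpoint(c1, d1, c2, d2)
print(point)
```

With noise-free rays the two rays intersect and the midpoint coincides with the true point; with noisy observations the rays are skew and the midpoint is a compromise between them.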
- What are the benefits of stereo rectification in feature matching?
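Stereo rectification simplifies the search for feature correspondences by making the image planes coplanar, so that the epipolar lines become horizontal and row-aligned. A feature's correspondence then only needs to be searched for along the same image row.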
- What is the epipolar constraint?
The point corresponding to a point in one image must lie on the associated epipolar line in the other image. Equivalently, the three vectors p1, p2, and c1c2 (the baseline between the camera centers) are coplanar.
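The coplanarity condition can be checked numerically as a scalar triple product. The sketch below assumes normalized image coordinates and the convention X2 = R X1 + T between the two camera frames, under which the constraint reads x2 · (T × (R x1)) = 0. The convention and all names are assumptions for illustration, not taken from the cards.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def matvec(M, v):
    return [dot(row, v) for row in M]


def epipolar_residual(x1, x2, R, T):
    """Scalar triple product x2 . (T x (R x1)); zero for a true correspondence."""
    return dot(x2, cross(T, matvec(R, x1)))


# Hypothetical setup: camera 2 is shifted 1 m along x, no rotation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [-1.0, 0.0, 0.0]
X1 = [0.5, 0.2, 4.0]                              # point in camera-1 frame
X2 = [r + t for r, t in zip(matvec(R, X1), T)]    # same point in camera-2 frame
x1 = [v / X1[2] for v in X1]                      # normalized image coordinates
x2 = [v / X2[2] for v in X2]

good = epipolar_residual(x1, x2, R, T)
bad = epipolar_residual(x1, [x2[0], x2[1] + 0.1, 1.0], R, T)
print(good, bad)
```

The true correspondence gives a residual of (numerically) zero, while moving the second point off its epipolar line gives a clearly nonzero value.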
- Why can’t the scale ambiguity be avoided in multiple view geometry with monocular vision?
A single camera cannot tell a small, nearby object from a large, distant one: both the object's distance from the camera and its size are needed to determine the scale, and that requires prior knowledge.
- What is the reprojection error in two-view geometry?
The distance between the observed 2D point and the point obtained by triangulating its 3D position using the estimated (R, T) and then projecting that 3D point back onto the image plane using (R, T) and the intrinsics (K1, K2).
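A minimal sketch of this computation for a single camera is given below. It assumes an ideal pinhole model without lens distortion, intrinsics given as fx, fy, cx, cy (the entries of K), and the convention Xc = R X + T for mapping a world point into the camera frame. Function names and numbers are hypothetical.

```python
import math


def project(X, R, T, fx, fy, cx, cy):
    """Pinhole projection of a 3D point X into pixel coordinates."""
    Xc = [sum(r * x for r, x in zip(row, X)) + t for row, t in zip(R, T)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy


def reprojection_error(observed, X, R, T, fx, fy, cx, cy):
    """Pixel distance between the observed point and the reprojected 3D point."""
    u, v = project(X, R, T, fx, fy, cx, cy)
    return math.hypot(observed[0] - u, observed[1] - v)


# Hypothetical example: identity pose, f = 500 px, principal point (320, 240).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
X = [0.5, 0.2, 4.0]                 # triangulated 3D point
observed = (383.5, 265.0)           # detected feature location in pixels
err = reprojection_error(observed, X, R, T, 500.0, 500.0, 320.0, 240.0)
print(f"reprojection error: {err:.3f} px")
```

In two-view geometry the same quantity is evaluated in both images, each with its own pose and intrinsics, and the errors are combined.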
- What are the possible causes for outliers in two-view geometry?
Changes in scale and perspective, variations in illumination, noise and blur, and occlusions.
- What is the goal of RANSAC?
To estimate an unknown pose from given measurements X, which may contain outliers, without considering the latent variable Z.
- Explain the procedures of RANSAC when applied to line fitting.
Randomly select a minimal subset (two points for a line) and estimate the parameter from it. Count the inliers (points within a distance threshold of the fitted line) and the outliers. Repeat k times and choose the parameter with the smallest number of outliers, assuming enough inliers exist.
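The procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed distance threshold, a fixed iteration count, and distinct input points; the function names, the threshold of 0.5, and the synthetic data are made up for this card. It keeps the model with the most inliers, which is equivalent to the fewest outliers.

```python
import math
import random


def line_through(p, q):
    """Normalized line a*x + b*y + c = 0 through two distinct points."""
    (x1, y1), (x2, y2) = p, q
    a, b = y1 - y2, x2 - x1
    c = x1 * y2 - x2 * y1
    norm = math.hypot(a, b)
    return a / norm, b / norm, c / norm


def ransac_line(points, threshold, iterations, seed=0):
    rng = random.Random(seed)
    best_line, best_inliers = None, -1
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # minimal subset: two points
        a, b, c = line_through(p, q)          # model estimation
        inliers = sum(1 for x, y in points    # points close enough to the line
                      if abs(a * x + b * y + c) <= threshold)
        if inliers > best_inliers:
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Synthetic data: 20 points on y = 2x + 1 plus 6 hand-picked outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(1.0, 30.0), (4.0, -12.0), (7.0, 25.0),
           (10.0, -5.0), (13.0, 40.0), (16.0, 2.0)]
line, n_inliers = ransac_line(points, threshold=0.5, iterations=100)
a, b, c = line
print(f"slope {-a / b:.3f}, intercept {-c / b:.3f}, inliers {n_inliers}")
```

A common follow-up, not shown here, is to refit the line to all inliers of the best model with least squares.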
- What are the three main components needed to implement RANSAC?
Random selection of data, model estimation, and counting the inliers that verify the model.
- In RANSAC, why do we select the smallest number of data points required to determine the unknown parameter?
The fewer points in the sample, the higher the probability that the sample consists entirely of inliers, which gives a better estimate of the parameter within a given number of iterations.
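This can be quantified with the commonly used iteration-count formula k = log(1 − p) / log(1 − w^s), where w is the inlier ratio, s the sample size, and p the desired probability of drawing at least one all-inlier sample. The formula is standard RANSAC background rather than something stated on the card, and the function name and the numbers below are assumptions.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, success_prob=0.99):
    """Iterations needed so that at least one sample is all inliers with probability success_prob."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must be strictly between 0 and 1")
    p_good_sample = inlier_ratio ** sample_size   # chance that one sample is outlier-free
    return math.ceil(math.log(1.0 - success_prob) / math.log(1.0 - p_good_sample))


# With 50% inliers: a 2-point sample (line) versus an 8-point sample.
k_line = ransac_iterations(0.5, sample_size=2)
k_eight = ransac_iterations(0.5, sample_size=8)
print(k_line, k_eight)
```

Because w^s shrinks quickly as s grows, the required number of iterations rises steeply with the sample size, which is the reason for sampling only the minimal number of points.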
- Describe the four steps required for the sequential structure from motion.
Feature detection -> Feature matching/tracking -> Motion estimation -> Local optimization (bundle adjustment)
- What is the difference between front-end and back-end of visual odometry?
Front-end: Handles feature detection, matching, and pose estimation between two frames. Back-end: Refines the poses over multiple frames.
- Describe the goal, the type of correspondence, and the name of the algorithm for bootstrapping.
2D-to-2D (8-point algorithm): Determines the relative pose (up to scale) and the 3D locations of features from correspondences over two views. 3D-to-2D (camera calibration, DLT, and PnP): Determines the absolute pose and the intrinsic parameters from correspondences between 3D feature locations and 2D pixel coordinates. 3D-to-3D (point cloud registration): Determines the relative pose from correspondences between 3D feature locations.
- Describe the goal, the type of correspondence, and the name of the algorithm for localization.
Goal: Determine the pose for each additional view using 3D-to-2D correspondences (DLT, PnP).
- Why do we need the mapping step in visual odometry?
To extend the structure by selecting new keyframes and extracting new features, since the number of tracked features decreases quickly as additional views are added.
- Why do we need the bundle adjustment step in visual odometry?
To refine the 3D structure and camera motion estimates by minimizing the reprojection error across multiple frames.
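Written out, the objective is commonly stated as a nonlinear least-squares problem over all camera poses and 3D points. The notation below is assumed for this sketch and does not come from the cards: x_ij is the observed pixel location of point j in frame i, π is the pinhole projection, and v_ij is 1 if point j is visible in frame i and 0 otherwise.

```latex
\min_{\{R_i, T_i\},\, \{X_j\}} \;
\sum_{i} \sum_{j} v_{ij}
\left\| x_{ij} - \pi\!\left( K \left( R_i X_j + T_i \right) \right) \right\|^2
```

In local (windowed) bundle adjustment the sums run only over the most recent frames and the points they observe.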
- How is visual SLAM different from visual odometry?
Visual SLAM addresses loop detection and closure to guarantee global consistency in addition to visual odometry’s goals of estimating incremental motion and guaranteeing local consistency.
- Describe the characteristics of the indirect method in visual SLAM.
Extracts and matches features, rejects outlier matches with RANSAC, and minimizes the reprojection error. Can handle large relative motion between frames but is slow due to RANSAC.
- Describe the characteristics of the direct method in visual SLAM.
Minimizes the photometric error directly on the image intensities, without extracting features or using RANSAC. Uses all image information for greater robustness and accuracy. Fast because no RANSAC is used, but sensitive to the initial guess and unable to handle large relative motion between frames.
- What is the goal of tracking?
To locate a moving object in consecutive video frames.