w3 w gemini Flashcards
What is a coordinate transformation (in the context of 3D space)?
A change of basis between different reference frames.
What is another term for a coordinate transformation?
A change of basis.
Define a reference frame (or coordinate system) in 3D space.
An arbitrary set of orthogonal axes (X, Y, Z) used to measure the position and orientation of points or items.
What is Visual SLAM (Simultaneous Localisation And Mapping)?
A process used by robots and drones to orient themselves in the real world and create a map of their surroundings.
In Visual SLAM, what coordinate system does the map typically start with?
The robot base’s coordinate system.
In Visual SLAM, what transformation does the robot perform for each new image?
Transforms detected scene points from the camera coordinate system to the robot’s, and then to the world coordinate system.
In multi-view pose estimation, what do we estimate from each camera?
The complete 2D body pose (e.g., skeleton with 2D joints) for each person.
In multi-view pose estimation, how do we obtain an estimate of the 3D body pose?
By converting the 2D estimate into an accurate 3D pose with respect to the camera’s reference frame.
In multi-view pose estimation, what can be chosen as the world reference frame?
The reference frame of one of the cameras, or any arbitrary origin.
What is a convolution mask (or kernel)?
A small matrix used in image processing to perform operations like blurring, sharpening, or edge detection.
What is the purpose of padding an image with zeros before convolution?
To produce an output image of the same size as the input image.
What is a rotation matrix in the context of coordinate transformations?
A 3x3 matrix that describes the rotation needed to align the orientation of one frame to another.
What is a translation vector in the context of coordinate transformations?
A 3x1 vector that describes the translation needed to align the origin of one frame to another.
Explain the concept of ‘extrinsic’ and ‘intrinsic’ rotations (briefly).
Rotations around world axes (extrinsic) vs. rotations around the object’s local axes (intrinsic).
What is aliasing in the context of image down-sampling?
Distortion or misrepresentation of the image due to insufficient sampling. (Downsampling problem of)
How is aliasing avoided when down-sampling images for an image pyramid?
By smoothing the image (convolving with a Gaussian mask) prior to down-sampling.
What is a Gaussian image pyramid?
A multiscale representation of an image at different resolutions obtained by iteratively convolving with a Gaussian filter and down-sampling.
What is a Laplacian image pyramid?
A multiscale representation of an image highlighting intensity discontinuities, obtained by subtracting a Gaussian-smoothed image from the previous level and down-sampling.
Why is it computationally cheaper to keep the mask size fixed and vary the image size in multiscale feature analysis?
Because the number of multiplications required for convolution is significantly reduced.
Give an example of the computational advantage of varying image size over varying mask size in multiscale analysis.
Convolving a 100x100 image with a 6x6 mask requires more multiplications than convolving a 50x50 image with a 3x3 mask.
What is the formula for a 2D Gaussian?
G(x,y) = (1 / (2πσ²)) * exp(-(x² + y²) / (2σ²))
What are the four main categories of image features that can produce intensity-level discontinuities?
Depth discontinuities, Orientation discontinuities, Reflectance discontinuities, Illumination discontinuities.
What is the mathematical operation for convolution?
I’(i, j) = Σk,l I(i+k, j+l)H(−k,−l)
What is the purpose of convolving an image with a Laplacian mask?
To detect edges and regions of rapid intensity change.
What is a Laplacian of Gaussian (LoG) mask?
A convolution mask created by combining a Laplacian mask with a Gaussian mask.
Why is combining a Laplacian mask with a Gaussian mask advantageous for edge detection?
The Gaussian smooths noise, making the Laplacian less sensitive to it while still detecting edges.
What is a Difference of Gaussians (DoG) mask and how is it related to the LoG mask?
It’s another way to approximate the LoG mask, calculated as the difference between two Gaussian-blurred versions of the same image with different standard deviations.
What are the two main approaches for performing multiscale feature analysis?
1) Keep image size fixed and vary mask size. 2) Keep mask size fixed and vary image size.
What is aliasing in the context of image processing?
The distortion or misrepresentation of high-frequency information when sampling at a rate too low to capture it accurately.
How is aliasing typically avoided when down-sampling images?
By applying a low-pass filter (like a Gaussian filter) before down-sampling to remove high-frequency components.
What is a Gaussian image pyramid?
A series of images created by repeatedly smoothing and down-sampling an original image.
What is a Laplacian image pyramid?
A series of images created by taking the difference between adjacent levels of a Gaussian pyramid, capturing the details lost during smoothing.
Explain the concept of ‘separable masks’ in convolution.
A 2D convolution mask is separable if it can be expressed as the convolution of two 1D vectors (a row and a column vector).
Why are separable convolutions more efficient?
They reduce the number of multiplications required for convolution.
What are ‘mask as point-spread functions’?
When convolving a mask with an isolated point, the output is the mask itself, shifted to the point’s location.
What are ‘mask as templates’?
Convolution output is highest when the image region under the mask closely resembles the (rotated) mask.
What are the two main categories of mask weight values?
Smoothing masks and differencing masks.
What are the characteristics of weights in a smoothing mask?
All positive, and sum to 1.
What are the characteristics of weights in a differencing mask?
Contain positive and negative values, and sum to 0.
What is spatial frequency in an image?
The rate of change of image intensity over space. High spatial frequency corresponds to fine details, low to coarse features.
What is a box mask (in image smoothing)?
A simple smoothing mask where all weights are equal (e.g., 1/9 for a 3x3 mask).
Aka mean filter
Why is a Gaussian mask generally better for smoothing than a box mask?
It has a smoother fall-off, giving more weight to nearby pixels and reducing artifacts.
What is the effect of increasing the standard deviation (σ) of a Gaussian smoothing mask?
More blurring, more high frequencies suppressed, noise is more effectively suppressed.
What are Gaussian derivative masks used for?
Edge detection, by emphasizing vertical and horizontal structures.
What is the first step in the Canny edge detector?
Convolution with derivatives of Gaussian masks.
What is the second step in the Canny edge detector?
Calculation of the magnitude and direction of the gradient.
What is the third step in the Canny edge detector?
Non-maximum suppression to thin out edges.
What is the fourth step in the Canny edge detector?
Recursive hysteresis thresholding to link edges.
What is the main drawback of edge detection algorithms?
Results are highly dependent on parameter values, and not all detected edges are meaningful.
What is the general algorithm for feature detection using masks?
Convolve the image with a mask, choose a threshold, and detect the feature where the convolved image exceeds the threshold.
What is the core idea behind template matching across scales?
Convolution can find locations where the image matches a template, but the scale of the feature may be unknown.
What are the two main approaches for finding image features with invariance to scale?
Apply filters of different sizes, or apply a fixed-size filter to images at different scales (image pyramid).
What is the common name for applying filters of a fixed size to an image presented at different sizes?
Image Pyramid.
What is down-sampling?
Decreasing the size of an image.
What is a common problem encountered during down-sampling?
Aliasing.
How can the problem of aliasing be mitigated during down-sampling?
By smoothing the image before down-sampling.
What is the core idea behind Gaussian Pyramid creation?
Repeatedly smooth and down-sample the image.
What is the relationship between the standard deviation of Gaussians used in successive levels of a Gaussian Pyramid?
If smoothing with σ and then again with σ, the result is equivalent to smoothing with √(2)σ.
What is a key property of the Laplacian of Gaussian (LoG) mask regarding intensity discontinuities?
It detects intensity discontinuities (e.g., edges) at all orientations.
How is a Laplacian image pyramid created from a Gaussian pyramid?
By subtracting each level of the Gaussian pyramid from the next higher level (after upsampling the higher level).
What are the key steps involved in linear filtering?
Convolution and using a mask (weights).
Why is edge detection important in computer vision?
It is important for subsequent analysis and can be sufficient for recognizing the content of an image.
What are the main steps in edge detection?
Requires filtering, and involves convolution.
What are the advantages of using Gaussian derivative masks for edge detection compared to the DoG mask?
They tend to produce more robust results.
What is the main idea behind multi-scale feature detection?
To find features that are invariant to scale.
What are the two main types of image pyramids?
Gaussian image pyramids and Laplacian image pyramids.
LoG mask
-1/8 -1/8 -1/8
-1/8 1 -1/8
-1/8 -1/8 -1/8