Past Papers Flashcards
Equation for depth of a 3d point visible from 2 cameras
Z = (f * B)/(x_L - x_R) where f is the focal length of the cameras (assuming it is the same for both), B is the baseline distance between them, and (x_L - x_R) is the disparity
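A minimal worked sketch in Python (all numeric values are made-up examples):

```python
# Depth from stereo disparity: Z = (f * B) / (x_L - x_R)
f = 0.05      # focal length in metres (assumed equal for both cameras)
B = 0.10      # baseline: distance between the two cameras, in metres
x_L = 0.0021  # x coordinate of the point in the left image (metres)
x_R = 0.0011  # x coordinate of the same point in the right image (metres)

disparity = x_L - x_R        # 0.001 m
Z = (f * B) / disparity      # = 5.0 m, the depth of the 3D point
print(f"Depth Z = {Z:.2f} m")
```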
Define image processing
Signal processing applied to an image, with another image as the resulting output
Define the epipolar constraint
We can reduce the search space for the correspondence problem from the whole image to a single line (the epipolar line) by using the geometric relationship between the camera positions for the two images
What is a hypercolumn
A region of V1 that contains neurons covering the full range of RF types for a single spatial location
What is V1
Primary visual cortex
Performs initial low level processing on incoming information (from the LGN)
What is a forward problem
One where we know the causes and want to predict the outcome
What is an inverse problem.
One where we know the outcomes and want to infer the causes
List 5 things that an object recognition algorithm should be insensitive to
Illumination
Occlusion
Viewpoint (orientation, scale, translations)
Non-rigid deformation
Within-category variation
Draw a cross sectional diagram of how a lens forms an image of a point
Equation of projection (pinhole camera)
x = f·X/Z, y = f·Y/Z
Where (X, Y, Z) is the scene point in camera coordinates, (x, y) its image coordinates, and f the distance from the pinhole to the image plane
When asked about depth in relation to time (camera translating parallel to the image plane)
Z = (f * Vx)/ẋ
Where Vx is the velocity of the camera and ẋ is the velocity of the image point
Steps of canny edge detector
1) convolve the image with Gaussian derivative masks in x and y
2) compute the gradient magnitude and direction at each pixel from the two responses
3) non-max suppression: suppress any pixel that has a neighbour, perpendicular to the direction of the edge, with a higher gradient magnitude
4) hysteresis thresholding: keep edges above the high threshold, discard those below the low threshold, and keep in-between edges only if they connect to a strong edge
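A hedged sketch using OpenCV, whose cv2.Canny bundles steps 2-4 (the blur size, sigma and thresholds below are example choices, and the image path is a placeholder):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Smooth to reduce noise, then run Canny. 50/150 are the hysteresis
# thresholds: edges above 150 are kept, edges below 50 are discarded,
# and in-between edges are kept only if connected to a strong edge.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)
edges = cv2.Canny(blurred, 50, 150)
```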
Math model used to simulate receptive fields of cortical simple cells
Gabor(x, y) = exp(−(x'² + γ²·y'²)/(2σ²)) · cos(2π·x'/λ + ψ)
where x' = x·cos θ + y·sin θ and y' = −x·sin θ + y·cos θ
(θ = orientation, λ = wavelength, ψ = phase, σ = width of the Gaussian envelope, γ = aspect ratio)
How do we simulate complex cells
Repeating convolution with the Gabor mask but varying orientation, phase, spatial frequency etc.
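A sketch of such a filter bank using OpenCV's cv2.getGaborKernel; the kernel size and parameter values are example choices, and the max-over-orientations pooling at the end is one simple stand-in for a complex-cell-like response:

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# One Gabor mask per orientation; sigma, lambd, gamma are example values
responses = []
for theta in np.arange(0, np.pi, np.pi / 8):          # 8 orientations
    kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(img, -1, kernel))   # simple-cell-like maps

# Crude complex-cell-like pooling: max response over orientations per pixel
complex_like = np.max(np.stack(responses), axis=0)
```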
Describe the role of mid level vision
Group together image elements that belong together
And segment them from other image elements
Starting difference between region growing and region merging
Merging: each pixel begins with a unique label; uses final marking; the region mean is used for comparison
Growing: each pixel begins unlabelled; doesn't use final marking; individual neighbouring pixels are used for comparison
What is the correspondence problem
The problem of finding the same 3D point or location in 2 (or more) images
For coplanar cameras, if the baseline distance (B) between the cameras increases, how does this affect the accuracy of measuring the depth of a 3D point
This increases disparity, which increases accuracy
In a feature based solution to the correspondence problem, explain what is meant by descriptor and detector
A detector is a method used to locate points of interest or image features
A descriptor is a vector for identified points/features to be used for comparison with potential matches
What is RANSAC
RANdom SAmple Consensus:
1) randomly sample the minimum number of data points needed to fit the model
2) fit the model to this sample
3) test all other data points against the fitted model
4) count the number of inliers (the consensus set)
5) repeat 1-4 for N trials and choose the parameters that fit best overall (the best fit is the one with the highest support, i.e. the largest consensus set)
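A minimal NumPy sketch of these steps for 2D line fitting (trial count and inlier tolerance are example choices; ransac_line is a hypothetical helper name):

```python
import numpy as np

def ransac_line(points, n_trials=100, inlier_tol=1.0, seed=0):
    """Fit a line y = m*x + c to an (N, 2) array of points with RANSAC."""
    rng = np.random.default_rng(seed)
    best_model, best_support = None, -1
    for _ in range(n_trials):
        # 1) randomly sample the minimum number of points for a line: 2
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate (vertical) sample; try again
        # 2) fit the model to this sample
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # 3) test all other points against the fitted model
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        # 4) count inliers: the consensus set
        support = int(np.sum(residuals < inlier_tol))
        # 5) keep the parameters with the highest support
        if support > best_support:
            best_model, best_support = (m, c), support
    return best_model, best_support
```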
Cross correlation formula for vectors a and b
CC(a, b) = Σ_i a_i·b_i
Correlation coefficient formula for vectors a and b
r = Σ_i (a_i − ā)(b_i − b̄) / √( Σ_i (a_i − ā)² · Σ_i (b_i − b̄)² )
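Both formulas in NumPy, on made-up example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # made-up example vectors
b = np.array([2.0, 4.0, 6.0, 8.0])

cross_corr = np.sum(a * b)           # CC(a, b) = sum_i a_i * b_i

# Correlation coefficient: mean-subtract, then normalise
a0, b0 = a - a.mean(), b - b.mean()
r = np.sum(a0 * b0) / np.sqrt(np.sum(a0**2) * np.sum(b0**2))

assert np.isclose(r, np.corrcoef(a, b)[0, 1])  # matches NumPy's built-in
```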
What is the difference between top down and bottom up
Top down: coming from internal knowledge, prior experience
Bottom up: coming from image properties
These approaches are not mutually exclusive
Gestalt laws
Bottom up factors:
Proximity
Similarity
Closure
Continuity
Common fate
Symmetry
Common region
Connectivity
What does the LGN do
LGN cells have centre surround RFs
Traditionally viewed as just relaying info from retina to cortex
Recent evidence suggests it does more
On centre off surround is
Activates if the centre is brighter than the surround
Explain how ganglion cells in the eye and simple cells in v1 work together to detect edges
Ganglion cells are centre surround
Retina is over-represented (every area of the retina is part of the receptive field of several ganglion cells)
With several centre surround lined up, their combination shows edges
Example of function of complex cells in v1
Combine input from several simple cells, so they can detect patterns that are combinations of the individual simple-cell detections
Describe how lateral connections in v1 can explain some gestalt laws
Lateral connections connect together areas of v1 that deal with adjacent areas of the visual field
If adjacent cells detect something similar, eg a line segment, this can stand out more than 2 separate segments
If the segments were end-on, this would be continuity
If they were side by side, this would be similarity
What is required of a mask for it to have no effect on intensity
All elements in the mask sum to 1
Example of difference mask
[1, -1]
Positive x direction
Laplacian mask is? Used to?
A difference mask in every direction
Used to detect discontinuities in intensity in every direction
Advantage and disadvantage of laplacian mask? How to mitigate dis
Advantage- good at detecting discontinuities
Disadvantage: perhaps too good; it is most sensitive to a single pixel that stands out from its neighbours, so it amplifies noise, leading to many false positives during edge detection
To mitigate, apply an averaging filter before the Laplacian; this is typically done in one step with the Laplacian of Gaussian (a Gaussian convolved with the Laplacian)
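A sketch of the mitigation using SciPy's gaussian_laplace (the image, noise level and sigma are made-up examples):

```python
import numpy as np
from scipy import ndimage

# Synthetic image: a bright square on a dark background, plus noise
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
img += 0.05 * rng.standard_normal(img.shape)

lap = ndimage.laplace(img)                      # raw Laplacian: noisy
log = ndimage.gaussian_laplace(img, sigma=2.0)  # LoG: smoothed first
```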
Splitting and merging pseudocode
Splitting:
Start with whole image as 1 region
If the region's pixels are not all similar, split it into 4 quadrants
Repeat until all regions are homogenous
Merging
Compare each region to its neighbouring regions, merge all that are similar
Continue until no more regions can merge
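A minimal sketch of the splitting stage as a recursive quadtree (split is a hypothetical helper; the homogeneity test here is a simple standard-deviation threshold, one possible choice):

```python
import numpy as np

def split(img, x, y, w, h, tol, regions):
    """Recursively split a region into quadrants until homogeneous."""
    patch = img[y:y + h, x:x + w]
    if patch.std() <= tol or min(w, h) <= 1:   # homogeneous (or tiny): stop
        regions.append((x, y, w, h))
        return
    hw, hh = w // 2, h // 2                    # split into 4 quadrants
    split(img, x,      y,      hw,     hh,     tol, regions)
    split(img, x + hw, y,      w - hw, hh,     tol, regions)
    split(img, x,      y + hh, hw,     h - hh, tol, regions)
    split(img, x + hw, y + hh, w - hw, h - hh, tol, regions)

img = np.zeros((64, 64))
img[:32, :] = 1.0                  # two homogeneous halves
regions = []
split(img, 0, 0, 64, 64, tol=0.1, regions=regions)
# A merging pass would then compare neighbouring regions (e.g. by their
# means) and fuse those that are similar, until no more merges are possible.
```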
Disadvantage of splitting and merging
Works well only if regions are fairly homogeneous; otherwise many spurious regions are created
Explain how a CCD forms an RGB image
A CCD is an array of MOS capacitors that accumulate charge from light proportionally to light intensity
Individual sensing elements are made sensitive to R, G and B by placing a colour filter between the light source and the element
Relating pixels to camera coordinates
x_pix = O_x + α·x, y_pix = O_y + β·y, with (x, y) the image-plane coordinates of the point
Where (O_x, O_y) is the coordinates (in pixels) of the image principal point
α is the magnification factor in the x direction
β is the magnification factor in the y direction
What is a horopter
An imaginary surface on which all points have 0 disparity
Compare image sampling mechanisms used in eye to camera
Camera:
- sensitive to 3 wavelength RGB
- sensing elements occur in fixed ratio across whole image plane
- sampling density is uniform across whole image plane
Eye:
- sensitive to 4 wavelength ranges: RGB (the three cone types) plus the rods
- sensing elements occur in variable ratios across the image plane (cone density highest at the fovea, rod density highest outside the fovea)
- sampling density is non uniform across image plane
Explain the difference between ‘view centred’ and ‘object centred’ approaches to object recognition
View centred:
3d object is modelled as a set of 2d images of different views of the object
Object centred:
Single 3d model used to describe object
If asked to derive the thin lens equation
Make sure to use similar triangles (arriving at 1/|z| + 1/|z'| = 1/f)
How do you change focus of a camera
The focal length of the lens is fixed
Therefore we change the distance between the lens and the image plane
−∂²/∂y² = ?
The 3 × 1 column mask [−1, 2, −1]ᵀ
When convolving a mask with another
Flip the smaller mask (rotate it by 180°)!
Steps of agglomerative hierarchical clustering
1) start with each data point in its own cluster
2) compute the distance between every pair of clusters
3) merge the two closest clusters
4) repeat 2-3 until only one cluster remains (or the desired number of clusters is reached)
Define the aperture problem and suggest how to mitigate it
Direction of motion of a small image patch can be ambiguous
Particularly for an edge: the direction of motion is only recoverable perpendicular to the edge
Overcome by using info from multiple sensors or by giving preference to image locations where image structure provides unambiguous information about optic flow (eg corners)
Calculating depth when camera is moving along optical axis and given image dimensions/central point
Convert the given pixel coordinates into displacements from the image centre, then apply Z = (x1 * Vz)/ẋ (see the card below)
2 constraints applied to correspondence problem for video (note limitations)
Spatial coherence (neighbouring points are assumed to have similar optical flow):
Fails at discontinuities between surfaces at different depths
Small motion (optical flow vectors are assumed to have small magnitude):
Fails if relative motion is fast or the frame rate is low
Monocular cues to depth
Interposition/Occlusion
Size familiarity
Texture gradients
Linear perspective
Aerial perspective
Shading
Normalised cross correlation formula
NCC(a, b) = Σ_i a_i·b_i / √( Σ_i a_i² · Σ_i b_i² )
1st derivative masks
[−1, 1] for the x direction, and its transpose for the y direction
2nd derivative masks
[1, −2, 1] for the x direction, and its transpose for the y direction (combining both directions gives the Laplacian)
Define focus
All rays of light from a scene point converge in a single image point
Define focal length
An intrinsic property of a lens related to its shape: specifically, the distance from the lens at which the optical axis intersects refracted rays of light that were travelling parallel to the optical axis before passing through the lens
Define focal range
The range of object locations such that blurring due to the difference between the receptor plane and the focal plane is less than the resolution of the receptor device
For a pinhole camera, at what distance should the image plane be placed to bring an object into focus
Use the thin lens equation (with moduli): 1/|z| + 1/|z'| = 1/f, and solve for the image-plane distance |z'|
In image formation, what properties of an image are determined by radiometric parameters and what by geometric parameters
Radiometric parameters tend to determine intensity/colour:
Illumination, surface reflectance, sensor wavelength properties
Geometric properties determine where on the image a scene point appears:
Camera position & orientation in space
Camera optics
Forming DoG mask for On Centre Off surround
Subtract the mask with the larger σ from the mask with the smaller σ
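A sketch of building such a mask with OpenCV (kernel size and the two sigmas are example choices):

```python
import cv2
import numpy as np

def gaussian_2d(ksize, sigma):
    g = cv2.getGaussianKernel(ksize, sigma)  # ksize x 1 Gaussian column
    return g @ g.T                           # outer product -> 2D mask

# On-centre off-surround: small-sigma (centre) minus large-sigma (surround)
dog = gaussian_2d(21, 2.0) - gaussian_2d(21, 4.0)

# Centre is excitatory (positive), surround inhibitory (negative)
assert dog[10, 10] > 0 and dog[0, 0] < 0
```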
Decomposing an 11 × 11 mask involves
A 1 × 11 row vector and an 11 × 1 column vector
The row vector is convolved with every valid position in the image (11 multiplications + 10 additions per mask placement), and then the column vector is convolved with the result in the same way
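A sketch verifying the separable decomposition with SciPy (the Gaussian kernel is one example of a separable mask):

```python
import numpy as np
from scipy import ndimage, signal

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))

row = signal.windows.gaussian(11, std=2.0)   # 1 x 11 row vector
full_mask = np.outer(row, row)               # the full 11 x 11 mask

# Full 2D convolution: ~121 multiply-adds per pixel
out_2d = ndimage.convolve(img, full_mask, mode="constant")

# Separable version: row pass then column pass, ~22 multiply-adds per pixel
tmp = ndimage.convolve1d(img, row, axis=1, mode="constant")
out_sep = ndimage.convolve1d(tmp, row, axis=0, mode="constant")

assert np.allclose(out_2d, out_sep)          # identical results
```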
How many operations are involved in convolving an image with a DoG
A DoG is the difference of 2 Gaussians
If G = the number of operations to convolve the image with one (separated) Gaussian
and D = the size of the image, i.e. the number of operations in the pixelwise subtraction of one convolved image from the other
then the total is 2G + D
Single link
Min distance
Complete link
Max distance
Group average
Average distance
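A hedged sketch of these three linkage criteria via scipy.cluster.hierarchy (synthetic example data):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),   # two synthetic 2D clusters
               rng.normal(5, 0.5, (10, 2))])

Z_single   = linkage(X, method="single")      # min inter-cluster distance
Z_complete = linkage(X, method="complete")    # max inter-cluster distance
Z_average  = linkage(X, method="average")     # mean inter-cluster distance

labels = fcluster(Z_single, t=2, criterion="maxclust")  # cut to 2 clusters
```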
Calculating depth from 2 images given image centre
Z = (x1 * Vz)/ẋ
Where x1 is the original x displacement from the centre
Vz is the speed of the camera moving along the z/optical axis
ẋ = (x2 − x1)/time
What is the sliding window approach to object recognition
Sliding window applies a classifier (usually a deep NN) to image patches
Tolerance is achieved by training the classifier to recognise the object despite changes in appearance, and by using different shapes and sizes of image patch (see the sketch below)
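A minimal sketch of the window loop (classify is a hypothetical stand-in for a trained classifier; window size and stride are example choices):

```python
import numpy as np

def classify(patch):
    # Hypothetical stand-in for a trained classifier (e.g. a deep NN);
    # here just a dummy brightness test returning a score in [0, 1]
    return float(patch.mean() > 0.5)

def sliding_windows(img, win, stride):
    """Yield (x, y, patch) for every window position in a greyscale image."""
    h, w = img.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, img[y:y + win, x:x + win]

img = np.zeros((128, 128))
img[40:80, 40:80] = 1.0                       # a bright "object"
detections = [(x, y) for x, y, p in sliding_windows(img, 32, 8)
              if classify(p) > 0.5]
# Running this over several window sizes/shapes adds scale tolerance.
```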