Exam Preparation Deck Flashcards
What is the advantage of using a 3D face representation in face detection? Any disadvantages?
- Pose/viewing angle/illumination invariance may be achieved.
- Enormous computational/memory requirements (up to 1GB per face)
- Amounts to inverse optics: Have to generate 3D face from 2D images.
Why does the world seem to be uniformly coloured and resolved, even though only the fovea can do such detections?
- Internal visual representation built from multiple fovea frames over time.
- Supports “vision is graphics”: Human vision is a result of a complex graphical process. It is not a direct encoding of input signals.
- Shows the importance of data integration over time.
What computation methods are often used to find the active contours?
- Gradient descent
- Simulated annealing
- Partial differential equations
- Iterative numerical methods
Describe the two major ways of motion detection.
How do they relate spatial and temporal gradients of the image?
- Ratio of local time-derivative to spatial gradient gives estimate of local image velocity.
- Time derivative of Laplacian-Gaussian-convolved image in the vicinity of Laplacian zero-crossing. Amplitude gives speed, sign gives direction (relative to contour normal).
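A minimal sketch of the first method, assuming a 1D pattern that translates without changing brightness (so dI/dt + v * dI/dx = 0). The function names and the test pattern are hypothetical, chosen only for illustration.

```python
import math

def intensity(x, t, v=2.0):
    # Hypothetical 1D pattern translating rightwards at speed v (pixels/frame).
    return math.sin(0.1 * (x - v * t))

def velocity_estimate(x, t, h=1e-3):
    # Central-difference estimates of the temporal and spatial derivatives.
    dI_dt = (intensity(x, t + h) - intensity(x, t - h)) / (2 * h)
    dI_dx = (intensity(x + h, t) - intensity(x - h, t)) / (2 * h)
    # Brightness constancy: dI/dt + v * dI/dx = 0, so v = -(dI/dt) / (dI/dx).
    return -dI_dt / dI_dx

v_hat = velocity_estimate(x=5.0, t=1.0)
```

The estimate is unreliable wherever the spatial gradient is close to zero, since the ratio is then ill-conditioned.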

Define the 3x3 Laplacian operator.
How is it used? Why is the sum of all taps 0?
Used for edge detection.
Gives no response to areas of uniform brightness.
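The card does not write the kernel out. As an illustration only, the sketch below uses one common 3x3 discrete Laplacian (the 4-neighbour form; other variants exist, and the course may define a different one). It shows that taps summing to zero give no response on a uniform region, while a step edge produces a response that changes sign.

```python
# One common 3x3 discrete Laplacian (4-neighbour form); an assumed variant.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def apply_kernel(image, kernel):
    """Apply a 3x3 kernel at every interior pixel (no padding)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out

tap_sum = sum(sum(row) for row in LAPLACIAN)

uniform = [[7] * 5 for _ in range(5)]            # constant brightness
step = [[0, 0, 0, 9, 9, 9] for _ in range(5)]    # vertical step edge

flat_response = apply_kernel(uniform, LAPLACIAN)
edge_response = apply_kernel(step, LAPLACIAN)
```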

State the compression rate of MPEG (both interframe and intraframe).
How does MPEG compress videos?
Both 50-50%.
Extracted object motions, so predictions of trajectories are possible. This allows a mode of compression.
Describe the three Hadamard conditions for well-posed problem.
- Its solution exists.
- Its solution is unique.
- Its solution depends continuously on input.
What is functional streaming?
- The division-of-labor within the mammalian brain.
- Seems to have different streams for different image processing, such as color/texture processing.
- Different parts of the brain specialize in specific tasks.
- But how do they get integrated?
Describe the reflectance map.
- Relates intensities of image to surface orientations of objects.
- Specifies the fraction of incident light reflected, per unit surface area per unit solid angle in camera direction.
- Specified by three parameters: i (illuminant angle), e (emitted ray angle), g (angle between illuminant and emitted ray)
Describe Bayesian inference. How does it work?
Define:
- Prior probability
- Posterior probability
Drawing inferences from data. Takes account of two major sources of information:
- Prior knowledge, usually defined as unconditioned probabilities.
- Conditional probabilities on class conditional data.
The Bayes’ rule often used…. Here P(C_k) is prior.

What does the inner/outer plexiform layer of the mammalian retina do? What’s its purpose?
Outer layer:
- Performs spatial centre/surround comparisons
- Uses on-centre / off-surround isotropic receptive field structures.
- Can be seen as edge detection, or some kind of bandpass filtering.
Inner layer:
- Similar function in time. Sensitive to motion or dynamic aspect of images.
What’s the advantage of second order differential operator… over first order ones?
- First-order operators detect edges in a polarity-sensitive way: positive for an edge that brightens to the right, negative for one that brightens to the left.
- Second order ones have the advantage of producing zero-crossings at an edge.
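A small 1D sketch of the difference, using finite differences as stand-ins for the derivatives. The signals and helper names are hypothetical.

```python
def diff(signal):
    # Forward difference: a discrete stand-in for d/dx.
    return [b - a for a, b in zip(signal, signal[1:])]

def has_sign_change(values):
    # True if the non-zero samples switch sign somewhere (a zero-crossing).
    nonzero = [v for v in values if v != 0]
    return any(a * b < 0 for a, b in zip(nonzero, nonzero[1:]))

rising = [0, 0, 0, 1, 3, 5, 6, 6, 6]   # smooth dark-to-bright edge
falling = rising[::-1]                 # the same edge with opposite polarity

first_rising = diff(rising)            # non-negative everywhere
first_falling = diff(falling)          # non-positive everywhere
second_rising = diff(first_rising)     # changes sign at the edge
second_falling = diff(first_falling)   # also changes sign at the edge
```

The first difference changes sign with the edge polarity, whereas the second difference crosses zero at the edge in both cases.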
What does the expression ‘signal-to-symbol’ converter mean?
- To the human body, the external world exists as physical signals on sensory surfaces.
- Vision converts them into high-level symbols, which are easier for humans to understand and manipulate.
- Shows why computer vision is hard. It cannot be done merely by signal processing methods.
- There has to be a bridge between ‘signal’ and ‘symbol’.
Define the correspondence problem.
Establishing the point-to-point correspondences in two different images.
State the advantages of Fourier Transform in Computer Vision.
- Convolution can be made more efficient, given kernels of size > 5x5.
- Texture detection can be done by Fourier analysis, as textures are well defined by spatial frequency and orientation characteristics.
- Motion can be detected, by exploiting the spectral co-planarity theorem.
What type of filter is Laplacian of Gaussian?
What is its spatial frequency bandwidth?
It is a bandpass filter.
The bandwidth is approximately 1.3 octaves.
Describe error rate of eigenface algorithm.
Why is it that high?
43% to 50% when large changes of illumination… or taken after one year.
The lack of fundamental invariances is its major flaw.
Name these three terms:
- P(C|x)
- P(x|C)
- P(C)
- P(C|x): Posterior probability of class C, given observation of input x. Outcome of the Bayesian calculation.
- P(x|C): Class conditional likelihood. How likely it is that x will be observed if the object belongs to class C. Requires expert knowledge.
- P(C): Prior. The plausibility of hypothesis C. P(C|x) can be used as the new prior iteratively.
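A minimal sketch of how the three terms combine under Bayes' rule, P(C_k|x) = P(x|C_k) P(C_k) / sum_j P(x|C_j) P(C_j). The class names and probabilities are made up for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(C|x) for every class C, given P(C) and P(x|C)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(x), the normalising term
    return {c: joint[c] / evidence for c in joint}

# Hypothetical two-class example.
priors = {"face": 0.2, "non_face": 0.8}        # P(C)
likelihoods = {"face": 0.9, "non_face": 0.1}   # P(x|C)

post = posterior(priors, likelihoods)

# The posterior can serve as the prior for a second, independent observation.
post2 = posterior(post, likelihoods)
```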
List three methods for extracting 3D shapes.
- Use of stereo camera
- Shape-from-shading inference
- Projection of structured light
- Laser range finding
- Extrapolation from images taken from different angles
Note that (2) requires precise control of the incident ray. Geometric properties have to be known.
Explain why the number of fibres in the feedback projection is ten times more than the count of fibres bringing data up from the retina.
- Supports the theory of hermeneutical cycle.
- Vision is a hypothesis generation and testing process.
- Graphical models are constructed in the brain about external world.
- Graphics are then shaped, constrained by 2D retinal image data.
To construct a face model in 3D, both a shape model and a texture model have to be extracted.
What is a texture model? How is it used?
- The photographic appearance itself, expressed in shape model coordinates.
- Possible to project texture onto the shape, thus generating models of face in different poses.
Define the decidability of a decision task.
(Hint: ROC curve problem)
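The answer is not written out on this card. One commonly used definition, assumed here, is d' = |mu_1 - mu_2| / sqrt((sigma_1^2 + sigma_2^2) / 2): the separation of the two class distributions' means, measured in units of their pooled standard deviation. A minimal sketch with hypothetical score lists:

```python
import math
import statistics

def decidability(scores_a, scores_b):
    """d' under the assumed definition: mean separation over pooled std dev."""
    mu_a, mu_b = statistics.mean(scores_a), statistics.mean(scores_b)
    var_a = statistics.pvariance(scores_a)
    var_b = statistics.pvariance(scores_b)
    return abs(mu_a - mu_b) / math.sqrt((var_a + var_b) / 2)

# Hypothetical similarity scores for two classes.
well_separated = decidability([0, 2], [9, 11])
overlapping = decidability([0, 2], [1, 3])
```

A larger d' means the two distributions overlap less, so a decision threshold between them makes fewer errors.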

Describe the property of a specular surface.
It is mirror-like, obeying the law of reflection (angle of incidence equals angle of reflection).
How does the reflectance map impact face recognition?
On more Lambertian surfaces, the uniform reflectance may confound recognition (shape is hard to detect).
How do you measure the distance d of an object, given:
- Focal length f of lenses.
- Base distance b between optical centres.
- Disparities (\alpha, \beta) in the image projections, relative to image centre.
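The answer is not written out on this card. Under the usual assumptions (two identical cameras with parallel optical axes, and alpha and beta measured as image offsets in opposite directions from each image centre), similar triangles give d = f * b / (alpha + beta). A minimal sketch, with a hypothetical function name and example values:

```python
def stereo_distance(f, b, alpha, beta):
    """Distance to a point from its two image offsets (same length units as f)."""
    disparity = alpha + beta
    if disparity <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("total disparity must be positive")
    return f * b / disparity

# Hypothetical values: 50 mm lenses, 100 mm baseline, offsets in metres.
d = stereo_distance(f=0.05, b=0.1, alpha=0.001, beta=0.0015)
```

Distance is inversely proportional to disparity, so distant objects yield small disparities and less precise estimates.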

Describe SIFT (Scale Invariant Feature Transform)
- Build a Gaussian pyramid in scale space by successively smoothing and subsampling.
- Dominant orientations of features are detected by oriented edge detectors at varying scales.
- Low contrast candidate points and edges are discarded.
- Bins of orientation histograms are normalized relative to dominant gradient direction. (Achieves rotational invariance)
Describe the operations of the Canny edge detector.
- Smooth the image with a Gaussian kernel.
- Compute gradient vector field over the image.
- Apply non-maximum suppression to eliminate spurious edges. An edge should be represented by a single point only.
- Double threshold: Label edges as strong/weak/suppressed.
- Connectivity constraint: Track edges along the image. Weak edges that are not connected to strong ones are eliminated.
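The last two steps (double threshold plus connectivity) can be sketched as below. This is an illustration only: it assumes gradient magnitudes that have already been through non-maximum suppression, uses 8-connectivity, and the function name and threshold values are hypothetical.

```python
from collections import deque

def hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels connected to a strong one."""
    rows, cols = len(magnitude), len(magnitude[0])
    keep = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel (magnitude >= high).
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                keep[r][c] = True
                queue.append((r, c))
    # Grow outwards through weak pixels (magnitude >= low), 8-connected.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not keep[nr][nc]
                        and magnitude[nr][nc] >= low):
                    keep[nr][nc] = True
                    queue.append((nr, nc))
    return keep

mag = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],   # one strong pixel, two attached weak ones, one isolated
    [0, 0, 0, 0, 0],
]
edges = hysteresis(mag, low=4, high=8)
```

In this example the two weak pixels adjacent to the strong one survive, while the isolated weak pixel at the end of the row is eliminated.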
When defining and selecting which features to extract in a pattern classification system, what is the goal for the statistical clustering behaviour of the data in terms of the variances within and amongst the different classes?
- Features which minimise the within-class variability and maximise the between-class variability.
- Allows diameters of clusters in feature space to be small compared with the spacings amongst the clusters.
- Minimizes overlap and thus classification errors.