Exam Preparation Deck Flashcards by Ryan Lau

What is the advantage of using a 3D face representation in face detection? Any disadvantages?

Advantage: pose/viewing angle/illumination invariance may be achieved.
Disadvantage: enormous computational/memory requirements (up to 1GB per face).
Disadvantage: it amounts to inverse optics, since the 3D face has to be generated from 2D images.

Why does the world seem to be uniformly coloured and resolved, even though only the fovea can do such detections?

Internal visual representation built from multiple fovea frames over time.
Supports “vision is graphics”: Human vision is a result of a complex graphical process. It is not a direct encoding of input signals.
Shows the importance of data integration over time.

What computation methods are often used to find the active contours?

Gradient descent
Simulated annealing
Partial differential equations
Iterative numerical methods

Describe the two major ways of motion detection.

How do they relate spatial and temporal gradients of the image?

The ratio of the local time-derivative to the spatial gradient gives an estimate of local image velocity.
The time derivative of the Laplacian-of-Gaussian-convolved image is measured in the vicinity of a Laplacian zero-crossing. Its amplitude gives speed and its sign gives direction (relative to the contour normal).
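
The first method can be sketched in one dimension. This is a minimal illustration, assuming brightness constancy (a pattern that translates without changing), in which case the velocity is minus the ratio of the time derivative to the spatial gradient. The moving Gaussian blob, the function names, and the chosen pixel are all hypothetical.

```python
import math

def frame(t, velocity=2.0, width=64, sigma=10.0):
    """Hypothetical test signal: a Gaussian blob moving right at `velocity` pixels/frame."""
    centre = 20.0 + velocity * t
    return [math.exp(-((x - centre) ** 2) / (2 * sigma ** 2)) for x in range(width)]

def local_velocity(prev_frame, this_frame, next_frame, x):
    """Estimate image velocity at pixel x as -(dI/dt) / (dI/dx)."""
    dI_dt = (next_frame[x] - prev_frame[x]) / 2.0          # central difference in time
    dI_dx = (this_frame[x + 1] - this_frame[x - 1]) / 2.0  # central difference in space
    if abs(dI_dx) < 1e-9:
        raise ValueError("spatial gradient too small for a stable estimate")
    return -dI_dt / dI_dx

frames = [frame(t) for t in range(3)]
estimate = local_velocity(frames[0], frames[1], frames[2], x=26)
print(round(estimate, 2))  # should be close to the true velocity of 2 pixels/frame
```

The estimate is only approximate because finite differences stand in for true derivatives, and it becomes unstable wherever the spatial gradient is near zero.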

Define the 3x3 Laplacian operator.

How is it used? Why is the sum of all taps 0?

Used for edge detection.

Gives no response to areas of uniform brightness.
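
As a small illustration, one common 3x3 discrete Laplacian is the 4-neighbour form below (an 8-neighbour variant with -8 at the centre also exists; the card does not say which form is intended, so this choice is an assumption). Because the taps sum to zero, a uniform patch produces zero output, while a step edge produces a positive/negative pair.

```python
# One common 3x3 Laplacian kernel (4-neighbour form); the taps sum to 0.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def apply_kernel(image, kernel):
    """Apply a 3x3 kernel to the interior pixels of a 2D list image."""
    height, width = len(image), len(image[0])
    out = [[0] * (width - 2) for _ in range(height - 2)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            out[r - 1][c - 1] = sum(
                kernel[i][j] * image[r - 1 + i][c - 1 + j]
                for i in range(3) for j in range(3)
            )
    return out

uniform = [[7] * 5 for _ in range(5)]                 # constant brightness
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]      # vertical step edge

print(apply_kernel(uniform, LAPLACIAN))   # all zeros
print(apply_kernel(step, LAPLACIAN)[0])   # [0, 10, -10, 0]
```

The sign change between the 10 and the -10 marks the edge location.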

State the compression rate of MPEG (both interframe and intraframe).

How does MPEG compress videos?

Both 50-50%.

Object motions are extracted, so trajectories can be predicted. This prediction allows a mode of compression.

Describe the three Hadamard conditions for a well-posed problem.

Its solution exists.
Its solution is unique.
Its solution depends continuously on input.

What is functional streaming?

The division-of-labor within the mammalian brain.
Seems to have different streams for different image processing, such as color/texture processing.
Different parts of the brain specialize in specific tasks.
But how do they get integrated?

Describe the reflectance map.

Relates intensities of image to surface orientations of objects.
Specifies the fraction of incident light reflected, per unit surface area per unit solid angle in camera direction.
Specified by three parameters: i (illuminant angle), e (emitted ray angle), g (angle between illuminant and emitted ray)

Describe Bayesian inference. How does it work?

Define:

Prior probability
Posterior probability

Drawing inferences from data. Takes account of two major sources of information:

Prior knowledge, usually defined as unconditioned probabilities.
Class-conditional probabilities of the data.

Bayes’ rule is often used…. Here P(C_k) is the prior.
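
For reference, the standard form of Bayes' rule for a class C_k and an observation x can be written as follows. The index j and the sum over all classes in the denominator are notation assumed here, not taken from the card.

```latex
P(C_k \mid x) = \frac{P(x \mid C_k)\, P(C_k)}{P(x)},
\qquad
P(x) = \sum_{j} P(x \mid C_j)\, P(C_j)
```

Here P(C_k | x) is the posterior, P(x | C_k) is the class-conditional likelihood, and P(x) normalises the posteriors so that they sum to 1.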

What does the inner/outer plexiform layer of the mammalian retina do? What’s its purpose?

Outer layer:

Performs spatial centre/surround comparisons
Uses on-centre / off-surround isotropic receptive field structures.
Can be seen as edge detection, or some kind of bandpass filtering.

Inner layer:

Similar function in time. Sensitive to motion or dynamic aspect of images.

What’s the advantage of second order differential operator… over first order ones?

First order operators detect edges in a polarity-sensitive way: positive for an edge rising to the right, negative for an edge rising to the left.
Second order ones have the advantage of producing zero-crossings at an edge.
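
A one-dimensional sketch may make the contrast concrete. The sample values and helper names below are hypothetical. The first difference changes sign with edge polarity, whereas the second difference crosses zero at the same position for both polarities.

```python
def first_diff(signal):
    """Forward first difference: s[i+1] - s[i]."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]

def second_diff(signal):
    """Second difference s[i-1] - 2*s[i] + s[i+1], zero-padded at both ends."""
    inner = [signal[i - 1] - 2 * signal[i] + signal[i + 1]
             for i in range(1, len(signal) - 1)]
    return [0] + inner + [0]

def zero_crossings(values):
    """Indices where the sequence changes sign, either through an exact zero or between neighbours."""
    hits = []
    for k in range(1, len(values) - 1):
        if values[k] == 0 and values[k - 1] * values[k + 1] < 0:
            hits.append(k)
        elif values[k] * values[k + 1] < 0:
            hits.append(k)
    return hits

rising = [0, 0, 0, 1, 5, 9, 10, 10, 10]   # blurred dark-to-bright edge
falling = rising[::-1]                    # the same edge with opposite polarity

print(first_diff(rising))                   # non-negative responses
print(first_diff(falling))                  # non-positive responses
print(zero_crossings(second_diff(rising)))  # [4]
print(zero_crossings(second_diff(falling))) # [4]
```

Both polarities yield a zero-crossing at index 4, the centre of the edge, so the edge can be localised without thresholding a signed response.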

What does the expression ‘signal-to-symbol’ converter mean?

To the human body, the external world exists as physical signals on sensory surfaces.
Vision converts them into high-level symbols, which are easier for humans to understand and manipulate.
Shows why computer vision is hard. It cannot be done merely by signal processing methods.
There has to be a bridge between ‘signal’ and ‘symbol’.

Define the correspondence problem.

Establishing the point-to-point correspondences in two different images.

State the advantages of Fourier Transform in Computer Vision.

Convolution can be made more efficient, given kernels of size > 5x5.
Texture detection can be done by Fourier analysis, as textures are well defined by spatial frequency and orientation characteristics.
Motion can be detected, by exploiting the spectral co-planarity theorem.
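
The first point rests on the convolution theorem: convolution in the image domain equals pointwise multiplication in the Fourier domain. The sketch below, assuming NumPy is available, checks that the two routes agree on a random image. The function names and array sizes are illustrative only, and no timing claim is made here.

```python
import numpy as np

def conv2d_direct(image, kernel):
    """Full 2D linear convolution by explicit shift-and-add."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += kernel[i, j] * image
    return out

def conv2d_fft(image, kernel):
    """Same full convolution via the FFT: zero-pad, multiply spectra, invert."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((7, 7))

same = np.allclose(conv2d_direct(image, kernel), conv2d_fft(image, kernel))
print(same)  # True
```

The direct route costs work proportional to the kernel area per pixel, whereas the FFT route does not grow with kernel size in the same way, which is why it pays off for larger kernels.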

What type of filter is Laplacian of Gaussian?

What is its spatial frequency bandwidth?

It is a bandpass filter.

The bandwidth is approximately 1.3 octaves.

Describe the error rate of the eigenface algorithm.

Why is it that high?

43% to 50% when there are large changes of illumination… or when images are taken after one year.

The lack of fundamental invariances is its major flaw.

Name these three terms:

P(C|x)
P(x|C)
P(C)

P(C|x): Posterior probability of class C, given observation of input x. Outcome of the Bayesian calculation.
P(x|C): Class conditional likelihood. How likely x will be observed, if object belongs to class C. Requires expert knowledge.
P(C): Prior. The plausibility of hypothesis C. P(C|x) can be used as the new prior iteratively.
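
The three terms, and the iterative reuse of the posterior as the next prior, can be sketched as follows. The class names and probability values are made up for illustration, and reusing the same likelihoods for a second observation assumes the observations are conditionally independent given the class.

```python
def posterior(priors, likelihoods):
    """Return P(C|x) for every class C, given P(C) and P(x|C)."""
    evidence = sum(priors[c] * likelihoods[c] for c in priors)  # P(x)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}

priors = {"face": 0.5, "non-face": 0.5}        # P(C), hypothetical values
likelihoods = {"face": 0.8, "non-face": 0.2}   # P(x|C), hypothetical values

first = posterior(priors, likelihoods)   # posterior after one observation
second = posterior(first, likelihoods)   # posterior reused as the new prior

print(first)
print(second)
```

Each update sharpens the belief in the class that explains the observations better.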

List three methods for extracting 3D shapes

1. Use of stereo camera
2. Shape-from-shading inference
3. Projection of structured light
4. Laser range finding
5. Extrapolation from images taken from different angles

Note that (2) requires precise control of the incident ray. Geometric properties have to be known.

Explain why the number of fibres in the feedback projection is ten times more than the count of fibres bringing data up from the retina.

Supports the theory of the hermeneutical cycle.
Vision is a hypothesis generation and testing process.
Graphical models of the external world are constructed in the brain.
These graphics are then shaped and constrained by 2D retinal image data.

To construct a face model in 3D, both a shape model and a texture model have to be extracted.

What is a texture model? How is it used?

The photographic appearance itself, expressed in shape model coordinates.
Possible to project texture onto the shape, thus generating models of face in different poses.

Define the decidability of a decision task.

(Hint: ROC curve problem)
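
The card gives only a hint. One widely used measure of decidability for a two-class decision is d', the separation of the two class means divided by the square root of the average of the two variances. Whether this is the exact definition the card intends is an assumption. The sample scores below are hypothetical.

```python
import math
import statistics

def decidability(scores_a, scores_b):
    """d' = |mean_a - mean_b| / sqrt((var_a + var_b) / 2), using population variances."""
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    var_a, var_b = statistics.pvariance(scores_a), statistics.pvariance(scores_b)
    return abs(mean_a - mean_b) / math.sqrt((var_a + var_b) / 2.0)

same_class = [0.1, 0.2, 0.3]    # hypothetical dissimilarity scores, same class
other_class = [0.7, 0.8, 0.9]   # hypothetical dissimilarity scores, different classes

d_prime = decidability(same_class, other_class)
print(round(d_prime, 3))
```

A larger d' means the two score distributions overlap less, so a threshold between them gives fewer false accepts and false rejects, which corresponds to an ROC curve that bends further towards the ideal corner.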

Describe the property of a specular surface.

It is mirror-like, obeying the law of reflection: the angle of incidence equals the angle of reflection.

How does the reflectance map impact face recognition?

For more Lambertian surfaces, the uniform reflectance may confound recognition. (Shape will be hard to detect)

How do you measure the distance d of an object, given:

* Focal length f of lenses.
* Base distance b between optical centres.
* Disparities (\alpha, \beta) in the image projections, relative to image centre.
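
The card does not include the answer. Under the usual parallel-camera triangulation geometry, and assuming the two disparities are measured in opposite directions from each image centre so that they add, the distance is d = f b / (\alpha + \beta). The function name and numeric values below are illustrative assumptions.

```python
def stereo_distance(f, b, alpha, beta):
    """Distance by triangulation: d = f * b / (alpha + beta).

    Assumes parallel optical axes, with alpha and beta measured in the same
    units as f and in opposite directions from each image centre.
    """
    total_disparity = alpha + beta
    if total_disparity <= 0:
        raise ValueError("total disparity must be positive")
    return f * b / total_disparity

# Hypothetical values in metres: 50 mm lenses, 10 cm baseline.
d = stereo_distance(f=0.05, b=0.1, alpha=0.001, beta=0.0015)
print(round(d, 3))  # 2.0
```

Distance is inversely proportional to total disparity, so depth resolution degrades quickly for far objects, where the disparity is small.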

Describe SIFT (Scale Invariant Feature Transform)

* Build a **Gaussian pyramid** in scale space by successively smoothing and subsampling.
* Dominant orientations of features are detected by **oriented edge detectors** at varying scales.
* Low contrast candidate points and edges are **discarded**.
* Bins of orientation histograms are normalized relative to dominant gradient direction. (Achieves **rotational** invariance)

Describe the operations of the Canny edge detector

1. **Smooth** the image with a Gaussian kernel.
2. Compute the **gradient vector field** over the image.
3. Apply **non-maximum suppression** to eliminate spurious edges. Each edge should be represented by a single point only.
4. **Double threshold**: Label edges as strong/weak/suppressed.
5. **Connectivity constraint**: Track edges along the image. Weak edges that are not connected to strong ones are eliminated.

When defining and selecting which features to extract in a pattern classification system, what is the goal for the **statistical clustering behaviour** of the data in terms of the variances within and amongst the different classes?

* Features which minimise the **within-class variability** and maximise the **between-class variability**.
* Allows **diameters** of clusters in feature space to be **small** compared with the spacings amongst the clusters.
* **Minimizes overlap** and thus classification errors.
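
One way to quantify this goal for a single feature and two classes is a Fisher-style ratio: the squared distance between class means divided by the sum of the within-class variances. The choice of this particular criterion and the sample values are assumptions made for illustration.

```python
import statistics

def fisher_ratio(class_a, class_b):
    """Between-class separation over within-class spread for one scalar feature."""
    between = (statistics.mean(class_a) - statistics.mean(class_b)) ** 2
    within = statistics.pvariance(class_a) + statistics.pvariance(class_b)
    return between / within

# Hypothetical feature values for two classes.
good_feature = fisher_ratio([1, 2, 3], [11, 12, 13])   # tight, well-separated clusters
poor_feature = fisher_ratio([1, 5, 9], [4, 8, 12])     # wide, overlapping clusters

print(good_feature, poor_feature)
```

A feature with a higher ratio gives clusters whose diameters are small relative to their spacing, so it is the better candidate for extraction.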