Lecture 6 - Object Detection Flashcards

Question 1

Q

What is “Recognition”?

Answer

A

One to many matching - matching one object to many objects

Question 2

Q

What is “Verification”?

Answer

A

One to one matching - matching one object to another to see if they are the same

Question 3

Q

What is Categorisation?

Answer

A

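Assigning an object to a class (the type of object it is). It has to cope with both interclass variation (differences between classes) and intraclass variation (differences among objects of the same class).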

Question 4

Q

What is the difference between detection and recognition?

Answer

A

Detection finds instances of a class (any face); recognition identifies a particular instance (a specific person).
– Are there faces in this image? (binary decision, no localization)
– Where are faces in this image? (face detection)
– Is your person of interest present in this image? (again a binary decision)
– Where is your person of interest? (face recognition)
– What are these people doing? (activity or event recognition)
REFER TO SLIDES

Question 5

Q

How does scale of detection work?

Answer

A

We can have nested detections
– Detect face
– Detect features such as eye corners, nose tip etc

Question 6

Q

What should the algorithms we design be capable of?

Answer

A

– Classifying images or videos
– Detect and localize objects in an image
– Estimate semantic and geometrical attributes
– Classify human activity and events

Question 7

Q

What is semantic vision?

Answer

A

In general, semantics concerns the extraction of meaning from data. Semantic vision seeks to understand not only what objects are present in an image but, perhaps even more importantly, the relationship between those objects.

The ability to attribute relationships between objects demonstrates reasoning, an important step towards true “cognition”.
=====
Semantic vision can transform visual images into descriptions of the world; providing a more robust foundation for change tracking.

Question 8

Q

What are the challenges of detection and recognition?

Answer

A

Shape and Appearance Variations even in a class
Viewpoint Variations
Illumination
Background Clutter
Scale
Occlusion
A single image can present several of these challenges at once, or only one.

Question 9

Q

Recognition and detection in the world - what works today?

Answer

A

Reading license plates, zip codes, checks
Fingerprint recognition
Face detection
Recognition of flat textured objects (CD covers, book covers, etc)
REFER TO DEEPFACE EXAMPLE

Question 10

Q

What is the object recognition pipeline?

Answer

A

Similar to supervised learning -> REFER TO SLIDES FOR DIAGRAM

Question 11

Q

What are the two primary characteristics for object recognition?

Answer

A

shape and appearance

Question 12

Q

How can shapes be modelled with Principal Component Analysis (PCA)?

Answer

A

REFER TO SLIDES 31 - 60
1. Center the data
2. Calculate the covariance matrix
3. Calculate the Eigenvalues
4. Calculate the Eigenvectors
5. Order the eigenvectors by decreasing eigenvalue
6. Calculate the principal components by projecting the centered data onto the leading eigenvectors
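
The six steps can be sketched in NumPy roughly as follows. This is a minimal illustration, not the version from the slides: the function name `pca`, the toy data, and the choice of `np.linalg.eigh` (which returns eigenvalues and eigenvectors together, so steps 3 and 4 happen in one call) are assumptions made here.

```python
import numpy as np

def pca(X, n_components):
    """PCA on X with shape (n_samples, n_features)."""
    # 1. Center the data
    mean = X.mean(axis=0)
    Xc = X - mean
    # 2. Covariance matrix of the features
    cov = np.cov(Xc, rowvar=False)
    # 3-4. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 5. Order the eigenvectors by decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 6. Principal components: project onto the leading eigenvectors
    components = eigvecs[:, :n_components]
    scores = Xc @ components
    return mean, eigvals, components, scores

# Toy 2D points lying close to the line y = x (made-up numbers)
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]])
mean, eigvals, components, scores = pca(X, n_components=1)
print(components[:, 0])  # direction of maximum variance, roughly along y = x
```

For shape models, each row of X would hold the stacked landmark coordinates of one training shape.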

Question 13

Q

PCA and Eigenfaces

Answer

A

REFER TO SLIDES

Question 14

Q

How is Reconstruction using PCA done?

Answer

A

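A face is reconstructed as the mean face plus a weighted sum of eigenfaces, where the weights come from projecting the centered face onto each eigenface.
Only selecting the top P eigenfaces reduces the dimensionality.
Fewer eigenfaces result in more information loss, and hence less discrimination between faces.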
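
A small sketch of the effect of P, assuming random toy vectors in place of real face images (the helper `reconstruct` and the data are made up for illustration). It projects each sample onto the top P principal directions, maps back, and measures the reconstruction error.

```python
import numpy as np

def reconstruct(X, P):
    """Project rows of X onto the top-P principal directions and map back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:P]]  # shape (n_features, P)
    weights = Xc @ top                               # P coefficients per sample
    return mean + weights @ top.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))  # 20 toy "faces" with 6 "pixels" each

# Mean squared reconstruction error for P = 1..6
errors = [np.mean((X - reconstruct(X, P)) ** 2) for P in range(1, 7)]
print(errors)  # shrinks as P grows; effectively zero when all 6 are kept
```

With real face images there are far more pixels than training faces, so the covariance matrix is usually not formed directly; the sketch ignores that and only shows the trade-off between P and information loss.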

Question 15

Q

What are some issues with PCA?

Answer

A

PCA finds directions of maximum variance of the data.
This may not separate classes at all.
Basic PCA is also sensitive to noise and outliers (read other variants e.g. Robust PCA).
Linear Discriminant Analysis (LDA) instead finds the direction along which between-class distance is maximum relative to the within-class scatter.
Sometimes PCA is followed by LDA to combine the advantages of both.

Question 16

Q

What is a colour histogram?

Answer

A

A colour histogram is a type of appearance feature

Colour stays constant under geometric transformations
Colour is a local feature
– It is defined for each pixel
– It is robust to partial occlusion
Idea:
– can use object colours directly for recognition, or
– better – use statistics of object colours
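
One way to collect "statistics of object colours" is a normalised joint RGB histogram. The sketch below is an assumption-laden illustration: the function name, the 4 bins per channel, and the pixel values are made up, and pixels are given as plain (R, G, B) tuples with values in 0..255.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of (R, G, B) pixels with values in 0..255."""
    n = bins_per_channel
    hist = [0.0] * n ** 3
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..n-1
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
    total = len(pixels)
    return [count / total for count in hist]

pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 200, 30)]
hist = colour_histogram(pixels)

# The histogram ignores where each pixel sits, so rearranging pixels changes nothing
rearranged = colour_histogram(list(reversed(pixels)))
print(hist == rearranged)  # True
```

Because pixel positions are discarded, the histogram is unchanged when the object is rotated or translated in the image, which is the sense in which colour "stays constant under geometric transformations".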

Question 17

Q

What is RGB?

Answer

A

Primaries are monochromatic lights
– for camera: Bayer filter pattern (half green, one quarter red and one quarter blue)
– for monitors: they correspond to the 3 types of phosphors

Question 18

Q

What are the 3 colour models?

Answer

A

RGB (red, green, blue) colour model is the most popular way to mix and create colours
CMYK (cyan, magenta, yellow, key) commercial printers
HSV (hue, saturation, value) in the colour picker of the graphics software

Question 19

Q

What is the colour space, specifically CIE XYZ?

Answer

A

Links physical pure colours (i.e. wavelengths) in the visible part of the electromagnetic spectrum to the physiologically perceived colours of human colour vision.
Primaries 𝑋, 𝑌, and 𝑍 are imaginary, but the matching functions are everywhere positive

Question 20

Q

What is HSV/HSB?

Answer

A

HSV - Hue, Saturation, Value (Brightness)
* HSV is closer to how humans perceive colour.
* Describes colours (hue or tint) in terms of their shade (saturation or amount of grey) and their brightness value.
* Nonlinear – reflects topology of colours by coding hue as an angle

Question 21

Q

What is colour normalisation?

Answer

A

One component of the 3D colour space is intensity
– If a colour vector is multiplied by a scalar, the intensity changes but not the colour itself.
– This means colours can be normalized by the intensity, removing the brightness effect which may vary depending on lighting conditions, cameras, and other factors.
– Note: intensity is given by 𝐼 = (𝑅 + 𝐺 + 𝐵)/3
REFER TO SLIDES FOR OTHER FORMULAS OF R G AND B
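
A minimal sketch of the idea. The slide formulas are not reproduced in these notes, so the normalisation below (each channel divided by R + G + B, often called chromaticity coordinates) is an assumption about what they contain; the function names and pixel values are made up.

```python
def intensity(rgb):
    """I = (R + G + B) / 3"""
    r, g, b = rgb
    return (r + g + b) / 3

def normalise(rgb):
    """Divide each channel by R + G + B so that only the colour remains."""
    total = sum(rgb)
    if total == 0:
        return (0.0, 0.0, 0.0)  # black: a convention chosen for this sketch
    return tuple(c / total for c in rgb)

bright = (200, 100, 50)
dim = (100, 50, 25)  # the same colour vector multiplied by 0.5

print(intensity(bright), intensity(dim))  # different intensities
print(normalise(bright), normalise(dim))  # same normalised colour
```

The normalised components sum to 1, so only two of them are independent.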

Question 22

Q

Object Recognition based on Colour Histograms

Answer

A

Objects are identified by matching a colour histogram from an image region with a colour histogram from a sample of the object.
The technique has been shown to be remarkably robust to:
– changes in object’s orientation
– changes of scale of the object
– partial occlusion, and
– changes of viewing position and direction.
REFER TO SLIDES FOR EXAMPLES

Question 23

Q

What are some comparison measures?

Answer

A

Euclidean distance
Chi-Square distance
KL (Kullback–Leibler divergence)/Jeffreys divergence
EMD (Earth Movers Distance)

Question 24

Q

What is Euclidean distance?

Answer

A

Motivation of the Euclidean distance:
– Focuses on the differences between the histograms.
– Interpretation: distance in the feature space.
– Range: [0, ∞).
– All cells are weighted equally.
– Not very robust to outliers!
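
As a sketch, with histograms given as equal-length lists of bin values (the function name and numbers are made up):

```python
import math

def euclidean_distance(h1, h2):
    """Square root of the sum of squared bin differences; every bin counts equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

h1 = [0.5, 0.5, 0.0]
h2 = [0.5, 0.0, 0.5]

print(euclidean_distance(h1, h1))  # 0.0 for identical histograms
print(euclidean_distance(h1, h2))  # sqrt(0.5), about 0.707
```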

Question 25

Q

What is Chi-Square distance?

Answer

A

Motivation of the 𝜒^2 distance:
– Statistical background
– Test if two distributions are different.
– Possible to compute a significance score.
– Range: [0, ∞).
– Cells are not weighted equally!
– More robust to outliers than the Euclidean distance, if the histograms contain enough observations
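
Several variants of the 𝜒^2 distance are in use, and the slides' formula is not reproduced here. The sketch below assumes one common form, the sum over bins of (a − b)^2 / (a + b), and skips bins that are empty in both histograms; some definitions add a factor of 1/2.

```python
def chi_square_distance(h1, h2):
    """Sum over bins of (a - b)^2 / (a + b), skipping bins empty in both histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

h1 = [0.5, 0.5, 0.0]
h2 = [0.5, 0.0, 0.5]
print(chi_square_distance(h1, h2))  # 0 + 0.5 + 0.5 = 1.0

# The same absolute difference of 0.1 counts more in a sparsely filled bin
full_bin = chi_square_distance([0.5], [0.4])    # 0.01 / 0.9
sparse_bin = chi_square_distance([0.1], [0.0])  # 0.01 / 0.1
print(full_bin < sparse_bin)  # True
```

The division by a + b is what makes the cells unequally weighted.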

Question 26

Q

What measure is the best?

Answer

A

– It depends on the application
– Euclidean distance is often not robust enough.
– Generally, 𝜒^2 distance gives good performance for histograms
– KL (Kullback–Leibler divergence)/Jeffreys divergence works well sometimes, but is expensive
– EMD (Earth Movers Distance) is the most powerful, but also very expensive.

Question 27

Q

Object Recognition Using Histograms Algorithm

Answer

A

REFER TO SLIDES

Question 28

Q

What is the Machine learning framework?

Answer

A

Training data consists of data samples and the target vectors
Learning / Training: Machine takes training data and automatically learns mapping from data samples to target vectors
Test data
– Target vectors are concealed from the machine
– Machine predicts the target vectors based on previously learned model
– Accuracy can be evaluated by comparing the predicted vectors to the actual vectors
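
The train / predict / evaluate cycle can be sketched with a deliberately tiny model. Everything here is made up for illustration: the model just stores one mean value per class, and the data are 1D toy samples.

```python
def train(samples, targets):
    """Learn one mean value per class from 1D training samples."""
    sums, counts = {}, {}
    for x, t in zip(samples, targets):
        sums[t] = sums.get(t, 0.0) + x
        counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}

def predict(model, x):
    """Predict the class whose learned mean is closest to x."""
    return min(model, key=lambda t: abs(model[t] - x))

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual targets."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

train_x = [1.0, 1.2, 0.8, 3.0, 3.3, 2.9]
train_t = ["small", "small", "small", "large", "large", "large"]
test_x = [0.9, 3.1, 2.2, 1.7]
test_t = ["small", "large", "large", "large"]  # concealed until evaluation

model = train(train_x, train_t)
predicted = [predict(model, x) for x in test_x]  # test targets are not used here
score = accuracy(predicted, test_t)
print(predicted, score)  # the last sample is misclassified, so accuracy is 0.75
```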

Question 29

Q

What is classification?

Answer

A

Assign input vector to one of two or more classes
Any decision rule divides input space into decision regions separated by decision boundaries

Question 30

Q

What is the Nearest Neighbour Classifier?

Answer

A

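Assigns a query point the label of its closest training sample.
Partitioning of feature space for two-category 2D data using 1-nearest-neighbour
* A Voronoi diagram is a partition of a plane into regions close to each of a given set of objects; the 1-NN decision regions are unions of the Voronoi cells of the training points.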
REFER TO SLIDES FOR FORMULA
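
A minimal 1-nearest-neighbour sketch, assuming Euclidean distance and made-up 2D points (the slide formula is not reproduced here). It needs Python 3.8 or later for `math.dist`.

```python
import math

def nearest_neighbour(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    distances = [math.dist(p, query) for p in train_points]
    return train_labels[distances.index(min(distances))]

train_points = [(1.0, 1.0), (2.0, 1.5), (6.0, 5.0), (7.0, 6.5)]
train_labels = ["A", "A", "B", "B"]

print(nearest_neighbour(train_points, train_labels, (1.5, 1.2)))  # A
print(nearest_neighbour(train_points, train_labels, (6.5, 6.0)))  # B
```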

Question 31

Q

What are some practical matters with K-NN?

Answer

A

Choosing the value of k
– If too small, sensitive to noise points
– If too large, neighbourhood may include points from other classes
– Solution: cross-validation
===
Can produce counter-intuitive results
– Each feature may have a different scale (e.g. Height & Weight)
– Solution: normalize each feature to zero mean, unit variance
===
Curse of dimensionality
When the dimensionality increases, the volume of the space increases so fast that the available data become sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality.
– Solution: no good solution exists so far
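
The scale problem and its fix can be sketched as follows. The height (m) and weight (kg) values, the labels, and the helper names are all made up; the classes here differ only by height, but the raw distance is dominated by weight because its numbers are much larger.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def standardise(rows):
    """Rescale each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    scaled = [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]
    return scaled, mus, sigmas

def knn_predict(train, labels, query, k):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

train = [(1.50, 50), (1.52, 90), (1.90, 52), (1.92, 88)]  # (height, weight)
labels = ["X", "X", "Y", "Y"]
query = (1.88, 90)

raw = knn_predict(train, labels, query, k=1)

scaled_train, mus, sigmas = standardise(train)
scaled_query = tuple((v - m) / s for v, m, s in zip(query, mus, sigmas))
scaled = knn_predict(scaled_train, labels, scaled_query, k=1)

print(raw, scaled)  # X on raw features, Y after standardising
```

The query is scaled with the training means and standard deviations, not its own. Choosing k by cross-validation would wrap `knn_predict` in a loop over candidate values of k and held-out subsets of the training data.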

Question 32

Q

Linear and Non-Linear SVM

Answer

A

REFER TO SLIDES - Discriminatory and SVM