Lecture 6 - Object Detection Flashcards
What is “Recognition”?
- One to many matching - matching one object to many objects
What is “Verification”?
- One to one matching - matching one object to another to see if they are the same
What is Categorisation?
Assigning an object to a category (class). This involves both inter-class differences (between categories) and intra-class variation (between objects of the same category).
What is the difference between detection and recognition?
Are there faces in this image? (binary decision, no localization)
* Where are faces in this image? (face detection)
Is your person of interest present in this image? (again binary decision)
* Where is your person of interest? (face recognition)
* What are these people doing? (activity or event recognition)
REFER TO SLIDES
How does scale of detection work?
We can have nested detections
– Detect face
– Detect features such as eye corners, nose tip etc
What should the designed algorithms be capable of?
– Classifying images or videos
– Detecting and localising objects in images
– Estimating semantic and geometrical attributes
– Classifying human activities and events
What is semantic vision?
In general, semantics concerns the extraction of meaning from data. Semantic vision seeks to understand not only what objects are present in an image but, perhaps even more importantly, the relationship between those objects.
The ability to attribute relationships between objects demonstrates reasoning, an important step towards true “cognition”.
=====
Semantic vision can transform visual images into descriptions of the world; providing a more robust foundation for change tracking.
What are the challenges of detection and recognition?
Shape and appearance variations, even within a class
Viewpoint Variations
Illumination
Background Clutter
Scale
Occlusion
A single image can present several of these challenges at once, or only one.
Recognition and detection in the world - what works today?
Reading license plates, zip codes, checks
Fingerprint recognition
Face detection
Recognition of flat textured objects (CD covers, book covers, etc)
REFER TO DEEPFACE EXAMPLE
What is the object recognition pipeline?
Similar to supervised learning -> REFER TO SLIDES FOR DIAGRAM
What are the two primary characteristics for object recognition?
shape and appearance
How can shapes be modelled with Principal Component Analysis (PCA)?
REFER TO SLIDES 31 - 60
1. Center the data
2. Calculate the covariance matrix
3. Calculate the Eigenvalues
4. Calculate the Eigenvectors
5. Order the eigenvectors by decreasing eigenvalue
6. Calculate the principal components by projecting the centred data onto the leading eigenvectors
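The six steps above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's implementation: the function name `pca`, the toy data, and the use of `np.linalg.eigh` (which returns eigenvalues and eigenvectors together, covering steps 3 and 4 in one call) are assumptions made for the example.

```python
import numpy as np

def pca(data, num_components):
    """Return (mean, components, eigenvalues, projected) for row-wise samples."""
    # 1. Centre the data
    mean = data.mean(axis=0)
    centred = data - mean
    # 2. Covariance matrix (features x features)
    cov = np.cov(centred, rowvar=False)
    # 3-4. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 5. Order by decreasing eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # 6. Principal components: project onto the leading eigenvectors
    components = eigenvectors[:, :num_components]
    projected = centred @ components
    return mean, components, eigenvalues, projected

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 two-dimensional points stretched along the first axis
points = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])
mean, components, eigenvalues, projected = pca(points, num_components=1)
```

With this toy data the first component should point almost exactly along the stretched axis, and the variance of the projected data equals the largest eigenvalue.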
PCA and Eigenfaces
REFER TO SLIDES
How is Reconstruction using PCA done?
Only selecting the top P eigenfaces reduces the dimensionality.
Fewer eigenfaces result in more information loss, and hence less discrimination between faces
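A small sketch of this trade-off, assuming random vectors as a stand-in for flattened face images (the data, the helper name `reconstruct`, and the chosen values of P are all hypothetical). Each sample is projected onto the top P eigenvectors and mapped back, and the mean squared reconstruction error is recorded.

```python
import numpy as np

def reconstruct(data, num_components):
    """Project row-wise samples onto the top eigenvectors and map them back."""
    mean = data.mean(axis=0)
    centred = data - mean
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centred, rowvar=False))
    top = eigenvectors[:, np.argsort(eigenvalues)[::-1][:num_components]]
    weights = centred @ top        # P coefficients per sample
    return mean + weights @ top.T  # approximate reconstruction

rng = np.random.default_rng(1)
# Hypothetical stand-in for flattened face images: 50 samples, 20 "pixels" each
faces = rng.normal(size=(50, 20))
errors = {p: float(np.mean((faces - reconstruct(faces, p)) ** 2)) for p in (2, 10, 20)}
```

The error shrinks as P grows and is zero (up to rounding) when all 20 eigenvectors are kept, which is the information loss the card describes.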
What are some issues with PCA?
PCA finds directions of maximum variance of the data.
This may not separate classes at all.
Basic PCA is also sensitive to noise and outliers (read other variants e.g. Robust PCA).
Linear Discriminant Analysis (LDA) finds the direction along which the between-class distance is maximal relative to the within-class spread.
Sometimes PCA is followed by LDA to combine the advantages of both.
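To illustrate why maximum variance need not separate classes, the sketch below builds two hypothetical classes that are widely spread along x but separated only along y. It compares the first PCA direction with the two-class Fisher LDA direction, computed here as the within-class scatter inverse applied to the difference of class means. The data and function names are assumptions for the example.

```python
import numpy as np

def first_principal_direction(data):
    """Unit vector along the direction of maximum variance (PCA)."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigenvectors[:, np.argmax(eigenvalues)]

def fisher_direction(class_a, class_b):
    """Two-class Fisher LDA direction: S_w^{-1} (mean_b - mean_a), normalised."""
    mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
    centred_a, centred_b = class_a - mean_a, class_b - mean_b
    within_scatter = centred_a.T @ centred_a + centred_b.T @ centred_b
    direction = np.linalg.solve(within_scatter, mean_b - mean_a)
    return direction / np.linalg.norm(direction)

rng = np.random.default_rng(2)
spread = np.array([5.0, 0.3])  # large variance along x, small along y
class_a = rng.normal(size=(100, 2)) * spread + np.array([0.0, -1.0])
class_b = rng.normal(size=(100, 2)) * spread + np.array([0.0, 1.0])

pca_dir = first_principal_direction(np.vstack([class_a, class_b]))
lda_dir = fisher_direction(class_a, class_b)
```

Here `pca_dir` lies close to the x axis (where the classes overlap completely), while `lda_dir` lies close to the y axis (where they separate).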
What is a colour histogram?
A colour histogram is a type of appearance feature
Colour stays constant under geometric transformations
Colour is a local feature
– It is defined for each pixel
– It is robust to partial occlusion
Idea:
– can use object colours directly for recognition, or
– better – use statistics of object colours
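As a rough sketch of "statistics of object colours", the code below builds a normalised joint RGB histogram with NumPy. The bin count, the function name, and the random test image are assumptions for illustration. Rotating the image by 90 degrees leaves the histogram unchanged, which illustrates the point about geometric transformations.

```python
import numpy as np

def colour_histogram(image, bins_per_channel=8):
    """Normalised joint RGB histogram of an H x W x 3 image with values in 0..255."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(
        pixels,
        bins=(bins_per_channel,) * 3,
        range=((0, 256),) * 3,
    )
    # Flatten to a vector and divide by the pixel count so the cells sum to 1
    return hist.ravel() / pixels.shape[0]

rng = np.random.default_rng(3)
# Hypothetical image: random 32 x 32 RGB pixels
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
hist = colour_histogram(image)
hist_rotated = colour_histogram(np.rot90(image))
```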
What is RGB?
Primaries are monochromatic lights
– for cameras: Bayer filter pattern (half green, one quarter red and one quarter blue)
– for monitors: they correspond to the 3 types of phosphors
What are the 3 colour models?
- RGB (red, green, blue): the most popular model for mixing and creating colours
- CMYK (cyan, magenta, yellow, key): used by commercial printers
- HSV (hue, saturation, value): used in the colour pickers of graphics software
What is the colour space, specifically CIE XYZ?
It links physical pure colours (i.e. wavelengths) in the visible electromagnetic spectrum to physiologically perceived colours in human colour vision.
Primaries 𝑋, 𝑌, and 𝑍 are imaginary, but the matching functions are non-negative everywhere
What is HSV/HSB?
HSV - Hue, Saturation, Value (Brightness)
* HSV is closer to how humans perceive colour.
* Describes colours (hue or tint) in terms of their shade (saturation or amount of grey) and their brightness value.
* Nonlinear – reflects topology of colours by coding hue as an angle
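A short example using Python's standard `colorsys` module, which converts RGB values in the range 0 to 1 into HSV with hue as a fraction of a full turn. The wrapper name `rgb_to_hsv_degrees` and the sample colours are assumptions for the example. Hue is scaled to degrees to make the "hue as an angle" idea visible.

```python
import colorsys

def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

red = rgb_to_hsv_degrees(255, 0, 0)
dark_red = rgb_to_hsv_degrees(128, 0, 0)
green = rgb_to_hsv_degrees(0, 255, 0)
```

Red and dark red share the same hue and saturation and differ only in value, whereas green sits a third of the way around the hue circle.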
What is colour normalisation?
One component of the 3D colour space is intensity
– If a colour vector is multiplied by a scalar, the intensity changes but not the colour itself.
– This means colours can be normalized by the intensity, removing the brightness effect which may vary depending on lighting conditions, cameras, and other factors.
– Note: intensity is given by 𝐼 = (𝑅 + 𝐺 + 𝐵)/3
REFER TO SLIDES FOR OTHER FORMULAS OF R G AND B
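A minimal sketch of intensity normalisation, assuming the commonly used chromaticity form in which each channel is divided by R + G + B (the slides may use a different but equivalent notation). The function names and sample colour are hypothetical.

```python
import numpy as np

def intensity(rgb):
    """I = (R + G + B) / 3, computed over the last axis."""
    return rgb.sum(axis=-1) / 3.0

def normalise_colour(rgb):
    """Divide each channel by R + G + B so that brightness is factored out."""
    total = rgb.sum(axis=-1, keepdims=True)
    safe_total = np.where(total == 0, 1.0, total)  # leave black pixels at zero
    return rgb / safe_total

colour = np.array([120.0, 60.0, 20.0])
brighter = 1.5 * colour  # same colour under stronger illumination
```

Scaling the colour vector changes `intensity` by the same factor, but `normalise_colour` returns the same values for `colour` and `brighter`.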
Object Recognition based on Colour Histograms
Objects are identified by matching a colour histogram from an image region with a colour histogram from a sample of the object.
The technique has been shown to be remarkably robust to:
– changes in object’s orientation
– changes of scale of the object
– partial occlusion, and
– changes of viewing position and direction.
REFER TO SLIDES FOR EXAMPLES
What are some comparison measures?
Euclidean distance
Chi-Square distance
KL (Kullback–Leibler) divergence / Jeffreys divergence
EMD (Earth Mover's Distance)
What is Euclidean distance?
Motivation of the Euclidean distance:
– Focuses on the differences between the histograms.
– Interpretation: distance in the feature space.
– Range: [0, ∞).
– All cells are weighted equally.
– Not very robust to outliers!
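A sketch of the Euclidean distance between two histograms, d(Q, V) = sqrt(Σ (q_i − v_i)²). Some course materials omit the square root; this example keeps it, and the function name and sample histograms are assumptions.

```python
import numpy as np

def euclidean_distance(hist_q, hist_v):
    """Square root of the summed squared cell differences."""
    hist_q = np.asarray(hist_q, dtype=float)
    hist_v = np.asarray(hist_v, dtype=float)
    return float(np.sqrt(np.sum((hist_q - hist_v) ** 2)))

query = [0.5, 0.3, 0.2]
model = [0.2, 0.3, 0.5]
distance = euclidean_distance(query, model)
```

Every cell contributes its squared difference with the same weight, so a single badly corrupted cell can dominate the result.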
What is Chi-Square distance?
Motivation of the 𝜒^2 distance:
– Statistical background:
– Test if two distributions are different.
– Possible to compute a significance score.
– Range: [0, ∞).
– Cells are not weighted equally!
– More robust to outliers than the Euclidean distance, if the histograms contain enough observations.
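A sketch of one common symmetric form of the 𝜒^2 distance for histograms, Σ (q_i − v_i)² / (q_i + v_i). Other texts include a factor of 1/2 or divide by a single histogram, so treat the exact form as an assumption. Cells that are empty in both histograms are skipped to avoid dividing by zero.

```python
import numpy as np

def chi_square_distance(hist_q, hist_v):
    """Sum of squared differences, each divided by the combined cell count."""
    hist_q = np.asarray(hist_q, dtype=float)
    hist_v = np.asarray(hist_v, dtype=float)
    total = hist_q + hist_v
    mask = total > 0  # ignore cells that are empty in both histograms
    return float(np.sum((hist_q[mask] - hist_v[mask]) ** 2 / total[mask]))

query = [0.5, 0.3, 0.2, 0.0]
model = [0.2, 0.3, 0.5, 0.0]
distance = chi_square_distance(query, model)
```

Because each squared difference is divided by the cell's combined mass, the same absolute difference counts for more in a sparsely populated cell than in a heavily populated one.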
What measure is the best?
– It depends on the application
– Euclidean distance is often not robust enough.
– Generally, 𝜒^2 distance gives good performance for histograms.
– KL (Kullback–Leibler) divergence / Jeffreys divergence works well sometimes, but is expensive.
– EMD (Earth Mover's Distance) is the most powerful, but also very expensive.
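For comparison, a sketch of the KL divergence and a symmetrised version. Jeffreys divergence is taken here as KL(P‖Q) + KL(Q‖P); some computer vision texts use a different, smoothed definition under the same name, so this choice is an assumption. The small `eps` added to every cell, to avoid log(0), is also an assumption of the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum p_i * log(p_i / q_i), after smoothing and renormalising."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q, eps=1e-12):
    """Symmetrised KL: KL(P || Q) + KL(Q || P)."""
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)

hist_p = [0.7, 0.2, 0.1]
hist_q = [0.4, 0.4, 0.2]
forward = kl_divergence(hist_p, hist_q)
backward = kl_divergence(hist_q, hist_p)
symmetric = jeffreys_divergence(hist_p, hist_q)
```

KL on its own is not symmetric (`forward` and `backward` differ), which is one reason a symmetrised variant is often preferred when comparing histograms.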
Object Recognition Using Histograms Algorithm
REFER TO SLIDES
What is the Machine learning framework?
- Training data consists of data samples and the target vectors
- Learning / Training: Machine takes training data and automatically learns mapping from data samples to target vectors
- Test data:
– Target vectors are concealed from the machine
– Machine predicts the target vectors based on previously learned model
– Accuracy can be evaluated by comparing the predicted vectors to the actual vectors
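The evaluation step can be written in a few lines. This sketch assumes class labels stored as integers; the label values and the function name are made up for illustration.

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of test samples whose predicted label matches the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted == actual))

# Hypothetical predictions on four test samples, compared with the concealed targets
score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```

Three of the four predictions match, so `score` is 0.75.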
What is classification?
Assign input vector to one of two or more classes
Any decision rule divides input space into decision regions separated by decision boundaries
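What is the Nearest Neighbour Classifier?
It assigns a query point the class label of its closest training sample.
For two-category 2D data, 1-nearest-neighbour partitions the feature space into cells around the training samples.
* A Voronoi diagram is a partition of a plane into regions, each containing the points closest to one of a given set of objects.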
REFER TO SLIDES FOR FORMULA
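A minimal 1-nearest-neighbour sketch using Euclidean distance. The slides' formula may use different notation, and the training points here are hypothetical.

```python
import numpy as np

def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training sample closest to the query."""
    distances = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(distances)]

# Hypothetical two-category 2D training set
train_x = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array([0, 0, 1, 1])

label_a = nearest_neighbour_predict(train_x, train_y, np.array([0.5, 0.2]))
label_b = nearest_neighbour_predict(train_x, train_y, np.array([5.5, 4.0]))
```

Each training sample effectively owns the Voronoi cell around it, and a query takes the label of whichever cell it falls into.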
What are some practical matters with K-NN?
Choosing the value of k
– If too small, sensitive to noise points
– If too large, neighbourhood may include points from other classes
– Solution: cross-validation
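One way to apply cross-validation here is leave-one-out: each sample is classified using all the others, and the k with the highest accuracy is kept. The sketch below assumes synthetic two-class data and a small candidate set of k values, both chosen for illustration only.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training samples."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def leave_one_out_accuracy(x, y, k):
    """Classify each sample using every other sample as training data."""
    hits = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        hits += int(knn_predict(x[keep], y[keep], x[i], k) == y[i])
    return hits / len(x)

rng = np.random.default_rng(0)
# Hypothetical data: two Gaussian blobs of 30 points each
x = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.0, (30, 2))])
y = np.repeat([0, 1], 30)

scores = {k: leave_one_out_accuracy(x, y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
```

Odd values of k are used so that a two-class vote cannot tie.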
===
Can produce counter-intuitive results
– Each feature may have a different scale (e.g. height and weight)
Solution: normalize each feature to zero mean, unit variance
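A small worked example of the scale problem, using made-up heights in metres and weights in kilograms. Before normalisation the weight axis dominates the distance; after scaling each feature to zero mean and unit variance (using training statistics only), both features contribute comparably and the nearest neighbour changes.

```python
import numpy as np

def standardise(train_x, other_x):
    """Scale features to zero mean and unit variance using training statistics."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (train_x - mean) / std, (other_x - mean) / std

def nearest_index(train_x, query):
    """Index of the training sample closest to the query."""
    return int(np.argmin(np.linalg.norm(train_x - query, axis=1)))

# Hypothetical samples: (height in m, weight in kg)
train_x = np.array([[1.5, 50.0], [1.6, 80.0], [1.8, 55.0], [1.9, 85.0]])
query = np.array([1.85, 52.0])

raw_nearest = nearest_index(train_x, query)
scaled_train, scaled_query = standardise(train_x, query)
scaled_nearest = nearest_index(scaled_train, scaled_query)
```

On the raw data the closest sample is the 1.5 m person, because a 2 kg weight gap outweighs a 0.35 m height gap. After standardisation the closest sample is the 1.8 m, 55 kg person, which matches intuition better.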
===
Curse of dimensionality
When the dimensionality increases, the volume of the space increases so fast that the available data become sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality.
– Solution: no good solution exists so far
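A back-of-the-envelope illustration of the exponential growth, assuming we want to keep a fixed sampling density of 10 points along every axis (the density figure is an arbitrary choice for the example).

```python
def samples_needed(points_per_axis, dimensions):
    """Number of grid points needed to keep the same density along every axis."""
    return points_per_axis ** dimensions

requirements = {d: samples_needed(10, d) for d in (1, 2, 3, 10)}
```

At 10 points per axis, one dimension needs 10 samples, three dimensions need 1,000, and ten dimensions need 10,000,000,000, so a fixed-size data set quickly becomes sparse.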
Linear and Non-Linear SVM
REFER TO SLIDES - Discriminatory and SVM