L8: Face Recognition Flashcards
❗️❗️❗️Boosting detector:
The features used by the Viola-Jones detector, including how to compute them
Haar features, used for identifying faces.
Computed by subtracting the sum of pixels under the black region from the sum of pixels under the white region. Results in a single scalar per feature. The region sums are computed in constant time using the integral image.
- If the feature value exceeds a threshold, the region being evaluated is flagged as a RoI
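The feature computation above can be sketched in a few lines of NumPy (a toy illustration; the 4x4 image and the two-rectangle white/black layout are made up):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y+1, :x+1] (cumulative sum over rows and columns)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (top, left),
    using at most four lookups into the integral image."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_haar(ii, top, left, h, w):
    """Two-rectangle Haar feature: white (left half) minus black (right half)."""
    half = w // 2
    white = rect_sum(ii, top, left, h, half)
    black = rect_sum(ii, top, left + half, h, half)
    return white - black

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)
print(two_rect_haar(ii, 0, 0, 4, 4))  # -16.0: a single scalar per feature
```

Whatever the rectangle size, each sum costs the same four lookups, which is why the integral image makes evaluating thousands of Haar features per window feasible.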
❗️❗️❗️Boosting detector:
The basic building blocks, the weak learners, including intuition
One weak learner per feature. Each weak learner is a two-way (face/non-face) classifier: it thresholds a single Haar feature value, and if the response is above the threshold the region is probably a RoI, otherwise not. The M chosen weak learners are combined in a cascade via a weighted sum.
Intuition:
After each weak learner is chosen, the importance weights of the training examples on the wrong side of its decision boundary are increased, so these difficult cases matter more when the next weak learner is chosen. The weighted sum of the chosen classifiers (e.g. the sum of the 3 classifiers in the red/blue dots example) is the strong classifier.
❗️❗️❗️Boosting detector:
The difference between the weak learners and the strong classifier
The strong classifier is a cascaded weighted sum of the M chosen weak learners, while each weak learner is a simple threshold on a single Haar feature.
❗️❗️❗️Boosting detector:
The recipe for the boosting algorithm - Explain all steps when asked (not in depth)
- Input: set of N labeled image patches (face/non-face)
- Initialization: weight w initialized uniformly for each training example
1. Normalize all weights so they sum to 1
2. Apply the weights and compute each weak learner's misclassification rate
3. Pick the weak learner with the smallest weighted classification error in this iteration
4. Reduce the weights of the examples it got right and increase the weights of the wrong ones, as this makes them more important to the next weak learner - Repeat M times
- Finalization: the final strong classifier is a weighted sum of the M chosen weak learners.
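The recipe above can be sketched as a toy AdaBoost over one-dimensional threshold stumps (a minimal illustration, not the full Viola-Jones trainer; the brute-force stump search and the alpha formula are the standard AdaBoost choices):

```python
import numpy as np

def boost(features, labels, M):
    """Toy AdaBoost over threshold stumps.

    features: (N, F) matrix of Haar-feature responses per training patch
    labels:   (N,) array of +1 (face) / -1 (non-face)
    Returns the strong classifier as a list of (feature, threshold, polarity, alpha).
    """
    N, F = features.shape
    w = np.full(N, 1.0 / N)                  # init: uniform weights
    strong = []
    for _ in range(M):
        w /= w.sum()                         # 1. normalize weights to sum 1
        best = None
        for j in range(F):                   # 2./3. weighted error of every
            for thr in np.unique(features[:, j]):  # stump, keep the smallest
                for pol in (+1, -1):
                    pred = np.where(pol * features[:, j] >= pol * thr, 1, -1)
                    err = w[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-10)                # avoid log(0) on a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        # 4. down-weight correct examples, up-weight the mistakes
        w *= np.exp(-alpha * labels * pred)
        strong.append((j, thr, pol, alpha))
    return strong

def classify(strong, x):
    """Strong classifier: sign of the alpha-weighted sum of the weak learners."""
    s = sum(alpha * (1 if pol * x[j] >= pol * thr else -1)
            for j, thr, pol, alpha in strong)
    return 1 if s >= 0 else -1
```

Each weak learner alone is barely better than chance, but because every round re-weights the data, the weighted sum corrects the mistakes of the earlier rounds.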
❗️❗️❗️Eigenfaces:
How to compute the eigenface decomposition
Given a training set of face images with N examples.
Each image is flattened to a 1D vector x with D dimensions.
Compute the mean m and covariance matrix C of all N images (C is symmetric, so an EVD exists).
Do EVD on C, which gives D eigenvalues λ_i and eigenvectors u_i.
u_i → principal directions of the face data, sorted by importance according to λ_i. The u_i are called eigenfaces.
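The decomposition can be sketched as (toy random data; real images would have a far larger D):

```python
import numpy as np

def eigenfaces(X):
    """Eigenface decomposition of a training set.

    X: (N, D) matrix, one flattened image per row.
    Returns the mean m (D,), the eigenvalues (D,) in descending order,
    and the eigenvectors as rows of a (D, D) matrix, sorted the same way.
    """
    m = X.mean(axis=0)                 # mean face
    A = X - m                          # centre the data
    C = A.T @ A / X.shape[0]           # D x D covariance (symmetric)
    lam, V = np.linalg.eigh(C)         # EVD; eigh returns ascending order
    order = np.argsort(lam)[::-1]      # re-sort by importance (largest first)
    return m, lam[order], V[:, order].T

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))           # 10 toy "images", D = 6
m, lam, U = eigenfaces(X)              # rows of U are the eigenfaces
```

Real implementations avoid forming the huge D x D covariance by taking the EVD of the smaller N x N matrix A Aᵀ instead and mapping its eigenvectors back.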
❗️❗️❗️Eigenfaces:
How to project a face to the face space
Given a novel input image x, subtract the mean and project it onto each principal direction; each projection gives one scalar coefficient. Keeping only the subset of directions with the largest eigenvalues approximates the original image in a low-dimensional space.
The span of the top M principal components of the training set is called the face space.
- Face space → a PCA space trained with only faces, so it captures only face-like variation
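Projection and reconstruction can be sketched as (toy random data standing in for real face images):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))               # toy training set, D = 8
m = X.mean(axis=0)
A = X - m
lam, V = np.linalg.eigh(A.T @ A / len(X))  # EVD of the covariance
U = V[:, np.argsort(lam)[::-1]].T          # eigenfaces as rows, sorted

def project(x, M):
    """Top-M face-space coefficients of a (flattened) image x."""
    return U[:M] @ (x - m)                 # one scalar per eigenface

def reconstruct(a):
    """x_tilde: approximation of x from its face-space coefficients."""
    return m + a @ U[:len(a)]

x = X[0]
a = project(x, 4)                          # 4 coefficients instead of 8 pixels
x_tilde = reconstruct(a)
```

With all D directions kept the reconstruction is exact; dropping the low-eigenvalue directions is what makes it an approximation.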
❗️❗️❗️Eigenfaces:
How to detect if it is a face or non-face
Take the novel input image x and its face-space approximation x_tilde, and measure how well the input is approximated by the face space.
Faceness measure DFFS (Distance From Face Space) = ||x − x_tilde||
- If the distance is small, x is already well explained by the face space (likely a face); otherwise it is something new.
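DFFS can be sketched directly (a toy face space in R³ with mean zero, chosen so the distances are easy to check by hand):

```python
import numpy as np

def dffs(x, m, U_M):
    """Distance From Face Space: how badly the top-M eigenfaces (rows of U_M)
    reconstruct x. A small DFFS means x looks like a face."""
    a = U_M @ (x - m)                 # project onto the face space
    x_tilde = m + a @ U_M             # reconstruct the approximation
    return np.linalg.norm(x - x_tilde)

# toy face space: zero mean, two orthonormal directions in R^3
m = np.zeros(3)
U_M = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
print(dffs(np.array([2.0, 1.0, 0.0]), m, U_M))  # 0.0: lies inside the face space
print(dffs(np.array([0.0, 0.0, 3.0]), m, U_M))  # 3.0: far from the face space
```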
❗️❗️❗️Eigenfaces:
How to use face space to detect/recognize a face
Use distance in face space (DIFS, Distance In Face Space). Project the new face into face space to get its coefficients and compare them with the stored coefficients of each training face.
- Choose the identity of the training face with the smallest distance to the new face x
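Recognition by nearest neighbour in face space can be sketched as follows (the gallery names and coefficient vectors are hypothetical):

```python
import numpy as np

def recognize(a_new, gallery):
    """Nearest neighbour in face space (DIFS): return the identity whose
    stored face-space coefficients are closest to the new face's."""
    names = list(gallery)
    dists = [np.linalg.norm(a_new - gallery[n]) for n in names]
    return names[int(np.argmin(dists))]

# hypothetical gallery: one coefficient vector per known person
gallery = {"alice": np.array([1.0, 0.0]),
           "bob":   np.array([0.0, 2.0])}
print(recognize(np.array([0.9, 0.2]), gallery))  # alice
```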
❗️❗️❗️Eigenfaces:
The limitations of the eigenface algorithm
We can’t be sure that faces of the same person cluster nicely together in face space.
- Sensitive to variation in lighting conditions during image acquisition.
“Intra-class” → differences within images of the same person
“Inter-class” → differences between different people
“Intra-class” effects can even be larger than “inter-class” variations.
❗️❗️❗️Eigenfaces:
Why is eigenfaces a PCA projection?
The eigenvectors obtained from PCA serve as the basis for the face representation; projecting a face onto them is exactly a PCA projection.
FaceNet
❗️❗️❗️FaceNet:
The structure of FaceNet, especially compared to how eigenfaces projects faces
FaceNet has many thousands of learnable parameters for learning the projection.
Images are input in batches, fed through a deep architecture of neural-net layers, and normalized to unit norm, resulting in another type of “face space”, often called an “embedding”, with 128 vector components.
In relation to EF:
- Eigenfaces is parameter-free (nothing is learned): it is based on an EVD computed directly from the covariance matrix, giving eigenvectors and eigenvalues.
❗️❗️❗️FaceNet:
The formulation of the triplet loss
Triplet loss is implemented by continuously sampling triplets from the training data:
1. A random training image: the anchor
2. An image from the same class (the positive)
3. An image from another class (the negative)
It “pulls” the positive embedding towards the anchor embedding, and “pushes” the negative embedding away from the anchor.
The final loss is max(0, ||f(anchor) − f(positive)||² − ||f(anchor) − f(negative)||² + α), where α is the margin (the separation threshold between positive and negative pairs).
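The loss can be written out directly for a single triplet (toy 2-D “embeddings”; real FaceNet uses 128-D unit-norm vectors and sums the hinged loss over many triplets):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Hinged triplet loss on embeddings:
    max(0, ||f_a - f_p||^2 - ||f_a - f_n||^2 + margin)."""
    d_pos = np.sum((f_a - f_p) ** 2)   # anchor-positive squared distance
    d_neg = np.sum((f_a - f_n) ** 2)   # anchor-negative squared distance
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])               # anchor embedding
p = np.array([0.8, 0.6])               # same identity, close to the anchor
n = np.array([0.0, 1.0])               # other identity, far from the anchor
print(triplet_loss(a, p, n))           # 0.0: this triplet is already satisfied
print(triplet_loss(a, n, p))           # positive: positive is farther than negative
```

When the loss is zero the triplet already satisfies the margin and contributes no gradient, which is why triplet selection matters so much.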
❗️❗️❗️FaceNet:
The difference between positives and (hard) negatives during training
Sample “hard negatives” during training to speed up and stabilize the training. A hard negative is a negative embedding that lies closer to the anchor than the positive embedding does.
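Hard-negative selection, as defined on this card, can be sketched as follows (toy embeddings; in practice mining happens per mini-batch):

```python
import numpy as np

def hardest_negative(f_a, f_p, negatives):
    """Return the negative embedding closest to the anchor, and whether it is
    'hard', i.e. closer to the anchor than the positive embedding is."""
    d = np.linalg.norm(negatives - f_a, axis=1)   # distances to the anchor
    idx = int(np.argmin(d))
    is_hard = d[idx] < np.linalg.norm(f_a - f_p)
    return negatives[idx], is_hard

f_a = np.array([1.0, 0.0])                        # anchor
f_p = np.array([0.0, 1.0])                        # positive, distance sqrt(2)
negs = np.array([[0.9, 0.1],                      # very close to the anchor
                 [-1.0, 0.0]])                    # far away (an "easy" negative)
n, hard = hardest_negative(f_a, f_p, negs)
print(hard)  # True: negs[0] is closer to the anchor than the positive
```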
What is detection in general?
Boosting detector
Find out where the faces are in the image; the output is a set of RoIs. Each region is either a face or a non-face = a two-way classification problem.
What types of detection are there?
Boosting detector
Feature-based, template-based, and appearance-based detection.
- Viola-Jones = appearance-based