Week 6 Flashcards

1
Q

In the Facial Action Coding System, what are action units and what are they for?

A

1) The smallest visually discriminable facial movements; they represent the muscular activity that produces changes in facial appearance
2) Used for detecting & measuring a large number of facial expressions via a small set of AUs
3) Accurate detection of AUs depends on proper identification & tracking of different facial muscles, irrespective of pose, face shape, illumination & image resolution
4) Detection of all facial fiducial points is even more challenging than expression recognition itself

2
Q

Difference between additive and non-additive AUs

A

Additive - the appearance of each AU is independent of the others

Non-additive - AUs modify each other's appearance

3
Q

In the context of FACS, what is an event?

A

1) A set of AUs that overlap in time & define a perceptually meaningful unit of facial action
2) Constitutes a single display
3) Guiding assumption: facial behaviour occurs not continuously but as discrete episodes (events); AUs that occur together are related & form an event (see the sketch below)
4) Event coding can be more efficient than coding single AUs
5) Addresses the problem that some AUs may linger & merge into the background
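
A minimal sketch of the guiding assumption in point 3, in Python: AU occurrences whose time intervals overlap are merged into one event. The interval representation and the AU labels are illustrative, not from the source.

# Minimal sketch: group AU occurrences whose time intervals overlap into events.
# Each occurrence is (au_label, start_frame, end_frame); data are illustrative.

def group_into_events(occurrences):
    """Merge temporally overlapping AU occurrences into events."""
    occurrences = sorted(occurrences, key=lambda o: o[1])  # sort by start time
    events = []
    for au, start, end in occurrences:
        if events and start <= events[-1]["end"]:   # overlaps the current event
            events[-1]["aus"].add(au)
            events[-1]["end"] = max(events[-1]["end"], end)
        else:                                       # starts a new event
            events.append({"aus": {au}, "start": start, "end": end})
    return events

# AU6 (cheek raiser) and AU12 (lip corner puller) overlap -> one smile event
print(group_into_events([("AU6", 10, 40), ("AU12", 25, 60), ("AU4", 100, 130)]))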

4
Q

Why is a prior model required in AU recognition?

A

1) Most current AU recognition techniques ignore the semantic relationships among AUs & the dynamics of AUs
2) Combining prior models of spatial-temporal relationships among AUs with image measurements ⇒ robust AU recognition
3) Especially useful when image measurement is unreliable
4) A knowledge-driven method can learn a prior AU model from different types of qualitative knowledge, so no training data are required; training data introduce the unreliability of manual scoring & built-in database bias, so models trained on them cannot generalise

5
Q

What is a Bayesian Prior model, illustrating it with an example?

A

1) BN - a directed acyclic graph (DAG) that represents a joint probability distribution among a set of random variables
2) Can be used as a prior model to capture AU knowledge
3) The prior model probabilistically encodes constraints to capture AU occurrence frequency (see the sketch below)
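
A minimal sketch in plain Python, assuming a two-node network AU6 → AU12 with made-up probabilities: the DAG factorises the joint as P(AU6, AU12) = P(AU6) P(AU12 | AU6), and Bayes' rule uses the prior to reason about one AU given another.

# Minimal sketch of a BN prior over two AUs, AU6 (cheek raiser) -> AU12
# (lip corner puller). All probabilities are illustrative.

p_au6 = {1: 0.3, 0: 0.7}                     # prior occurrence frequency of AU6
p_au12_given_au6 = {1: {1: 0.8, 0: 0.2},     # AU12 likely when AU6 is present
                    0: {1: 0.1, 0: 0.9}}     # AU12 rare when AU6 is absent

def joint(au6, au12):
    # DAG factorisation: P(AU6, AU12) = P(AU6) * P(AU12 | AU6)
    return p_au6[au6] * p_au12_given_au6[au6][au12]

# P(AU12 = 1) by marginalising over AU6
p_au12 = sum(joint(au6, 1) for au6 in (0, 1))
# P(AU6 = 1 | AU12 = 1) by Bayes' rule -- the prior "explains" a detected AU12
p_au6_given_au12 = joint(1, 1) / p_au12
print(f"P(AU12=1) = {p_au12:.2f}, P(AU6=1 | AU12=1) = {p_au6_given_au12:.2f}")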

6
Q

What is a dynamic Bayesian network (DBN) model? Illustrate it with an example.

A

Models the temporal evolution of a set of random variables X over time to capture dynamic dependencies (see the sketch below)
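
A minimal sketch of a two-slice DBN for a single AU, assuming a persistence-style transition model and a noisy per-frame measurement; the forward recursion computes P(AU_t | e_1..e_t). All numbers are made up.

# Two-slice DBN sketch: the hidden state AU_t depends on AU_{t-1} (transition
# model) and emits a noisy measurement e_t. Forward filtering tracks the
# belief P(AU_t | e_1..e_t). All numbers are illustrative.

transition = {1: {1: 0.9, 0: 0.1},   # AUs tend to persist across frames
              0: {1: 0.2, 0: 0.8}}
emission = {1: {1: 0.7, 0: 0.3},     # P(measurement | true AU state)
            0: {1: 0.2, 0: 0.8}}

belief = {1: 0.3, 0: 0.7}            # prior at t = 0
for e_t in [1, 1, 0, 1]:             # a stream of per-frame AU measurements
    # predict: propagate the belief through the transition model
    predicted = {s: sum(belief[p] * transition[p][s] for p in (0, 1)) for s in (0, 1)}
    # update: weight by the measurement likelihood, then normalise
    unnorm = {s: predicted[s] * emission[s][e_t] for s in (0, 1)}
    z = unnorm[0] + unnorm[1]
    belief = {s: unnorm[s] / z for s in (0, 1)}
    print(f"e_t={e_t}  P(AU present)={belief[1]:.2f}")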

7
Q

What is the motivation for using a 2D Gabor filter for image measurement?

A

1) Good models of the receptive fields of a large number of cells in the mammalian primary visual cortex
2) Invariant to translation, scale & rotation
3) Band-pass filters, i.e., they pass frequencies within a certain frequency range
4) Compatible with multiresolution (i.e., multiscale) analysis

8
Q

What is a Gabor filter?

A

The product of a sinusoidal plane wave & a bivariate elliptic Gaussian (see the sketch below)
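
A direct transcription of this definition into NumPy; the parameter names and default values below are conventional choices, not from the source.

# A 2D Gabor kernel as the product of a sinusoidal plane wave and a
# bivariate elliptic Gaussian envelope.
import numpy as np

def gabor_kernel(size=31, sigma=4.0, theta=0.0, lambd=8.0, gamma=0.5, psi=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinates to the filter orientation theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    gaussian = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))  # elliptic envelope
    sinusoid = np.cos(2 * np.pi * x_t / lambd + psi)                  # plane wave
    return gaussian * sinusoid

kernel = gabor_kernel(theta=np.pi / 4)
print(kernel.shape)  # (31, 31)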

9
Q

Process of AU measurement extraction

A

1) Detect the eyes with a boosted eye detector
2) Normalise the image into a 64x64 sub-image based on the eye positions
3) Apply a bank of Gabor filters with 6 orientations & 5 scales, giving a 6x5x64x64 = 122,880-dimensional feature vector for each image (see the sketch below)
4) Use an AdaBoost classifier to obtain a measurement for each AU:
- in training, increase the weights of wrongly classified examples in each iteration to force AdaBoost to focus on the most difficult samples in the training set
- utilises around 200 Gabor features for each AU
5) Based on the image measurement e_i & the ground truth AU_i, train a likelihood function, i.e., the conditional probability P(e_i | AU_i) of the AU measurement given the actual AU value
6) Requires training data
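
A sketch of step 3 in Python with OpenCV. The kernel size, wavelengths, and sigma are illustrative assumptions, not values from the source; a random array stands in for the normalised face sub-image.

import cv2
import numpy as np

face = np.random.rand(64, 64).astype(np.float32)   # stand-in for the 64x64 face sub-image

responses = []
for theta in np.arange(6) * np.pi / 6:             # 6 orientations
    for lambd in (4, 6, 8, 12, 16):                # 5 scales (wavelengths)
        # args: ksize, sigma, theta, lambd, gamma -- all but the 6x5 grid assumed
        kernel = cv2.getGaborKernel((31, 31), lambd / 2, theta, lambd, 0.5)
        responses.append(cv2.filter2D(face, cv2.CV_32F, kernel))

feature_vector = np.concatenate([r.ravel() for r in responses])
print(feature_vector.shape)                        # (122880,) = 6*5*64*64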

10
Q

Outline the principles of an AdaBoost classifier

A

1) Creates an accurate prediction rule by combining many relatively weak & inaccurate rules
2) Combines the properties of an efficient classifier & feature selection (see the sketch below)
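
A from-scratch sketch of the principle on toy 1-D data: weak decision stumps are combined, and sample weights are increased on misclassified examples each round (as in step 4 of the previous card). The data and round count are illustrative.

# AdaBoost from scratch with decision stumps on noisy 1-D data; labels in {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
y = np.where(X > 5, 1, -1)                         # a clean threshold concept
y[rng.choice(200, 20, replace=False)] *= -1        # add label noise

w = np.full(len(X), 1 / len(X))                    # uniform sample weights
stumps = []
for _ in range(25):
    # pick the stump (threshold t, polarity s) with the lowest weighted error
    t, s = min(((t, s) for t in np.linspace(0, 10, 50) for s in (1, -1)),
               key=lambda ts: np.sum(w[(ts[1] * np.sign(X - ts[0])) != y]))
    pred = s * np.sign(X - t)
    err = max(np.sum(w[pred != y]), 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)          # vote weight of this stump
    w *= np.exp(-alpha * y * pred)                 # up-weight the mistakes
    w /= w.sum()
    stumps.append((alpha, t, s))

# strong classifier = sign of the weighted vote of all weak stumps
strong = np.sign(sum(a * s * np.sign(X - t) for a, t, s in stumps))
print("training accuracy:", np.mean(strong == y))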

11
Q

Two types of facial expression recognition

A

Geometric-based and appearance-based

12
Q

Geometric based expression recognition

A

1) Tracks the shape & size of the face and facial components
2) Categorises expressions based on the relative positions of facial components
3) Shape models based on characteristic points on the face require accurate detection & tracking of facial landmarks
4) But distances between facial landmarks vary from person to person (see the sketch below)
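
A small sketch of points 2 and 4 under one common assumption (not stated in the source): inter-landmark distances are normalised by the inter-ocular distance to reduce person-to-person variation. The coordinates are made up.

# Geometric features: normalised distances between facial landmarks.
import numpy as np

landmarks = {"eye_l": (22, 24), "eye_r": (42, 24),
             "mouth_l": (24, 46), "mouth_r": (40, 46), "brow_l": (22, 16)}

def dist(a, b):
    return float(np.hypot(a[0] - b[0], a[1] - b[1]))

iod = dist(landmarks["eye_l"], landmarks["eye_r"])   # inter-ocular distance
features = {
    "mouth_width": dist(landmarks["mouth_l"], landmarks["mouth_r"]) / iod,
    "brow_raise": dist(landmarks["brow_l"], landmarks["eye_l"]) / iod,
}
print(features)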

13
Q

Appearance based expression recognition

A

1) Facial expressions involve changes in local texture
2) Uses a bank of filters, e.g., Gabor wavelets or Local Binary Patterns (LBP), to encode texture
3) High-dimensional features ⇒ apply dimensionality reduction techniques, e.g., principal component analysis (PCA) or linear discriminant analysis (LDA) (see the sketch below)
4) Preserves discriminative information ⇒ a popular approach
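
A sketch of the pipeline using scikit-image's local_binary_pattern and scikit-learn's PCA; random arrays stand in for face crops, and the LBP and PCA settings are assumptions.

# Appearance features: LBP histograms per image, then PCA for dimensionality reduction.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(40, 64, 64)).astype(np.uint8)  # fake face crops

def lbp_histogram(img, P=8, R=1):
    lbp = local_binary_pattern(img, P, R, method="uniform")  # P + 2 pattern codes
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

X = np.array([lbp_histogram(f) for f in faces])    # (40, 10) texture features
X_reduced = PCA(n_components=5).fit_transform(X)   # keep the main variance
print(X.shape, "->", X_reduced.shape)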

14
Q

Outline an algorithm for detecting eyebrow corners

A

1) Use the positions of the eyes to select coarse ROIs for the eyebrows
2) Detect the eyebrows using a method similar to upper-lip detection
3) Perform adaptive thresholding before applying the horizontal Sobel operator, to improve the accuracy of eyebrow corner localisation (see the sketch below)
4) Use a horizontal edge detector to reduce false detection of eyebrow positions due to partial occlusion by hair
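
A hedged sketch of steps 3-4 with OpenCV; a random array stands in for an eyebrow ROI, and the threshold parameters are assumptions.

# Adaptive thresholding, then a horizontal Sobel response, then corner picking.
import cv2
import numpy as np

roi = (np.random.rand(24, 48) * 255).astype(np.uint8)  # stand-in eyebrow ROI

binary = cv2.adaptiveThreshold(roi, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)
edges = cv2.Sobel(binary, cv2.CV_64F, 0, 1, ksize=3)   # dy=1: horizontal edges
rows, cols = np.nonzero(np.abs(edges) > 0)
if cols.size:
    left_corner = (rows[np.argmin(cols)], cols.min())
    right_corner = (rows[np.argmax(cols)], cols.max())
    print(left_corner, right_corner)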

15
Q

Outline an algorithm for detecting lip corners given aligned face ROI and nose position

A

1) Select a coarse lips ROI using the face width & nose position
2) Apply a Gaussian blur to the lips ROI (to remove noise)
3) Apply a horizontal Sobel edge detector (to detect the upper lip)
4) Apply Otsu thresholding (automatic clustering-based image thresholding) to remove spurious edges
5) Apply a morphological dilation operation (to close gaps along edges)
6) Find the connected components
7) Remove spurious connected components by thresholding
8) Scan the image from the top & select the largest connected component below the nose as the upper lip
9) Locate the left- & right-most positions of the connected component as the lip corners (see the sketch below)
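
A sketch of the pipeline with OpenCV on a synthetic ROI; the kernel sizes and the minimum-area threshold are assumptions, and a real ROI would come from step 1.

import cv2
import numpy as np

lips_roi = (np.random.rand(40, 80) * 255).astype(np.uint8)       # stand-in ROI

blurred = cv2.GaussianBlur(lips_roi, (5, 5), 0)                  # 2) denoise
sobel = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)            # 3) horizontal edges
sobel = cv2.convertScaleAbs(sobel)
_, binary = cv2.threshold(sobel, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # 4) Otsu
dilated = cv2.dilate(binary, np.ones((3, 3), np.uint8))          # 5) close gaps
n, labels, stats, _ = cv2.connectedComponentsWithStats(dilated)  # 6) components
areas = stats[1:, cv2.CC_STAT_AREA]                              # skip background
valid = np.nonzero(areas >= 20)[0]                               # 7) drop small blobs
if valid.size:
    lip_label = 1 + valid[np.argmax(areas[valid])]               # 8) largest remaining
    ys, xs = np.nonzero(labels == lip_label)
    # 9) left- and right-most points of that component are the lip corners
    print("corners:", (ys[np.argmin(xs)], xs.min()), (ys[np.argmax(xs)], xs.max()))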

16
Q

Extraction of active facial patches

A

1) Depends on the positions of the active facial muscles:
- wrinkles in the upper-nose region are prominent in disgust & absent in other expressions
- the active regions vary with the expression
2) Each patch location depends on the corresponding facial landmark position
3) Keep all patches the same size (side of square patch ∼ 1/9 of the face width; see the sketch below)
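
A sketch of points 2-3: cut a square patch of side ≈ face_width/9 around each landmark. The landmark coordinates and the face image are placeholders.

# Extract fixed-size square patches centred on facial landmarks.
import numpy as np

face = np.zeros((180, 180), dtype=np.uint8)       # aligned face image (placeholder)
face_width = face.shape[1]
side = face_width // 9                            # patch side ~ 1/9 of face width

def extract_patch(img, cx, cy, side):
    half = side // 2                              # clipped at the image border
    return img[max(cy - half, 0):cy + half + 1, max(cx - half, 0):cx + half + 1]

landmarks = [(60, 50), (120, 50), (90, 80)]       # e.g., eyes and nose tip
patches = [extract_patch(face, x, y, side) for x, y in landmarks]
print([p.shape for p in patches])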

17
Q

Training a classifier on a set of negative images

A

1) Images that do not contain the facial feature to be detected, e.g., 5,000 images of everyday objects

18
Q

Training a classifier on a set of positive images

A

1) Images that contain one or more instances of the facial feature to be detected
2) The location of each feature is specified by its upper-left pixel and the height & width of the feature
3) Representative of the variance between different people, including race, gender, & age
4) FERET database (created by the National Institute of Standards and Technology (NIST)): 10,000 images of over 1,000 people under different lighting conditions, poses, & angles
5) Use 1,500 images to train for each feature, taken at angles ranging from 0 to 45 degrees from the frontal view
