Week 5 Flashcards
How does the 3-dimensional human observer-based approach to emotion measurement differ from message-based measurement and sign-based measurement? State the advantages and disadvantages of the 3-dimensional approach.
1) emphasises similarities between emotions
2) represents emotion in terms of 2 or 3 underlying dimensions:
- pleasantness-unpleasantness and attention-rejection (or arousal-sleepiness)
- dominance-submissiveness as third dimension
3) advantages:
- positive & negative affect measured over intensity ranges of hundreds of points; rating requires little expertise
- scores can be aggregated across multiple raters, giving multiple independent & unbiased ratings
4) disadvantages:
- not well suited to representing discrete emotions
- assumes emotion can be inferred from facial expressions
- the signals involved in communicating emotion are unspecified
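To illustrate aggregating scores across raters, here is a minimal Python sketch. It assumes each rater scores one stimulus on pleasantness, arousal and dominance on a hypothetical -100 to 100 scale; the function name `aggregate_ratings` and the example ratings are illustrative, not taken from a specific rating instrument.

```python
import numpy as np

# Hypothetical dimension order; each rater gives one score per dimension
# on an assumed -100..100 intensity scale.
DIMENSIONS = ("pleasantness", "arousal", "dominance")

def aggregate_ratings(ratings):
    """Average each dimension across raters and report the spread between them."""
    r = np.asarray(ratings, dtype=float)
    if r.ndim != 2 or r.shape[1] != len(DIMENSIONS):
        raise ValueError("expected an (n_raters, 3) array")
    mean = r.mean(axis=0)
    # Sample standard deviation across raters (ddof=1) as a rough agreement check.
    spread = r.std(axis=0, ddof=1) if r.shape[0] > 1 else np.zeros(r.shape[1])
    return dict(zip(DIMENSIONS, mean.tolist())), dict(zip(DIMENSIONS, spread.tolist()))

ratings = [
    [60, 20, -10],  # rater 1
    [70, 10, 0],    # rater 2
    [50, 30, -20],  # rater 3
]
means, spreads = aggregate_ratings(ratings)
print(means)  # → {'pleasantness': 60.0, 'arousal': 20.0, 'dominance': -10.0}
```

A large spread on one dimension would flag a stimulus where raters disagree and the aggregate is less trustworthy.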
What are the challenges of automated face analysis for emotion recognition?
1) non-frontal pose, & moderate to large head motion ⇒ difficult image registration
2) many facial actions are inherently subtle ⇒ difficult to model
3) temporal dynamics of actions highly variable
4) discrete AUs can modify each other’s appearance
5) individual differences in face shape & appearance ⇒ difficult to generalise
6) classifiers can suffer from overfitting when trained with insufficient examples
Challenges addressed by recent work on expression detection in naturalistic settings:
1) partial occlusion
2) pose variation
3) rigid head movement
4) lip movements
Pipeline for automated face analysis (AFA)
1) input image/video
2) facial landmark detection/tracking
3) face alignment
4) feature extraction
5) dimensionality reduction
6) action unit classification
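Steps 5 and 6 of the pipeline can be sketched in Python as follows. This is an illustration under stated assumptions: synthetic feature vectors stand in for the output of steps 2 to 4, principal component analysis (PCA) is used as one common choice of dimensionality reduction, and a simple nearest-centroid rule stands in for the AU classifier (in practice SVMs, boosting or neural networks are typical). All function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Step 5: learn a PCA projection from training features via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Map features onto the retained principal directions."""
    return (X - mean) @ components.T

def fit_centroids(Z, y):
    """Step 6 (stand-in classifier): one mean vector per label."""
    labels = np.unique(y)
    return labels, np.stack([Z[y == c].mean(axis=0) for c in labels])

def predict(Z, labels, centroids):
    # Euclidean distance from every sample to every class centroid.
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

# Synthetic stand-in for step 4: 100-dim features for frames where a
# hypothetical AU is absent (0) or present (1).
rng = np.random.default_rng(0)
X_absent = rng.normal(0.0, 1.0, size=(50, 100))
X_present = rng.normal(0.0, 1.0, size=(50, 100))
X_present[:, :10] += 4.0  # the AU shifts a subset of the features
X = np.vstack([X_absent, X_present])
y = np.array([0] * 50 + [1] * 50)

mean, comps = fit_pca(X, n_components=5)
Z = project(X, mean, comps)
labels, centroids = fit_centroids(Z, y)
accuracy = (predict(Z, labels, centroids) == y).mean()
print(Z.shape, accuracy)
```

Reducing 100 features to 5 before classification is one way to limit the overfitting risk that comes with few training examples; accuracy here is on training data only.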
What is the purpose of image registration in emotion recognition?
1) To remove effects of spatial variation in face position, rotation, & facial proportions
2) ⇒ register images to a common size & orientation in the canonical perspective (our preferred way of viewing an object)
3) 3D transformation estimated with structure-from-motion algorithms, from a monocular camera (up to a scale factor) or from multiple cameras
4) For small to moderate out-of-plane rotation at a moderate distance from the camera, the 2D projected motion field of a 3D planar surface can be recovered with a 6-parameter affine model
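A 6-parameter affine model maps each point (x, y) to (a11·x + a12·y + tx, a21·x + a22·y + ty). The Python sketch below estimates such a model by least squares from landmark correspondences, assuming the landmarks come from the landmark detection/tracking step; the template coordinates and the simulated head motion are made up, and warping the image itself with the recovered transform is not shown.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 6-parameter affine A (2x3) such that dst ≈ A @ [x, y, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    if n < 3:
        raise ValueError("need at least 3 point pairs for 6 parameters")
    # Homogeneous source coordinates: each row is [x, y, 1].
    H = np.hstack([src, np.ones((n, 1))])
    # Solve H @ P = dst for P (3x2); A = P.T holds [a11 a12 tx; a21 a22 ty].
    P, *_ = np.linalg.lstsq(H, dst, rcond=None)
    return P.T

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

# Hypothetical canonical landmarks: eye corners, nose tip, mouth corners.
template = np.array([[30, 40], [70, 40], [50, 60], [38, 80], [62, 80]], dtype=float)

# Simulate a tracked face: rotated 10 degrees, scaled by 1.2, then shifted.
theta = np.deg2rad(10)
R = 1.2 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
observed = template @ R.T + np.array([15.0, -5.0])

# Register the observed landmarks back onto the canonical template.
A = estimate_affine(observed, template)
registered = apply_affine(A, observed)
print(np.abs(registered - template).max())  # residual close to 0
```

With noisy real landmarks the residual would not be zero; using more than the minimum 3 points lets least squares average out some of that noise.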
What is meant by appearance features and how can they be represented?
1) changes in skin texture, e.g., wrinkling
2) simplest - a vector of raw pixel-intensity values
- problem: lighting conditions affect texture
3) Gabor wavelet coefficients or their magnitudes, histogram of oriented gradients (HOG), & Scale Invariant Feature Transform (SIFT) - more robust to registration error
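The lighting problem with raw pixels, and why gradient-based descriptors help, can be shown with a simplified single-cell orientation histogram in the spirit of HOG (not the full HOG or SIFT algorithm). The sketch assumes a lighting change acts as a global gain and offset on intensities; under that assumption gradient orientations are unchanged and magnitudes only scale, so the normalised histogram is unaffected. The synthetic "wrinkle" patch and the function names are hypothetical, and robustness to registration error is not demonstrated here.

```python
import numpy as np

def raw_pixel_features(patch):
    """Simplest appearance feature: the flattened raw intensities."""
    return np.asarray(patch, dtype=float).ravel()

def orientation_histogram(patch, n_bins=9):
    """HOG-style descriptor for one cell: unsigned gradient orientations
    weighted by magnitude, then L2-normalised (no blocks, no interpolation)."""
    p = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(p)                        # axis 0 = rows (y), axis 1 = cols (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # fold into [0, 180)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Synthetic "wrinkle": a dark diagonal line on a mid-grey patch.
patch = np.full((16, 16), 120.0)
for i in range(16):
    patch[i, i] = 60.0

# The same patch under different lighting: higher gain and brighter offset.
relit = 1.5 * patch + 30.0

raw_change = np.linalg.norm(raw_pixel_features(relit) - raw_pixel_features(patch))
hog_change = np.linalg.norm(orientation_histogram(relit) - orientation_histogram(patch))
print(raw_change, hog_change)  # large vs. essentially 0
```

Real illumination changes are rarely this uniform (shadows, specular highlights), which is why full HOG adds local block normalisation.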
With the aid of a figure explain what is meant by the brightness consistency constraint that is exploited in optical flow for motion estimation.
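the brightness of a point is assumed to stay the same as it moves between two image frames, i.e., I(x+u, y+v, t+1) = I(x, y, t), where (u, v) is the point's displacement; so a change in brightness at a fixed pixel location is attributed to displacement of the brightness pattern, and thus to motion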
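Linearising the constraint with a first-order Taylor expansion gives Ix·u + Iy·v + It = 0, one equation in two unknowns per pixel. The Python sketch below adds the Lucas-Kanade assumption (a technique swapped in here for illustration) that all pixels in a window share one displacement, and solves the stacked equations by least squares. The Gaussian-blob frames and the shift of (0.3, -0.2) pixels are synthetic; the linearisation only holds for small displacements.

```python
import numpy as np

def gaussian_blob(shape, cx, cy, sigma):
    """Synthetic frame: a smooth bright blob centred at (cx, cy)."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

def lucas_kanade_window(I1, I2):
    """Estimate one displacement (u, v) for the whole window from
    Ix*u + Iy*v + It = 0, solved in the least-squares sense over all pixels."""
    Iy, Ix = np.gradient(I1)   # spatial derivatives (rows = y, cols = x)
    It = I2 - I1               # temporal derivative between the two frames
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)

# Frame 2 is frame 1 with the blob moved by (+0.3, -0.2) pixels.
I1 = gaussian_blob((64, 64), cx=32.0, cy=32.0, sigma=4.0)
I2 = gaussian_blob((64, 64), cx=32.3, cy=31.8, sigma=4.0)
u, v = lucas_kanade_window(I1, I2)
print(round(u, 2), round(v, 2))  # approximately the true shift
```

For larger motions, practical implementations typically iterate this step or apply it coarse-to-fine on an image pyramid.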
Describe the three types of supervised learning used in automated face analysis for emotion recognition.
1) Event categories (e.g., emotion labels or AUs) or dimensions defined in advance in labelled training data
2) Static modelling
- each video frame is evaluated independently
- uses, e.g., neural networks (NNs), support vector machine (SVM) classifiers, boosting
3) Temporal modelling
- frames are segmented into sequences
- modelled with a variant of dynamic Bayesian networks, e.g., hidden Markov models (HMMs)
- HMMs temporally segment actions by establishing a correspondence between the action’s onset, peak & offset and an underlying latent state
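The difference between static and temporal modelling can be sketched with a toy 4-state HMM over one AU's per-frame intensity. Everything here is a hypothetical assumption: the Gaussian emission means, the left-to-right transition probabilities, and the intensity sequence. Because onset and offset frames have the same expected intensity, a per-frame (static) classifier cannot tell them apart, whereas Viterbi decoding of the HMM uses the state ordering to label them.

```python
import numpy as np

STATES = ["neutral", "onset", "peak", "offset"]
# Hypothetical Gaussian emission model for a per-frame AU intensity score.
MEANS = np.array([0.0, 0.5, 1.0, 0.5])   # onset and offset look identical
SD = 0.15
# Left-to-right transitions: neutral -> onset -> peak -> offset -> neutral.
TRANS = np.array([
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.4, 0.0, 0.0, 0.6],
])
START = np.array([1.0, 0.0, 0.0, 0.0])

def log_emission(obs):
    """Log-likelihood of each observation under each state (T x S)."""
    obs = np.asarray(obs, dtype=float)[:, None]
    return -0.5 * ((obs - MEANS) / SD) ** 2 - np.log(SD * np.sqrt(2 * np.pi))

def static_labels(obs):
    """Static modelling: each frame classified on its own (ties -> first state)."""
    return log_emission(obs).argmax(axis=1)

def viterbi(obs):
    """Temporal modelling: most likely latent state sequence under the HMM."""
    logB = log_emission(obs)
    with np.errstate(divide="ignore"):           # log(0) -> -inf is intended
        logA, logpi = np.log(TRANS), np.log(START)
    T, S = logB.shape
    score = logpi + logB[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + logA             # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logB[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers to frame 0
        path.append(int(back[t, path[-1]]))
    return np.array(path[::-1])

intensity = [0.0, 0.05, 0.5, 0.55, 1.0, 0.95, 1.0, 0.5, 0.45, 0.0]
# Static: the falling frames (0.5, 0.45) are mislabelled 'onset'.
print([STATES[s] for s in static_labels(intensity)])
# HMM: the falling frames are labelled 'offset'.
print([STATES[s] for s in viterbi(intensity)])
```

In a real system the emission model would be learned from labelled training sequences rather than set by hand.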
What are the challenges encountered when creating a facial expressions database?
1) Variations in video: pose, illumination, resolution, occlusion, facial expression, actions (intensity & timing), individual differences in subjects
2) Most have used directed facial action tasks:
- posed facial actions differ from spontaneous ones, which complicates pattern-recognition approaches, e.g., HMMs
- such tasks tend to produce holistic expressions
3) Coder variability:
- “test-retest” unreliability: the same coder may assign different AUs to the same segment on different occasions
- “alternate-form” unreliability: different coders may assign different AUs to the same segment
- ⇒ coders should be certified to minimise errors
- error due to manual data entry
- errors in the “ground truth” adversely affect classifier training & performance
- differences in manual coding between databases ⇒ impaired generalizability of classifiers from one database to another
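Coder (un)reliability is commonly quantified with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, one standard choice that these notes do not name, to compare two coders' present/absent codes for a hypothetical AU across ten segments (alternate-form reliability); passing one coder's codes from two sessions would instead give a test-retest figure. The codes are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two label sequences for the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, given each coder's own label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical AU present (1) / absent (0) codes for 10 segments.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(round(cohens_kappa(coder_1, coder_2), 2))  # → 0.6
```

Here the coders agree on 80% of segments, but half of that agreement would be expected by chance alone, so kappa is noticeably lower than raw agreement.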