09 Social Signal Processing Flashcards
Define “Social Intelligence” for IT systems! (1 sentence)
Which parts of your definition apply to the field of Multi-Agent Systems and which parts
are related to Social Signal Processing?
- Social intelligence is the ability to express social signals/social behaviors and to
recognize them in other human and IT-agent individuals, in order to “function” in a
society of humans and IT agents and to (Pareto-)optimize one's own, other IT agents’ and
fellow humans’ utility functions (survival, reproduction, …) via cooperation.
- green -> Social Signal Processing
- blue -> Multi-Agent Systems
Characterize Reality Mining! (1 sentence) What is the relation between Reality Mining and
Social Signal Processing?
- Reality Mining analyzes all available traces of human behavior (social and
non-social) and derives models of this behavior to gain scientific knowledge and
build applications (e.g. predictions).
- Reality Mining may use SSP techniques.
Name 3 examples of social signals/social behavior and name 3 examples of behavioral
cues (S)
- Social Signals (Expressing attitude towards elements of a social setting):
- mirroring (if mutual attraction)
- aggressive turn taking behavior
- expression of disapproval of something (e.g. via disapproving looks)
- expression of sympathy/empathy
- Behavioral Cues:
- facial expressions
- body posture / interaction geometry
- gestures
- expressives (laughter, …)
- emotions reflected in speech prosody (rhythm, intonation, stress)
Define behavioral cue! (1 sentence) What is the relation between social signals and
behavioral cues?
- Behavioral cues are time series (single, in series, parallel, overlapping, …) of
perceivable or measurable non-verbal physiological activity.
- Multiple behavioral cues (vocal behavior, posture, mutual gaze, interpersonal
distance, …) combine to produce a social signal.
What is prosody? (1 sentence)
- Prosody is the set of voice qualities that characterize how something is said,
e.g. pitch, tempo, energy, …
- It is often used for social signal detection from audio.
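Two of the prosodic features named above, pitch and energy, can be estimated from a short audio frame. The following is a minimal sketch, not a method taken from these notes: it assumes a clean, voiced frame, uses the autocorrelation peak as one simple pitch estimator, and the function names and the 50 to 400 Hz search range are illustrative assumptions.

```python
import math

def rms_energy(frame):
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def estimate_pitch_hz(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags corresponding to [f_min, f_max] are searched; this range is
    an assumed plausible range for speech pitch.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic "voiced" frame: a 200 Hz sine sampled at 8 kHz (illustrative only).
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

pitch = estimate_pitch_hz(frame, SAMPLE_RATE)
energy = rms_energy(frame)
print(f"pitch ~ {pitch:.1f} Hz, RMS energy ~ {energy:.3f}")
```

Tracking such per-frame values over time yields the pitch and energy contours from which prosodic cues are typically derived; real speech would additionally need voicing detection and smoothing.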
For SSP: What is the advantage of unconscious social signals vs. conscious social
signals? (1 sentence)
- Unconscious signals are honest signals: they allow one to deduce the actual/true
state/social attitude, whereas conscious signals can be faked more easily.
Facial expressions: What are Action Units (AUs)? (1 sentence)
- Action Units are the smallest discernible facial movements; they are used
in the Facial Action Coding System (FACS) to describe facial expressions.
Name the 6 basic emotions (after Ekman)!
- fear, sadness, happiness, anger, disgust, surprise
Vocal Behavior: What are Linguistic Vocalizations and Non-Linguistic Vocalizations? (For
each: 1 sentence plus 1 example) What is Backchanneling? (1 sentence)
- Linguistic vocalizations (or segregates) are non-words that are used like words:
- Prolonged “uhm” -> embarrassment/feeling uncomfortable in a social situation
- Non-linguistic vocalizations are other non-verbal sounds used as social signals to
express boredom, sexual interest, anxiety, …
- e.g. laughter, crying, groaning
- Backchanneling describes that, during a conversation, listeners respond to what
is being said in a verbal or non-verbal way to signify the listener’s attention,
understanding, agreement, etc. (nodding, “yeah”, “hmmm”, …)
Vocal behavior: Name and explain in 1 short sentence each three classes of silence!
- Hesitation silence: occurs when the speaker hesitates, e.g. while explaining
difficult concepts
- Psycholinguistic silence: occurs when the speaker needs time to encode or decode
language
- Interactive silence: used to express respect or doubt, to ignore people, or to attract
attention to other forms of communication (e.g. gazes)
Name and explain in 1 sentence each 3 steps/sub-problems of Speaker Diarization!
- Step 1: Segmentation into speech/non-speech
- First, features are extracted by digital signal processing, using Fourier and
other transforms and mel filters to obtain mel-frequency cepstral coefficients (MFCCs)
- Then several trained binary classifiers distinguish between speech
and non-speech on the computed features
- Step 2: Detection of speaker transitions
- The speech parts are split into segments
- Statistical methods then decide whether two segments belong to the same
speaker, or whether one interval contains one or two speakers
- Step 3: Clustering of segments
- The segments are grouped with a clustering method, e.g. hierarchical bottom-up
clustering that merges the segments with the most similar models (Gaussians) and cuts
the dendrogram at maximum likelihood
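Steps 2 and 3 can be sketched in a few lines. This is an illustrative toy under loud assumptions, not the method prescribed by these notes: each frame has a single scalar feature instead of an MFCC vector, each segment is modelled by one 1-D Gaussian, and the Bayesian Information Criterion difference (ΔBIC) is swapped in both as the statistical same-speaker test and as the stopping rule for the bottom-up clustering (instead of cutting the dendrogram at maximum likelihood). All function names and the toy data are hypothetical.

```python
import math

def log_var(values):
    """Log of the maximum-likelihood variance of a 1-D sample (floored for stability)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.log(max(var, 1e-12))

def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC for modelling two segments with two Gaussians instead of one.

    Positive: two separate models fit better (likely different speakers).
    Negative: one shared model suffices (likely the same speaker).
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    gain = 0.5 * (n * log_var(seg_a + seg_b)
                  - n_a * log_var(seg_a) - n_b * log_var(seg_b))
    # A 1-D Gaussian has 2 free parameters (mean and variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty

def cluster_segments(segments):
    """Bottom-up clustering: repeatedly merge the most similar pair of clusters."""
    clusters = [{"members": [i], "data": list(seg)} for i, seg in enumerate(segments)]
    while len(clusters) > 1:
        pairs = [(delta_bic(clusters[i]["data"], clusters[j]["data"]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        score, i, j = min(pairs)
        if score >= 0:  # even the closest pair looks like two speakers: stop
            break
        clusters[i]["members"] += clusters[j]["members"]
        clusters[i]["data"] += clusters[j]["data"]
        del clusters[j]
    return [sorted(c["members"]) for c in clusters]

def toy_segment(center, step, n=200):
    """Deterministic pseudo-random features spread over [center - 1, center + 1]."""
    return [center + ((i * step) % 101) / 50.0 - 1.0 for i in range(n)]

# Two hypothetical speakers: A has features around 0.0, B around 6.0.
segments = [toy_segment(0.0, 37), toy_segment(6.0, 41),
            toy_segment(0.0, 53), toy_segment(6.0, 59)]

same = delta_bic(segments[0], segments[2])       # same speaker: negative
different = delta_bic(segments[0], segments[1])  # speaker change: positive
speakers = cluster_segments(segments)            # segment indices per speaker
print(same < 0, different > 0, speakers)
```

In a realistic system the same structure would operate on multi-dimensional MFCC frames with full-covariance Gaussians or Gaussian mixtures, and the penalty weight would be tuned on held-out data.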
Coarsely define optical flow and derive the optical flow equation!
- Optical flow is the apparent motion pattern of pixels between frames, represented by a
vector field V(x, y, t) of velocities of the image intensity pattern.
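A sketch of the standard derivation of the optical flow equation, assuming brightness constancy (a moving point keeps its intensity) and small displacements. The symbols I(x, y, t) for image intensity and V = (u, v) = (dx/dt, dy/dt) for the velocity components are notation assumed here:

```latex
\begin{align}
I(x + \delta x,\, y + \delta y,\, t + \delta t) &= I(x, y, t)
  && \text{(brightness constancy)} \\
I(x, y, t) + \frac{\partial I}{\partial x}\,\delta x
           + \frac{\partial I}{\partial y}\,\delta y
           + \frac{\partial I}{\partial t}\,\delta t + \dots &= I(x, y, t)
  && \text{(first-order Taylor expansion)} \\
\frac{\partial I}{\partial x}\frac{\delta x}{\delta t}
  + \frac{\partial I}{\partial y}\frac{\delta y}{\delta t}
  + \frac{\partial I}{\partial t} &= 0
  && \text{(divide by } \delta t \text{, drop higher-order terms)} \\
I_x\, u + I_y\, v + I_t &= 0
  \quad\Longleftrightarrow\quad \nabla I \cdot V = -I_t
  && (\delta t \to 0)
\end{align}
```

This is a single equation in the two unknowns u and v, so the flow cannot be recovered from one pixel alone (the aperture problem); additional constraints such as local or global smoothness are needed to solve for V.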
What is the role of context in Social Signal Processing?
- Behavioral cues can have different meanings when they occur in different external
contexts
- Multi-modal combination/fusion of social signals (e.g. audio and interaction
geometry)