Week 10 Flashcards
What are the prerequisites for collecting a multimodal dataset and what are the
challenges involved?
1) quantity of data
2) high diversity w.r.t. subjects’ age, gender & culture, and situational context
3) balanced distribution of instances among classes, or along the range (for continuous models)
4) quality of data (i.e., adequate, realistic & naturalistic)
5) adhering to ideal capture conditions
Current state of multimodal datasets
1) smaller in size than unimodal datasets
2) more often recorded in the lab
3) most are bimodal (audio-visual)
4) physiological measures, or speech alongside depth images, are becoming available
What ethical issues must be addressed before creating a multimodal dataset?
1) affect can be very private ⇒ subjects might not always agree to make genuine & spontaneous affect data available for study, especially video & audio
2) moral principles guiding research:
- how ethical issues influence selection & conduct
- subjects must be informed & provide consent ⇒ might reduce spontaneity and naturalness of the data
3) different levels of release for different contained modalities
4) whether the research will be beneficial to subjects
5) subjects should not be harmed
Challenges of multimodal data collection
1) To obtain naturalistic displays of affect
2) Complex setups for multimodal recordings require careful control of lab conditions; the observer's paradox: presence of the experimenter & awareness of being recorded may influence the subject
3) Synchronisation of multimodal capture streams, different devices/timescales/sampling
4) Sufficient number of independent labellers (or self-labelling):
- not all modalities’ recorded data are sufficiently informative
for human labellers to make affect judgments
- may require self-assessment ⇒ disruptive, and heightens awareness of being in an experiment
10 steps to consider for multimodal datasets:
Step 1: ethics to guide the data collection
Step 2: the type of new data and possible reuse of existing material
Step 3: collection of meta information, including demographic data
Step 4: the challenges in collecting data from multiple devices
Step 5: the choice of model or models, and the temporal unit of analysis
Step 6: the labelling method, per modality or in combination
Step 7: standardising to foster compatibility of the metadata & the annotation
Step 8: partitioning data for modelling, optimising & testing
Step 9: verifying perception and baseline results, individually per modality or for modality combinations
Step 10: releasing the data for the highest spread and usage
What considerations should be made when choosing the appropriate model(s) for a multimodal dataset?
1) emotion model: continuous or categorical (influenced by modalities)
2) temporal unit of analysis:
- physiological measures & video - annotated on a per-frame basis
- acoustic parameters - extracted over larger chunks, e.g., words or turns (a turn is a time during which a subject speaks)
3) compromise: annotate in continuous dimensions (e.g., arousal & valence), but also continuously in time (e.g., every 100 ms):
- allows diverse mappings, e.g., averaging over a certain chunk (see the sketch after this list)
4) use multiple models:
- enriches flexibility of database
- requires considerable extra effort
- could be applied for modality-specific annotation
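A minimal sketch of the chunk-averaging compromise in item 3, assuming hypothetical per-100 ms arousal ratings and word-sized chunk boundaries:

```python
import numpy as np

# Hypothetical continuous arousal trace, one rating every 100 ms.
frame_times = np.arange(0.0, 3.0, 0.1)
arousal = np.sin(frame_times)                 # stand-in for real annotations

# Hypothetical chunk boundaries in seconds (e.g., word or turn on/offsets).
chunks = [(0.0, 0.8), (0.8, 1.9), (1.9, 3.0)]

# Map the per-100 ms labels onto each chunk by averaging.
chunk_labels = [arousal[(frame_times >= s) & (frame_times < e)].mean()
                for s, e in chunks]
print([round(v, 2) for v in chunk_labels])    # one label per chunk
```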
What considerations should be made when recording new data or re-using existing material for a multimodal dataset?
1) recording new data
2) reusing existing material: usually only sparsely available (especially for multimodal)
3) data cover acted, induced & naturalistic emotions
4) increasing use of mobile & wearable devices for naturalistic data
What considerations should be made when synchronising streams for a multimodal dataset?
1) audio & video - a challenge if using several microphones & cameras
2) worn physiological devices may not be routed via the same computer
3) use aligned time stamps or markers for later synchronisation; these may need to be repeated during a take (or trial) to compensate for temporal deviations
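A minimal sketch of time-stamp-based alignment, assuming a 25 fps camera and a 128 Hz wearable whose clocks have already been mapped to a common time base (rates, offset and values are hypothetical):

```python
import numpy as np

video_t = np.arange(0.0, 5.0, 1 / 25)         # 25 fps frame time stamps
physio_t = np.arange(0.013, 5.0, 1 / 128)     # 128 Hz sensor, slight offset
physio_v = np.random.randn(len(physio_t))     # stand-in sensor readings

# For each video frame, pick the nearest physiological sample in time.
idx = np.searchsorted(physio_t, video_t)
idx = np.clip(idx, 1, len(physio_t) - 1)
take_left = (video_t - physio_t[idx - 1]) < (physio_t[idx] - video_t)
idx = np.where(take_left, idx - 1, idx)

aligned = physio_v[idx]                       # one sensor value per frame
print(aligned.shape)                          # (125,)
```

In practice, the markers mentioned above (e.g., a clap or LED flash visible in all streams) are what provide the common time base.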
What considerations should be made when labelling for a multimodal dataset?
1) not all modalities can be easily annotated by a human rater, e.g., physiological signals
2) self-assessment is not always an option:
- ⇒ several external labellers serve as the "expertise of the mass"
- e.g., by majority voting, or by taking the mean or median (for continuous emotion models); see the sketch after this list
- the number of labellers should be proportional to the level of subjectivity or ambiguity of the labelling, & the complexity of the model
3) multimodal can be annotated modality-wise or in combination:
- acoustic & physiological data - better in conveying arousal
- video or textual data - well suited to convey valence
- not all modalities are necessarily present at all time, e.g., speech
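A minimal sketch of fusing several external labellers into a single label, with hypothetical ratings:

```python
import numpy as np
from collections import Counter

# Categorical case: majority voting across five raters for one instance.
cat_labels = ["happy", "happy", "neutral", "happy", "sad"]
print(Counter(cat_labels).most_common(1)[0][0])    # -> happy

# Continuous case (e.g., valence): mean & median across raters.
cont_labels = np.array([0.3, 0.5, 0.4, 0.9])
print(cont_labels.mean(), np.median(cont_labels))  # 0.525 0.45
```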
What considerations should be made when partitioning for a multimodal dataset?
1) divides data into partitions for modelling, optimising & testing
2) provides default or suggested form of partitioning, facilitates comparison of results & findings
3) development partitions in addition to training & testing partitions
4) use cross-validation to enable use of as much data as possible for all partitions
5) ensure independence of subjects, context, etc., e.g., by leaving out one subject or subject group at a time (see the sketch after this list)
6) keep good balance of all factors throughout the partitions
7) transparent & easy to reproduce, noting that random partitioning is
suboptimal
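A minimal sketch of subject-independent partitioning using scikit-learn's LeaveOneGroupOut, with hypothetical features, labels and subject IDs:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(12, 4)                    # 12 instances, 4 features
y = np.random.randint(0, 2, size=12)          # hypothetical affect labels
subjects = np.repeat([1, 2, 3, 4], 3)         # 3 instances per subject

# Each fold holds out all instances of exactly one subject.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    print(f"test subject {subjects[test_idx][0]}: "
          f"{len(train_idx)} train / {len(test_idx)} test")
```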
What considerations should be made when verifying perception and baseline for a multimodal dataset?
1) independent perception test with individuals other than the annotators
2) conducted individually per modality or for modality
combinations
3) via crowdsourcing
4) include machine-based baseline recognition results
What is the role of the evaluator-weighted estimator (EWE) in the creation of a multimodal dataset?
1) to reach a rater-weighted gold standard
2) a weighted average of individual evaluators' responses that takes into account that each evaluator is subject to an individual amount of disturbance during evaluation
3) the weights measure the correlation between an individual annotator's estimations & the average ratings of all evaluators
4) if the weights are constant among raters, the gold standard reduces to the mean of the raters' continuous labels (see the sketch below)
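A minimal sketch of the EWE as described above: each rater is weighted by the correlation of their trace with the mean trace, and the gold standard is the weighted mean (ratings are hypothetical; clipping negative weights is an extra assumption):

```python
import numpy as np

ratings = np.array([                          # raters x time steps
    [0.1, 0.4, 0.6, 0.8],
    [0.2, 0.3, 0.7, 0.9],
    [0.9, 0.1, 0.2, 0.1],                     # a noisy, atypical rater
])

mean_rating = ratings.mean(axis=0)
weights = np.array([np.corrcoef(r, mean_rating)[0, 1] for r in ratings])
weights = np.clip(weights, 0.0, None)         # assumption: drop anti-correlated raters

ewe = weights @ ratings / weights.sum()       # rater-weighted gold standard
print(np.round(ewe, 3))
# With constant weights this reduces to the plain mean, as noted above.
```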
Quality assessment for multimodal affect databases
1) Gold standard is practically never reliable:
- training & testing labels are ambiguous to a certain degree, as subject’s emotion is usually difficult to assess
- an emotion may not be mapped unambiguously to a single category or a point in space
2) Ground truth: the actual truth as measured
3) In interpreting results: ideally, results would be judged against the ground truth ⇒ trained models that process affect data are error-prone, yet a classification error might not be so wrong in ambiguous cases
4) ⇒ use several annotators to achieve a reliable gold standard close to the ground truth
What methods should be used for measuring reliability if affect is modelled continuously?
1) (mean) correlation coefficient (CC) or (average) mean linear/absolute error (MLE, MAE)
2) mean square error (MSE)
3) standard deviation
4) use the correlation coefficient if reporting only one measure (see the sketch below)
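A minimal sketch of the continuous reliability measures, comparing one rater's hypothetical trace against a gold standard:

```python
import numpy as np

gold = np.array([0.1, 0.3, 0.5, 0.6, 0.8])
rater = np.array([0.2, 0.25, 0.55, 0.7, 0.75])

cc = np.corrcoef(gold, rater)[0, 1]           # (mean) correlation coefficient
mae = np.abs(gold - rater).mean()             # mean linear/absolute error
mse = ((gold - rater) ** 2).mean()            # mean square error
sd = (gold - rater).std()                     # one reading: SD of the error

print(round(cc, 3), round(mae, 3), round(mse, 4), round(sd, 4))
```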
What method should be used for measuring reliability if affect is modelled categorically?
Fleiss’ Kappa K (most frequently used):
- requires all raters to rate all data
- if labellers agree throughout ⇒ K equals 1
- if they agree only on the same level as chance would ⇒ K=0
- negative values ⇒ systematic disagreement
- values of 0.4 to 0.6 ⇒ moderate agreement
- values > 0.6 ⇒ good to excellent agreement
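A minimal sketch of Fleiss' kappa from its standard definition, for a hypothetical count table (items x categories, with 3 raters per item):

```python
import numpy as np

counts = np.array([              # raters choosing each category, per item
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],
    [0, 0, 3],
])
n = counts.sum(axis=1)[0]                      # raters per item (here 3)

# Observed agreement: per-item P_i, then the mean P-bar.
p_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
p_bar = p_i.mean()

# Chance agreement P_e from the marginal category proportions.
p_j = counts.sum(axis=0) / counts.sum()
p_e = np.sum(p_j ** 2)

kappa = (p_bar - p_e) / (1 - p_e)
print(round(kappa, 3))                         # ~0.493 -> moderate agreement
```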