test Flashcards

Question

How do different modalities in communication affect our perception of experiences such as dining?

Answer 1

When we eat in a fancy restaurant, a dish can taste different than when we eat the same dish at home or at a fast-food chain.

Answer 2

Multimodal communication is considered the most natural form of human interaction because it involves multiple sensory modalities. Traditionally, speakers observe their addressees and vice versa; spoken communication without visual contact is still relatively rare, underscoring the innate multimodal nature of human communication.

Answer 3

This visual information from the face significantly influences how we perceive and interpret spoken language, as the coordination of visual and auditory components enhances our understanding and response to communication.

Answer 4

The ventriloquism effect is a perceptual phenomenon where auditory and visual signals, presented from different locations, are perceived as coming from the same source. The brain links the sound to the visual signal, creating a perception that they are spatially related.

This strong effect, which humans can hardly suppress, suggests a form of recalibration by the brain to bridge the difference between visual and auditory locations, enhancing the integration of multimodal stimuli.

Answer 5

The McGurk effect is a perceptual phenomenon where conflicting visual and auditory signals lead to a third, different perception. For example, when a video of someone saying /ga/ is paired with the sound of /ba/, people often hear /da/.

This effect was discovered by accident by McGurk and his assistant John MacDonald while researching how children perceive speech and whether they are more responsive to the face or voice of their mother. The McGurk effect illustrates how our perception integrates and sometimes confuses combined sensory inputs.

Answer 6

The Cocktail Party Phenomenon refers to our ability to focus on one person’s speech in a noisy or crowded environment. This ability highlights how our auditory system can selectively attend to a single source of sound among many distractions, a crucial skill for effective communication in social settings.

Answer 7

Lipreading involves interpreting visual cues from the movements of a speaker’s lips, which significantly aid in speech perception, especially in challenging auditory conditions. This visual information can compensate for poor audio quality or background noise, allowing for better understanding of spoken words.

Answer 8

Compensatory effects occur when there is noise or interference in one sensory channel (auditory or visual), prompting the other channel to enhance its input to compensate for the missing or unclear information.

Answer 9

As children grow, they not only enhance their verbal skills such as lexicon, grammar, and pronunciation, but they also become more proficient in using and interpreting nonverbal features like voice tone and body language.

Answer 10

In the womb: intonation patterns, rhythm, and features of the voice.
As young infants: they learn to imitate facial gestures like tongue protrusion and mouth opening.
Infants: quickly learn to integrate information coming from different modalities.

Answer 11

1. As a child grows older, nonverbal features become more functional.

2. Children learn to associate specific nonverbal forms, like nodding or higher intonation at the end of a sentence, with particular communicative or social functions.

3. This change is due to increasing social awareness and exposure to a varied environment (family, school, society).

Answer 12

Preference for low-ending (lower pitch or frequency) contours is due to air pressure and lung energy decreasing naturally. These factors show that intonation and rhythm are influenced by the innate biological predispositions of infants.

Answer 13

Nonverbal features may reveal differences in social awareness between younger children, older children, and adults.

This is a working hypothesis suggesting that as children grow, their use and understanding of nonverbal features evolve, reflecting their increasing social awareness.

Answer 14

Research into cues to basic emotions suggests a strong genetic, biological basis:

Baby: cries when sad
Dog: wags its tail when happy

etc.

Answer 15

Blind people produce facial expressions similar to those of their family members and to each other, despite lacking visual exposure. This suggests a genetic foundation for nonverbal features.

Answer 16

Participants are led to believe they are taking part in a memory experiment. The cover story is that the study investigates the effect of context and reading aloud.

Experiment stages:

Participants imagine words fitting a specific context (e.g., organs of the body).
They see 10 words on a screen, shown one by one.
They read aloud the words as soon as they appear.
They recall as many words as possible afterward.

Answer 17

The word "liver" appeared in two contexts:

Normal context: organs of the human body
Surprise context: favorite food items for Dutch kids

This was combined with other questions about cities, pets, etc.

Answer 18

About 25 Dutch subjects participated in the experiment.

Answer 19

Verbal = expressed in words
Non-verbal = the rest

Answer 20

If voice × language were a thing, then body × language would be sign language.

Speech is usually intentional communication, while language covers all the different ways of communicating.

Answer 21

Expressions/signs have a literal translation and a limited set of things they can refer to (they can be right or wrong): it has “words”.
Each sign can be equated to a specific meaning (similar to spoken/written language).
It consists of intentional, structured, symbolic body movements that constitute a form of language expression (linguistic nature).

Answer 22

Form: The phonetic or gestural elements of words, such as phonemes, morphemes, or hand movements.
Meaning: The denotation of a word, which includes objects, actions, or concepts that the word represents, and its syntactic status (how the word functions within the structure of a sentence or phrase).

Answer 23

Denotation is the specific, literal meaning of a word, independent of any emotional or cultural connotations. It refers to what the word directly represents or describes. For example, the denotation of "to read" is the action of interpreting written text.

Answer 24

Syntactic status refers to the role that a word plays within the structure of a sentence or phrase. It defines how a word functions grammatically, such as being a subject, object, verb, or modifier.

Answer 25

Words, signs, and morphemes function differently across languages to convey meaning:

Words: In languages like English, meanings are conveyed through distinct words (e.g., "read," "reads," "reading").
Morphemes: In languages like Turkish, meanings are conveyed by adding morphemes (smallest units of meaning) to a root word (e.g., root "ok" in "okuma" for "read" and "okur" for "he/she/it reads").
Signs: In sign languages, meanings are conveyed through signs, which have their own grammar and syntax (e.g., Dutch Sign Language is not a direct translation of spoken Dutch).

Answer 26

Turkish uses morphemes to convey meaning:

Words are formed by adding morphemes to a root.
Example: The root "ok" (read) can become "okuma" (to read) or "okur" (he/she/it reads).
This method, known as agglutination, allows a single root to take on various grammatical and semantic roles by adding different morphemes.

Answer 27

English: Uses distinct words for different meanings (e.g., "read," "reads," "reading").
Turkish: Uses morphemes added to a root to convey different meanings (e.g., root "ok" for read, "okuma" for to read, "okur" for he/she/it reads).
English words change forms less frequently compared to Turkish, which systematically uses morphemes.

Answer 28

A morpheme is the smallest unit of meaning in a language. It can be a word or a part of a word (like a prefix or suffix) that cannot be broken down further without losing or altering its meaning. For example, in the word "unhappiness," there are three morphemes: "un-" (a prefix meaning "not"), "happy" (the root), and "-ness" (a suffix meaning "state of").

Answer 29

Modalities: different ways of expressing language

Sign languages are real languages (with their own grammar and syntax)! Sign Language of the Netherlands (NGT) is not signed Dutch (it is not a direct translation of the spoken language).
Signs are conventional (with their own vocabulary and rules for expression), not mimicry.

Answer 30

According to Neil Cohn from the Visual Language Lab, visual language (e.g., in comics) is also considered a language because language is not restricted to spoken or written forms but can manifest in various modalities.

Answer 31

Modality does not matter for verbality.

Answer 32

Vocal folds closed at the beginning of the speech process.
Air pressure from lungs is generated.
Vocal folds open due to lung pressure, allowing air to pass through.
Pressure released is influenced by muscle tension and emotional state.
Vocal folds close again, and the cycle repeats about 100-300 times per second.
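
As a rough illustration of the last step, a vibration rate can be converted into the duration of one open-close cycle (duration = 1 / rate). This is only a sketch: the function name `cycle_duration_ms` and the three sample rates are assumptions chosen for illustration; only the range of about 100-300 cycles per second comes from the answer above.

```python
def cycle_duration_ms(cycles_per_second: float) -> float:
    """Return the duration of one open-close cycle in milliseconds."""
    if cycles_per_second <= 0:
        raise ValueError("cycles_per_second must be positive")
    # One second is 1000 ms, so each cycle lasts 1000 / rate milliseconds.
    return 1000.0 / cycles_per_second


# Illustrative rates (assumed values inside the 100-300 range mentioned above).
for rate in (100, 200, 300):
    print(f"{rate} cycles/s -> {cycle_duration_ms(rate):.2f} ms per cycle")
```

Under these assumptions, the slowest rate in the range corresponds to a cycle of about 10 ms, and the fastest to a cycle of roughly 3 ms.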

Answer 33

The frequency of the vocal fold vibrations determines the pitch of the voice; heavier vocal folds result in a lower frequency and lower voice.

Answer 34

Tenser vocal folds produce higher-pitched sounds.

Answer 35

Larger vocal folds vibrate more slowly, resulting in a lower frequency and lower voice.

Answer 36

Variations in vocal fold size are due to genetics and hormonal changes during puberty.

Answer 37

Tenseness and size of the vocal folds influence their vibration and the pitch of the voice.

Answer 38

The position of the tongue changes the resonance of higher frequencies, resulting in different vowels.

Answer 39

Vowel height refers to how high the tongue is in the mouth, with different heights producing different vowels.

Answer 40

Vowel backness refers to the position of the tongue in the mouth (front/back), influencing vowel sounds.

Answer 41

Lip rounding involves forming the lips in a circle (rounded vowel) or not (unrounded), affecting the sound of vowels.

Answer 42

Vowel tenseness refers to stressed/tense vowels, which can change the quality of the vowel sound.

Answer 43

Consonants restrict or stop the airflow, which leads to audible friction or interruption.

Answer 44

Place of articulation refers to where in the vocal tract the airflow is restricted or stopped to produce a consonant.

Answer 45

Bilabial consonants are produced with both lips, such as p, b, and m.

Answer 46

Labiodental consonants are produced with the lips and teeth, such as f.

Answer 47

Interdental consonants are produced between the teeth, such as th.

Answer 48

Alveolar consonants are produced at the ridge behind the teeth, such as t and d.

Answer 49

Alveo-palatal consonants are produced at the hard palate, such as j and y.

Answer 50

Velar consonants are produced at the soft palate, such as k and ng in "going" and "uncle".

Answer 51

Glottal consonants are produced in the throat, such as h.

Answer 52

Manner of articulation describes how the airflow is manipulated to produce audible friction or interruption.

Answer 53

A stop or plosive is the blocking of the sound followed by its release.

Answer 54

A fricative arises from narrowing the airflow with the tongue. Examples: f, v, s, z.

Answer 55

An affricate combines an oral stop (plosive) and a fricative. Examples: tʃ as in "chop", dʒ as in "judge".

Answer 56

A liquid lets the airflow pass over the side of the tongue. Examples: l, r.

Answer 57

A glide has only a mild obstruction, and in some languages glides are considered vowels. Examples: w, j as in "yes".

Answer 58

Voicedness refers to the vibration of the vocal folds during the production of a consonant. Examples: voiced - b, d; voiceless - p, t.

Answer 59

A voiced consonant has vibrating vocal folds. Examples: b, d, g.

Answer 60

A voiceless consonant has no vibrating vocal folds. Examples: p, t, k.

Answer 61

All vowels are voiced by definition. Examples: a, e, i, o, u.

Answer 62

The pitch (a.k.a. intonation)

Answer 63

F0 is the fundamental frequency, representing the rate at which the vocal folds vibrate. A lower F0 corresponds to a lower-pitched voice, while a higher F0 corresponds to a higher-pitched voice.

Answer 64

Pitch is the perception of the frequency of vocal fold vibrations.

The faster the vibration (higher F0), the higher the pitch;
the slower the vibration (lower F0), the lower the pitch.
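
The relation between vibration rate and F0 can be sketched in a few lines. The example assumes we already know the times at which the vocal folds close (in milliseconds); the function name `estimate_f0_hz` and both lists of closure times are made-up illustrations, not measured data.

```python
def estimate_f0_hz(closure_times_ms):
    """Estimate F0 (in Hz) as the number of vibration cycles per second.

    closure_times_ms: increasing times (ms) of successive vocal fold closures.
    """
    if len(closure_times_ms) < 2:
        raise ValueError("need at least two closure times")
    cycles = len(closure_times_ms) - 1  # n closures delimit n - 1 cycles
    duration_ms = closure_times_ms[-1] - closure_times_ms[0]
    if duration_ms <= 0:
        raise ValueError("closure times must be increasing")
    return cycles * 1000 / duration_ms


# Hypothetical voices: one closure every 10 ms versus one every 4 ms.
slower_voice = [0, 10, 20, 30]
faster_voice = [0, 4, 8, 12]

f0_slow = estimate_f0_hz(slower_voice)
f0_fast = estimate_f0_hz(faster_voice)
print(f"slower vibration: F0 = {f0_slow:.0f} Hz")
print(f"faster vibration: F0 = {f0_fast:.0f} Hz")
print("higher-pitched voice:", "faster" if f0_fast > f0_slow else "slower")
```

With these invented numbers, the voice whose folds close more often gets the higher F0 and is therefore the one expected to be perceived as higher in pitch.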

Answer 65

F0 is crucial in analyzing voice stress and emotional states.

Variations in F0 can indicate different stress levels, emotional conditions, and even cognitive loads.

Answer 66

Factors affecting F0 include the

tension of the cricothyroid muscle,
subglottal pressure,
vocal fold length,
and thickness of vocal folds.

Answer 67

Non-verbal aspects of pitch include:

Pitch accents: Indicate new, given, or contrastive information.
Question/assertion: Rising pitch at the end indicates a question, while a steady drop indicates an assertion.
Tone of voice: Conveys attitudes, emotions, or nuances in the speaker’s intention.
Emotion: Pitch variation can indicate the speaker’s emotional state.

Answer 68

Verbal aspects of pitch include:

Lexical stress: Emphasis on a particular syllable within a word can change its meaning (e.g., "to address" vs. "an address").
Lexical tone: In some languages, pitch variations differentiate between words (e.g., in Mandarin, "ma" means mother only with the correct tone).

Answer 69

Understanding non-verbal aspects involves comparing them with other linguistic elements, such as

the overall mood of the conversation,
syntactic structure,
meaning,
and grammatical elements,

to gain a holistic picture of the intended meaning.

Answer 70

ELAN is software used to annotate videos by adding explanatory notes or comments. It requires a human annotator and a coding manual, making the process time-consuming and subjective.

Answer 71

OpenFace is facial recognition software that helps automate the annotation of facial expressions. Its primary focus is on facial expressions, often leaving out other aspects of body language.

Answer 72

VR and related technologies have gesture trackers that can detect and interpret gestures. However, the interpretation of gestures still often requires human understanding.

Answer 73

Muscle tone: Indicates emotional states or reactions.
Distance (proximity between individuals during interaction): Reflects comfort, intimacy, or conversational dynamics.
Facial expressions:
- Eyebrow position: Can indicate surprise, skepticism, or interest.
- Mouth shape: Reflects emotions and verbal articulation.
Gaze/attention: Indicates interest, focus, or distraction.
Fidgeting (small/repetitive movements, often unconscious): Reflects discomfort, anxiety, or impatience.

(Note: The list of body language variables is extensive, and researchers do not universally agree on all aspects.)

Answer 74

Information structure manifests itself in two ways:

Discourse units: Sentences that belong together are organized into chunks, phrases, and marked by boundaries.
Distinguishing importance: Important information is distinguished from unimportant information (accents, prominence, emphasis, etc.).

Answer 75

Prominence marking uses various cues to highlight or emphasize specific words or elements in a sentence.

Answer 76

Speakers may use pitch accents to signal the importance of words:

Dutch: “Ik voel me SERIEUS genomen” vs. “Ik voel me serieus GENOMEN” (people respect me vs. people don't respect me).
English: “The kids had lunch. The boys/BOYS were eating an apple.” (only boys vs. also girls).
Context: “No, not the RED button, the BLUE button” (emphasizing the contrast).

Answer 77

Speakers signal prominent information through visual cues, such as facial variations.

Rapid eyebrow movements (flashes) can play a similar role as pitch accents in emphasizing important information.

Answer 78

There is a close connection between pitch and eyebrow movements, with high or rising pitch often synchronized with raised eyebrows.

Answer 79

No, there is no one-to-one mapping, but speakers prefer to synchronize verbal cues with visual cues.

When they align, the message gains clarity and emphasis. When verbal and visual cues don’t align, communication may become more difficult.

Answer 80

In newsreaders, there is often alignment between visual and auditory cues for prominent information, especially for strong accents, despite speaker variation.

Answer 81

Experimental data suggest that auditory and visual beats are tightly coordinated.

When a speaker produces a visual beat on a word (gesture), some acoustic properties of that word are affected, and the auditory prominence of that word increases.

Answer 82

Auditory accent: The strongest cue for perceived accent, with high correct identification rates (94.2% for Maarten, 94.9% for Maandag, 85.8% for Mali).
Congruent situations: Received more responses than incongruent ones.
Visual accent: Used when the auditory signal was unclear, indicating reliance on visual information.
Reaction times: Incongruencies led to significantly longer reaction times, indicating confusion.

Answer 83

Congruent stimuli are processed faster than incongruent stimuli.

Especially for the first and third word.

Auditory emphasis is the strongest form of emphasis.
But in incongruent situations, visual cues became more important.
In incongruent situations, visual emphasis yielded more responses, but reaction times were also longer.

Answer 84

Top: Rapid eyebrow movements (flashes) may play a similar role as pitch accents.
Mouth area: Articulators make more exaggerated movements when a prominent or important word is produced.

Answer 85

Perceptual: Observers are more sensitive to dynamic variations in the left part of the face than the right.
Acoustic/physical: There is a significant correlation between F0 (pitch) and the left eyebrow. The left side of the face represents the head better than the right side.

Answer 86

Thompson et al. (2004) found that observers are more sensitive to dynamic variation in the left part of the face than the right part.

Answer 87

Closer distances improve recognition accuracy for facial features.
Whole face visibility provides the highest accuracy, followed by eyes and brows, with the mouth area being the hardest to recognize.
The eyes and brows are more easily recognized at a distance than the mouth area.
As distance increases, the ability to correctly identify facial features significantly decreases.

Answer 88

Different facial areas are not equally important for prominence signaling:

Vertical: Top is more important than bottom.
Horizontal: Left part is more important than right part.

Answer 89

Languages differ in terms of their prosody in two main ways:

Prosodic form: Differences in the timing of pitch movements, pitch range, tempo, etc.
Prosodic functions: Differences in the use of pitch rise to mark question intonation, use of accent, etc.

Answer 90

Differences in prosodic form include:

Timing of pitch movements
Pitch range differences
Tempo

Answer 91

Differences in prosodic functions include:

Use of pitch rise to mark question intonation
Use of accent to indicate emphasis

Answer 92

Chunking in prosody refers to the way speakers group words and phrases into discourse units, making it easier to understand and process spoken language. It involves using prosodic cues like pauses, intonation, and stress to signal the boundaries of these units.

Answer 93

Plastic languages are more flexible in moving accents within an utterance, while non-plastic languages are less flexible.

Answer 94

Germanic languages (e.g., Dutch, English, German) are generally more flexible (plastic) with accents, while Romance languages (e.g., French, Italian, Spanish) are less flexible (non-plastic).

Answer 95

In English, football scores are announced with accents that can move within the sentence, e.g., "Liverpool ONE CHELSEA one." In Italian, the accent placement is less flexible, and scores are announced with a more fixed pattern, e.g., "Rome UNO Juventus UNO." This shows that English can emphasize different parts of the sentence more easily than Italian.

Answer 96

Languages may use compensatory strategies such as word order to manage accentuation differences.

Answer 97

English is used as a first language (L1) by a large number of people and as a lingua franca (L2) by many who speak different languages as their first language, such as Zulu and various Bantu languages.

Answer 98

20 speakers participated:

10 speakers of L1 English (only English)
10 speakers of L1 Zulu (both Zulu and English)

Answer 99

Participants described differently colored objects from left to right, focusing on a red cow that appeared in different contexts (contrasting with preceding color or form, and appearing at the end of the list or not).

Answer 100

Native speakers of English in South Africa use intonation and prosody differently from non-native speakers.

Non-native speakers, such as Zulu speakers, use intonation mainly to signal continuity or finality, whereas native speakers also use intonation to signal focus and position within a list.

Answer 101

Position refers to whether the utterance sounds final or not. Final sentences can easily be distinguished from non-final sentences in English and Zulu.

Answer 102

In English (native and fluent L2 speakers), contrastive words are marked by emphatic stress. In Zulu and in the English of less proficient L2 speakers, contrastive words are NOT marked by emphatic stress.

Answer 103

Emphatic stress refers to the increased emphasis placed on a specific word within a sentence to highlight its importance or convey emotion. This emphasis is often achieved through changes in pitch, loudness, or duration of the stressed word.

Answer 104

Contrastive words are words that are emphasized to distinguish them from other words or ideas in the same context. This emphasis helps to clarify differences or contrasts between items, such as in the sentence, "I said the RED car, not the blue one," where "RED" is the contrastive word.

Answer 105

Language learning: Educational programs should focus not only on phonology, lexicon, and grammar, but also on intonation and the functional use of accents.
Sociolinguistic implications: If someone does not master the intonational rules of a specific language or uses the rules differently, they will continue to sound different from native speakers.
Goodness of a speaker: The difference between good and bad speakers can be related to the effective use of accents.

Answer 106

Languages can differ in their functional use of prosody, but these differences are related to the kind of function, such as chunking vs. prominence.
The prosodic phenomena of a first language (L1) may transfer to a second language (L2), especially when the L2 speakers are less fluent. This transfer is referred to as prosodic traces.
Such prosodic traces may have sociolinguistic implications, affecting how speakers are perceived and how effectively they communicate in their second language.

Answer 107

Non-verbal cues naturally produced include facial expressions, gaze patterns, hand gestures, pointing, posture, and distance. They convey emotions, attitudes, and social signals.

Answer 108

Research involves studying measurements and theory descriptions to understand non-verbal cues.

Answer 109

Artificial production refers to making symbolic/manual changes (simple) and deep connectionist changes (more complex) in natural speech to see the effects.

Answer 110

Form refers to the cue itself (e.g., eyebrow raise, pitch raise/contour) that makes the voice or body language stand out and more noticeable.

Answer 111

Meaning refers to the function or purpose of the form, particularly how they contribute to prominence and mark specific information.

Answer 112

Focus refers to a part of the sentence that has new and prominent information. For example, in "Mark is the expert on deception," the focus could be on "Mark" to highlight him as the expert among alternatives.

Answer 113

Who was the expert on deception? Mark is the expert on deception.

Mark is the expert on deception (as opposed to another field)

Answer 114

Link refers to information that is not new but is given prominence to highlight common ground.

For example: "What about Mark?" "Mark is the expert on deception." (Mark is what we all know; the rest of the information, “is the expert on deception”, is new and thus the focus.)

Answer 115

Tail refers to information that is not new and not prominent, often used for grammatical completeness. For example, in responding with a complete sentence: "Yes, I think Mark is the expert on deception."

Answer 116

Focus is emphasized by placing a pitch peak (H) on the stressed syllable, making it stand out or sound more important due to the higher pitch.

Answer 117

A Link involves a pitch pattern where the pitch lowers (stretch) and then rises (Low*High). This pattern helps to transition smoothly between syllables, creating a distinctive rhythm and melody in speech.

Answer 118

Speech rate slows down on prominent parts to emphasize or give importance to specific elements in speech.

Answer 119

Speech acts are actions performed through speech, such as making statements, asking questions, giving commands, and expressing feelings. They go beyond conveying information to include influencing others and expressing emotions.

Answer 120

Understanding speech acts helps us interpret the intentions behind what people say and how language shapes our interactions. It reveals the purpose of communication beyond just conveying facts.

Answer 121

Unlike straightforward statements of fact, speech acts are not easily categorized as true or false. They are more about the performance of an action or the expression of an intention.

E.g., "Can't you pass the sugar?" usually does not mean that someone is unable to pass the sugar.

Answer 122

The main components of prosody include pitch rise (component 1) and intensity and pitch (component 2).

These elements help convey emotional nuances and emphasis in speech.

Answer 123

The ability to say things involves understanding the performative nature of speech acts, analyzing prosody components (pitch rise, intensity, and pitch), and recognizing the intent or emotional valence behind expressions such as mockery, disbelief, and various emotions.

Answer 124

Information structure manifests itself in two ways:

by distinguishing important information from unimportant information (accents, prominence, emphasis)
by grouping sentences that "belong together" into discourse units (chunking, phrasing, boundary marking).

Answer 125

Distinguishing important information involves using accents, prominence, and emphasis.

Answer 126

Grouping sentences into discourse units involves chunking, phrasing, and boundary marking.

Answer 127

Boundary marking is the practice of speakers marking the end of information units, such as a sentence, phrase, or turn, to indicate a boundary in speech.

Answer 128

Visual cues such as punctuation (e.g., full stops, commas), indentation, line breaks, and capitalized words at the beginning of a sentence help visualize the structure of a text and facilitate the reading process.

Answer 129

Local cues: Encoded at the very edge of a speech unit
Global cues: Stretched over a whole unit
Global cues allow prediction of upcoming boundaries
Compare with turn-taking: Turn-switches often proceed smoothly without much overlap or delay due to the predictive capacity of prosody

Answer 130

Intonation (boundary tones, declination)
Pitch reset
Durational lengthening (final word)
Pauses (silent or filled pauses)
Voice quality (creaky voice)

Answer 131

Compare it with a mathematical formula:

2 + (3 × 5) means something different than (2 + 3) × 5.

“The man said: the girl is ill” vs. “The man, said the girl, is ill”

These are two different meanings.

Answer 132

The Peps-C programme provides teaching materials to help children learn how to produce or interpret prosody, including skills like chunking.

Answer 133

The sentence "Chicken fingers and fries" illustrates how prosodic chunking can affect interpretation. Chunking helps distinguish between the intended meanings, such as (a) food items like chicken fingers and fries or (b) a literal combination of chickens, fingers, and fries.

Answer 134

Gaze behaviour (Argyle & Cook, 1976)
Body posture (Cassell et al. 2001)
Head nods (during feedback signalling) (Maynard 1987)
Eyebrow movements (Cave et al. 1999)

Answer 135

The reaction time experiment suggests that combining different information sources (such as audio and visual signals) is beneficial when these sources complement each other. When the information from both sources works well together, people can respond faster because the information is easier to process.

Answer 136

When information sources do not complement each other, it can lead to cognitive overload. This means that the brain has to process too much information at once, making it harder to respond quickly and efficiently. As a result, the reaction time becomes longer, and it takes more mental effort to understand the information.

Answer 137

The general task in the reaction time experiment was to "press a designated button as soon as the end of the stimulus is reached." This was applied in both the actual experiment with real audiovisual recordings and a baseline condition with stimuli of variable lengths without finality cues.

Answer 138

The conditions compared in the reaction time experiment were audiovisual (AV), audio-only (AO), and vision-only (VO). The reaction times were measured in these different modalities to understand the impact of combining audio and visual information.

Answer 139

The main findings from the reaction time experiment indicated that AV stimuli were the quickest in the actual experiment (with real audiovisual recordings), while in the baseline condition, AV stimuli were the slowest. This suggests that combining modalities helps when the information sources are complementary, but leads to cognitive overload when they are not.

Answer 140

Participants had to judge for both short and long utterances whether a fragment was final or not.

Answer 141

Observers could make the best end-of-utterance classifications for bimodal stimuli; interestingly, the lowest scores were for audio-only (AO) stimuli, despite receiving a lot of attention in the literature.

Answer 142

Participants had to choose whether a fragment was final or not.

Non-final fragments were easier than final fragments. People may be looking for marked features; if these are absent, they choose a default, non-final classification.

Answer 143

Longer fragments were easier than shorter fragments, possibly due to longer exposure to cues.

Answer 144

The difference in performance between short and long stimuli was bigger for audio-only (AO) stimuli than for vision-only (VO) stimuli. Results for short and long stimuli were very similar in vision-only conditions.

Answer 145

More global auditory cues exist (such as declination), whereas visual cues are more locally encoded.

Answer 146

People take turns: while person A is producing speech, person B remains silent until it is his/her turn to start talking.

Answer 147

"The switch between speakers is regulated through a turn-taking mechanism.

"

Answer 148

"Smooth interaction involves switching turns smoothly, with minimal overlap in speech and only a few milliseconds delay between turns."

Answer 149

"They rely on specific cues that can be lexical, syntactic, auditory, or visual."

Answer 150

"

True turns involve active contributions with substantial information, while
minor backchannels are minimal responses indicating engagement, like nodding or saying ""uhuh"".

"

Answer 151

"

Backchannel opportunity points can be predicted to some extent by identifying specific cues in the conversation that indicate a speaker's willingness to listen or their need for a response.

"

Answer 152

The specific issues include:

Variation between individuals: How much individuals differ in their use of backchannels.
Implementation in synthetic characters (avatars): Whether this behavior can be effectively programmed into avatars to mimic natural human interactions.

Answer 153

"The implementation can lead to an improved naturalness of computer systems, making interactions feel more human-like and intuitive."

Answer 154

Their research is based on the O-cam paradigm, in which participants interact with what they believe is a live person but is actually a pre-recorded session, in order to study backchannel behavior.

Answer 155

"The O-cam paradigm involves participants interacting via an online session (like Skype or Zoom) where they believe they are seeing a live person. However, they are actually viewing a recording of a confederate.

This illusion is created through a scripted introduction, and the participants' task is to guess which of four similar tangram figures the other person is describing."

Answer 156

"The O-cam paradigm experiment involved 14 participants who believed they were in a live interaction. They played several rounds, resulting in 6 minutes and 15 seconds of interaction each. The study identified 53 Backchannel Opportunity Points (BOPs) via 10 observers. It was evident that participants varied in their feedback behaviors."

Answer 157

Participants were rated on perceived personality traits (Friendliness, Extraversion, Activeness, Dominance) using 6-point scales. Their behaviors were analyzed based on auditory and visual features, which significantly correlated with personality impressions.

Answer 158

"The study found that behavioral measures, which included auditory and visual features, correlated significantly with perceived personality traits. These measures appeared to be strongly related to impressions of personality, such as Friendliness, Extraversion, Activeness, and Dominance."

Answer 159

"The behaviors of human subjects were implemented into an animated character, including both visual and auditory features. A second experiment revealed that different feedback behaviors led to different impressions of the avatar's personality."

Answer 160

"Implementing human behaviors into animated characters can generate different personalities for machines, aid in developing user-specific adaptive systems, help train communicatively deprived individuals (e.g., people with autism or blind people), and improve ""rapport"" between conversation partners through effective feedback signaling."

Answer 161

"Parallel WaveNet directly models the raw audio signal by predicting one sample at a time, conditioned on the previous samples and relevant context."

Answer 162

"It can produce highly realistic and natural-sounding speech and is successful in capturing the nuances of the human voice and generating high-fidelity audio."

Answer 163

" Speech synthesis is the artificial production of human speech. It converts written text into spoken words using computer algorithms. This technology is used in various applications, such as virtual assistants, navigation systems, and accessibility tools for visually impaired individuals."

Answer 164

"In 2024, WaveNet continued to be used, but Tacotron 2.0 also became prominent, showcasing the dynamic nature of advancements in the field."

Answer 165

"Tacotron 2.0 consists of an encoder and a decoder. The encoder processes the input text and converts it into a fixed-size context vector (a numerical representation of the text), while the decoder generates mel-spectrograms representing the speech features."

Answer 166

"Tacotron 2.0 provides a holistic approach to speech synthesis, allowing for direct modeling of the text-to-speech conversion process. It enables flexibility in controlling various aspects of speech synthesis, such as prosody and speaking style."

Answer 167

Parallel WaveNet excels in producing natural speech but may have limited control over specific aspects like prosody and speaking style.

Answer 168

"Mel-spectrograms are representations of speech features used by the decoder in Tacotron 2.0 to generate the sound output."

Answer 169

"The trend is towards end-to-end models. These models are trained to predict the next part of speech from the given speech. This makes the models good enough to allow for fine-tuning."

Answer 170

"Fine-tuning involves using the hidden knowledge from a pre-trained model to learn related tasks. This process takes advantage of the hidden representations (black box) within the model.

By adjusting the model’s parameters for a specific task or dataset, fine-tuning allows the model to adapt efficiently and transfer its knowledge to new domains or applications like entertainment or education."

Answer 171

"Text-to-Speech (TTS) is the process of creating artificial speech from written text. It aims to produce the best match between the written words and the spoken output."

Answer 172

"

We use a loss function. Think of it like a scorekeeper. It measures how close the generated speech is to what we want.
Example: If you want the speech to say ""Hello"" cheerfully and it says ""Hello"" sadly, the score will be high (bad match). If it says ""Hello"" cheerfully, the score will be low (good match).

"

Answer 173

Non-verbal cues are things like intonation (voice rise and fall) and emotion (happy, sad, etc.). Including these in our scoring helps make the speech sound more natural.
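
The scorekeeper idea can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular TTS system computes its loss: it assumes that the "cheerfulness" of an utterance can be summarized by a short pitch contour, and the function name `prosody_loss` and all pitch values are invented for the example.

```python
def prosody_loss(target_pitch, generated_pitch):
    """Mean squared error between two pitch contours of equal length (in Hz)."""
    if len(target_pitch) != len(generated_pitch):
        raise ValueError("contours must have the same length")
    squared = [(t - g) ** 2 for t, g in zip(target_pitch, generated_pitch)]
    return sum(squared) / len(squared)


# Hypothetical pitch contours for "Hello" (values invented for illustration).
cheerful_target = [220.0, 260.0, 300.0]   # what we want: rising, cheerful
cheerful_output = [218.0, 262.0, 297.0]   # generated speech, close to target
sad_output = [200.0, 190.0, 180.0]        # generated speech, falling, sad

good_match = prosody_loss(cheerful_target, cheerful_output)  # low score
bad_match = prosody_loss(cheerful_target, sad_output)        # high score
```

The cheerful output stays close to the target contour and gets a low score, while the sad output drifts far from it and gets a high score, which mirrors the "Hello" example.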

Answer 174

"The randomness in the meanings of non-verbal cues is due to the way the technology generates these cues based on the training material received from humans. The system tries to reproduce what it has learned, but it doesn't always know where to mark the cues accurately, which can affect the naturalness of the speech."

Answer 175

"The focus action marks that something is new to the listener. If the focus is on the right mark, the speech sounds natural; otherwise, it sounds robotic."

Answer 176

"WaveNet includes intonation, accents, emotion, and other vital layers of communication to deliver a richness and depth to computer-generated voices that earlier systems overlooked."

Answer 177

"Tom Lentz doubts that a system can generate a speaker's affective state or common ground/shared understanding between speaker and listener with only text. Non-verbal cues necessary for affective states are limited in text, and information about common ground may be less explicit."

Answer 178

Choice of words: Systems can use specific words to express emotions.
Previous conversation: Utilizing information structure from previous interactions can help improve naturalness.

Answer 179

Can people perceive empathic behavior from a robot when only the emotions in its speech are used to express empathy?
Do people prefer an empathetic voice from robots or a non-empathetic robotic voice?
What factors of speech can be related to an empathetic voice?

Answer 180

"The uncanny valley effect occurs when humanoid objects appear almost, but not exactly, like real humans, eliciting negative reactions."

Answer 181

"The method involved an actor varying only prosody (intonation, rhythm, and stress) while speaking through a healthbot and a human speaking, both not visible to the participant."

Answer 182

"Yes, users preferred an empathetic voice from robots and were able to perceive empathic behavior when only the emotions in the robot's speech were used to express empathy."

Answer 183

"

Users recognized additional emotional nuances such as empathy, concern, and encouragement in the robot's voice.

These factors contributed to their preference for an empathetic voice. Conversely, individuals tended to avoid choosing a robotic voice that lacked emotions and exhibited monotony.

"

Answer 184

The results suggest that emotional expressiveness and variation in the voice are crucial for user acceptance and preference.

Answer 185

"The method involved manipulating stress (as in ""I am stressed""), which could be present, absent, or copied from the participant. The manipulation was done manually by adding wavering in pitch."

Answer 186

"

Stress present: The speech included stress cues.
Stress absent: The speech did not include stress cues.
Stress copied from participant: The speech mirrored the stress cues present in the participant's voice.

"

Answer 187

"

No significant change in stress: The study found no significant change in stress levels between the conditions (presence, absence, or mirrored affect). This lack of significance might be due to limitations in the study, such as a potential power issue or other influencing factors.
Significant effect in task success: Participants' performance on a shared task was influenced by the presence of their own or mirrored affect. This implies that emotional expression, even if not consciously perceived, had an impact on task success. The emotional cues present in participants' speech affected their interaction and collaboration, leading to differences in performance outcomes.

"

Answer 188

"

Vall-E is a model developed by Microsoft that is designed to mimic any given voice, including its emotional nuances, when given a 3-second example of the desired speaker's voice. The key elements of the training are:

Ground truth: The human speaking target (what the speech should sound like).
Baseline: A simple text-to-speech model that lacks the ability to capture the nuances and subtleties of human speech, especially in terms of emotion.
Prompt: A 3-second example of the desired speaker’s voice, serving as the training input for the model to learn and replicate the specific voice and emotion.

"

Answer 189

"

While Vall-E is designed to mimic any voice, including its emotional nuances, using only a 3-second example, the effectiveness of this method depends on the complexity of the emotions and the range of nuances in the target voice. The system may capture the general characteristics and some emotional aspects, but it might not fully replicate more complex or subtle emotional nuances that require a deeper understanding of the context and prolonged exposure to the speaker's voice.

"

Answer 190

"No, actors typically break down emotions into various components. For example, portraying doubt and despair involves understanding and expressing specific elements such as tone, pitch, pacing, and emphasis. It's a nuanced process that goes beyond merely copying a single emotion in its entirety."

Answer 191

" In emotional speech, there can be some predictability in terms of focus, tail, and common ground. Certain emotions may lead to recognizable patterns in speech, like changes in pacing, pitch, or emphasis. However, this predictability is not universal and can vary based on individual differences and contextual factors."

Answer 192

Vall-E, developed by Microsoft, can replicate and generate emotions in speech to some extent. The model is trained to mimic the emotional nuances present in a provided 3-second example of a speaker. However, the complexity of emotions and their contextual nature may pose challenges in generating highly nuanced or context-specific emotional expressions.

Answer 193

"No, Vall-E, being a text-to-speech model, focuses on generating spoken content and does not have the capability to incorporate or mimic body language. Body language involves visual cues such as gestures, facial expressions, and postures, which fall outside the scope of Vall-E's capabilities."

Answer 194

"Vall-E primarily focuses on replicating vocal aspects, including intonations and emotional cues in speech. However, it does not encompass other non-verbal speech cues like pauses, hesitations, or changes in rhythm, which also contribute significantly to effective communication. The model is limited to the auditory domain and doesn't account for the full range of non-verbal cues present in human communication."

Answer 195

"

Metacognition is the ability of people to think about their own thinking. It refers to a person's beliefs and knowledge about their own cognitive processes.

"

Answer 196

"

Understanding another person's mental state:
- Studied in the context of theory of mind.
- Concerned with developmental or pathological aspects of metacognition.
- The ability to understand that other people have beliefs, desires, intentions, and perspectives that may differ from one’s own.
- Example: Sally-Anne test (assessing the ability to look inside another person’s head).
Understanding your own mental state:
- Awareness of one’s own cognitive processes, such as memory, attention, problem-solving strategies, and emotional states.

"

Answer 197

The Sally-Anne test assesses theory of mind by testing if a child understands that others can have false beliefs. Children are asked where Sally will look for a ball after it has been moved while she is absent. Those who have developed a theory of mind understand that Sally will look for the ball where she last left it, not where it actually is.

Answer 198

"The Tip of the Tongue (TOT) phenomenon is the feeling of being unable to recall a specific word or piece of information, even though you know it is stored in your memory and feels like it’s just on the brink of being retrieved."

Answer 199

Differences in confidence levels are reflected in the way speakers present themselves, which is useful for their addressees.

For the speaker:

It serves as a face-saving strategy (not appearing ridiculous if wrong).

For the addressee:

It manages expectations and can make them more prone to asking again or asking someone else.

Answer 200

"

Linguistic hedges: Phrases like ""I am not sure, but..."" or ""I think...""

Filled pauses: Words like ""uh"" and ""uhm""

Prosody: Using question intonation

"

Answer 201

Visual cues include:

Body language
Facial expressions
Gestures

These cues are natural and important ingredients of daily conversations as well.

Answer 202

"

The flowchart illustrates the process of answering a question with varying degrees of certainty:

A question is asked: ""What is the capital of Switzerland?""
Feeling of Knowing?
- If yes, proceed to search memory (LTM).
- If no, the answer is ""I don't know.""
Willing to Search Longer?
- If yes, continue searching.
- If no, the answer is ""I don't know.""
Answer Found?
- If yes, check confidence level.
- If no, continue searching if willing.
Sufficiently Confident?
- If yes, provide the answer (""That's Zurich."").
- If no, continue searching or decide ""I don't know.""

The flowchart shows how individuals navigate between certainty, uncertainty, and metacognitive awareness when answering questions.
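
The decision steps of the flowchart can be sketched as a small Python function. This is a hedged illustration under several assumptions that are not in the flowchart itself: confidence is a number between 0 and 1, "willing to search longer" is modelled as a maximum number of search attempts, and the names `answer_question`, `confidence_threshold`, and `max_attempts` are hypothetical. The procedure models confidence, not whether the answer is correct.

```python
def answer_question(feeling_of_knowing, search_results,
                    confidence_threshold=0.7, max_attempts=3):
    """Walk through the flowchart and return an answer or "I don't know".

    search_results is a list of (candidate, confidence) pairs, one per
    memory search attempt; candidate is None when nothing was retrieved.
    """
    # Feeling of Knowing? If not, give up immediately.
    if not feeling_of_knowing:
        return "I don't know"

    # Willing to search longer? Only the first max_attempts searches happen.
    for candidate, confidence in search_results[:max_attempts]:
        # Answer found? If not, keep searching.
        if candidate is None:
            continue
        # Sufficiently confident? If so, give the answer.
        if confidence >= confidence_threshold:
            return candidate

    # Searching stopped without a sufficiently confident answer.
    return "I don't know"


no_fok = answer_question(False, [("Zurich", 0.9)])
confident = answer_question(True, [(None, 0.0), ("Zurich", 0.9)])
unsure = answer_question(True, [("Zurich", 0.4)])
```

Here `no_fok` and `unsure` both end in "I don't know", the first because there is no feeling of knowing and the second because the retrieved answer never reaches the confidence threshold.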

"

Answer 203

"Answering factual questions using tests like WISC, WAISC, and Trivial Pursuit."

Answer 204

FOK-scores are subjective ratings about how confident individuals are in their ability to recognize the correct answer to a question if presented later.

Answer 205

Participants take a multiple-choice test to recognize the correct answers, particularly those they were uncertain about initially.

Answer 206

"

It refers to cases where individuals say ""I don't know"" in the first stage but have a high FOK in the second stage.

"

Answer 207

These cues are associated with significantly lower FOK scores indicating higher uncertainty.

Answer 208

"There is a negative correlation; the more words people produce, the less sure they are."

Answer 209

" Adults are more expressive with facial expressions and justifications for their answers, while children are less expressive and more likely to remain silent."

Answer 210

"Eyebrow movements, smiling, and gaze patterns are visual cues indicating uncertainty."

Answer 211

" For adults, smiling correlates with higher FOK scores (embarrassment), while for children, it indicates pride in knowing the answer."

Answer 212

Speakers use various audiovisual cues to express uncertainty, with adults doing so more than children due to better self-presentation skills.

Answer 213

Children had fewer high FOK scores for non-answers and were generally less expressive than adults.

Answer 214

"

Unlike adults, children lack the social skills to produce facial expressions when they don’t know the answer.

"

Answer 215

Adults justify their silence, while children just stay silent.

Answer 216

"High intonation, filled pauses, delay, and using more words."

Answer 217

Eyebrow movement,
smile,
“funny face”,
and gaze (looking away from the questioner).

Answer 218

"Adults' non-answers with filled pauses, delay, high intonation, etc., correspond with a significantly higher FOK score, while children's do not have such significant patterns."

Answer 219

Speakers express their level of uncertainty via various audiovisual cues, with adults doing this much more than children.

Answer 220

Feeling of Another’s Knowing.

Answer 221

"Observers can estimate a speaker’s level of uncertainty based on audiovisual cues."

Answer 222

"It is ""easier"" to estimate the level of uncertainty for answers than for non-answers."

Answer 223

Scores for unimodal stimuli (sound only and vision only) are good, but scores for bimodal stimuli (both sound and vision) are the best.

Answer 224

The task for children vs. adults was to judge the level of (un)certainty.

Answer 225

Children found it very difficult to judge other children on certainty but found it easier to judge adults.

Answer 226

"Adults found the task much easier overall, but they found it easier to interpret other adults than children."

Answer 227

"Adults are better judges than children."

Answer 228

"Adults are easier to judge than children because adults signal their certainty more clearly."

Answer 229

"Answers (1 certain, 1 uncertain) from 5 speakers were selected; words had to have a similar sound shape."

Answer 230

" Sound and image were separated to create combinations of certain and uncertain settings for three variables: filler (absent, present), high intonation (absent, present), and marked facial expression (absent, present)."

Answer 231

The combinations were:

Face sure, voice unsure
Face sure, voice sure
Face unsure, voice unsure
Face unsure, voice sure
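
The four combinations are simply every pairing of a face setting with a voice setting. As a small sketch (the variable names are illustrative and not taken from the study), they can be enumerated in Python:

```python
from itertools import product

# Each modality is either "sure" or "unsure"; pair every face with every voice.
levels = ["sure", "unsure"]
combinations = [
    f"Face {face}, voice {voice}"
    for face, voice in product(levels, repeat=2)
]
```

Two of the four pairings are congruent (face and voice agree) and two are incongruent, which is what makes the mixed stimuli informative.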

Answer 232

"A total of 40 stimuli were created."

Answer 233

Both original and mixed stimuli were presented to 120 subjects who rated the speaker’s confidence level on a 7-point scale.

Answer 234

"Eight different experiments were conducted to ensure subjects would not see the same speaker within one test."

Answer 235

Presence of a filler led to a systematic increase in perception of confidence level.
Stimuli with high intonation were perceived as more uncertain.
Stimuli with marked facial expressions were perceived as more uncertain.

Answer 236

This study aligns with previous work on emotion perception showing the predominance of visual information (e.g., Mehrabian and Ferris).

Answer 237

"

A filler is a sound or word used to fill pauses in speech, often indicating hesitation or uncertainty. Common examples of fillers include ""um,"" ""uh,"" ""like,"" and ""you know.""

"

Answer 238

Yes, cultures can differ in the way they produce cues to uncertainty and in how such cues are interpreted by observers.

Answer 239

"The study compared Dutch speakers with Japanese speakers."

Answer 240

Japanese speakers are interesting because they are often considered to be rather unemotional and have a tendency to avoid uncertainty more than Western cultures.

Answer 241

" Subjects rated their certainty on a scale from 0 (not sure) to 7 (very sure), with the task being the same for both Dutch and Japanese adults."

Answer 242

"Randomly selected answers with low-FOK and high-FOK ratings from 8 Dutch and 8 Japanese speakers were presented to Dutch and Japanese observers. The task was to rate the speaker’s certainty on a 7-point scale."

Answer 243

"88 raters participated, with 44 Dutch and 44 Japanese, equally balanced across gender."

Answer 244

It was easier to judge Dutch speakers’ certain/uncertain answers than Japanese speakers’.
It was easier to judge females than males regarding their confidence levels.
There was no significant in-group effect observed.

Answer 245

Difference between certain and uncertain answers was easier to judge for Dutch than for Japanese speakers.
It was easier to judge female speakers than male speakers on their confidence levels.

Answer 246

The study concluded that it is generally easier to judge uncertainty for Dutch speakers than for Japanese speakers, and that female speakers provide clearer cues of confidence than male speakers, regardless of culture. No in-group bias (scoring people from one's own culture higher) was observed in the study.

Answer 247

"

Yes, there are reasons to assume differences in facial cues to prominence:

Upper vs. Lower Part of the Face: The upper part of the face, especially the eyebrows, is often used to signal prominence. Rapid eyebrow movements (flashes) can play a similar role to pitch accents in speech, signaling emphasis or importance. The lower part of the face, such as the mouth, can also indicate prominence but is more often associated with emotional expressions.
Left vs. Right Side of the Face: Observers are more sensitive to dynamic variations in the left part of the face compared to the right. This could be because the left side of the face (from the observer's perspective) is more expressive and connected to the right hemisphere of the brain, which is involved in processing emotions. Studies have shown a significant correlation between pitch and left eyebrow movements, indicating a stronger connection between auditory and visual cues on the left side of the face.

"

Answer 248

"

Prosodic Differences in English (Germanic Language):

Phrase 1: ""Move the object from A2 to A3"":
- Intonation: Relatively flat with a slight rise on ""A3"" to indicate the end of the instruction.
- Stress: Slight emphasis on ""A3"" to mark the final destination.
Phrase 2: ""Move the object from A2 to B3"":
- Intonation: More noticeable rise on ""B3"" to emphasize the different destination and direction.
- Stress: Contrastive stress on ""B"" in ""B3"" to differentiate it from ""A3.""

Prosodic Differences in French (Romance Language):

Intonation: Generally smoother and less variable than in English. A slight rise on ""A3"" to indicate the end.
Stress: French does not use stress for contrast as strongly as English. The phrase would likely have a more even stress pattern.

"

Answer 249

Communication via spoken language is not an exact data transfer process; many things can go wrong because:

Speakers may experience problems expressing themselves.
Addressees may not fully understand what a speaker is saying.
Spoken language is a very evanescent phenomenon (speech is immediately gone).

Answer 250

The process of grounding information typically proceeds in two phases:

Presentation Phase: The current speaker sends a message to their communication partner.
Acceptance Phase: The receiver signals whether the message was understood correctly or not.

Answer 251

Conversants circumvent the infinite loop by signaling that they received the feedback correctly, and this signaling cycle continues in a manageable way (Clark, Traum).

Answer 252

Communication partners negotiate information through continuous signals on the status of the information being exchanged, similar to teamwork in activities like dancing or playing chess.

Answer 253

"

There are two main types of feedback cues:

Positive Feedback Cues: Signals like ""go on"" indicating that there are no problems with the information being exchanged.
Negative Feedback Cues: Signals like ""go back"" indicating that there are problems with the information being exchanged.

"

Answer 254

"

It is more essential to detect go-back signals (indicating a ""conflict"") because the consequences of ignoring these signals can be significant, leading to larger-scale conversation problems.

This is similar to the traffic light metaphor where not stopping at a red light (conflict) is more critical than not following a green light (confirmation).

"

Answer 255

The expectation is that negative feedback cues are more marked and stand out more compared to positive cues, similar to how a red light stands out more due to the potential consequences of ignoring it.

Answer 256

The question is how prosodic and non-verbal cues compare to lexico-syntactic cues in indicating positive and negative feedback.

Answer 257

"It is speaking in an exaggerated manner, typically with a slower tempo, louder voice, higher pitch, and more pauses, often used in problematic dialogues."

Answer 258

"In child-directed speech, speech over long distances, and Lombard speech (e.g., speaking louder in a noisy environment)."

Answer 259

"To examine how people use negations in Dutch to signal problematic dialogue contexts."

Answer 260

Two speaker-independent spoken dialogue systems that provided train timetable information.

Answer 261

"

Responses to ""Do you want me to repeat the information?"" (go on) and ""Do you want to travel to Amsterdam?"" (go back).

"

Answer 262

There were 20 participants interacting in 120 dialogues in total.

Answer 263

"A situation where there is misunderstanding or communication issues, causing the speaker to elaborate more to clarify the problem."

Answer 264

"A situation where the conversation proceeds smoothly without misunderstandings, requiring less elaboration."

Answer 265

Problematic cases had more elaborate responses with additional information to clarify issues in the conversation.

Answer 266

"The two types are: ""Do you want me to repeat the information?"" (go on) and ""Do you want to travel to Amsterdam?"" (go back)"

Answer 267

"The distribution is analyzed into three types: single no, no with additional information (stuff), and more detailed no responses."

Answer 268

"People speak more slowly, more loudly, and with more pauses."

Answer 269

Talking to children, shouting to someone far away, or speaking in a noisy environment.

Answer 270

It is pronounced more slowly and for a longer duration.

Answer 271

"

Sometimes only ""nee"" (""no""), sometimes ""nee"" with extra words such as ""Amsterdam"".

"

Answer 272

They add more details.

Answer 273

"They had to judge whether ""nee"" (""no"") utterances came from problematic dialogues.
"

Answer 274

"They could judge this well, far above chance level."

Answer 275

"

People say ""nee dankjewel"" (""no thank you"") as one fluent phrase without a pause.

"

Answer 276

"

People say ""nee"" (""no"") followed by extra words, with pauses in between.

"

Answer 277

"The pause after ""nee"" (""no"") is longer."

Answer 278

The pitch (F0) of the extra words is higher in problematic dialogues.

Answer 279

Go-back answers: These answers indicate that something needs to be adjusted or reconsidered. They often contain more information to clarify the situation and to make clear that there is a problem or that something was misunderstood.
Go-on (go forward) answers: These answers indicate that everything is in order and that the conversation can continue without adjustments. They are often shorter and confirm that there is no problem.

Answer 280

"Echoic responses occur when people repeat each other’s words or phrases during conversations."

Answer 281

"The two types of echoic responses are priming behavior and conventionalized behavior."

Answer 282

"Priming behavior is when people unconsciously copy each other’s expressions. For example, if one person uses a specific word or phrase, the other person might repeat it without realizing it."

Answer 283

"Conventionalized behavior refers to standard actions or expressions commonly used in social interactions, such as greetings like bowing, kissing, or hugging. Mimicking these behaviors follows social norms and expectations."

Answer 284

"Repeating words or phrases can indicate feedback, showing whether the speaker wants to continue (go-on) or clarify something (go-back)."

Answer 285

"

A: ""and then you transfer to the Keage line...""
B: ""Keage line""
A: ""which will bring you to Kyoto station""

"

Answer 286

"A: ""and that is the Keage line...""
B: ""Keage line?""
A: ""that’s right, Keage line"""

Answer 287

"One student instructed another on how to build a specific construction using building blocks, with the goal of making it as similar as possible to a picture only the instructor could see."

Answer 288

Negative feedback cues were more likely to be

higher in pitch,
slower in tempo,
and produced after a longer delay.

Answer 289

" Prosodic features can help manage interaction by signaling whether the speaker wants to continue (go-on) or go back and clarify something (go-back)."

Answer 290

"The Japanese results are in line with the Dutch data, showing that speakers tend to make prosodic differences between go-on and go-back signals."

Answer 291

"The consistency suggests that these patterns of using prosodic features to manage interaction may be a general characteristic of human communication."

Answer 292

" People sometimes experience problems with a system (car, telephone, computer, radio) because it was not designed in an appropriate way.

A system operates badly if it does not take into account the failings of the human cognitive system and human skills.

"

Answer 293

" A good design principle is to “design for error”, considering limitations in attention, consciousness, real-life experiences, and ergonomics."

Answer 294

"Spoken dialogue systems are systems with which humans are supposed to interact in natural (spoken) language."

Answer 295

SDS often face problems because they are not yet designed to handle the full range of human linguistic skills.

Answer 296

SDS are typically trained with normal speech, not accounting for variations like fast, slow speech, or repetition.

Answer 297

"Errors will remain a problem due to noisy conditions, interactions with non-native speakers, or an expanded domain of the system."

Answer 298

The three main tasks are to

Prevent errors,
Detect errors,
Correct errors.

Answer 299

Systems can prevent errors by using optimal dialogue strategies.

Answer 300

Systems can detect errors using acoustic and semantic confidence scores.
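
As a rough sketch of what confidence-based error detection could look like, the Python function below flags a recognition result as a likely error when either score is low. Everything specific here is an assumption made for illustration: the scores are taken to lie between 0 and 1, the two scores are combined by taking their minimum, and the name `is_likely_error` and the threshold of 0.5 are hypothetical.

```python
def is_likely_error(acoustic_confidence, semantic_confidence, threshold=0.5):
    """Return True when the weaker of the two confidence scores is below threshold."""
    # Assumption: a result is only trusted if BOTH scores are high enough.
    return min(acoustic_confidence, semantic_confidence) < threshold


clear_case = is_likely_error(0.9, 0.8)    # both scores high: accept
doubtful_case = is_likely_error(0.9, 0.3) # semantics doubtful: flag for checking
```

A flagged result would then be a natural point for the system to ask a verification question rather than continue.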

Answer 301

Systems can correct errors by using feedback cues and system prompts.

Answer 302

"9 subjects were engaged in telephone conversations with a speaker-independent train timetable information system."

Answer 303

They had to query the system on 7 train journeys, resulting in 63 interactions.

Answer 304

"Subjects were video-taped and led to believe that the data collection was for developing a new video-phone."

Answer 305

76% of the dialogues were successfully completed.

Answer 306

How well can participants distinguish problematic from non-problematic fragments in human-machine interactions on the basis of video recordings?

Answer 307

"In this particular study, the video clips were carefully selected to form ""minimal pairs"", in which each pair contained comparable utterances that occurred in a problematic and a non-problematic dialogue exchange. The participants' task was to guess whether the presented clip came from a problematic or a non-problematic context."

Answer 308

"Participants can distinguish problematic from non-problematic interactions above chance level by using audiovisual cues such as hyperarticulation and visual signals."

Answer 309

"

The type of fragment (problematic or non-problematic),
the level of hyperarticulation, and
the presence of visual cues such as smiling, head movement, averted gaze, frowning, and eyebrow raising.

"

Answer 310

"

Participants watched video clips of human-machine interactions and had to decide whether each fragment was problematic or non-problematic. The experiments were divided into three types:

Verification questions: Participants saw users listening to the system's verification questions (the users are silent), which could be unproblematic (correct) or problematic (incorrect). Head movements for yes (up/down) and no (left/right) are almost innate and familiar from a young age.
Destination utterances: Participants saw speakers utter a destination; this could be the speaker's first attempt (unproblematic) or a correction in response to a verification question about misrecognized or misunderstood information.
Negations: Participants saw speakers utter a negation (""no""), which could be a response to a general yes-no question or a response to a verification question containing incorrect information.

"

Answer 311

"The degree of hyperarticulation and the various visual ""cues"" are positively correlated with how well people can make the distinction."

Answer 312

"In this particular study, this means using dynamic variations in users' voices and facial expressions to detect when an interaction may be problematic. This information can then be used to improve the dialogue system by flagging problems early and responding to them, which improves the overall user experience."

Answer 313

"Because a spoken dialogue system can respond to it before the dialogue has even begun."

Answer 314

"Audiovisual prosody is commonly believed to reveal a speaker's emotions (e.g., negative vs positive)."

Answer 315

"Children express their emotions more openly than adults, although this depends on temperament and family background."

Answer 316

"As a child grows older, they become less expressive due to internalization and learn to manipulate their expressions due to emotion regulation."

Answer 317

"Participants must guess whether each next undisclosed card contains a higher or lower number."

Answer 318

"Making ""rational choices"" implies 3 winning and 3 losing games."

Answer 319

" The game was played by pairs of children: 24 younger children (8 years old) and 24 older ones (12 years old)."

Answer 320

"Observers had to determine for each pair of children whether they had just won or lost a game."

Answer 321

"8-year-old children were more expressive when losing a game, while 12-year-old children were less expressive about winning or losing."

Answer 322

"Pakistani children were overall more expressive than Dutch children, with winning being more visible than losing in Pakistani children. Different conventions were used to show happy or sad reactions."

Answer 323

" Children were less expressive when alone."

Answer 324

"Children were less expressive when alone."

Answer 325

Can children interact and collaborate with a robot in a social and intuitive way, and how similar is this to their interactions with peers?

Answer 326

"256 children (Dutch and Pakistani) participated in one of three conditions: alone, with iCat, or with a friend.
"

Answer 327

Children had the most fun playing with their peers, the least fun playing alone, and playing with iCat was in between.

Answer 328

The setup involved video-mediated interaction where children could either have mutual eye-contact or not. They were always in different rooms but could see each other through a screen.

As a control condition, children who were in the same room (co-presence) were also studied.

Answer 329

Children reported having the most fun in the mediated mutual gaze condition, followed by the co-presence condition, and the least fun in the no gaze condition.

Answer 330

"Mutual eye-gaze has important effects on perceived social presence, game experience, and player behaviors, even if the eye-gaze is not perfect."

Answer 331

"Alleged gender differences include the experience, expression, and perception of emotions.

It has often been claimed that women are more emotional than men.

The debate is whether such differences are real and, if so, whether they are related to biological and/or socio-cultural factors."

Answer 332

" Previous investigations have generally been limited and based on stimuli with limited ecological validity, such as still images rather than moving images and acted emotions rather than naturally induced emotions."

Answer 333

"Few studies have tried to combine multiple perspectives (experience, expression, and perception) into one integrated approach."

Answer 334

- Velten method (1968)
- Film (e.g., Gross and Levenson 1995)
- Music (e.g., Sutherland et al. 1982)
- Feedback/Social Interaction (e.g., Staudel and Paetzold 1984, Yinon and Landau 1987)
- Gift (e.g., Isen et al. 1987)
- Facial expression (e.g., Leventhal 1980)

Answer 335

"The Velten method, developed by M. E. Velten in 1968, is a mood induction procedure used in psychological research. It involves reading a series of self-referent statements designed to elicit a specific mood. Participants read these statements aloud or silently to induce positive, negative, or neutral emotional states. For example, positive statements might include phrases like ""I feel good about myself,"" while negative statements might include ""I feel very down."""

Answer 336

"The meta-analysis was based on 250 studies from 22 international journals and evaluated the effectiveness of different Mood Induction Procedures (MIPs)."

Answer 337

- Film [r = 0.738]
- Feedback [r = 0.494]
- Velten [r = 0.467]
- Gift [r = 0.378]
- Music [r = 0.360]
- Facial expressions [r = 0.122]
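
To make the ranking above easier to compare, the sketch below sorts the listed effect sizes and also prints r squared, a common way to read a correlation-type effect size as a share of variance. The function name `rank_mips` is made up for this example, and treating these r values as squarable correlation coefficients is an assumption about how they were computed in the meta-analysis.

```python
# Effect sizes (r) per mood induction procedure, as listed on the card.
mip_effect_sizes = {
    "Film": 0.738,
    "Feedback": 0.494,
    "Velten": 0.467,
    "Gift": 0.378,
    "Music": 0.360,
    "Facial expressions": 0.122,
}


def rank_mips(effect_sizes):
    """Return (name, r, r squared) tuples sorted from largest to smallest r."""
    ordered = sorted(effect_sizes.items(), key=lambda item: item[1], reverse=True)
    return [(name, r, r * r) for name, r in ordered]


ranking = rank_mips(mip_effect_sizes)
for name, r, r_squared in ranking:
    print(f"{name:<20} r = {r:.3f}  r^2 = {r_squared:.3f}")
```

Read this way, film stands well apart from the other procedures, which fits the later card that calls the film MIP a reliable method.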

Answer 338

The two parts are:

Production study
Perception study

Answer 339

"There were 33 participants (16 males, 17 females); the induced moods were depressed and elated (negative and positive valence)."

Answer 340

"The mood induction procedure involved 7-minute film fragments."

Answer 341

The purpose was to study the influence of mood on solving dilemmas.

Answer 342

It is your turn to order at the bakery, but someone else goes before you. What do you do?
- A: You don’t say a thing, since you have all the time in the world, or
- B: You get angry with this asocial behavior and point out that it is your turn to order.

Answer 343

The results show that:

Men reported higher levels of positive emotions compared to women.
Women reported higher levels of negative emotions compared to men.

Answer 344

"The stimuli were 66 film fragments (10 seconds each, no sound) taken from viewing and interview steps of each speaker."

Answer 345

Men were perceived to express more positive emotions compared to women.
Women were perceived to express more negative emotions compared to men.

Answer 346

Men perceived more positive emotions compared to women.

Women perceived more negative emotions compared to men.

Answer 347

The film MIP worked very well (reliable method).
Systematic (and significant) gender differences were found:
1. Women feel the induced emotion more strongly.
2. Women display induced emotion more clearly.
3. Women perceive induced emotion more accurately.

Answer 348

Do blind people also exploit visual cues?
Is the way they express such cues similar to that of sighted people?
How do visual cues relate to their auditory ones?

Answer 349

"Observers tended to give more correct answers about sighted people (M = .61, SE = .01) than about blind people "

Answer 350

"Happiness "

Answer 351

"

Happiness (M = .83, SE = .01),
sadness (M = .66, SE = .01),
anger (M = .44, SE = .01),
scared (M = .36, SE = .01).

"

Answer 352

"The audiovisual condition (M = .63, SE = .01)."

Answer 353

Audio-only condition (M = .59, SE = .01)
video-only condition (M = .49, SE = .01).

Answer 354

"Both use auditory and visual cues to signal emotions, showing similar behavioral patterns."

Answer 355

Visual cues from blind people tend to be more difficult to judge.

Answer 356

"Blind people use auditory cues more strongly."

Answer 357

How about other and more social emotions (e.g. cues to uncertainty)?

Answer 358

Attitude you can have towards a message

Answer 359

Epistemic (“knowledge-y”, e.g. certainty)
Affective (you can feel something about the message)
Emotional (just emotional, not about the message)

Answer 360

- Verbal cues (examples)

* Modal verbs (could, would, should)

* Particles (surely, probably, luckily)

Not an endless list, just examples

- Non-verbal cues

* Facial gestures/prosody

* Audiovisual prosody and feeling of knowing

Answer 361

The more cues, the less feeling of knowing.

Answer 362

"

The results showed that facial expressions and intonation were significant indicators of a person's ""feeling of knowing"", and that others could accurately recognize and interpret these signals.

"

Answer 363

"How people can express and recognize their ""feeling of knowing"", and how it relates to non-verbal signals such as facial expressions and intonation."

Answer 364

" The more cues there are, the more uncertainty there is in the self-assessment of ""feeling of knowing"". More cues lead to a lower feeling of knowing. However, for non-answers (when you are sure you do not know the answer), more cues lead to a higher feeling of knowing, because you can quickly say ""no""."

Answer 365

Prosody (intonation) helps determine whether an answer is complete. Different tones can mark a logical statement and help listeners judge whether the answer is complete.

Answer 366

"Prosody helps by using a slight rise in intonation at each new point (advantage), which helps the listener understand that each point is separate and important. The falling tone at the end of the enumeration indicates that the list is complete."

Answer 367

"A rise in intonation at each advantage signals that each point is separate and important."

Answer 368

"A falling tone at the end of an enumeration signals to the listener that the list is complete."

Answer 369

"Prosody helps by using a falling tone at the end of the enumeration, which indicates that the answer is complete."

Answer 370

"

In the incomplete answer ""He gave the background information,"" a slight rise or a flat tone is used, which indicates that the answer may not yet be complete and that more information may follow.

"

Answer 371

"In the answer ""Yes, he gave the background information, explained the current state of affairs, and discussed the future steps,"" a falling tone at the end indicates that all points have been covered and the answer is complete."

Answer 372

Irony is a way of using words so that their intended meaning is different from the literal meaning, often to create emphasis or humor.

Answer 373

"Verbal irony is when the literal meaning of what is said contrasts with the intended meaning, often for emphasis or humor."

Answer 374

"Context helps determine the intended valence (positive or negative tone) of an ironic statement."

Answer 375

"Sarcasm is a subtype of irony that is negative and critical."

Answer 376

"Tropes related to irony include hyperbole (exaggeration) and understatement (downplaying something)."

Answer 377

"Jocularity is saying things in a fun and playful way."

Answer 378

"A rhetorical question is asked not to receive an answer but to make a point or create an effect."

Answer 379

"Non-verbal irony refers to a stance or property of a message where the context or co-text indicates an ironic meaning."

Answer 380

"Verbal irony is recognized through context and non-verbal cues."

Answer 381

"Context refers to the circumstances or environment, while co-text refers to the surrounding text."

Answer 382

"

A stance is a speaker's attitude or position towards a topic, expressed through both verbal and non-verbal communication.

"

Answer 383

" In the sentence ""I wonder how comfortable the replacement bus service will be,"" the co-text could be ""I already expect it to be a disaster."" The co-text helps clarify that the statement is ironic."

Answer 384

"Markers (cues) appear both during and after ironic statements, as shown by the increased presence of visual cues in both stages."

Answer 385

"The percentage of utterances with visual cues is higher during ironic statements compared to baseline statements."

Answer 386

"During ironic statements, there are more visual cues such as movements in the general face, eyes, eyebrows, mouth, head, and gestures."

Answer 387

"Participants described videos using prompted sentences (e.g., ""These singers have a splendid future in the world of music"") and their responses were annotated for facial movements, gestures, lexical items, and prosody."

Answer 388

"The results indicated that irony involves more visual cues, leading to a higher perception of irony. When someone is ironic, there are more verbal and non-verbal markers."

Answer 389

"When instructed to be ironic, speakers show lower pitch."

Answer 390

" In naturally occurring irony, there are often no significant differences in pitch."

Answer 391

"There is no consistent change in speech rate; some studies find a slower speech rate while others do not."

Answer 392

" ""Dripping sarcasm"" is associated with a higher pitch."

Answer 393

"Contrast as a cue means that irony is signaled by a prosodic difference from the surrounding context, rather than by a specific level of prosody. For example, a statement might be ironic if its pitch or tone differs significantly from the usual pattern in that conversation."

Answer 394

"

Satirical imitation involves pretense and criticism. For example, Alec Baldwin speaks faster when imitating Donald Trump, even though Trump normally speaks more slowly than Baldwin does in his normal voice.

"

Answer 395

"Baldwin's speech rate is significantly faster when imitating Trump, even faster than his normal speech rate, despite Trump's normal speech rate being slower than Baldwin's."

Answer 396

"No, there is no significant difference in pitch spread between Baldwin's normal speech and his imitation of Trump."

Answer 397

"A dead-pan voice (lack of emotion/expression) is commonly used in satirical speech."

Answer 398

" The study found differences mainly in smiling and laughter between participants."

Answer 399

"Non-verbal cues on stance include epistemic cues like certainty and affective cues like liking."

Answer 400

"Non-verbal cues are necessary to recognize irony because they help convey the speaker's true intent when the context is unclear. For example, ""I am very happy to take the train to Tilburg this week"" might rely on non-verbal cues to show irony."

Answer 401

"The presence of gestures and contrast in speech or behavior may signal irony."

Answer 402

"It is when speakers copy linguistic forms (such as words) of their speaking partner."

Answer 403

"Mimicry, alignment, adaptation, and accommodation."

Answer 404

"Gestures, facial expressions, syntactic structures, and prosody."

Answer 405

"Forms of mimicry that happen spontaneously and on the spot during interaction."

Answer 406

"They are different from long-term mimicry (e.g., fashion) and stylised or conventionalised mimicry (e.g., greeting behaviour)."

Answer 407

"No, the distinction between these different forms of mimicry may not always be straightforward."

Answer 408

"They suggest a tight link between perception and behavior, where a speaker's words or syntactic structures are ""primed"" by those of their conversation partner."

Answer 409

" Alignment is viewed as a largely automatic (almost unconscious) process."

Answer 410

"The naive expectation is that adaptation is symmetrical."

Answer 411

Adaptation might be asymmetrical in interactions between:

People with different hierarchical status
Parents and children
Native and non-native speakers

Answer 412

"What about interactions between speakers of different language varieties?"

Answer 413

"Dutch is a West Germanic language."

Answer 414

"It is the native language of most of the population of the Netherlands and about sixty percent of the population of Belgium (Flemish part) and former colonies."

Answer 415

"Dutch is spoken by about 22 million people."

Answer 416

The regional variations considered here are Netherlandic Dutch (ND) and Belgian Dutch (BD) (also known as Flemish).

Answer 417

"Flemish speakers adapt more to Dutch speakers than the other way around."

Answer 418

"

Dutch is a pluricentric language, but speakers consider the variant in Haarlem as the ""best"" one.
Diachronically, Flemish have adapted more to Dutch than the other way around.
Flemish have fewer problems understanding Dutch.
Regional dialects are stronger in Belgium, which may cause Flemish speakers to be more sensitive to language variation.

"

Answer 419

"A variant of the Battleship game."

Answer 420

"The game is played via Skype connection and participants cannot see each other."

Answer 421

"Each game is played between a Flemish and a Dutch participant."

Answer 422

"Participants take turns being the leader or follower."

Answer 423

"Flemish speakers adapted more to Dutch ones (33% vs 10%)."

Answer 424

"Significant effects of Whostarts (players who follow adapt more) and Round (more adaptation in round 2)."

Answer 425

"The interaction between Nationality and Whostarts showed that when a Dutch person starts the game, there is more adaptation by Flemish speakers than when a Flemish person starts."

Answer 426

" It was described as a very spontaneous, unconscious process."

Answer 427

"Yes, Flemish and Dutch speakers immediately recognized that the other participant was of a different nationality."

Answer 428

"Half of the icons were chosen because they could potentially lead to different pronunciations."

Answer 429

"

Flemish speakers adapted more to Dutch ones (10% vs 1%).
The degree of phonological adaptation was much smaller than lexical adaptation.
There was no boosting effect; the degree of lexical adaptation did not correlate with the degree of phonological adaptation.

"

Answer 430

"The study only looked at speakers of the Brabantian variant of Dutch, so the situation could be different for Limburgian variants spoken on either side of the border."

Answer 431

"What about adaptation in other pluricentric communities such as German, French, Italian, English, and Portuguese?"

Answer 432

"Gestures without an intrinsic meaning (e.g., beat gestures) and gestures that visually depict something (iconic vs metaphoric use)."

Answer 433

"They are determined by the rhythm of speech."

Answer 434

" Iconic gestures (concrete depiction) and metaphoric gestures (abstract depiction)."

Answer 435

"Alignment is when people adapt their behaviour to that of the people with whom they are interacting."

Answer 436

People mimic posture and bodily gestures (both conventionalized and spontaneous ones).

Answer 437

"Kimbara (2006, 2008) and Parrill and Kimbara (2006)."

Answer 438

"They provided insights into how speakers adapt their gestures to specific addressees."

Answer 439

"An addressee is the person or entity to whom speech or communication is directed. In other words, it is the listener or receiver of the message being conveyed by the speaker. "

Answer 440

"

The kind of addressee
The addressee's perspective
The meaning of the gesture

"

Answer 441

"It is a figure of speech, an implied comparison.

Cambridge Dictionary: An expression that describes a person or object by referring to something that is considered to have similar characteristics to that person or object.

Oxford Dictionary: A word or phrase used to describe somebody/something else, in a way that is different from its normal use, to show that the two things have the same qualities and to make the description more powerful.
"

Answer 442

"

Summarized: the target domain is an abstract concept, which is made easier to grasp through a more concrete concept (the source domain)."

Answer 443

Target = climate change
Source = melting ice cream

Answer 444

"

In Western cultures, the future is expressed as follows:
- On the vertical axis: the future is placed on the right-hand side.
- On the sagittal axis: the future is placed in front.
This time-space link is also found in language:
- In English: the future lies ahead of us (""ahead""), and we look back (""back"").
- In Dutch: you look ahead (""vooruit""), and you look back (""terug"").

This means that in both English and Dutch, time is often described in terms of movement through space, with the future in front of us and the past behind us.

"

Answer 445

"

This makes sense, because they read from top to bottom.
"

Answer 446

"

In other words, people had to describe words... Some sentences had spatial connotations and others did not. The study examined whether people used gestures."

Answer 447

"

Participants: 31 native Mandarin speakers from Rizhao Polytechnic.
Procedure:
- Participants listened to 54 pairs of sentences, 18 of which contained temporal relations (past or future).
- They sat in front of a computer screen, looking at an empty gray screen while listening to the sentences.
- Occasionally, they answered true/false questions about the sentences.
- Eye movements were recorded using a portable eye-tracker (eye-tribe).
Stimuli: Sentences included vertical spatial metaphors (e.g., ""last month,"" ""next month""), sagittal spatial metaphors (e.g., ""before,"" ""after""), and neutral temporal references (e.g., ""yesterday,"" ""tomorrow"").

Conclusions:

Eye movements revealed differences in how participants conceptualized past and future.
Significant differences were found between Swiss German and Chinese participants.
Participants could not guess the true purpose of the study.
Linguistic material, especially vertical time words, had a noticeable early effect on eye movements.

"

Answer 448

Moroccans focused more on the past, Spaniards on the future, and Chinese participants showed a neutral or balanced focus.

Answer 449

Moroccans predominantly placed the past in front, while Spaniards and some Chinese groups placed the future in front.


(487 cards)