Topic 6 - Speech Perception Flashcards
Acoustic Signal
Produced by air that is pushed up from the lungs through the vocal cords and into the vocal tract
Vowels
Vowels are produced by vibration of the vocal cords and changes in the shape of the vocal tract by moving the articulators
These changes in shape cause changes in the resonant frequency of the vocal tract and produce peaks in pressure at a number of frequencies called formants
Each vowel has a characteristic series of ‘formants’ (resonant frequencies)
Articulators
structures such as tongue, lips, teeth, jaw and soft palate
Formants
Resonant Frequencies
The first formant has the lowest frequency, the second has the next higher frequency, and so on.
Sound/Speech spectrograms
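Sound/speech spectrograms show how the frequency and intensity of speech change over time: time runs along the horizontal axis, frequency along the vertical axis, and intensity is indicated by darkness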
Consonants
Produced by a constriction of the vocal tract and air flow around articulators
Phoneme
The smallest unit of speech that, when changed, changes the meaning of a word
English has 47 phonemes, including 13 major vowel sounds and 24 major consonant sounds
Variability problem
There is no simple correspondence between the acoustic signal and individual phonemes
- Variability comes from a phoneme’s context
- Acoustic signals that vary can be perceived categorically (/b/ as in ‘bat’ or ‘bite’) i.e., as ‘the same’ sound
Coarticulation
The overlap between the articulation of neighbouring phonemes, which also causes variation in the acoustic signal
Variability between different speakers
Speakers differ in pitch, accent, speed in speaking, and pronunciation
- This acoustic signal must be transformed into familiar words
- People perceive speech easily in spite of the variability problems due to perceptual constancy
Categorical perception
This occurs when a wide range of acoustic cues results in the perception of a limited number of sound categories
Voice onset time (VOT)
The time delay between when a sound starts and when voicing begins (when the vocal cords begin to vibrate)
VOT experiment by Eimas & Corbit
VOT for /da/ is 17 ms (short) and for /ta/ is 91 ms (long)
Computers were used to create stimuli with a range of VOTs from long to short
Listeners do not hear the incremental changes, instead they hear a sudden change from /da/ to /ta/ at the phonetic boundary
Thus, we experience perceptual constancy for the phonemes within a given range of VOT
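The sudden switch at the phonetic boundary can be sketched in a few lines of Python. This is only an illustration: the 17 ms and 91 ms endpoints come from the card above, but the boundary value (`boundary_ms=35.0`), the function name `perceive`, and the nine-step continuum are assumptions made for the example, not values reported on this card.

```python
def perceive(vot_ms, boundary_ms=35.0):
    """Return the syllable a listener would report for a given VOT.

    boundary_ms is a hypothetical phonetic boundary chosen for illustration.
    """
    return "da" if vot_ms < boundary_ms else "ta"


# Nine equally spaced VOTs from 17 ms (/da/) to 91 ms (/ta/).
continuum = [17 + i * (91 - 17) / 8 for i in range(9)]
labels = [perceive(vot) for vot in continuum]

for vot, label in zip(continuum, labels):
    print(f"{vot:6.2f} ms -> /{label}/")
```

Although the VOT changes in equal steps, the reported label changes only once, which is the pattern of categorical perception described on the card.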
The McGurk Effect
Visual stimulus shows a speaker saying “ga-ga.”
Auditory stimulus has a speaker saying “ba-ba.”
Observer watching and listening hears “da-da”, which is the midpoint between “ga” and “ba.”
Observer with eyes closed will hear “ba.”
Physiological link between vision and speech
Calvert et al. showed that the same brain areas are activated for lip reading and speech perception.
FFA activation
Von Kriegstein et al. showed that the fusiform face area (FFA) is activated when listeners hear familiar voices
This shows a link between perceiving faces and voices.
Experiment by Rubin et al.
meaning and phoneme
Short words (sin, bat, and leg) and short nonwords (jum, baf, and teg) were presented to listeners.
The task was to press a button as quickly as possible when they heard a target phoneme
On average, listeners were faster with words (580 ms) than non-words (631 ms)
Experiment by Warren
meaning and phoneme
Listeners heard a sentence that had a phoneme covered by a cough
The task was to state where in the sentence the cough occurred.
Listeners could not correctly identify the position and they also did not notice that a phoneme was missing - called the phonemic restoration effect
This did not happen for non-word sentences
Experiment by Miller and Isard
meaning and phoneme
Stimuli were three types of sentences:
- Normal grammatical sentences
- Anomalous sentences (made no sense) that were grammatical
- Ungrammatical strings of words
Listeners were to shadow (repeat aloud) the sentences as they heard them through headphones
Results showed that listeners were:
- 89% accurate with normal sentences
- 79% accurate with anomalous sentences
- 56% accurate with ungrammatical word strings
Segmentation Problem
There are often no physical breaks between words in the continuous acoustic signal
Top-down processing, including knowledge a listener has about a language, affects perception of the incoming speech stimulus
Segmentation is affected by context, meaning, and our knowledge of word structure
Speech Segmentation
The perception of individual words in a conversation, even though the acoustic signal is continuous
Word Structure, Segmentation
Transitional probabilities - the chance that one sound will follow another in a language
Statistical learning - the process of learning transitional probabilities and other language characteristics
Statistical Learning experiment by Saffran et al.
Learning phase - infants heard nonsense words joined into two-minute strings of continuous sound, in which the transitional probabilities between syllables marked the word boundaries
Nonsense words were in random order within the string
If infants use transitional probabilities, they should recognize the words as units even though the string of words had no breaks
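How transitional probabilities can mark word boundaries in an unbroken stream can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the three nonsense words, the fixed word order (standing in for a random one), the function names, and the 0.9 threshold are all assumptions made for the example.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables, tp, threshold=0.9):
    """Split the stream wherever the transitional probability falls below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Illustrative three-syllable nonsense words (assumed for this example).
nonsense_words = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]
# A fixed word order with no immediate repeats, standing in for a random order.
order = [0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 2, 1]
stream = [syl for i in order for syl in nonsense_words[i]]

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # transition inside a word
print(tp[("ku", "pa")])  # transition across a word boundary
print(segment(stream, tp)[:3])
```

Within a word, each syllable is always followed by the same next syllable, so the transitional probability is high. Across a word boundary, several syllables can follow, so it drops. A learner tracking these statistics could therefore find the words without any pauses in the signal.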
Indexical characteristics
Characteristics of the speaker’s voice that convey information such as age, gender, emotional state, and level of seriousness