Topic 6 - Speech Perception Flashcards
Acoustic Signal
Produced by air that is pushed up from the lungs through the vocal cords and into the vocal tract
Vowels
Vowels are produced by vibration of the vocal cords and changes in the shape of the vocal tract by moving the articulators
These changes in shape cause changes in the resonant frequency of the vocal tract and produce peaks in pressure at a number of frequencies called formants
Each vowel has a characteristic series of ‘formants’ (resonant frequencies)
Articulators
structures such as tongue, lips, teeth, jaw and soft palate
Formants
Resonant Frequencies
The first formant has the lowest frequency, the second has the next highest, etc.
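The numbering rule above can be sketched in a few lines of Python. This is only an illustration: the function name `label_formants` is hypothetical, and the three peak frequencies are illustrative values, not taken from these cards.

```python
def label_formants(peak_frequencies_hz):
    """Label resonant peaks F1, F2, ... in order of increasing frequency."""
    ordered = sorted(peak_frequencies_hz)
    return {f"F{i}": hz for i, hz in enumerate(ordered, start=1)}


# Illustrative peak frequencies in Hz (assumed values), given in no particular order
peaks = [2290, 270, 3010]
formants = label_formants(peaks)
print(formants)
```

The lowest peak is always labelled F1, whatever order the peaks are measured in.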
Sound/Speech spectrograms
Sound/Speech spectrograms show how the frequency and intensity of speech change over time: frequency on the vertical axis, time on the horizontal axis, and intensity as darkness
Consonants
Produced by a constriction of the vocal tract and air flow around articulators
Phoneme
Smallest unit of speech that, when changed, changes the meaning of a word
In English there are 37 phonemes - 13 major vowel sounds and 24 major consonant sounds
Variability problem
There is no simple correspondence between the acoustic signal and individual phonemes
- Variability comes from a phoneme’s context
- Acoustic signals that vary can be perceived categorically, i.e., as ‘the same’ sound (e.g., /b/ in ‘bat’ or ‘bite’)
Coarticulation
Overlap between the articulation of neighbouring phonemes, which also causes variation
Variability between different speakers
Speakers differ in pitch, accent, speed in speaking, and pronunciation
- This acoustic signal must be transformed into familiar words
- Thanks to perceptual constancy, people perceive speech easily in spite of the variability problems
Categorical perception
This occurs when a wide range of acoustic cues results in the perception of a limited number of sound categories
Voice onset time (VOT)
Time delay between when a sound starts and when voicing begins (vocal cords begin to vibrate)
VOT experiment Eimas & Corbit
VOT for /da/ is 17ms (short) and /ta/ is 91ms (long)
Computers were used to create stimuli with a range of VOTs from long to short
Listeners do not hear the incremental changes; instead, they hear a sudden change from /da/ to /ta/ at the phonetic boundary
Thus, we experience perceptual constancy for the phonemes within a given range of VOT
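A minimal sketch of this result, assuming a single sharp phonetic boundary. The function name `perceive` is hypothetical, and the 35 ms boundary and the 10 ms step size are assumptions for illustration, not values given on these cards.

```python
def perceive(vot_ms, boundary_ms=35.0):
    """Return the perceived syllable for a voice onset time in milliseconds.

    boundary_ms is an assumed phonetic boundary, not a value from the cards.
    """
    return "/da/" if vot_ms < boundary_ms else "/ta/"


# A VOT continuum in equal 10 ms steps (assumed step size)
continuum = list(range(0, 100, 10))
percepts = [perceive(vot) for vot in continuum]

for vot, heard in zip(continuum, percepts):
    print(f"VOT {vot:2d} ms -> {heard}")
```

Although the VOT changes in equal steps, the percept switches only once, at the boundary. Every VOT on the same side of the boundary maps to the same phoneme, which is the perceptual constancy described above.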
The McGurk Effect
Visual stimulus shows a speaker saying “ga-ga.”
Auditory stimulus has a speaker saying “ba-ba.”
Observer watching and listening hears “da-da”, which is the midpoint between “ga” and “ba.”
Observer with eyes closed will hear “ba.”
Physiological link between vision and speech
Calvert et al. showed that the same brain areas are activated for lip reading and speech perception.