Acoustic Cues Flashcards
Listener Cues
Listeners use more than acoustic information
Knowledge of the speaking situation
Knowledge of the speaker
Visual cues obtained by watching the face and gestures of the speaker
Nonacoustic cues are important
The person perceiving speech is aware of the message, not the individual speech sounds or patterns that make up the message.
Speech Perception
The auditory system is especially tuned for speech
We are best at hearing speech sounds
Infants categorize sounds of speech into groups similar to the distinctive groups that are used in many languages
It’s a specialized aspect of a general human ability – to seek and recognize patterns
Speech Development
Begins during the prenatal period
The fetus learns the melody of speech and the sounds of the language in the 3rd trimester
Newborns respond to their mother’s voice over an unfamiliar voice
Newborns prefer passages that were read to them during the last trimester over novel passages
Preference for maternal reading over an unfamiliar female reading
Speech Perception
Speech sounds are rarely produced in isolation; they overlap and influence one another
For perception, speech sounds are often not discrete and separable the way letters in written words are; the listener must use context to decode the acoustic message
Listeners often perceive speech sounds by using the acoustic information in neighboring segments
There is evidence that speech perception is a somewhat specialized function in the brain
Vowel Perception
Vowels – most perceptually salient sounds
Phonated and high in intensity
The vocal tract is relatively open
Long in duration
The most important acoustic cues to the perception of vowels are in the frequencies and patterning of the speaker’s formants
Listeners usually require only the first and second formants to identify a vowel
However, formant frequency values are not reliable cues to vowel identification because
- Vocal tracts of many different sizes produce the formants
- Formant frequencies are affected by context and rate of articulation
- With increased rate of speech, vowels are often neutralized toward a schwa-like quality
- At normal conversational rates the articulators are in continual motion and the peaks of resonance are continually changing
Vowel Identification
Use patterns rather than the actual values of formant frequencies
Example: for /i/, the first formant is very low and the second formant is very high.
Although the formant frequencies differ from speaker to speaker, the gap between the two formants stays large. The same reliance on pattern applies to the vowels /a/ and /u/.
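The idea of using patterns rather than absolute values can be sketched in a few lines of Python. This is only an illustration: the (F1, F2) values are rough, textbook-style figures assumed for the example, and the ratio cutoffs in `classify_vowel` are hypothetical, not measured perceptual boundaries.

```python
# Illustrative (F1, F2) values in Hz for three speaker groups.
# Assumed, approximate figures for demonstration only.
FORMANTS = {
    "man":   {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)},
    "woman": {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)},
    "child": {"i": (370, 3200), "a": (1030, 1370), "u": (430, 1170)},
}


def classify_vowel(f1, f2):
    """Guess /i/, /a/, or /u/ from the F2/F1 ratio (hypothetical cutoffs)."""
    ratio = f2 / f1
    if ratio > 5.0:   # F1 very low, F2 very high: wide gap
        return "i"
    if ratio < 2.0:   # F1 high, F2 just above it: narrow gap
        return "a"
    return "u"        # both formants low: moderate gap


def classify_all():
    """Classify every vowel token for every speaker group."""
    return {
        speaker: {vowel: classify_vowel(f1, f2) for vowel, (f1, f2) in vowels.items()}
        for speaker, vowels in FORMANTS.items()
    }


if __name__ == "__main__":
    for speaker, guesses in classify_all().items():
        print(speaker, guesses)
```

Note that in these assumed values the child's F2 for /u/ (1170 Hz) is higher than the man's F2 for /a/ (1090 Hz), so an absolute F2 cutoff would confuse the two, while the F1-to-F2 relationship still separates them.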
Semivowel Identification
/w/, /j/, /r/, /l/ - voiced and characterized by changing formant frequencies called transitions
These provide acoustic cues to their identification
Only 2 formants are needed to recognize the /w/ and /j/ sounds
3 formants are needed for the /r/ and /l/ sounds
Transitions
Occur when a vowel precedes or follows a consonant; they reflect changes in resonance as the vocal tract shape changes to or from the more constricted consonant position
Nasal Identification
Formant transitions of vowels preceding and following nasals are effective cues to the identity of the nasals as a class
This change from an orally produced vowel to a nasal includes 2 important features
- A weakening of intensity, which serves as a cue to nasal manner
- The addition of a resonance below 500 Hz, called the nasal murmur
Formant transitions to and from /m/ are the lowest in frequency and shortest in duration
Formant transitions to and from /n/ are higher in frequency and longer in duration
Formant transitions to and from /ng/ are the highest and most variable in frequency and the longest in duration
Plosive Identification
The acoustic cues for plosives are overlaid on the acoustic cues for neighboring vowels and consonants
The listener perceives a stop and the sounds adjacent to it on the basis of their acoustic relationship to one another
Two obvious features distinguish stops from all other classes of sounds (except affricates):
- Complete occlusion of the vocal tract
- The stopped air is released as a transient burst of noise
Fricative Identification
The most important acoustic feature of fricatives is the presence of the noise generated by the turbulent airstream as it passes through the articulatory constriction
/s, z, sh, zh/ have high-frequency spectral peaks
/th, f, v/ have relatively flat spectra
This spectral distinction divides the fricatives into 2 general categories of place
Posterior – sibilant (high intensity levels)
Anterior – non-sibilant (low intensity levels)
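The peaked-versus-flat spectral distinction can be sketched as a simple "peakiness" measure. This is a toy illustration only: the band energies are made-up numbers, and both the measure (`spectral_peakiness`) and its threshold are hypothetical choices, not an established acoustic procedure.

```python
def spectral_peakiness(band_energies):
    """Ratio of the strongest band to the mean band energy.

    A flat spectrum gives a value near 1; a spectrum with a
    prominent peak gives a larger value.
    """
    mean_energy = sum(band_energies) / len(band_energies)
    return max(band_energies) / mean_energy


def fricative_category(band_energies, threshold=2.0):
    """Label a noise spectrum as sibilant or non-sibilant (hypothetical threshold)."""
    if spectral_peakiness(band_energies) >= threshold:
        return "sibilant"
    return "non-sibilant"


# Made-up band energies, ordered from low to high frequency.
peaked_noise = [1, 1, 2, 3, 12, 5]  # prominent high-frequency peak, like /s/
flat_noise = [3, 4, 3, 4, 3, 4]     # roughly flat spectrum, like /f/

if __name__ == "__main__":
    print(fricative_category(peaked_noise))
    print(fricative_category(flat_noise))
```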
Affricate Identification
They contain acoustic cues that are present in both stops and fricatives:
- The silence
- Release burst
- Rapid rise time
- Frication
- Formant transitions in adjacent sounds
Perception of Manner
To identify manner of articulation, listeners determine whether the sound is:
- Harmonically structured with no noise (vowels, semivowels, nasals); these sounds are relatively low in frequency
- Partly aperiodic, containing a noise component (stops, fricatives, affricates); these sounds are relatively high in frequency
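The manner decision can be pictured as a small decision procedure that combines this card with the stop, fricative, and affricate cards. The sketch below is a simplification under stated assumptions: the function name and its boolean inputs are hypothetical, and real listeners weigh many overlapping cues rather than clean yes/no features.

```python
def manner_class(has_noise, has_closure_silence=False, has_frication=False):
    """Map coarse acoustic features to a manner class (hypothetical sketch).

    has_noise           -- an aperiodic component is present
    has_closure_silence -- a silent interval from complete occlusion
    has_frication       -- sustained turbulent noise
    """
    if not has_noise:
        # Harmonic structure only: vowel, semivowel, or nasal.
        return "resonant"
    if has_closure_silence and has_frication:
        # Stop-like silence followed by fricative-like noise.
        return "affricate"
    if has_closure_silence:
        # Silence followed by a transient release burst.
        return "stop"
    # Noise without a preceding silent closure.
    return "fricative"


if __name__ == "__main__":
    print(manner_class(has_noise=False))
    print(manner_class(has_noise=True, has_closure_silence=True))
    print(manner_class(has_noise=True, has_frication=True))
    print(manner_class(has_noise=True, has_closure_silence=True, has_frication=True))
```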
Cues for Place
Depend on the parameter of sound frequency
Vowels and semivowels – formant relationships indicate tongue placement, mouth opening, and vocal tract length
Stops, fricatives, and affricates – the F2 transitions to and from neighboring vowels and frequency of noise components
Cues for Voicing
Depend more on durations and timing of events than on frequency and intensity differences
Timing differences are very important in signaling the voiced/voiceless contrast in sounds
Longer voice onset times, extended periods of aspiration, and longer closure durations cue the voiceless stops /p, t, k/
Short voice onset times, little or no aspiration, and short closure durations cue the voiced stops /b, d, g/
Fricatives and affricates are perceived as voiceless when the frication is of relatively long duration
Affricates are perceived as voiceless when the closure duration is also relatively long
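The timing cues for stop voicing can be sketched as a simple vote among the three durations named above. This is a minimal illustration, not a perceptual model: the `StopCues` type is hypothetical, and every boundary value (in milliseconds) is an assumed placeholder, since actual boundaries vary with language, place of articulation, and speaking rate.

```python
from dataclasses import dataclass


@dataclass
class StopCues:
    vot_ms: float         # voice onset time
    aspiration_ms: float  # duration of aspiration
    closure_ms: float     # duration of the closure


def voiceless_votes(cues, vot_boundary=25.0, aspiration_boundary=20.0,
                    closure_boundary=90.0):
    """Count how many timing cues point toward a voiceless stop.

    All three boundaries are assumed values for illustration only.
    """
    return sum([
        cues.vot_ms > vot_boundary,
        cues.aspiration_ms > aspiration_boundary,
        cues.closure_ms > closure_boundary,
    ])


def perceive_voicing(cues):
    """Return 'voiceless' if at least two of the three cues agree."""
    return "voiceless" if voiceless_votes(cues) >= 2 else "voiced"


if __name__ == "__main__":
    print(perceive_voicing(StopCues(vot_ms=60, aspiration_ms=50, closure_ms=110)))
    print(perceive_voicing(StopCues(vot_ms=10, aspiration_ms=0, closure_ms=60)))
```

A majority vote is used here only to show that the cues can trade off against one another; a long voice onset time with clear aspiration can still signal a voiceless stop even when the closure is short.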