20 - Acoustic Cues - Speech Perception Flashcards
Which formant(s) is/are required for adequate perception of vowels?
Usually only F1 and F2
What are formant transitions good for regarding vowel perception?
- better vowel identification for vowels in CVC context than for isolated vowels
- better vowel identification for isolated formant transitions than for isolated steady states
What does the term “semivowel” refer to?
Glides (w, j)
Liquids (r, l)
How would the rate of change of the formant transitions differ between glides and stops?
Stops have shorter transitions
e.g. /b/ = 0-50 msec
/w/ = 75-150 msec
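A minimal sketch of the duration contrast above, assuming the two ranges on this card are used as hard cutoffs. The function name and the "ambiguous" label for the 50-75 msec gap are illustrative and do not come from the source.

```python
def glide_or_stop(transition_ms: float) -> str:
    """Label a formant transition by its duration in msec.

    Ranges are taken from the card: stops 0-50 msec, glides 75-150 msec.
    Anything outside both ranges is reported as "ambiguous" (an assumption).
    """
    if 0 <= transition_ms <= 50:
        return "stop"
    if 75 <= transition_ms <= 150:
        return "glide"
    return "ambiguous"


print(glide_or_stop(30))   # falls in the stop range
print(glide_or_stop(100))  # falls in the glide range
```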
Which formant transition distinguishes /w/ from /j/?
F2 transitions
/w/ = 800 Hz locus
/j/ = 2200 Hz locus
True or False: The semivowel glides /w/ and /j/ can be synthesized with only F1 and F2
True
True or False: The semivowel liquids /r/ and /l/ usually only require F1 for synthesis
False; /r/ and /l/ usually require F1, F2, and F3 for synthesis.
F3 values distinguish /l/ (2700 Hz) from /r/ (1600 Hz)
What is one manner cue that distinguishes a nasal from a vowel?
- weakening of the upper formants’ amplitudes (due to damping and antiformants)
- low frequency ‘nasal’ resonance (300 Hz)
What place cues can we use to distinguish nasals (e.g. m, from n, from ng)?
- formant transitions and loci same as for stops
- /m/ = 800 Hz
- /n/ = 1800 Hz
- /ng/ = 3000 Hz
- antiformants
- /m/ = 750-1250 Hz
- /n/ = 1450-2200 Hz
- /ng/ = above 3000 Hz
Name 2 of the 4 manner cues we can use to identify stops
- rate of change in formant transitions (glide vs stop; e.g. /b/ = 0-50 msec, /w/ = 75-150 msec)
- silent period (stop gap) distinguishes a stop or affricate from a fricative
- duration of turbulent noise
- less than 40 msec for stops
- 40-90 msec for affricates
- > 90 msec for fricatives
- rise time
- stops = 5-20 msec
- affricates = 30-50 msec
- fricatives = >70 msec
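The noise-duration cue above can be sketched as a simple threshold check. This is only an illustration: the function name is hypothetical, and treating exactly 90 msec as an affricate is an assumption, since the card does not say which side that boundary falls on.

```python
def manner_from_noise_duration(duration_ms: float) -> str:
    """Guess manner of articulation from the duration of turbulent noise.

    Thresholds come from the card: < 40 msec stop, 40-90 msec affricate,
    > 90 msec fricative. A value of exactly 90 is assigned to "affricate"
    (an assumption).
    """
    if duration_ms < 40:
        return "stop"
    if duration_ms <= 90:
        return "affricate"
    return "fricative"


for ms in (20, 60, 120):
    print(ms, manner_from_noise_duration(ms))
```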
Burst frequency is a place cue that we can use to identify p, t, and k. At what frequency would we expect to find it for each of them?
Burst frequency:
/p/ = 500-1500 Hz
/k/ = 1500-4000 Hz
/t/ = >4000 Hz
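As a rough sketch, the burst-frequency ranges above map onto a lookup like the one below. The function name is hypothetical. Assigning the shared 1500 Hz boundary to /k/ and returning None below 500 Hz are assumptions not stated on the card.

```python
from typing import Optional


def stop_from_burst_frequency(freq_hz: float) -> Optional[str]:
    """Guess a voiceless stop from its burst frequency in Hz.

    Ranges from the card: /p/ 500-1500 Hz, /k/ 1500-4000 Hz, /t/ > 4000 Hz.
    The 1500 Hz boundary is given to /k/ (an assumption); frequencies
    below 500 Hz are outside every listed range and return None.
    """
    if 500 <= freq_hz < 1500:
        return "p"
    if 1500 <= freq_hz <= 4000:
        return "k"
    if freq_hz > 4000:
        return "t"
    return None


print(stop_from_burst_frequency(1000))
print(stop_from_burst_frequency(2500))
print(stop_from_burst_frequency(5000))
```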
Besides burst frequency, what is the other place cue that we can use to identify the stops p, t, and k?
F2 transitions and loci
/p/ = 800 Hz
/k/ = 3000 Hz
/t/ = 1800 Hz
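One hedged way to use the F2 loci above is to pick the stop whose locus lies closest to a measured value. The nearest-locus rule and the function name are assumptions made for illustration. The card only lists the locus values, and real perception also depends on the following vowel.

```python
# F2 locus values (Hz) as listed on the card.
F2_LOCI_HZ = {"p": 800, "t": 1800, "k": 3000}


def stop_from_f2_locus(locus_hz: float) -> str:
    """Return the stop whose F2 locus is nearest to locus_hz.

    Nearest-distance matching is an illustrative assumption,
    not a rule given in the source.
    """
    return min(F2_LOCI_HZ, key=lambda stop: abs(F2_LOCI_HZ[stop] - locus_hz))


print(stop_from_f2_locus(1000))
print(stop_from_f2_locus(1900))
print(stop_from_f2_locus(2800))
```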
Name 2 of the 5 voicing cues that we can use to separate the voiceless stops (ptk) from the voiced stops (bgd)
- voicing during closure
- presence of aspiration noise
- voice onset time (burst to onset of voicing)
- closure duration (e.g. “rabid” is heard as “rapid” when the silence before /b/ exceeds 70 msec)
- duration of vowel preceding the stop (voiceless shorter than voiced)
‘Trading relations’ exist between these 5 voicing cues
E.g. lengthening the preceding vowel can shift the categorical boundary observed for VOT
What manner cues can we use to separate a fricative from an affricate or stop?
- extended period of frication noise (>90 msec)
- slow rise time (>70 msec)
What are the 3 place cues we can use to identify fricatives?
Noise Frequency
- s and sh have sharply peaked spectra; f and th have flat spectra
- s (>4000 Hz) has higher frequency than sh (2500 Hz)
F2 Formant Transitions
- very important cues for /f/, /v/, and linguadentals (th)
- f/v have low starting frequency (900 Hz)
- linguadentals have higher starting frequency (1700-2400 Hz)
Relative Intensity
- much lower for f/th than s/sh
- in noisy environments, there are frequent confusions between “v” vs “th”, and “f” vs “th”