20 - Acoustic Cues - Speech Perception Flashcards
Which formant(s) is/are required for adequate perception of vowels?
Usually only F1 and F2
What are formant transitions good for regarding vowel perception?
- better vowel identification for vowels in CVC context than for isolated vowels
- better vowel identification for isolated formant transitions than for isolated steady states
What does the term “semivowel” refer to?
Glides (w, j)
Liquids (r, l)
How would the rate of change of the formant transitions differ between glides and stops?
Stops have shorter transitions
e.g. /b/ = 0-50 msec
/w/ = 75-150 msec
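As a rough illustration, the two duration ranges on this card can be turned into a small lookup. This is only a sketch: the function name `classify_transition` is hypothetical, and treating everything outside the two listed ranges as "ambiguous" is an assumption, since the card does not say how a 50-75 msec transition is heard.

```python
def classify_transition(duration_ms: float) -> str:
    """Label a formant transition by its duration, using the ranges on the card."""
    if duration_ms < 0:
        raise ValueError("duration must be non-negative")
    if duration_ms <= 50:            # card: /b/ = 0-50 msec
        return "stop"
    if 75 <= duration_ms <= 150:     # card: /w/ = 75-150 msec
        return "glide"
    # Assumption: the card gives no label for 50-75 msec or above 150 msec.
    return "ambiguous"


for ms in (30, 60, 100):
    print(ms, "msec ->", classify_transition(ms))
```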
Which formant transition distinguishes /w/ from /j/?
F2 transitions
/w/ = 800 Hz locus
/j/ = 2200 Hz locus
True or False: The semivowel glides /w/ and /j/ can be synthesized with only F1 and F2
True
True or False: The semivowel liquids /r/ and /l/ usually only require F1 for synthesis
False; /r/ and /l/ usually require F1, F2, and F3 for synthesis.
F3 values distinguish /l/ (2700 Hz) from /r/ (1600 Hz)
What is one manner cue that distinguishes a nasal from a vowel?
- weakening of the upper formants’ amplitudes (due to damping and antiformants)
- low frequency ‘nasal’ resonance (300 Hz)
What place cues can we use to distinguish nasals (e.g. m, from n, from ng)?
- Formant transitions and loci (same as for stops)
/m/ = 800 Hz
/n/ = 1800 Hz
/ng/ = 3000 Hz
- Antiformants
/m/ = 750-1250 Hz
/n/ = 1450-2200 Hz
/ng/ = above 3000 Hz
Name 2 of the 4 manner cues we can use to identify stops
- rate of change in formant transitions (glide vs stop; e.g. /b/ = 0-50 msec, /w/ = 75-150 msec)
- silent period (stop gap) distinguishes a stop/affricate from a fricative
- duration of turbulent noise
- less than 40 msec for stops
- 40-90 msec for affricates
- > 90 msec for fricatives
- rise time
- stops = 5-20 msec
- affricates = 30-50 msec
- fricatives = >70 msec
Burst frequency is a place cue that we can use to identify p, t, and k. At what frequency would we expect to find it for each of them?
Burst frequency:
/p/ = 500-1500 Hz
/k/ = 1500-4000 Hz
/t/ = >4000 Hz
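The burst-frequency ranges above could be sketched as a simple range check. The function name `place_from_burst` is hypothetical, and two choices here are assumptions not stated on the card: a burst at exactly 1500 Hz (where the /p/ and /k/ ranges meet) is assigned to /k/, and anything below 500 Hz is left unclassified.

```python
def place_from_burst(freq_hz: float) -> str:
    """Map a burst frequency (Hz) to a voiceless stop using the card's ranges."""
    if 500 <= freq_hz < 1500:        # card: /p/ = 500-1500 Hz
        return "/p/"
    if 1500 <= freq_hz <= 4000:      # card: /k/ = 1500-4000 Hz (1500 assigned here by assumption)
        return "/k/"
    if freq_hz > 4000:               # card: /t/ = >4000 Hz
        return "/t/"
    return "unclassified"            # assumption: below 500 Hz is not covered by the card


for hz in (1000, 2500, 5000):
    print(hz, "Hz ->", place_from_burst(hz))
```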
Besides burst frequency, what is the other place cue that we can use to identify the stops p, t, and k?
F2 transitions and loci
/p/ = 800 Hz
/k/ = 3000 Hz
/t/ = 1800 Hz
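One way to picture how a locus works as a place cue is to pick whichever listed locus lies closest to an estimated F2 starting frequency. This is a toy sketch only: `nearest_place` is a hypothetical name, and nearest-value matching is an assumption for illustration, not a claim about how listeners actually use the cue.

```python
# F2 locus values as given on the card (Hz).
F2_LOCI_HZ = {"/p/": 800, "/t/": 1800, "/k/": 3000}


def nearest_place(f2_start_hz: float) -> str:
    """Return the stop whose listed F2 locus is closest to the given frequency."""
    return min(F2_LOCI_HZ, key=lambda stop: abs(F2_LOCI_HZ[stop] - f2_start_hz))


for hz in (900, 1700, 2900):
    print(hz, "Hz ->", nearest_place(hz))
```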
Name 2 of the 5 voicing cues that we can use to separate the voiceless stops (ptk) from the voiced stops (bgd)
- voicing during closure
- presence of aspiration noise
- voice onset time (burst to onset of voicing)
- closure duration (“rabid” with more than 70 msec of silence before /b/ changes to “rapid”)
- duration of vowel preceding the stop (voiceless shorter than voiced)
‘Trading relations’ exist between these 5 voicing cues
E.g. increasing the duration of the preceding vowel can shift the categorical boundary observed for VOT
What manner cues can we use to separate a fricative from an affricate or stop?
- extended period of frication noise (>90 msec)
- slow rise time (>70 msec)
What are the 3 place cues we can use to identify fricatives?
Noise Frequency
- s and sh have sharp peaked spectra, f and th have flat spectra
- s (>4000 Hz) has higher frequency than sh (2500 Hz)
F2 Formant Transitions
- very important cues for /f/, /v/, and linguadentals (th)
- f/v have low starting frequency (900 Hz)
- linguadentals have higher starting frequency (1700-2400 Hz)
Relative Intensity
- much lower for f/th than s/sh
- in noisy environments, there are frequent confusions between “v” vs “th”, and “f” vs “th”
Name 2 of the 3 voicing cues of fricatives
- voicing during frication noise
- length of vowel preceding fricative (e.g. /u/ longer in /juz/ than /jus/)
- higher intensity of voiced fricatives than voiceless
What are the 3 manner cues used to differentiate affricates from stops or fricatives?
- The silent period preceding the frication noise
- The duration of frication noise
- The rise time
A Trading Relation has been observed for these 3 manner cues such that a change in one can cause a shift in the categorical boundary for another
e.g. Inserting a silent period before the “sh” in “dish” causes it to become “ditch”
Extending the duration of “sh” in “ditch” changes it back to “dish”
Which has a longer silent period preceding the frication noise: an affricate or a fricative?
Affricate
- fricative = 0-20 msec (e.g. grey ship)
- affricate = 20-60 msec (e.g. grey chip)
An inserted stop may be perceived at >100 msec (“grey chip” becomes “great ship”)
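The silent-interval ranges on this card can be sketched as a lookup. The name `label_silent_interval` is hypothetical. Two points are assumptions: a gap of exactly 20 msec, which both ranges include, is assigned to the affricate, and the 60-100 msec span, which the card does not describe, is returned as "unspecified".

```python
def label_silent_interval(gap_ms: float) -> str:
    """Label the percept suggested by the silence before frication noise."""
    if gap_ms < 0:
        raise ValueError("gap must be non-negative")
    if gap_ms < 20:                  # card: fricative = 0-20 msec ("grey ship")
        return "fricative"
    if gap_ms <= 60:                 # card: affricate = 20-60 msec ("grey chip")
        return "affricate"
    if gap_ms > 100:                 # card: inserted stop at >100 msec ("great ship")
        return "stop + fricative"
    return "unspecified"             # assumption: 60-100 msec is not covered by the card


for ms in (10, 40, 150):
    print(ms, "msec ->", label_silent_interval(ms))
```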
How does the duration of frication noise differ between stops, affricates, and fricatives?
Stops < Affricates < Fricatives
Stops: Less than 40 msec
Affricates: 40-90 msec
Fricatives: >90 msec
How does the rise time differ between affricates, stops, and fricatives?
Stops < Affricates < Fricatives
Stops: 5-20 msec
Affricates: 30-50 msec
Fricatives: >70 msec
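The noise-duration and rise-time ranges on the last two cards can be sketched side by side. Both function names are hypothetical. Rise times that fall between the listed ranges (for example 20-30 msec or 50-70 msec) are returned as "unspecified", which is an assumption, since the cards give no label for them.

```python
def manner_from_noise_duration(duration_ms: float) -> str:
    """Label manner from the duration of turbulent noise (card ranges)."""
    if duration_ms < 40:             # stops: less than 40 msec
        return "stop"
    if duration_ms <= 90:            # affricates: 40-90 msec
        return "affricate"
    return "fricative"               # fricatives: >90 msec


def manner_from_rise_time(rise_ms: float) -> str:
    """Label manner from the rise time of the noise (card ranges)."""
    if 5 <= rise_ms <= 20:           # stops: 5-20 msec
        return "stop"
    if 30 <= rise_ms <= 50:          # affricates: 30-50 msec
        return "affricate"
    if rise_ms > 70:                 # fricatives: >70 msec
        return "fricative"
    return "unspecified"             # assumption: gaps between ranges are not labelled


print(manner_from_noise_duration(60), manner_from_rise_time(40))
```

Keeping the two cues in separate functions makes it easy to see when they point to different categories for the same sound.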
What are the 4 voicing cues for affricates?
Voicing during frication noise
Longer fricative interval for voiceless
Longer silent interval for voiceless
Longer vowels preceding voiced affricates