Acoustic Theory of Vowel Production Flashcards
Basis of Acoustic Theory of Vowel Production
For vowel production, the vocal tract resonates like a tube closed at one end, and shapes an input signal generated by the vibrating vocal folds.
Two major concepts
the resonance patterns of a tube closed at one end,
the shaping of an input by a resonator
The Time Domain
Time-domain characteristics of the signal produced by vocal fold vibration are complex
When a microphone is placed directly in front of a speaker’s lips while he or she phonates a vowel, the recorded acoustic event will reflect the combination of source (vocal fold) and resonator (vocal tract) acoustics
Another approach must be found to separate the waveform of a recorded vowel into the parts contributed by
(a) the vibrating vocal folds and
(b) the resonating vocal tract
Glottal Area Function (Ag)
The baseline in this plot represents full approximation of the vocal folds (i.e., Ag ~0), and upward movement of the function indicates increasingly greater separation of the vocal folds (i.e., increasing Ag).
One cycle of vocal fold vibration is defined as the interval between successive separations of the vocal folds
The vocal folds are fully approximated for a substantial portion of each cycle (nearly 40% of each cycle)
The Ag function is not an acoustic signal, but rather reflects a pattern of vibration that produces an acoustic signal.
Glottal Flow (Vg)
When the vocal folds separate during a vibratory cycle, airflow through the glottis is expected because speech is produced with tracheal pressures greater than those in front of the lips (i.e., Patm), and air always flows from regions of higher pressure to regions of lower pressure
Magnitude of this airflow should be zero when the vocal folds are fully approximated, and maximum when the vocal folds are widest apart (when Ag is the largest)
airflow coming through the glottis should increase as Ag increases, and decrease as Ag decreases
Glottal Flow (Vg) graph
Figure 8–1. B. Glottal airflow function (Vg) obtained by inverse filtering (see text). Upward displacement indicates increasing magnitude of airflow passing through the glottis.
The plot of the magnitude of airflow coming through the glottis as a function of time looks a lot like the Ag function
Because Vg reflects movement of air molecules, which results in sound pressure waves, Vg is the proper signal to study as the acoustic source (vocal fold vibration) in vowel production.
Characteristics of glottal flow Vg
The signal is periodic, meaning that its characteristic shape repeats over time
The rate at which it repeats over time is the fundamental frequency (F0) of vocal fold vibration, or how many times per second the vocal folds go through complete cycles of vibration
The signal typically produced has a shape in which the slope of the opening phase is shallower than the slope of the closing phase
Makes each cycle appear as if it is “leaning to the right”
steepness of the closing phase reflects how rapidly the vocal folds come together at the end of each cycle
Impacts frequency-domain characteristics of the source
The signal shows some portions where the vocal folds are apart and some portions where the vocal folds are approximated.
The ratio of open time to closed time for each cycle, which in normal voices is typically around 3:2 (i.e., the vocal folds are open approximately 60% of each cycle), may be an important determinant of how much of the source signal is periodic and how much is aperiodic
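The waveform characteristics listed above can be sketched numerically. This is a minimal illustration, not a physiological model: the cosine-segment pulse shape, the sample count, and the `opening_fraction` parameter are assumptions made for the example. Only the roughly 60% open time and the slower opening, faster closing asymmetry come from the cards.

```python
import math

def glottal_cycle(n_samples=200, open_quotient=0.6, opening_fraction=2 / 3):
    """Return one cycle of a schematic glottal flow pulse (arbitrary units).

    open_quotient: fraction of the cycle during which the folds are apart.
    opening_fraction: share of the open phase spent opening (hypothetical value).
    """
    n_open = round(n_samples * open_quotient)
    n_rise = round(n_open * opening_fraction)
    n_fall = n_open - n_rise
    cycle = []
    for i in range(n_samples):
        if i < n_rise:
            # Opening phase: slow rise from 0 toward 1.
            cycle.append(0.5 * (1 - math.cos(math.pi * i / n_rise)))
        elif i < n_open:
            # Closing phase: faster fall from 1 back toward 0.
            cycle.append(math.cos(0.5 * math.pi * (i - n_rise) / n_fall))
        else:
            # Closed phase: no flow.
            cycle.append(0.0)
    return cycle

cycle = glottal_cycle()
peak = cycle.index(max(cycle))
last_open = max(i for i, v in enumerate(cycle) if v > 0)
opening_time = peak                    # samples from opening to peak flow
closing_time = last_open + 1 - peak    # samples from peak flow to closure
open_fraction = sum(1 for v in cycle if v > 0) / len(cycle)

print(f"opening: {opening_time} samples, closing: {closing_time} samples")
print(f"open fraction of cycle: {open_fraction:.2f}")
```

Because the peak falls late in the open phase, the pulse "leans to the right": the closing slope is steeper than the opening slope.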
Fourier Analysis
Fourier Analysis: a mathematical process that can be used to identify the frequency components contributing to a source waveform; the result, shown in (B), is the glottal source spectrum
Glottal Source Spectrum
A series of frequency components at consecutive-integer multiples of the lowest-frequency component
The relative amplitudes of the frequency components decrease systematically as frequency increases
Fundamental frequency (F0): the lowest frequency of the glottal source spectrum, corresponds to the rate of vibration of the vocal folds
The F0 is also called the first harmonic (H1) of the source spectrum
The other frequency components in the glottal source spectrum are whole-number multiples of the F0.
There is a component at two times the F0 (the second harmonic, H2), three times the F0 (the third harmonic, H3), four times the F0 (the fourth harmonic, H4), etc.
In theory the number of harmonics in the glottal source spectrum is infinite, but the progressive reduction in relative amplitude with increasing frequency greatly limits the significance of very high frequency harmonics.
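The harmonic structure described above can be checked with a small Fourier analysis. The sketch below applies a plain discrete Fourier transform to a schematic periodic pulse train. The sampling rate, the F0 of 100 Hz, the number of cycles, and the triangular pulse shape are all assumptions made for illustration; the pulse is not a measured Vg waveform.

```python
import math

SAMPLE_RATE_HZ = 2000   # assumed sampling rate
F0_HZ = 100             # assumed fundamental frequency
N_CYCLES = 8            # number of cycles in the analysis window

# One schematic cycle: slow linear opening, faster linear closing, then closed.
samples_per_cycle = SAMPLE_RATE_HZ // F0_HZ
rise, fall = 8, 4
one_cycle = ([i / rise for i in range(rise)]
             + [1 - i / fall for i in range(fall)]
             + [0.0] * (samples_per_cycle - rise - fall))
signal = one_cycle * N_CYCLES

def dft_magnitudes(x):
    """Magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mags.append(math.hypot(re, im) / n)
    return mags

mags = dft_magnitudes(signal)
bin_width_hz = SAMPLE_RATE_HZ / len(signal)

# Frequencies (excluding 0 Hz) that carry more than numerical-noise energy.
component_freqs = [k * bin_width_hz for k in range(1, len(mags)) if mags[k] > 1e-9]
print(component_freqs)
```

Every component that survives the noise threshold sits at a whole-number multiple of the assumed F0, and the first few harmonics decrease in amplitude as frequency increases.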
Periodic Nature of the Waveform
The Vg waveform repeats over time, is not sinusoidal, and is, therefore, a complex periodic event.
The repetition of the Vg waveform is not perfectly periodic, but rather has very small variations in the periods of successive glottal cycles.
Vocal fold vibration is therefore technically referred to as quasi-periodic.
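One way to put a number on "very small variations in the periods" is to average the cycle-to-cycle change in period and express it relative to the mean period. This measure is not named in the cards; it is offered here only as an illustration, and the period values below are hypothetical.

```python
def period_variation_percent(periods_ms):
    """Mean absolute difference between successive periods, as % of the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100 * mean_diff / mean_period

# Hypothetical period measurements in milliseconds.
perfectly_periodic = [10.0, 10.0, 10.0, 10.0, 10.0]
quasi_periodic = [10.0, 10.1, 9.9, 10.0, 10.05]

print(period_variation_percent(perfectly_periodic))
print(round(period_variation_percent(quasi_periodic), 2))
```

A perfectly periodic signal gives zero; a quasi-periodic one gives a small positive value.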
The period of the glottal waveform depends on the rate of vibration of the vocal folds, which varies according to a number of factors, including sex and age
The glottal spectra of speakers with low F0s are more densely packed with harmonics compared with the glottal spectra of speakers with high F0s.
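The denser packing of harmonics at low F0 follows directly from the whole-number spacing. A minimal sketch, assuming two illustrative F0 values (100 Hz and 250 Hz) and an arbitrary 1000 Hz frequency range:

```python
def harmonics_up_to(f0_hz, cutoff_hz):
    """Harmonic frequencies (whole-number multiples of F0) at or below cutoff_hz."""
    return [n * f0_hz for n in range(1, int(cutoff_hz // f0_hz) + 1)]

low_f0 = harmonics_up_to(100, 1000)    # illustrative low-F0 speaker
high_f0 = harmonics_up_to(250, 1000)   # illustrative high-F0 speaker

print(len(low_f0), low_f0)
print(len(high_f0), high_f0)
```

In the same frequency range, the lower F0 yields ten harmonics and the higher F0 yields four.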
Shape of the Waveform
There is a systematic relationship between this closing slope and the “tilt” of the glottal spectrum:
the steeper the closing slope in the Vg waveform (the faster the vocal folds return to the midline on each cycle), the less tilted the glottal spectrum.
Tilt of the Glottal Spectrum
Reference value of 12 dB per octave as the typical reduction in harmonic amplitude across frequency.
Vg waveforms with very steep closing slopes have a smaller dB change per octave (<12 dB per octave), and those with very shallow closing slopes have a larger dB change per octave (>12 dB per octave).
These concepts are important in understanding the physiological and acoustical bases of hyperfunctional and hypofunctional voice disorders.
Hyperfunctional voice disorders: vocal folds move together too rapidly and forcefully on each closing phase of vocal fold vibration, resulting in a glottal spectrum with less than normal tilt, or too much energy in the higher-frequency harmonics.
Voice sounds abnormal, pressed, overly effortful or strained.
Hypofunctional voice disorders: vocal folds move together more slowly and less forcefully, resulting in a highly tilted glottal spectrum because there is so little energy in the higher-frequency harmonics.
Voice sounds weak, breathy, and thin.
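The effect of tilt on harmonic amplitudes can be worked through with simple arithmetic. The sketch below assumes the tilt is a constant drop in dB for every doubling of frequency, with H1 as the 0 dB reference. The two tilt values (6 and 18 dB per octave) are hypothetical, chosen only to fall on either side of the reference value.

```python
import math

def harmonic_level_db(n, tilt_db_per_octave):
    """Level of harmonic n relative to H1, assuming a constant dB-per-octave tilt."""
    return -tilt_db_per_octave * math.log2(n)

# Hypothetical tilts: a less tilted spectrum (steep closing slope)
# and a more tilted spectrum (shallow closing slope).
for label, tilt in (("less tilted", 6), ("more tilted", 18)):
    levels = [round(harmonic_level_db(n, tilt), 1) for n in (1, 2, 4, 8)]
    print(label, levels)
```

At H8, three octaves above H1, the less tilted spectrum has lost far less energy than the more tilted one. This matches the contrast drawn above between hyperfunctional and hypofunctional voices.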
Ratio of Open Time to Closed Time
For each cycle in a typical Vg waveform, the vocal folds are apart about 60% of the time, and approximated about 40% of the time.
A waveform with a shallower (slower) closing phase is also likely to have more open time throughout a complete cycle.
Similarly, a waveform with a steeper (faster) closing phase is likely to have less open time and, therefore, a longer closed phase throughout a complete cycle.
Speed of closing is often correlated with the open time or closed time (greater speed, less open time; less speed, more open time)
less open time is generally associated with a less tilted glottal spectrum, and more open time with a more tilted glottal spectrum.
The closing speed and ratio of open phase to closed phase (OP/CP) are somewhat redundant descriptions of Vg waveforms (and spectral characteristics).
Nature of the Input Signal: Summary
The input signal generated by the vibrating vocal folds is a complex periodic waveform whose spectrum consists of a consecutive integer series of harmonics at whole-number multiples of the F0.
The harmonics in the glottal spectrum systematically decrease in relative amplitude with increasing frequency.
These harmonics serve as input to the vocal tract resonator, which shapes that input according to its resonant characteristics.
Tube Resonance Characteristics of the Vocal Tract
Relationship between tube length and tube resonances: shorter tubes have higher resonant frequencies than longer tubes.
Consistent with children's shorter vocal tracts having higher resonant frequencies than the longer vocal tracts of adult men or women. For the same reason, women tend to have higher resonant frequencies than men
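The length relationship can be illustrated with the standard quarter-wavelength formula for a uniform tube closed at one end, F_n = (2n - 1) c / (4L). The cards do not state this formula. The speed of sound (35,000 cm/s) and the tube lengths below are assumptions chosen for round numbers, and a real vocal tract is not a uniform tube.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0   # assumed value for air

def tube_resonances(length_cm, count=3):
    """First `count` resonances (Hz) of a uniform tube closed at one end."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]

# Illustrative tube lengths, not measurements.
longer_tube = tube_resonances(17.5)
shorter_tube = tube_resonances(10.0)

print(longer_tube)
print(shorter_tube)
```

Every resonance of the shorter tube lies above the corresponding resonance of the longer tube.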
Vocal tract is like a tube closed at one end
Each instance of vocal fold adduction compresses the air immediately above the vocal folds and initiates a pressure wave through the vocal tract
At the vocal fold boundary of the vocal tract, there is, for an instant, no airflow and the air molecules become compressed, whereas at the open, oral boundary of the vocal tract air molecules move freely between the lips.
Each time the vocal folds snap together, a pressure wave is set up in the vocal tract and obeys the rules of resonance in a tube closed at one end
vocal tract resonances are excited each time the vibrating vocal folds snap together
Response of the Vocal Tract to Excitation
The resonance waveform responding to the first excitation is initiated with relatively great amplitude, which declines over each successive cycle until the vibration dies out completely (see Figure 8-7)
Note also that the resonance is re-excited before the previous waveform dies out completely
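The decay and re-excitation described above can be sketched as a sum of damped sinusoids, one started at each vocal fold closure. This is a simplified illustration with a single resonance. The resonance frequency (500 Hz), decay rate (200 per second), and F0 (100 Hz) are hypothetical values, not taken from the figure.

```python
import math

def single_response(t_s, resonance_hz=500.0, decay_per_s=200.0):
    """Damped sinusoid started by one excitation at t = 0 (zero before it)."""
    if t_s < 0:
        return 0.0
    return math.exp(-decay_per_s * t_s) * math.sin(2 * math.pi * resonance_hz * t_s)

def vocal_tract_output(t_s, f0_hz=100.0, n_excitations=5):
    """Sum of the responses to excitations spaced one glottal period apart."""
    period_s = 1.0 / f0_hz
    return sum(single_response(t_s - k * period_s) for k in range(n_excitations))

def peak_abs(start_s, stop_s, step_s=1e-5):
    """Largest absolute value of a single response within a time window."""
    n = round((stop_s - start_s) / step_s)
    return max(abs(single_response(start_s + i * step_s)) for i in range(n))

early_peak = peak_abs(0.000, 0.001)   # first millisecond after excitation
late_peak = peak_abs(0.009, 0.010)    # last millisecond before the next one
residual_at_next_excitation = math.exp(-200.0 * (1.0 / 100.0))

print(round(early_peak, 3), round(late_peak, 3))
print(round(residual_at_next_excitation, 3))
```

With these assumed values the response is much weaker by the end of the period, but its envelope has not reached zero when the next excitation arrives, so successive responses overlap.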