Acoustic Theory of Vowel Production Flashcards
Basis of Acoustic Theory of Vowel Production
For vowel production, the vocal tract resonates like a tube closed at one end, and shapes an input signal generated by the vibrating vocal folds.
Two major concepts
the resonance patterns of a tube closed at one end,
the shaping of an input by a resonator
The Time Domain
Time-domain characteristics of the signal produced by vocal fold vibration are complex
When a microphone is placed directly in front of a speaker’s lips while he or she phonates a vowel, the recorded acoustic event will reflect the combination of source (vocal fold) and resonator (vocal tract) acoustics
Another approach must be found to separate the waveform of a recorded vowel into the parts contributed by
(a) the vibrating vocal folds and
(b) the resonating vocal tract
Glottal Area Function (Ag)
The baseline in this plot represents full approximation of the vocal folds (i.e., Ag ~0), and upward movement of the function indicates increasingly greater separation of the vocal folds (i.e., increasing Ag).
One cycle of vocal fold vibration is defined as the interval between successive separations of the vocal folds
The vocal folds are fully approximated for a substantial portion of each cycle (nearly 40% of each cycle)
The Ag function is not an acoustic signal, but rather reflects a pattern of vibration that produces an acoustic signal.
Glottal Flow (Vg)
When the vocal folds separate during a vibratory cycle, airflow through the glottis is expected because speech is produced with tracheal pressures greater than those in front of the lips (i.e., Patm), and air always flows from regions of higher pressure to regions of lower pressure
Magnitude of this airflow should be zero when the vocal folds are fully approximated, and maximum when the vocal folds are widest apart (when Ag is the largest)
airflow coming through the glottis should increase as Ag increases, and decrease as Ag decreases
Glottal Flow (Vg) graph
Figure 8–1. B. Glottal airflow function (Vg) obtained by inverse filtering (see text). Upward displacement indicates increasing magnitude of airflow passing through the glottis.
The plot of the magnitude of airflow coming through the glottis as a function of time looks a lot like the Ag function
Because Vg reflects movement of air molecules, which results in sound pressure waves, Vg is the proper signal to study as the acoustics of the source (vocal fold vibration) in vowel production.
Characteristics of glottal flow Vg
The signal is periodic, meaning that its characteristic shape repeats over time
The rate at which it repeats over time is the fundamental frequency (F0) of vocal fold vibration, or how many times per second the vocal folds go through complete cycles of vibration
The signal typically produced has a shape in which the slope of the opening phase is shallower than the slope of the closing phase
Makes each cycle appear as if it is “leaning to the right”
steepness of the closing phase reflects how rapidly the vocal folds come together at the end of each cycle
Impacts frequency-domain characteristics of the source
The signal shows some portions where the vocal folds are apart and some portions where the vocal folds are approximated.
The ratio of open time to closed time for each cycle, which in normal voices is typically around 1.5:1 (i.e., the vocal folds are open approximately 60% of each cycle), may be an important determinant of how much of the source signal is periodic and how much is aperiodic
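The characteristics listed above can be sketched with a toy waveform. This is only an illustration under stated assumptions: the piecewise shape, the function name glottal_cycle, and the 40/20/40 split between opening, closing, and closed intervals are hypothetical choices made to match the roughly 60% open time, not a measured Vg signal.

```python
import math

def glottal_cycle(n_samples=100, opening=0.40, closing=0.20):
    """Return one idealized Vg cycle (hypothetical shape, arbitrary flow units)."""
    flow = []
    for i in range(n_samples):
        t = i / n_samples                      # position within the cycle, 0..1
        if t < opening:                        # slow rise: shallow opening slope
            flow.append(0.5 * (1 - math.cos(math.pi * t / opening)))
        elif t < opening + closing:            # fast fall: steep closing slope
            flow.append(math.cos(0.5 * math.pi * (t - opening) / closing))
        else:                                  # folds approximated: no flow
            flow.append(0.0)
    return flow

cycle = glottal_cycle()

# Fraction of the cycle with airflow (tiny threshold ignores rounding residue).
open_fraction = sum(1 for v in cycle if v > 1e-9) / len(cycle)

# Where the flow peak falls within the cycle (0 = start, 1 = end).
peak_position = cycle.index(max(cycle)) / len(cycle)
```

With these assumed proportions the flow peak falls after the midpoint of the open interval, which is the "leaning to the right" appearance, and the open fraction comes out close to 0.6.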
Fourier Analysis
Fourier Analysis: a mathematical process that can be used to identify the frequency components contributing to a source waveform, shown in (B)
the Glottal Source Spectrum
Glottal Source Spectrum
The series of frequency components at consecutive-integer multiples of the lowest-frequency component
The relative amplitudes of the frequency components that decrease systematically as frequency increases
Fundamental frequency (F0): the lowest frequency of the glottal source spectrum, corresponds to the rate of vibration of the vocal folds
The F0 is also called the first harmonic (H1) of the source spectrum
The other frequency components in the glottal source spectrum are whole-number multiples of the F0.
There is a component at two times the F0 (the second harmonic, H2), three times the F0 (the third harmonic, H3), four times the F0 (the fourth harmonic, H4), etc.
In theory the number of harmonics in the glottal source spectrum is infinite, but the progressive reduction in relative amplitude with increasing frequency greatly limits the significance of very high frequency harmonics.
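A small Fourier-analysis sketch may help here. It assumes a made-up asymmetric pulse (slow rise, fast fall, closed interval), a sampling rate of 8000 Hz, and an F0 of 100 Hz; none of these values come from the text. The discrete Fourier transform is evaluated directly at a few frequencies to show that energy appears at whole-number multiples of F0 and not between them.

```python
import cmath
import math

FS = 8000          # assumed sampling rate in Hz
F0 = 100           # assumed fundamental frequency in Hz
PERIOD = FS // F0  # 80 samples per cycle

def one_cycle():
    """Hypothetical asymmetric pulse: slow rise, fast fall, closed interval."""
    rise, fall = 32, 16
    cycle = [i / rise for i in range(rise)]
    cycle += [1 - i / fall for i in range(fall)]
    cycle += [0.0] * (PERIOD - rise - fall)
    return cycle

signal = one_cycle() * 10          # ten identical cycles, 800 samples

def dft_magnitude(x, freq_hz):
    """Magnitude of the discrete Fourier transform of x at one frequency."""
    n = len(x)
    k = freq_hz * n / FS           # bin index for this frequency
    total = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return abs(total) / n

# Magnitudes at the first four harmonics and at frequencies between them.
harmonics = {f: dft_magnitude(signal, f) for f in (100, 200, 300, 400)}
between = {f: dft_magnitude(signal, f) for f in (150, 250, 350)}
```

For this particular toy pulse the magnitudes at 100, 200, 300, and 400 Hz decrease in that order, while the values at 150, 250, and 350 Hz are zero apart from rounding error. A real voice is quasi-periodic, so its measured spectrum would not be this clean.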
Periodic Nature of the Waveform
The Vg waveform repeats over time, is not sinusoidal, and is, therefore, a complex periodic event.
The repetition of the Vg waveform is not perfectly periodic, but rather has very small variations in the periods of successive glottal cycles.
Vocal fold vibration is therefore technically referred to as quasi-periodic.
The period of the glottal waveform depends on the rate of vibration of the vocal folds, which varies according to a number of factors, including sex and age
The glottal spectra of speakers with low F0s are more densely packed with harmonics compared with the glottal spectra of speakers with high F0s.
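The point about harmonic density can be checked with simple arithmetic. In this sketch the F0 values of 100 Hz and 250 Hz and the 1000 Hz upper limit are illustrative assumptions, and harmonics_below is a hypothetical helper name.

```python
def harmonics_below(f0_hz, limit_hz):
    """List harmonic frequencies (whole-number multiples of F0) up to limit_hz."""
    return [n * f0_hz for n in range(1, int(limit_hz // f0_hz) + 1)]

low_f0_voice = harmonics_below(100, 1000)    # ten harmonics, 100 Hz apart
high_f0_voice = harmonics_below(250, 1000)   # four harmonics, 250 Hz apart
```

The lower F0 places more harmonics in the same frequency range, which is what "more densely packed" means.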
Shape of the Waveform
There is a systematic relationship between the closing slope of the Vg waveform and the “tilt” of the glottal spectrum:
the steeper the closing slope in the Vg waveform (the faster the vocal folds return to the midline on each cycle), the less tilted the glottal spectrum.
Tilt of the Glottal Spectrum
Reference value of 8 dB per octave as the typical reduction in harmonic amplitude across frequency.
Vg waveforms with very steep closing slopes have a smaller dB change per octave (<8 dB per octave), and those with very shallow closing slopes have a larger dB change per octave (>8 dB per octave).
These concepts are important in understanding the physiological and acoustical bases of hyperfunctional and hypofunctional voice disorders.
Hyperfunctional voice disorders: vocal folds move together too rapidly and forcefully on each closing phase of vocal fold vibration, resulting in a glottal spectrum with less than normal tilt, or too much energy in the higher-frequency harmonics.
Voice sounds abnormal, pressed, overly effortful or strained.
Hypofunctional voice disorders: vocal folds move together more slowly and less forcefully, resulting in a highly tilted glottal spectrum because there is so little energy in the higher-frequency harmonics.
Voice sounds weak, breathy, and thin.
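Spectral tilt can be made concrete with a short calculation. The sketch assumes the harmonic levels fall along a straight line on a dB-per-octave scale, which is an idealization. The 8 dB value is the reference from the card above; the 6 and 10 dB values are hypothetical stand-ins for a less tilted and a more tilted spectrum, not clinical norms.

```python
import math

def harmonic_level_db(n, tilt_db_per_octave=8.0):
    """Level of harmonic n relative to H1, assuming a constant tilt per octave."""
    return -tilt_db_per_octave * math.log2(n)

# H2, H4, and H8 are one, two, and three octaves above H1.
reference = [harmonic_level_db(n) for n in (2, 4, 8)]            # 8 dB per octave
less_tilted = [harmonic_level_db(n, 6.0) for n in (2, 4, 8)]     # assumed value
more_tilted = [harmonic_level_db(n, 10.0) for n in (2, 4, 8)]    # assumed value
```

Three octaves above H1, the less tilted spectrum keeps 12 dB more energy than the more tilted one (-18 dB versus -30 dB), which is the kind of high-frequency difference the hyperfunctional and hypofunctional descriptions refer to.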
Ratio of Open Time to Closed Time
For each cycle in a typical Vg waveform, the vocal folds are apart about 60% of the time, and approximated about 40% of the time.
A waveform with a shallower (slower) closing phase is also likely to have more open time throughout a complete cycle.
Similarly, a waveform with a steeper (faster) closing phase is likely to have less open time and, therefore, a longer closed phase throughout a complete cycle.
Speed of closing is often correlated with the open time or closed time (greater speed, less open time; less speed, more open time)
less open time is generally associated with a less tilted glottal spectrum, and more open time with a more tilted glottal spectrum.
The closing speed and ratio of open phase to closed phase (OP/CP) are somewhat redundant descriptions of Vg waveforms (and spectral characteristics).
Nature of the Input Signal: Summary
The input signal generated by the vibrating vocal folds is a complex periodic waveform whose spectrum consists of a consecutive integer series of harmonics at whole-number multiples of the F0.
The harmonics in the glottal spectrum systematically decrease in relative amplitude with increasing frequency.
These harmonics serve as input to the vocal tract resonator, which shapes that input according to its resonant characteristics.
Tube Resonance Characteristics of the Vocal Tract
Relationship between tube length and tube resonances: shorter tubes have higher resonant frequencies than longer tubes.
This is consistent with the higher resonant frequencies of children’s shorter vocal tracts compared with the longer vocal tracts, and lower resonant frequencies, of men and women. For the same reason, women tend to have higher resonant frequencies than men.
Vocal tract is like a tube closed at one end
Each instance of vocal fold adduction compresses the air immediately above the vocal folds and initiates a pressure wave through the vocal tract
At the vocal fold boundary of the vocal tract, there is, for an instant, no airflow and the air molecules become compressed, whereas at the open, oral boundary of the vocal tract air molecules move freely between the lips.
Each time the vocal folds snap together, a pressure wave is set up in the vocal tract and obeys the rules of resonance in a tube closed at one end
vocal tract resonances are excited each time the vibrating vocal folds snap together
Response of the Vocal Tract to Excitation
The resonance waveform responding to the first excitation is initiated with relatively great amplitude, which declines over each successive cycle until the vibration dies out completely (see Figure 8–7)
Note also that the resonance is re-excited before the previous waveform dies out completely
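The decay and re-excitation can be sketched with a damped sinusoid. Everything numeric here is an assumption for illustration: a single resonance at 500 Hz with a 60 Hz bandwidth, and excitations 10 ms apart (an F0 of 100 Hz). The sketch uses the standard relation that a resonance of bandwidth B decays with an amplitude envelope of exp(-pi * B * t).

```python
import math

def envelope(t_s, bandwidth_hz=60.0):
    """Amplitude envelope of a decaying resonance, t_s seconds after excitation."""
    return math.exp(-math.pi * bandwidth_hz * t_s)

def resonance(t_s, freq_hz=500.0, bandwidth_hz=60.0):
    """Damped sinusoid set off by a single excitation at t = 0."""
    return envelope(t_s, bandwidth_hz) * math.sin(2 * math.pi * freq_hz * t_s)

period_s = 1 / 100.0                 # assumed time between excitations (F0 = 100 Hz)
remaining = envelope(period_s)       # fraction of starting amplitude left at re-excitation
```

Under these assumptions about 15% of the starting amplitude is still present when the next closure arrives, so the resonance is re-excited before it has died out.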
How Are the Acoustic Properties of the Vocal Tract Determined?
Recall that acoustic resonators can be described in the frequency domain by a resonance curve (see Figure 7–18)
Consider the vocal tract shape associated with schwa (/ə/), a shape very much like a tube having uniform cross-sectional area from the glottis to the lips (straight tube with no constrictions)
If we know the length of the tube and that it is closed at one end, the quarter-wavelength rule can be applied to obtain the multiple peaks of the resonance curve, that is, the resonant frequencies of the tube.
(Bandwidth is another consideration in determining the shape of the resonance curve.)
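The quarter-wavelength rule for a tube closed at one end gives resonances at odd multiples of c / 4L. The sketch below assumes a speed of sound of 35,000 cm/s and tube lengths of 17.5 cm and 8.75 cm; these round numbers are illustrative choices, and tube_resonances is a hypothetical helper name.

```python
SPEED_OF_SOUND_CM_S = 35000.0   # assumed speed of sound in air

def tube_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """First `count` resonances (Hz) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

longer_tube = tube_resonances(17.5)    # 500, 1500, 2500 Hz
shorter_tube = tube_resonances(8.75)   # 1000, 3000, 5000 Hz
```

Halving the assumed length doubles every resonance, which is the tube-length relationship stated earlier: shorter tubes have higher resonant frequencies.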
Determining Acoustic Properties of the Vocal Tract
Figure 8–8. The resonant frequencies along the curve are computed by the quarter-wavelength rule, and the bandwidth of each resonance is assumed to be 60 Hz.
Schwa has no vocal tract constrictions
However, vocal tract configurations typically involve constrictions along the path from glottis to lips, some of which are relatively tight.
This would result in different resonance curves
Area Function of the Vocal Tract
Plot of cross-sectional area as a function of distance along the vocal tract from glottis to lips.
Each section number has an area value, so the function is actually a sequence of discrete points.
For illustration purposes, the discrete points have been connected and the area function is represented in Figure 8–10 as a continuous line.
Fant estimated mass and compliance properties from the area measurement for each section. This information was used to produce a theoretical estimate of the resonant pattern for the entire vocal tract
Vocal Tract Resonance Curve
The vocal tract resonance curve is computed from the measured area function.
The resonant peaks are determined mathematically, rather than measured by analyzing the spectrum of a produced vowel.
Therefore, the computed resonance curve is called a theoretical spectrum, or a filter function.
The filter function shows where the resonances for a given vocal tract configuration should be
Vocal Tract Output
The frequency-domain representation of the vocal tract output is called an output spectrum.
This is the spectrum measured, with appropriate instruments, from an actual vowel produced by a talker.
(Recall that the output measured is not the same as the signal measured at the vocal folds)
Constrictions along the vocal tract will result in different patterns of resonances, depending on where they occur
resulting in production of different vowels
Formants
Regions of high energy (resonances) that determine the phonetic quality of a vowel
Think of the values as center frequencies denoting a region of spectral prominence
Figure 8–13. Output spectra for the vowels /i/ (top panel), /ɑ/ (middle panel), and /u/ (bottom panel)
Shows the spectral envelope.
The peaks in the envelope are the formant frequencies, marked F1, F2, and F3 in each spectrum.
First 3 formants are called the F-pattern of a vowel
Formant Bandwidths
Vowel resonances are relatively sharply tuned
Resonances in the spectrum have narrow bandwidths compared with the frequencies of those resonances
Nasalized vowels tend to have greater formant bandwidths than non-nasalized vowels
Bandwidth of the first formant is greater when a vowel is produced with breathy, compared with typical, phonation
Acoustic Theory of Vowel Production: Summary
The input, or source, for vowel production is the acoustic result of vocal fold vibration.
The resonator (or filter) in vowel acoustics is the vocal tract, which extends from the top margin of the vocal folds to the lips. The vocal tract resonates like a tube closed at one end, the closed end being the vocal folds, the open end being the lips.
A theory developed by Fant (1960) relates the area function of the vocal tract to the specific resonant frequencies of the tube.
The area function is a plot of the cross-sectional area of the vocal tract from glottis to lips. The area function is a description of the shape of the air column formed by the articulators and the more or less fixed structures of the vocal tract.
The actual acoustic event that emerges from the lips, a vowel sound, is the product of the acoustic characteristics of the glottal spectrum and the varying amplitudes along the filter function.
The first three formant frequencies of a vowel are referred to as the F-pattern of that vowel
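The source-times-filter idea in this summary can be sketched numerically. The sketch rests on several assumptions that are not from the text: an F0 of 100 Hz, resonances at 500, 1500, and 2500 Hz, a standard all-pole resonator formula for the filter function, and no sound radiation effect at the lips. It borrows the 8 dB per octave reference tilt and the 60 Hz bandwidth mentioned earlier. Because levels are in dB, the product becomes a sum.

```python
import math

FORMANTS_HZ = (500.0, 1500.0, 2500.0)   # assumed resonances (uniform-tube values)
BANDWIDTH_HZ = 60.0                      # bandwidth assumed for every resonance
F0_HZ = 100.0                            # assumed fundamental frequency
TILT_DB_PER_OCTAVE = 8.0                 # reference source tilt

def source_level_db(f_hz):
    """Glottal source level relative to H1, assuming a constant tilt."""
    return -TILT_DB_PER_OCTAVE * math.log2(f_hz / F0_HZ)

def filter_gain_db(f_hz):
    """Gain of a cascade of simple resonators (all-pole model) at f_hz."""
    gain = 1.0
    half_bw = BANDWIDTH_HZ / 2
    for fn in FORMANTS_HZ:
        num = fn ** 2 + half_bw ** 2
        den = math.hypot(f_hz - fn, half_bw) * math.hypot(f_hz + fn, half_bw)
        gain *= num / den
    return 20 * math.log10(gain)

# Output level at each harmonic: source level plus filter gain, both in dB.
output_db = {
    n * F0_HZ: source_level_db(n * F0_HZ) + filter_gain_db(n * F0_HZ)
    for n in range(1, 31)
}
```

In this toy output spectrum the harmonics at 500 Hz and 1500 Hz stand above their neighbors even though the source level keeps falling with frequency, which is how the filter function shapes the input into formant peaks.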
Pressure & Velocity
In a Tube
Figure 8–14.
A. Shows the pressure distributions corresponding to the first three resonances of a tube closed at one end. Pressure distributions follow the quarter-wavelength rule.
B. Velocity distributions for the first three resonances of the tube shown in A. Velocities at any point in the tube are the mirror image of the pressure distributions. Thus, when pressure is maximum, velocity is zero, and vice versa. The center horizontal line in A represents Patm, whereas the center horizontal line in B represents zero velocity.
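The pressure and velocity patterns described for Figure 8–14 can be sketched as standing-wave envelopes. The sketch assumes an ideal uniform tube, a length of 17.5 cm (an illustrative value), and amplitudes normalized to 1; x is measured from the closed (glottal) end.

```python
import math

TUBE_LENGTH_CM = 17.5   # assumed tube length

def pressure(n, x_cm, length_cm=TUBE_LENGTH_CM):
    """Relative pressure amplitude of resonance n at distance x from the closed end."""
    return math.cos((2 * n - 1) * math.pi * x_cm / (2 * length_cm))

def velocity(n, x_cm, length_cm=TUBE_LENGTH_CM):
    """Relative velocity amplitude of resonance n at distance x from the closed end."""
    return math.sin((2 * n - 1) * math.pi * x_cm / (2 * length_cm))

# Values at the closed end (x = 0) and the open end (x = L) for the first three resonances.
closed_end = [(pressure(n, 0.0), velocity(n, 0.0)) for n in (1, 2, 3)]
open_end = [(pressure(n, TUBE_LENGTH_CM), velocity(n, TUBE_LENGTH_CM)) for n in (1, 2, 3)]
```

At the closed end each resonance has maximum pressure and zero velocity. At the open end pressure is at Patm (zero in this normalization) and velocity magnitude is maximum, matching the statement that when pressure is maximum, velocity is zero, and vice versa.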