Acoustic Theory of Vowel Production Flashcards

1
Q

Basis of Acoustic Theory of Vowel Production

A

For vowel production, the vocal tract resonates like a tube closed at one end, and shapes an input signal generated by the vibrating vocal folds.
Two major concepts:
the resonance patterns of a tube closed at one end
the shaping of an input by a resonator

2
Q

The Time Domain

A

Time-domain characteristics of the signal produced by vocal fold vibration are complex
When a microphone is placed directly in front of a speaker’s lips while he or she phonates a vowel, the recorded acoustic event will reflect the combination of source (vocal fold) and resonator (vocal tract) acoustics
Another approach must be found to separate the waveform of a recorded vowel into the parts contributed by
(a) the vibrating vocal folds and
(b) the resonating vocal tract

3
Q

Glottal Area Function (Ag)

A

The baseline in this plot represents full approximation of the vocal folds (i.e., Ag ~0), and upward movement of the function indicates increasingly greater separation of the vocal folds (i.e., increasing Ag).
One cycle of vocal fold vibration is defined as the interval between successive separations of the vocal folds
The vocal folds are fully approximated for a substantial portion of each cycle (nearly 40% of each cycle)
The Ag function is not an acoustic signal, but rather reflects a pattern of vibration that produces an acoustic signal.

4
Q

Glottal Flow (Vg)

A

When the vocal folds separate during a vibratory cycle, airflow through the glottis is expected because speech is produced with tracheal pressures greater than those in front of the lips (i.e., Patm), and air always flows from regions of higher pressure to regions of lower pressure
Magnitude of this airflow should be zero when the vocal folds are fully approximated, and maximum when the vocal folds are widest apart (when Ag is largest)
Airflow through the glottis should therefore increase as Ag increases, and decrease as Ag decreases

5
Q

Glottal Flow (Vg) graph

A

Figure 8–1. B. Glottal airflow function (Vg) obtained by inverse filtering (see text). Upward displacement indicates increasing magnitude of airflow passing through the glottis.
The plot of the magnitude of airflow coming through the glottis as a function of time looks a lot like the Ag function.
Because Vg reflects movement of air molecules, which results in sound pressure waves, Vg is the proper signal to study as the acoustics of the source (vocal fold vibration) in vowel production.

6
Q

Characteristics of glottal flow Vg

A

The signal is periodic, meaning that its characteristic shape repeats over time
The rate at which it repeats over time is the fundamental frequency (F0) of vocal fold vibration, or how many times per second the vocal folds go through complete cycles of vibration
The signal typically produced has a shape in which the slope of the opening phase is shallower than the slope of the closing phase
Makes each cycle appear as if it is “leaning to the right”
The steepness of the closing phase reflects how rapidly the vocal folds come together at the end of each cycle
Impacts frequency-domain characteristics of the source
The signal shows some portions where the vocal folds are apart and some portions where the vocal folds are approximated.
The ratio of open time to closed time for each cycle, which in normal voices is typically around 60:40, or 1.5:1 (i.e., the vocal folds are open approximately 60% and approximated approximately 40% of each cycle), may be an important determinant of how much of the source signal is periodic and how much is aperiodic
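A minimal sketch of a Vg-like waveform, assuming a Rosenberg-style pulse shape (a slow raised-cosine opening followed by a faster quarter-cosine closing) rather than measured glottal flow. The open quotient of 0.6 follows the 60% figure above; the split of the open phase into 2/3 opening and 1/3 closing, F0 = 100 Hz, the 10 kHz sampling rate, and the function name are illustrative assumptions. The code measures the open fraction, where the flow peaks, and how much steeper the closing slope is than the opening slope (the "leaning to the right" shape).

```python
import numpy as np

def rosenberg_style_pulse(f0=100.0, fs=10_000, open_quotient=0.6, opening_fraction=2 / 3):
    """One cycle of a Rosenberg-style glottal flow pulse (arbitrary flow units).

    open_quotient:    fraction of the cycle with the folds apart (assumed 0.6)
    opening_fraction: fraction of the open phase spent opening (assumed 2/3), which
                      makes the opening phase longer and shallower than the closing phase
    """
    n_cycle = int(round(fs / f0))                   # samples per glottal cycle
    n_open = int(round(open_quotient * n_cycle))
    n_rise = int(round(opening_fraction * n_open))
    n_fall = n_open - n_rise
    # Sample at mid-points so every open-phase sample has nonzero flow.
    rise = 0.5 * (1 - np.cos(np.pi * (np.arange(n_rise) + 0.5) / n_rise))  # slow opening
    fall = np.cos(0.5 * np.pi * (np.arange(n_fall) + 0.5) / n_fall)        # fast closing
    closed = np.zeros(n_cycle - n_open)                                    # Ag ~ 0, no flow
    return np.concatenate([rise, fall, closed])

pulse = rosenberg_style_pulse()
open_fraction = np.mean(pulse > 0)      # fraction of the cycle with airflow
peak = int(np.argmax(pulse))            # sample where the folds are widest apart
slopes = np.diff(pulse)
max_rise, max_fall = slopes.max(), -slopes.min()
print(f"open {open_fraction:.0%} of the cycle, peak at sample {peak} of {pulse.size}")
print(f"steepest closing slope is {max_fall / max_rise:.1f}x the steepest opening slope")
```

Changing `opening_fraction` or `open_quotient` changes the skew and the open/closed ratio independently, which is convenient for experimenting with the shape descriptions on these cards.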

7
Q

Fourier Analysis

A

Fourier Analysis: a mathematical process that can be used to identify the frequency components contributing to a source waveform, shown in (B)
Applying Fourier analysis to the glottal flow waveform yields the glottal source spectrum

8
Q

Glottal Source Spectrum

A

The series of frequency components at consecutive-integer multiples of the lowest-frequency component
The relative amplitudes of the frequency components that decrease systematically as frequency increases
Fundamental frequency (F0): the lowest frequency of the glottal source spectrum, corresponds to the rate of vibration of the vocal folds
The F0 is also called the first harmonic (H1) of the source spectrum
The other frequency components in the glottal source spectrum are whole-number multiples of the F0.
There is a component at two times the F0 (the second harmonic, H2), three times the F0 (the third harmonic, H3), four times the F0 (the fourth harmonic, H4), etc.
In theory the number of harmonics in the glottal source spectrum is infinite, but the progressive reduction in relative amplitude with increasing frequency greatly limits the significance of very high frequency harmonics.
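The harmonic structure can be checked with a discrete Fourier transform (`numpy.fft.rfft`) of a synthetic, exactly periodic pulse train. The pulse shape, F0 = 100 Hz, and the 10 kHz sampling rate are assumptions for illustration; real glottal flow is only quasi-periodic, so its harmonics are less perfectly isolated than in this sketch.

```python
import numpy as np

fs = 10_000     # sampling rate in Hz (assumed)
f0 = 100.0      # fundamental frequency in Hz (assumed)
n_cycle = int(fs / f0)

# One cycle of a simple glottal-flow-like pulse (assumed shape): a slow raised-cosine
# opening over 40 samples, a faster quarter-cosine closing over 20, then a closed phase.
rise = 0.5 * (1 - np.cos(np.pi * (np.arange(40) + 0.5) / 40))
fall = np.cos(0.5 * np.pi * (np.arange(20) + 0.5) / 20)
cycle = np.concatenate([rise, fall, np.zeros(n_cycle - 60)])

signal = np.tile(cycle, 20)                     # 20 identical cycles (0.2 s)
spectrum = np.abs(np.fft.rfft(signal))          # Fourier analysis: magnitude per frequency bin
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)  # bin spacing = fs / N = 5 Hz

bins_per_f0 = int(round(f0 * signal.size / fs))  # 20 bins between adjacent harmonics
harmonic_bins = bins_per_f0 * np.arange(1, 11)   # H1..H10
amps_db = 20 * np.log10(spectrum[harmonic_bins] / spectrum[harmonic_bins[0]])
for n, (f, a) in enumerate(zip(freqs[harmonic_bins], amps_db), start=1):
    print(f"H{n}: {f:5.0f} Hz  {a:6.1f} dB re H1")

# Bins that are not whole-number multiples of F0 carry essentially no energy.
off_harmonic = np.arange(1, spectrum.size) % bins_per_f0 != 0
print("largest off-harmonic magnitude:", spectrum[1:][off_harmonic].max())
```

Because the 0.2 s signal holds exactly 20 cycles, every harmonic falls on a DFT bin and the bins in between are essentially empty. The printed levels fall off with harmonic number, though not necessarily smoothly for this simplified pulse.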

9
Q

Periodic Nature of the Waveform

A

The Vg waveform repeats over time, is not sinusoidal, and is, therefore, a complex periodic event.
The repetition of the Vg waveform is not perfectly periodic, but rather has very small variations in the periods of successive glottal cycles.
Vocal fold vibration is therefore technically referred to as quasi-periodic.
The period of the glottal waveform depends on the rate of vibration of the vocal folds, which varies according to a number of factors, including sex and age
The glottal spectra of speakers with low F0s are more densely packed with harmonics compared with the glottal spectra of speakers with high F0s.
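A small arithmetic sketch of harmonic density: harmonics sit F0 apart, so a lower F0 packs more of them below any given frequency. The two F0 values, the 5 kHz cutoff, and the function name are illustrative assumptions, not norms for any speaker group.

```python
def harmonics_below(f0_hz, f_max_hz=5000.0):
    """Harmonic frequencies n * F0 that lie at or below f_max_hz (illustrative cutoff)."""
    count = int(f_max_hz // f0_hz)
    return [n * f0_hz for n in range(1, count + 1)]

for label, f0 in [("lower F0", 100.0), ("higher F0", 250.0)]:
    h = harmonics_below(f0)
    print(f"{label}: {len(h)} harmonics up to 5 kHz, spaced {f0:.0f} Hz apart")
```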

10
Q

Shape of the Waveform

A

There is a systematic relationship between the closing slope of the Vg waveform and the “tilt” of the glottal spectrum:
the steeper the closing slope in the Vg waveform (the faster the vocal folds return to the midline on each cycle), the less tilted the glottal spectrum.

11
Q

Tilt of the Glottal Spectrum

A

Reference value of 8 dB per octave as the typical reduction in harmonic amplitude across frequency.
Vg waveforms with very steep closing slopes have a smaller dB change per octave (<8 dB per octave), and those with very shallow closing slopes have a larger dB change per octave (>8 dB per octave).
These concepts are important in understanding the physiological and acoustical bases of hyperfunctional and hypofunctional voice disorders.
Hyperfunctional voice disorders: vocal folds move together too rapidly and forcefully on each closing phase of vocal fold vibration, resulting in a glottal spectrum with less than normal tilt, or too much energy in the higher-frequency harmonics.
Voice sounds abnormal, pressed, overly effortful or strained.
Hypofunctional voice disorders: vocal folds move together more slowly and less forcefully, resulting in a highly tilted glottal spectrum because there is so little energy in the higher-frequency harmonics.
Voice sounds weak, breathy, and thin.
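A sketch of the tilt arithmetic, assuming a constant roll-off in dB per octave. Harmonic n lies log2(n) octaves above H1, so its level is -tilt × log2(n) dB relative to H1. Only the 8 dB/octave reference comes from the card; the 5 and 12 dB/octave values and the function name are hypothetical examples of less and more tilted spectra.

```python
import math

def harmonic_levels_db(n_harmonics, tilt_db_per_octave=8.0):
    """Level (dB re H1) of harmonics 1..n_harmonics for a constant roll-off per octave.

    Harmonic n lies log2(n) octaves above H1, so its level is -tilt * log2(n).
    """
    return [-tilt_db_per_octave * math.log2(n) for n in range(1, n_harmonics + 1)]

for label, tilt in [("steeper closing, less tilt", 5.0),
                    ("reference", 8.0),
                    ("shallower closing, more tilt", 12.0)]:
    levels = harmonic_levels_db(8, tilt)
    print(f"{label:29s} H2 {levels[1]:6.1f} dB   H4 {levels[3]:6.1f} dB   H8 {levels[7]:6.1f} dB")
```

At the 8 dB/octave reference, H2, H4, and H8 sit 8, 16, and 24 dB below H1; the less tilted case keeps more energy in those higher harmonics, as described for hyperfunctional voices.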

12
Q

Ratio of Open Time to Closed Time

A

For each cycle in a typical Vg waveform, the vocal folds are apart about 60% of the time, and approximated about 40% of the time.
A waveform with a shallower (slower) closing phase is also likely to have more open time throughout a complete cycle.
Similarly, a waveform with a steeper (faster) closing phase is likely to have less open time and, therefore, a longer closed phase throughout a complete cycle.
Speed of closing is often correlated with the open time or closed time (greater speed, less open time; less speed, more open time)
Less open time is generally associated with a less tilted glottal spectrum, and more open time with a more tilted glottal spectrum.
The closing speed and ratio of open phase to closed phase (OP/CP) are somewhat redundant descriptions of Vg waveforms (and spectral characteristics).

13
Q

Nature of the Input Signal: Summary

A

The input signal generated by the vibrating vocal folds is a complex periodic waveform whose spectrum consists of a consecutive integer series of harmonics at whole-number multiples of the F0.
The harmonics in the glottal spectrum systematically decrease in relative amplitude with increasing frequency.
These harmonics serve as input to the vocal tract resonator, which shapes that input according to its resonant characteristics.

14
Q

Tube Resonance Characteristics of the Vocal Tract

A

Relationship between tube length and tube resonances: shorter tubes have higher resonant frequencies than longer tubes.
This is consistent with children’s vocal tracts, which are shorter than those of men or women, having higher resonant frequencies than adult vocal tracts. For the same reason (shorter vocal tracts), women tend to have higher resonant frequencies than men
Vocal tract is like a tube closed at one end
Each time the vocal folds come together, the air immediately above them is compressed, initiating a pressure wave through the vocal tract
At the vocal fold boundary of the vocal tract, there is, for an instant, no airflow and the air molecules become compressed, whereas at the open, oral boundary of the vocal tract air molecules move freely between the lips.
Each time the vocal folds snap together, a pressure wave is set up in the vocal tract that obeys the rules of resonance in a tube closed at one end
In other words, the vocal tract resonances are excited each time the vibrating vocal folds snap together

15
Q

Response of the Vocal Tract to Excitation

A

The resonance waveform responding to the first excitation starts with relatively great amplitude, which declines over each successive cycle; left alone, the vibration would die out completely (see Figure 8–7)
Note also that the resonance is re-excited before the previous waveform dies out completely
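A rough way to see why the resonance is re-excited before it dies out, assuming the vocal tract resonance behaves like a simple second-order resonator whose amplitude envelope decays as exp(-πBt), with B the bandwidth in Hz. The 60 Hz bandwidth is the value assumed for Figure 8–8; F0 = 100 Hz (a 10 ms glottal period) and the function name are illustrative assumptions.

```python
import math

def remaining_amplitude(bandwidth_hz, elapsed_s):
    """Fraction of its initial amplitude left in a resonance decaying as exp(-pi * B * t)."""
    return math.exp(-math.pi * bandwidth_hz * elapsed_s)

period_s = 1 / 100.0                        # glottal period for an assumed F0 of 100 Hz
frac = remaining_amplitude(60.0, period_s)  # 60 Hz bandwidth, as assumed for Figure 8-8
print(f"{frac:.2f} of the initial amplitude ({20 * math.log10(frac):.1f} dB) remains at the next closure")
```

With these values roughly 15% of the amplitude (about -16 dB) is still ringing when the next closure arrives, so successive responses overlap.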

16
Q

How Are the Acoustic Properties of the Vocal Tract Determined?

A

Recall that acoustic resonators can be described in the frequency domain by a resonance curve (see Figure 7–18)
Consider the vocal tract shape associated with schwa (/ə/), a shape very much like a tube having uniform cross-sectional area from the glottis to the lips (a straight tube with no constrictions)
If we know the length of the tube and that it is closed at one end, the quarter-wavelength rule can be applied to obtain the multiple peaks of the resonance curve, that is, the resonant frequencies of the tube.
(Bandwidth is another factor that determines the shape of the resonance curve.)
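The quarter-wavelength rule for a uniform tube closed at one end places the resonances at odd multiples of c/4L, that is, F_n = (2n - 1)c/(4L). A minimal sketch, assuming a speed of sound of 35,000 cm/s and tube lengths of 17.5 cm and 14 cm chosen only for illustration:

```python
def quarter_wave_resonances(length_cm, n_resonances=3, c_cm_per_s=35_000.0):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm) for n in range(1, n_resonances + 1)]

print(quarter_wave_resonances(17.5))  # [500.0, 1500.0, 2500.0]
print(quarter_wave_resonances(14.0))  # [625.0, 1875.0, 3125.0]
```

The shorter tube has proportionally higher resonances, the same length relationship described for children’s, women’s, and men’s vocal tracts.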

17
Q

Determining Acoustic Properties of the Vocal Tract

A

Figure 8–8. The resonant frequencies along the curve are computed by the quarter-wavelength rule, and the bandwidth of each resonance is assumed to be 60 Hz.
Schwa has no vocal tract constrictions
However, vocal tract configurations typically involve constrictions along the path from glottis to lips, some of which are relatively tight.
These constrictions result in resonance curves that differ from the one for schwa
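A sketch of a resonance curve like the one in Figure 8–8, assuming each resonance is an all-pole second-order resonator with a 60 Hz bandwidth and that the resonant frequencies are the quarter-wavelength values for a 17.5 cm tube (500, 1500, 2500, and 3500 Hz, an assumption). The individual resonances are multiplied (added in dB) to form the curve, and its local maxima are reported. The function name is hypothetical.

```python
import numpy as np

def resonance_curve_db(freqs_hz, formants_hz, bandwidths_hz):
    """Magnitude (dB) of a cascade of second-order resonances (all-pole filter), 0 dB at DC."""
    s = 2j * np.pi * np.asarray(freqs_hz, dtype=float)
    h = np.ones_like(s)
    for f_n, b_n in zip(formants_hz, bandwidths_hz):
        pole = -np.pi * b_n + 2j * np.pi * f_n   # decay rate pi*B, ringing frequency F
        h *= (pole * np.conj(pole)) / ((s - pole) * (s - np.conj(pole)))
    return 20 * np.log10(np.abs(h))

freqs = np.arange(0, 4001, 1.0)
curve = resonance_curve_db(freqs, [500, 1500, 2500, 3500], [60, 60, 60, 60])
inner = curve[1:-1]
peaks = freqs[1:-1][(inner > curve[:-2]) & (inner > curve[2:])]
print("resonance peaks near", peaks, "Hz")
```

Narrower bandwidths make taller, sharper peaks; shifting any resonant frequency (as a constriction would) moves the corresponding peak.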

18
Q

Area Function of the Vocal Tract

A

Plot of cross-sectional area as a function of distance along the vocal tract from glottis to lips.
Each section number has an area value, so the function is actually a sequence of discrete points.
For illustration purposes, the discrete points have been connected and the area function is represented in Figure 8–10 as a continuous line.
Fant estimated mass and compliance properties from the area measurement for each section. This information was used to produce a theoretical estimate of the resonant pattern for the entire vocal tract

19
Q

Vocal Tract Resonance Curve

A

The vocal tract resonance curve is computed from the measured area function.
The resonant peaks are determined mathematically, rather than measured by analyzing the spectrum of a produced vowel.
Therefore, the computed resonance curve is called a theoretical spectrum, or a filter function.
The filter function shows where the resonances for a given vocal tract configuration should be
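A minimal sketch of computing resonances from an area function, under simplifying assumptions: the tract is a chain of lossless cylindrical sections of equal length, the lips are an ideal open end (zero pressure), and losses and radiation are ignored. This is a standard transmission-line (chain-matrix) approach, not necessarily the exact procedure Fant used; the function name and the 10-section area values are hypothetical.

```python
import numpy as np

C_CM_PER_S = 35_000.0  # speed of sound (assumed)

def formants_from_area_function(areas_cm2, section_length_cm, f_max_hz=4000.0, df_hz=1.0):
    """Estimate resonances of a lossless tube built from equal-length cylindrical sections.

    areas_cm2[0] is the section at the glottis (closed end); areas_cm2[-1] is at the lips.
    Each section has the chain (ABCD) matrix [[cos kl, j Z sin kl], [j sin kl / Z, cos kl]]
    with Z = rho*c / A. Multiplying them from glottis to lips gives [P_g, U_g] = M [P_lips, U_lips].
    With an ideal open end (P_lips = 0), U_lips / U_g = 1 / M[1, 1], so resonances are the
    frequencies where M[1, 1] crosses zero. rho*c cancels, so Z is taken as 1 / A.
    """
    freqs = np.arange(df_hz, f_max_hz, df_hz)
    d = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        kl = 2 * np.pi * f / C_CM_PER_S * section_length_cm
        m = np.eye(2, dtype=complex)
        for area in areas_cm2:
            z = 1.0 / area
            m = m @ np.array([[np.cos(kl), 1j * z * np.sin(kl)],
                              [1j * np.sin(kl) / z, np.cos(kl)]])
        d[i] = m[1, 1].real  # M[1, 1] is real for lossless sections
    neg = d < 0
    idx = np.where(neg[:-1] != neg[1:])[0]
    # Linear interpolation between the two grid points that bracket each zero crossing.
    return freqs[idx] + df_hz * d[idx] / (d[idx] - d[idx + 1])

uniform = formants_from_area_function([5.0] * 10, 1.75)               # schwa-like 17.5 cm tube
lip_narrowed = formants_from_area_function([5.0] * 9 + [2.5], 1.75)   # hypothetical lip-end narrowing
print("uniform:     ", np.round(uniform))
print("lip-narrowed:", np.round(lip_narrowed))
```

For the uniform tube the sketch recovers the quarter-wavelength values; narrowing only the lip-end section lowers F1. The exact printed values depend entirely on the assumed areas.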

20
Q

Vocal Tract Output

A

The frequency-domain representation of the vocal tract output is called an output spectrum.
This is the spectrum measured, with appropriate instruments, from an actual vowel produced by a talker.
(Recall that the output measured is not the same as the signal measured at the vocal folds)
Constrictions along the vocal tract produce different patterns of resonances depending on where they occur, resulting in the production of different vowels

21
Q

Formants

A

Regions of high energy (resonances) that determine the phonetic quality of a vowel
Think of the values as center frequencies denoting a region of spectral prominence
Figure 8–13. Output spectra for the vowels /i/ (top panel), /ɑ/ (middle panel), and /u/ (bottom panel)
Each spectrum shows the spectral envelope.
The peaks in the envelope are the formant frequencies, marked F1, F2, and F3 in each spectrum.
The first three formants are called the F-pattern of a vowel

22
Q

Formant Bandwidths

A

Vowel resonances are relatively sharply tuned
Resonances in the spectrum have narrow bandwidths compared with the frequencies of those resonances
Nasalized vowels tend to have greater formant bandwidths than non-nasalized vowels
Bandwidth of the first formant is greater when a vowel is produced with breathy, compared with typical, phonation
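The relative narrowness of a resonance can be summarized as its center frequency divided by its bandwidth (the quality factor, Q). A one-function sketch: the 60 Hz bandwidth matches the value assumed for Figure 8–8, F1 = 500 Hz is an assumed schwa-like value, and 120 Hz is a hypothetical widened bandwidth of the kind associated with nasalization or breathy phonation.

```python
def quality_factor(center_hz, bandwidth_hz):
    """Q = center frequency / bandwidth; a larger Q means a more sharply tuned resonance."""
    return center_hz / bandwidth_hz

print(f"F1 = 500 Hz, B = 60 Hz  -> Q = {quality_factor(500, 60):.1f}")   # 8.3
print(f"F1 = 500 Hz, B = 120 Hz -> Q = {quality_factor(500, 120):.1f}")  # 4.2
```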

23
Q

Acoustic Theory of Vowel Production: Summary

A

The input, or source, for vowel production is the acoustic result of vocal fold vibration.
The resonator (or filter) in vowel acoustics is the vocal tract, which extends from the top margin of the vocal folds to the lips. The vocal tract resonates like a tube closed at one end, the closed end being the vocal folds, the open end being the lips.
A theory developed by Fant (1960) relates the area function of the vocal tract to the specific resonant frequencies of the tube.
The area function is a plot of the cross-sectional area of the vocal tract from glottis to lips. The area function is a description of the shape of the air column formed by the articulators and the more or less fixed structures of the vocal tract.
The actual acoustic event that emerges from the lips (a vowel sound) is the product of the acoustic characteristics of the glottal spectrum and the varying amplitudes along the filter function.
The first three formant frequencies of a vowel are referred to as the F-pattern of that vowel
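A sketch of the source-filter combination at the level of individual harmonics: multiplying the glottal spectrum by the filter function is the same as adding their levels in dB. Assumptions: F0 = 100 Hz, the 8 dB/octave source tilt from the card on spectral tilt, and a schwa-like all-pole filter with resonances at 500, 1500, 2500, and 3500 Hz and 60 Hz bandwidths. Lip radiation and losses are ignored, and the function name is hypothetical.

```python
import numpy as np

F0 = 100.0                       # assumed fundamental frequency (Hz)
n = np.arange(1, 41)             # harmonic numbers H1..H40 (up to 4 kHz)
harmonics = F0 * n
source_db = -8.0 * np.log2(n)    # 8 dB/octave reference tilt, re H1

def filter_db(freqs_hz, formants_hz=(500.0, 1500.0, 2500.0, 3500.0), bandwidth_hz=60.0):
    """All-pole resonance curve (dB) for an assumed schwa-like vocal tract."""
    s = 2j * np.pi * np.asarray(freqs_hz, dtype=float)
    h = np.ones_like(s)
    for f_n in formants_hz:
        pole = -np.pi * bandwidth_hz + 2j * np.pi * f_n
        h *= abs(pole) ** 2 / ((s - pole) * (s - np.conj(pole)))
    return 20 * np.log10(np.abs(h))

# Multiplying the glottal spectrum by the filter function = adding their levels in dB.
output_db = source_db + filter_db(harmonics)

# Harmonics that are higher than both neighbors mark the spectral peaks of the output.
is_peak = (output_db[1:-1] > output_db[:-2]) & (output_db[1:-1] > output_db[2:])
peak_harmonics = harmonics[1:-1][is_peak]
print("output peaks at", peak_harmonics, "Hz")
```

The harmonics that stand out as local peaks in the output sit at the assumed formant frequencies, even though the source itself falls steadily with frequency.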

24
Q

Pressure & Velocity in a Tube

A

Figure 8–14.
A. Shows the pressure distributions corresponding to the first three resonances of a tube closed at one end. Pressure distributions follow the quarter-wavelength rule.
B. Velocity distributions for the first three resonances of the tube shown in A. Velocities at any point in the tube are the mirror image of the pressure distributions. Thus, when pressure is maximum, velocity is zero, and vice versa. The center horizontal line in A represents Patm, whereas the center horizontal line in B represents zero velocity.

25
Q

Still Imagining the Constriction in a Tube…

A

A constriction is placed in the tube exactly at a location of maximum pressure (and, therefore, zero velocity of air molecule movement) (Fig. 8-15A)
The constriction in the region of maximum pressure compresses the air molecules even more, forcing them closer together and farther from their rest positions.
This makes the air molecules stiffer.
A constriction in a region of maximum pressure increases the stiffness of the air molecules for that wavelength.
Increased stiffness in a vibratory system results in a higher resonant frequency
When a constriction is placed at a region of maximum pressure, the second resonance increases relative to the value computed for the unconstricted tube.

26
Q

Imagining Another Constriction in a Tube…

A

A constriction is placed in the tube exactly at a location of zero pressure (and, therefore, maximum velocity of air molecule movement) (Fig. 8-15B)
The tube resonance in this case decreases relative to the case when the tube is not constricted.
This is because a constriction in the region of a velocity maximum increases the acoustic mass (by narrowing the passageway in which the air molecules are vibrating), and increased mass lowers the resonant frequency of a vibratory system.
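The stiffness and mass arguments in the two constriction cards follow the behavior of a simple mass-spring system, whose natural frequency is f = (1/2π)√(k/m). A sketch in arbitrary units; the tube is not literally a single mass and spring, so this only illustrates the direction of each effect, and the function name is hypothetical.

```python
import math

def resonant_frequency(stiffness, mass):
    """Natural frequency of an ideal mass-spring system: f = sqrt(k / m) / (2 * pi)."""
    return math.sqrt(stiffness / mass) / (2 * math.pi)

base = resonant_frequency(1.0, 1.0)
stiffer_ratio = resonant_frequency(2.0, 1.0) / base  # doubling stiffness: about 1.41x higher
heavier_ratio = resonant_frequency(1.0, 2.0) / base  # doubling mass: about 0.71x, i.e., lower
print(f"stiffer: x{stiffer_ratio:.2f}, heavier: x{heavier_ratio:.2f}")
```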

27
Q

General Principles for Tube Resonances

A
  1. The resonances of a tube closed at one end, with no constrictions, are the reference resonances for tubes with constrictions.
  2. A constriction located at a pressure maximum raises the frequency of the resonance whose wavelength “carries” the pressure maximum. This is because the constriction increases the stiffness of the air molecules along that wavelength.
  3. A constriction located at a velocity maximum lowers the frequency of the resonance whose wavelength “carries” the velocity maximum. This is because the constriction increases the acoustic mass of the air molecules along that wavelength.
  4. A constriction between a pressure maximum and a velocity maximum changes the frequency of the relevant resonance according to the relative magnitudes of the pressure and velocity at the point of constriction.
    Thus, a constriction at a point where the pressure is above Patm, but not maximum, may increase or decrease the resonant frequency, depending on the actual magnitudes of the pressures and velocities at that point in the tube. The effects of constrictions on resonant frequencies are continuous (i.e., they do not apply only at pressure or velocity maxima).
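Principle 4 can be sketched with the standard small-perturbation result for a uniform, lossless tube closed at the glottis: the sensitivity of resonance n to a small narrowing at position x is proportional to cos²(k_n x) - sin²(k_n x) = cos(2k_n x), the balance between the pressure and velocity distributions. Positive values mean a narrowing there raises F_n; negative values mean it lowers F_n. The function names are hypothetical, and only the sign, not the size of any change in Hz, is meaningful here.

```python
import math

def sensitivity(n, x_over_length):
    """Relative effect of a small narrowing on resonance n of a uniform tube.

    x_over_length runs from 0 at the glottis (closed end) to 1 at the lips (open end).
    Pressure ~ cos(k_n x) and velocity ~ sin(k_n x), with k_n = (2n - 1) * pi / (2 * L),
    so cos^2 - sin^2 = cos(2 * k_n * x) = cos((2n - 1) * pi * x / L).
    """
    return math.cos((2 * n - 1) * math.pi * x_over_length)

def effect(value, tol=1e-9):
    """Translate a sensitivity value into the direction of the frequency change."""
    if value > tol:
        return "raises"
    if value < -tol:
        return "lowers"
    return "no change"

for label, x in [("glottis", 0.0), ("halfway", 0.5), ("two-thirds", 2 / 3), ("lips", 1.0)]:
    effects = ", ".join(f"F{n} {effect(sensitivity(n, x))}" for n in (1, 2, 3))
    print(f"narrowing at {label:10s}: {effects}")
```

The midpoint is a point of balance for all three resonances, and the two-thirds point (a pressure maximum for the second resonance) raises F2 while lowering F1 and F3, in line with principles 2 to 4.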
28
Q

Constrictions in the Vocal Tract

A

Midsagittal depiction of a vocal tract, below which are three tubes showing the pressure distributions for the first three resonances.
The closed (right) end of the tube is analogous to the glottal end of the vocal tract, whereas the open (left) end of the tube is analogous to the lips.
The numbers 1, 2, and 3 show locations of hypothetical constrictions within the vocal tract and their corresponding locations relative to the pressure distributions within the tube.
High pressures within the tubes are indicated where the pressure distribution touches the edges of the tube; the sign of the pressure is not relevant. The thin horizontal line within each tube indicates atmospheric pressure.

29
Q

Constrictions in the Vocal Tract 2

A

Any constriction in the vocal tract affects all resonant frequencies of the tube, because the pressure (or velocity) distributions for all resonances are superimposed on each other when the air in the tube vibrates.
Thus, any constriction affects the pressure (or velocity) distributions for every resonance.
The concepts of stiffness and mass explain why a constriction in the region of a pressure maximum raises a resonant frequency, and why a constriction in the region of a velocity maximum lowers a resonant frequency.

30
Q

Perturbation Theory

A

Explains how the resonances of a tube are changed when the cross-sectional dimensions of the tube are perturbed, or constricted.
Vowel articulation is the creation of vocal tract tubes with different area functions.
The area functions are modified by the kinds of constrictions described above, which result in different resonant frequencies for different vowels.

31
Q

The Three-Parameter Model of Stevens and House (1955)

A

Stevens and House determined that mapping between vocal tract configuration and vocal tract output was well described with just three parameters:
tongue height
tongue advancement
configuration of the lips

32
Q

Tongue Height

A

Vowel tongue height describes the relative height of the tongue at the location of the major vocal tract constriction.
For American English vowels made in the front of the vocal tract, the tongue height series from lowest (most open vocal tract) to highest (most closed vocal tract) is /æ,ɛ,e,ɪ,i/.
The corresponding series for American English back vowels is /ɑ,ɔ,o,ʊ,u/.
Main acoustic effect is on the first formant frequency (F1)

33
Q

Tongue Advancement

A

Tongue advancement refers to the position of the major constriction for a vowel along the anterior-posterior (front-back) dimension of the vocal tract.
In American English, the major constrictions for vowels such as /æ,ɛ,e,ɪ,i/ are toward the front of the vocal tract, whereas the major constrictions for /ɑ,ɔ,o,ʊ,u/ are toward the back of the vocal tract.
There are also a few vowels in American English whose constrictions are in a relatively central location in the vocal tract (between front and back vowels). These include /ə,ʌ,ɝ,ɚ/
Increases in tongue advancement result in an increasing F2 and a decreasing F1

34
Q

Configuration of the Lips

A

Some vowels of American English are produced with rounded lips.
These vowels include /u,ʊ,o,ɔ/.
American English does not have a vowel contrast that depends on rounded versus unrounded lips.
Many languages of the world, however, do have vowels distinguished by lip rounding.
For example, Swedish has unrounded and rounded high-front vowels (/i/ versus /y/), and Korean has rounded and unrounded high-back vowels (/u/ versus /ɯ/).
The greatest decreases in formant frequencies with lip rounding are seen in F2, with somewhat lesser (but roughly equal) effects on F1 and F3.
Moreover, the influence of lip rounding on F2 depends substantially on tongue height: the higher the tongue, the more lip rounding causes F2 to decrease.

35
Q

Stevens and House (1955) Rules: Summary

A

Rule 1. F1 varies inversely with tongue height. The higher the tongue, the lower the F1. The rule applies more dramatically to front as compared with back vowels.
Rule 2. F2 increases, and F1 decreases, with increasing tongue advancement. The rule applies dramatically for high vowels, and less so for low vowels.
Rule 3. All formant frequencies decrease with increased rounding of the lips, but the major effect is on F2. The rule applies more dramatically to high as compared with low vowels.

36
Q

REVIEW

A

The acoustic theory of vowel production includes a source, or input (the vibrating vocal folds, which produce a glottal spectrum), and a filter (the vocal tract resonator), which combine to produce an acoustic output measured directly in front of the lips.
This is known as source-filter theory.
The source produces a complex periodic waveform whose spectrum consists of a consecutive-integer series of harmonics.
The vocal tract filter, or resonator, can be modeled as a tube closed at one end.
When the source and filter combine to produce an output spectrum, the peaks in this spectrum are referred to as formants.

37
Q

REVIEW CONT.

A

The formants are essentially the resonances of the vocal tract, which change according to changes in the shape of the vocal tract tube.
The vocal tract configuration for a schwa is most like the case of a tube closed at one end and having uniform cross-sectional area
When constrictions are introduced into the tube—when articulatory configuration changes from schwa to other vowels—the formant frequencies change according to rules.