Speech production Flashcards
3 sources of speech production
All three have a wideband spectrum:
- Voicing: vibration of the vocal folds, same type of aerodynamic mechanism as a flag flapping in the wind.
- Frication or Aspiration: turbulence created when air passes through a narrow aperture
- Burst: the “pop” that occurs when high air pressure is suddenly released
3 steps to produce speech
- initiation
- phonation
- articulation
vocal folds aka cords
two bands of smooth muscle tissue found in the larynx (voice box).
vocal tract
air cavity between glottis and lips
source-filter model of the vocal tract
source-filter model of the vocal tract (with details)
formula for speech signal (convolution of excitation signal and vocal tract response)
The speech signal s(t) is created by convolving (∗) an excitation
signal e(t) with the vocal tract impulse response h(t), whose Fourier transform is the transfer function H(f):
s(t) = e(t) ∗ h(t)
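The convolution can be tried out on sampled signals. The following is a minimal sketch, not a speech synthesizer: the sampling rate, the 100 Hz impulse-train excitation, and the single decaying 500 Hz resonance used for h are all assumed values chosen for illustration.

```python
import math

def convolve(e, h):
    """Direct discrete convolution: s[n] = sum over k of e[k] * h[n - k]."""
    s = [0.0] * (len(e) + len(h) - 1)
    for i, e_i in enumerate(e):
        if e_i == 0.0:
            continue  # zero samples contribute nothing
        for j, h_j in enumerate(h):
            s[i + j] += e_i * h_j
    return s

fs = 8000          # sampling rate in Hz (assumed)
f0 = 100           # pitch of the voiced excitation in Hz (assumed)
period = fs // f0  # 80 samples between pulses

# Excitation e[n]: an impulse train, a crude stand-in for voicing.
e = [1.0 if n % period == 0 else 0.0 for n in range(4 * period)]

# Vocal tract response h[n]: one decaying resonance at 500 Hz (assumed).
h = [math.exp(-n / 20.0) * math.cos(2 * math.pi * 500 * n / fs)
     for n in range(period)]

s = convolve(e, h)
```

Because h here is exactly one pitch period long, s is simply a copy of h restarted at every pulse; a longer h would make successive copies overlap and add.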
Fourier transform of speech (product of excitation and transfer function)
The Fourier transform of speech is the product of the excitation
spectrum E(f) and the transfer function H(f):
S(f) = H(f)E(f)
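This product rule can be checked numerically. The sketch below uses a direct DFT on two short made-up sequences (the values of e and h are arbitrary) and zero-pads both to the length of the convolved signal so that the discrete transform matches linear convolution.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform, O(N^2), fine for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def convolve(e, h):
    """Direct discrete convolution of two sequences."""
    s = [0.0] * (len(e) + len(h) - 1)
    for i, e_i in enumerate(e):
        for j, h_j in enumerate(h):
            s[i + j] += e_i * h_j
    return s

e = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # toy excitation (arbitrary values)
h = [1.0, 0.5, 0.25]                # toy impulse response (arbitrary values)
s = convolve(e, h)

n = len(s)
E = dft(e + [0.0] * (n - len(e)))   # zero-pad to the output length
H = dft(h + [0.0] * (n - len(h)))
S = dft(s)

# Largest deviation between S(f) and H(f)E(f) over all frequency bins.
max_error = max(abs(S[k] - H[k] * E[k]) for k in range(n))
```

max_error should be on the order of floating-point rounding error.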
formants
vocal tract resonances
what happens at resonant frequencies?
At the resonant frequencies, the resonance enhances the energy of
the excitation, so the magnitude of the transfer function, |H(f)|, is large at those
frequencies and small at other frequencies.
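A single resonance can be illustrated with a two-pole digital resonator. This is a sketch under stated assumptions: the resonator form, the 500 Hz formant frequency, the pole radius r, and the sampling rate are illustrative choices, not values from these cards.

```python
import cmath
import math

def resonator_gain(f, formant_hz, r, fs):
    """|H(f)| for H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)."""
    theta = 2 * math.pi * formant_hz / fs      # pole angle
    z_inv = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    denom = 1 - 2 * r * math.cos(theta) * z_inv + r * r * z_inv ** 2
    return abs(1 / denom)

fs = 8000         # sampling rate in Hz (assumed)
formant_hz = 500  # resonance frequency in Hz (assumed)
r = 0.97          # pole radius; closer to 1 gives a sharper peak (assumed)

freqs = range(0, fs // 2 + 1, 10)  # 0 Hz to 4000 Hz in 10 Hz steps
gains = [resonator_gain(f, formant_hz, r, fs) for f in freqs]
peak_freq = freqs[gains.index(max(gains))]

gain_at_formant = resonator_gain(formant_hz, formant_hz, r, fs)
gain_far_away = resonator_gain(2000, formant_hz, r, fs)
```

The gain peaks at or next to the chosen formant frequency and is far smaller at 2000 Hz, which is the "large at resonance, small elsewhere" behaviour described above.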
air stream (vowels)
unblocked air stream
air stream (consonants)
blocked / obstructed air stream
vowels scheme
vowel classification: tongue height
Tongue height:
– Low: e.g., /a/
– Mid: e.g., /e/
– High: e.g., /i/
vowel classification: tongue advancement
Tongue advancement:
– Front : e.g., /i/
– Central : e.g., /ə/
– Back : e.g., /u/
vowel classification: lip rounding
Lip rounding:
– Unrounded: e.g., /ɪ, ɛ, e, ə/
– Rounded: e.g., /u, o, ɔ/
vowel classification: tense vs lax
Tense/lax:
– Tense: e.g., /i, e, u, o, ɔ, ɑ/
– Lax: e.g., /ɪ, ɛ, æ, ə/
vowel scheme: dependence of formants 1 and 2 on tongue placement
consonants classification: manners of articulation
Manner of articulation
– Stops: /p, t, k, b, d, g/
– Fricatives: /f, s, ʃ, v, z, ʒ/
– Affricates: /tʃ, dʒ/
– Approximants/Liquids: /l, r, w, j/
– Nasals: /m, n, ŋ/
coarticulation
- Coarticulation refers to changes in speech articulation (acoustic or visual) of the current speech segment (phoneme or viseme) due to neighboring speech. In the visual domain, this phenomenon arises because the visual articulator movements are affected by the neighboring visemes.
- The production of a speech sound becomes more like that of a preceding or following speech sound.
f0 and H1
- The fundamental frequency f0 and the first harmonic H1 are the same thing.
- There is a harmonic at every integer multiple of f0 (2 × f0, 3 × f0, and so on), in principle without an upper limit. Vocal fold vibration produces many harmonics above f0, all the way up to about 5000 Hz in the adult human vocal tract. These harmonics decrease in amplitude as the frequency increases.
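The harmonic series can be listed directly. In this sketch the function name, the 120 Hz f0, and the -12 dB per octave amplitude slope are assumptions for illustration; the cards only state that amplitude falls as frequency rises, not by how much.

```python
import math

def harmonics(f0, max_hz, slope_db_per_octave=-12.0):
    """List (harmonic number, frequency in Hz, relative level in dB).

    The level uses an assumed constant slope per octave relative to H1.
    """
    out = []
    k = 1
    while k * f0 <= max_hz:
        level_db = slope_db_per_octave * math.log2(k)
        out.append((k, k * f0, level_db))
        k += 1
    return out

series = harmonics(f0=120, max_hz=5000)

# H1 is f0 itself; H2 is one octave up and, under the assumed slope, 12 dB lower.
h1, h2 = series[0], series[1]
```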
magnitude spectrum and log magnitude spectrum formulas
magnitude spectrum: |S(f)| = |H(f)| |E(f)|
log magnitude spectrum: ln |S(f)| = ln |H(f)| + ln |E(f)|
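Taking the logarithm turns the product of filter and excitation into a sum, so the two contributions simply add on a log scale. A quick check at a single frequency, with made-up complex values for H(f) and E(f):

```python
import cmath
import math

# Made-up values of H(f) and E(f) at one frequency (magnitude, phase).
H = cmath.rect(4.0, 0.3)
E = cmath.rect(0.5, -1.1)
S = H * E

log_mag_S = math.log(abs(S))
sum_of_logs = math.log(abs(H)) + math.log(abs(E))
difference = abs(log_mag_S - sum_of_logs)
```

The phases drop out entirely: only the magnitudes of H(f) and E(f) matter for the log magnitude spectrum.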
axes (spectrogram)
Spectrogram = time on the horizontal axis, frequency on the vertical axis.
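A spectrogram is built by cutting the signal into short overlapping frames and taking the magnitude spectrum of each one. The sketch below is a bare-bones version with an assumed 64-sample Hann-windowed frame, a 32-sample hop, and a test signal that jumps from 500 Hz to 1500 Hz; none of these values come from the cards.

```python
import cmath
import math

def spectrogram(x, frame_len, hop):
    """Return frames[t][k]: magnitude at time frame t and frequency bin k."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]  # Hann window
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        mags = [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        frames.append(mags)
    return frames

fs = 8000       # sampling rate in Hz (assumed)
frame_len = 64  # bin spacing is fs / frame_len = 125 Hz
hop = 32

# Test signal: 500 Hz for the first half, 1500 Hz for the second half.
x = [math.sin(2 * math.pi * (500 if n < 256 else 1500) * n / fs)
     for n in range(512)]

spec = spectrogram(x, frame_len, hop)

# Strongest frequency in each time frame, converted from bin index to Hz.
peak_hz = [mags.index(max(mags)) * fs / frame_len for mags in spec]
```

When plotted, the frame index t runs along the horizontal (time) axis and the bin index k along the vertical (frequency) axis, so this signal would appear as a low horizontal band followed by a higher one.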
spectral splatter
Spectral splatter (also called switch noise) refers to spurious emissions that result from an abrupt change in the transmitted signal, usually when transmission is started or stopped.
For example, a device transmitting a sine wave produces a single peak in the frequency spectrum; however, if the device abruptly starts or stops transmitting this sine wave, it will emit noise at frequencies other than the frequency of the sine wave. This noise is known as spectral splatter.
When the signal is represented in the time domain, an abrupt change may not be visually apparent; in the frequency domain, however, the abrupt change causes the appearance of spikes at various frequencies.
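The effect can be measured by comparing a tone that is switched on and off abruptly with the same tone faded in and out smoothly. This is an illustrative sketch: the 1000 Hz tone, the gate position, the Hann-shaped fade, and the band of 6 bins either side of the tone are all assumed values.

```python
import cmath
import math

def power_spectrum(x):
    """One-sided power spectrum |X[k]|^2 for k = 0 .. N/2 via a direct DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def out_of_band_fraction(x, tone_bin, half_width):
    """Share of total energy lying more than half_width bins from the tone."""
    power = power_spectrum(x)
    outside = sum(p for k, p in enumerate(power) if abs(k - tone_bin) > half_width)
    return outside / sum(power)

fs = 8000                            # sampling rate in Hz (assumed)
n_total = 256                        # analysis length in samples
tone_hz = 1000                       # transmitted tone (assumed)
tone_bin = tone_hz * n_total // fs   # bin 32
on, off = 64, 192                    # transmission starts and stops here
gate_len = off - on

tone = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n_total)]

# Abrupt switching: the tone is simply cut in and out.
abrupt = [tone[t] if on <= t < off else 0.0 for t in range(n_total)]

# Smooth switching: the same tone under a Hann-shaped fade in and out.
smooth = [tone[t] * (0.5 - 0.5 * math.cos(2 * math.pi * (t - on) / gate_len))
          if on <= t < off else 0.0
          for t in range(n_total)]

splatter_abrupt = out_of_band_fraction(abrupt, tone_bin, half_width=6)
splatter_smooth = out_of_band_fraction(smooth, tone_bin, half_width=6)
```

The abruptly gated tone leaves a much larger share of its energy away from the tone frequency than the smoothly faded one, which is the splatter described above.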