Physics of speech Flashcards
Acoustic phonetics
the study of sound waves made by the human
vocal organs for communication.
the study of the physical properties of speech;
it aims to analyse the sound wave signals that occur within speech
in terms of their frequencies, amplitudes and durations
Speech Waveform
transmitted through air (or another medium) as a regular
wave of pressure changes
changes in air pressure
- can be heard
- cannot be seen
- can be measured
- measurement can be visualized and used for statistical calculation
basic parameters of the speech signals
- amplitude
- time (duration)
main derived parameters of speech signals
- Intensity
- noise vs. resonance
- frequency and formants
methods used to analyse speech signals
- analog-to-digital (A/D) conversion
- mathematical definitions of filters and transformations
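A/D conversion can be illustrated with a minimal sketch: the continuous pressure wave is sampled at fixed time steps, and each sample is rounded to an integer level. The function name `sample_and_quantize`, the 16 kHz sampling rate and the 16-bit depth are illustrative assumptions, not values from these cards.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a unit-amplitude sine wave and round each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1               # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample in seconds
        x = math.sin(2 * math.pi * freq_hz * t)   # continuous-valued amplitude
        samples.append(round(x * max_level))      # quantised (integer) amplitude
    return samples

# 10 ms of a 100 Hz tone, i.e. exactly one period
digital = sample_and_quantize(freq_hz=100, duration_s=0.01)
print(len(digital), digital[:5])
```

A higher sampling rate gives more points per period; more bits give finer amplitude steps.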
Visualisation for Speech Signals
need to know
- time
- amplitude
- oscillogram
- spectrogram
- formants
- fundamental frequency
- pitch track
Time Domain
what actually appears in your waveform when you zoom in
the positive or negative amplitude A of the speech signal at any given
point in time is the distance of the wave from zero at that point in
time
What can be derived from time domain analysis? Intensity
intensity of the speech signal at any given point
in time is proportional to the square of the amplitude of the wave
at this point in time: I ∝ A² (I = A² in suitably scaled units)
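A small sketch of the amplitude-to-intensity step, taking the constant of proportionality as 1 so that I = A². The sample values and the helper names `intensity` and `mean_intensity` are made up for illustration.

```python
def intensity(amplitudes):
    """Return the instantaneous intensity A**2 for each amplitude sample."""
    return [a * a for a in amplitudes]

def mean_intensity(amplitudes):
    """Average intensity over a stretch of signal (the mean of the squares)."""
    squared = intensity(amplitudes)
    return sum(squared) / len(squared)

samples = [0.0, 0.5, -0.5, 1.0, -1.0]   # made-up amplitude values
print(intensity(samples))        # [0.0, 0.25, 0.25, 1.0, 1.0]
print(mean_intensity(samples))   # 0.5
```

Negative and positive amplitudes of the same size give the same intensity, because squaring removes the sign.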
What can be derived from time domain analysis? Period or interval
duration of a single wave (one cycle)
- signal is resonant (periodic) if its periods are regular in duration
- signal is noisy if its periods are irregular in duration
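One rough way to check period regularity is to measure the time between successive upward zero crossings. This is a simplified sketch, not a robust pitch detector; the 8 kHz rate, the 10 % tolerance and all function names are assumptions chosen for the example.

```python
import math
import random

def upward_crossings(samples):
    """Indices where the signal crosses zero going from negative to positive."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0 <= samples[i]]

def periods(samples):
    """Durations, in samples, between successive upward zero crossings."""
    idx = upward_crossings(samples)
    return [b - a for a, b in zip(idx, idx[1:])]

def is_regular(samples, tolerance=0.1):
    """True if the spread of the periods is within `tolerance` of their mean."""
    p = periods(samples)
    if len(p) < 2:
        return False
    mean = sum(p) / len(p)
    return (max(p) - min(p)) / mean <= tolerance

rate = 8000                                   # samples per second (assumed)
tone = [math.sin(2 * math.pi * 100 * n / rate + 0.3) for n in range(800)]
random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(800)]

print(is_regular(tone))    # True: every period is 80 samples (10 ms)
print(is_regular(noise))   # False: crossing intervals vary widely
```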
wavelength λ (lambda)
speed of sound (in m/s) divided by the number of periods per second: λ = v / f
the distance between successive peaks is called the wavelength; it is measured in metres or centimetres
wave speed
v = λ / T = λ f
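A worked example of v = λ / T = λ f. The speed of sound used here (343 m/s, roughly dry air at 20 °C) is an assumed value, and the function names are illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # assumed: dry air at about 20 degrees C

def period_s(frequency_hz):
    """Period T in seconds: T = 1 / f."""
    return 1.0 / frequency_hz

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND_M_PER_S):
    """Wavelength in metres: lambda = v / f."""
    return speed / frequency_hz

for f in (100.0, 200.0, 1000.0):
    lam = wavelength_m(f)
    T = period_s(f)
    print(f"{f:.0f} Hz: T = {T * 1000:.1f} ms, wavelength = {lam:.3f} m")
```

Doubling the frequency halves both the period and the wavelength, while the speed stays the same.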
Frequency Domain: Simple & Complex Signals
frequency of a speech signal is the number of waves (periods) per second in the waveform
The source: larynx
only valid for voiced sounds
- Fundamental frequency
about 80 Hz - 150 Hz for men (greater range possible)
about 160 Hz – 300 Hz for women (greater range possible)
- Several overtones/harmonics of the fundamental frequency
- The intensities of the overtones relative to each other determine
the overall waveform
during voicing, the larynx generates a waveform which is rather like a “sawtooth” sequence
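A sketch of how a sawtooth-like source can be built from a fundamental plus overtones. It assumes an idealised sawtooth, whose harmonics sit at integer multiples of F0 with amplitudes falling as 1/k; a real glottal waveform is only roughly like this. The F0 of 120 Hz and the function names are illustrative.

```python
import math

def harmonic_frequencies(f0_hz, n_harmonics):
    """Frequencies of the first n harmonics: F0, 2*F0, 3*F0, ..."""
    return [k * f0_hz for k in range(1, n_harmonics + 1)]

def sawtooth_like(t, f0_hz, n_harmonics=20):
    """Sum of harmonics with amplitude 1/k; approaches a sawtooth as n grows."""
    return sum(math.sin(2 * math.pi * k * f0_hz * t) / k
               for k in range(1, n_harmonics + 1))

f0 = 120.0
print(harmonic_frequencies(f0, 5))   # [120.0, 240.0, 360.0, 480.0, 600.0]

# The summed waveform repeats once every 1/F0 seconds.
t = 0.002
print(sawtooth_like(t, f0), sawtooth_like(t + 1 / f0, f0))
```

Changing the relative amplitudes of the harmonics (here 1/k) changes the shape of the summed wave but not its period.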
Frequency Domain: Complex Signals
- A combination of several arbitrary sine waves results in noise
- If the frequencies are not arbitrary but integer multiples of one frequency, the result is a harmonic (periodic) wave
- The lowest frequency is the F0
- The higher frequencies in a harmonic waveform are harmonics or
overtones of the F0
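The difference between an arbitrary and a harmonic set of components can be checked numerically. This sketch treats the lowest component as the F0, as on the card, and tests whether every other component is an integer multiple of it; `is_harmonic` and the example frequencies are made up for illustration.

```python
def is_harmonic(frequencies_hz, tolerance=1e-6):
    """True if every component is an integer multiple of the lowest one (the F0)."""
    f0 = min(frequencies_hz)
    for f in frequencies_hz:
        ratio = f / f0
        if abs(ratio - round(ratio)) > tolerance:
            return False
    return True

print(is_harmonic([100.0, 200.0, 300.0, 400.0]))   # True: overtones of 100 Hz
print(is_harmonic([100.0, 237.0, 391.0]))          # False: arbitrary components
```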
The filter system
consists of the pharyngeal, nasal and oral cavities, with
resonant frequencies which amplify or damp the overtones at those frequencies
These filter frequency bands are called formants
Formant frequencies of the oral cavity can be modified by the variable filters (articulators such as tongue and lips)
Acoustic Resonance
understand how the resonator (vocal tract) modifies the complex periodic sound
How we get a speech sound
sound source
sets the vocal tract into resonance, which depends on its shape
vocal tract amplifies some frequencies and attenuates others
speech sounds come out
Source-Filter Theory
speech is the product of the interaction between a sound source (vocal folds/phonation) and a filter (vocal tract/resonance)
sound source (vocal folds/phonation) is sometimes referred to as the source function
filter (vocal tract/resonance) is sometimes referred to as the filter function / transfer function
Filter function
supralaryngeal vocal tract can be characterized by a filter function (length and shape of tube) which specifies (for each input frequency from the source) the relative amount of energy that is passed through the filter and out of the mouth
peaks in the vocal tract filter function are the resonances of the vocal tract
they reflect the frequencies which are passed through the filter with the greatest
relative amplitude
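A filter function can be sketched as a curve that returns a relative gain for each input frequency, with one peak per resonance. The peak shape, the 100 Hz bandwidth and the formant values (500, 1500, 2500 Hz) are assumptions chosen for illustration, not measurements from these cards.

```python
def filter_gain(freq_hz, formants_hz=(500.0, 1500.0, 2500.0), bandwidth_hz=100.0):
    """Relative gain at freq_hz, modelled as a sum of resonance peaks.

    Each formant contributes a term that equals 1 at the formant frequency
    and falls off on either side of it.
    """
    half_bw = bandwidth_hz / 2.0
    gain = 0.0
    for formant in formants_hz:
        gain += 1.0 / (1.0 + ((freq_hz - formant) / half_bw) ** 2)
    return gain

# Gain is highest at the resonances and low in between.
for f in range(250, 3000, 250):
    print(f, round(filter_gain(f), 3))
```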
Source and Filter: Independence
Formants = resonant frequencies of the vocal tract
- depend on the shape of the vocal tract, which filters the harmonics
- do not depend on the F0 of the sound source
source and filter are independent of one another
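The independence can be illustrated by passing two different sources through the same filter. In this sketch the source amplitudes fall as 1/k and the filter has peaks at 500, 1500 and 2500 Hz; both are simplifying assumptions, and the function names are hypothetical.

```python
def filter_gain(freq_hz, formants_hz=(500.0, 1500.0, 2500.0), half_bw=50.0):
    """Hypothetical filter: a sum of resonance peaks, one per formant."""
    return sum(1.0 / (1.0 + ((freq_hz - f) / half_bw) ** 2) for f in formants_hz)

def output_spectrum(f0_hz, max_freq_hz=3000.0):
    """(frequency, amplitude) pairs: source harmonic amplitude times filter gain."""
    spectrum = []
    k = 1
    while k * f0_hz <= max_freq_hz:
        freq = k * f0_hz
        source_amp = 1.0 / k          # assumed source: amplitudes fall as 1/k
        spectrum.append((freq, source_amp * filter_gain(freq)))
        k += 1
    return spectrum

for f0 in (100.0, 125.0):
    spec = output_spectrum(f0)
    strongest = max(spec, key=lambda pair: pair[1])
    print(f0, "Hz source -> strongest harmonic at", strongest[0], "Hz")
```

Changing F0 changes the spacing of the harmonics, but the strongest output still falls at the 500 Hz resonance because the filter itself has not changed.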
Varying the Filter: Vocal Tract Shape
vocal tract = variable resonator
shape constantly changes over course of utterance
- space in oral cavity can be large/small
- space in the pharyngeal cavity can be large/small
- air can resonate in the nasal cavity or be blocked from it
changing the shape of the vocal tract =
- change resonating frequencies
- change filter
- different speech sounds
each sound has its own filter, made up of its own set of resonant frequencies (formants)
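A simple way to see that changing the tube changes the resonances is the textbook idealisation of the vocal tract as a uniform tube, closed at the glottis and open at the lips. That model is an assumption brought in for illustration (it is not stated on these cards), as are the tube lengths and the speed of sound of 350 m/s. Its resonances are F_n = (2n - 1) c / (4 L).

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of an idealised uniform tube, closed at one end, open at the other."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]

print(tube_formants(0.175))   # longer tube: lower resonances (about 500, 1500, 2500 Hz)
print(tube_formants(0.145))   # shorter tube: every resonance is higher
```

Real vowels differ because the tube is not uniform: constrictions made by the tongue and lips shift each resonance by a different amount.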
Source Filter Model
spikes (harmonics) are generated by SOURCE
peaks (formants) are generated by FILTER
a single spectral analysis describes only one short interval (analysis window) of the speech signal, typically a few milliseconds to a few tens of milliseconds
a sequence of spectra is needed to track the changing structure of the speech signal
the representation of a sequence of spectra is called a spectrogram
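A minimal sketch of the idea of a sequence of spectra: the signal is cut into consecutive frames and one spectrum is computed per frame. It uses a plain discrete Fourier transform written out by hand; the 8 kHz sampling rate, the frame length and the two-tone test signal are assumptions chosen so that the result is easy to read.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins from 0 Hz up to half the sampling rate."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_len):
    """One magnitude spectrum per consecutive, non-overlapping frame."""
    return [magnitude_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, frame_len)]

rate = 8000        # samples per second (assumed)
frame_len = 80     # 80 samples per frame, so DFT bins are 100 Hz apart
# Test signal: 500 Hz for the first 160 samples, then 1000 Hz for the next 160.
signal = [math.sin(2 * math.pi * (500 if n < 160 else 1000) * n / rate)
          for n in range(320)]

frames = spectrogram(signal, frame_len)
peak_hz = [spec.index(max(spec)) * rate / frame_len for spec in frames]
print(peak_hz)     # [500.0, 500.0, 1000.0, 1000.0]
```

Each inner list is one spectrum; plotting them side by side, with time on one axis and frequency on the other, gives the spectrogram. Practical tools also overlap the frames and apply a window function, which this sketch leaves out.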