Lecture 16 - Speech Synthesis Flashcards
Name the five building blocks for speech synthesis
- data acquisition
- emotional speech recognition
- natural language understanding
- dialog management
- emotional speech synthesis
Three production steps of speech
- Respiration (power supply)
- Phonation (source signal)
- Articulation (spectral color filtering)
What is respiration (power supply)?
Air from the lungs supplies the pressure needed to make the vocal cords vibrate
Main organs involved are the lungs, diaphragm, nose, and mouth
What is phonation (source signal)?
this is producing sound with the vocal cords (vocal folds)
air from the lungs passes through the larynx and sets the tissue there (the vocal cords) vibrating, which produces sound
at this stage the sound is like a buzzing noise, not yet speech
What is articulation (spectral coloring filter)?
this turns the raw buzzing sound into intelligible sound, i.e. speech
you move your tongue, palate, and other organs to utter words
main organs involved: lips, teeth, tongue, palates, vocal cords, nasal cavity, uvula, jaw
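Phonation and articulation map onto the source-filter view of speech: the vocal cords produce a source signal and the vocal tract colours its spectrum. The sketch below assumes the glottal source can be approximated by a unit impulse train and each vocal-tract resonance (formant) by a second-order all-pole filter. Respiration is not modelled, and the function names and formant values are illustrative, not taken from the lecture.

```python
import math


def impulse_train(f0_hz: float, fs_hz: int, n_samples: int) -> list[float]:
    """Phonation stand-in: one unit impulse per glottal cycle."""
    period = round(fs_hz / f0_hz)  # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x: list[float], formant_hz: float, bandwidth_hz: float, fs_hz: int) -> list[float]:
    """Articulation stand-in: second-order all-pole filter resonating at one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius set by the bandwidth
    theta = 2 * math.pi * formant_hz / fs_hz       # pole angle set by the centre frequency
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        # y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2]
        out = sample + 2 * r * math.cos(theta) * y1 - r * r * y2
        y.append(out)
        y1, y2 = out, y1
    return y


def synthesize_vowel(f0_hz: float, formants: list[tuple[float, float]],
                     fs_hz: int = 16000, duration_s: float = 0.05) -> list[float]:
    """Source (impulse train) passed through a cascade of formant filters."""
    signal = impulse_train(f0_hz, fs_hz, int(fs_hz * duration_s))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    return signal


# Illustrative values only: f0 = 120 Hz, two formants roughly in the range of an open vowel.
vowel = synthesize_vowel(120, [(700, 80), (1200, 90)])
```

Changing `f0_hz` changes the pitch while keeping the "vowel colour"; changing the formant list changes the vowel while keeping the pitch, which is the point of separating source and filter.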
Three types of signal characteristics:
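- periodic (produced in the larynx by the vibrating vocal cords)
- noisy (produced by turbulent airflow at a narrow constriction, in the vocal tract or at the glottis)
- impulsive (produced by the sudden release of a closure in the vocal tract)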
Examples:
- sh -> noisy
- o -> periodic
- p -> impulsive
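One way to see the three signal types is to generate a toy version of each and measure a cue that separates it: zero-crossing rate (high for noise), normalised autocorrelation at the pitch period (high for periodic signals) and the share of energy in the single largest sample (high for an impulse). This is a sketch under simplifying assumptions: a pure sine stands in for the periodic source, uniform white noise for frication, and one unit sample for a plosive burst. The helper names are hypothetical, not a standard API.

```python
import math
import random


def zero_crossing_rate(x: list[float]) -> float:
    """Fraction of neighbouring sample pairs whose signs differ (high for noise)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def periodicity(x: list[float], lag: int) -> float:
    """Autocorrelation at `lag` divided by the signal energy (close to 1 for periodic signals)."""
    energy = sum(v * v for v in x)
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy


def peak_share(x: list[float]) -> float:
    """Share of the total energy held by the largest sample (close to 1 for an impulse)."""
    return max(v * v for v in x) / sum(v * v for v in x)


N, PERIOD = 1000, 100
rng = random.Random(0)                                             # fixed seed for reproducibility
periodic = [math.sin(2 * math.pi * n / PERIOD) for n in range(N)]  # voiced, like "o"
noisy = [rng.uniform(-1.0, 1.0) for _ in range(N)]                 # turbulent, like "sh"
impulsive = [1.0 if n == N // 2 else 0.0 for n in range(N)]        # burst, like "p"

for name, sig in [("periodic", periodic), ("noisy", noisy), ("impulsive", impulsive)]:
    print(f"{name:9s} zcr={zero_crossing_rate(sig):.2f} "
          f"periodicity={periodicity(sig, PERIOD):.2f} peak_share={peak_share(sig):.2f}")
```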
Why do males have lower pitch?
Because their vocal cords are longer and have higher mass, so they vibrate more slowly
How does muscle tension in the vocal folds affect pitch?
as muscle tension increases, the vocal folds open and close faster, so the pitch increases
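Both answers are consistent with a common simplification (an assumption here, not stated in the lecture) that treats a vocal fold like an ideal vibrating string with vibrating length $L$, tension $T$ and mass per unit length $\mu$:

```latex
f_0 \approx \frac{1}{2L}\sqrt{\frac{T}{\mu}}
```

Longer or heavier folds (larger $L$ or $\mu$) lower $f_0$, while more muscle tension (larger $T$) raises it; for example, quadrupling $T$ with everything else fixed doubles $f_0$. Real vocal folds are layered tissue rather than strings, so this is only a qualitative guide.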
cycle-to-cycle variations in amplitude are called ...
shimmer
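Shimmer is usually reported as a relative number. A minimal sketch, assuming the peak amplitude of each glottal cycle has already been measured; the formula follows the "local" shimmer definition used by phonetics tools such as Praat (average absolute difference between consecutive amplitudes divided by the average amplitude), and the function name is hypothetical.

```python
def local_shimmer(peak_amplitudes: list[float]) -> float:
    """Mean absolute amplitude difference between consecutive glottal cycles,
    divided by the mean amplitude, so the result is unitless."""
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(a - b) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(peak_amplitudes) / len(peak_amplitudes)
    return mean_diff / mean_amp


# Hypothetical per-cycle peak amplitudes alternating between 1.0 and 0.9.
print(local_shimmer([1.0, 0.9, 1.0, 0.9]))
```

A perfectly steady voice gives 0; the alternating example gives about 0.1 / 0.95, i.e. roughly 10.5 %.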
what defines pitch?
pitch (fundamental frequency f0) = 1 / period of the glottal cycle (T0), mostly between 60 and 400 Hz
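In numbers: a glottal period of T0 = 8 ms gives f0 = 1 / 0.008 s = 125 Hz. A common way to measure T0 from a signal is autocorrelation: search only the lags that correspond to 60 to 400 Hz and take the lag with the strongest self-similarity. This sketch assumes a clean, voiced frame given as a list of samples; real pitch trackers add windowing, normalisation and octave-error checks. The raw (unnormalised) autocorrelation is used on purpose: it shrinks as the lag grows, so among multiples of the true period the first one wins.

```python
def estimate_f0(frame: list[float], fs_hz: int,
                f0_min_hz: float = 60.0, f0_max_hz: float = 400.0) -> float:
    """Autocorrelation pitch estimate: pick the lag T0 (in samples) with the
    largest autocorrelation inside the f0 search range, then return f0 = fs / T0."""
    min_lag = int(fs_hz / f0_max_hz)   # shortest period searched (highest f0)
    max_lag = int(fs_hz / f0_min_hz)   # longest period searched (lowest f0)
    if len(frame) <= max_lag:
        raise ValueError("frame must be longer than the longest period searched")

    def autocorr(lag: int) -> float:
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs_hz / best_lag            # same as 1 / (best_lag / fs_hz seconds)


# Glottal pulses every 128 samples at 16 kHz, i.e. a period of 8 ms.
pulses = [1.0 if n % 128 == 0 else 0.0 for n in range(800)]
print(estimate_f0(pulses, 16000))  # prints 125.0
```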
cycle-to-cycle variation of the pitch period is called ...
jitter
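Jitter is computed from the pitch periods, which in turn come from the instants at which successive glottal pulses occur. A minimal sketch, assuming those pulse instants (in seconds) are already known; it uses the "local" jitter definition (average absolute difference between consecutive periods divided by the average period) found in tools such as Praat. Function names and the pulse times are hypothetical.

```python
def periods_from_pulses(pulse_times_s: list[float]) -> list[float]:
    """Pitch periods are the gaps between successive glottal pulse instants."""
    return [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]


def local_jitter(periods_s: list[float]) -> float:
    """Mean absolute difference between consecutive periods, divided by the mean period."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_s) / len(periods_s))


# Hypothetical pulse instants whose periods alternate between 8.0 ms and 8.2 ms.
pulses = [0.0, 0.0080, 0.0162, 0.0242, 0.0324]
print(f"{local_jitter(periods_from_pulses(pulses)):.4f}")
```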
What is the difference between a phone and a phoneme?
a phone is any distinct speech sound, regardless of whether the exact sound is critical to the meanings of words
in contrast, a phoneme is a speech sound in a given language that, if swapped with another phoneme, would change one word to another
name the 3 prosody parameters
- f0 (fundamental frequency) -> perceived as pitch
- energy -> perceived as loudness
- speech tempo -> speaking rate, articulation rate, syllable length, pause length
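Energy and tempo can be computed directly from their definitions. A minimal sketch, assuming the syllable count and total pause time are already known (for example from a manual or automatic segmentation, not shown); function and key names are hypothetical. Speaking rate includes pauses, articulation rate excludes them.

```python
def frame_energies(x: list[float], frame_len: int, hop: int) -> list[float]:
    """Short-time energy: sum of squared samples in each analysis frame."""
    return [sum(v * v for v in x[start:start + frame_len])
            for start in range(0, len(x) - frame_len + 1, hop)]


def tempo_measures(n_syllables: int, total_s: float, pause_s: float) -> dict[str, float]:
    """Tempo measures from a syllable count, total duration and summed pause time."""
    speech_s = total_s - pause_s                       # time actually spent articulating
    return {
        "speaking_rate": n_syllables / total_s,        # syllables/s, pauses included
        "articulation_rate": n_syllables / speech_s,   # syllables/s, pauses excluded
        "mean_syllable_length_s": speech_s / n_syllables,
        "pause_length_s": pause_s,
    }


print(frame_energies([0.0, 0.0, 1.0, -1.0], frame_len=2, hop=2))  # [0.0, 2.0]
print(tempo_measures(n_syllables=12, total_s=4.0, pause_s=1.0))
```

For the hypothetical utterance of 12 syllables in 4 s with 1 s of pauses, the speaking rate is 3 syllables/s but the articulation rate is 4 syllables/s, with a mean syllable length of 0.25 s.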
steps in creating an emotion recognition system
- representative training data
- data cleaning, enhancement, and normalization
- feature extraction
- classification algorithm
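The four steps can be sketched as a toy pipeline in plain Python. Everything concrete here is an assumption for illustration: the training utterances are invented numbers, not real data; unvoiced frames are assumed to be marked with f0 = 0; the features are mean f0, f0 standard deviation and mean energy; and a nearest-centroid classifier stands in for whatever classification algorithm a real system would use.

```python
from statistics import mean, pstdev

# Step 1 (training data): invented utterances as ((f0 track in Hz, 0 = unvoiced), energy track), label.
TRAIN = [
    (([220, 0, 260, 300], [0.8, 0.9, 1.0]), "angry"),
    (([230, 280, 0, 310], [0.7, 0.9, 0.95]), "angry"),
    (([120, 125, 0, 130], [0.3, 0.35, 0.3]), "neutral"),
    (([110, 118, 122, 0], [0.25, 0.3, 0.3]), "neutral"),
]


def clean(f0_track: list[float]) -> list[float]:
    """Step 2 (cleaning): drop unvoiced frames, assumed to be marked with f0 = 0."""
    return [f for f in f0_track if f > 0]


def extract_features(f0_track: list[float], energy_track: list[float]) -> list[float]:
    """Step 3 (feature extraction): mean f0, f0 spread and mean energy per utterance."""
    voiced = clean(f0_track)
    return [mean(voiced), pstdev(voiced), mean(energy_track)]


def fit_normalizer(vectors: list[list[float]]) -> tuple[list[float], list[float]]:
    """Step 2 (normalization): per-feature mean and standard deviation of the training set."""
    columns = list(zip(*vectors))
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0


def normalize(v: list[float], mus: list[float], sds: list[float]) -> list[float]:
    return [(x - m) / s for x, m, s in zip(v, mus, sds)]


def train_centroids(vectors: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Step 4 (training): one mean feature vector (centroid) per emotion label."""
    groups: dict[str, list[list[float]]] = {}
    for v, label in zip(vectors, labels):
        groups.setdefault(label, []).append(v)
    return {label: [mean(c) for c in zip(*vs)] for label, vs in groups.items()}


def classify(v: list[float], centroids: dict[str, list[float]]) -> str:
    """Step 4 (classification): label whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


raw = [extract_features(f0, en) for (f0, en), _ in TRAIN]
mus, sds = fit_normalizer(raw)
centroids = train_centroids([normalize(v, mus, sds) for v in raw], [lab for _, lab in TRAIN])

test_utterance = extract_features([240, 270, 0, 300], [0.85, 0.9, 0.95])
print(classify(normalize(test_utterance, mus, sds), centroids))
```

The normalization statistics are fitted on the training set only and then reused for new utterances, so test data never influences the scaling.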
Why emotional speech synthesis?
emotions are natural for humans
emotions can help to communicate better
emotions can help to understand better
more effective human-machine interactions