Lecture 16 - Speech Synthesis Flashcards

Question 1

Q

Name the five building blocks for speech synthesis

Answer

A

data acquisition
emotional speech recognition
natural language understanding
dialog management
emotional speech synthesis

Question 2

Q

Three production steps of speech

Answer

A

Respiration (power supply)
Phonation (source signal)
Articulation (spectral color filtering)

Question 3

Q

What is respiration (power supply)?

Answer

A

Air needed to produce pressure and vibrations

Main organs involved are the lungs, diaphragm, nose, and mouth

Question 4

Q

What is phonation (source signal)?

Answer

A

producing sound by making the vocal cords vibrate
air passes through the larynx, which causes the vocal cords (folds of tissue) to vibrate and produce sound
at this stage the sound is still just a buzzing noise

Question 5

Q

What is articulation (spectral coloring filter)?

Answer

A

this is turning the raw sound into intelligible sound, called speech
you move your tongue, lips, jaw, and other organs to utter words
main organs involved: lips, teeth, tongue, palates, vocal cords, nasal cavity, uvula, jaw
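
The labels on the three production steps (power supply, source signal, spectral coloring filter) correspond to the source-filter view of speech production. A minimal sketch of that idea, assuming an impulse train as the source and a single two-pole resonator as the filter; the function names and parameter values are illustrative, not from the lecture:

```python
import math

def source(n_samples, f0_hz, sample_rate, gain=1.0):
    """Phonation: one pulse per glottal cycle (a raw 'buzz').
    gain stands in for respiration, the power behind the signal."""
    period = round(sample_rate / f0_hz)  # samples per glottal cycle
    return [gain if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Articulation: a two-pole resonance that emphasizes energy near
    center_hz, roughly like one resonance of the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# 0.1 s of a 100 Hz buzz, colored by one resonance at 500 Hz (assumed values)
buzz = source(1600, f0_hz=100.0, sample_rate=16000)
colored = resonator(buzz, center_hz=500.0, bandwidth_hz=100.0, sample_rate=16000)
```

A real vocal tract has several resonances that change as the articulators move; one fixed resonator is only meant to show where each step sits in the chain.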

Question 6

Q

Three types of signal characteristics:

Answer

A

periodic (produced in the larynx by vocal cord vibration)
noisy (turbulent airflow, at the larynx or at a constriction in the vocal tract)
impulsive (produced by releasing a closure in the vocal tract)

Example, the word "shop":
sh -> noisy
o -> periodic
p -> impulsive
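
The three characteristics can be illustrated with synthetic signals. This is a minimal sketch, assuming a 16 kHz sample rate and idealized signals (a sine wave, white noise, and a single-sample burst); real speech sounds are far less clean, and the function names are illustrative:

```python
import math
import random

SAMPLE_RATE = 16000  # samples per second (assumed)

def periodic(n, f0=120.0):
    """Sine wave at f0 Hz: the pattern repeats every SAMPLE_RATE / f0 samples."""
    return [math.sin(2 * math.pi * f0 * i / SAMPLE_RATE) for i in range(n)]

def noisy(n, seed=0):
    """Uniform white noise: no repeating pattern."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def impulsive(n, at=0):
    """Single burst: one non-zero sample, silence everywhere else."""
    return [1.0 if i == at else 0.0 for i in range(n)]
```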

Question 7

Q

Why do males have lower pitch?

Answer

A

Because their vocal cords are longer and have higher mass

Question 8

Q

features of vocal folds

Answer

A

as muscle tension increases, the vocal folds open and close faster, hence the pitch increases
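
The effects of length, mass, and tension on pitch can be made concrete with the ideal-string formula f0 = sqrt(T / mu) / (2 * L). Treating the vocal folds as an ideal string is a strong simplification and an assumption of this sketch, not something stated on the cards; the parameter names are illustrative:

```python
import math

def ideal_string_f0(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency of an ideal string: sqrt(T / mu) / (2 * L).

    Longer or heavier strings vibrate more slowly (lower pitch);
    higher tension makes them vibrate faster (higher pitch).
    """
    if min(length_m, tension_n, mass_per_length_kg_m) <= 0:
        raise ValueError("all parameters must be positive")
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)
```

In this model, doubling the length or quadrupling the mass per unit length halves the frequency, while quadrupling the tension doubles it.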

Question 9

Q

variations in amplitude are called..

Answer

A

shimmer (cycle-to-cycle variation in amplitude)

Question 10

Q

what defines pitch?

Answer

A

pitch (f0) = 1 / period of the glottal cycle; mostly between 60 and 400 Hz
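
The relation between glottal period and pitch is a simple reciprocal. A small sketch, assuming periods are given in seconds and that the times of successive glottal closures are already known (finding them in real audio is a separate problem); the function names are illustrative:

```python
def f0_from_period(period_s):
    """Fundamental frequency in Hz from one glottal cycle length in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s

def f0_track(closure_times_s):
    """One f0 value per glottal cycle, from consecutive closure times."""
    pairs = zip(closure_times_s, closure_times_s[1:])
    return [f0_from_period(later - earlier) for earlier, later in pairs]
```

For example, a 10 ms glottal cycle corresponds to 100 Hz, and a 2.5 ms cycle to 400 Hz, the upper end of the range given above.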

Question 11

Q

varying pitch period is called ..

Answer

A

jitter (cycle-to-cycle variation in pitch period)
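
Cycle-to-cycle variation in pitch period (commonly called jitter) and in amplitude (commonly called shimmer) can be measured the same way. The sketch below uses one common "local" definition, the mean absolute difference between consecutive cycles divided by the mean value; other definitions exist, and the function name is illustrative:

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Pass glottal periods (seconds) to get local jitter,
    or per-cycle peak amplitudes to get local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value
```

A perfectly regular voice would give 0.0 for both measures; larger values mean a less stable period or amplitude.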

Question 12

Q

What is the difference between a phone and a phoneme?

Answer

A

a phone is any distinct speech sound, regardless of whether the exact sound is critical to the meanings of words

in contrast, a phoneme is a speech sound in a given language that, if swapped with another phoneme, would change one word to another

Question 13

Q

name the 3 prosody parameters

Answer

A

f0 -> relates to pitch
energy
speech tempo
- speaking rate, articulation rate, syllable length, pause length
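
The tempo measures differ mainly in whether pauses are counted. A minimal sketch, assuming syllable and pause durations (in seconds) are already available from some segmentation; it uses the usual convention that speaking rate includes pause time and articulation rate excludes it, and the names are illustrative:

```python
def tempo_measures(syllable_durations_s, pause_durations_s):
    """Basic speech tempo measures from syllable and pause durations."""
    if not syllable_durations_s:
        raise ValueError("need at least one syllable")
    n = len(syllable_durations_s)
    speech_time = sum(syllable_durations_s)
    pause_time = sum(pause_durations_s)
    n_pauses = len(pause_durations_s)
    return {
        # syllables per second, pauses included
        "speaking_rate": n / (speech_time + pause_time),
        # syllables per second, pauses excluded
        "articulation_rate": n / speech_time,
        "mean_syllable_length": speech_time / n,
        "mean_pause_length": pause_time / n_pauses if n_pauses else 0.0,
    }
```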

Question 14

Q

steps in emotion recognition system creation

Answer

A

representative training data
data cleaning, enhancement, and normalization
feature extraction
classification algorithm
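
The steps above can be sketched as a toy pipeline. Everything here is an assumption for illustration: unvoiced frames are marked with f0 = 0, the features are three simple prosody statistics, and the classifier is nearest-centroid. A real system would use far richer features and models:

```python
import math

def extract_features(f0_contour, energy_contour):
    """Cleaning + feature extraction: drop unvoiced frames (f0 == 0, an
    assumed convention), then compute mean f0, f0 range, and mean energy."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced or not energy_contour:
        raise ValueError("need voiced frames and energy values")
    return [
        sum(voiced) / len(voiced),
        max(voiced) - min(voiced),
        sum(energy_contour) / len(energy_contour),
    ]

def fit_normalizer(rows):
    """Normalization: per-feature mean and standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def normalize(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def train_centroids(rows, labels):
    """Classification: store one mean feature vector per emotion label."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(c) / len(c) for c in zip(*members)]
            for label, members in grouped.items()}

def classify(row, centroids):
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))
```

Usage follows the card order: extract features from each labelled training utterance, fit the normalizer on those rows, train the centroids on the normalized rows, then normalize and classify new utterances with the same statistics.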

Question 15

Q

Why emotional speech synthesis?

Answer

A

emotions are natural for humans
emotions can help to communicate better
emotions can help to understand better

more effective human-machine interactions

(15 cards)