Lecture 16 - Speech Synthesis Flashcards

1
Q

Name the five building blocks for speech synthesis

A
  1. data acquisition
  2. emotional speech recognition
  3. natural language understanding
  4. dialog management
  5. emotional speech synthesis
2
Q

Three production steps of speech

A
  1. Respiration (power supply)
  2. Phonation (source signal)
  3. Articulation (spectral color filtering)
3
Q

What is respiration (power supply)?

A

Air needed to produce pressure and vibrations

Main organs involved are the lungs, diaphragm, nose, and mouth

4
Q

What is phonation (source signal)?

A

Producing sound by setting the vocal cords into vibration.
Air passes through the larynx, which causes the tissue (vocal cords) to vibrate and produce sound.
At this stage the sound is like a buzzing noise.

5
Q

What is articulation (spectral color filtering)?

A

Turning the raw sound into intelligible sound, called speech.
You move your tongue, palate, and other organs to utter words.
Main organs involved: lips, teeth, tongue, palates, vocal cords, nasal cavity, uvula, jaw

6
Q

Three types of signal characteristics:

A
  1. periodic (produced in the larynx)
  2. noisy (produced in the larynx)
  3. impulsive (produced by constrictions in the vocal tract)

Example, the word "shop":
sh -> noisy
o -> periodic
p -> impulsive

7
Q

Why do males have lower pitch?

A

Because their vocal cords are longer and have higher mass

8
Q

features of vocal folds

A

As muscle tension increases, the vocal folds open and close faster, hence the pitch increases.
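
A rough way to see why tension raises pitch, and why longer, heavier vocal cords lower it, is to treat a vocal fold as an ideal vibrating string. This is a textbook simplification and an assumption here, not something the lecture states; the symbols L (vibrating length), T (tension), and \mu (mass per unit length) are introduced only for illustration.

```latex
f_0 \approx \frac{1}{2L}\sqrt{\frac{T}{\mu}}
```

Under this model, increasing T raises f_0, while increasing L or \mu lowers it.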

9
Q

Cycle-to-cycle variations in amplitude are called ...

A

shimmer

10
Q

what defines pitch?

A

Pitch (f0) = 1 / period of the glottal cycle, mostly between 60 and 400 Hz
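
The relation above can be checked with a few lines of Python. This is a minimal sketch: the function name `pitch_hz` and the example periods are made up for illustration.

```python
def pitch_hz(period_s: float) -> float:
    """Return f0 in Hz for a glottal cycle lasting period_s seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# Hypothetical glottal periods in seconds (not measured data).
for period in (0.010, 0.005, 0.0025):
    print(f"{period * 1000:.1f} ms -> {pitch_hz(period):.0f} Hz")
```

A 10 ms cycle corresponds to 100 Hz and a 2.5 ms cycle to 400 Hz, so the quoted 60 - 400 Hz range means glottal periods of roughly 2.5 to 17 ms.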

11
Q

Cycle-to-cycle variation of the pitch period is called ...

A

jitter
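
Jitter, and its amplitude counterpart shimmer, are often quantified as the mean absolute difference between consecutive cycles divided by the mean value. The sketch below assumes that "local" definition, which is one of several in use and is not taken from the lecture. The helper name `relative_perturbation` and the five-cycle measurements are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical measurements for five consecutive glottal cycles.
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.0]
peak_amplitudes = [1.00, 0.95, 1.05, 1.00, 1.00]

jitter = relative_perturbation(periods_ms)        # variation of the pitch period
shimmer = relative_perturbation(peak_amplitudes)  # variation of the amplitude
print(f"jitter:  {jitter:.1%}")
print(f"shimmer: {shimmer:.1%}")
```

The same function serves both measures because only the input differs: period lengths for jitter, peak amplitudes for shimmer.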

12
Q

What is the difference between a phone and a phoneme?

A

a phone is any distinct speech sound, regardless of whether the exact sound is critical to the meanings of words

in contrast, a phoneme is a speech sound in a given language that, if swapped with another phoneme, would change one word to another

13
Q

name the 3 prosody parameters

A

  1. f0 (relates to pitch)
  2. energy
  3. speech tempo: speaking rate, articulation rate, syllable length, pause length
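
Two of these parameters are easy to compute by hand. The sketch below assumes RMS as the energy measure and the usual convention that speaking rate counts pauses while articulation rate excludes them. The function names and the example numbers are illustrative, not from the lecture.

```python
import math


def rms_energy(samples):
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speaking_rate(n_syllables, total_s):
    """Syllables per second over the whole utterance, pauses included."""
    return n_syllables / total_s


def articulation_rate(n_syllables, total_s, pause_s):
    """Syllables per second over the time actually spent speaking."""
    return n_syllables / (total_s - pause_s)


# Hypothetical utterance: 12 syllables in 4 s, of which 1 s is pauses.
print(speaking_rate(12, 4.0))           # syllables per second, pauses included
print(articulation_rate(12, 4.0, 1.0))  # syllables per second, pauses excluded
print(rms_energy([0.5, -0.5, 0.5, -0.5]))
```

The articulation rate is never lower than the speaking rate, because it divides the same syllable count by a shorter duration.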

14
Q

steps in emotion recognition system creation

A

  1. representative training data
  2. data cleaning, enhancement, and normalization
  3. feature extraction
  4. classification algorithm
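
The four steps can be sketched end to end on toy data. Everything here is an assumption made for illustration: the "recordings" are synthetic sine tones rather than speech, cleaning is reduced to DC-offset removal, the features are RMS energy and zero-crossing rate, the classifier is a nearest-centroid rule, and the labels are hypothetical.

```python
import math


def tone(freq_hz, amplitude, sr=8000, dur_s=0.1, offset=0.05):
    """Synthetic stand-in for a recording (hypothetical data, not real speech)."""
    n = int(sr * dur_s)
    return [offset + amplitude * math.sin(2 * math.pi * freq_hz * i / sr)
            for i in range(n)]


def clean(samples):
    """Step 2, cleaning: remove the DC offset."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def extract_features(samples):
    """Step 3, feature extraction: (RMS energy, zero-crossing rate)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(samples) - 1))


def train(examples):
    """Step 4, training: one centroid (mean feature vector) per label."""
    grouped = {}
    for samples, label in examples:
        grouped.setdefault(label, []).append(extract_features(clean(samples)))
    return {label: tuple(sum(col) / len(col) for col in zip(*vectors))
            for label, vectors in grouped.items()}


def classify(samples, centroids):
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(clean(samples))
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Step 1, training data: loud high tones vs. quiet low tones.
TRAINING = [
    (tone(300, 0.8), "high arousal"),
    (tone(280, 0.7), "high arousal"),
    (tone(120, 0.2), "low arousal"),
    (tone(110, 0.3), "low arousal"),
]
centroids = train(TRAINING)
print(classify(tone(290, 0.75), centroids))
```

With these toy inputs the last line prints high arousal. A real system would use recorded emotional speech, richer features such as f0 and its contour, feature scaling, and a stronger classifier.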

15
Q

Why emotional speech synthesis?

A

emotions are natural for humans
emotions can help to communicate better
emotions can help to understand better

more effective human-machine interactions
