Topic 11: Speech Synthesis Flashcards

Question 1

Q

Recap of the speech synthesis block diagram

Answer

A

text analysis -> phonetic analysis -> prosodic analysis -> speech synthesis

Question 2

Q

Text analysis

Answer

A

document structure detection - this will determine how TTS will be implemented

text normalization

linguistic analysis

Question 3

Q

Document structure detection

Answer

A

flat file or file content

Question 4

Q

Text normalization: transliteration

Answer

A

convert the text into a standardized format

example: transliteration
how to map hangeul to a standard form

hangeul consonants
hangeul vowels

transliteration is not translation: it maps the written characters (and so the pronunciation) into a standard form, e.g. hangeul into Latin letters
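
A minimal sketch of hangeul transliteration, assuming the Unicode layout of precomposed hangeul syllables (each syllable encodes an initial consonant, a vowel and an optional final consonant). The letter tables follow the Revised Romanization spellings for single letters and are simplified: sound changes across syllable boundaries are ignored, and the function name `romanize` is hypothetical.

```python
# Simplified, per-syllable romanization of hangeul (illustrative only).
# Tables are indexed in Unicode jamo order.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa",
           "wae", "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Index 0 means "no final consonant"; cluster finals are reduced to one letter.
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

SYLLABLE_BASE = 0xAC00   # first precomposed syllable
SYLLABLE_COUNT = 11172   # 19 initials * 21 medials * 28 finals


def romanize(text: str) -> str:
    """Map each hangeul syllable to Latin letters; copy other characters."""
    out = []
    for ch in text:
        code = ord(ch) - SYLLABLE_BASE
        if 0 <= code < SYLLABLE_COUNT:
            initial, rest = divmod(code, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)
    return "".join(out)


print(romanize("한글"))
```

The consonant and vowel tables correspond to the "hangeul consonants" and "hangeul vowels" mappings on this card; a real system would add context-dependent rules on top.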

Question 5

Q

Text normalization: dealing with different formats

Answer

A

symbols
number format
combination
abbreviation and acronym

normalizing numbers

phone number
dates
times
money and currency
account number
ordinal number
cardinal number
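
A small sketch of number normalization for English, covering three of the cases listed above: cardinal numbers, ordinal numbers, and digit-by-digit reading (as for phone or account numbers). The function names and the 0 to 999 range are assumptions made to keep the example short.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n: int) -> str:
    """Spell out a cardinal number in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + cardinal(rest) if rest else "")


def ordinal(n: int) -> str:
    """Spell out an ordinal by rewriting the last word of the cardinal."""
    words = cardinal(n)
    cut = max(words.rfind(" "), words.rfind("-")) + 1
    head, last = words[:cut], words[cut:]
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return head + last


def read_digits(text: str) -> str:
    """Read digits one by one, skipping separators such as '-' or spaces."""
    return " ".join(ONES[int(c)] for c in text if c in "0123456789")
```

Dates, times and currency would need their own rules on top of these helpers, since the same digits are read differently in each format.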

Question 6

Q

Text normalization

Answer

A

different problems need different approaches

regular expressions (RE) can be used to

test a pattern
replace text
search for a substring

example operations

extract a substring
replace
test
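
The three regular-expression operations above can be sketched with Python's standard `re` module. The patterns and the abbreviation table are illustrative assumptions, not a complete normalizer (for instance, "St." could also mean "Saint").

```python
import re

# Test: does a token look like a 24-hour time such as 09:30?
TIME = re.compile(r"(?:[01]?\d|2[0-3]):[0-5]\d")

# Extract: dollar amounts such as $5 or $7.50.
MONEY = re.compile(r"\$\d+(?:\.\d{2})?")

# Replace: a hypothetical abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
ABBREVIATION_PATTERN = re.compile(
    "|".join(re.escape(abbr) for abbr in ABBREVIATIONS)
)


def is_time(token: str) -> bool:
    """Test whether the whole token matches the time pattern."""
    return TIME.fullmatch(token) is not None


def find_money(text: str) -> list:
    """Extract every substring that matches the money pattern."""
    return MONEY.findall(text)


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its expansion."""
    return ABBREVIATION_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(0)], text)
```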

Question 7

Q

Linguistic Analysis

Answer

A

processing text based on the linguistic features of the language

supports phonetic and prosodic generation

modular functions required for TTS:

sentence breaking / tokenizer
part-of-speech (POS) tagging, e.g. "record" as a noun vs. as a verb
homograph disambiguation, e.g. "read" (present tense) vs. "read" (past tense)
noun phrase & clause detection
sentence type disambiguation
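
A toy sketch of three of these modules: sentence breaking, sentence type detection, and homograph disambiguation driven by a POS tag. Everything here is an assumption for illustration: the splitter is naive (it would wrongly split after "Dr."), the POS tags are assumed to come from a separate tagger, and the pronunciations are ARPAbet-style strings in a hypothetical lookup table.

```python
import re

# Hypothetical homograph table keyed by (word, POS tag).
HOMOGRAPHS = {
    ("record", "NOUN"): "R EH1 K ER0 D",
    ("record", "VERB"): "R IH0 K AO1 R D",
    ("present", "NOUN"): "P R EH1 Z AH0 N T",
    ("present", "VERB"): "P R IY0 Z EH1 N T",
}


def split_sentences(text: str) -> list:
    """Naive sentence breaking: split on whitespace after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_type(sentence: str) -> str:
    """Classify a sentence by its final punctuation mark only."""
    if sentence.endswith("?"):
        return "question"
    if sentence.endswith("!"):
        return "exclamation"
    return "statement"


def pronounce_homograph(word: str, pos: str):
    """Pick a pronunciation for a homograph given its POS tag, else None."""
    return HOMOGRAPHS.get((word.lower(), pos))
```

The sentence type matters later for prosody (a question usually gets a different intonation contour than a statement), which is why it is detected during text analysis.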

Question 8

Q

Phonetic Analysis

Answer

A

conversion of graphemes to phonemes

from the written word to its pronunciation form
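
One common way to do grapheme-to-phoneme conversion is a pronunciation lexicon with a fallback for unknown words. The sketch below assumes a tiny hand-made lexicon and a very crude one-letter-at-a-time fallback; both tables are illustrative, and real systems use large dictionaries plus trained letter-to-sound rules.

```python
# Hypothetical mini-lexicon with ARPAbet-style phonemes.
LEXICON = {
    "speech": ["S", "P", "IY1", "CH"],
    "cat": ["K", "AE1", "T"],
}

# Crude fallback: one rough phoneme guess per letter (no stress marks).
LETTER_TO_PHONES = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def g2p(word: str) -> list:
    """Look the word up in the lexicon; otherwise guess letter by letter."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    phones = []
    for letter in key:
        phones.extend(LETTER_TO_PHONES.get(letter, []))
    return phones
```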

Question 9

Q

Prosodic analysis

Answer

A

prosody is the melody of speech
style, rhythm, timbre
intonation
stylisation of sound, i.e. how you tune the sound

the controllable acoustic features are limited

pitch / f0
duration
intensity

the problem with modifying intensity is that the frequency will change as well

pitch and duration are best not constructed from scratch

the Klatt formant synthesizer is a source-filter model that creates speech by manipulating pitch, noise and formant information

its speech quality is not good enough, so speech segment concatenation is used instead

Question 10

Q

Klatt Duration Model

Answer

A

Klatt studied a duration model for English phonemes

based on thousands of samples, the basic duration obtained for each phoneme is called its inherent duration

the final duration depends on

categories of the neighbouring units
position of the phoneme in the syllable
other constituents of the syllable
position of the syllable in the word
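
A sketch of how such a rule-based duration can be computed. It assumes the commonly cited form of Klatt's rule, in which each context rule scales a percentage and the result stretches the part of the inherent duration above a minimum duration: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100. The numeric values in the usage lines are illustrative placeholders, not values taken from this card.

```python
def klatt_duration(inherent_ms: float, minimum_ms: float, rule_factors) -> float:
    """Apply context rules to an inherent duration.

    rule_factors: multiplicative factors, one per applicable context rule
    (for example 1.4 for lengthening or 0.85 for shortening).
    """
    percent = 100.0
    for factor in rule_factors:
        percent *= factor
    # Only the portion above the minimum duration is stretched or compressed.
    return minimum_ms + (inherent_ms - minimum_ms) * percent / 100.0


# Hypothetical vowel: inherent 230 ms, minimum 80 ms.
no_rules = klatt_duration(230, 80, [])
lengthened_then_shortened = klatt_duration(230, 80, [1.4, 0.85])
print(no_rules, lengthened_then_shortened)
```

With no rules applied the function returns the inherent duration unchanged, which matches the idea that the inherent duration is the starting point and context only adjusts it.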

Question 11

Q

Prosodic analysis : MBROLA

Answer

A

a speech synthesiser engine based on the concatenation of diphones
concatenation of diphone waveforms using TD-PSOLA

can manipulate duration and pitch
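
Duration and pitch are passed to MBROLA through its phoneme input file (`.pho`), where each line gives a phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz). The helper below is a hedged sketch that only formats such lines; the function names are hypothetical, and the phoneme symbols depend on the voice database in use.

```python
def pho_line(phoneme: str, duration_ms: int, pitch_points=()) -> str:
    """Format one .pho line: symbol, duration, then (percent, Hz) pairs."""
    fields = [phoneme, str(duration_ms)]
    for percent, hertz in pitch_points:
        fields.append(str(percent))
        fields.append(str(hertz))
    return " ".join(fields)


def stretch(duration_ms: int, factor: float) -> int:
    """Change the duration by a factor, rounded to whole milliseconds."""
    return round(duration_ms * factor)


# "_" is the silence symbol; the vowel rises from 110 Hz to 130 Hz.
lines = [
    pho_line("_", 100),
    pho_line("a", stretch(120, 1.5), [(0, 110), (100, 130)]),
    pho_line("_", 100),
]
print("\n".join(lines))
```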

Question 12

Q

Prosodic analysis : MOMEL

Answer

A

MOMEL - modelling of melody (from the French "MOdélisation de MELodie")
INTSINT - INternational Transcription System for INTonation

objective

how to model the melody of speech
how to represent intonation independently of the language being analyzed

MOMEL follows the existing pitch contour using a quadratic spline function

voiceless parts are interpolated as well, so there are no discontinuities
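
A sketch of quadratic spline interpolation between pitch target points, to show why the resulting curve has no gaps over voiceless stretches. It assumes one common construction: each pair of neighbouring targets is joined by two parabolas that meet at the midpoint, with zero slope at the targets. The target detection step of MOMEL is not shown, and the function name is hypothetical.

```python
def quadratic_spline(targets, t: float) -> float:
    """Evaluate a smooth pitch curve through (time, f0) targets at time t.

    targets: list of (time_s, f0_hz) pairs sorted by strictly increasing time.
    Outside the target range the nearest target value is held.
    """
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
        if t1 <= t <= t2:
            x = (t - t1) / (t2 - t1)          # 0 at first target, 1 at second
            if x <= 0.5:
                return h1 + 2.0 * (h2 - h1) * x * x
            return h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2
    raise ValueError("targets must be sorted by time")


# Three hypothetical targets; the curve is defined at every time in between,
# including times where the original signal was voiceless.
targets = [(0.0, 100.0), (1.0, 200.0), (2.0, 150.0)]
curve = [quadratic_spline(targets, k / 10) for k in range(21)]
```

Because the curve is evaluated from the targets alone, every time point gets a pitch value, whether or not the original frame was voiced.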

(12 cards)