Topic 11: Speech Synthesis Flashcards

1
Q

Recap: speech synthesis block diagram

A

text analysis -> phonetic analysis -> prosodic analysis -> speech synthesis

2
Q

Text analysis

A

document structure detection: this determines how TTS is implemented

text normalization

linguistic analysis

3
Q

Document structure detection

A

flat file or file content

4
Q

Text normalization: transliteration

A

convert the text into a standardized format

example: transliteration
how to map Hangul (hangeul) to a standard form

  • Hangul consonants
  • Hangul vowels

transliteration is not translation: it maps the written characters (and so, roughly, the pronunciation) into a standard form and does not carry over the meaning
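
a minimal sketch of the Hangul mapping, assuming a simplified letter-by-letter romanization: each precomposed Hangul syllable is split into initial consonant, vowel and final consonant by Unicode arithmetic, then each part is looked up in a table. The tables and the function name `romanize` are illustrative assumptions; sound changes across syllable boundaries are ignored.

```python
# Simplified romanization tables (assumption: no sound-change rules applied).
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
         "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l",
         "l", "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t",
         "p", "t"]

HANGUL_BASE = 0xAC00      # first precomposed syllable
SYLLABLE_COUNT = 11172    # 19 leads * 21 vowels * 28 tails


def romanize(text):
    """Map each Hangul syllable to Latin letters; pass other characters through."""
    out = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        if 0 <= code < SYLLABLE_COUNT:
            lead, rest = divmod(code, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)
    return "".join(out)


print(romanize("한글"))
```

the output is a spelling of the sounds, not an English word for the meaning, which is the difference from translation.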

5
Q

Text normalization: dealing with different formats

A

symbols
number format
combination
abbreviations and acronyms

normalizing numbers

  • phone number
  • dates
  • times
  • money and currency
  • account number
  • ordinal number
  • cardinal number
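
a minimal sketch of number normalization for English, assuming only values from 0 to 999: cardinals and ordinals are expanded into words, and phone or account numbers are read digit by digit. The function names (`cardinal`, `ordinal`, `read_digits`) are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n):
    """Spell out a cardinal number in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + cardinal(rest) if rest else "")


def ordinal(n):
    """Spell out an ordinal by changing the last word of the cardinal."""
    words = cardinal(n)
    match = re.search(r"[a-z]+$", words)
    prefix, last = words[:match.start()], match.group()
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return prefix + last


def read_digits(text):
    """Read phone or account numbers one digit at a time."""
    return " ".join(ONES[int(d)] for d in text if d.isdigit())


print(cardinal(42), "|", ordinal(21), "|", read_digits("555-0199"))
```

the same digits need different readings depending on the category, which is why the category has to be detected first.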
6
Q

Text normalization

A

different problems need different approaches

can use regular expressions (RE)

  • test pattern
  • replace
  • search substring

example

  • extract substring
  • replace
  • test
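
a minimal sketch of the three RE operations using Python's `re` module. The patterns and the abbreviation table are illustrative assumptions, not a complete normalizer.

```python
import re

# Hypothetical patterns for illustration only.
TIME_RE = re.compile(r"([01]?\d|2[0-3]):([0-5]\d)")
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "etc.": "et cetera"}
ABBREVIATION_RE = re.compile("|".join(re.escape(a) for a in ABBREVIATIONS))


def is_time(token):
    """Test: does the whole token look like a 24-hour time?"""
    return TIME_RE.fullmatch(token) is not None


def find_date(text):
    """Search/extract: return the first date-like substring, or None."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None


def expand_abbreviations(text):
    """Replace: swap each known abbreviation for its spoken form."""
    return ABBREVIATION_RE.sub(lambda m: ABBREVIATIONS[m.group(0)], text)


print(is_time("09:30"))
print(find_date("Due on 12/05/2024 at noon"))
print(expand_abbreviations("Dr. Lee will call Mr. Tan."))
```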
7
Q

Linguistic Analysis

A

processing text based on the linguistic features of the language

supports phonetic and prosodic generation

modular functions required for TTS:

  • sentence breaking / tokenizer
  • part-of-speech (POS) tagging (e.g. "record" as a noun vs. as a verb)
  • homograph disambiguation (e.g. "read" in present vs. past tense, "lead" the metal vs. "lead" the verb)
  • noun phrase & clause detection
  • sentence type disambiguation
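
a minimal sketch of three of these functions, assuming plain English text: a naive sentence breaker, a sentence type check based on final punctuation, and a homograph lookup keyed on a POS tag. The lookup table, its informal respellings and the tag names are hypothetical; abbreviations such as "Dr." would break this splitter.

```python
import re

# Hypothetical homograph table: (word, POS tag) -> informal respelling.
HOMOGRAPHS = {
    ("record", "NOUN"): "REH-kord",
    ("record", "VERB"): "rih-KORD",
}


def split_sentences(text):
    """Break text after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_type(sentence):
    """Guess the sentence type from its final punctuation mark."""
    if sentence.endswith("?"):
        return "question"
    if sentence.endswith("!"):
        return "exclamation"
    return "statement"


def pronounce(word, pos):
    """Pick a pronunciation for a homograph given its POS tag."""
    return HOMOGRAPHS.get((word.lower(), pos), word)


for s in split_sentences("It works. Does it? Yes!"):
    print(sentence_type(s), "->", s)
print(pronounce("record", "VERB"))
```

the POS tag is what lets the later phonetic stage choose between the two pronunciations.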
8
Q

Phonetic Analysis

A

grapheme-to-phoneme conversion

written word to its pronunciation form
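
a minimal sketch of grapheme-to-phoneme conversion, assuming a dictionary lookup with a crude one-letter-one-sound fallback. The tiny lexicon, the fallback table and the ARPAbet-style symbols are illustrative assumptions; real English needs far more rules or a trained model.

```python
# Hypothetical pronunciation lexicon (ARPAbet-style symbols).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "cat": ["K", "AE", "T"],
}

# Crude letter-to-sound fallback, one entry per letter.
LETTER_TO_SOUND = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def g2p(word):
    """Return a phoneme list: lexicon first, letter rules otherwise."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    phones = []
    for letter in key:
        phones.extend(LETTER_TO_SOUND.get(letter, []))
    return phones


print(g2p("Speech"))
print(g2p("tab"))
```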

9
Q

Prosodic analysis

A

prosody is the melody of speech
style, rhythm, timbre
intonation
stylisation of sound: how you tune the sound

the controllable acoustic features are limited:

  • pitch / f0
  • duration
  • intensity

problem with modifying intensity: the frequency changes as well

pitch and duration are best not constructed from scratch

the Klatt formant synthesizer is a source-filter model that creates speech by manipulating pitch, noise and formant information

its speech quality is poor, so speech segment concatenation is used instead

10
Q

Klatt Duration Model

A

Klatt studied a duration model for English phonemes

based on thousands of samples; the basic duration obtained for each phoneme is called its inherent duration

the final duration depends on

  • categories of the neighbouring units
  • position of the phoneme in the syllable
  • other constituents of the syllable
  • position of the syllable in the word
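
a minimal sketch of how context rules can scale an inherent duration. It assumes the commonly cited form of Klatt's rule, where only the part of the inherent duration above a minimum duration is stretched or compressed by a percentage. The minimum duration and the numeric factors below are illustrative assumptions, not values from the notes.

```python
def klatt_duration(inherent_ms, minimum_ms, rule_factors):
    """Final duration = minimum + (inherent - minimum) * percent / 100.

    rule_factors holds one multiplier per context rule that applies,
    e.g. a lengthening factor for a phrase-final syllable (hypothetical).
    """
    percent = 100.0
    for factor in rule_factors:
        percent *= factor
    return minimum_ms + (inherent_ms - minimum_ms) * percent / 100.0


# Hypothetical vowel: inherent 200 ms, minimum 100 ms.
print(klatt_duration(200, 100, []))          # no rule applies
print(klatt_duration(200, 100, [0.5]))       # one shortening rule
print(klatt_duration(200, 100, [0.5, 1.5]))  # two rules combined
```

because the minimum is never scaled, shortening rules cannot squeeze a phoneme below it.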
11
Q

Prosodic analysis: MBROLA

A

a speech synthesiser engine based on concatenation of diphones
diphone waveforms are concatenated using TD-PSOLA

can manipulate duration and pitch
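
a minimal sketch of how duration and pitch are handed to MBROLA, assuming its plain-text .pho input: one line per phoneme with the phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz). The helper names are hypothetical, and the phoneme symbols depend on the voice database being used.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one .pho line: symbol, duration, then position/pitch pairs."""
    fields = [phoneme, str(duration_ms)]
    for position_pct, pitch_hz in pitch_points:
        fields += [str(position_pct), str(pitch_hz)]
    return " ".join(fields)


def build_pho(segments):
    """Join (phoneme, duration, pitch_points) tuples into .pho text."""
    return "\n".join(pho_line(*segment) for segment in segments) + "\n"


# Hypothetical utterance; "_" is the silence symbol.
segments = [
    ("_", 100, ()),
    ("h", 60, ()),
    ("a", 120, [(0, 110), (100, 130)]),
    ("_", 100, ()),
]
print(build_pho(segments), end="")
```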

12
Q

Prosodic analysis: MOMEL

A

MOMEL: MOdélisation de MELodie (melody modelling)
INTSINT: INternational Transcription System for INTonation

objectives

  • how to model the melody of speech
  • how to represent intonation independently of the language being analyzed

MOMEL follows the existing pitch contour using a quadratic spline function

voiceless parts are interpolated as well, so there are no discontinuities
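
a minimal sketch of quadratic spline interpolation between pitch target points, assuming the targets are already known. Each pair of targets is joined by two parabolas that meet at the midpoint and are flat at the targets, so the curve stays smooth and is defined everywhere, including voiceless stretches. This covers only the interpolation step, not how MOMEL finds its targets; the function names and target values are hypothetical.

```python
def quadratic_spline(t, t1, h1, t2, h2):
    """Pitch at time t between targets (t1, h1) and (t2, h2)."""
    x = (t - t1) / (t2 - t1)          # normalized position, 0..1
    if x <= 0.5:
        return h1 + 2 * (h2 - h1) * x * x
    return h2 - 2 * (h2 - h1) * (1 - x) * (1 - x)


def f0_curve(targets, times):
    """Evaluate a continuous f0 curve at the given times.

    targets is a time-sorted list of (time_s, pitch_hz) pairs; times
    outside the targets are held at the nearest target value.
    """
    curve = []
    for t in times:
        if t <= targets[0][0]:
            curve.append(targets[0][1])
        elif t >= targets[-1][0]:
            curve.append(targets[-1][1])
        else:
            for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
                if t1 <= t <= t2:
                    curve.append(quadratic_spline(t, t1, h1, t2, h2))
                    break
    return curve


# Hypothetical targets: rise from 100 Hz to 200 Hz over one second.
print(f0_curve([(0.0, 100.0), (1.0, 200.0)], [0.0, 0.25, 0.5, 0.75, 1.0]))
```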
