Topic 10: Text Analysis as Part of a TTS System Flashcards
Text to speech synthesis block diagram
text -> text analysis -> phonetic analysis -> prosodic analysis -> speech synthesis
text analysis
includes preprocessing and conversion
document structure detection
text normalization
linguistic analysis
phonetic analysis
grapheme-to-phoneme conversion
prosodic analysis
pitch and duration attachment
speech synthesis
voice rendering
Document structure detection detail
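Why is it needed? Input text comes in various formats
A basic TTS system pays no attention to structure; its bottom line is simply to synthesize speech
The more structure is tagged, the better the speech expression
SSML (Speech Synthesis Markup Language) is one way to supply such tags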
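A minimal sketch of how detected structure could be passed on as SSML tags. The function name `to_ssml` and the sample text are hypothetical; `speak`, `p`, `s`, `break` and `emphasis` are standard SSML elements. The namespace and version attributes that a complete SSML document carries on `speak` are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

def to_ssml(title, paragraphs):
    """Wrap a title and a list of paragraphs (each a list of sentences) in SSML."""
    speak = ET.Element("speak")
    # Mark the title so the synthesizer can stress it, then pause before the body.
    heading = ET.SubElement(speak, "emphasis", level="strong")
    heading.text = title
    ET.SubElement(speak, "break", time="500ms")
    for sentences in paragraphs:
        p = ET.SubElement(speak, "p")      # paragraph boundary
        for sentence in sentences:
            s = ET.SubElement(p, "s")      # sentence boundary
            s.text = sentence
    return ET.tostring(speak, encoding="unicode")

print(to_ssml("Weather", [["Hello world.", "It is sunny."]]))
```

Untagged text would reach the synthesizer as one flat string; here the paragraph, sentence and pause information survives.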
Text normalization detail
Different NLP tasks have different normalization purposes
For TTS, normalization must continue until the text is converted to a readable (speakable) form
Also used to resolve ambiguity to a certain extent
Why normalize?
- symbols
- number format
- combination of both
- abbreviation and acronym
- emoji
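A rough sketch of normalizing symbols, numbers, and a combination of both. The rules, the helper names (`number_to_words`, `normalize`) and the 0 to 999 limit are assumptions made for illustration, not a complete normalizer.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + number_to_words(rest) if rest else "")

def _spell(match):
    # Numbers outside the supported range are left as digits.
    n = int(match.group())
    return number_to_words(n) if n <= 999 else match.group()

def normalize(text):
    # Combination of symbol and number: reorder into spoken order.
    text = re.sub(r"\$(\d+)", r"\1 dollars", text)
    text = re.sub(r"(\d+)%", r"\1 percent", text)
    # Standalone symbol.
    text = text.replace("&", "and")
    # Remaining digit runs become words.
    return re.sub(r"\d+", _spell, text)

print(normalize("Tom & Ann paid $5 for 20% more"))
# -> Tom and Ann paid five dollars for twenty percent more
```

The sketch ignores details a real system must handle, such as singular "one dollar", decimals, dates, and emoji.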
Normalisation: Abbreviation and Acronym Expansion
example steps
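One possible sequence of steps, sketched under assumptions: split the text into tokens, look each token up in an abbreviation table, and spell out all-caps acronyms letter by letter. The table `ABBREVIATIONS` and the function `expand` are hypothetical, not taken from the course material.

```python
import re

# Hypothetical lookup table; a real system would use a much larger lexicon.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "etc.": "et cetera",
    "approx.": "approximately",
}

def expand(text):
    out = []
    for token in text.split():                    # step 1: tokenize on whitespace
        if token in ABBREVIATIONS:                # step 2: known abbreviation
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[A-Z]{2,}", token):   # step 3: all-caps acronym
            out.append(" ".join(token))           # read letter by letter
        else:
            out.append(token)                     # step 4: leave everything else
    return " ".join(out)

print(expand("Dr. Lee uses TTS daily"))
# -> Doctor Lee uses T T S daily
```

A plain lookup cannot resolve ambiguous forms ("St." as Street or Saint) or acronyms that are read as words ("NASA"); those need context or an exception list.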
Normalisation: Pattern Matching in RE
When to use RE? Where can it be used? What is RE? What is a string?
• When to use RE: search and modify
• Where it can be used: string, pattern, and corpus matching
• A regular expression (RE), often called a pattern, is an expression used to specify a set of strings required for a particular purpose.
• String: For text-based search, a string is any sequence of alphanumeric characters
More example applications of RE
- test for pattern
- replace text
- extract substring
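The three applications above can be sketched with Python's `re` module. The sample sentence and the patterns are made up for illustration.

```python
import re

text = "Call 555-1234 before 10:30 on 12/05/2024."

# 1. Test for a pattern: does the text contain a clock time?
has_time = re.search(r"\b\d{1,2}:\d{2}\b", text) is not None

# 2. Replace text: turn the hyphen in the phone number into a space.
spoken = re.sub(r"(\d{3})-(\d{4})", r"\1 \2", text)

# 3. Extract substrings: pull the three fields out of the date.
date_fields = re.search(r"\b(\d{2})/(\d{2})/(\d{4})\b", text).groups()

print(has_time)     # True
print(spoken)       # Call 555 1234 before 10:30 on 12/05/2024.
print(date_fields)  # ('12', '05', '2024')
```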
Linguistic Analysis (LA) detail
Also known as syntactic and semantic parsing in NLP
Information desired for TTS from parsing analysis:
o Word part-of-speech (POS) or word type
o Word sense
o Phrasal cohesion of words: idiom, syntactic phrases, clauses, sentences
o Modification relations among words
o Anaphora (co-reference) and synonymy
o Syntactic type identification: questions, quotes, etc.
o Semantic focus identification (emphasis)
o Semantic type and speech act identification: requesting, informing, narrating, etc.
o Genre and style analysis
In principle, we don't need all of these for TTS, only those that provide TTS-specific functionality
LA supports the phonetic analysis and prosodic generation phases
What is needed for TTS:
- sentence breaking/tokenizer
- POS tagging
- homograph disambiguation
- noun phrase and clause detection
- sentence type identification
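A toy sketch of three of these tasks: sentence breaking, sentence type identification, and homograph disambiguation. All function names and rules here are assumptions for illustration; real systems use trained taggers instead of hand-written cues.

```python
import re

def split_sentences(text):
    """Naive sentence breaker: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_type(sentence):
    """Classify by final punctuation, a cue prosodic analysis can use for intonation."""
    if sentence.endswith("?"):
        return "question"
    if sentence.endswith("!"):
        return "exclamation"
    return "statement"

DETERMINERS = {"a", "the", "this", "that", "his", "her", "my", "our"}

def record_pos(tokens, i):
    """Toy homograph rule for 'record': noun after a determiner, otherwise verb.
    The POS decides the stress pattern (REcord as noun, reCORD as verb)."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    return "noun" if prev in DETERMINERS else "verb"

for sentence in split_sentences("Did you record it? She broke the record."):
    tokens = re.findall(r"[A-Za-z']+", sentence)
    i = tokens.index("record")
    print(sentence_type(sentence), record_pos(tokens, i))
# question verb
# statement noun
```

The naive breaker would also split after abbreviations such as "Dr.", which is one reason text normalization runs before linguistic analysis.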