Amazon Polly | General Flashcards by Parri Pandian

What is Amazon Polly?

General

Amazon Polly | Machine Learning

Amazon Polly is a service that turns text into lifelike speech. It lets existing applications speak as a first-class feature and opens up entirely new categories of speech-enabled products, from mobile apps and cars to devices and appliances. Amazon Polly includes dozens of lifelike voices and supports multiple languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies.

Amazon Polly is easy to use: you send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3. Amazon Polly supports Speech Synthesis Markup Language (SSML) tags such as prosody, so you can adjust the speech rate, pitch, or volume.

Amazon Polly is a secure service that delivers all of these benefits at high scale and low latency. You can cache and replay Amazon Polly’s generated speech at no additional cost. Amazon Polly lets you convert 5M characters per month for free during the first year after sign-up. Amazon Polly’s pay-as-you-go pricing, low cost per request, and lack of restrictions on storage and reuse of voice output make it a cost-effective way to enable speech synthesis everywhere.
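
The send-text, receive-audio flow described above can be sketched roughly as follows. This is a minimal illustration, not an official sample: the helper name synthesize_to_file and the default voice "Joanna" are assumptions made here, and the client argument is expected to behave like a boto3 Polly client, whose synthesize_speech call returns the audio under the "AudioStream" key.

```python
from pathlib import Path


def synthesize_to_file(client, text, path, voice_id="Joanna"):
    """Send text to Polly and store the returned audio stream as an MP3 file.

    `client` is assumed to be a boto3 Polly client (or any object with a
    compatible synthesize_speech method). The helper name and the default
    voice are illustrative choices, not part of the service.
    """
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like object; read it fully into memory.
    audio = response["AudioStream"].read()
    Path(path).write_bytes(audio)
    return len(audio)
```

With AWS credentials configured, a call might look like synthesize_to_file(boto3.client("polly"), "Hello from Polly", "hello.mp3"). Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object.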

Why should I use Amazon Polly?

You can use Amazon Polly to power your application with high-quality spoken output. This cost-effective service has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.
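
Because generated speech may be stored and reused, one common pattern is to cache audio on disk and only call the service on a miss. The sketch below is one possible approach under stated assumptions: cached_speech is a hypothetical helper, synthesize is any callable you supply that takes (text, voice_id) and returns audio bytes, and the SHA-256 file naming is a design choice made here.

```python
import hashlib
from pathlib import Path


def cached_speech(synthesize, text, voice_id, cache_dir):
    """Return audio bytes for (voice_id, text), calling synthesize only on a miss.

    `synthesize` is a caller-supplied function (hypothetical here) that takes
    (text, voice_id) and returns the audio as bytes.
    """
    # The key covers both voice and text, so each voice gets its own entry.
    key = hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.mp3"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Repeated requests for the same text and voice are then served from disk without another API call.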

What features are available?

You can control various aspects of speech, such as pronunciation, volume, pitch, and speech rate, using standardized Speech Synthesis Markup Language (SSML). You can detect when specific words or sentences in the text are being spoken to the user based on the metadata included in the audio stream. This allows the developer to synchronize graphical highlighting and animations, such as the lip movements of an avatar, with the synthesized speech. Using custom lexicons, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words, and neologisms, e.g. “P!nk”, “ROTFL”, “C’est la vie” (when spoken in a non-French voice).
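
As a rough illustration of the SSML control mentioned above, the following sketch builds a prosody-wrapped SSML string. The helper name prosody_ssml is an assumption made for this example; the speak and prosody elements and the rate, pitch, and volume attributes come from the SSML standard.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate=None, pitch=None, volume=None):
    """Wrap text in <speak><prosody ...>, emitting only the attributes given.

    Hypothetical helper for illustration. The text is XML-escaped so that
    characters such as & and < do not break the markup.
    """
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    attr_text = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    return f"<speak><prosody{attr_text}>{escape(text)}</prosody></speak>"


print(prosody_ssml("Tom & Jerry", rate="slow", volume="loud"))
# prints: <speak><prosody rate="slow" volume="loud">Tom &amp; Jerry</prosody></speak>
```

With boto3, a string like this would typically be sent by passing TextType="ssml" to synthesize_speech alongside the Text parameter.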

What are Speech Marks?

Speech Marks are designed to complement the synthesized speech that is generated from the input text. Using this metadata alongside the synthesized speech audio stream, customers can provide their application with an enhanced visual experience such as speech-synchronized animation or karaoke-style highlighting.

Amazon Polly generates Speech Marks using the following four elements:

Sentence, which indicates a sentence element in the input text to be spoken;

Word, which indicates a word element in the text;

Viseme, which describes the shape of the lips that corresponds to the sound that is spoken;

SSML, which describes an SSML element used in the text.

Speech Marks are delivered in the form of a JSON stream (specifically, a set of standalone JSON objects delimited with new lines) containing anywhere from one to all four of these elements, when using the synthesize-speech method with the speech-mark-types parameter. You can find more information in the Amazon Polly Developer Guide.
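
Since the stream is a series of standalone JSON objects separated by new lines, it can be parsed one line at a time. The sketch below shows one way to do that. The helper names are hypothetical and the sample values are invented for illustration; the field names (time, type, start, end, value) are assumed to match the shape documented in the Developer Guide.

```python
import json


def parse_speech_marks(stream_text):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


def words_with_times(marks):
    """Return (time_ms, word) pairs for the word-type marks only."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


# Invented sample data, shaped like a speech-mark stream.
sample = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":11,"value":"Hello world"}',
    '{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}',
    '{"time":412,"type":"word","start":6,"end":11,"value":"world"}',
])
print(words_with_times(parse_speech_marks(sample)))
# prints: [(6, 'Hello'), (412, 'world')]
```

The resulting pairs could drive karaoke-style highlighting by scheduling each word at its time offset. With boto3, such a stream is typically requested with OutputFormat="json" and a SpeechMarkTypes list such as ["sentence", "word"].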

What are the most common use cases for this service?

With Amazon Polly, you can bring your applications to life by adding lifelike speech capabilities. For example, in e-learning and education, you can build applications that use Amazon Polly’s Text-to-Speech (TTS) capability to help people with reading disabilities. Amazon Polly can be used to help the blind and visually impaired consume digital content (eBooks, news, etc.). Amazon Polly can be used in announcement systems in public transportation and in industrial control systems for notifications and emergency announcements. A wide range of devices, such as set-top boxes, smart watches, tablets, smartphones, and IoT devices, can use Amazon Polly to provide audio output. Amazon Polly can be used in telephony solutions to voice Interactive Voice Response systems. Applications such as quiz games, animations, avatars, and narration generation are common use cases for a cloud-based TTS solution like Amazon Polly.

How does this product work with other AWS products?

When combined with Amazon Lex, Amazon Polly lets developers create full-blown Voice User Interfaces for their applications. Within Amazon Connect, Amazon Polly speech is used to create self-service, cloud-based contact center services. In addition, developers of mobile applications and Internet-of-Things (IoT) solutions can use Amazon Polly to add spoken output to their own systems.

What are the advantages of a cloud-based Text-to-Speech solution over an on-device one?

On-device text-to-speech solutions require significant computing resources, notably CPU power, RAM, and disk space, to be available on the device. This can result in higher development cost and higher power consumption on devices such as tablets and smartphones. In contrast, text-to-speech conversion done in the cloud dramatically reduces local resource requirements. This makes it possible to support all of the available languages and voices at the highest possible quality. Moreover, speech corrections and enhancements are instantly available to all end users and do not require additional updates on every device. Cloud-based text-to-speech (TTS) is platform-independent, so it minimizes development time and effort.

How do I get started with Amazon Polly?

Simply log in to your AWS account and navigate to the Amazon Polly console (which is part of the AWS Console). You can then use the console to type in any text and listen to the generated speech or save it as an audio file.

In which regions is the service available?

Please refer to the AWS Global Infrastructure Region Table.

Which programming languages are supported?

Amazon Polly supports all the programming languages included in the AWS SDK (Java, Node.js, .NET, PHP, Python, Ruby, Go, and C++) and AWS Mobile SDK (iOS/Android). Amazon Polly also supports an HTTP API so you can implement your own access layer.

Which audio formats are supported?

With Amazon Polly, you can stream audio to your users in near real time. You can also choose from various sampling rates to balance bandwidth and audio quality for your application. Amazon Polly supports MP3, Ogg Vorbis, and raw PCM audio stream formats.
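
Raw PCM output has no container header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard wave module. It assumes the PCM data is 16-bit signed, little-endian, mono, and that the sample rate passed in matches the rate requested from the service; the helper name pcm_to_wav is an illustrative choice.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    Assumes 16-bit signed little-endian mono samples; sample_rate must
    match the rate that was requested when the audio was synthesized.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The returned bytes can be written to a .wav file or handed to an audio library that expects a WAV stream.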

What languages are supported?

Please refer to the documentation for the complete list of languages supported by Amazon Polly.
