[FAQs] AI Services Flashcards
In what modes can Amazon Translate receive text?
Either real-time or batch
Can Amazon Translate be used if you don’t know what language the text is in?
Yes - it will use Comprehend behind-the-scenes to determine the source language
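A minimal sketch of such a request, assuming boto3's Translate client and its translate_text operation. build_translate_request is a hypothetical helper, and the real call is left as a comment because it needs AWS credentials.

```python
def build_translate_request(text, target_language, source_language=None):
    """Build keyword arguments for a Translate TranslateText request.

    If the source language is unknown, "auto" asks the service to detect it.
    """
    return {
        "Text": text,
        "SourceLanguageCode": source_language or "auto",
        "TargetLanguageCode": target_language,
    }


request = build_translate_request("Bonjour le monde", "en")
# With credentials configured, the call would look like:
#   import boto3
#   boto3.client("translate").translate_text(**request)
```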
How is Amazon Translate used to perform batch translation?
Asynchronously with an API call that points to an S3 bucket folder with up to 1 million documents (up to 5 GB)
In what modes can Rekognition receive data?
- Images can be provided either as bytes in the API call or as an S3 path
- Videos can be stored or streaming
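The two image input modes could be sketched like this, assuming the Image parameter shape used by boto3's Rekognition client. build_image_param is a hypothetical helper, and the bucket and key names are placeholders.

```python
def build_image_param(image_bytes=None, bucket=None, key=None):
    """Return the Image parameter as raw bytes or as an S3 object reference."""
    if image_bytes is not None:
        return {"Bytes": image_bytes}
    if bucket and key:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError("Provide either image_bytes or both bucket and key")


inline_image = build_image_param(image_bytes=b"raw image bytes go here")
s3_image = build_image_param(bucket="example-bucket", key="photos/cat.jpg")
# With credentials configured, either value could be passed on:
#   import boto3
#   boto3.client("rekognition").detect_labels(Image=s3_image, MaxLabels=10)
```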
What tasks can Rekognition perform?
- Detect objects and scenes
- Detect and analyse faces
- Recognise celebrities
- Identify inappropriate content
- Match faces
- Custom labelling (images only)
- Text detection
What video formats does Rekognition support?
MOV and MPEG-4 encoded with the H.264 codec
What is the maximum runtime for stored videos used with Rekognition?
They can be up to 2 hours long
How does image resolution affect Rekognition results?
While it accepts images that are at least 80 pixels in both dimensions, a VGA (640x480) or higher resolution is recommended
How big should objects be for Rekognition to reliably identify them?
As a rule of thumb, at least 5% of the image size
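The resolution and object-size guidance can be sketched as a pre-flight check. This is only an illustration: check_image is a hypothetical helper, and it assumes the 5% rule is measured against the shorter image dimension.

```python
MIN_SIDE_PX = 80               # smallest accepted side length
RECOMMENDED_LONG_SIDE = 640    # VGA long side
RECOMMENDED_SHORT_SIDE = 480   # VGA short side
MIN_OBJECT_FRACTION = 0.05     # assumed to be relative to the shorter side


def check_image(width, height, smallest_object_px):
    """Return a list of warning codes; an empty list means no concerns."""
    warnings = []
    short_side, long_side = min(width, height), max(width, height)
    if short_side < MIN_SIDE_PX:
        warnings.append("too_small")
    elif long_side < RECOMMENDED_LONG_SIDE or short_side < RECOMMENDED_SHORT_SIDE:
        warnings.append("low_resolution")
    if smallest_object_px / short_side < MIN_OBJECT_FRACTION:
        warnings.append("object_too_small")
    return warnings
```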
How can Rekognition results be reviewed by a human?
Using Amazon Augmented AI
How is Rekognition used with Amazon Augmented AI?
Results below a threshold, or as part of a random sample, can be sent for human review
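The two review triggers can be illustrated with a small function. In the service these conditions are configured rather than hand-coded, so needs_human_review and its default threshold and sample rate are hypothetical values chosen for the sketch.

```python
import random


def needs_human_review(confidence, threshold=80.0, sample_rate=0.05, rng=random):
    """Send a result for review if it is low-confidence or randomly sampled."""
    if confidence < threshold:
        return True
    # Otherwise audit a random fraction of the confident results.
    return rng.random() < sample_rate
```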
Besides resolution, what can affect Rekognition results?
Heavy blur, poor lighting, etc.
What kinds of labels does Rekognition use when classifying?
- Objects e.g. person
- Scenes e.g. beach
- Concepts e.g. outdoors
It uses a hierarchical system so parent labels are provided if they exist
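A sketch of reading the label hierarchy from a response shaped like the output of DetectLabels. The sample below is hand-written for illustration, not real service output, and labels_with_parents is a hypothetical helper.

```python
# Hand-written sample in the shape of a DetectLabels response (not real output).
sample_response = {
    "Labels": [
        {"Name": "Person", "Confidence": 99.1, "Parents": []},
        {"Name": "Beach", "Confidence": 97.4,
         "Parents": [{"Name": "Outdoors"}, {"Name": "Nature"}]},
    ],
    "LabelModelVersion": "3.0",
}


def labels_with_parents(response):
    """Map each detected label name to the names of its parent labels."""
    return {
        label["Name"]: [parent["Name"] for parent in label.get("Parents", [])]
        for label in response["Labels"]
    }


hierarchy = labels_with_parents(sample_response)
```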
How is object and scene detection different for video with Rekognition?
It uses multiple frames to better understand motion etc.
How can you tell if the Rekognition model has been updated?
Every API response includes a [*]ModelVersion field named for the kind of model, e.g. LabelModelVersion
What information does Rekognition provide about a detected face?
Its pose, gender, age, emotions and facial landmarks etc.
What kinds of facial recognition does Rekognition support?
- Face comparison - are two people the same?
- True facial recognition based on a face collection
Does Rekognition work for S3 objects stored in other regions?
No
How can custom pronunciations be used with Polly?
Either SSML inline or custom lexicons
How does Polly encode which words are spoken when?
Using Speech Marks, which are delivered as a JSON stream separately from the audio
What information do Speech Marks include?
- Sentences
- Words
- Visemes - the shape of the lips corresponding to that sound
- SSML
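Speech Marks arrive as one JSON object per line, so they can be parsed line by line. The sample stream below is hand-written in that shape for illustration, and both helper functions are hypothetical.

```python
import json

# Hand-written sample: one JSON object per line, as in a speech marks stream.
sample_stream = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}',
    '{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}',
    '{"time":6,"type":"viseme","value":"p"}',
])


def parse_speech_marks(stream_text):
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


def word_timings(marks):
    """Return (time in ms, word) pairs for the word-type marks only."""
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]


marks = parse_speech_marks(sample_stream)
```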
How does Polly return the synthesised speech?
Either to an S3 bucket or as a stream
Where can Amazon Lex bots be deployed?
Alexa, Connect, Facebook Messenger, Slack and Twilio SMS etc.
At a high level, how does Amazon Fraud Detector work?
You upload historical fraud data, which is used to train a model. This model is used alongside a model based on Amazon’s own experience of fraud
How can Amazon Fraud Detector be customised?
You can add basic rules e.g. IF X and Y then A
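The "IF X and Y then A" idea can be illustrated in plain Python. This is not Fraud Detector's own rule syntax: the variable names, the scores and the first-match behaviour are all assumptions made for the sketch.

```python
def evaluate_rules(event, rules, default="approve"):
    """Return the outcome of the first rule whose condition matches the event."""
    for condition, outcome in rules:
        if condition(event):
            return outcome
    return default


# Hypothetical rules: each pairs a condition (X and Y) with an outcome (A).
rules = [
    (lambda e: e["model_score"] > 900 and e["account_age_days"] < 7, "block"),
    (lambda e: e["model_score"] > 700, "review"),
]
```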
What is Amazon Personalize used for?
Making recommendations to users and finding similar items
What formats does Textract support?
PNG, JPEG and PDF
What is Amazon Kendra?
An enterprise search service which can answer questions using ML
What is Amazon Forecast?
A service to make predictions based on time-series data
How can Amazon Forecast be customised?
By configuring the HPO (hyper-parameter optimisation) parameters
What are HPO parameters?
Hyper-parameter optimisation parameters
How does Transcribe support receiving data?
Either as files in an S3 bucket or with a bidirectional stream over HTTP/2
What are the duration limits for Transcribe?
Up to 4 hours for streaming or batch
How can domain specific language be captured by Transcribe?
By configuring a custom vocabulary using IPA or SoundsLike
What factors are likely to impact the performance of Transcribe?
Background noise, strong accents and switching between multiple languages
How can personal information be removed from transcriptions?
With Automatic Content Redaction, which is only supported for batch transcriptions in English
What does Automatic Content Redaction do in Transcribe?
Filters out personally identifiable information (PII)
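A sketch of a batch job request with redaction switched on, assuming the ContentRedaction setting of boto3's start_transcription_job. build_redaction_job is a hypothetical helper, and the job name and S3 URI are placeholders.

```python
def build_redaction_job(job_name, media_uri, language_code="en-US"):
    """Build keyword arguments for a batch transcription job with PII redaction."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        # Return only the redacted transcript.
        "ContentRedaction": {"RedactionType": "PII", "RedactionOutput": "redacted"},
    }


job = build_redaction_job("example-job", "s3://example-bucket/call.wav")
# With credentials configured, the call would look like:
#   import boto3
#   boto3.client("transcribe").start_transcription_job(**job)
```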