Modeling 3 Flashcards
Amazon Comprehend
Higher-level AI/ML services beyond SageMaker
it does NLP and Text Analytics
Amazon Comprehend input
- social media
- emails
- web pages
- documents
- transcripts
- medical records (Comprehend Medical)
What does Amazon Comprehend extract?
- entities
- key phrases
- sentiment
- language
- syntax
- topics
- document classification
Can you train Amazon Comprehend on your own data?
yes, you can train custom models (e.g. custom classification, custom entity recognition)
you can also use the out-of-the-box models
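A minimal sketch of calling the out-of-the-box entity detection from Python. It assumes a client object with boto3's `detect_entities` method (normally `boto3.client("comprehend")`); `StubComprehend` and its canned response are hypothetical stand-ins so the sketch runs offline, and the 0.9 threshold is an arbitrary choice.

```python
def confident_entities(client, text, min_score=0.9, language_code="en"):
    """Return (type, text) pairs for entities detected with at least min_score confidence."""
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    return [
        (entity["Type"], entity["Text"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]


class StubComprehend:
    """Hypothetical offline stand-in for boto3.client("comprehend"); returns a canned response."""

    def detect_entities(self, Text, LanguageCode):
        return {
            "Entities": [
                {"Type": "ORGANIZATION", "Text": "Amazon", "Score": 0.99},
                {"Type": "LOCATION", "Text": "Seattle", "Score": 0.97},
                {"Type": "DATE", "Text": "soon", "Score": 0.42},
            ]
        }


entities = confident_entities(StubComprehend(), "Amazon opened a new office in Seattle soon.")
print(entities)  # [('ORGANIZATION', 'Amazon'), ('LOCATION', 'Seattle')]
```

With real credentials, swapping the stub for `boto3.client("comprehend")` should leave the filtering logic unchanged.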
Amazon Translate
uses deep learning to translate text
Can you define custom terminology for Amazon Translate?
yes, you can
using CSV or TMX format
it’s appropriate for proper names, brand names, etc.
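A sketch of generating a custom terminology file in CSV form, assuming the layout in which the first row holds the language codes and each following row maps a source term to its target term. The term pairs are made-up examples of brand names that should stay untranslated.

```python
import csv
import io


def build_terminology_csv(terms, source_lang="en", target_lang="fr"):
    """Return CSV bytes: a header row of language codes, then one row per term pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([source_lang, target_lang])  # header: language codes
    for source_term, target_term in terms.items():
        writer.writerow([source_term, target_term])
    return buffer.getvalue().encode("utf-8")


# Hypothetical brand names that should not be translated literally
data = build_terminology_csv({"Amazon": "Amazon", "Prime Video": "Prime Video"})
print(data.decode("utf-8"))
```

The bytes could then be uploaded with the boto3 Translate client's `import_terminology` (passing `TerminologyData={"File": data, "Format": "CSV"}`) and referenced by name through `TerminologyNames` in `translate_text`; check the API reference before relying on the exact parameters.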
Amazon Transcribe
Speech to text
Does Amazon Transcribe support streaming audio?
yes, it does
over HTTP/2 or WebSocket
you specify the language
- French, English, Spanish only (when these notes were written; the list has since grown)
Amazon Transcribe input
FLAC
MP3
MP4
WAV
Does Amazon Transcribe do speaker identification?
yes, it does
specify the (maximum) number of speakers and it does the rest
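A sketch of the request for a batch job with speaker identification, assuming the `start_transcription_job` keyword arguments of the boto3 Transcribe client (`Settings` with `ShowSpeakerLabels` and `MaxSpeakerLabels`). The job name and S3 URI are hypothetical; the function only builds the arguments, so it runs without AWS access.

```python
def speaker_labels_job(job_name, media_uri, max_speakers, media_format="mp3", language_code="en-US"):
    """Build keyword arguments for start_transcription_job with speaker identification turned on."""
    if max_speakers < 2:
        raise ValueError("speaker identification expects at least two speakers")
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
    }


# Hypothetical job name and S3 location
params = speaker_labels_job(
    "support-call-001", "s3://example-bucket/calls/support-call-001.mp3", max_speakers=2
)
print(params["Settings"])
# With credentials configured: boto3.client("transcribe").start_transcription_job(**params)
```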
Does Amazon Transcribe do channel identification?
yes
e.g. two callers on separate audio channels are transcribed separately
and the results are merged based on the timing of utterances
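The merge step can be pictured as interleaving per-channel utterances by start time. This is an illustrative sketch, not Transcribe's output format: the `Utterance` tuple, the channel labels and the sample lines are hypothetical.

```python
import heapq
from typing import NamedTuple


class Utterance(NamedTuple):
    start: float  # seconds from the beginning of the recording
    channel: str
    text: str


def merge_channels(*channels):
    """Interleave per-channel utterance lists (each already sorted by start time) into one timeline."""
    return list(heapq.merge(*channels, key=lambda u: u.start))


agent = [
    Utterance(0.0, "ch_0", "Thanks for calling, how can I help?"),
    Utterance(6.8, "ch_0", "Sure, one moment."),
]
caller = [Utterance(3.1, "ch_1", "I'd like to check my order.")]

merged = merge_channels(agent, caller)
for u in merged:
    print(f"[{u.channel} @ {u.start:4.1f}s] {u.text}")
```

`heapq.merge` only works because each channel is already in time order; it never re-sorts a channel internally.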
Does Amazon Transcribe support custom vocabularies?
yes, you can
give it a list
special words, names, acronyms
you can also use vocabulary tables that describe how words sound (SoundsLike or IPA) and how they should be displayed
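A sketch of building a vocabulary table, assuming the tab-separated layout with `Phrase`, `IPA`, `SoundsLike` and `DisplayAs` columns, where multi-word phrases are joined with hyphens and a row normally fills either IPA or SoundsLike. The entries are hypothetical.

```python
def build_vocabulary_table(entries):
    """Build a tab-separated custom vocabulary table from a list of dicts."""
    columns = ["Phrase", "IPA", "SoundsLike", "DisplayAs"]
    rows = ["\t".join(columns)]
    for entry in entries:
        phrase = entry["phrase"].replace(" ", "-")  # multi-word phrases are hyphenated
        rows.append("\t".join([
            phrase,
            entry.get("ipa", ""),
            entry.get("sounds_like", ""),  # syllables separated by hyphens
            entry.get("display_as", ""),
        ]))
    return "\n".join(rows) + "\n"


table = build_vocabulary_table([
    {"phrase": "SageMaker", "sounds_like": "sage-may-ker", "display_as": "SageMaker"},
    {"phrase": "Los Angeles", "display_as": "LA"},
])
print(table)
```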
Amazon Polly
Neural text-to-speech, many voices & languages
supports:
- Lexicons
- SSML
- Speech Marks
Does Amazon Polly handle lexicons?
yes, it does
lexicons customize how words are spoken, e.g. map "W3C" to "World Wide Web Consortium"
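Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) XML format. A minimal sketch that builds an alias lexicon for the W3C example; the helper name is hypothetical, and the result would be uploaded with the Polly client's `put_lexicon(Name=..., Content=...)`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Build a PLS lexicon in which each grapheme is spoken as its alias."""
    lexemes = "".join(
        f"  <lexeme>\n    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}</lexicon>\n"
    )


lexicon = build_alias_lexicon({"W3C": "World Wide Web Consortium"})
root = ET.fromstring(lexicon.encode("utf-8"))  # sanity check: the XML is well-formed
print(lexicon)
```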
SSML
SSML (Speech Synthesis Markup Language)
an alternative to plain-text input
gives control over emphasis, pronunciation, breathing, whispering, speech rate, pitch, pauses
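A sketch of SSML markup for a pause, a slower speech rate and emphasis, using standard `<speak>`, `<break>`, `<prosody>` and `<emphasis>` tags. The helper and the sample sentence are hypothetical, and tag support varies between Polly voices and engines.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(intro, key_phrase, pause_ms=500, rate="slow"):
    """Wrap text in SSML: a pause after the intro, then the key phrase slowed down and emphasized."""
    return (
        "<speak>"
        f"{escape(intro)}"
        f'<break time="{pause_ms}ms"/>'
        f'<prosody rate="{rate}"><emphasis level="strong">{escape(key_phrase)}</emphasis></prosody>'
        "</speak>"
    )


ssml = build_ssml("Your flight is delayed by", "three hours")
root = ET.fromstring(ssml)  # sanity check: the markup is well-formed
print(ssml)
```

The string would be sent with `TextType="ssml"` in the Polly client's `synthesize_speech` call instead of plain text.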
Polly Speech Marks
encode when each sentence/word starts in the audio stream (plus its position in the input text)
useful for lip-syncing animation (e.g. with viseme marks)
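Speech marks come back as newline-delimited JSON, one object per mark, with `time` (milliseconds into the audio), `type`, `start`/`end` (offsets in the input text) and `value`. A sketch that extracts word timings for an animation timeline; the sample numbers are made up.

```python
import json


def word_timings(speech_marks):
    """Return (milliseconds into the audio, word) pairs from newline-delimited speech-mark JSON."""
    marks = (json.loads(line) for line in speech_marks.splitlines() if line.strip())
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]


# Illustrative speech-mark lines for "Hello Polly." (timings are made up)
sample = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":12,"value":"Hello Polly."}',
    '{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}',
    '{"time":373,"type":"word","start":6,"end":11,"value":"Polly"}',
])
print(word_timings(sample))  # [(6, 'Hello'), (373, 'Polly')]
```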
Amazon Rekognition
Computer Vision
Object and scene detection
- can use a collection of known faces
Image moderation
Facial analysis
Celebrity recognition
Face comparison
Text in image
Video analysis
- objects
- people
- celebrities marked on timeline
- people pathing
Amazon Rekognition input
Video
- Kinesis Video Streams (H.264 encoded, 5-30 FPS, favor resolution over frame rate)
Image
- S3
- image bytes passed as part of the request
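A sketch of the two image input options, assuming the `Image` argument shape used by boto3's Rekognition `detect_labels` (either `Bytes` or `S3Object` with `Bucket` and `Name`). The bucket, key and `StubRekognition` class are hypothetical so the example runs offline.

```python
def rekognition_image(bucket=None, key=None, image_bytes=None):
    """Build the Image argument: either inline bytes or a reference to an S3 object."""
    if image_bytes is not None:
        return {"Bytes": image_bytes}
    if bucket and key:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError("pass image_bytes, or both bucket and key")


def label_names(client, image, min_confidence=80.0):
    """Return the names of labels detected with at least min_confidence percent confidence."""
    response = client.detect_labels(Image=image, MinConfidence=min_confidence)
    return [label["Name"] for label in response["Labels"]]


class StubRekognition:
    """Hypothetical offline stand-in for boto3.client("rekognition")."""

    def detect_labels(self, Image, MinConfidence):
        labels = [{"Name": "Car", "Confidence": 98.1}, {"Name": "Road", "Confidence": 91.4}]
        return {"Labels": [l for l in labels if l["Confidence"] >= MinConfidence]}


image = rekognition_image(bucket="example-bucket", key="photos/street.jpg")
print(label_names(StubRekognition(), image))  # ['Car', 'Road']
```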
Amazon Forecast
Fully managed
highly accurate forecasting with ML
Amazon Forecast models
- ARIMA
- DeepAR
- ETS
- NPTS
- Prophet
AutoML chooses the best model
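ETS is a family of exponential smoothing models. Its simplest member, simple exponential smoothing (no trend, no seasonality), fits in a few lines. This is a sketch of the idea, not how Amazon Forecast implements ETS; the series and `alpha` are made up.

```python
def simple_exponential_smoothing(series, alpha, horizon=1):
    """Forecast `horizon` steps ahead with simple exponential smoothing (a flat forecast)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        # new level = alpha * latest observation + (1 - alpha) * previous level
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


print(simple_exponential_smoothing([10, 12, 11, 13], alpha=0.5, horizon=3))  # [12.0, 12.0, 12.0]
```

A larger `alpha` reacts faster to recent values; a smaller one smooths more.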
Amazon Forecast input
works with any time series
- price
- promotion
- economic performance
- etc
can combine with associated data to find relationships
Amazon Forecast use cases
Inventory planning
Financial planning
Resource planning
an Amazon Forecast workflow is built around
dataset groups
predictors
forecasts
Amazon Lex
Natural language chatbot engine
bot is built around intents
Lambda functions are invoked to fulfill the intent
slots specify extra information needed by the intent
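A minimal sketch of the Lambda side, assuming the Lex V2 event and response format (`sessionState.intent` with its `slots`, a `Close` dialog action, a list of `messages`). The `OrderFlowers` intent and `FlowerType` slot are hypothetical names.

```python
def lambda_handler(event, context):
    """Fulfill an intent: read its slots and close the conversation with a confirmation."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}

    def slot_value(name):
        # Unfilled slots arrive as None; filled ones carry an interpretedValue
        slot = slots.get(name)
        return slot["value"]["interpretedValue"] if slot else None

    flower = slot_value("FlowerType") or "flowers"
    intent["state"] = "Fulfilled"
    return {
        "sessionState": {"dialogAction": {"type": "Close"}, "intent": intent},
        "messages": [
            {"contentType": "PlainText", "content": f"Your order of {flower} has been placed."}
        ],
    }


sample_event = {
    "sessionState": {
        "intent": {
            "name": "OrderFlowers",
            "slots": {"FlowerType": {"value": {"interpretedValue": "roses"}}, "PickupDate": None},
        }
    }
}
response = lambda_handler(sample_event, None)
print(response["messages"][0]["content"])  # Your order of roses has been placed.
```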
A use case for Lex?
building an Amazon Alexa-style voice assistant
- use Transcribe to convert voice to text
- use Lex to extract the intent
- use Polly to speak the response back to the user
Where can you deploy Lex bots?
AWS mobile SDK
Facebook Messenger
Slack
Twilio
Amazon Personalize
collaborative filtering engine
recommender system
feed in data about a user's interactions; in return it tells you what other items this user might be interested in
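To make "collaborative filtering" concrete, a toy user-based version: score items a user has not seen by the ratings of similar users (cosine similarity). Personalize's own recipes are more sophisticated; the rating data is hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors stored as {item: rating} dicts."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, k=2):
    """User-based collaborative filtering: score unseen items by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, their_ratings)
        for item, rating in their_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    # highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


ratings = {  # hypothetical interaction data
    "alice": {"book_a": 5, "book_b": 3},
    "bob": {"book_a": 4, "book_b": 3, "book_c": 5},
    "carol": {"book_b": 5, "book_d": 4},
}
print(recommend(ratings, "alice"))  # ['book_c', 'book_d']
```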
Amazon Textract
Optical Character Recognition (OCR)
also extracts tables, forms and fields
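A sketch of requesting table and form analysis, assuming the boto3 Textract client's `analyze_document` call and its response of typed `Blocks` (`LINE`, `TABLE`, `CELL`, `KEY_VALUE_SET` and so on). `StubTextract`, its blocks and the S3 location are made up so the sketch runs offline.

```python
from collections import Counter


def summarize_document(client, document):
    """Run Textract with table and form analysis; return text lines and a count of block types."""
    response = client.analyze_document(Document=document, FeatureTypes=["TABLES", "FORMS"])
    blocks = response["Blocks"]
    lines = [block["Text"] for block in blocks if block["BlockType"] == "LINE"]
    return lines, Counter(block["BlockType"] for block in blocks)


class StubTextract:
    """Hypothetical offline stand-in for boto3.client("textract") with a tiny canned response."""

    def analyze_document(self, Document, FeatureTypes):
        return {"Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Invoice #123"},
            {"BlockType": "LINE", "Text": "Total: 42.00"},
            {"BlockType": "TABLE"},
            {"BlockType": "CELL"},
            {"BlockType": "KEY_VALUE_SET"},
        ]}


document = {"S3Object": {"Bucket": "example-bucket", "Name": "invoice.png"}}
lines, kinds = summarize_document(StubTextract(), document)
print(lines, kinds["TABLE"])
```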
Amazon DeepRacer
Reinforcement-learning-powered 1/18-scale race car
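In DeepRacer you write the reward as a Python `reward_function(params)`. A sketch in the spirit of the common "stay near the center line" example, assuming the `all_wheels_on_track`, `track_width` and `distance_from_center` keys of `params`; the band thresholds and reward values are arbitrary choices.

```python
def reward_function(params):
    """Reward staying near the center line; almost nothing once the car leaves the track."""
    if not params["all_wheels_on_track"]:
        return 1e-3
    track_width = params["track_width"]
    distance = params["distance_from_center"]
    # Three bands around the center line (thresholds are arbitrary choices)
    if distance <= 0.1 * track_width:
        return 1.0
    if distance <= 0.25 * track_width:
        return 0.5
    if distance <= 0.5 * track_width:
        return 0.1
    return 1e-3


print(reward_function({"all_wheels_on_track": True, "track_width": 1.0, "distance_from_center": 0.05}))  # 1.0
```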
DeepLens
Deep learning-enabled video camera
integrated with Rekognition, SageMaker, Polly, TensorFlow, MXNet and Caffe
DeepLens output
Kinesis Video Streams
Reinforcement learning
learning about an environment and how to navigate in an optimal manner as you encounter different states within that environment
There are:
- Environment: layout, e.g. board/maze
- Choices (actions)
- Conditions (states)
- Rewards: values associated with taking an action from a state
- Observation: i.e., surroundings in a maze, state of chess board
the agent keeps track of the reward or penalty associated with each action given a condition (state)
and uses those values to inform its future choices
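Keeping a table of values per (state, action) and updating it from rewards is what tabular Q-learning does; the notes do not name a specific algorithm, so this is one standard instance. The environment is a hypothetical five-cell corridor with a reward only at the right end.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D corridor: start in state 0, reward 1 for reaching the last state."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left, move right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                best = max(q[(state, a)] for a in actions)
                # exploit; ties broken randomly so early episodes are not stuck at the left wall
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            next_state = min(max(state + action, 0), goal)  # walls at both ends
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[(next_state, a)] for a in actions)
            # move the estimate toward reward + discounted value of the best next action
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: move right in every non-goal state
```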
What’s MDP?
Markov Decision Process
mathematical framework for modeling decision making where outcomes are partly random and partly under the control of a decision maker
an MDP is a discrete-time stochastic control process
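For reference, the standard Bellman optimality equation ties the pieces of an MDP together: states s, actions a, transition probabilities P(s' | s, a), rewards R and a discount factor γ (the notation is chosen here, not taken from these notes).

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr], \qquad 0 \le \gamma < 1
```

The optimal value of a state is the best action's expected immediate reward plus the discounted value of wherever that action leads.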
Does SageMaker offer reinforcement learning?
yes, it does
it uses deep learning frameworks (TensorFlow and MXNet)
supports the Intel Coach and Ray RLlib toolkits
custom, open-source, or commercial environments are supported, e.g.:
- MATLAB & Simulink
- EnergyPlus
- RoboSchool
- PyBullet
- Amazon Sumerian
- AWS RoboMaker
Is it possible to distribute training with SageMaker RL?
yes
it can distribute training and/or environment rollouts
multi-core and multi-instance
Reinforcement Learning Hyperparameters
parameters of your choosing can be abstracted (exposed as hyperparameters)
hyperparameter tuning in SageMaker can then optimize them
Reinforcement Learning instance type
no specific guidance given by AWS
it’s deep learning, though, so GPUs may help
supports multiple instances and cores
SageMaker Automatic Model Tuning
define the hyperparameters we care about, the ranges we think are worth trying, and the metric we are optimizing for
e.g.
- learning rate
- batch size
- depth
- etc
How does SageMaker save time and money with Automatic Model Tuning?
it spins up a “hyperparameter tuning job” that trains as many combinations as we allow
- potentially spinning up a lot of training instances
It also learns as it goes so it doesn’t have to try every possible combination
SageMaker Auto Tuning best practices?
don’t optimize too many hyperparameters at once
limit ranges to as small a range as possible
use logarithmic scales when appropriate
don’t run too many training jobs concurrently
- it limits how well the process can learn as it goes
make sure the training jobs running on multiple instances report the correct objective metric in the end
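To see why log scales matter, a small sketch compares linear and log-uniform sampling of a hypothetical learning-rate range from 1e-4 to 1e-1. This is plain random sampling, not SageMaker's tuning strategy.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw uniformly in log space, so each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(values, threshold):
    """Fraction of values strictly below threshold."""
    return sum(v < threshold for v in values) / len(values)


rng = random.Random(42)
low, high = 1e-4, 1e-1  # hypothetical learning-rate range
linear = [rng.uniform(low, high) for _ in range(1000)]
logarithmic = [sample_log_uniform(low, high, rng) for _ in range(1000)]

# Linear sampling puts only about 1% of draws below 1e-3; log sampling puts about a third there
print(share_below(linear, 1e-3), share_below(logarithmic, 1e-3))
```

In SageMaker you would not write the sampler yourself: a continuous range can be declared with logarithmic scaling (the `ScalingType` field of a parameter range, or `scaling_type="Logarithmic"` on `ContinuousParameter` in the Python SDK).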
SageMaker and Apache Spark?
yes
there is a SageMaker Spark library that you can use in a Spark driver script
use a SageMaker estimator instead of the Spark MLlib implementation
e.g. XGBoost, PCA, K-Means
SageMaker Spark integration
connect a notebook to a remote EMR cluster running Spark
or use Zeppelin
the training DataFrame should have:
- a features column that is a vector of doubles
- an optional labels column of doubles
call fit on a SageMakerEstimator to get a SageMakerModel
call transform on the SageMakerModel to make inferences
works with Spark Pipelines
Why use the SageMaker-Spark integration?
combine big-data pre-processing in Spark with training and inference in SageMaker
Where does the training code used by SageMaker come from?
Whether it’s your own code, a built-in SageMaker algorithm, or an algorithm you’ve purchased in the AWS Marketplace, all training code deployed to SageMaker training instances comes from ECR (as Docker images)
Which SageMaker algorithm would be best suited for identifying topics in text documents in an unsupervised setting?
Latent Dirichlet Allocation is a topic modeling technique. Neural Topic Model would also be a correct answer.