[4] SageMaker Flashcards
How can you iterate rapidly on models built in SageMaker Notebooks?
Use SageMaker Local Mode to train models on the notebook instance itself, avoiding the overhead of provisioning infrastructure and moving data
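A minimal sketch of how Local Mode is usually switched on with the SageMaker Python SDK: the estimator is given instance_type "local" instead of an ml.* instance type. The helper name, the default remote instance type and the path in the commented usage are illustrative assumptions.

```python
def training_instance_type(local_mode: bool, remote_type: str = "ml.m5.xlarge") -> str:
    """Return the instance_type value to pass to a SageMaker estimator."""
    # "local" tells the SDK to run the training container on the notebook itself.
    return "local" if local_mode else remote_type


# Keyword arguments that would be passed to sagemaker.estimator.Estimator(...)
estimator_kwargs = {
    "instance_count": 1,
    "instance_type": training_instance_type(local_mode=True),
}
print(estimator_kwargs)

# With the SDK and Docker available on the notebook, usage would look roughly like:
#   from sagemaker.estimator import Estimator
#   estimator = Estimator(image_uri=..., role=..., **estimator_kwargs)
#   estimator.fit({"train": "file:///home/ec2-user/SageMaker/data/train"})  # hypothetical path
```

Flipping local_mode to False sends the same job to managed training instances once the code works.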
What is Amazon SageMaker Ground Truth?
A service that uses human labellers (an in-house team, specialist vendors or Mechanical Turk) to label data, and trains a labelling model on those labels
It uses this model to automatically label ‘easy’ cases, reducing labelling cost
What are .lst files?
Tab-separated files used to list data, such as images and their labels (for image classification: index, class label, relative image path)
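A small sketch of writing and reading a .lst file, assuming the three-column, tab-separated layout used by the built-in Image Classification algorithm (image index, class label, relative path). The function names and image paths are made up for illustration.

```python
import csv
import io


def write_lst(rows):
    """Serialise (index, label, relative_path) rows as tab-separated .lst text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for index, label, path in rows:
        writer.writerow([index, label, path])
    return buffer.getvalue()


def read_lst(text):
    """Parse .lst text back into (index, label, relative_path) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(int(index), int(label), path) for index, label, path in reader]


# Hypothetical two-image dataset: label 1 = dog, label 0 = cat.
rows = [(0, 1, "dogs/rex.jpg"), (1, 0, "cats/tom.jpg")]
lst_text = write_lst(rows)
print(lst_text)
```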
Where can SageMaker algorithms be sourced?
They can be custom-built, bought from the AWS Marketplace, or built-in (provided by SageMaker)
What are the main built-in SageMaker Algorithms?
- BlazingText - Word2vec embeddings and text classification, for NLP tasks such as sentiment analysis
- Image Classification Algorithm - general-purpose CNN
- K-Means - clustering, optimised for ‘web scale’
- Latent Dirichlet Allocation (LDA) - performs text analysis and topic discovery
- XGBoost - gradient boosted trees algorithm; used on tabular datasets
Where do the assets for custom SageMaker Algorithms exist?
The code is packaged as a container image hosted on ECR, and the model artifacts are stored on S3
Can you view the code for SageMaker Algorithms from the Marketplace?
No
What services does SageMaker support as data sources?
S3, EFS and FSx for Lustre
How are the different parts of a dataset (e.g. train vs validation) managed?
With channels
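As a sketch, channels appear as named entries in the InputDataConfig list of a CreateTrainingJob request, each pointing at its own data location. The helper name and the bucket paths below are hypothetical; the dictionary keys follow the CreateTrainingJob request shape.

```python
def s3_channel(name, s3_uri, content_type=None):
    """Build one InputDataConfig entry that reads a channel from an S3 prefix."""
    channel = {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel


# Hypothetical bucket layout with one prefix per channel.
input_data_config = [
    s3_channel("train", "s3://example-bucket/data/train/", "text/csv"),
    s3_channel("validation", "s3://example-bucket/data/validation/", "text/csv"),
]
print([channel["ChannelName"] for channel in input_data_config])
```

Inside the training container each channel is exposed under its own directory, so the algorithm can tell training data from validation data by channel name.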
How should failed training jobs be debugged?
- look at the CloudWatch Logs
- use the DescribeTrainingJob API and check the FailureReason
However, don’t use the SageMaker Console
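A minimal sketch of the DescribeTrainingJob check. The response here is a hand-written stand-in so the example runs without AWS access; the job name and failure message are invented, while TrainingJobStatus and FailureReason are the fields to inspect in a real response.

```python
def summarise_training_job(description):
    """Summarise a DescribeTrainingJob response, surfacing FailureReason if present."""
    name = description["TrainingJobName"]
    status = description["TrainingJobStatus"]
    if status == "Failed":
        reason = description.get("FailureReason", "no reason reported")
        return f"{name} failed: {reason}"
    return f"{name} is {status}"


# Stand-in for:
#   boto3.client("sagemaker").describe_training_job(TrainingJobName="example-job")
sample_response = {
    "TrainingJobName": "example-job",  # hypothetical job name
    "TrainingJobStatus": "Failed",
    "FailureReason": "AlgorithmError: example failure message",  # invented message
}
print(summarise_training_job(sample_response))

# The container's own output is in CloudWatch Logs under the
# /aws/sagemaker/TrainingJobs log group.
```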
What are the general types of hyperparameters?
- Model hyperparameters - how the model is structured, e.g. filter size
- Optimiser hyperparameters - how the model is trained, e.g. step size
- Data hyperparameters - modify the data itself, e.g. data augmentation
What technique does SageMaker Automatic Model Tuning use?
Bayesian optimisation
What are the general steps for performing hyperparameter tuning?
- (decide on the model to use)
- Set the ranges of the hyperparameters, e.g. max depth from 3 to 9
- Choose the metric to optimise, e.g. maximise AUC
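The steps above can be sketched as the HyperParameterTuningJobConfig part of a CreateHyperParameterTuningJob request. This assumes the built-in XGBoost algorithm (hence the max_depth parameter and the validation:auc metric); the helper name and the job limits are illustrative choices.

```python
def tuning_job_config(max_jobs=20, max_parallel=2):
    """Build a tuning config: search max_depth in [3, 9] and maximise validation AUC."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Range bounds are passed as strings in this API.
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "9"},
            ],
        },
    }


config = tuning_job_config()
print(config["ParameterRanges"])
```

Keeping MaxParallelTrainingJobs low gives a Bayesian search more completed results to learn from before it picks the next candidates.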
Which SageMaker tool is used for hyperparameter optimisation?
SageMaker Automatic Model Tuning
What are the steps to hosting a model with SageMaker?
- Create a model in SageMaker - specify the S3 path of the model artifacts and the ECR path of the container image
- Create an endpoint configuration based on the model from the first step, plus the instance type, number of instances etc.
- Create an HTTPS endpoint using the configuration from the second step
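The three hosting steps can be sketched as the request payloads for the CreateModel, CreateEndpointConfig and CreateEndpoint APIs. The helper name, resource names, account ID, image URI, bucket and role ARN are all placeholders; the payloads are only built here, not sent.

```python
def hosting_requests(name, image_uri, model_data_url, role_arn,
                     instance_type="ml.m5.large", instance_count=1):
    """Return the three request payloads needed to host a model, in order."""
    config_name = f"{name}-config"
    create_model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,  # links the config to the model from step 1
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    create_endpoint = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": config_name,  # links the endpoint to the config from step 2
    }
    return create_model, create_endpoint_config, create_endpoint


model_req, config_req, endpoint_req = hosting_requests(
    name="example-model",
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-image:latest",
    model_data_url="s3://example-bucket/output/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(endpoint_req)

# With boto3, each payload would be passed to the matching client call, e.g.
#   client = boto3.client("sagemaker")
#   client.create_model(**model_req)
#   client.create_endpoint_config(**config_req)
#   client.create_endpoint(**endpoint_req)
```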