Implementation and Operation Flashcards
ML model deployment sections
- interoperation with containers
- accelerating ML systems using Elastic Inference
- push model to edge using SageMaker Neo
- Security: IAM, VPC, KMS
- Right Instance Type
- A/B test in production environment
Where are the models in SageMaker hosted?
in Docker containers registered in ECR (Elastic Container Registry)
How to distribute TensorFlow training across multiple machines?
using Horovod (a distributed training framework)
or Parameter Servers
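Sketch: with the SageMaker Python SDK, the choice between the two is usually made through the `distribution` argument of the TensorFlow estimator. The helper name `distribution_config` below is made up for illustration; the dict keys (`parameter_server`, `mpi`) follow the SDK's documented options as far as I know, so check them against your SDK version.

```python
def distribution_config(strategy, processes_per_host=1):
    """Build the `distribution` dict for a SageMaker TensorFlow estimator.

    strategy: "parameter_server" or "horovod" (Horovod runs on top of MPI).
    """
    if strategy == "parameter_server":
        return {"parameter_server": {"enabled": True}}
    if strategy == "horovod":
        return {"mpi": {"enabled": True, "processes_per_host": processes_per_host}}
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Would be passed as TensorFlow(..., distribution=distribution_config("horovod", 4))
    print(distribution_config("horovod", processes_per_host=4))
```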
How to make containers compatible with SageMaker?
there is a library for that
run pip install sagemaker-containers
How is a training container structured?
/opt/ml
- input
- model
- code
- output
input/config:
hyperparameters.json
resourceConfig.json
input/data:
<channel_name>/<input data>
code:
python or any other code that does the training should be here
output:
output goes here
output/failure:
failure descriptions go here
model:
trained model artifacts are written here (inference loads them from here)
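Sketch: a minimal training entry point that follows the layout on this card. It is an illustration under assumptions, not SageMaker's own code: the function name `run_training`, the `epochs` hyperparameter, the `model.pkl` file name, and the placeholder "model" are all hypothetical. `base_dir` is a parameter only so the script can be tried outside a container.

```python
import json
import os
import pickle
import tempfile
import traceback


def run_training(base_dir="/opt/ml"):
    """Read hyperparameters, 'train', save the model; record any failure."""
    config_dir = os.path.join(base_dir, "input", "config")
    model_dir = os.path.join(base_dir, "model")
    failure_file = os.path.join(base_dir, "output", "failure")
    try:
        with open(os.path.join(config_dir, "hyperparameters.json")) as f:
            # Hyperparameter values arrive as strings, so convert explicitly.
            hyperparameters = json.load(f)
        epochs = int(hyperparameters.get("epochs", "1"))  # hypothetical name

        # Placeholder model; real code would read input/data/<channel_name>.
        model = {"epochs_trained": epochs}

        os.makedirs(model_dir, exist_ok=True)
        with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
            pickle.dump(model, f)
        return 0
    except Exception:
        # Write the traceback where the card says failures go.
        os.makedirs(os.path.dirname(failure_file), exist_ok=True)
        with open(failure_file, "w") as f:
            f.write(traceback.format_exc())
        return 1


if __name__ == "__main__":
    # Local dry run in a temporary directory instead of /opt/ml.
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "input", "config"))
        with open(os.path.join(base, "input", "config", "hyperparameters.json"), "w") as f:
            json.dump({"epochs": "3"}, f)
        print("exit code:", run_training(base))
```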
SageMaker on the edge
SageMaker Neo
ARM, Intel, Nvidia processors
running inference on the device avoids the few hundred milliseconds of latency of a round trip to the cloud
Frameworks that Neo compiles
TensorFlow, MXNet, PyTorch, ONNX, XGBoost
Does Neo come with a compiler or a runtime?
comes with both
How to pair Neo with IoT Greengrass?
take a Neo-compiled model and deploy it to an HTTPS endpoint
- hosted on C5, M5, M4, P3, P2
- must be the same instance type the model was compiled for
in pairing with IoT Greengrass:
- train model on the cloud
- compile with Neo
- deploy to actual edge devices using IoT Greengrass
Greengrass uses Lambda functions
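Sketch: the "compile with Neo" step is typically a SageMaker compilation job. The helper below only builds the request; the field names follow the `CreateCompilationJob` API as I understand it, while the function name and every value in the example (job name, role ARN, bucket, input shape) are hypothetical placeholders.

```python
def neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            framework, input_shape_json, target_device,
                            max_runtime_s=900):
    """Build keyword arguments for boto3's sagemaker create_compilation_job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,                # trained model archive in S3
            "DataInputConfig": input_shape_json,  # input name and shape as JSON
            "Framework": framework,               # e.g. "TENSORFLOW", "PYTORCH"
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            # Cloud targets look like "ml_c5"; edge targets like "jetson_nano".
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }


if __name__ == "__main__":
    request = neo_compilation_request(
        job_name="demo-neo-job",                                # placeholder
        role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder
        model_s3_uri="s3://example-bucket/model.tar.gz",        # placeholder
        output_s3_uri="s3://example-bucket/compiled/",          # placeholder
        framework="PYTORCH",
        input_shape_json='{"input0": [1, 3, 224, 224]}',
        target_device="jetson_nano",
    )
    print(request)
```

With credentials configured, the dict would be passed as `boto3.client("sagemaker").create_compilation_job(**request)`; the compiled artifact is then what Greengrass (or an endpoint on the matching instance type) deploys.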
Security in SageMaker
uses:
- IAM
- MFA
- SSL/TLS
- CloudTrail
- Encryption
be careful with PII
How to keep data protected at rest in SageMaker?
KMS
- jobs
- notebooks
anything under /opt/ml and /tmp can be encrypted using KMS
Securing Training Data
- S3 encryption
- also KMS
Can you encrypt inter-node communications?
yes you can
it can increase training time and cost, especially with deep learning
also known as inter-container traffic encryption
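Sketch: where the encryption options from the last few cards appear in a training job request. Only the security-related fields of `CreateTrainingJob` are shown, using the field names as I know them from the API; the function name, default instance settings, and the example key and bucket are hypothetical.

```python
def encrypted_training_settings(kms_key_id, output_s3_uri,
                                instance_type="ml.p3.2xlarge",
                                instance_count=2, volume_gb=50,
                                encrypt_inter_container=True):
    """Security-related fields to merge into a create_training_job request."""
    return {
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_id,        # encrypts model artifacts in S3
        },
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_gb,
            "VolumeKmsKeyId": kms_key_id,  # encrypts the attached storage volume
        },
        # Encrypts traffic between training nodes; costs time on multi-node jobs.
        "EnableInterContainerTrafficEncryption": encrypt_inter_container,
    }


if __name__ == "__main__":
    print(encrypted_training_settings(
        kms_key_id="alias/example-key",               # placeholder
        output_s3_uri="s3://example-bucket/output/",  # placeholder
    ))
```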
VPC with SageMaker
yes, training jobs can run inside a VPC
also possible to disable internet access for notebooks
then you need to set up a VPC endpoint for S3
notebooks, training/inference containers are internet-enabled by default
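Sketch: the two settings this card refers to, written as request builders. Field names follow the `CreateNotebookInstance` and `CreateTrainingJob` APIs as far as I know; the function names and all IDs in the example are hypothetical placeholders.

```python
def private_notebook_request(name, role_arn, subnet_id, security_group_ids,
                             instance_type="ml.t3.medium"):
    """Keyword arguments for create_notebook_instance with internet disabled."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        # Default is "Enabled"; with "Disabled" the notebook can only reach
        # what the VPC allows (hence the S3 VPC endpoint).
        "DirectInternetAccess": "Disabled",
    }


def training_vpc_config(subnet_ids, security_group_ids):
    """VpcConfig field to merge into a create_training_job request."""
    return {
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    }


if __name__ == "__main__":
    print(private_notebook_request(
        name="example-notebook",                                # placeholder
        role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder
        subnet_id="subnet-0123456789abcdef0",                   # placeholder
        security_group_ids=["sg-0123456789abcdef0"],            # placeholder
    ))
    print(training_vpc_config(["subnet-0123456789abcdef0"], ["sg-0123456789abcdef0"]))
```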
SageMaker logging and monitoring
CloudWatch can log, monitor and alarm on:
- invocation and latency of endpoints
- health of instance nodes
- Ground Truth (number of active workers and how much work they are doing)
CloudTrail records actions from users, roles, and services within SageMaker
- log files delivered to S3 for auditing
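Sketch: an alarm on endpoint latency, as one example of the CloudWatch monitoring above. It assumes the `AWS/SageMaker` namespace and the `ModelLatency` metric (reported in microseconds) with `EndpointName` and `VariantName` dimensions; the function name, alarm name pattern, period, and threshold are illustrative choices.

```python
def endpoint_latency_alarm(endpoint_name, variant_name, threshold_ms,
                           period_s=60, evaluation_periods=5):
    """Keyword arguments for CloudWatch put_metric_alarm on model latency."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": period_s,
        "EvaluationPeriods": evaluation_periods,
        # Convert milliseconds to the metric's microseconds.
        "Threshold": threshold_ms * 1000.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    # Endpoint and variant names are placeholders.
    print(endpoint_latency_alarm("example-endpoint", "AllTraffic", threshold_ms=500))
```

With credentials configured, the dict would be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.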
Does SageMaker training work with spot instances?
yes, a job can be interrupted, but you can use S3 checkpointing to pick up where you left off
it can also increase training time, since you wait for spot capacity
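Sketch: the spot-related fields of a training job request. Field names follow the `CreateTrainingJob` API as I know it; the function name and the example bucket are hypothetical. The wait time covers both waiting for spot capacity and the run itself, which is why it cannot be smaller than the runtime limit.

```python
def spot_training_settings(checkpoint_s3_uri, max_runtime_s, max_wait_s,
                           local_path="/opt/ml/checkpoints"):
    """Spot-related fields to merge into a create_training_job request."""
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3_uri,  # checkpoints are synced here
            "LocalPath": local_path,     # where the training code writes them
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }


if __name__ == "__main__":
    # Bucket name is a placeholder.
    print(spot_training_settings("s3://example-bucket/checkpoints/", 3600, 7200))
```

The training code still has to save checkpoints to the local path and reload the latest one on start; the settings alone do not make a job resumable.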