[4] SageMaker Flashcards

1
Q

How can models being built using SageMaker Notebooks be rapidly iterated?

A

Use SageMaker Local Mode to train models on the notebook instance itself, avoiding the overhead of provisioning infrastructure and moving data

2
Q

What is Amazon SageMaker Ground Truth?

A

A data labelling service that uses human labellers (in-house teams, specialist vendors or Mechanical Turk) to label data; those labels are then used to train a labelling model

The service uses this model to automatically label ‘easy’ cases, reducing labelling cost

3
Q

What are .lst files?

A

Tab-separated files used to list data, such as images and their labels (one line per image: index, label, path)
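
As an illustration, here is a minimal sketch of writing and reading such a file. It assumes the tab-separated, three-column layout (image index, class label, relative path) used for image classification; the rows, paths and helper names are hypothetical.

```python
import csv
import io

# Hypothetical rows: (image index, class label, relative image path).
rows = [
    (0, 1, "cats/cat_001.jpg"),
    (1, 0, "dogs/dog_001.jpg"),
]


def write_lst(rows, handle):
    """Write one tab-separated line per image."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for index, label, path in rows:
        writer.writerow([index, label, path])


def read_lst(handle):
    """Parse .lst content back into (index, label, path) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [(int(index), int(label), path) for index, label, path in reader]


# Round-trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_lst(rows, buffer)
lst_text = buffer.getvalue()
parsed = read_lst(io.StringIO(lst_text))
```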

4
Q

Where can SageMaker algorithms be sourced?

A

They can be built-in (provided by SageMaker), bought from the AWS Marketplace, or custom

5
Q

What are the main built-in SageMaker Algorithms?

A
  • BlazingText - Word2Vec embeddings and text classification, for NLP tasks such as sentiment analysis
  • Image Classification Algorithm - general purpose CNN
  • K-Means - clustering, optimised for ‘web scale’
  • Latent Dirichlet Allocation (LDA) - text analysis and topic discovery
  • XGBoost - gradient boosted trees algorithm; used on tabular datasets

6
Q

Where do the assets for custom SageMaker Algorithms exist?

A

The code is packaged as a Docker image hosted in ECR, and the model artifacts are stored in S3

7
Q

Can you view the code for SageMaker Algorithms from the Marketplace?

A

No

8
Q

What services does SageMaker support as data sources?

A

S3, EFS and FSx for Lustre

9
Q

How are parts of the data in the dataset (i.e. train vs validation) managed?

A

With channels
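
A rough sketch of how channels might look in the `InputDataConfig` part of a `CreateTrainingJob` request, with one named channel per split. The bucket, prefixes and the `make_channel` helper are hypothetical; only the request structure is built here, nothing is sent to AWS.

```python
def make_channel(name, s3_uri, content_type="text/csv"):
    """Build one channel entry pointing at an S3 prefix."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
    }


# Hypothetical bucket and prefixes: one channel for each part of the dataset.
input_data_config = [
    make_channel("train", "s3://example-bucket/data/train/"),
    make_channel("validation", "s3://example-bucket/data/validation/"),
]

channel_names = [channel["ChannelName"] for channel in input_data_config]
```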

10
Q

How should failed training jobs be debugged?

A
  • look at the CloudWatch Logs
  • use the DescribeTrainingJob API and check the FailureReason

However, don’t use the SageMaker Console
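
A minimal sketch of the API route, assuming a boto3 SageMaker client is passed in. The `explain_training_job` helper, the job name and the stub client are hypothetical; the stub only exists so the example runs without AWS credentials.

```python
def explain_training_job(client, job_name):
    """Return a one-line summary of a training job's status."""
    response = client.describe_training_job(TrainingJobName=job_name)
    status = response["TrainingJobStatus"]
    if status == "Failed":
        reason = response.get("FailureReason", "no reason reported")
        return f"{job_name} failed: {reason}"
    return f"{job_name} status: {status}"


class StubClient:
    """Stand-in for boto3.client("sagemaker") so the sketch runs offline."""

    def describe_training_job(self, TrainingJobName):
        # Hypothetical response containing only the fields used above.
        return {
            "TrainingJobName": TrainingJobName,
            "TrainingJobStatus": "Failed",
            "FailureReason": "ClientError: hypothetical example message",
        }


summary = explain_training_job(StubClient(), "demo-job")
```

With real credentials, the same function would be called with `boto3.client("sagemaker")` in place of the stub.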

11
Q

What are the general types of hyperparameters?

A
  • Model hyperparameters - how the model is structured e.g. filter size
  • Optimiser hyperparameters - how the model is trained e.g. learning rate (step size)
  • Data hyperparameters - how the data itself is modified e.g. data augmentation
12
Q

What technique does SageMaker Automatic Model Tuning use?

A

Bayesian optimisation

13
Q

What are the general steps for performing hyperparameter tuning?

A
  • (decide on the model to use)
  • Set the ranges of the hyperparameters e.g. max depth from 3 to 9
  • Choose the metric to maximise e.g. AUC
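
The steps above can be sketched as a tuning job configuration of the kind passed to `CreateHyperParameterTuningJob`. This assumes an XGBoost model, whose `max_depth` hyperparameter and `validation:auc` metric are used here; the resource limits are arbitrary illustrative values, and nothing is sent to AWS.

```python
# Sketch of a HyperParameterTuningJobConfig; values are illustrative only.
tuning_job_config = {
    "Strategy": "Bayesian",
    # The metric to maximise (assumes XGBoost's validation AUC metric).
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",
    },
    # Arbitrary limits on how many training jobs the tuner may launch.
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
    # The range to search: max depth from 3 to 9 (bounds are strings).
    "ParameterRanges": {
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "9"},
        ],
    },
}

max_depth_range = tuning_job_config["ParameterRanges"]["IntegerParameterRanges"][0]
```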
14
Q

Which SageMaker tool is used for hyperparameter optimisation?

A

SageMaker Automatic Model Tuning

15
Q

What are the steps to hosting a model with SageMaker?

A
  • Create a model in SageMaker - specify the S3 path of the model artifacts and the ECR path of the inference image
  • Create an endpoint configuration that references the model from the first step and sets the instance type, number of instances etc.
  • Create an HTTPS endpoint using the configuration from the second step
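
The three steps above can be sketched with the boto3 SageMaker client operations `create_model`, `create_endpoint_config` and `create_endpoint`. The `deploy_model` helper, resource names, ARNs and instance type are hypothetical, and a recording stub replaces the real client so the example runs offline.

```python
def deploy_model(client, name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large", instance_count=1):
    """Run the three hosting steps in order and return the endpoint name."""
    # Step 1: the model ties together the ECR image and the S3 artifacts.
    client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # Step 2: the endpoint configuration references the model and instances.
    config_name = f"{name}-config"
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    # Step 3: the HTTPS endpoint is created from the configuration.
    endpoint_name = f"{name}-endpoint"
    client.create_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=config_name)
    return endpoint_name


class RecordingClient:
    """Stand-in for boto3.client("sagemaker") that records each call."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        def record(**kwargs):
            self.calls.append((operation, kwargs))
            return {}
        return record


recorder = RecordingClient()
endpoint = deploy_model(
    recorder,
    "demo",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",  # hypothetical
    "s3://example-bucket/model.tar.gz",                          # hypothetical
    "arn:aws:iam::123456789012:role/DemoRole",                   # hypothetical
)
call_order = [operation for operation, _ in recorder.calls]
```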
16
Q

What are some key considerations when managing SageMaker deployments?

A
  • Decouple the ETL and ML pipelines, as the former is IO intensive while the latter is compute intensive and often needs GPUs
  • Endpoints support auto-scaling and deployments across AZs (for HA)
  • A single endpoint does not combine the predictions of multiple models by itself - use a Lambda function to perform an ensemble etc.
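
One way the Lambda-based ensemble might look, as a sketch only: the function calls several endpoints through the SageMaker runtime `invoke_endpoint` operation and averages the results. It assumes each endpoint returns a single numeric score as plain text; the endpoint names, `ensemble_predict` and the stub runtime are hypothetical.

```python
import io


def ensemble_predict(runtime, endpoint_names, payload):
    """Average the scores returned by several endpoints for one payload."""
    scores = []
    for name in endpoint_names:
        response = runtime.invoke_endpoint(
            EndpointName=name,
            ContentType="text/csv",
            Body=payload,
        )
        # Assumption: the response body is a single number encoded as text.
        scores.append(float(response["Body"].read().decode("utf-8")))
    return sum(scores) / len(scores)


class StubRuntime:
    """Stand-in for boto3.client("sagemaker-runtime") with fixed scores."""

    def __init__(self, scores):
        self.scores = scores

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        text = str(self.scores[EndpointName]).encode("utf-8")
        return {"Body": io.BytesIO(text)}


runtime = StubRuntime({"model-a": 0.25, "model-b": 0.75})
average = ensemble_predict(runtime, ["model-a", "model-b"], "1.0,2.0,3.0")
```

Inside a Lambda handler, the stub would be replaced by `boto3.client("sagemaker-runtime")` and the payload taken from the incoming event.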
17
Q

What are the key considerations when securing SageMaker Notebooks?

A
  • Restrict the sagemaker:CreatePresignedNotebookInstanceUrl IAM permission
  • Restrict root access to notebooks
  • Narrow the scope of instance profiles attached to notebook instances
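
As a sketch of the first point, an IAM policy could deny `sagemaker:CreatePresignedNotebookInstanceUrl` unless the request comes from a trusted network. The CIDR range and statement ID below are hypothetical placeholders; the policy is only built and serialised here, not attached to anything.

```python
import json

# Hypothetical trusted range; replace with the organisation's own CIDR.
TRUSTED_CIDR = "203.0.113.0/24"

notebook_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyPresignedUrlOutsideTrustedRange",
            "Effect": "Deny",
            "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
            "Resource": "*",
            # Deny whenever the caller's IP is outside the trusted range.
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": [TRUSTED_CIDR]},
            },
        }
    ],
}

policy_json = json.dumps(notebook_policy, indent=2)
```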
18
Q

Can you lock down access per SageMaker Notebook using IAM?

A

No

19
Q

What are the key considerations when securing SageMaker models?

A
  • Models are hosted in a public VPC by default, but a private one can be configured
  • Data and models are stored in S3 - encrypt this and restrict access to a trusted VPC endpoint
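
Restricting S3 access to a trusted VPC endpoint might be sketched as a bucket policy that denies any request not arriving through that endpoint, using the `aws:sourceVpce` condition key. The bucket name, endpoint ID and `build_bucket_policy` helper are hypothetical, and encryption settings are left out of this sketch.

```python
import json


def build_bucket_policy(bucket, vpc_endpoint_id):
    """Deny all S3 actions on the bucket unless they use the VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {
                    "StringNotEquals": {"aws:sourceVpce": vpc_endpoint_id},
                },
            }
        ],
    }


# Hypothetical bucket name and VPC endpoint ID.
bucket_policy = build_bucket_policy("example-ml-bucket", "vpce-1a2b3c4d")
bucket_policy_json = json.dumps(bucket_policy, indent=2)
```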