Machine Learning General Flashcards
Use Amazon ____________ to accelerate File mode training jobs.
FSx for Lustre
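A minimal sketch of how a training job might mount FSx for Lustre with the SageMaker Python SDK; the file system ID, directory path, image URI, and VPC settings below are hypothetical placeholders.

```python
# Sketch: feeding training data from FSx for Lustre instead of downloading
# it from S3 in File mode. All IDs and paths are hypothetical.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import FileSystemInput

fsx_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",   # hypothetical FSx file system
    file_system_type="FSxLustre",
    directory_path="/fsx/training-data",     # hypothetical mount path
    file_system_access_mode="ro",            # read-only is enough for training
)

estimator = Estimator(
    image_uri="<training-image-uri>",        # placeholder
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    subnets=["subnet-0abc"],                 # FSx access requires VPC networking
    security_group_ids=["sg-0abc"],
)
estimator.fit(fsx_input)
```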
A set of variables that control how a model is trained.
Hyperparameters
Finds the best version of a model by running many training jobs within the hyperparameter ranges that you specify.
Automatic Model Tuning
The process of using a trained model to make predictions.
Inference
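A minimal sketch of running inference against a deployed SageMaker endpoint with boto3; the endpoint name and payload are hypothetical.

```python
# Sketch: getting a prediction from a deployed endpoint.
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",  # hypothetical endpoint name
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",            # one feature row, CSV-encoded
)
print(response["Body"].read().decode())
```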
Downloads data into the SageMaker instance volume before model training commences.
File mode
Streams data directly from Amazon S3 into the training algorithm container.
Pipe mode
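A minimal sketch showing where the File/Pipe choice is made: the same Estimator switches modes via a single argument. The image URI and S3 prefix are placeholders.

```python
# Sketch: File mode copies data to the instance volume first; Pipe mode
# streams it from S3. Only input_mode changes.
import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",    # placeholder
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",                   # switch to "File" to pre-download data
)
estimator.fit("s3://my-bucket/training-data/")  # hypothetical S3 prefix
```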
Automates the process of building, tuning, and deploying machine learning models based on a tabular dataset (CSV or Parquet).
SageMaker Autopilot
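A minimal sketch of launching an Autopilot job with the SageMaker Python SDK; the bucket, target column, candidate cap, and job name are hypothetical.

```python
# Sketch: Autopilot (AutoML) on a tabular CSV dataset.
import sagemaker
from sagemaker.automl.automl import AutoML

automl = AutoML(
    role=sagemaker.get_execution_role(),
    target_attribute_name="label",   # hypothetical column to predict
    max_candidates=10,               # cap on candidate pipelines to try
)
automl.fit("s3://my-bucket/train.csv", job_name="demo-autopilot")  # hypothetical
```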
A data labeling service that lets you use a human workforce (annotators) drawn from your own private team, Amazon Mechanical Turk, or third-party vendors.
SageMaker Ground Truth
A visual data preparation tool that allows data scientists and engineers to quickly clean, transform, and prepare data for machine learning.
SageMaker Data Wrangler
Compiles and optimizes machine learning models for deployment on edge devices so they run faster with no loss in accuracy.
SageMaker Neo
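A minimal sketch of a Neo compilation job via boto3; the job name, role ARN, S3 paths, input shape, and target device are all hypothetical placeholders.

```python
# Sketch: compiling a trained model with SageMaker Neo for an edge target.
import boto3

sm = boto3.client("sagemaker")
sm.create_compilation_job(
    CompilationJobName="demo-neo-job",                       # hypothetical
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical
    InputConfig={
        "S3Uri": "s3://my-bucket/model/model.tar.gz",        # hypothetical
        "DataInputConfig": '{"input": [1, 3, 224, 224]}',    # model's input shape
        "Framework": "PYTORCH",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled/",      # hypothetical
        "TargetDevice": "jetson_nano",                       # example edge device
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```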
Automates hyperparameter tuning based on the algorithm and hyperparameter ranges you specify, which can save data scientists and engineers a significant amount of time.
SageMaker Automatic Model Tuning
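A minimal sketch of Automatic Model Tuning with the SageMaker Python SDK; `estimator` is assumed to be an already-configured Estimator, and the metric name, ranges, and S3 prefix are hypothetical.

```python
# Sketch: a tuning job that searches the specified hyperparameter ranges.
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

tuner = HyperparameterTuner(
    estimator=estimator,                          # assumed defined earlier
    objective_metric_name="validation:accuracy",  # hypothetical metric
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-1),
        "num_layers": IntegerParameter(2, 8),
    },
    objective_type="Maximize",
    max_jobs=20,           # total training jobs the tuner may launch
    max_parallel_jobs=2,   # jobs run concurrently
)
tuner.fit("s3://my-bucket/training-data/")        # hypothetical S3 prefix
```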
Provides real-time insights into the training process of machine learning models, enabling rapid iteration. It allows you to monitor and debug training issues, optimize model performance, and improve overall accuracy by analyzing model tensors such as weights, gradients, and biases.
Amazon SageMaker Debugger
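A minimal sketch of attaching built-in Debugger rules to a training job so issues such as vanishing gradients are flagged as training runs; the image URI and instance settings are placeholders.

```python
# Sketch: built-in Debugger rules attached to an Estimator.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.debugger import Rule, rule_configs

estimator = Estimator(
    image_uri="<training-image-uri>",   # placeholder
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    rules=[
        Rule.sagemaker(rule_configs.vanishing_gradient()),  # watch gradients
        Rule.sagemaker(rule_configs.overfit()),             # watch val/train gap
    ],
)
```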
Allows data scientists and engineers to save up to 90% on the cost of training machine learning models by using spare EC2 compute capacity (Spot Instances).
Managed Spot Training
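A minimal sketch of enabling Managed Spot Training on an Estimator; note that `max_wait` must be at least `max_run`, and the optional checkpoint location (hypothetical here) lets interrupted jobs resume.

```python
# Sketch: training on Spot capacity with a checkpoint location for resumption.
import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                 # placeholder
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    use_spot_instances=True,                          # use spare (Spot) capacity
    max_run=3600,                                     # seconds of actual training
    max_wait=7200,                                    # total seconds incl. Spot waits
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # hypothetical
)
```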
Allows you to split the data and distribute the training workload across multiple instances, improving speed and performance. It supports various distributed training frameworks such as TensorFlow, PyTorch, and MXNet.
Distributed Training
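A minimal sketch of turning on SageMaker's data-parallel distributed training for a PyTorch job via the `distribution` argument; the script name, framework versions, and instance settings are hypothetical.

```python
# Sketch: data-parallel training split across two instances.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",        # hypothetical training script
    role=sagemaker.get_execution_role(),
    instance_count=2,              # workload distributed across 2 instances
    instance_type="ml.p3.16xlarge",
    framework_version="1.13",
    py_version="py39",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
```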