Developing Machine Learning Solutions Flashcards
ML Lifecycle
Business goal identification
ML problem framing
Data processing (data collection, data preprocessing, and feature engineering)
Model development (training, tuning, and evaluation)
Model deployment (inference and prediction)
Model monitoring
Model retraining
Feature engineering is
the process of creating, transforming, extracting, and selecting variables from data.
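The four activities named above can be illustrated with a minimal sketch in plain Python. The records, field names, and the "drop constant columns" selection rule are all hypothetical and chosen only for illustration; real projects typically use a dataframe library and more careful selection criteria.

```python
import math
from datetime import date

# Hypothetical raw records; the field names are illustrative only.
raw_rows = [
    {"order_date": date(2024, 1, 15), "price": 120.0, "quantity": 3, "store_id": 7},
    {"order_date": date(2024, 6, 2), "price": 15.5, "quantity": 1, "store_id": 7},
    {"order_date": date(2024, 11, 30), "price": 980.0, "quantity": 2, "store_id": 7},
]


def engineer_features(row):
    """Return a feature dict derived from one raw record."""
    return {
        # Creating: combine two raw columns into a new variable.
        "revenue": row["price"] * row["quantity"],
        # Transforming: compress a skewed value with a log transform.
        "log_price": math.log1p(row["price"]),
        # Extracting: pull one component out of a richer value.
        "order_month": row["order_date"].month,
        # Carried over unchanged so the selection step can inspect it.
        "store_id": row["store_id"],
    }


def select_features(rows):
    """Selecting: keep only features that vary across the rows."""
    keep = [name for name in rows[0] if len({r[name] for r in rows}) > 1]
    return [{name: r[name] for name in keep} for r in rows]


features = select_features([engineer_features(r) for r in raw_rows])
print(sorted(features[0]))
```

Here `store_id` is dropped because it has the same value in every row and so carries no signal for a model trained on this sample.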
Model Development
Initially, upon training, the model often does not yield the expected results. Developers therefore do additional feature engineering and tune the model’s hyperparameters before retraining.
Amazon SageMaker Data Wrangler is a
low-code no-code (LCNC) tool. It provides an end-to-end solution to import, prepare, transform, featurize, and analyze data by using a web interface. Customers can add their own Python scripts and transformations to customize workflows.
For more advanced users and data preparation at scale,
Amazon SageMaker Studio Classic comes with built-in integration of Amazon EMR and AWS Glue interactive sessions to handle large-scale interactive data preparation and machine learning workflows within your SageMaker Studio Classic notebook.
Finally, by using the SageMaker Processing API, customers can
run scripts and notebooks to process, transform, and analyze datasets.
Amazon SageMaker Feature Store helps data scientists, machine learning engineers, and general practitioners to
create, share, and manage features for ML development.
Features in the store can be retrieved and enriched before being served to ML models for inference.
Customers who want an LCNC option can use Amazon SageMaker Canvas.
With SageMaker Canvas, they can use machine learning to generate predictions without needing to write any code.
Amazon SageMaker JumpStart provides
pretrained, open source models that customers can use for a wide range of problem types.
Customers can use Amazon SageMaker Experiments to
experiment with multiple combinations of data, algorithms, and parameters, all while observing the impact of incremental changes on model accuracy.
Amazon SageMaker Automatic Model Tuning
Hyperparameter tuning is a way to find the best version of your models. SageMaker Automatic Model Tuning does that by running many jobs with different combinations of hyperparameters and measuring each of them by a metric that you choose.
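The idea of running many jobs with different hyperparameters and ranking them by a chosen metric can be sketched as a small exhaustive grid search. This is not the SageMaker API: `validation_score` is a made-up stand-in for "train a model and return its validation metric", and the search space values are assumptions chosen for illustration. A managed tuning service may use other search strategies than the exhaustive one shown here.

```python
import itertools


def validation_score(learning_rate, max_depth):
    """Hypothetical objective standing in for a real training job.

    It is constructed to peak at learning_rate=0.1 and max_depth=6.
    """
    return 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(max_depth - 6)


# Hypothetical search space: every combination becomes one "job".
search_space = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
}


def tune(space, objective):
    """Evaluate every combination and return the best one by the metric."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)  # one job, measured by the chosen metric
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = tune(search_space, validation_score)
print(best_params, best_score)
```

With three values per hyperparameter the loop evaluates nine combinations, which shows why exhaustive search becomes expensive as the space grows.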
With Amazon SageMaker Model Monitor, customers can
observe the quality of SageMaker ML models in production. They can set up continuous monitoring or on-schedule monitoring. SageMaker Model Monitor helps maintain model quality by detecting violations of user-defined thresholds for data quality, model quality, bias drift, and feature attribution drift.
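Detecting a violation of a user-defined threshold can be pictured with a small data-drift check. This is a sketch of the general idea only, not how SageMaker Model Monitor is implemented: the function name, the sample values, and the rule (shift of the live mean measured in baseline standard deviations) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def check_drift(baseline, live, max_shift_in_std):
    """Compare live feature values against a training-time baseline.

    Flags a violation when the live mean moves more than
    max_shift_in_std baseline standard deviations from the baseline mean.
    """
    base_std = pstdev(baseline)
    if base_std == 0:
        raise ValueError("baseline has no variance; shift is undefined")
    shift = abs(mean(live) - mean(baseline)) / base_std
    return {"shift_in_std": shift, "violation": shift > max_shift_in_std}


# Hypothetical values of one feature at training time and in production.
baseline_values = [10, 12, 11, 13, 9, 11, 10, 12]
stable_batch = [11, 10, 12, 11]
drifted_batch = [15, 16, 14, 15]

stable_report = check_drift(baseline_values, stable_batch, max_shift_in_std=2.0)
drifted_report = check_drift(baseline_values, drifted_batch, max_shift_in_std=2.0)
print(stable_report["violation"], drifted_report["violation"])
```

Running such a check on every batch corresponds to continuous monitoring, while running it at fixed intervals corresponds to on-schedule monitoring.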
SageMaker JumpStart provides
pretrained, open source models for a range of problem types to help you get started with machine learning. The models are ready to deploy or to fine-tune.
AutoML is available in SageMaker Canvas. It
simplifies ML development by automating the process of building and deploying machine learning models.
Built-in models available in SageMaker require more
effort and scale if the dataset is large and significant resources are needed to train and deploy the model
If there is no built-in solution that works, try to develop one that uses
pre-made images for supported machine learning and deep learning frameworks such as scikit-learn, TensorFlow, PyTorch, MXNet, or Chainer.