13. Maintaining ML Solutions Flashcards
What are the steps in ML?
Data:
Extraction (from sources)
Analysis (EDA)
Preparation (transform and feature engineering)
Model:
Training (get the best model)
Evaluation (assess the model quality)
Validation (meet predefined performance metrics)
Deployment (online & batch):
Serving (RESTful endpoint)
Monitoring (detect anomalies, drift & skew)
Hints:
Data: Elephants Are Playful
Model: Tigers Enjoy Vegetation During Sunny Mornings
What are the three levels of MLOps?
Level 0: Manual Phase
Level 1: Strategic automation phase
Level 2: CI/CD automation, transformational phase
What are the key features of Level 0?
Manual
ML (data science) and operations are separate teams
No CI/CD/CT (continuous integration, delivery, or training)
Only the trained model is deployed, not an entire ML system
What are the key features of Level 1?
Orchestrated experimentation
Continuous training (CT)
Experiment-operational symmetry
Modular components
Continuous delivery (CD)
Pipeline deployment
What are the considerations for triggering retraining?
Training costs
Training time
Delayed training
Scheduled training
What are the key features of Level 2?
Automated ML pipeline
CI/CD that builds, tests, and deploys the pipeline components automatically
What are the triggers for retraining?
Absolute threshold
Rate of degradation
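Example: the two triggers above can be sketched as a single check. This is a minimal illustration, not a Vertex AI API. The function name, the accuracy metric, and both default limits are assumptions chosen for the example.

```python
from typing import Sequence


def should_retrain(
    accuracy_history: Sequence[float],
    absolute_threshold: float = 0.90,   # assumed floor for the metric
    max_drop_per_period: float = 0.02,  # assumed tolerated drop per period
) -> bool:
    """Return True if either retraining trigger fires.

    accuracy_history holds one evaluation score per monitoring period,
    oldest first.
    """
    if not accuracy_history:
        return False

    latest = accuracy_history[-1]

    # Trigger 1: absolute threshold. The metric fell below a fixed floor.
    if latest < absolute_threshold:
        return True

    # Trigger 2: rate of degradation. The metric dropped too fast between
    # the two most recent periods, even though it is still above the floor.
    if len(accuracy_history) >= 2:
        drop = accuracy_history[-2] - latest
        if drop > max_drop_per_period:
            return True

    return False
```

In practice the limits would be tuned against training cost and training time, since a tighter limit means more frequent retraining.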
What are the problems of not having a centralised feature store?
Non-reusable: Features are created but not shared, so they cannot be reused.
Governance: Features created by different sources are not governed.
Cross-collaboration: Teams that do not share features keep working separately.
Training and serving differences: Differences may exist between training and serving data.
Productizing features: Features used in experimentation lack the automation needed to move them to production.
What is model versioning for?
Deploying an additional version of a model alongside the existing one.
What are the two key features of Feature Store?
Process large feature sets quickly
Access the features with low latency for real-time and batch predictions.
Is Vertex AI Feature Store a managed service that scales dynamically?
Yes
What data model does Feature Store use to store all the data?
Time-series
What is the hierarchy of a featurestore?
Featurestore > EntityType > Feature
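Example: the hierarchy and the time-series data model above can be pictured as nested containers whose feature values carry timestamps. This is a plain-Python sketch for study purposes only. The class and method names (write, read_latest) and the sample data are hypothetical and are not the Vertex AI SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Feature:
    """One feature; each entity has a time series of (timestamp, value)."""
    name: str
    values: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)

    def write(self, entity_id: str, timestamp: int, value: Any) -> None:
        series = self.values.setdefault(entity_id, [])
        series.append((timestamp, value))
        series.sort(key=lambda pair: pair[0])  # keep oldest first

    def read_latest(self, entity_id: str, as_of: Optional[int] = None) -> Any:
        """Return the newest value at or before as_of (newest overall if None)."""
        series = self.values.get(entity_id, [])
        eligible = [v for t, v in series if as_of is None or t <= as_of]
        return eligible[-1] if eligible else None


@dataclass
class EntityType:
    name: str
    features: Dict[str, Feature] = field(default_factory=dict)


@dataclass
class Featurestore:
    name: str
    entity_types: Dict[str, EntityType] = field(default_factory=dict)


# Featurestore > EntityType > Feature, with hypothetical sample data.
store = Featurestore("movie_predictions")
store.entity_types["users"] = EntityType("users")
age = Feature("age")
store.entity_types["users"].features["age"] = age
age.write("user_1", timestamp=100, value=30)
age.write("user_1", timestamp=200, value=31)
```

Keeping timestamps on every value is what allows a lookup "as of" a past moment, which helps keep training data consistent with what was known at prediction time.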
What are the two types of ingestion supported by Feature Store?
Batch and streaming ingestion, e.g., batch ingestion from BigQuery to Feature Store.
What are the two types of retrieval supported by Feature Store?
Batch and online.
What are the best practices for using IAM securely?
Least privilege
Actively manage service accounts and service account keys
Enable auditing
Check policy management
What service do you use to manage permissions to perform various operations?
Identity and Access Management (IAM)
What are the specific uses of IAM in Vertex AI?
Google automatically creates several service accounts for Google Cloud projects, and these may have more permissions than required. Use custom service accounts with only the permissions needed.
What is Access Transparency in Vertex AI?
You need logs to track what content is accessed and by whom. These logs may be required for legal and compliance reasons.
There are two types of access logs. Cloud Audit Logs record actions by users in your organisation, and Access Transparency logs record actions by Google personnel.
What are the common training errors?
Input data not transformed or encoded
Tensor shape mismatch
Out-of-memory errors because the instance is too small
What are the common serving errors?
Input data not transformed or encoded
Signature mismatch
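Example: both serving errors above can be caught by checking each request against the inputs the model expects before calling it. This is a minimal sketch in plain Python. EXPECTED_SIGNATURE, its two input names, and signature_errors are hypothetical and do not come from any serving framework.

```python
from typing import Any, Dict, List

# Hypothetical serving signature: input name -> expected Python type.
EXPECTED_SIGNATURE: Dict[str, type] = {
    "age": float,
    "country_id": int,  # the model expects an encoded category, not a string
}


def signature_errors(instance: Dict[str, Any]) -> List[str]:
    """List the ways one request instance deviates from EXPECTED_SIGNATURE."""
    errors: List[str] = []
    for name, expected_type in EXPECTED_SIGNATURE.items():
        if name not in instance:
            errors.append(f"missing input: {name}")
        elif not isinstance(instance[name], expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(instance[name]).__name__}"
            )
    for name in instance:
        if name not in EXPECTED_SIGNATURE:
            errors.append(f"unexpected input: {name}")
    return errors
```

A raw string such as "FR" sent for country_id is an instance of input data not being encoded, and the check reports it as a type error instead of letting the prediction fail later.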
What are the ways to prevent and reduce training and serving errors?
Compute statistics
Infer schema
Detect anomalies
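Example: the three steps above form a workflow in which statistics from the training data become a schema, and new data is then checked against that schema. The sketch below uses plain Python only and is not the API of any data validation library. All function names, the Row type, and the rule set (type and numeric range checks) are simplifying assumptions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Stats = Dict[str, Dict[str, Any]]


def compute_statistics(rows: List[Row]) -> Stats:
    """Step 1: per-feature type names, plus min and max for numeric values."""
    stats: Stats = {}
    for row in rows:
        for name, value in row.items():
            s = stats.setdefault(name, {"types": set(), "min": None, "max": None})
            s["types"].add(type(value).__name__)
            if isinstance(value, (int, float)):
                s["min"] = value if s["min"] is None else min(s["min"], value)
                s["max"] = value if s["max"] is None else max(s["max"], value)
    return stats


def infer_schema(stats: Stats) -> Stats:
    """Step 2: freeze the training statistics into expectations."""
    return {
        name: {"types": set(s["types"]), "min": s["min"], "max": s["max"]}
        for name, s in stats.items()
    }


def detect_anomalies(schema: Stats, rows: List[Row]) -> List[str]:
    """Step 3: compare new rows (for example serving data) with the schema."""
    anomalies: List[str] = []
    for i, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row:
                anomalies.append(f"row {i}: missing feature {name}")
                continue
            value = row[name]
            type_name = type(value).__name__
            if type_name not in expected["types"]:
                anomalies.append(f"row {i}: {name} has unexpected type {type_name}")
            elif (
                expected["min"] is not None
                and isinstance(value, (int, float))
                and not expected["min"] <= value <= expected["max"]
            ):
                anomalies.append(f"row {i}: {name}={value} outside training range")
    return anomalies


# Hypothetical training data used to build the schema.
train_rows = [{"age": 25, "country": "FR"}, {"age": 40, "country": "DE"}]
schema = infer_schema(compute_statistics(train_rows))
```

Running detect_anomalies(schema, serving_rows) on serving data is one way to surface training-serving skew before it causes prediction errors.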
What does Vertex AI provide to debug training for both pre-built and custom containers?
Interactive shell
What can you do with the interactive shell during training?
Run tracing and profiling tools
Analyze GPU utilization
Validate IAM permissions for the container