Scaling prototypes into ML models Flashcards
What percentage of the codebase in production ML systems is typically devoted to ML model code, and why is it relatively small?
ML model code accounts for only about 5% of the overall codebase in production ML systems. This is because:
1) Production systems require extensive components beyond model inference, including data ingestion, preprocessing, serving, monitoring, and maintenance pipelines.
2) Ensuring scalability, fault tolerance, and deployment reliability often involves complex engineering tasks unrelated to the core model.
Outline the steps in the ML workflow from data extraction to production deployment, and identify tools used for each step.
1) Data Extraction: Retrieve data from sources (e.g., CRM systems, streaming sensors).
Tools: BigQuery, Apache Beam.
2) Data Analysis: Perform EDA to identify trends, anomalies, and correlations.
Tools: Pandas, Data Studio, BigQuery ML.
3) Data Preparation: Transform raw data into structured formats and engineer features.
Tools: SQL, BigQuery ML.
4) Model Training: Train models using prepared datasets.
Tools: Vertex AI, TensorFlow, PyTorch.
5) Model Validation: Evaluate models against business metrics and test set performance.
Tools: Vertex AI Pipelines, ML.EVALUATE.
6) Deployment: Deploy the validated model to production for online or batch predictions.
Tools: Vertex AI Endpoints, AI Platform Prediction.
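A minimal sketch of the first step (data extraction) using the BigQuery Python client; the project, dataset, table, and column names are placeholders, and application-default credentials are assumed.
```python
# Assumes: pip install google-cloud-bigquery pandas db-dtypes
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

query = """
    SELECT customer_id, purchase_amount, purchase_timestamp
    FROM `my-project.sales.transactions`
    WHERE purchase_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
"""

# Run the query and load the result into a pandas DataFrame for downstream analysis.
df = client.query(query).to_dataframe()
print(df.describe())
```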
What is the role of data distribution analysis in debugging ML models?
Data distribution analysis helps identify changes in input data that may affect model performance. For example:
1) Detecting Schema Changes: Identifies when categorical features are remapped or missing.
2) Identifying Skew: Flags mismatches between training and serving distributions.
3) Preventing Silent Failures: Recognizes when valid-looking inputs no longer align with model expectations.
Tools like Vertex AI Monitoring automate the detection of such anomalies in production systems.
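A minimal sketch (plain pandas, not a specific monitoring product) of comparing training and serving data to catch schema changes and skew; the relative tolerance and column names are illustrative assumptions.
```python
import pandas as pd

def check_training_serving_skew(train_df: pd.DataFrame, serving_df: pd.DataFrame,
                                rel_tol: float = 0.25) -> list[str]:
    """Flag missing columns and large mean shifts between training and serving data."""
    issues = []
    # Schema check: columns seen at training time but absent from serving requests.
    missing = set(train_df.columns) - set(serving_df.columns)
    if missing:
        issues.append(f"missing serving columns: {sorted(missing)}")
    # Distribution check on shared numeric columns.
    shared = train_df.select_dtypes("number").columns.intersection(serving_df.columns)
    for col in shared:
        train_mean, serve_mean = train_df[col].mean(), serving_df[col].mean()
        if abs(serve_mean - train_mean) > rel_tol * (abs(train_mean) + 1e-9):
            issues.append(f"{col}: mean shifted {train_mean:.3f} -> {serve_mean:.3f}")
    return issues

train = pd.DataFrame({"income": [40, 50, 60], "region": ["a", "b", "a"]})
serving = pd.DataFrame({"income": [90, 95, 100]})  # 'region' dropped upstream, income shifted
print(check_training_serving_skew(train, serving))
```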
Explain the difference between static and dynamic training paradigms. Provide examples of suitable use cases for each.
Static Training: Models are trained once using historical data and remain fixed post-deployment.
Use Case: Predicting physical constants or static phenomena, e.g., physics simulations.
Dynamic Training: Models are retrained periodically or continuously with new data.
Use Case: Spam detection, where patterns evolve rapidly over time.
Static is simpler and cost-effective but less adaptive, whereas dynamic handles evolving data at higher operational complexity.
What are the advantages of Vertex AI’s managed Notebooks, and how do they enhance the ML workflow?
Vertex AI’s managed Notebooks offer:
1) Pre-installed Frameworks: TensorFlow, PyTorch, and scikit-learn for immediate experimentation.
2) Customizability: CPU/GPU configurations for specific workloads.
3) Security: Google Cloud authentication ensures safe data and code access.
4) Integration: Seamlessly connects with datasets, training pipelines, and models within Vertex AI.
These features accelerate prototyping and simplify deployment for ML engineers.
What is the purpose of hyperparameter tuning in Vertex AI, and how does it function?
Hyperparameter tuning searches for the optimal configuration of hyperparameters to improve model performance. In Vertex AI:
The system evaluates combinations of hyperparameters across multiple trials.
Optimization algorithms (e.g., Bayesian optimization) guide the search process.
Results are logged, enabling engineers to identify the best-performing configuration.
This helps models reach better accuracy and efficiency than manual trial-and-error tuning.
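A hedged sketch of launching a tuning job with the Vertex AI Python SDK (google-cloud-aiplatform); the project, bucket, script, container image, metric name, and parameter flags are placeholders, and exact argument names can vary across SDK versions.
```python
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket")

# The training script is assumed to report a metric named "accuracy"
# (e.g. via cloudml-hypertune) and to accept --learning_rate / --batch_size flags.
custom_job = aiplatform.CustomJob.from_local_script(
    display_name="trainer",
    script_path="task.py",  # placeholder training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12:latest",  # placeholder image
)

tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="hp-tuning",
    custom_job=custom_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128], scale="linear"),
    },
    max_trial_count=20,      # total trials to evaluate
    parallel_trial_count=4,  # trials run concurrently
)
tuning_job.run()
```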
Describe the role of a model registry in production ML systems.
A model registry:
1) Tracks Versions: Stores different versions of models, including training metadata and hyperparameters.
2) Facilitates Governance: Logs who trained and deployed models and the datasets used.
3) Supports Audits: Enables traceability for compliance and debugging.
4) Simplifies Reuse: Provides a central repository for reusing validated models across teams.
Vertex AI Model Registry supports efficient management of ML artifacts in production.
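A brief sketch of registering a trained model in the Vertex AI Model Registry via the Python SDK; the artifact URI, serving image, and labels are placeholders, and argument availability depends on SDK version.
```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload a trained model artifact into the Model Registry. Passing an existing
# model resource via parent_model would register this upload as a new version.
model = aiplatform.Model.upload(
    display_name="churn-classifier",
    artifact_uri="gs://my-bucket/models/churn/v3/",  # placeholder GCS path to saved model
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"  # placeholder image
    ),
    labels={"team": "growth", "training_dataset": "churn_2024q1"},  # governance metadata
)
print(model.resource_name)
```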
Compare static and dynamic serving architectures, including their trade-offs.
Static Serving: Precomputes predictions and stores them in a database.
Pros: Low latency, reduced compute costs.
Cons: High storage requirements, lacks adaptability.
Use Case: Predicting product recommendations for static catalogs.
Dynamic Serving: Computes predictions on demand.
Pros: Scales with dynamic data, no storage overhead.
Cons: Higher latency, compute-intensive.
Use Case: Real-time fraud detection.
What are hybrid serving architectures, and when are they appropriate?
Hybrid architectures combine static caching for frequently requested predictions with dynamic serving for the long tail. They are suitable when:
The distribution of prediction requests is peaked, with many repeated queries.
Systems require a balance between storage, latency, and compute efficiency.
Example: A voice-to-text system caches common phrases while dynamically processing unique inputs (a minimal sketch follows below).
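A minimal sketch of the hybrid pattern: an in-memory cache answers repeated requests statically while a dynamic path computes predictions for the long tail; the Model class is a stand-in for any trained model.
```python
from functools import lru_cache

class Model:
    """Stand-in for a real trained model; assume predict() is relatively expensive."""
    def predict(self, phrase: str) -> str:
        return phrase.upper()  # placeholder prediction logic

model = Model()

@lru_cache(maxsize=10_000)
def cached_predict(phrase: str) -> str:
    """Static path: frequent inputs are served from the cache after the first call."""
    return model.predict(phrase)

# Frequent phrase -> computed once, then served statically from the cache.
cached_predict("turn on the lights")
cached_predict("turn on the lights")
# Rare phrase -> falls through to dynamic computation.
cached_predict("recite the 1923 maritime treaty")
print(cached_predict.cache_info())  # hits=1, misses=2
```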
What errors does monitoring help detect and how does Vertex AI monitoring help maintain model performance in production?
Monitoring detects:
1) Model Drift: Changes in prediction accuracy over time.
2) Data Drift: Shifts in input data distribution.
3) Traffic Patterns: Abnormalities in requests or latency.
4) Resource Usage: Inefficient allocation of compute or storage resources.
Vertex AI Model Monitoring raises alerts when thresholds are breached; these alerts can in turn trigger retraining pipelines, helping maintain system reliability.
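A minimal illustration (not the Vertex AI Monitoring API) of detecting data drift with a population stability index and an alert threshold; the bin count and the 0.2 threshold are common rules of thumb, not fixed standards.
```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Compare a serving feature's distribution against its training baseline."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    exp_pct, act_pct = np.clip(exp_pct, 1e-6, None), np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
training_feature = rng.normal(50, 10, 10_000)
serving_feature = rng.normal(58, 10, 1_000)  # simulated shift in the input distribution

psi = population_stability_index(training_feature, serving_feature)
if psi > 0.2:  # commonly used alerting threshold
    print(f"Data drift detected (PSI={psi:.2f}): trigger alert / retraining")
```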
Discuss the design considerations for building an ML pipeline for traffic prediction.
For a traffic prediction system:
1) Training Architecture: Use dynamic training to adapt to changing traffic patterns and events.
2) Serving Architecture: A hybrid model—cache predictions for busy roads and compute dynamically for less-trafficked areas.
3) Data Sources: Combine sensor data with historical patterns for robust predictions.
Design must address temporal dynamics and scalability.
What is the importance of timestamp alignment in training ML models?
Timestamp alignment ensures:
1) Temporal Consistency: Training data reflects the actual state at the time of observation.
2) Prevention of Data Leakage: Avoids incorporating future information into training.
3) Reproducibility: Enables point-in-time analysis.
Misalignment can lead to flawed models and reduced real-world accuracy.
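A short pandas sketch of a point-in-time join: for each training example, only feature values observed at or before the label's timestamp are used, preventing future information from leaking into training. Column names are illustrative.
```python
import pandas as pd

labels = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2024-03-01", "2024-03-10", "2024-03-05"]),
    "label": [0, 1, 1],
})
features = pd.DataFrame({
    "entity_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(["2024-02-20", "2024-03-08", "2024-03-01", "2024-03-09"]),
    "avg_spend": [10.0, 25.0, 7.0, 40.0],
})

# merge_asof picks, per label row, the latest feature row whose timestamp is
# <= label_time, so the feature observed on 2024-03-09 can never leak into the
# 2024-03-05 label for entity 2.
training_set = pd.merge_asof(
    labels.sort_values("label_time"),
    features.sort_values("feature_time"),
    left_on="label_time",
    right_on="feature_time",
    by="entity_id",
    direction="backward",
)
print(training_set[["entity_id", "label_time", "avg_spend", "label"]])
```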
What are endpoints in Vertex AI, and what are their key features?
Endpoints are RESTful services that host trained models for online (real-time) predictions; batch predictions run as separate jobs and do not require an endpoint. Key features:
1) Multiple Models: Can deploy several models to a single endpoint for traffic splitting.
2) Deployment Flexibility: Allows testing new models alongside live systems.
3) Configuration: Managed via names, regions, and access levels.
Endpoints ensure efficient and scalable inference delivery.
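A hedged sketch of creating a Vertex AI endpoint, deploying two model versions with a traffic split, and calling it with the Python SDK; the model IDs, machine type, and instance payload are placeholders.
```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint.create(display_name="churn-endpoint")

# Placeholder model IDs from the Model Registry.
current_model = aiplatform.Model("1111111111111111111")
candidate_model = aiplatform.Model("2222222222222222222")

# Deploy the live model, then route 10% of traffic to a candidate version so the
# new model can be tested alongside the live system.
current_model.deploy(endpoint=endpoint, machine_type="n1-standard-4", traffic_percentage=100)
candidate_model.deploy(endpoint=endpoint, machine_type="n1-standard-4", traffic_percentage=10)

# Online prediction against the endpoint.
response = endpoint.predict(instances=[{"tenure_months": 12, "monthly_spend": 42.5}])
print(response.predictions)
```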
When should AutoML be preferred over custom training?
AutoML is preferred when:
Speed and Simplicity: Rapid prototyping is needed or little ML expertise is available.
Dataset Exploration: Evaluating features or data suitability before committing to custom development (a starter AutoML sketch follows below).
Custom training is better suited to complex use cases that require full control and optimization.
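A brief sketch of starting an AutoML tabular training job with the Vertex AI Python SDK; the dataset source, target column, and training budget are placeholders.
```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="churn-data",
    gcs_source="gs://my-bucket/churn.csv",  # placeholder CSV containing a `churned` column
)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)

# AutoML handles feature engineering, architecture search, and tuning;
# budget_milli_node_hours caps the spend (1000 = 1 node hour).
model = job.run(
    dataset=dataset,
    target_column="churned",
    budget_milli_node_hours=1000,
)
```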
How do you transition a trained model to production using Vertex AI?
1) Model Validation: Ensure quality via evaluation metrics.
2) Registry Registration: Store metadata and lineage in the model registry.
3) Endpoint Deployment: Assign the model to an endpoint for serving.
4) Monitoring: Configure performance tracking and alerts.
This systematic approach helps ensure reliability and scalability in production environments.
What are the four common dependencies in ML systems, and why are they prone to change?
1) Upstream Models: May be retrained or updated without notice, altering their output distributions.
2) External Data Sources: Often managed by other teams who may change schemas or formats.
3) Feature-Label Relationships: Can evolve over time as real-world dynamics change.
4) Input Distributions: Subject to shifts due to seasonality, policy changes, or user behaviour.
These dependencies change because they often rely on external factors or dynamic systems.
Why is modular design important in machine learning systems, and how does it differ from monolithic approaches?
Modular design improves maintainability, testability, and reuse by isolating components such as data ingestion, preprocessing, and training.
Modular Systems: Allow engineers to focus on small, independent units.
Monolithic Systems: Are tightly coupled, making debugging and updates complex.
Containers, orchestrated by tools such as Kubernetes, simplify modular designs by packaging applications together with their libraries.
Describe a scenario where upstream model changes negatively impact an ML system. How can this be mitigated?
Scenario: An umbrella demand model depends on a weather model trained on incorrect historical data. Fixing the weather model causes the umbrella model to underperform due to unexpected input distribution changes.
Mitigation:
Implement notifications for upstream changes.
Maintain a local version of the upstream model to track updates.
Monitor input distributions for deviations.
How can indiscriminate feature inclusion degrade model performance?
Including features without understanding their relationships can lead to:
Correlated Features: Models may over-rely on non-causal features.
Decorrelation: When a correlated feature loses its relationship to the label, model accuracy drops.
Best Practices: Use leave-one-out evaluations to assess feature importance and include only causally significant features.
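A minimal scikit-learn sketch of leave-one-out feature evaluation: retrain with each feature dropped and compare cross-validated scores; the dataset and model choice are illustrative.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

baseline = cross_val_score(model, X, y, cv=5).mean()
print(f"baseline accuracy: {baseline:.3f}")

# Retrain with each feature left out; a negligible drop suggests the feature adds
# little signal beyond its correlation with the remaining features.
for col in X.columns[:5]:  # first few features shown for brevity
    score = cross_val_score(model, X.drop(columns=[col]), y, cv=5).mean()
    print(f"without {col:<25s}: {score:.3f} (delta {score - baseline:+.3f})")
```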
What is the difference between interpolation and extrapolation in ML predictions? Why is interpolation more reliable?
Interpolation: Predictions within the range of training data; more reliable as the model has seen similar data.
Extrapolation: Predictions outside the training data range; less accurate as the model generalizes beyond its training.
Example: A model trained on house prices in urban areas interpolates well in cities but extrapolates poorly for rural properties.
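A small numeric illustration: a model fit on x in [0, 10] predicts reasonably inside that range but degrades outside it; the quadratic ground truth and linear model are chosen only for demonstration.
```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x_train = rng.uniform(0, 10, 200)
y_train = 0.5 * x_train**2 + rng.normal(0, 1, 200)  # toy ground truth

model = LinearRegression().fit(x_train.reshape(-1, 1), y_train)

def true_y(x):
    return 0.5 * x**2

# First two points interpolate (inside the training range); last two extrapolate.
for x in [5.0, 9.0, 20.0, 30.0]:
    pred = model.predict([[x]])[0]
    print(f"x={x:>4}: predicted {pred:7.1f} vs true {true_y(x):7.1f}")
```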
What techniques help mitigate the impact of changing data distributions?
Monitoring: Analyze input summaries (mean, variance) for deviations.
Residual Analysis: Track prediction errors across different input segments.
Temporal Weighting: Prioritize recent data using custom loss functions.
Retraining: Regularly update models with new data to adapt to distribution changes.
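A minimal sketch of temporal weighting: exponentially decaying sample weights make recent examples dominate the loss; the 180-day half-life is an assumption to tune, and the dataset is synthetic.
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training set with an observation timestamp per row.
df = pd.DataFrame({
    "feature": np.random.default_rng(0).normal(size=1000),
    "label": np.random.default_rng(1).integers(0, 2, size=1000),
    "observed_at": pd.date_range("2023-01-01", periods=1000, freq="D"),
})

# Weight each example by exp(-ln(2) * age / half_life): a 180-day half-life
# halves the influence of data roughly every six months.
age_days = (df["observed_at"].max() - df["observed_at"]).dt.days
sample_weight = np.exp(-np.log(2) * age_days / 180)

model = LogisticRegression()
model.fit(df[["feature"]], df["label"], sample_weight=sample_weight)
```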
Explain the concept of data leakage and provide an example.
Definition: Data leakage occurs when information not available during inference influences model training, leading to inflated performance metrics.
Example: A hospital assignment model uses “hospital name” during training, which is unavailable during real-time predictions. This results in degraded performance when deployed.
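A short sketch of guarding against this kind of leakage: restrict training to the columns the serving request will actually contain, so leakage-prone fields are excluded by construction. Column names are illustrative.
```python
import pandas as pd

training_df = pd.DataFrame({
    "age": [54, 37, 61],
    "blood_pressure": [140, 120, 155],
    "hospital_name": ["St. Mary's Cardiac", "General", "St. Mary's Cardiac"],  # not known at inference
    "has_heart_condition": [1, 0, 1],
})

# Columns the online prediction request will actually provide.
SERVING_FEATURES = ["age", "blood_pressure"]

X = training_df[SERVING_FEATURES]  # leakage-prone columns are excluded by construction
y = training_df["has_heart_condition"]
assert "hospital_name" not in X.columns
```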
What are the types of drift in ML systems, and how do they affect models?
1) Data Drift: Change in input feature distributions (e.g., income levels rising).
2) Concept Drift: Shift in feature-label relationships (e.g., income thresholds for loans).
3) Prediction Drift: Change in output distributions, possibly due to business changes.
4) Label Drift: Shift in label distributions over time.
Each drift reduces model accuracy and necessitates monitoring and retraining.
How can concept drift manifest in e-commerce recommendation systems, and how do you mitigate it?
In e-commerce:
Concept Drift: Customer preferences change over time due to trends or seasonality.
Impact: Static models recommend outdated products, reducing engagement.
Solution: Periodically retrain models on the latest user interactions and purchasing data.