MLOps Flashcards

1
Q

In MLflow, what's the easiest way to track experiments?

A

mlflow.autolog()


2
Q

What is a feature store?

A

A feature store is a centralized repository that enables data scientists to find and share features and also ensures that the same code used to compute the feature values is used for model training and inference.
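
The train/inference consistency point can be sketched in plain Python; the feature names and logic below are hypothetical, not any particular feature-store API:

```python
def compute_features(raw: dict) -> dict:
    """Single definition of the feature logic, reused everywhere."""
    return {
        "amount_digits": len(str(int(raw["amount"]))),  # crude magnitude bucket
        "is_weekend": raw["day_of_week"] in ("sat", "sun"),
    }

# Training time: materialize features into the offline store.
training_row = compute_features({"amount": 1250, "day_of_week": "sat"})

# Inference time: the *same* function serves the online path,
# so train/serve skew from duplicated logic cannot creep in.
inference_row = compute_features({"amount": 1250, "day_of_week": "sat"})
```

A feature store productionizes exactly this idea: one registered computation, two consumption paths.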

3
Q

What are the four data mesh principles?

A
  1. Domain-oriented data ownership
  2. Data as a product
  3. Self-serve data platform
  4. Federated computational governance
4
Q

What is MLOps?

A

MLOps is a set of processes and automation to manage models, data and code to meet the two goals of stable performance and long-term efficiency in ML systems.

5
Q

What is the ‘deploy code’ approach vs the ‘deploy model’ approach?

A

These are two approaches to CI/CD for ML models:

Deploy model: the model artifact is trained in the development environment, tested in staging, and then deployed to production.

Deploy code: the code that trains models is developed in the dev environment, promoted to staging and then to production, and models are trained in each environment.
This is useful when access to production data is not possible in lower environments, but data scientists need visibility into training results from the production environment.
This is Databricks' recommended approach, but the choice is use-case specific.

6
Q

What is Data Lakehouse architecture?

A

The lakehouse unifies the best elements of data lakes and data warehouses, delivering the data management and performance typically found in data warehouses with the low-cost, flexible object stores offered by data lakes.

Data in the lakehouse are typically organized using a “medallion” architecture of Bronze, Silver and Gold tables of increasing refinement and quality.

7
Q

What are the three main components of MLflow?

A

MLflow is an open source project for managing the end-to-end machine learning lifecycle.

  1. Tracking: track experiments to record and compare parameters, metrics and model artifacts.
  2. Models: store and deploy models from any ML library to a variety of
    model serving and inference platforms.
  3. Model Registry: a centralized model store for managing a model's full lifecycle, including stage transitions from staging to production, with capabilities for versioning and annotating. The registry also provides webhooks for automation and continuous deployment.
8
Q

What is the Databricks feature store?

A

The Databricks Feature Store is a centralized repository of features. It enables feature sharing and discovery
across an organization and also ensures that the same feature computation code is used for model training and inference.

9
Q

What is MLflow Model Serving?

A

MLflow Model Serving allows you to host machine learning models from Model Registry as REST endpoints
that are updated automatically based on the availability of model versions and their stages.
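
A scoring request to such an endpoint is a JSON document; the sketch below builds one in MLflow's `dataframe_split` input format (the column names, endpoint path, and token in the comment are placeholders):

```python
import json

# One row of features to score; the column names are illustrative.
payload = {
    "dataframe_split": {
        "columns": ["age", "income"],
        "data": [[42, 58000]],
    }
}

body = json.dumps(payload)
# A real call would POST `body` to the serving endpoint, e.g.:
#   requests.post(f"{host}/serving-endpoints/{name}/invocations",
#                 headers={"Authorization": f"Bearer {token}"},
#                 data=body)
```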

10
Q

What are Databricks workflows and jobs?

A

Databricks Workflows (Jobs and Delta Live Tables) can execute pipelines in automated, non-interactive
ways.

For ML, Jobs can define pipelines for computing features, training models, or running other ML steps.

11
Q

What should ML integration tests cover?

A

Integration tests should run all pipelines to confirm that they function correctly together:

Feature store tests
Model training tests
Model deployment tests
Model inference tests
Model monitoring tests

12
Q

What is the balance for integration testing?

A

Fidelity of testing must be balanced against speed and cost.

For example, when models are expensive to train, it is common to test model training on small datasets or for fewer iterations to reduce cost.

When models are deployed behind REST APIs, some high-SLA models may need full-scale load testing, whereas others may be tested with small batch jobs or a few queries to temporary REST endpoints.
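
A cheap-but-real training test can be sketched as follows; the mean-predictor "model" and the loose threshold are illustrative stand-ins for a real pipeline:

```python
import random

def train(data):
    """Stand-in trainer: fit a constant mean predictor."""
    ys = [y for _, y in data]
    mean = sum(ys) / len(ys)
    return lambda x: mean

def smoke_test_training():
    random.seed(0)
    # Tiny synthetic dataset: cheap enough to run on every CI commit.
    data = [(x, 2.0 + random.gauss(0, 0.1)) for x in range(20)]
    model = train(data)
    errors = [abs(model(x) - y) for x, y in data]
    mae = sum(errors) / len(errors)
    # Loose threshold: the test checks that the pipeline runs end to
    # end, not that the toy model is accurate.
    assert mae < 1.0
    return mae

mae = smoke_test_training()
```

The same pattern scales up: swap in the real trainer, keep the dataset small, and keep the assertion about pipeline health rather than model quality.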

13
Q

When should ML models be retrained?

A

When code or data changes affect upstream featurization or training logic, or when automated retraining is scheduled or triggered

14
Q

In the ‘deploy code’ approach to MLOps, what are the three key stages in the CD pipeline?

A
  1. Compliance checks: these tests load the model from the Model Registry, perform compliance checks (for tags, documentation, etc.), and approve or reject the request based on the results. If compliance checks require human
    expertise, this automated step can compute statistics or visualizations for people to review in a manual
    approval step at the end of the CD pipeline. If the checks pass, the model is promoted to staging.
  2. Compare staging vs. production: all comparison results are saved to metrics tables in the lakehouse.
  3. Request model transition to production.
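
The compliance-check step can be sketched as a simple gate over model metadata; the required tags and the input shape here are hypothetical, not a registry API:

```python
# Illustrative policy: every production candidate must carry these tags.
REQUIRED_TAGS = {"owner", "use_case", "reviewed_by"}

def compliance_check(model_tags: dict) -> tuple:
    """Approve only if every required tag is present and non-empty."""
    missing = sorted(t for t in REQUIRED_TAGS if not model_tags.get(t))
    return (len(missing) == 0, missing)

# 'reviewed_by' is absent, so this transition request would be rejected.
ok, missing = compliance_check({"owner": "fraud-team", "use_case": "scoring"})
```

In a real CD pipeline the tags would come from the Model Registry, and a failure would either reject the request or route it to a manual approval step.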
15
Q

What is a canary deployment?

A

The goal of a canary deployment is to minimize risk and ensure the stability of a new software version by gradually rolling it out to a subset of users or systems before making it available to the entire user base.

The “canary” is initially deployed to a small, representative group of users or a specific subset of infrastructure. This group is typically selected based on certain criteria, such as a specific region, a particular user segment, or a designated set of servers.
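
Traffic splitting for a canary can be sketched as deterministic, hash-based routing (the 10% split and version names are illustrative):

```python
import hashlib

def route(user_id: str, canary_pct: int = 10) -> str:
    """Deterministically send ~canary_pct% of users to the canary."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_pct else "stable"

# The same user always hits the same version, so behaviour is
# consistent while the canary cohort's metrics are compared
# against the stable version before widening the rollout.
assignments = [route(f"user-{i}") for i in range(1000)]
canary_share = assignments.count("canary") / len(assignments)
```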

16
Q

What is online model serving?

A

REST APIs.

For lower-throughput, low-latency use cases, online serving is generally necessary.

The serving system loads the production model from the Model Registry upon initialization. On each request, it fetches features from an online feature store, scores the data and returns predictions. The serving system, the data transport layer or the model itself can log requests and predictions.

With MLflow, it is simple to deploy models to Databricks Model Serving, cloud-provider serving endpoints, or on-prem or custom serving layers.

17
Q

What are the key aspects of ML monitoring?

A
  1. Data ingestion: reads in logs from batch, streaming or online inference.
  2. Data accuracy and drift: computes metrics about the input data, the model’s predictions and the infrastructure performance.
  3. Publish metrics: writes metrics to lakehouse tables for analysis and reporting, feeds monitoring dashboards for health checks and diagnostics, and issues notifications when health metrics surpass defined thresholds.
  4. Trigger model retraining: retraining can be scheduled, or triggered when monitored metrics indicate drift or degradation.
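
The drift step often computes a distribution-shift statistic; below is a sketch of the Population Stability Index over pre-binned counts (the bins and the 0.2 alert threshold are conventional but illustrative):

```python
import math

def psi(expected_counts, actual_counts):
    """Population Stability Index between two binned distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)  # floor avoids log(0)
        a_pct = max(a / a_total, 1e-6)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

same = psi([100, 200, 300], [100, 200, 300])     # identical distributions
shifted = psi([100, 200, 300], [300, 200, 100])  # mass has moved between bins
# A monitoring job would alert (or trigger retraining) when the
# score crosses a threshold such as PSI > 0.2.
```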