Path8.Mod1.a - Intro to DevOps Principles for ML Flashcards
Additional module on MLOps: https://learn.microsoft.com/en-us/training/paths/introduction-machine-learn-operations/
Three Processes we combine when discussing MLOps
- ML Workloads: Exploratory Data Analysis, Feature Engineering and Model Training/Tuning
- Dev: Software development; plan (model requirements and metrics), create (model training and scoring scripts), verify (check code and model quality) and package (stage the solution)
- Ops: Release (deploy to prod), Configure (IaC - Infrastructure as Code) and Monitor (Metrics tracking and performance)
Describe CI/CD w.r.t. MLOps
- Continuous Integration: Create and verify activities; refactoring notebooks into Python code, linting, unit testing
- Continuous Delivery: Steps to deploy to staging then prod, automating as much as possible.
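As a minimal sketch, the Continuous Integration half could be automated with a GitHub Actions workflow like the one below; the file name, job name, and tool choices (flake8, pytest) are assumptions for illustration, not part of the module.

```yaml
# .github/workflows/ci.yml -- illustrative CI workflow (file name, job, and tools are assumptions)
name: ci
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install flake8 pytest   # linting and unit-test tooling
      - run: flake8 src                  # lint the refactored code
      - run: pytest tests                # run the unit tests
```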
It is preferable that Training Data is not stored in Source Control. Store it in a database or data lake and retrieve it directly from the source via a Data Asset (T/F)
FALSE. Retrieval should be through a Datastore, since the Datastore stores the credentials for connecting to the storage source.
Otherwise TRUE: you should NOT store your training data in Source Control.
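As a sketch of how this looks with the CLI v2, a data asset definition can reference a path on a registered Datastore (which holds the connection credentials); the asset name, version, and path below are illustrative assumptions.

```yaml
# data-asset.yml -- data asset referencing a datastore path (name/version/path are illustrative)
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: training-data
version: 1
type: uri_file
path: azureml://datastores/workspaceblobstore/paths/data/train.csv
```

Registered with az ml data create --file data-asset.yml, so the training data stays in the storage account rather than in Source Control.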
BRP IRA
DevOps tools in Azure DevOps and GitHub and what we get out of them
Azure DevOps:
* Azure Boards - sprint planning and tracking boards
* Azure Repos - code repositories
* Azure Pipelines - CI/CD
GitHub:
* Issues - sprint planning and tracking boards
* Repos - code repositories
* Actions - automated workflows
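For comparison with the GitHub Actions sketch above, a minimal Azure Pipelines definition might look like the following; the trigger, VM image, and test command are assumptions for illustration.

```yaml
# azure-pipelines.yml -- minimal Azure Pipelines sketch (trigger, image, and command are assumptions)
trigger:
  - main
pool:
  vmImage: ubuntu-latest
steps:
  - script: pip install pytest && pytest tests
    displayName: Run unit tests
```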
c ad/gh s d p t n
Best Practices for MLOps Repository Structure (what to put in which folders)
.cloud: cloud-specific code (ARM templates, etc)
.ad/.github: Azure DevOps or GitHub artifacts like YAML pipelines to automate workflows.
src: contains any code (Python) used for ML workloads
docs: Markdown files or other documentation describing the project.
pipelines: Azure Machine Learning pipeline definitions.
tests: contains unit and integration tests
notebooks: contains notebooks, mostly used for experimentation.
What to use:
- for building Azure ML Pipeline templates
- for Local Development
Python SDK or YAML + CLI for building pipeline templates, and VS Code for local development, respectively
non func mod unit wkfl
Steps for refactoring Notebooks to production Python code
- Removing all nonessential code
- Refactoring code into functions
- Combining related functions into Python modules (.py files)
- Creating unit tests for each Python script
- Creating a pipeline to group scripts into workflows that can be automated (a minimal command-job sketch follows)
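For that last step, each refactored script can first be wrapped as a standalone command job; a minimal sketch, assuming a train.py script under src, a cpu-cluster compute target, and a curated sklearn environment (all illustrative):

```yaml
# train-job.yml -- one refactored script wrapped as a command job (names are assumptions)
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: python train.py
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest  # curated environment; swap for your own
compute: azureml:cpu-cluster  # illustrative compute target
```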
Three ways to create an Azure ML Pipeline (aka Pipeline Job)
- Define steps in a Python script (job-based classes)
- With Components in the Designer
- YAML using CLI v2 commands
What YAML would go into pipeline-job.yml
For the pipeline-job.yml approach you'd define:
- type: pipeline
- jobs: one entry per step, e.g. a transform-job and a train-job
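A minimal sketch of such a pipeline-job.yml, assuming transform.py and train.py live in src, a registered training-data asset, a cpu-cluster compute target, and a curated sklearn environment (all illustrative assumptions):

```yaml
# pipeline-job.yml -- illustrative pipeline with a transform-job and a train-job
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: transform-and-train
settings:
  default_compute: azureml:cpu-cluster  # illustrative compute target
jobs:
  transform-job:
    type: command
    code: src
    command: python transform.py --raw_data ${{inputs.raw_data}} --clean_data ${{outputs.clean_data}}
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest  # curated environment; swap for your own
    inputs:
      raw_data:
        type: uri_file
        path: azureml:training-data@latest  # the data asset registered earlier
    outputs:
      clean_data:
        type: uri_file
  train-job:
    type: command
    code: src
    command: python train.py --training_data ${{inputs.training_data}}
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
    inputs:
      training_data: ${{parent.jobs.transform-job.outputs.clean_data}}  # wires the two jobs together
```

It would be submitted with az ml job create --file pipeline-job.yml, the CLI v2 route listed above.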