Intro to ML Flashcards
POC to Production Gap
Proof-of-concept to production
“ML model code is only 5-10% of the code in an ML project.”
refer to the diagram in D. Sculley et al., NIPS 2015: "Hidden Technical Debt in Machine Learning Systems"
ML project lifecycle
“SDMD”
scoping (X->Y) -> data -> modeling -> deployment
Scoping:
* define project [X -> Y]
Data:
* define data and establish baseline
* label and organize data
Modeling:
* select and train model
* perform error analysis
Deployment:
* deploy in production
* monitor & maintain system
How does a research/academia team's refinement of an ML model differ from a production team's?
Three things can be optimized:
- code (algorithm/model)
- hyperparameters
- data
research/academia:
tend to hold the data fixed,
optimize the code and hyperparameters
production teams:
tend to hold the code fixed,
optimize the data and hyperparameters
Edge devices [definition?]
Edge devices are pieces of equipment that transmit data between the local network and the cloud.
They translate between the protocols, or languages, used by local devices and the protocols used by the cloud, where the data is further processed.
What does MLOps stand for?
MLOps (Machine Learning Operations): an emerging discipline comprising a set of tools and principles to support progress through the ML project lifecycle.
Concept drift vs. data drift
data drift:
[X changes], i.e. the distribution of the inputs shifts
e.g. a politician suddenly becomes famous, so the inputs the system sees change
concept drift:
the [X -> Y] mapping changes
e.g. house sizes don't change, but prices do
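A minimal sketch (not from the course) of testing for data drift on a single numeric feature; the names train_feature/live_feature and the synthetic samples are placeholders for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical samples: the feature's distribution at training time vs. a shifted live distribution.
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests P(X) has changed (data drift).
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"possible data drift: KS stat={stat:.3f}, p={p_value:.3g}")
```

Concept drift, by contrast, usually can't be seen from the inputs alone; it typically shows up as degrading live accuracy against fresh labels.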
realtime vs. batch
speech -> realtime
hospital records from a patient -> batch
cloud vs. edge/browser
edge/browser -> good to have as well, in case internet access is unavailable or cut off
checklist of things to consider when building ML software
- realtime or batch
- cloud vs. edge/browser
- compute resources (CPU/GPU/memory)
- latency, throughput (QPS)
- logging
- security and privacy
throughput (QPS)
Throughput (QPS), queries per second: the number of requests that are successfully executed/serviced per unit of time. For example, a throughput of 50/minute means the server accepts, processes, and responds properly to 50 requests per minute.
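A rough sketch of measuring throughput over a batch of requests; handle_request and requests are hypothetical placeholders, not from the notes:

```python
import time

def measure_qps(handle_request, requests):
    """Return successfully serviced requests per second (QPS) for a batch."""
    served = 0
    start = time.monotonic()
    for req in requests:
        try:
            handle_request(req)  # accepted, processed, and responded properly
            served += 1
        except Exception:
            pass                 # failed requests don't count toward throughput
    elapsed = max(time.monotonic() - start, 1e-9)
    return served / elapsed
```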
Common ML deployment cases
- new product/capability
- automate/assist with a manual task
- replace a previous ML system
Key ideas:
* Gradual ramp up with monitoring
* Rollback
rollback
if the new model doesn't work, revert to the previous working model
gradual ramp up with monitoring
don't switch all traffic to the new model at once;
start with a small fraction of traffic and ramp it up gradually while monitoring
shadow mode (deployment)
ML system shadows the human and runs in parallel.
ML system’s output not used for any decisions during this phase.
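A minimal sketch of shadow mode, assuming hypothetical human_decide and model stand-ins:

```python
import logging

logger = logging.getLogger("shadow_mode")

def handle_case(case, human_decide, model):
    """Shadow mode: the model runs in parallel with the human; its
    prediction is logged for later comparison but never acted on."""
    human_decision = human_decide(case)     # the decision actually used
    model_prediction = model.predict(case)  # shadow prediction, logged only
    logger.info("case=%r human=%r model=%r", case, human_decision, model_prediction)
    return human_decision                   # ML output ignored during this phase
```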
canary deployment
- roll out to small fraction (say 5%) of traffic initially
- monitor system and ramp up traffic gradually
origin:
"canary in a coal mine",
referring to how coal miners used canaries to detect gas leaks
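A minimal sketch of canary routing (old_model/new_model and the hashing scheme are illustrative assumptions); rolling back is just setting the canary fraction back to 0:

```python
import hashlib

CANARY_FRACTION = 0.05  # start with ~5% of traffic; ramp up gradually while monitoring

def in_canary(request_id: str, fraction: float = CANARY_FRACTION) -> bool:
    """Hash the request ID into a stable bucket so the same request
    consistently hits the same model while traffic is split."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000

def predict(request_id, features, old_model, new_model):
    """Route a small, deterministic fraction of traffic to the new model."""
    model = new_model if in_canary(request_id) else old_model
    return model.predict(features)
```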