W1-Introduction to Model Serving Flashcards
When serving an ML model in a production environment, you should consider three key components. What are they?
Introduction to Model Serving
- the model
- an interpreter for the execution
- input data
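A minimal sketch of these three components in code, using TensorFlow Lite (where the interpreter is an explicit object); the model file name is a hypothetical placeholder:

```python
import numpy as np
import tensorflow as tf

# 1. The model: a serialized artifact on disk (hypothetical file name).
interpreter = tf.lite.Interpreter(model_path="model.tflite")

# 2. The interpreter: the runtime that executes the model.
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 3. The input data: a tensor shaped to match the model's input.
input_data = np.random.random_sample(tuple(input_details[0]["shape"])).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read back the prediction.
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```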
What is Docker?
Ungraded Lab - Introduction to Docker
Docker is an amazing tool that allows you to ship your software along with all of its dependencies. This is great because it enables you to run software without installing the interpreters or compilers it requires on the host machine.
Suppose you trained a deep learning model using Python along with libraries such as TensorFlow or JAX, and for this you created a virtual environment on your local machine. Everything works fine, but now you want to share this model with a colleague who does not have Python installed, much less any of the required libraries.
In a pre-Docker world your colleague would have to install all of this software just to run your model. With Docker, you can instead share a Docker image that includes all of your software, and that is all your colleague will need.
What are a Dockerfile, a Docker image, and a Docker container?
Ungraded Lab - Introduction to Docker
Dockerfile: This is a special file that contains all of the instructions required to build an image. These instructions can be anything from “install Python version 3.7” to “copy my code into the image”.
Image: This refers to the collection of all your software in one single place. Using the previous example, the image would include Python, TensorFlow, JAX, and your code. This is achieved by setting the appropriate instructions within the Dockerfile.
Container: This is a running instance of an image. Images by themselves don’t do much aside from storing your code and its dependencies; you need to run a container out of an image to actually execute the code within it. Containers are usually meant to perform a single task, but they can also be used as runtimes to run software that you haven’t installed.
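As an illustration, a minimal Dockerfile for the scenario above might look like the sketch below; the file names requirements.txt and model.py are hypothetical placeholders, not files from the lab.

```dockerfile
# Base image: an official Python distribution, so Python itself
# does not need to be installed on the host.
FROM python:3.7-slim

# Install the libraries (e.g., TensorFlow, JAX) listed in a
# hypothetical requirements.txt.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the (hypothetical) model code into the image.
COPY model.py .

# This command runs when a container is started from the image.
CMD ["python", "model.py"]
```

Building this file with docker build produces the image, and docker run starts a container from it, which ties the three terms together.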
Machine learning deployment is usually done on Linux machines in the cloud, so it is good to get to know this OS for the purpose of deploying your models. True/False
Ungraded Lab - Introduction to Docker
True
There’s a trade-off between a model’s predictive effectiveness and its prediction latency, so depending on the use case you need to decide on two metrics. The model’s ____ metric reflects its predictive effectiveness and includes things like accuracy, precision, recall, and so on; good values on these metrics are a strong signal about the quality of your model. The model’s ____ metric reflects an operational constraint that the model has to satisfy, such as prediction latency. For example, you might set a latency threshold to a particular value, such as 200 milliseconds, and any model that doesn’t meet this threshold is not accepted.
Introduction to Model Serving Infrastructure-GPT Summary
optimizing
satisficing
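A small sketch of how this selection rule might be applied; the candidate models and their numbers are made up for illustration:

```python
# Each candidate has an optimizing metric (accuracy) and a
# satisficing metric (prediction latency in milliseconds).
candidates = [
    {"name": "model_a", "accuracy": 0.94, "latency_ms": 350},
    {"name": "model_b", "accuracy": 0.91, "latency_ms": 120},
    {"name": "model_c", "accuracy": 0.89, "latency_ms": 80},
]

LATENCY_THRESHOLD_MS = 200  # the satisficing constraint from the example

# First reject any model that violates the satisficing constraint...
acceptable = [m for m in candidates if m["latency_ms"] <= LATENCY_THRESHOLD_MS]

# ...then pick the best remaining model on the optimizing metric.
best = max(acceptable, key=lambda m: m["accuracy"])
print(best["name"])  # -> model_b
```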
What are the two primary choices for deploying models?
Deployment Options-GPT Summary
- A centralized model served from a data center
- Distributed instances of the model sent to users for local use
Deploying complex models to mobile devices can lead to a poor user experience due to battery drain or slow processing times.
What is done in such cases?
Deployment Options-GPT Summary
Deploying the model to a server and exposing it through a REST API may be a better option: the app sends requests for inference, and the heavy computation stays on the server.
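A minimal sketch of this pattern using Flask; the /predict endpoint and the dummy predict function are illustrative, not from the course:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Stand-in for a real model loaded once at startup
    # (e.g., with tf.keras.models.load_model).
    return sum(features)

@app.route("/predict", methods=["POST"])
def serve_prediction():
    # The app sends features as JSON; the model runs on the server,
    # sparing the device's battery and CPU.
    features = request.get_json()["features"]
    return jsonify({"prediction": predict(features)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```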