{"id":2969,"date":"2025-04-09T07:02:50","date_gmt":"2025-04-09T07:02:50","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/04\/09\/a-data-scientists-guide-to-docker-containers\/"},"modified":"2025-04-09T07:02:50","modified_gmt":"2025-04-09T07:02:50","slug":"a-data-scientists-guide-to-docker-containers","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/04\/09\/a-data-scientists-guide-to-docker-containers\/","title":{"rendered":"A Data Scientist\u2019s Guide to Docker Containers"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\">For an ML model to be useful, it needs to run somewhere. This somewhere is most likely not your local machine. A not-so-good model that runs in a production environment is better than a perfect model that never leaves your local machine.<\/p>\n<p class=\"wp-block-paragraph\">However, the production machine is usually different from the one you developed the model on. So, you ship the model to the production machine, but somehow the model doesn\u2019t work anymore. That\u2019s weird, right? You tested everything on your local machine, and it worked fine. You even wrote unit tests.<\/p>\n<p class=\"wp-block-paragraph\">What happened? Most likely, the production machine differs from your local machine. Perhaps it does not have all the dependencies installed that your model needs to run. Perhaps the installed dependencies have different versions. There can be many reasons for this.<\/p>\n<p class=\"wp-block-paragraph\">How can you solve this problem? One approach could be to exactly replicate the production machine. 
But that is very inflexible, as you would need to build a local replica for each new production machine.<\/p>\n<p class=\"wp-block-paragraph\">A much nicer approach is to use <a href=\"https:\/\/towardsdatascience.com\/tag\/docker\/\" title=\"Docker\">Docker<\/a> containers.<\/p>\n<p class=\"wp-block-paragraph\">Docker is a tool that helps us create, manage, and run code and applications in containers. A container is a small, isolated computing environment in which we can package an application with all its dependencies; in our case, our ML model with all the libraries it needs to run. With this, we do not need to rely on what is installed on the host machine. A <a href=\"https:\/\/towardsdatascience.com\/tag\/docker-container\/\" title=\"Docker Container\">Docker Container<\/a> enables us to separate applications from the underlying infrastructure.<\/p>\n<p class=\"wp-block-paragraph\">For example, we package our ML model locally and push it to the cloud. Docker helps us ensure that the model runs the same way on any machine that runs Docker. Using Docker has several advantages for us. It helps us deliver new models faster, improves reproducibility, and makes collaboration easier. This is because the container has exactly the same dependencies no matter where we run it.<\/p>\n<p class=\"wp-block-paragraph\">As Docker is widely used in the industry, data scientists need to be able to build and run containers using Docker. Hence, in this article, I will go through the basic concepts of containers and show you everything you need to know about Docker to get started. After we have covered the theory, I will show you how you can build and run your own Docker container.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">What is a container?<\/h2>\n<p class=\"wp-block-paragraph\">A container is a small, isolated environment in which everything is self-contained. 
The environment packages up all code and dependencies.<\/p>\n<p class=\"wp-block-paragraph\">A container has five main features.<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>self-contained<\/strong>: A container isolates the application\/software from its environment\/infrastructure. Due to this isolation, we do not need to rely on any pre-installed dependencies on the host machine. Everything we need is part of the container. This ensures that the application can run regardless of what is installed on the host.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>isolated<\/strong>: The container has minimal influence on the host and on other containers, and vice versa.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>independent<\/strong>: We can manage containers independently. Deleting a container does not affect other containers.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>portable<\/strong>: As a container packages the software together with everything it needs, we can run it on any machine that has a container runtime such as Docker and a compatible CPU architecture. With this, we can move it between machines without a problem.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>lightweight<\/strong>: Containers are lightweight as they share the host machine\u2019s OS kernel. As they do not require their own OS, we do not need to partition the hardware resources of the host machine.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">This might sound similar to virtual machines, but there is one big difference: how they use their host computer\u2019s resources. Virtual machines are an abstraction of the physical hardware. They turn one physical server into multiple virtual servers. Thus, each VM includes a full copy of an OS, which takes up more space.<\/p>\n<p class=\"wp-block-paragraph\">In contrast, containers are an abstraction at the application layer. All containers share the host\u2019s OS kernel but run in isolated processes. 
Because containers do not include a full OS, they use the underlying system and its resources more efficiently, with less overhead.<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/04\/1FcnhsJhCJs6vxJaca6HDOw.png?ssl=1\" alt=\"\" class=\"wp-image-601181\"><figcaption class=\"wp-element-caption\">Containers vs. Virtual Machines (Image by the author based on <a href=\"https:\/\/www.docker.com\/resources\/what-container\/\" rel=\"noreferrer noopener\" target=\"_blank\">docker.com<\/a>)<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Now that we know what containers are, let\u2019s get a high-level understanding of how Docker works. I will briefly introduce the technical terms that are used most often.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">What is Docker?<\/h2>\n<p class=\"wp-block-paragraph\">To understand how Docker works, let\u2019s have a brief look at its architecture.<\/p>\n<p class=\"wp-block-paragraph\">Docker uses a client-server architecture containing three main parts: a Docker client, a Docker daemon (server), and a Docker registry.<\/p>\n<p class=\"wp-block-paragraph\">The Docker client is the primary way to interact with Docker through commands. The client communicates with one or more Docker daemons through a REST API. Commonly used commands are <code>docker run<\/code>, <code>docker build<\/code>, <code>docker pull<\/code>, and <code>docker push<\/code>. I will explain what they do later.<\/p>\n<p class=\"wp-block-paragraph\">The Docker daemon manages Docker objects, such as images and containers. The daemon listens for Docker API requests. Depending on the request, the daemon builds, runs, and distributes Docker containers. 
The Docker daemon and client can run on the same or on different systems.<\/p>\n<p class=\"wp-block-paragraph\">The Docker registry is a centralized location that stores and manages Docker images. We can use registries to share images and make them accessible to others.<\/p>\n<p class=\"wp-block-paragraph\">Sounds a bit abstract? No worries; once we get started, it will become more intuitive. But before that, let\u2019s run through the steps needed to create a Docker container.<\/p>\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/04\/1Ue5cZTuqFXIi_tPW57BQTA.png?ssl=1\" alt=\"\" class=\"wp-image-601180\"><figcaption class=\"wp-element-caption\">Docker Architecture (Image by author based on <a href=\"https:\/\/docs.docker.com\/get-started\/docker-overview\/\" rel=\"noreferrer noopener\" target=\"_blank\">docker.com<\/a>)<\/figcaption><\/figure>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">What do we need to create a Docker container?<\/h2>\n<p class=\"wp-block-paragraph\">It is simple. We only need three steps:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">create a Dockerfile<\/li>\n<li class=\"wp-block-list-item\">build a Docker Image from the Dockerfile<\/li>\n<li class=\"wp-block-list-item\">run the Docker Image to create a Docker container<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">Let\u2019s go step-by-step.<\/p>\n<p class=\"wp-block-paragraph\">A Dockerfile is a text file that contains instructions on how to build a Docker Image. In the Dockerfile, we define the application and its dependencies. We also state which process should run when the Docker container is launched. The image built from the Dockerfile is composed of layers, each representing a portion of the image\u2019s file system. 
Each layer adds, removes, or modifies files from the layers below it.<\/p>\n<p class=\"wp-block-paragraph\">Based on the Dockerfile, we create a Docker Image. The image is a read-only template with instructions for creating a Docker container. Images are immutable: once we have created a Docker Image, we cannot modify it anymore. If we want to make changes, we can only add changes on top of existing images or create a new image. When we rebuild an image, Docker reuses cached layers and only rebuilds the layers from the first changed instruction onwards, which reduces the build time.<\/p>\n<p class=\"wp-block-paragraph\">A Docker Container is a runnable instance of a Docker Image. The container is defined by the image and any configuration options that we provide when creating or starting the container. When we remove a container, all changes to its state are lost unless they are stored in persistent storage, such as a volume.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">Using Docker: An example<\/h2>\n<p class=\"wp-block-paragraph\">With the theory covered, let\u2019s get our hands dirty and put everything together.<\/p>\n<p class=\"wp-block-paragraph\">As an example, we will package a simple ML model with Flask in a Docker container. We can then send requests to the container and receive predictions in return. We will train the model locally and only copy the artifacts of the trained model into the Docker container.<\/p>\n<p class=\"wp-block-paragraph\">I will go through the general workflow needed to create and run a Docker container with your ML model. 
I will guide you through the following steps:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">build the model<\/li>\n<li class=\"wp-block-list-item\">create a <code>requirements.txt<\/code> file containing all dependencies<\/li>\n<li class=\"wp-block-list-item\">create a <code>Dockerfile<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">build the Docker image<\/li>\n<li class=\"wp-block-list-item\">run the container<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">Before we get started, we need to install Docker Desktop. We will use it to view and run our Docker containers later on.<\/p>\n<h3 class=\"wp-block-heading\">1. Build a model<\/h3>\n<p class=\"wp-block-paragraph\">First, we will train a simple RandomForestClassifier on <code>scikit-learn<\/code>\u2019s Iris dataset and then store the trained model.<\/p>\n<div class=\"wp-block-tds-gist-embed\">\n\t<script src=\"https:\/\/gist.github.com\/joDancker\/b24961b14928d04b786e8d3536498f66.js\"><\/script>\n<\/div>\n<p class=\"wp-block-paragraph\">Second, we build a script that makes our model available through a REST API using Flask. 
The script is also simple and contains three main steps:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">extract the data we want to pass into the model from the JSON payload and convert it<\/li>\n<li class=\"wp-block-list-item\">load the model artifacts, create an ONNX inference session, and run the model<\/li>\n<li class=\"wp-block-list-item\">return the model\u2019s predictions as JSON<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">I took most of the code from the <a href=\"https:\/\/onnx.ai\/sklearn-onnx\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">sklearn-onnx documentation<\/a> and Docker\u2019s <a href=\"https:\/\/github.com\/docker\/awesome-compose\/tree\/master\/flask\" target=\"_blank\" rel=\"noreferrer noopener\">awesome-compose Flask example<\/a> and made only minor changes.<\/p>\n<div class=\"wp-block-tds-gist-embed\">\n\t<script src=\"https:\/\/gist.github.com\/joDancker\/6385e628aacb17f1521b13a9878bd2e6.js\"><\/script>\n<\/div>\n<h3 class=\"wp-block-heading\">2. Create requirements<\/h3>\n<p class=\"wp-block-paragraph\">Once we have created the Python file we want to execute when the Docker container is running, we must create a <code>requirements.txt<\/code> file containing all dependencies. In our case, it looks like this:<\/p>\n<div class=\"wp-block-tds-gist-embed\">\n\t<script src=\"https:\/\/gist.github.com\/joDancker\/733931cbef7537a89b4ceb1e1cbc5449.js\"><\/script>\n<\/div>\n<h3 class=\"wp-block-heading\">3. Create Dockerfile<\/h3>\n<p class=\"wp-block-paragraph\">The last thing we need to prepare before we can build a Docker Image and run a Docker container is the Dockerfile.<\/p>\n<p class=\"wp-block-paragraph\">The Dockerfile contains all the instructions needed to build the Docker Image. 
The most common instructions are:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>FROM &lt;image&gt;<\/code>: specifies the base image that the build will extend.<\/li>\n<li class=\"wp-block-list-item\">\n<code>WORKDIR &lt;path&gt;<\/code>: specifies the \u201cworking directory\u201d, i.e., the path in the image where files will be copied and commands will be executed.<\/li>\n<li class=\"wp-block-list-item\">\n<code>COPY &lt;host-path&gt; &lt;image-path&gt;<\/code>: tells the builder to copy files from the host (the build context) and put them into the container image.<\/li>\n<li class=\"wp-block-list-item\">\n<code>RUN &lt;command&gt;<\/code>: tells the builder to run the specified command.<\/li>\n<li class=\"wp-block-list-item\">\n<code>ENV &lt;name&gt;=&lt;value&gt;<\/code>: sets an environment variable that a running container will use.<\/li>\n<li class=\"wp-block-list-item\">\n<code>EXPOSE &lt;port-number&gt;<\/code>: sets the configuration on the image that indicates a port the image would like to expose. It documents the port but does not publish it; publishing happens with <code>docker run -p<\/code>.<\/li>\n<li class=\"wp-block-list-item\">\n<code>USER &lt;user-or-uid&gt;<\/code>: sets the default user for all subsequent instructions and for the running container.<\/li>\n<li class=\"wp-block-list-item\">\n<code>CMD [\"&lt;command&gt;\", \"&lt;arg1&gt;\"]<\/code>: sets the default command a container using this image will run.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">With these, we can create the Dockerfile for our example. 
We follow these steps:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Determine the base image<\/li>\n<li class=\"wp-block-list-item\">Install application dependencies<\/li>\n<li class=\"wp-block-list-item\">Copy in any relevant source code and\/or binaries<\/li>\n<li class=\"wp-block-list-item\">Configure the final image<\/li>\n<\/ol>\n<div class=\"wp-block-tds-gist-embed\">\n\t<script src=\"https:\/\/gist.github.com\/joDancker\/19261e1edc3654c1c2240b106b147a3d.js\"><\/script>\n<\/div>\n<p class=\"wp-block-paragraph\">Let\u2019s go through the Dockerfile step by step. Each step corresponds to an instruction, and instructions that change the file system, such as <code>RUN<\/code> and <code>COPY<\/code>, each add a layer to the Docker Image.<\/p>\n<p class=\"wp-block-paragraph\">First, we specify the base image that we then build upon. As our example is written in Python, we use a Python base image.<\/p>\n<p class=\"wp-block-paragraph\">Second, we set the working directory into which we will copy all the files we need to run our ML model.<\/p>\n<p class=\"wp-block-paragraph\">Third, we refresh the package index files to ensure that we have the latest available information about packages and their versions.<\/p>\n<p class=\"wp-block-paragraph\">Fourth, we copy in and install the application dependencies.<\/p>\n<p class=\"wp-block-paragraph\">Fifth, we copy in the source code and all other files we need. Here, we also expose port 8080, which we will use to interact with the ML model.<\/p>\n<p class=\"wp-block-paragraph\">Sixth, we set a non-root user so that the container does not run as the root user.<\/p>\n<p class=\"wp-block-paragraph\">Seventh, we specify that the <code>example.py<\/code> file is executed when we run the Docker container. This starts the Flask server that we send our requests to.<\/p>\n<p class=\"wp-block-paragraph\">Besides creating the Dockerfile, we can also create a <code>.dockerignore<\/code> file to keep the build context small and improve the build speed. 
Similar to a <code>.gitignore<\/code> file, we can exclude files and directories from the build context.<\/p>\n<p class=\"wp-block-paragraph\">If you want to know more, please go to <a href=\"https:\/\/docs.docker.com\/get-started\/docker-concepts\/building-images\/writing-a-dockerfile\/\" rel=\"noreferrer noopener\" target=\"_blank\">docker.com<\/a>.<\/p>\n<h3 class=\"wp-block-heading\">4. Create Docker Image<\/h3>\n<p class=\"wp-block-paragraph\">Now we have created all the files we need to build the Docker Image.<\/p>\n<p class=\"wp-block-paragraph\">To build the image, we first need to open Docker Desktop. You can check whether Docker is running by executing <code>docker ps<\/code> in the command line. This command lists all running containers; if the Docker daemon is not running, it returns an error instead.<\/p>\n<p class=\"wp-block-paragraph\">To build a Docker Image, we need to be in the same directory as our Dockerfile and <code>requirements.txt<\/code> file. We can then build the image by running <code>docker build -t our_first_image .<\/code>, where the <code>-t<\/code> flag sets the name (tag) of the image, i.e., <code>our_first_image<\/code>, and the final <code>.<\/code> tells Docker to use the current directory as the build context.<\/p>\n<p class=\"wp-block-paragraph\">Once we have built the image, we can do several things. We can:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">view the image by running <code>docker image ls<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">view the history of the image, i.e., how it was created layer by layer, by running <code>docker image history &lt;image_name&gt;<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">push the image to a registry by running <code>docker push &lt;image_name&gt;<\/code> (for this, the image name must include your registry or Docker Hub namespace, e.g., <code>&lt;username&gt;\/&lt;image_name&gt;<\/code>, and you need to be logged in via <code>docker login<\/code>)\n<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">5. Run Docker Container<\/h3>\n<p class=\"wp-block-paragraph\">Once we have built the Docker Image, we can run our ML model in a container.<\/p>\n<p class=\"wp-block-paragraph\">For this, we only need to execute <code>docker run -p 8080:8080 &lt;image_name&gt;<\/code> in the command line. 
With <code>-p 8080:8080<\/code>, we map port 8080 on the host to port 8080 in the container (the format is <code>host_port:container_port<\/code>).<\/p>\n<p class=\"wp-block-paragraph\">If we do not need to reach the container from the host through a port, for example because the image does not expose one, we can simply run <code>docker run &lt;image_name&gt;<\/code>. Instead of the image name, we can also use the image ID.<\/p>\n<p class=\"wp-block-paragraph\">Once the container is running, let\u2019s send a request to it. For this, we send a payload to the endpoint by running <code>curl -X POST http:\/\/localhost:8080\/invocations -H \"Content-Type: application\/json\" -d @path\/to\/sample_payload.json<\/code>.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">In this article, I showed you the basics of Docker containers: what they are and how to build them yourself. Although I have only scratched the surface, this should be enough to get you started and to package your next model. 
With this knowledge, you should be able to avoid the \u201cit works on my machine\u201d problem.<\/p>\n<p class=\"wp-block-paragraph\">I hope that you find this article useful and that it will help you become a better data scientist.<\/p>\n<p class=\"wp-block-paragraph\">See you in my next article, and feel free to leave a comment.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/a-data-scientists-guide-to-docker-containers\/\">A Data Scientist\u2019s Guide to Docker Containers<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p>Jonte Dancker<br \/>\n<a href=\"https:\/\/towardsdatascience.com\/a-data-scientists-guide-to-docker-containers\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Data Scientist\u2019s Guide to Docker Containers For an ML model to be useful, it needs to run somewhere. This somewhere is most likely not your local machine. A not-so-good model that runs in a production environment is better than a perfect model that never leaves your local machine. 
However, the production machine is usually [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,69,83,1082,2323,70],"tags":[2324,341,103],"class_list":["post-2969","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-artificial-intelligence","category-data-science","category-docker","category-docker-container","category-machine-learning","tag-docker","tag-machine","tag-model"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2969"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2969"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2969\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2969"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2969"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2969"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}