{"id":1955,"date":"2025-02-20T07:02:52","date_gmt":"2025-02-20T07:02:52","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/20\/why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge\/"},"modified":"2025-02-20T07:02:52","modified_gmt":"2025-02-20T07:02:52","slug":"why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/20\/why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge\/","title":{"rendered":"Why Data Scientists Should Care about Containers \u2014 and Stand Out with This Knowledge"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\" id=\"0cd7\">\u201cI train models, analyze data and create dashboards \u2014 why should I care about <a href=\"https:\/\/towardsdatascience.com\/tag\/containers\/\" title=\"Containers\">Containers<\/a>?\u201d<\/p>\n<p class=\"wp-block-paragraph\" id=\"593b\">Many people who are new to the world of data science ask themselves this question. But imagine you have trained a model that runs perfectly on your laptop. 
However, error messages keep popping up in the cloud when others access it \u2014 for example because they are using different library versions.<\/p>\n<p class=\"wp-block-paragraph\" id=\"bb0e\">This is where containers come into play: They allow us to make machine learning models, data pipelines and development environments stable, portable and scalable\u200a\u2014\u200aregardless of where they are executed.<\/p>\n<p class=\"wp-block-paragraph\" id=\"22e8\">Let\u2019s take a closer look.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"d521\"><strong>Table of Content<\/strong>s<br \/><a href=\"https:\/\/towardsdatascience.com\/#1.0\">1 \u2014 Containers vs. Virtual Machines: Why containers are more flexible than VMs<\/a><br \/><a href=\"https:\/\/towardsdatascience.com\/#2.0\">2 \u2014 Containers &amp; Data Science: Do I really need Containers? And 4 reasons why the answer is yes.<\/a><br \/><a href=\"https:\/\/towardsdatascience.com\/#3.0\">3 \u2014 First Practice, then Theory: Container creation even without much prior knowledge<\/a><br \/><a href=\"https:\/\/towardsdatascience.com\/#4.0\">4 \u2014 Your 101 Cheatsheet: The most important Docker commands &amp; concepts at a glance<\/a><br \/><a href=\"https:\/\/towardsdatascience.com\/#final\">Final Thoughts: Key takeaways as a data scientist<\/a><br \/><a href=\"https:\/\/towardsdatascience.com\/#where\">Where Can You Continue Learning?<\/a><\/p>\n<\/blockquote>\n<h2 class=\"wp-block-heading\" id=\"1.0\">1 \u2014 Containers vs. Virtual Machines: Why containers are more flexible than VMs<\/h2>\n<p class=\"wp-block-paragraph\" id=\"861b\">Containers are lightweight, isolated environments. They contain applications with all their dependencies. 
They also share the kernel of the host operating system, making them fast, portable and resource-efficient.<\/p>\n<p class=\"wp-block-paragraph\" id=\"45b6\">I have written extensively about virtual machines (VMs) and virtualization in \u2018<a href=\"https:\/\/towardsdatascience.com\/virtualization-containers-for-data-science-newbies\/\" rel=\"noreferrer noopener\" target=\"_blank\">Virtualization &amp; Containers for Data Science Newbies<\/a>\u2019. The key point is that VMs simulate complete computers and have their own operating system with their own kernel on a hypervisor. This means that they require more resources, but also offer greater isolation.<\/p>\n<p class=\"wp-block-paragraph\" id=\"b8ca\">Both containers and VMs are virtualization technologies.<\/p>\n<p class=\"wp-block-paragraph\" id=\"b0a3\">Both make it possible to run applications in an isolated environment.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2291\">But in the two descriptions, you can also see the 3 most important differences:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Architecture: While each VM has its own operating system (OS) and runs on a hypervisor, containers share the kernel of the host operating system. However, containers still run in isolation from each other. A hypervisor is the software or firmware layer that manages VMs and abstracts the operating system of the VMs from the physical hardware. This makes it possible to run multiple VMs on a single physical server.<\/li>\n<li class=\"wp-block-list-item\">Resource consumption: As each VM contains a complete OS, it requires a lot of memory and CPU. Containers, on the other hand, are more lightweight because they share the kernel of the host OS.<\/li>\n<li class=\"wp-block-list-item\">Portability: You have to customize a VM for different environments because it requires its own operating system with specific drivers and configurations that depend on the underlying hardware. 
A container, on the other hand, can be created once and runs anywhere a container runtime is available (Linux, Windows, cloud, on-premise). Container runtime is the software that creates, starts and manages containers \u2014 the best-known example is Docker.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"c7d3cf\" data-has-transparency=\"true\" style=\"--dominant-color: #c7d3cf;\" fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"673\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V3oynmoJpAZNgw7lIViyrg-1024x673.webp?resize=1024%2C673&#038;ssl=1\" alt=\"\" class=\"wp-image-598161 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V3oynmoJpAZNgw7lIViyrg-1024x673.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V3oynmoJpAZNgw7lIViyrg-300x197.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V3oynmoJpAZNgw7lIViyrg-768x505.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V3oynmoJpAZNgw7lIViyrg.webp 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Created by the author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"7e51\">You can experiment faster with Docker \u2014 whether you\u2019re testing a new ML model or setting up a data pipeline. You can package everything in a container and run it immediately. And you don\u2019t have any \u201cIt works on my machine\u201d-problems. Your container runs the same everywhere \u2014 so you can simply share it.<\/p>\n<h2 class=\"wp-block-heading\" id=\"2.0\">2 \u2014 Containers &amp; Data Science: Do I really need Containers? 
And 4 reasons why the answer is yes.<\/h2>\n<p class=\"wp-block-paragraph\" id=\"fe06\">As a data scientist, your main task is to analyze, process and model data to gain valuable insights and predictions, which in turn are important for management.<\/p>\n<p class=\"wp-block-paragraph\" id=\"fac6\">Of course, you don\u2019t need to have the same in-depth knowledge of containers, Docker or Kubernetes as a DevOps Engineer or a Site Reliability Engineer (SRE). Nevertheless, it is worth having container knowledge at a basic level\u200a\u2014\u200abecause these are 4 examples of where you will come into contact with it sooner or later:<\/p>\n<h3 class=\"wp-block-heading\" id=\"7dda\">Model deployment<\/h3>\n<p class=\"wp-block-paragraph\" id=\"b404\">You are training a model. You not only want to use it locally but also make it available to others. To do this, you can pack it into a container and make it available via a REST API.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2a4b\">Let\u2019s look at a concrete example: Your trained model runs in a Docker container with FastAPI or Flask. The server receives the requests, processes the data and returns ML predictions in real-time.<\/p>\n<h3 class=\"wp-block-heading\" id=\"8af5\">Reproducibility and easier collaboration<\/h3>\n<p class=\"wp-block-paragraph\" id=\"74fe\">ML models and pipelines require specific libraries. For example, if you want to use a deep learning model like a Transformer, you need TensorFlow or PyTorch. If you want to train and evaluate classic machine learning models, you need Scikit-Learn, NumPy and Pandas. A Docker container now ensures that your code runs with exactly the same dependencies on every computer, server or in the cloud. 
You can also deploy a Jupyter Notebook environment as a container so that other people can access it and use exactly the same packages and settings.<\/p>\n<h3 class=\"wp-block-heading\" id=\"453f\">Cloud integration<\/h3>\n<p class=\"wp-block-paragraph\" id=\"8c68\">Containers include all packages, dependencies and configurations that an application requires. They therefore run uniformly on local computers, servers or cloud environments. This means you don\u2019t have to reconfigure the environment.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d0cb\">For example, you write a data pipeline script. This works locally for you. As soon as you deploy it as a container, you can be sure that it will run in exactly the same way on AWS, Azure, GCP or the IBM Cloud.<\/p>\n<h3 class=\"wp-block-heading\" id=\"8d2a\">Scaling with Kubernetes<\/h3>\n<p class=\"wp-block-paragraph\" id=\"933b\">Kubernetes helps you to orchestrate containers. But more on that below. If you now get a lot of requests for your ML model, you can scale it automatically with Kubernetes. This means that more instances of the container are started.<\/p>\n<h2 class=\"wp-block-heading\" id=\"3.0\">3 \u2014 First Practice, then Theory: Container creation even without much prior knowledge<\/h2>\n<p class=\"wp-block-paragraph\" id=\"f5dc\">Let\u2019s take a look at an example that anyone can run through with minimal time \u2014 even if you haven\u2019t heard much about Docker and containers. It took me 30 minutes.<\/p>\n<p class=\"wp-block-paragraph\" id=\"9f51\">We\u2019ll set up a Jupyter Notebook inside a Docker container, creating a portable, reproducible Data Science environment. 
Once it\u2019s up and running, we can easily share it with others and ensure that everyone works with the exact same setup.<\/p>\n<h3 class=\"wp-block-heading\" id=\"2747\">0 \u2014 Install Docker Desktop and create a project directory<\/h3>\n<p class=\"wp-block-paragraph\" id=\"0070\">To be able to use containers, we need Docker Desktop. To get it, we\u00a0<a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\" rel=\"noreferrer noopener\" target=\"_blank\">download Docker Desktop from the official website<\/a>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"c125\">Now we create a new folder for the project. You can do this directly in the desired folder. I do this via the terminal; on Windows, press Windows + R and open CMD.<\/p>\n<p class=\"wp-block-paragraph\" id=\"7895\">We use the following command to create the folder, which the later steps refer to as <code>jupyter-docker<\/code>:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"202020\" data-has-transparency=\"false\" style=\"--dominant-color: #202020;\" decoding=\"async\" width=\"653\" height=\"171\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V1A9LcqYOANw6E0dARlFdA.webp?resize=653%2C171&#038;ssl=1\" alt=\"\" class=\"wp-image-598162 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V1A9LcqYOANw6E0dARlFdA.webp 653w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_V1A9LcqYOANw6E0dARlFdA-300x79.webp 300w\" sizes=\"(max-width: 653px) 100vw, 653px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\" id=\"f624\">1. Create a Dockerfile<\/h3>\n<p class=\"wp-block-paragraph\" id=\"9e03\">Now we open VS Code or another editor and create a new file with the name \u2018Dockerfile\u2019. 
We save this file without an extension in the project directory.\u00a0No extension is needed because <code>docker build<\/code> looks for a file named exactly <code>Dockerfile<\/code> by default.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d299\">We add the following code to this file:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-json\"># Use the official Jupyter notebook image with SciPy\nFROM jupyter\/scipy-notebook:latest  \n\n# Set the working directory inside the container\nWORKDIR \/home\/jovyan\/work  \n\n# Copy all local files into the container\nCOPY . .\n\n# Start Jupyter Notebook without token\nCMD [\"start-notebook.sh\", \"--NotebookApp.token=''\"]<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"5e63\">We have thus defined a container environment for Jupyter Notebook that is based on the official Jupyter SciPy Notebook image.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4d16\">First, we use <code>FROM<\/code> to define the base image on which the container is built. <code>jupyter\/scipy-notebook:latest<\/code> is a preconfigured Jupyter notebook image and contains libraries such as NumPy, SciPy, Matplotlib and Pandas. Alternatively, we could also use a different image here.<\/p>\n<p class=\"wp-block-paragraph\" id=\"c6e5\">With <code>WORKDIR<\/code> we set the working directory within the container. <code>\/home\/jovyan\/work<\/code> is the default path used by Jupyter. <code>jovyan<\/code> is the default user in Jupyter Docker images. Another directory could also be selected \u2014 but this directory is best practice for Jupyter containers.<\/p>\n<p class=\"wp-block-paragraph\" id=\"f6a0\">With <code>COPY . 
.<\/code> we copy all files from the local directory \u2014 in this case the Dockerfile, which is located in the <code>jupyter-docker<\/code> directory \u2014 to the working directory <code>\/home\/jovyan\/work<\/code> in the container.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d258\">With <code>CMD [\"start-notebook.sh\", \"--NotebookApp.token=''\"]<\/code> we specify the default start command for the container: it runs the start script for Jupyter Notebook and starts the notebook without a token, which allows us to access it directly via the browser.<\/p>\n<h3 class=\"wp-block-heading\" id=\"fa99\">2. Create the Docker image<\/h3>\n<p class=\"wp-block-paragraph\" id=\"6c67\">Next, we will build the Docker image. Make sure the previously installed Docker Desktop is running. We now go back to the terminal and use the following command:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">cd jupyter-docker\ndocker build -t my-jupyter .<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"3458\">With <code>cd jupyter-docker<\/code> we navigate to the folder we created earlier. With <code>docker build<\/code> we create a Docker image from the Dockerfile. With <code>-t my-jupyter<\/code> we give the image a name. The dot sets the build context to the current directory: Docker looks for the Dockerfile there, and <code>COPY<\/code> can only copy files from that directory. Note the space between the image name and the dot.<\/p>\n<p class=\"wp-block-paragraph\" id=\"422f\">The Docker image is the template for the container. This image contains everything needed for the application, such as the operating system base (e.g. Ubuntu), the runtime (e.g. Python and Jupyter), dependencies such as Pandas and NumPy, the application code and the startup commands. When we \u201cbuild\u201d a Docker image, this means that Docker reads the Dockerfile and executes the steps that we have defined there. 
The container can then be started from this template (Docker image).<\/p>\n<p class=\"wp-block-paragraph\" id=\"4d53\">We can now watch the Docker image being built in the terminal.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"17222c\" data-has-transparency=\"false\" style=\"--dominant-color: #17222c;\" decoding=\"async\" width=\"1024\" height=\"535\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_eL3OSNKJTwRRI8pFkHyggw-1024x535.webp?resize=1024%2C535&#038;ssl=1\" alt=\"\" class=\"wp-image-598163 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_eL3OSNKJTwRRI8pFkHyggw-1024x535.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_eL3OSNKJTwRRI8pFkHyggw-300x157.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_eL3OSNKJTwRRI8pFkHyggw-768x401.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_eL3OSNKJTwRRI8pFkHyggw.webp 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"3704\">We use <code>docker images<\/code> to check whether the image exists. 
If <code>my-jupyter<\/code> appears in the output, the build was successful.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">docker images<\/code><\/pre>\n<p class=\"wp-block-paragraph\">If so, we see the data for the created Docker image:<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"252525\" data-has-transparency=\"false\" style=\"--dominant-color: #252525;\" loading=\"lazy\" decoding=\"async\" width=\"761\" height=\"83\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kMTvvysX74TI108ccqevnQ.webp?resize=761%2C83&#038;ssl=1\" alt=\"\" class=\"wp-image-598164 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kMTvvysX74TI108ccqevnQ.webp 761w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kMTvvysX74TI108ccqevnQ-300x33.webp 300w\" sizes=\"auto, (max-width: 761px) 100vw, 761px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\" id=\"09f9\">3. Start Jupyter container<\/h3>\n<p class=\"wp-block-paragraph\" id=\"1e80\">Next, we want to start the container and use this command to do so:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">docker run -p 8888:8888 my-jupyter<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"c0ec\">We start a container with <code>docker run<\/code>. The last argument, <code>my-jupyter<\/code>, is the name of the image from which the container is started. With <code>-p 8888:8888<\/code> we map port 8888 on the local machine to port 8888 in the container. Jupyter runs on this port. 
<\/p>\n<p class=\"wp-block-paragraph\" id=\"11c4\">Alternatively, you can also perform this step in Docker Desktop:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"ced8eb\" data-has-transparency=\"false\" style=\"--dominant-color: #ced8eb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"367\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kryRnbK_xYGev5b2zghazw-1024x367.webp?resize=1024%2C367&#038;ssl=1\" alt=\"\" class=\"wp-image-598165 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kryRnbK_xYGev5b2zghazw-1024x367.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kryRnbK_xYGev5b2zghazw-300x108.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kryRnbK_xYGev5b2zghazw-768x275.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_kryRnbK_xYGev5b2zghazw.webp 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\" id=\"d86e\">4. Open Jupyter Notebook &amp; create a test notebook<\/h3>\n<p class=\"wp-block-paragraph\" id=\"44eb\">Now we open the URL <a href=\"http:\/\/localhost:8888\/\">http:\/\/localhost:8888<\/a>\u00a0in the browser. 
You should now see the Jupyter Notebook interface.<\/p>\n<p class=\"wp-block-paragraph\" id=\"170b\">Here we will now create a Python 3 notebook and insert the following Python code into it.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\nimport matplotlib.pyplot as plt\n\nx = np.linspace(0, 10, 100)\ny = np.sin(x)\n\nplt.plot(x, y)\nplt.title(\"Sine Wave\")\nplt.show()<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Running the code will display the sine curve:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f0f3f4\" data-has-transparency=\"false\" style=\"--dominant-color: #f0f3f4;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"512\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__sle74116Vo2ltBrIE24Yg-1024x512.webp?resize=1024%2C512&#038;ssl=1\" alt=\"\" class=\"wp-image-598166 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__sle74116Vo2ltBrIE24Yg-1024x512.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__sle74116Vo2ltBrIE24Yg-300x150.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__sle74116Vo2ltBrIE24Yg-768x384.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1__sle74116Vo2ltBrIE24Yg.webp 1270w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\" id=\"054a\">5. 
Terminate the container<\/h3>\n<p class=\"wp-block-paragraph\" id=\"aecc\">At the end, we end the container either with \u2018CTRL + C\u2019 in the terminal or in Docker Desktop.<\/p>\n<p class=\"wp-block-paragraph\" id=\"6052\">With <code>docker ps<\/code> we can check in the terminal whether containers are still running and with <code>docker ps -a<\/code> we can display the container that has just been terminated:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"191919\" data-has-transparency=\"false\" style=\"--dominant-color: #191919;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"111\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Oe8l1asejIIaPuQF1qGrLQ-1024x111.webp?resize=1024%2C111&#038;ssl=1\" alt=\"\" class=\"wp-image-598167 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Oe8l1asejIIaPuQF1qGrLQ-1024x111.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Oe8l1asejIIaPuQF1qGrLQ-300x33.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Oe8l1asejIIaPuQF1qGrLQ-768x83.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_Oe8l1asejIIaPuQF1qGrLQ.webp 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Screenshot taken by the author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\" id=\"f360\">6. Share your Docker image<\/h3>\n<p class=\"wp-block-paragraph\" id=\"7d86\">If you now want to upload your Docker image to a registry, you can do this with the following command. This will upload your image to Docker Hub (you need a Docker Hub account for this). 
You can also upload it to a private registry such as Amazon Elastic Container Registry, Google Artifact Registry, Azure Container Registry or IBM Cloud Container Registry.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-json\"># Log in to Docker Hub\ndocker login\n\n# Tag the local image with your Docker Hub namespace\ndocker tag my-jupyter your-dockerhub-name\/my-jupyter:latest\n\n# Push the tagged image (the name must match the tag above)\ndocker push your-dockerhub-name\/my-jupyter:latest<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"c0ab\">If you then open Docker Hub and go to your repositories in your profile, the image should be visible.<\/p>\n<p class=\"wp-block-paragraph\" id=\"6310\">This was a very simple example to get started with Docker. If you want to dive a little deeper, you can deploy a trained ML model with FastAPI via a container.<\/p>\n<h2 class=\"wp-block-heading\" id=\"4.0\">4 \u2014 Your 101 Cheatsheet: The most important Docker commands &amp; concepts at a glance<\/h2>\n<p class=\"wp-block-paragraph\" id=\"b428\">You can actually think of a container like a shipping container. Regardless of whether you load it onto a ship (local computer), a truck (cloud server) or a train (data center) \u2014 the content always remains the same.<\/p>\n<h3 class=\"wp-block-heading\" id=\"9a5d\">The most important Docker terms<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Container: Lightweight, isolated environment for applications that contains all dependencies.<\/li>\n<li class=\"wp-block-list-item\">Docker: The most popular container platform that allows you to create and manage containers.<\/li>\n<li class=\"wp-block-list-item\">Docker Image: A read-only template that contains code, dependencies and system libraries.<\/li>\n<li class=\"wp-block-list-item\">Dockerfile: Text file with commands to create a Docker image.<\/li>\n<li class=\"wp-block-list-item\">Kubernetes: Orchestration tool to manage many containers automatically.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"452f\">The basic concepts behind containers<\/h3>\n<ul class=\"wp-block-list\">\n<li 
class=\"wp-block-list-item\">Isolation: Each container contains its own processes, libraries and dependencies<\/li>\n<li class=\"wp-block-list-item\">Portability: Containers run wherever a container runtime is installed.<\/li>\n<li class=\"wp-block-list-item\">Reproducibility: You can create a container once and it runs exactly the same everywhere.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\" id=\"6030\"><strong>The most basic Docker commands<\/strong><\/h3>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-json\">docker --version # Check if Docker is installed\ndocker ps # Show running containers\ndocker ps -a # Show all containers (including stopped ones)\ndocker images # List of all available images\ndocker info # Show system information about the Docker installation\n\ndocker run hello-world # Start a test container\ndocker run -d -p 8080:80 nginx # Start Nginx in the background (-d) with port forwarding\ndocker run -it ubuntu bash # Start interactive Ubuntu container with bash\n\ndocker pull ubuntu # Load an image from Docker Hub\ndocker build -t my-app . # Build an image from a Dockerfile\n<\/code><\/pre>\n<h2 class=\"wp-block-heading\" id=\"final\">Final Thoughts: Key takeaways as a data scientist<\/h2>\n<p class=\"wp-block-paragraph\" id=\"1145\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f449.png?ssl=1\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> With Containers you can solve the \u201cIt works on my machine\u201d problem. 
Containers ensure that ML models, data pipelines, and environments run identically everywhere, independent of OS or dependencies.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4ae9\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f449.png?ssl=1\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> Containers are more lightweight and flexible than virtual machines. While VMs come with their own operating system and consume more resources, containers share the host operating system and start faster.<\/p>\n<p class=\"wp-block-paragraph\" id=\"783b\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f449.png?ssl=1\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> There are three key steps when working with containers: Create a Dockerfile to define the environment, use <code>docker build<\/code> to create an image, and run it with <code>docker run<\/code> \u2014 optionally pushing it to a registry with <code>docker push<\/code>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"aca4\">And then there\u2019s Kubernetes.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2d1e\">A term that comes up a lot in this context: An orchestration tool that automates container management, ensuring scalability, load balancing and fault recovery. This is particularly useful for microservices and cloud applications.<\/p>\n<p class=\"wp-block-paragraph\" id=\"316b\">Before Docker, VMs were the go-to solution (see more in \u2018<a href=\"https:\/\/towardsdatascience.com\/virtualization-containers-for-data-science-newbies\/\" rel=\"noreferrer noopener\" target=\"_blank\">Virtualization &amp; Containers for Data Science Newbies<\/a>\u2019). VMs offer strong isolation, but require more resources and start more slowly.<\/p>\n<p class=\"wp-block-paragraph\" id=\"c5bf\">So, Docker was developed in 2013 by Solomon Hykes to solve this problem. 
Instead of virtualizing entire operating systems, containers run independently of the environment \u2014 whether on your laptop, a server or in the cloud. They contain all the necessary dependencies so that they work consistently everywhere.<\/p>\n<p class=\"wp-block-paragraph\" id=\"6614\">I simplify tech for curious minds<img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f680.png?ssl=1\" alt=\"\ud83d\ude80\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> If you enjoy my tech insights on Python, data science, <a href=\"https:\/\/towardsdatascience.com\/tag\/data-engineering\/\" title=\"Data Engineering\">Data Engineering<\/a>, machine learning and AI, consider subscribing to my\u00a0<a href=\"https:\/\/sarahleaschrch.substack.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">substack<\/a>.<\/p>\n<h2 class=\"wp-block-heading\" id=\"where\">Where Can You Continue Learning?<\/h2>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/virtualization-containers-for-data-science-newbies\/\" target=\"_blank\" rel=\"noreferrer noopener\">Towards Data Science \u2014 Virtualization &amp; Containers for Data Science Newbies<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/docs.docker.com\/get-started\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker Docs \u2014 Get started<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/kubernetes.io\/docs\/tutorials\/kubernetes-basics\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes \u2014 Learn Kubernetes Basics<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/www.freecodecamp.org\/news\/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b\/\" target=\"_blank\" rel=\"noreferrer noopener\">FreeCodeCamp \u2014 Container, VMs &amp; Docker<\/a><\/li>\n<li class=\"wp-block-list-item\"><a 
href=\"https:\/\/www.datacamp.com\/blog\/learn-docker\" target=\"_blank\" rel=\"noreferrer noopener\">DataCamp Blog \u2014 How to learn Docker from Scratch<\/a><\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/www.datacamp.com\/courses\/containerization-and-virtualization-concepts\" target=\"_blank\" rel=\"noreferrer noopener\">DataCamp \u2014 Course Containerization and Virtualization<\/a>\u00a0(first part is free \u2014 no affiliate link)<\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/www.ibm.com\/think\/topics\/containers\" target=\"_blank\" rel=\"noreferrer noopener\">IBM Blog and videos \u2014 What are containers?<\/a><\/li>\n<\/ul>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge\/\">Why Data Scientists Should Care about Containers \u2014 and Stand Out with This Knowledge<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p>Sarah Lea<br \/>\n<a href=\"https:\/\/towardsdatascience.com\/why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge\/\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Why Data Scientists Should Care about Containers \u2014 and Stand Out with This Knowledge \u201cI train models, analyze data and create dashboards \u2014 why should I care about Containers?\u201d Many people who are new to the world of data science ask themselves this question. 
But imagine you have trained a model that runs perfectly on [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,1081,401,83,1082,311,160],"tags":[1084,84,314],"class_list":["post-1955","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-containers","category-data-engineering","category-data-science","category-docker","category-getting-started","category-programming","tag-containers","tag-data","tag-why"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1955"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1955"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1955\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1955"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1955"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}