{"id":860,"date":"2024-12-29T07:02:28","date_gmt":"2024-12-29T07:02:28","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/29\/deep-dive-into-multithreading-multiprocessing-and-asyncio-94fdbe0c91f0\/"},"modified":"2024-12-29T07:02:28","modified_gmt":"2024-12-29T07:02:28","slug":"deep-dive-into-multithreading-multiprocessing-and-asyncio-94fdbe0c91f0","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/29\/deep-dive-into-multithreading-multiprocessing-and-asyncio-94fdbe0c91f0\/","title":{"rendered":"Deep Dive into Multithreading, Multiprocessing, and Asyncio"},"content":{"rendered":"<div>\n<h4>How to choose the right concurrency model<\/h4>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AhwsMbtR0kWWcej8AYWOwYA.jpeg?ssl=1\"><figcaption>Image by <a href=\"https:\/\/unsplash.com\/@pinjasaur\">Paul Esch-Laurent<\/a> from <a href=\"https:\/\/unsplash.com\/\">Unsplash<\/a><\/figcaption><\/figure>\n<p>Python provides three main approaches to handling multiple tasks concurrently: multithreading, multiprocessing, and asyncio.<\/p>\n<p>Choosing the right model is crucial for maximising your program\u2019s performance and using system resources efficiently. (P.S. It is also a common interview question!)<\/p>\n<p>Without concurrency, a program processes only one task at a time. During operations like file loading, network requests, or user input, it sits idle waiting for the result, wasting CPU time that could be spent on other work. Concurrency solves this by letting other tasks make progress while one task is waiting.<\/p>\n<p>But which model should you use? 
Let\u2019s dive in!<\/p>\n<h3>Contents<\/h3>\n<ol>\n<li>Fundamentals of concurrency<br \/>&#8211; Concurrency vs parallelism<br \/>&#8211; Programs<br \/>&#8211; Processes<br \/>&#8211; Threads<br \/>&#8211; How does the OS manage threads and processes?<\/li>\n<li>Python\u2019s concurrency models<br \/>&#8211; Multithreading<br \/>&#8211; Python\u2019s Global Interpreter Lock (GIL)<br \/>&#8211; Multiprocessing<br \/>&#8211; Asyncio<\/li>\n<li>When should I use which concurrency model?<\/li>\n<\/ol>\n<h3>Fundamentals of concurrency<\/h3>\n<p>Before jumping into Python\u2019s concurrency models, let\u2019s recap some foundational concepts.<\/p>\n<h4>1. Concurrency vs Parallelism<\/h4>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5zUHBzNkzOZYPu1uvuLcSQ.png?ssl=1\"><figcaption>Visual representation of concurrency vs parallelism (drawn by me)<\/figcaption><\/figure>\n<p>Concurrency is about managing multiple tasks over overlapping periods of time, without necessarily executing them at the same instant. Tasks may take turns on a single CPU core, creating the illusion of multitasking.<\/p>\n<p>Parallelism is about running multiple tasks at literally the same time, typically by leveraging multiple CPU cores.<\/p>\n<h4>2. Programs<\/h4>\n<p>Now let\u2019s move on to some fundamental OS concepts: programs, processes and threads.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A7oO4wA3pvsqXgx6Gwc7twg.png?ssl=1\"><figcaption>Multiple threads can exist simultaneously within a single process, which is known as multithreading (drawn by me)<\/figcaption><\/figure>\n<blockquote><p>A program is simply a static file, like a Python script or an executable.<\/p><\/blockquote>\n<p>A program sits on disk and is passive until the operating system (OS) loads it into memory to run. 
Once this happens, the program becomes a <strong>process<\/strong>.<\/p>\n<h4>3. Processes<\/h4>\n<blockquote><p>A process is an independent instance of a running program.<\/p><\/blockquote>\n<p>A process has its own memory space, resources, and execution state. Processes are isolated from each other, meaning one process cannot access another\u2019s memory unless both are explicitly designed to communicate via mechanisms like <strong><em>inter-process communication (IPC)<\/em><\/strong>.<\/p>\n<p>Processes can generally be categorised into two types:<\/p>\n<ol>\n<li>\n<strong>I\/O-bound processes:<\/strong><br \/>Spend most of their time <strong><em>waiting<\/em><\/strong> for input\/output operations to complete, such as file access, network communication, or user input. While waiting, they do not use the CPU at all.<\/li>\n<li>\n<strong>CPU-bound processes:<\/strong><br \/>Spend most of their time <strong><em>doing computations<\/em><\/strong> (e.g. video encoding, numerical analysis). These tasks require a lot of CPU time.<\/li>\n<\/ol>\n<p><strong>Lifecycle of a process:<\/strong><\/p>\n<ul>\n<li>A process starts in the <em>new<\/em> state when created.<\/li>\n<li>It moves to the <em>ready<\/em> state, waiting for CPU time.<\/li>\n<li>When the OS scheduler assigns it a CPU core, it enters the <em>running<\/em> state.<\/li>\n<li>If the process waits for an event like I\/O, it enters the <em>waiting<\/em> state, and returns to the <em>ready<\/em> state once the event completes.<\/li>\n<li>Finally, it <em>terminates<\/em> after completing its task.<\/li>\n<\/ul>\n<h4>4. Threads<\/h4>\n<blockquote><p>A thread is the smallest unit of execution that the OS can schedule within a process.<br \/>A process acts as a \u201ccontainer\u201d for threads, and multiple threads can be created and destroyed over the process\u2019s lifetime.<\/p><\/blockquote>\n<p>Every process has at least one thread, the <strong><em>main thread<\/em><\/strong>, but it can also create additional threads.<\/p>\n<p>Threads share memory and resources within the same process, enabling efficient communication. 
However, this sharing can lead to synchronisation issues like race conditions or deadlocks if not managed carefully. Unlike processes, threads within a single process are not isolated: one misbehaving thread can crash the entire process.<\/p>\n<h4>5. How does the OS manage threads and processes?<\/h4>\n<p>Each CPU core can execute <strong><em>only one task at a time<\/em><\/strong>. To handle more tasks than there are cores, the operating system uses <strong><em>preemptive context switching<\/em><\/strong>: the OS, not the task, decides when a running task is paused.<\/p>\n<p>During a context switch, the OS pauses the current task, saves its state (such as CPU register values) and loads the state of the next task to be executed.<\/p>\n<p>This rapid switching creates the illusion of simultaneous execution on a single CPU core.<\/p>\n<p>For processes, context switching is more expensive because the OS must also switch to a different memory address space. For threads of the same process, switching is cheaper because they share that memory. In both cases, frequent switching adds overhead that can slow down performance.<\/p>\n<p>True parallel execution requires multiple CPU cores: each core can run a different process or thread at the same instant.<\/p>\n<h3>Python\u2019s concurrency models<\/h3>\n<p>Let\u2019s now explore Python\u2019s specific concurrency models.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AYrsrPOh5y4EL_7UOFJ08yg.png?ssl=1\"><figcaption>Summary of the different concurrency models (drawn by me)<\/figcaption><\/figure>\n<h3>1. 
Multithreading<\/h3>\n<p>Multithreading allows a process to execute multiple threads concurrently, with threads sharing the same memory and resources (see diagrams 2 and 4).<\/p>\n<p>However, in CPython (the standard Python interpreter), the Global Interpreter Lock (GIL) limits multithreading\u2019s effectiveness for CPU-bound tasks.<\/p>\n<h4>Python\u2019s Global Interpreter Lock (GIL)<\/h4>\n<blockquote><p>The GIL is a lock that allows only one thread to hold control of the Python interpreter at any time, meaning only one thread can execute Python bytecode at once.<\/p><\/blockquote>\n<p>The GIL was introduced to simplify memory management in CPython, because many internal operations, such as reference counting and object creation, are not thread-safe on their own. Without a GIL, multiple threads accessing shared interpreter state would require complex fine-grained locks or other synchronisation mechanisms to prevent race conditions and data corruption.<\/p>\n<p><strong><em>When is the GIL a bottleneck?<\/em><\/strong><\/p>\n<ul>\n<li>For single-threaded programs, the GIL is irrelevant because the only thread has exclusive access to the interpreter.<\/li>\n<li>For multithreaded I\/O-bound programs, the GIL is less problematic because threads release it while waiting for I\/O operations.<\/li>\n<li>For multithreaded CPU-bound programs, the GIL becomes a significant bottleneck. Threads competing for the GIL must take turns executing Python bytecode, so adding threads does not speed up pure-Python computation.<\/li>\n<\/ul>\n<p>An interesting case worth noting is time.sleep, which behaves like an I\/O operation from the GIL\u2019s point of view. It is not CPU-bound: no computation or Python bytecode runs during the sleep period, because tracking the elapsed time is delegated to the OS. While sleeping, the thread releases the GIL, allowing other threads to run Python code.<\/p>\n<h3>2. 
Multiprocessing<\/h3>\n<p>Multiprocessing enables a program to run multiple processes in parallel, each with its own Python interpreter, memory, GIL and resources. Within each process, there may be one or more threads (see diagrams 3 and 4).<\/p>\n<p>Because each process has its own GIL, multiprocessing bypasses the GIL\u2019s limitations. This makes it suitable for CPU-bound tasks that require heavy computation.<\/p>\n<p>However, multiprocessing is more resource-intensive: each process needs its own memory and takes time to start, and data passed between processes must be serialised (pickled) and sent via IPC.<\/p>\n<h3>3. Asyncio<\/h3>\n<blockquote><p>Unlike threads or processes, asyncio uses a single thread to handle multiple tasks.<\/p><\/blockquote>\n<p>When writing asynchronous code with the asyncio library, you&#8217;ll use the async\/await keywords to manage tasks.<\/p>\n<h4><strong><em>Key concepts<\/em><\/strong><\/h4>\n<ol>\n<li>\n<strong>Coroutines:<\/strong> Functions defined with async def (strictly, calling such a function returns a coroutine object). They are the core of asyncio and represent work that can be paused and resumed later.<\/li>\n<li>\n<strong>Event loop:<\/strong> The scheduler at the heart of asyncio. It runs tasks one at a time and resumes each paused task once whatever it is waiting for is ready.<\/li>\n<li>\n<strong>Tasks:<\/strong> Wrappers around coroutines. When you want a coroutine to run concurrently with the current one, you turn it into a task, e.g. using asyncio.create_task().<\/li>\n<li>\n<strong>await<\/strong>: Pauses the current coroutine until the awaited coroutine, task or other awaitable completes, giving control back to the event loop if it has to wait.<\/li>\n<\/ol>\n<h4><strong><em>How it works<\/em><\/strong><\/h4>\n<p>Asyncio runs an event loop that schedules tasks. Tasks voluntarily \u201cpause\u201d themselves when waiting for something, like a network response or a timer. While a task is paused, the event loop runs another task that is ready, ensuring no time is wasted waiting.<\/p>\n<p>This makes asyncio ideal for scenarios involving <strong>many small tasks that spend a lot of time waiting<\/strong>, such as handling thousands of web requests or managing database queries. 
Since everything runs on a single thread, asyncio avoids the overhead and complexity of OS thread switching.<\/p>\n<blockquote><p><strong>The key difference between asyncio and multithreading lies in how they handle waiting tasks.<\/strong><\/p><\/blockquote>\n<ul>\n<li>Multithreading relies on the OS to decide when to switch between threads (<strong><em>preemptive context switching<\/em><\/strong>).<br \/>When a thread is waiting, or simply when its time slice runs out, the OS switches to another thread automatically.<\/li>\n<li>Asyncio uses a single thread and depends on tasks to \u201ccooperate\u201d by pausing at an await when they need to wait (<strong><em>cooperative multitasking<\/em><\/strong>). A task that never awaits, such as one running a long computation, blocks every other task on the event loop.<\/li>\n<\/ul>\n<h4>Two ways to write async code:<\/h4>\n<p><strong>Method 1: await coroutine<\/strong><\/p>\n<p>When you directly await a coroutine, the <strong><em>current coroutine pauses<\/em><\/strong> at the await statement until the awaited coroutine finishes. Coroutines awaited this way run <strong><em>sequentially<\/em><\/strong> within the current coroutine.<\/p>\n<p>Use this approach when you need the result of the coroutine <strong><em>immediately<\/em><\/strong> to proceed with the next steps.<\/p>\n<p>Although this might sound like synchronous code, it\u2019s not. In synchronous code, a pause would block the entire thread, and with it the rest of the program.<\/p>\n<blockquote><p>With asyncio, only the current coroutine pauses, while other tasks on the event loop can continue running. 
This makes asyncio non-blocking at the program level.<\/p><\/blockquote>\n<p><strong>Example:<\/strong><\/p>\n<p>Here, main() pauses at the await until fetch_data() is complete.<\/p>\n<pre>import asyncio<br><br>async def fetch_data():<br>    print(\"Fetching data...\")<br>    await asyncio.sleep(1)  # Simulate a network call<br>    print(\"Data fetched\")<br>    return \"data\"<br><br>async def main():<br>    result = await fetch_data()  # Current coroutine pauses here<br>    print(f\"Result: {result}\")<br><br>asyncio.run(main())<\/pre>\n<p><strong>Method 2: asyncio.create_task(coroutine)<\/strong><\/p>\n<p>The coroutine is scheduled to <strong><em>run concurrently in the background<\/em><\/strong>. Unlike await, the current coroutine continues executing immediately without waiting for the scheduled task to finish.<\/p>\n<p><strong><em>The scheduled coroutine starts running as soon as the event loop gets control<\/em><\/strong>, that is, the next time the current coroutine yields control at an await. You do not need to await the task itself for it to start.<\/p>\n<blockquote><p>No new threads are created; instead, the coroutine runs within the same thread as the event loop, which manages when each task gets execution time.<\/p><\/blockquote>\n<p>This approach enables concurrency within the program, allowing multiple tasks to overlap their execution efficiently. You will later need to await the task to get its result and ensure it has finished. Keep a reference to the task until then: the event loop only holds a weak reference to it, so an unreferenced task may be garbage-collected before it completes.<\/p>\n<p>Use this approach when you want to run tasks concurrently and don\u2019t need the results immediately.<\/p>\n<p><strong>Example:<\/strong><\/p>\n<p>When the line asyncio.create_task() is reached, the coroutine fetch_data() is scheduled to start running <strong><em>as soon as the event loop regains control<\/em><\/strong>. This can happen even <strong><em>before<\/em><\/strong> you explicitly await the task. 
In contrast, with the first method, the coroutine only starts executing when the await statement is reached.<\/p>\n<p>Overall, this makes the program more efficient by overlapping the execution of multiple tasks. In the example below, the 1-second fetch_data() runs while main() sleeps for 5 seconds, so the whole program takes about 5 seconds rather than 6.<\/p>\n<pre>import asyncio<br><br>async def fetch_data():<br>    # Simulate a network call<br>    await asyncio.sleep(1)<br>    return \"data\"<br><br>async def main():<br>    # Schedule fetch_data; it starts once main() yields to the event loop<br>    task = asyncio.create_task(fetch_data())<br>    # Simulate doing other work (fetch_data runs during this sleep)<br>    await asyncio.sleep(5)<br>    # Now, await task to get the result<br>    result = await task<br>    print(result)<br><br>asyncio.run(main())<\/pre>\n<h4>Other important points<\/h4>\n<ul>\n<li>\n<strong>You can mix synchronous and asynchronous code.<\/strong><br \/>Blocking synchronous code would stall the event loop, so it can be offloaded to a separate thread using asyncio.to_thread() (available since Python 3.9). This makes your program effectively multithreaded.<br \/>In the example below, the asyncio event loop runs on the main thread, while a separate background thread executes sync_task.<\/li>\n<\/ul>\n<pre>import asyncio<br>import time<br><br>def sync_task():<br>    time.sleep(2)<br>    return \"Completed\"<br><br>async def main():<br>    result = await asyncio.to_thread(sync_task)<br>    print(result)<br><br>asyncio.run(main())<\/pre>\n<ul>\n<li>You should offload computationally intensive CPU-bound tasks to a separate process, for example by passing a concurrent.futures.ProcessPoolExecutor to loop.run_in_executor().<\/li>\n<\/ul>\n<h3>When should I use which concurrency model?<\/h3>\n<p>This flowchart is a good way to decide which model to use.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A7Das-hY02tbJbHPxCmMvXA.png?ssl=1\"><figcaption>Flowchart (drawn by me), referencing this <a href=\"https:\/\/stackoverflow.com\/questions\/27435284\/multiprocessing-vs-multithreading-vs-asyncio\/52498068#52498068\">Stack Overflow<\/a> 
discussion<\/figcaption><\/figure>\n<ol>\n<li>\n<strong>Multiprocessing<\/strong><br \/>&#8211; Best for CPU-bound tasks which are computationally intensive.<br \/>&#8211; When you need to bypass the GIL: each process has its own Python interpreter, allowing for true parallelism.<\/li>\n<li>\n<strong>Multithreading<\/strong><br \/>&#8211; Best for fast I\/O-bound tasks with a limited number of concurrent operations, or when the I\/O goes through blocking libraries that have no async equivalent. Each thread costs memory and OS scheduling overhead, so threads suit a moderate number of connections better than thousands.<br \/>&#8211; Not ideal for CPU-bound tasks due to the GIL.<\/li>\n<li>\n<strong>Asyncio<\/strong><br \/>&#8211; Ideal for slow I\/O-bound tasks such as long network requests or database queries, especially with many concurrent connections, because it handles waiting efficiently and each task is far cheaper than a thread, making it scalable.<br \/>&#8211; Not suitable for CPU-bound tasks without offloading work to other processes.<\/li>\n<\/ol>\n<h3>Wrapping up<\/h3>\n<p>That\u2019s it, folks. There\u2019s a lot more to this topic, but I hope I\u2019ve introduced the key concepts and when to use each model.<\/p>\n<p>Thanks for reading! I write regularly on Python, software development and the projects I build, so give me a follow to not miss out. 
See you in the next article \ud83d\ude42<\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/deep-dive-into-multithreading-multiprocessing-and-asyncio-94fdbe0c91f0\">Deep Dive into Multithreading, Multiprocessing, and Asyncio<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Clara Chong<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fdeep-dive-into-multithreading-multiprocessing-and-asyncio-94fdbe0c91f0\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deep Dive into Multithreading, Multiprocessing, and Asyncio How to choose the right concurrency model Image by Paul Esch-Laurent from\u00a0Unsplash Python provides three main approaches to handle multiple tasks simultaneously: multithreading, multiprocessing, and\u00a0asyncio. Choosing the right model is crucial for maximising your program\u2019s performance and efficiently using system resources. (P.S. 
It is also a common interview [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,996,995,997,159,157],"tags":[998,1000,999],"class_list":["post-860","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-asynchronous","category-concurrency","category-multiprocessing","category-multithreading","category-python","tag-concurrency","tag-multiple","tag-processes"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/860"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=860"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/860\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}