Concurrent and Parallel Programming in Python with asyncio, threading, and multiprocessing¶
This page is mostly based on a short presentation with my friend Caspar. You can get the slides here.
Basics: Concurrency vs. Parallelism¶
When you read about concurrency, multithreading, or parallel programming, there will be two terms that you have to understand: Concurrency and Parallelism.
Concurrency¶
A concurrent execution is not guaranteed to be parallel. Concurrency only means that you have two or more threads (i.e. sequences of operations) whose operations can be interleaved. Interleaved means that the operation sequences are merged into a single sequence, in any order that keeps each thread's own operations in their original order.
For example, you have thread A with operations [A1, A2, A3] and thread B with operations [B1, B2, B3]. In a concurrent execution with a single execution unit (i.e. CPU core), you might run A1 and A2, then B1, then A3 and finish with B2 and B3. However, A1 and B1 (and every other combination of operations) will never be executed at the same time, only one after another.
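To make the idea of interleaving more concrete, the following sketch enumerates every possible schedule for the two threads from the example on a single executor. The helper name interleavings is made up for this illustration; it only assumes that each thread's own operations must stay in order.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    total = len(a) + len(b)
    # Choose which slots of the merged sequence are taken by operations of a.
    for slots_for_a in combinations(range(total), len(a)):
        slots = set(slots_for_a)
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(total)]

thread_a = ["A1", "A2", "A3"]
thread_b = ["B1", "B2", "B3"]
schedules = list(interleavings(thread_a, thread_b))

print(len(schedules))  # 20 possible schedules
print(["A1", "A2", "B1", "A3", "B2", "B3"] in schedules)  # True
```

Every schedule is a plain sequence: no two operations ever share a position, which is exactly the "one after another" property described above.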
The main benefit of concurrency is that multiple threads can each make some progress within a given time span; you don't have to wait for one thread to finish before progressing a second one. For example, you can react to user input in the GUI while doing a lengthy background calculation. This means that your application can be more responsive to user inputs. However, you might not see a large performance increase for CPU-intensive tasks, because you still have a single executor (i.e. CPU core) for your threads. If you need more performance, you have to use parallelism.
Parallelism¶
When you have a parallel execution, then it will be concurrent as well. The main difference is that two or more operations can be executed at the exact same time, for example on two CPU cores. This has large benefits for the performance of your program: it finishes more quickly, or it has more throughput in the same time.
Examples of Concurrent and/or Parallel Software¶
- When your browser runs JavaScript from a website, it's executed concurrently on a single executor (with an event loop). However, internal components of the JavaScript engine, like IO, run in parallel. If you need to do more CPU processing in your own code, you can use Web Workers, which don't block the main event loop.
- The scientific computing package NumPy for Python can do calculations in parallel.
Basics: Python Implementations¶
Python is an interpreted language. There are multiple interpreters available for Python, like:
- CPython ("official" reference implementation)
- MicroPython (for microcontrollers)
- Jython (Java implementation on JVM)
- Stackless Python
If you have a standard Python installation, you most likely have CPython installed. It's important to know that CPython compiles user scripts to bytecode before executing them.
A major difference between implementations is whether they contain the Global Interpreter Lock (GIL) or not.
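The bytecode compilation step mentioned above can be inspected with the standard library module dis. This is only a small sketch with a made-up function add; the exact instruction names differ between CPython versions, so no specific output is shown.

```python
import dis

def add(a, b):
    return a + b

# The compiled bytecode is stored on the function's code object.
raw_bytecode = add.__code__.co_code

# dis.get_instructions() decodes it into readable instructions.
instructions = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw_bytecode))
print(instructions)
```

These bytecode instructions are what the GIL, described in the next section, protects: only one thread at a time may execute them.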
The Global Interpreter Lock (GIL)¶
In CPython, the GIL ensures that only one Python thread can run bytecode at the same time. This ensures exclusive access to interpreter internals for the current thread, because accessing the internal data structures is not thread safe.
In the following diagram, you can see two Python threads. Thread 1 takes the GIL first and blocks Thread 2 until:
- A timeout of 5 ms is reached
- Thread 1 does a syscall (like blocking IO/network operations or calling time.sleep())
- Thread 1 calls a special library function from NumPy, SciPy, zlib, ...
  - Note: Some functions of these libraries are implemented in C in such a way that they don't require the GIL while doing CPU-intensive work.
flowchart LR
classDef PythonInterpreter stroke:green,stroke-width:2px,stroke-dasharray: 5 5
subgraph Int1[Python Interpreter]
direction BT
GIL("🔒 GIL")
T1(Thread 1) -- 1st access --> GIL
T2(Thread 2) -- 2nd access ---x GIL
linkStyle 1 stroke:red,stroke-dasharray: 3 3;
end
class Int1 PythonInterpreter
In most cases, this is not a problem for performance, as blocking waits for IO completion have more impact. However, if you implement a CPU-intensive task purely in Python (e.g. image processing or calculations without external libraries), you might run into a bottleneck.
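Two of the points about the GIL can be checked with a short experiment. The sketch below reads the interpreter's switch interval with sys.getswitchinterval() and then starts four threads that only sleep. The function name wait_a_bit and the chosen durations are arbitrary assumptions for this illustration.

```python
import sys
import threading
import time

# Interval after which CPython asks the running thread to release the GIL.
switch_interval = sys.getswitchinterval()
print(f"switch interval: {switch_interval * 1000:.1f} ms")

def wait_a_bit():
    # time.sleep() releases the GIL, so other threads can run meanwhile.
    time.sleep(0.2)

threads = [threading.Thread(target=wait_a_bit) for _ in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four sequential sleeps would need about 0.8 s.
print(f"4 sleeping threads took {elapsed:.2f} s")
```

Because each thread gives up the GIL while it sleeps, the total time should stay close to a single sleep. A pure-Python CPU-bound loop in place of the sleep would not show this overlap.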
Showcase of Libraries¶
Threading (thread based)¶
The threading library is a simple library to create threads that run concurrently. These threads are kernel-level threads, not user-level threads. As explained above, you have limited parallelism due to the GIL.
flowchart LR
classDef PythonInterpreter stroke:green,stroke-width:2px,stroke-dasharray: 5 5
subgraph Int1[Python Interpreter]
direction BT
GIL("🔒 GIL")
T1(Thread 1) -- 1st --> GIL
T2(Thread 2) -- 2nd --> GIL
end
class Int1 PythonInterpreter
Using the threading library is straightforward, as you can see in the following example:
from threading import Thread
import time

def my_func(line: str):
    time.sleep(5)
    print(f"Output: {line}")

t1 = Thread(target=my_func, args=("test",))
t2 = Thread(target=my_func, args=("test2",))
t1.start()
t2.start()

# ... do something else

# Wait until Thread 1 and 2 are finished
t1.join()
t2.join()
You can inherit from Thread as well if you prefer an object-oriented style:
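# Alternative: create a subclass of Thread
from threading import Thread
import time

class MyThread(Thread):
    def run(self):
        # run() is executed in the new thread once start() is called
        time.sleep(5)
        print("done")

# ...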
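A subclass is also a convenient place to keep input and output of the thread together. The following is a minimal sketch, not part of the original presentation: the attributes line and result are hypothetical names chosen for this illustration.

```python
from threading import Thread
import time

class MyThread(Thread):
    def __init__(self, line: str):
        super().__init__()  # always initialise the Thread base class
        self.line = line
        self.result = None

    def run(self):
        # Runs in the new thread after start() has been called.
        time.sleep(0.1)
        self.result = f"Output: {self.line}"

t = MyThread("test")
t.start()
t.join()  # after join(), run() has finished and result is set
print(t.result)  # Output: test
```

Note that you call start(), not run(): calling run() directly would execute the method in the current thread.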
asyncio (coroutine based)¶
The asyncio library follows a different paradigm by using coroutines and an event loop. Additionally, it uses the syntax keywords async def and await. Use cases for this are lightweight IO tasks, like handling HTTP requests in a web server.
flowchart LR
classDef PythonInterpreter stroke:green,stroke-width:2px,stroke-dasharray: 5 5
subgraph Int1[Python Interpreter]
direction BT
subgraph T[Thread 1]
direction LR
EL("🔁<br/>Event Loop") -- get new task --> TQ("🗄️ <br/>Task Queue")
direction TB
EL -- run task asynchronously --> EL
end
T --> GIL("🔒GIL")
end
class Int1 PythonInterpreter
When you use asyncio, you have to consider the following two "contexts":
- The normal context: This context is what you're used to while programming Python. You can call normal functions declared with def, where the function call blocks until the function returns. However, you can't run async def functions by calling them directly.
- The async context: This is your context inside of an async def function. You can call normal def functions as usual, but now you can call other async def functions as well. These async def functions are then executed asynchronously, and when you need their result, you can wait for them with await.
When you declare a function with async def, it is considered a native coroutine function. When you call this function, it returns a coroutine object. However, the coroutine won't run automatically. There are three ways to run a coroutine object:
- Call asyncio.run(coroutine_object) from a normal context
- Use await awaitable_object from an async context (an awaitable object can be a coroutine object or a task)
- Create a task with asyncio.create_task(coroutine_object) from an async context
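The difference between calling a coroutine function and running the resulting coroutine object can be seen in a few lines. This is a minimal sketch with made-up function names (answer, doubled); it uses the first two of the three ways listed above.

```python
import asyncio

async def answer():
    return 42

async def doubled():
    # Way 2: await a coroutine object from an async context.
    value = await answer()
    return value * 2

coro = answer()  # nothing runs yet, this only creates a coroutine object
is_coroutine = asyncio.iscoroutine(coro)
print(is_coroutine)  # True

# Way 1: run a coroutine object from the normal context.
result = asyncio.run(coro)
result_doubled = asyncio.run(doubled())
print(result, result_doubled)  # 42 84
```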
The following example shows how you can run coroutine objects by creating a task. By creating a task, you can guarantee that your coroutine will run sometime during the lifetime of your program. However, you have to store a reference to your task somewhere to prevent the garbage collector from freeing the task before it can be executed (see also The Heisenbug lurking in your async code).
import asyncio

TASK_LIST = []

async def calc_coro():
    print("calculating...")
    await asyncio.sleep(2)  # some asynchronous operation
    print("calc done")
    return "foo"

async def main():
    coroutine_object = calc_coro()  # 2. call coroutine function to get coroutine object
    task = asyncio.create_task(coroutine_object)  # 3. run obtained coroutine object with task
    TASK_LIST.append(task)  # keep a reference so the task is not garbage collected
    print("do other stuff")
    await task  # can be skipped if completion / result of task is not important
    # If called before the task is done (i.e. without await): raises InvalidStateError
    calcresult = task.result()
    print(f"Result of calculation: {calcresult}")

if __name__ == "__main__":
    asyncio.run(main())  # 1. run coroutine from normal context
A more modern and simpler alternative to storing Task references in a global list is using a TaskGroup. The async with block waits until all tasks that were created in the group are finished.
import asyncio

async def mylog(line: str):
    await asyncio.sleep(1)
    print("Output: " + line)

async def main():
    # New in Python 3.11
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(mylog("coro1"))
        task2 = tg.create_task(mylog("coro2"))
    print("all tasks completed")

if __name__ == "__main__":
    asyncio.run(main())
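To see that tasks really overlap on the single event loop thread, you can measure how long several waiting coroutines take together. The sketch below uses asyncio.create_task() with a local list of references so that it also runs on Python versions before 3.11. The coroutine fetch and its delay are stand-ins for a real IO operation, not an actual network call.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for waiting on a network response
    return f"{name} done"

async def main():
    # Keep references in a local list until all tasks are awaited.
    tasks = [asyncio.create_task(fetch(f"job{i}", 0.2)) for i in range(3)]
    return [await task for task in tasks]

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
# Three sequential waits would need about 0.6 s.
print(f"took {elapsed:.2f} s")
```

All three tasks wait at the same time, so the total duration should be close to a single delay. This only works because the coroutines spend their time waiting; a CPU-bound coroutine would block the event loop for everyone else.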
Multiprocessing (process based)¶
The previous two libraries aren't suited for CPU-intensive tasks, as they are limited by the GIL. However, what options do you have if you simply need more performance for your Python program? For this, you can use the multiprocessing library. Instead of creating new threads, this library creates new processes, each running its own Python interpreter with its own GIL, so the processes don't block each other. This means that your code can actually run in parallel, instead of only concurrently.
flowchart LR
classDef PythonInterpreter stroke:green,stroke-width:2px,stroke-dasharray: 5 5
subgraph MP[Multiprocessing]
subgraph Int1[Python Interpreter]
direction BT
T1(Thread 1) --> GIL1("🔒GIL")
end
subgraph Int2[Python Interpreter]
direction BT
T2(Thread 2) --> GIL2("🔒GIL")
end
class Int1,Int2 PythonInterpreter
end
Similar to the threading library, you can pass a target function that the new process should run:
from multiprocessing import Process

def my_process():
    print("this is a second python interpreter")

if __name__ == "__main__":
    p = Process(target=my_process)
    p.start()
    print("this is the first python interpreter")
    p.join()
However, as they are now two separate processes, you can't access the same memory (i.e. variables) anymore. If the processes have to communicate with each other, you can use a Queue:
from multiprocessing import Process, Queue

def my_process(q):
    # sends data through the queue
    q.put(["python", "is", "cool"])

if __name__ == "__main__":
    q = Queue()
    # create a new process -> separate Python interpreter
    p = Process(target=my_process, args=(q,))
    p.start()
    print(q.get())
    p.join()
Alternatively, you can use a Pipe. The main differences between a Queue and a Pipe are [1]:
- A Pipe can only have two endpoints (and thus has better performance).
- A Queue can have multiple producers and consumers.
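from multiprocessing import Process, Pipe

def my_process(pipe):
    # send data through the child's end of the pipe, then close it
    pipe.send(["python", "is", "cool"])
    pipe.close()

if __name__ == "__main__":
    parent_pipe, child_pipe = Pipe()
    # the remaining lines mirror the Queue example above
    p = Process(target=my_process, args=(child_pipe,))
    p.start()
    print(parent_pipe.recv())  # receive the data on the parent's end
    p.join()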
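For CPU-intensive work you often don't need to manage Process objects by hand. As a rough sketch, the following example distributes a pure-Python calculation over a multiprocessing.Pool. The function count_primes, the chosen limits, and the pool size of 4 are assumptions made for this illustration.

```python
from multiprocessing import Pool

def count_primes(limit: int) -> int:
    """CPU-bound helper: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each worker process has its own interpreter and its own GIL,
    # so the four calculations can run on different CPU cores.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

The arguments and return values are sent between the processes behind the scenes, so they have to be picklable. The guard with __name__ is needed because worker processes may import the main module again when they start.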
References¶
- Reitz, K., & Schlusser, T. (2016). The Hitchhiker's Guide to Python. O'Reilly Media.
- Beazley, D. (2010). Understanding the Python GIL.
- Ramalho, L. (2022). Fluent Python: Clear, Concise, and Effective Programming (2nd ed.). O'Reilly Media.
- Python Docs: bytecode
- Python Docs: threading
- Python Docs: asyncio
- Python Docs: multiprocessing
- Python Docs: concurrent.futures