Python performance is rarely about Python being slow — it's about waiting on the network. This lesson covers the three concurrency models you'll actually use and which one to pick for which problem.
The Decision
| Workload | Tool | Why |
|---|---|---|
| Many slow HTTP/DB/cloud calls | asyncio + httpx / aiobotocore | Single thread, no GIL contention, scales to thousands of connections |
| Same as above but only sync libraries available | ThreadPoolExecutor | Threads release the GIL while waiting on I/O |
| Heavy CPU work (image processing, hashing, ML) | ProcessPoolExecutor or numpy | Separate processes sidestep the GIL; numpy does its heavy lifting in compiled code rather than Python bytecode |
Most DevOps work is I/O-bound — async is your default.
Threads for I/O-Bound Sync Code
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
urls = ["https://example.com/" + path for path in ["a", "b", "c", "d", "e"]]
def fetch(url: str) -> tuple[str, int]:
    resp = requests.get(url, timeout=10)
    return url, resp.status_code
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    # as_completed yields futures in finish order, not submit order
    for future in as_completed(futures):
        # result() re-raises any exception that fetch raised in the worker
        url, status = future.result()
        print(url, status)
Python's GIL (Global Interpreter Lock) prevents two threads from running Python bytecode at the same time, but threads do release the GIL while waiting on I/O — which is exactly when we want concurrency.
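To see that overlap without touching the network, here is a minimal sketch that uses time.sleep as a stand-in for a blocking call (sleep releases the GIL in the same way a socket read does). The names fake_io and timed_run are made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay: float) -> float:
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

def timed_run(workers: int, delays: list[float]) -> float:
    """Run fake_io over delays with the given pool size; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io, delays))
    return time.perf_counter() - start

delays = [0.2] * 5
serial = timed_run(1, delays)    # one worker: the waits happen back to back
threaded = timed_run(5, delays)  # five workers: the waits overlap
print(f"1 worker: {serial:.2f}s, 5 workers: {threaded:.2f}s")
```

With one worker the total should be roughly the sum of the delays; with five it should be close to a single delay. Swap the sleep for a pure-Python CPU loop and the gap largely disappears, because the threads then compete for the GIL.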
Processes for CPU-Bound Work
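from concurrent.futures import ProcessPoolExecutor
import hashlib
def hash_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 8 KiB chunks so large files never sit fully in memory
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
paths = ["a.bin", "b.bin", "c.bin", "d.bin"]
# The guard is required wherever workers are spawned rather than forked (the default on Windows and macOS)
if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for path, digest in zip(paths, pool.map(hash_file, paths)):
            print(path, digest)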
Each worker runs in its own Python process, which gives true parallelism across CPU cores. The cost: starting a process is heavier than starting a thread, and both arguments and return values must be picklable.
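A quick way to find out whether something can cross the process boundary is to try pickling it yourself. The sketch below is illustrative only: is_picklable and square are hypothetical helpers, and it calls pickle directly so no worker processes are started.

```python
import pickle
import threading

def square(x: int) -> int:
    """Module-level function: pickled by reference (module name + function name)."""
    return x * x

def is_picklable(obj: object) -> bool:
    """Return True if obj survives pickle.dumps, which is what a process pool needs."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print("module-level function:", is_picklable(square))
print("lambda:", is_picklable(lambda x: x * x))
print("lock:", is_picklable(threading.Lock()))
```

Lambdas, nested functions, locks, sockets and open file handles are the usual offenders. Passing a file path (as hash_file does) and opening the file inside the worker avoids the problem.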
Note: Python 3.13 introduced an experimental free-threaded build that lets you turn the GIL off; the standard build still has it.
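If you are unsure which build you are running, a small check along these lines can tell you. This is a sketch that assumes CPython: sys._is_gil_enabled() exists only on 3.13+, so the code falls back to "GIL on" when it is missing, and gil_status is a name invented for this example.

```python
import sys
import sysconfig

def gil_status() -> str:
    """Describe whether this interpreter is running with the GIL."""
    # Py_GIL_DISABLED is truthy on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; assume the GIL is on when it is absent.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = checker() if checker is not None else True
    if not free_threaded_build:
        return "standard build, GIL enabled"
    return "free-threaded build, GIL " + ("enabled" if gil_enabled else "disabled")

print(gil_status())
```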
asyncio Basics
import asyncio
async def hello(name: str) -> str:
    await asyncio.sleep(1)
    return f"hello, {name}"
async def main():
    # Sequential: total ~3s
    a = await hello("a")
    b = await hello("b")
    c = await hello("c")
    # Concurrent: total ~1s
    results = await asyncio.gather(
        hello("a"),
        hello("b"),
        hello("c"),
    )
    print(results)
asyncio.run(main())
Key vocabulary:
- Coroutine: a function defined with async def. Returns a coroutine object; doesn't run until awaited or scheduled.
- await: pauses the coroutine until the awaited thing finishes.
- Event loop: single-threaded scheduler that runs coroutines. asyncio.run(main()) starts one, runs main, closes it.
- asyncio.gather: schedule many coroutines concurrently and wait for all.
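The first point is the one that surprises people, so here is a small sketch of it. The calls list is only there to record when the function body actually starts running; it is not part of any asyncio API.

```python
import asyncio
import inspect

calls: list[str] = []

async def hello(name: str) -> str:
    calls.append(name)  # records that the body actually started running
    await asyncio.sleep(0)
    return f"hello, {name}"

coro = hello("a")  # creates a coroutine object; the body has not run yet
created_is_coroutine = inspect.iscoroutine(coro)
calls_before = list(calls)  # snapshot taken before anything drives the coroutine

result = asyncio.run(coro)  # the event loop drives the coroutine to completion
print(created_is_coroutine, calls_before, result, calls)
```

Calling hello("a") only builds the coroutine object. Nothing is appended to calls until the event loop runs it.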
Async HTTP with httpx
import asyncio
import httpx
async def fetch(client: httpx.AsyncClient, url: str) -> tuple[str, int]:
    resp = await client.get(url, timeout=10)
    return url, resp.status_code
async def main(urls: list[str]) -> None:
    # http2=True needs the optional extra: pip install "httpx[http2]"
    async with httpx.AsyncClient(http2=True) as client:
        # One shared client reuses connections across all requests
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
    for url, status in results:
        print(url, status)
asyncio.run(main(["https://a.example", "https://b.example", "https://c.example"]))
Hundreds of requests can be in flight on one thread, with far less memory and scheduling overhead per request than giving each its own thread.
Bounding Concurrency with a Semaphore
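An unbounded gather over a list of 50 000 URLs can exhaust file descriptors and get you rate-limited. Bound it:
import asyncio
import httpx
async def main(urls: list[str]) -> list[httpx.Response]:
    sem = asyncio.Semaphore(50)
    # Share one client so connections are pooled instead of reopened per request
    async with httpx.AsyncClient(timeout=10) as client:
        async def fetch_bounded(url: str) -> httpx.Response:
            async with sem:  # waits here once 50 requests are in flight
                return await client.get(url)
        return await asyncio.gather(*(fetch_bounded(u) for u in urls))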
Now at most 50 requests are in flight at any moment.
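You can check that bound without any network access. The following sketch replaces the HTTP call with asyncio.sleep and counts how many fake requests are inside the semaphore at once; run_bounded and fake_request are hypothetical names for this demo.

```python
import asyncio

async def run_bounded(n_tasks: int, limit: int) -> int:
    """Run n_tasks fake requests under a semaphore; return the peak number in flight."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the network wait
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak

peak = asyncio.run(run_bounded(n_tasks=200, limit=50))
print(f"peak in flight: {peak}")
```

All 200 coroutines are created up front, but only `limit` of them get past the semaphore at a time. The plain counters are safe here because everything runs on one thread and there is no await between reading and updating them.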
Mixing Sync and Async
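If you call a blocking function inside async def, you stall the event loop and freeze every other coroutine. Two ways to avoid it: use an async-native library for the call, or hand the blocking function to a worker thread as shown here: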
import asyncio
import time
def sync_work():
    time.sleep(2)  # blocks
    return 42
async def main():
    # WRONG: blocks the loop for 2 seconds
    # result = sync_work()
    # RIGHT: run in a thread, await its completion
    result = await asyncio.to_thread(sync_work)
    print(result)
asyncio.run(main())
asyncio.to_thread (3.9+) hands work off to the default thread pool and returns an awaitable.
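To make the difference visible, the sketch below runs a "heartbeat" coroutine alongside the blocking call and counts how often it gets to tick. The helper names (blocking_work, heartbeat, count_ticks) are invented for this example, and the timings are deliberately short.

```python
import asyncio
import time

def blocking_work() -> int:
    time.sleep(0.3)  # stand-in for a sync library call
    return 42

async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Record a timestamp every 50 ms for as long as the loop is free to run us."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)

async def count_ticks(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # yield once so the heartbeat can start
    if offload:
        await asyncio.to_thread(blocking_work)  # loop keeps scheduling the heartbeat
    else:
        blocking_work()  # loop is frozen for the whole call
    stop.set()
    await beat
    return len(ticks)

blocked = asyncio.run(count_ticks(offload=False))
offloaded = asyncio.run(count_ticks(offload=True))
print(f"heartbeats while blocked: {blocked}, while offloaded: {offloaded}")
```

When the blocking call runs directly on the loop, the heartbeat gets one tick in before everything freezes. With asyncio.to_thread it keeps ticking for the full duration.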
Cancellation and Timeouts
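import asyncio
async def long_running_task():
    await asyncio.sleep(60)  # placeholder for real work
async def main():
    try:
        async with asyncio.timeout(5):
            await long_running_task()
    except TimeoutError:  # asyncio.TimeoutError is an alias of this since 3.11
        print("took longer than 5 seconds")
asyncio.run(main())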
asyncio.timeout (3.11+) cancels whatever is being awaited inside the block when the deadline passes, then raises TimeoutError from the async with. Coroutines should be designed to handle cancellation politely: clean up resources in try/finally.
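Here is a minimal sketch of that try/finally pattern. It uses task.cancel() instead of asyncio.timeout so that it also runs on Python versions before 3.11, and the events list is only there to record the order in which things happen.

```python
import asyncio

events: list[str] = []

async def worker() -> None:
    events.append("acquired")      # stand-in for opening a connection or taking a lock
    try:
        await asyncio.sleep(10)    # cancelled long before this finishes
    finally:
        events.append("released")  # runs on success, error, and cancellation alike

async def main() -> None:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.01)      # give the worker time to start
    task.cancel()                  # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")

asyncio.run(main())
print(events)
```

Cancellation arrives as a CancelledError raised at the await inside worker, so the finally block runs before the exception propagates to whoever is awaiting the task.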
Common Pitfalls
- Forgetting await. foo() returns a coroutine object that does nothing on its own. Linters and type checkers catch this.
- Sharing one client across the wrong loop. Create the client inside async def main(), not as a module-level global.
- Calling blocking code unawares. Even requests.get or a CPU-heavy JSON parse stalls the loop. Profile or use asyncio.to_thread.
- Swallowing asyncio.CancelledError. Since Python 3.8 it derives from BaseException, so except Exception no longer catches it, but a bare except: or except BaseException: does. Re-raise it after cleanup, or put except asyncio.CancelledError: raise first.
When Not to Bother
If your script only makes one or two API calls, you don't need any of this. Concurrency adds complexity; introduce it when you measure a real wall-clock problem worth solving.