Async, Threads, and Parallelism — Python for DevOps and Cloud | CertQnA

Python performance is rarely about Python being slow — it's about waiting on the network. This lesson covers the three concurrency models you'll actually use and which one to pick for which problem.

The Decision

Workload	Tool	Why
Many slow HTTP/DB/cloud calls	asyncio + httpx / aiobotocore	Single thread, no GIL contention, scales to thousands of connections
Same as above but only sync libraries available	ThreadPoolExecutor	Threads release the GIL while waiting on I/O
Heavy CPU work (image processing, hashing, ML)	ProcessPoolExecutor or numpy	Separate processes sidestep the GIL; numpy runs its heavy loops in compiled code

Most DevOps work is I/O-bound — async is your default.

Threads for I/O-Bound Sync Code

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = ["https://example.com/" + path for path in ["a", "b", "c", "d", "e"]]

def fetch(url: str) -> tuple[str, int]:
    resp = requests.get(url, timeout=10)
    return url, resp.status_code

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    for future in as_completed(futures):
        url, status = future.result()
        print(url, status)

Python's GIL (Global Interpreter Lock) prevents two threads from running Python bytecode at the same time, but threads do release the GIL while waiting on I/O — which is exactly when we want concurrency.
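
To make the GIL point concrete, here is a minimal sketch that times several blocking calls running in a thread pool. The names slow_call and timed_run are hypothetical, and time.sleep stands in for a blocking network call (like real socket I/O, it releases the GIL while it waits).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call(delay: float) -> float:
    # Stand-in for a blocking network call; sleeping releases the GIL
    time.sleep(delay)
    return delay


def timed_run(n_calls: int, delay: float) -> float:
    """Run n_calls blocking calls in threads and return the wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_calls) as pool:
        # list() forces all results to be collected before the timer stops
        list(pool.map(slow_call, [delay] * n_calls))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_run(n_calls=5, delay=0.2)
    print(f"5 calls of 0.2s each took {elapsed:.2f}s in threads")
```

Run back to back, the five calls would need about one second; in threads the waits overlap, so the total should land close to a single delay.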

Processes for CPU-Bound Work

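import hashlib
from concurrent.futures import ProcessPoolExecutor

def hash_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 8 KiB chunks so large files never sit fully in memory
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# The guard is required where workers are started with "spawn"
# (the default on Windows and macOS): each worker re-imports this module.
if __name__ == "__main__":
    paths = ["a.bin", "b.bin", "c.bin", "d.bin"]

    with ProcessPoolExecutor() as pool:
        for path, digest in zip(paths, pool.map(hash_file, paths)):
            print(path, digest)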

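Each worker runs in its own Python process, which gives true parallelism across CPU cores. The costs: starting a process is much heavier than starting a thread, arguments and return values must be picklable, and the worker function has to be defined at module level so the workers can import it.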
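
The picklability constraint is easy to check before handing work to a pool. The sketch below uses a hypothetical helper, is_picklable, which simply tries pickle.dumps; it illustrates the constraint and is not part of concurrent.futures.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj could be sent to a worker process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas, nested functions, locks, open sockets and similar land here
        return False
    return True


def square(x: int) -> int:
    # Module-level functions are pickled by reference (module + name)
    return x * x


if __name__ == "__main__":
    print("module-level function:", is_picklable(square))
    print("plain tuple argument: ", is_picklable(("a.bin", 8192)))
    print("lambda:               ", is_picklable(lambda x: x * x))
```

Plain data and module-level functions pass; a lambda does not, which is why pool.map(lambda ...) fails with a process pool even though it works with threads.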

Note: Python 3.13 introduced an experimental free-threaded build that lets you turn the GIL off; the standard build still has it.
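
If you are unsure which build you are running, a small check along these lines can tell you. gil_status is a hypothetical helper; it relies on sysconfig's Py_GIL_DISABLED config variable and on sys._is_gil_enabled(), which only exists on 3.13 and later, so older versions are assumed to have the GIL on.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this interpreter is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in 3.13; before that the GIL is always on
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True

    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(gil_status())
```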

asyncio Basics

import asyncio

async def hello(name: str) -> str:
    await asyncio.sleep(1)
    return f"hello, {name}"

async def main():
    # Sequential — total ~3s
    a = await hello("a")
    b = await hello("b")
    c = await hello("c")

    # Concurrent — total ~1s
    results = await asyncio.gather(
        hello("a"),
        hello("b"),
        hello("c"),
    )
    print(results)

asyncio.run(main())

Key vocabulary:

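Coroutine function: a function defined with async def. Calling it returns a coroutine object; the body doesn't run until that object is awaited or scheduled as a task.
await: pauses the current coroutine until the awaited thing finishes, and lets the event loop run other work in the meantime.
Event loop: single-threaded scheduler that runs coroutines. asyncio.run(main()) starts one, runs main, closes it.
asyncio.gather: schedules many awaitables concurrently and waits for all of them; results come back in the order the awaitables were passed in.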
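
Two of these definitions are easy to verify directly: calling a coroutine function runs nothing until the result is awaited, and gather returns results in argument order rather than completion order. A minimal sketch, with work and demo as hypothetical names:

```python
import asyncio


async def work(tag: str, delay: float, started: list) -> str:
    started.append(tag)          # records the moment the body actually starts
    await asyncio.sleep(delay)
    return tag


async def demo():
    started: list = []

    coro = work("x", 0, started)     # creates a coroutine object, runs nothing
    started_before_await = list(started)
    first = await coro               # now the body runs

    # "slow" finishes last but still comes first in the result list
    ordered = await asyncio.gather(
        work("slow", 0.05, started),
        work("fast", 0, started),
    )
    return started_before_await, first, ordered


if __name__ == "__main__":
    print(asyncio.run(demo()))
```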

Async HTTP with httpx

import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> tuple[str, int]:
    resp = await client.get(url, timeout=10)
    return url, resp.status_code

async def main(urls: list[str]) -> None:
    # http2=True needs the optional extra: pip install "httpx[http2]"
    async with httpx.AsyncClient(http2=True) as client:
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
    for url, status in results:
        print(url, status)

asyncio.run(main(["https://a.example", "https://b.example", "https://c.example"]))

One thread can keep hundreds of requests in flight, because each pending request costs a small coroutine object rather than an OS thread with its own stack.
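
By default, asyncio.gather raises the first exception it sees, and the results of the other calls are then not returned to you. Passing return_exceptions=True returns failures alongside successes instead. The sketch below uses a hypothetical fake_fetch coroutine in place of a real HTTP call so it runs offline; the same pattern applies to a fetch built on httpx.

```python
import asyncio


async def fake_fetch(url: str) -> tuple[str, int]:
    # Hypothetical stand-in for an HTTP request; no network involved
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return url, 200


async def fetch_all(urls: list[str]):
    """Fetch every URL and split the outcome into successes and failures."""
    results = await asyncio.gather(
        *(fake_fetch(u) for u in urls),
        return_exceptions=True,   # exceptions are returned, not raised
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [(u, r) for u, r in zip(urls, results) if isinstance(r, BaseException)]
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(fetch_all(["https://a.example", "https://bad.example"]))
    print("ok:", ok)
    print("failed:", failed)
```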

Bounding Concurrency with a Semaphore

An unbounded gather over a list of 50 000 URLs can exhaust file descriptors and get you rate-limited. Bound it:

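import asyncio
import httpx

async def main(urls: list[str]) -> list[httpx.Response]:
    sem = asyncio.Semaphore(50)

    # One shared client, so connections are pooled and reused
    async with httpx.AsyncClient(timeout=10) as client:

        async def fetch_bounded(url: str) -> httpx.Response:
            async with sem:  # only 50 coroutines get past this line at once
                return await client.get(url)

        return await asyncio.gather(*(fetch_bounded(u) for u in urls))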

Now at most 50 requests are in flight at any moment.
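
The bound can be checked without touching the network. In this sketch, run_bounded is a hypothetical helper that counts how many workers are inside the semaphore at once, with asyncio.sleep standing in for the request.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> int:
    """Run n_tasks workers behind a semaphore and return the peak concurrency."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)   # stand-in for the HTTP request
            in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(run_bounded(n_tasks=200, limit=50)))
```

All workers are still created up front, so for very large inputs the coroutine objects themselves take memory; the semaphore only limits how many are doing work at the same time.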

Mixing Sync and Async

If you call a blocking function inside async def, you stall the event loop and freeze every other coroutine. To avoid that, hand the blocking call to a worker thread:

import asyncio
import time

def sync_work():
    time.sleep(2)        # blocks
    return 42

async def main():
    # WRONG — blocks the loop for 2 seconds
    # result = sync_work()

    # RIGHT — run in a thread, await its completion
    result = await asyncio.to_thread(sync_work)
    print(result)

asyncio.run(main())

asyncio.to_thread (3.9+) hands work off to the default thread pool and returns an awaitable.
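
To see that the loop really stays responsive, you can run a heartbeat coroutine next to the blocking call. This is a sketch with hypothetical names (blocking_io, heartbeat, demo); it needs Python 3.9 or later for asyncio.to_thread.

```python
import asyncio
import time


def blocking_io(seconds: float) -> str:
    time.sleep(seconds)            # would freeze the loop if called directly
    return "done"


async def heartbeat(ticks: list, interval: float) -> None:
    while True:
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def demo():
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.02))

    # The blocking call runs in a worker thread; the loop keeps ticking
    result = await asyncio.to_thread(blocking_io, 0.2)

    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass                       # expected: we cancelled the heartbeat ourselves
    return result, len(ticks)


if __name__ == "__main__":
    result, n_ticks = asyncio.run(demo())
    print(result, "with", n_ticks, "heartbeats during the blocking call")
```

Calling blocking_io(0.2) directly inside demo would leave the tick list empty, because the heartbeat never gets a turn while the loop is blocked.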

Cancellation and Timeouts

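import asyncio

async def long_running_task():
    await asyncio.sleep(60)   # placeholder for real work

async def main():
    try:
        async with asyncio.timeout(5):
            await long_running_task()
    except TimeoutError:      # asyncio.TimeoutError is an alias of this since 3.11
        print("took longer than 5 seconds")

asyncio.run(main())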

asyncio.timeout (3.11+) cancels the code inside the block when the deadline passes, then raises TimeoutError as the block exits. Coroutines should be designed to handle cancellation politely: release resources in try/finally.
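
The try/finally advice can be sketched as follows. Note the swap: this example uses asyncio.wait_for rather than asyncio.timeout so that it also runs on Python versions before 3.11; the cleanup behaviour it shows is the same. The names worker and demo are hypothetical.

```python
import asyncio


async def worker(log: list) -> None:
    log.append("acquired")          # e.g. open a connection or take a lock
    try:
        await asyncio.sleep(10)     # cancelled here when the timeout fires
    finally:
        log.append("released")      # runs on success, error, and cancellation


async def demo() -> list:
    log: list = []
    try:
        await asyncio.wait_for(worker(log), timeout=0.05)
    except asyncio.TimeoutError:
        log.append("timed out")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The finally block runs before the timeout error reaches the caller, so the resource is already released by the time you handle the timeout.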

Common Pitfalls

Forgetting await. foo() returns a coroutine object that does nothing on its own. Linters and type checkers catch this.
Sharing one client across event loops. Create the client inside async def main(), not as a module-level global, so it belongs to the loop that uses it.
Calling blocking code unawares. Even requests.get or a CPU-heavy JSON parse stalls the loop. Profile or use asyncio.to_thread.
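Swallowing asyncio.CancelledError. Since Python 3.8 it derives from BaseException, so except Exception no longer catches it, but a bare except: or except BaseException: still does. If you catch it to clean up, re-raise it.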
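
The cancellation pitfall is worth seeing in code. This sketch assumes Python 3.8 or later, where asyncio.CancelledError derives from BaseException; careful_worker and demo are hypothetical names.

```python
import asyncio


async def careful_worker(events: list) -> None:
    try:
        await asyncio.sleep(10)
    except Exception:
        # Not reached on cancellation in 3.8+: CancelledError is a BaseException
        events.append("swallowed")
    except asyncio.CancelledError:
        events.append("cleanup")
        raise                        # re-raise so the task really ends as cancelled


async def demo():
    events: list = []
    task = asyncio.create_task(careful_worker(events))
    await asyncio.sleep(0)           # give the worker a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Dropping the raise would make the task finish normally, and code that relies on cancellation, such as timeouts, would no longer see it as cancelled.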

When Not to Bother

If your script only makes one or two API calls, you don't need any of this. Concurrency adds complexity; introduce it when you measure a real wall-clock problem worth solving.
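
Measuring can be as simple as wrapping the suspect section in a timer. The stopwatch helper below is a hypothetical sketch built on time.perf_counter, not a replacement for a real profiler.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label: str, results: dict):
    """Record the wall-clock duration of the with-block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings: dict = {}
    with stopwatch("api calls", timings):
        time.sleep(0.1)              # stand-in for the code you want to measure
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f}s")
```

If the measured section turns out to be a small share of the total run time, adding concurrency there is unlikely to pay for its complexity.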