# Async, Threading, and the GIL
Java's concurrency story is built on real OS threads with shared heap memory and a memory model (the JMM) that governs visibility and ordering. You can saturate all CPU cores with `ForkJoinPool`, `CompletableFuture`, or virtual threads (Project Loom). Python's concurrency story is more constrained — and the constraint has a name: the Global Interpreter Lock (GIL). Understanding it is not optional; it shapes every architectural decision about parallelism in CPython.
## The GIL in Plain Terms
The GIL is a mutex inside CPython that allows only one thread to execute Python bytecode at a time. It exists because CPython's memory management (reference counting) is not thread-safe, and the GIL is a pragmatic solution that has persisted since 1992. Python 3.13 introduced an experimental free-threaded build (PEP 703), but the GIL remains the default in production.
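If you want to know which build you are running, CPython 3.13 exposes this at runtime. A minimal check, noting that `Py_GIL_DISABLED` and the private `sys._is_gil_enabled()` helper are 3.13-era details that may change:

```python
import sys
import sysconfig

# 1 on free-threaded (PEP 703) builds, 0 or None on standard builds
print(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Private 3.13+ helper: is the GIL actually active in this process?
if hasattr(sys, "_is_gil_enabled"):
    print(sys._is_gil_enabled())
```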
The consequence: CPU-bound work does not parallelise with `threading`. Two CPU-bound threads take turns holding the GIL; they run interleaved, never in parallel. This is the opposite of Java, where `new Thread(runnable).start()` can genuinely occupy a second core.
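You can observe this directly. The sketch below (timings are machine-dependent; the names are illustrative) counts down twice, once serially and once on two threads; on a standard GIL build the threaded version is no faster, and often slightly slower due to lock contention:

```python
import threading
import time

def count(n: int) -> None:
    while n > 0:
        n -= 1

N = 50_000_000

# Serial baseline: two countdowns back to back
start = time.perf_counter()
count(N)
count(N)
print(f"serial:   {time.perf_counter() - start:.2f}s")

# Two threads: only one may execute bytecode at a time under the GIL
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"threaded: {time.perf_counter() - start:.2f}s")
```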
## Threading: Good for I/O, Useless for CPU
Python threads are real OS threads — they call `pthread_create` on Linux. The GIL is released during blocking I/O (network reads, file reads, `time.sleep`), so threads are perfectly useful for I/O-bound work:
```python
import threading
import urllib.request

def fetch(url: str) -> None:
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    print(f"Fetched {len(data)} bytes from {url}")

urls = [
    "https://httpbin.org/get",
    "https://httpbin.org/uuid",
    "https://httpbin.org/headers",
]

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Java equivalent:
```java
ExecutorService pool = Executors.newFixedThreadPool(3);
List<Future<?>> futures = urls.stream()
    .map(url -> pool.submit(() -> fetch(url)))
    .toList();
futures.forEach(f -> { try { f.get(); } catch (Exception e) { /* handle */ } });
pool.shutdown();
```
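Python's closer structural analogue to `ExecutorService` is the standard library's `concurrent.futures.ThreadPoolExecutor`; a sketch reusing the `fetch` and `urls` defined above:

```python
from concurrent.futures import ThreadPoolExecutor

# Equivalent of Executors.newFixedThreadPool(3) plus submit/get
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    for f in futures:
        f.result()  # re-raises worker exceptions, like Future.get()
```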
For CPU-bound parallelism, use `multiprocessing` — each process gets its own GIL and its own interpreter:
```python
from multiprocessing import Pool

def cpu_work(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required: worker processes re-import this module
    with Pool(processes=4) as pool:
        results = pool.map(cpu_work, [10_000_000] * 4)
```

The cost: processes do not share memory. Passing data across the boundary uses serialisation (pickle by default), which is expensive for large objects. This is analogous to forking separate JVM processes and communicating over IPC — not zero cost.
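One escape hatch is `multiprocessing.shared_memory` (Python 3.8+), which lets processes attach to the same buffer by name so the payload itself is never pickled. A minimal sketch (run here in a single process for brevity; in real use the child process receives `shm.name` and attaches on its side):

```python
from multiprocessing import shared_memory

# Allocate a named block and write into it
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Any process that knows the name can attach: no copy, no pickle
peer = shared_memory.SharedMemory(name=shm.name)
print(bytes(peer.buf[:5]))  # b'hello'

# Every attacher closes; exactly one process unlinks
peer.close()
shm.close()
shm.unlink()
```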
## asyncio: Cooperative Multitasking
`asyncio` is Python's answer to non-blocking I/O — conceptually similar to Java's NIO `Selector` or Netty's event loop, but exposed as first-class language syntax with `async`/`await`.
```python
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> int:
    resp = await client.get(url)
    return len(resp.content)

async def main() -> None:
    urls = [
        "https://httpbin.org/get",
        "https://httpbin.org/uuid",
        "https://httpbin.org/headers",
    ]
    async with httpx.AsyncClient() as client:
        tasks = [fetch(client, u) for u in urls]
        sizes = await asyncio.gather(*tasks)
    print(sizes)

asyncio.run(main())
```

`asyncio.gather` is the equivalent of `CompletableFuture.allOf` — it fans out coroutines and waits for all of them. The difference from threads: there is only one OS thread; the event loop multiplexes coroutines cooperatively.
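One behavioural difference is worth knowing: by default `gather` propagates the first exception to the awaiter (the remaining coroutines keep running), while `return_exceptions=True` collects failures as results, closer to inspecting each `CompletableFuture` individually. A sketch with a hypothetical `might_fail`:

```python
import asyncio

async def might_fail(i: int) -> int:
    if i == 2:
        raise ValueError("boom")
    return i

async def main() -> None:
    # Exceptions appear in the result list instead of being raised
    results = await asyncio.gather(
        *(might_fail(i) for i in range(4)),
        return_exceptions=True,
    )
    print(results)  # [0, 1, ValueError('boom'), 3]

asyncio.run(main())
```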
The critical rule: a synchronous blocking call inside a coroutine blocks the entire event loop. Calling `time.sleep(1)` inside an `async def` is like calling `Thread.sleep` on Netty's I/O thread — it freezes everything. Use `await asyncio.sleep(1)` instead.
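A quick way to feel the difference (the `ticker`, `bad`, and `good` names are illustrative): run the sketch once as written, then swap `good()` for `bad()` and watch the ticks stall for three seconds:

```python
import asyncio
import time

async def ticker() -> None:
    for _ in range(3):
        print("tick", time.strftime("%X"))
        await asyncio.sleep(1)

async def bad() -> None:
    time.sleep(3)  # blocks the only thread: the whole loop stalls

async def good() -> None:
    await asyncio.sleep(3)  # yields to the loop: ticker keeps ticking

async def main() -> None:
    await asyncio.gather(ticker(), good())

asyncio.run(main())
```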
## Choosing the Right Model
| Scenario | Python tool | Java analogue |
|---|---|---|
| I/O-bound, many connections | `asyncio` + `httpx` | Netty / virtual threads (Loom) |
| I/O-bound, simple concurrency | `threading` | `ExecutorService` threads |
| CPU-bound parallelism | `multiprocessing` | `ForkJoinPool` / processes |
| CPU-bound + shared data | NumPy + C extensions | JNI / Panama native calls |
## Async Pitfalls for JVM Engineers
**Sync code in async context** — Blocking calls inside coroutines freeze the event loop. Wrap blocking calls with `asyncio.to_thread` (Python 3.9+):
```python
import asyncio
import time

def blocking_db_call() -> str:
    # imagine this does real blocking I/O
    time.sleep(0.5)
    return "result"

async def main() -> None:
    # The call runs in a worker thread; the event loop stays responsive
    result = await asyncio.to_thread(blocking_db_call)
    print(result)

asyncio.run(main())
```

**Missing `await`** — Calling an async function without `await` returns a coroutine object, not a result. CPython does emit `RuntimeWarning: coroutine '...' was never awaited`, but only when the abandoned coroutine is garbage-collected, so in practice this is a near-silent bug.
```python
async def get_data() -> str:
    return "data"

async def main() -> None:
    result = get_data()        # BUG: a coroutine object, not "data"
    result = await get_data()  # correct
```

**`ThreadPoolExecutor` for CPU work in async** — `asyncio.to_thread` uses a `ThreadPoolExecutor` under the hood, so the GIL still limits true parallelism. For CPU-bound async work, use a `ProcessPoolExecutor`:
```python
from concurrent.futures import ProcessPoolExecutor
import asyncio

def heavy(n: int) -> int:
    return sum(i**2 for i in range(n))

async def main() -> None:
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() in a coroutine
    with ProcessPoolExecutor() as pool:
        # Separate processes sidestep the GIL for CPU-bound work
        result = await loop.run_in_executor(pool, heavy, 10_000_000)
    print(result)

if __name__ == "__main__":  # guard required by multiprocessing's spawn start method
    asyncio.run(main())
```

## Key Takeaways
- The GIL means Python threads cannot parallelise CPU-bound work — use `multiprocessing` for that, not `threading`.
- `threading` is effective for I/O-bound work because the GIL is released during blocking I/O calls.
- `asyncio` is cooperative multitasking on a single thread — similar to Netty or Java's `CompletableFuture` chain, but exposed via `async`/`await` syntax.
- Never call blocking code inside a coroutine without wrapping it in `asyncio.to_thread` or `run_in_executor`.
- `asyncio.gather` maps to `CompletableFuture.allOf`; individual coroutines map to `CompletableFuture` tasks.
- Python 3.13's free-threaded mode (no GIL) is experimental — wait for broader ecosystem support before adopting it in production.