# Async, Threading, and the GIL
Java's concurrency story is built on real OS threads with shared heap memory and a memory model (the JMM) that governs visibility and ordering. You can saturate all CPU cores with `ForkJoinPool`, `CompletableFuture`, or virtual threads (Project Loom). Python's concurrency story is more constrained — and the constraint has a name: the Global Interpreter Lock (GIL). Understanding it is not optional; it shapes every architectural decision about parallelism in CPython.
## The GIL in Plain Terms
The GIL is a mutex inside CPython that allows only one thread to execute Python bytecode at a time. It exists because CPython's memory management (reference counting) is not thread-safe, and the GIL is a pragmatic solution that has persisted since 1992. Python 3.13 introduced an experimental free-threaded build (PEP 703), but the GIL remains the default in production.
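If you want to know which build you are running, CPython 3.13 exposes this at runtime. A minimal check, noting that `Py_GIL_DISABLED` and the private `sys._is_gil_enabled()` helper are 3.13-era details that may change:

```python
import sys
import sysconfig

# 1 on free-threaded (PEP 703) builds, 0 or None on standard builds
print(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Private 3.13+ helper: is the GIL actually active in this process?
if hasattr(sys, "_is_gil_enabled"):
    print(sys._is_gil_enabled())
```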
The consequence: CPU-bound work does not parallelise with `threading`. Two CPU-bound threads take turns holding the GIL; they run interleaved, never in parallel. This is the opposite of Java, where `new Thread(runnable).start()` can genuinely occupy a second core.
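You can observe this directly. The sketch below (timings are machine-dependent; the names are illustrative) counts down twice, once serially and once on two threads; on a standard GIL build the threaded version is no faster, and often slightly slower due to lock contention:

```python
import threading
import time

def count(n: int) -> None:
    while n > 0:
        n -= 1

N = 50_000_000

# Serial baseline: two countdowns back to back
start = time.perf_counter()
count(N)
count(N)
print(f"serial:   {time.perf_counter() - start:.2f}s")

# Two threads: only one may execute bytecode at a time under the GIL
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"threaded: {time.perf_counter() - start:.2f}s")
```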
## Threading: Good for I/O, Useless for CPU
Python threads are real OS threads — they call `pthread_create` on Linux. The GIL is released during blocking I/O (network reads, file reads, `time.sleep`), so threads are perfectly useful for I/O-bound work:
```python
import threading
import urllib.request

def fetch(url: str) -> None:
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    print(f"Fetched {len(data)} bytes from {url}")

urls = [
    "https://httpbin.org/get",
    "https://httpbin.org/uuid",
    "https://httpbin.org/headers",
]

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Java equivalent:
```java
ExecutorService pool = Executors.newFixedThreadPool(3);
List<Future<?>> futures = urls.stream()
    .map(url -> pool.submit(() -> fetch(url)))
    .toList();
futures.forEach(f -> { try { f.get(); } catch (Exception e) { /* handle */ } });
pool.shutdown();
```
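Python's closer structural analogue to `ExecutorService` is the standard library's `concurrent.futures.ThreadPoolExecutor`; a sketch reusing the `fetch` and `urls` defined above:

```python
from concurrent.futures import ThreadPoolExecutor

# Equivalent of Executors.newFixedThreadPool(3) plus submit/get
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    for f in futures:
        f.result()  # re-raises worker exceptions, like Future.get()
```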
For CPU-bound parallelism, use `multiprocessing` — each process gets its own GIL and its own interpreter:
```python
from multiprocessing import Pool

def cpu_work(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required: worker processes re-import this module
    with Pool(processes=4) as pool:
        results = pool.map(cpu_work, [10_000_000] * 4)
```

The cost: processes do not share memory. Passing data across the boundary uses serialisation (pickle by default), which is expensive for large objects. This is analogous to forking separate JVM processes and communicating over IPC — not zero cost.
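One escape hatch is `multiprocessing.shared_memory` (Python 3.8+), which lets processes attach to the same buffer by name so the payload itself is never pickled. A minimal sketch (run here in a single process for brevity; in real use the child process receives `shm.name` and attaches on its side):

```python
from multiprocessing import shared_memory

# Allocate a named block and write into it
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Any process that knows the name can attach: no copy, no pickle
peer = shared_memory.SharedMemory(name=shm.name)
print(bytes(peer.buf[:5]))  # b'hello'

# Every attacher closes; exactly one process unlinks
peer.close()
shm.close()
shm.unlink()
```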
## asyncio: Cooperative Multitasking
`asyncio` is Python's answer to non-blocking I/O — conceptually similar to Java's NIO `Selector` or Netty's event loop, but exposed as first-class language syntax with `async`/`await`.
```python
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> int:
    resp = await client.get(url)
    return len(resp.content)

async def main() -> None:
    urls = [
        "https://httpbin.org/get",
        "https://httpbin.org/uuid",
        "https://httpbin.org/headers",
    ]
    async with httpx.AsyncClient() as client:
        tasks = [fetch(client, u) for u in urls]
        sizes = await asyncio.gather(*tasks)
    print(sizes)

asyncio.run(main())
```

`asyncio.gather` is the equivalent of `CompletableFuture.allOf` — it fans out coroutines and waits for all of them. The difference from threads: there is only one OS thread; the event loop multiplexes coroutines cooperatively.
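One behavioural difference is worth knowing: by default `gather` propagates the first exception to the awaiter (the remaining coroutines keep running), while `return_exceptions=True` collects failures as results, closer to inspecting each `CompletableFuture` individually. A sketch with a hypothetical `might_fail`:

```python
import asyncio

async def might_fail(i: int) -> int:
    if i == 2:
        raise ValueError("boom")
    return i

async def main() -> None:
    # Exceptions appear in the result list instead of being raised
    results = await asyncio.gather(
        *(might_fail(i) for i in range(4)),
        return_exceptions=True,
    )
    print(results)  # [0, 1, ValueError('boom'), 3]

asyncio.run(main())
```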
The critical rule: a synchronous blocking call inside a coroutine blocks the entire event loop. Calling `time.sleep(1)` inside an `async def` is like calling `Thread.sleep` on Netty's I/O thread — it freezes everything. Use `await asyncio.sleep(1)` instead.
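A quick way to feel the difference (the `ticker`, `bad`, and `good` names are illustrative): run the sketch once as written, then swap `good()` for `bad()` and watch the ticks stall for three seconds:

```python
import asyncio
import time

async def ticker() -> None:
    for _ in range(3):
        print("tick", time.strftime("%X"))
        await asyncio.sleep(1)

async def bad() -> None:
    time.sleep(3)  # blocks the only thread: the whole loop stalls

async def good() -> None:
    await asyncio.sleep(3)  # yields to the loop: ticker keeps ticking

async def main() -> None:
    await asyncio.gather(ticker(), good())

asyncio.run(main())
```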
## Choosing the Right Model
| Scenario | Python tool | Java analogue |
|---|---|---|
| I/O-bound, many connections | `asyncio` + `httpx` | Netty / virtual threads (Loom) |
| I/O-bound, simple concurrency | `threading` | `ExecutorService` threads |
| CPU-bound parallelism | `multiprocessing` | `ForkJoinPool` / processes |
| CPU-bound + shared data | NumPy + C extensions | JNI / Panama native calls |
## Async Pitfalls for JVM Engineers
**Sync code in async context** — Blocking calls inside coroutines freeze the event loop. Wrap blocking calls with `asyncio.to_thread` (Python 3.9+):
```python
import asyncio
import time

def blocking_db_call() -> str:
    # imagine this does real blocking I/O
    time.sleep(0.5)
    return "result"

async def main() -> None:
    # The call runs in a worker thread; the event loop stays responsive
    result = await asyncio.to_thread(blocking_db_call)
    print(result)

asyncio.run(main())
```

**Missing `await`** — Calling an async function without `await` returns a coroutine object, not a result. CPython does emit `RuntimeWarning: coroutine '...' was never awaited`, but only when the abandoned coroutine is garbage-collected, so in practice this is a near-silent bug.
```python
async def get_data() -> str:
    return "data"

async def main() -> None:
    result = get_data()        # BUG: a coroutine object, not "data"
    result = await get_data()  # correct
```

**`ThreadPoolExecutor` for CPU work in async** — `asyncio.to_thread` uses a `ThreadPoolExecutor` under the hood, so the GIL still limits true parallelism. For CPU-bound async work, use a `ProcessPoolExecutor`:
```python
from concurrent.futures import ProcessPoolExecutor
import asyncio

def heavy(n: int) -> int:
    return sum(i**2 for i in range(n))

async def main() -> None:
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() in a coroutine
    with ProcessPoolExecutor() as pool:
        # Separate processes sidestep the GIL for CPU-bound work
        result = await loop.run_in_executor(pool, heavy, 10_000_000)
    print(result)

if __name__ == "__main__":  # guard required by multiprocessing's spawn start method
    asyncio.run(main())
```

## Key Takeaways
- The GIL means Python threads cannot parallelise CPU-bound work — use `multiprocessing` for that, not `threading`.
- `threading` is effective for I/O-bound work because the GIL is released during blocking I/O calls.
- `asyncio` is cooperative multitasking on a single thread — similar to Netty or Java's `CompletableFuture` chain, but exposed via `async`/`await` syntax.
- Never call blocking code inside a coroutine without wrapping it in `asyncio.to_thread` or `run_in_executor`.
- `asyncio.gather` maps to `CompletableFuture.allOf`; individual coroutines map to `CompletableFuture` tasks.
- Python 3.13's free-threaded mode (no GIL) is experimental — wait for broader ecosystem support before adopting it in production.