What is the difference between CPU-bound and I/O-bound work, and how does the choice affect concurrency strategy in Python?
CPU-bound work keeps the processor busy the whole time: matrix multiplication, compression, parsing. I/O-bound work spends most of its time waiting on a slow external resource such as the network, disk, or a database. The distinction determines which concurrency primitive to reach for: multiprocessing for CPU-bound work (each process has its own GIL, so the lock stops being the bottleneck), threading or asyncio for I/O-bound work (threads release the GIL while they wait, and asyncio interleaves the waits on a single thread).
How to think about it
Diagnosing the bottleneck first
Before picking a tool, profile. Sustained high CPU usage with little idle time points to CPU-bound work. A process that spends most of its time sleeping, waiting on sockets or disk, is I/O-bound. Getting this diagnosis wrong is expensive: adding threads to a CPU-bound loop gives no speedup under the GIL and can make things slower because of lock contention and context switching.
top # look at %CPU column
iostat 1 # disk I/O rate
In Python:
import cProfile
cProfile.run("my_function()", sort="cumulative")
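A lighter check than a full profile is to compare wall-clock time with CPU time for a single call. The sketch below is illustrative only: `cpu_fraction`, `spin`, and `wait` are hypothetical names, and the workloads are stand-ins. A ratio near 1 suggests CPU-bound work; a ratio near 0 suggests the call mostly waits.

```python
import time

def cpu_fraction(func, *args):
    """Return CPU time / wall time for one call of func (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process, all threads
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def spin(n):
    # Illustrative CPU-bound workload: pure-Python arithmetic.
    return sum(i * i for i in range(n))

def wait(seconds):
    # Illustrative I/O-bound stand-in: sleeping uses almost no CPU.
    time.sleep(seconds)

if __name__ == "__main__":
    print(f"spin: {cpu_fraction(spin, 2_000_000):.2f}")
    print(f"wait: {cpu_fraction(wait, 0.5):.2f}")
```

The exact ratios depend on the machine and its load, so treat them as a rough signal rather than a measurement.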
The GIL is the key
In standard CPython builds, the Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. For CPU-bound work, threads cannot run in parallel because the GIL serialises them. For I/O-bound work, the GIL is released during blocking system calls (network read, disk write), so threads genuinely overlap their waiting.
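One way to see the serialisation is to time the same pure-Python loop run back to back and then in threads. This is a minimal sketch with hypothetical function names. On a GIL-enabled interpreter the two timings are typically close, but the numbers vary by machine, so none are quoted here.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: the running thread holds the GIL while it executes.
    while n > 0:
        n -= 1

def run_serial(n, workers):
    # Run the loop `workers` times, one after another.
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(n, workers):
    # Run the same loops in `workers` threads and wait for all of them.
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n, workers = 2_000_000, 4
    print(f"serial:   {run_serial(n, workers):.2f}s")
    print(f"threaded: {run_threaded(n, workers):.2f}s")
```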
I/O-bound: threading or asyncio
Because blocking calls release the GIL, multiple threads can overlap their wait times even though they share one interpreter:
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    # Blocking HTTP GET; the GIL is released while the thread waits on the socket.
    return requests.get(url, timeout=10).status_code

urls = ["https://httpbin.org/delay/1"] * 8
with ThreadPoolExecutor(max_workers=8) as pool:
    codes = list(pool.map(fetch, urls))
# roughly 1 second of waiting in total, not 8
For very high fan-out (thousands of concurrent requests), asyncio with an async HTTP library usually beats threads on memory and scheduling overhead, because each request is a lightweight coroutine rather than an OS thread with its own stack.
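The fan-out pattern can be sketched with the standard library alone. Here `asyncio.sleep` stands in for an awaitable network call so that no async HTTP library has to be assumed. `fake_request`, `fetch_all`, and the `limit` parameter are hypothetical names for illustration.

```python
import asyncio
import time

async def fake_request(i, delay=0.1):
    # Stand-in for an awaitable network call; asyncio.sleep yields to the event loop.
    await asyncio.sleep(delay)
    return i

async def fetch_all(count, limit=200):
    # The semaphore caps how many "requests" are in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(i):
        async with sem:
            return await fake_request(i)

    # gather returns results in the same order as the awaitables it was given.
    return await asyncio.gather(*(bounded(i) for i in range(count)))

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all(1000))
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

Capping concurrency with a semaphore is a design choice rather than a requirement. It keeps a large fan-out from overwhelming the remote service or exhausting local sockets.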
CPU-bound: multiprocessing
Each worker process has its own Python interpreter and its own GIL, so all cores can work simultaneously:
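from concurrent.futures import ProcessPoolExecutor

def heavy(n):
    # Pure-Python arithmetic: CPU-bound, and it holds the GIL while running.
    return sum(i ** 2 for i in range(n))

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(heavy, [5_000_000] * 4))
    # up to ~4x speedup on a 4-core machine, minus process start-up and pickling cost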
For numerical work, NumPy releases the GIL inside many of its C routines, so np.dot on large arrays can run in parallel across cores even when called from threads.
Decision table
| Workload | Tool | Why |
|---|---|---|
| Network / DB / file I/O | threading, asyncio | GIL released during waits |
| CPU-bound / numerical computation | multiprocessing, NumPy | True parallelism; GIL bypassed |
| Mixed pipeline | ProcessPoolExecutor for heavy stages, threads within | Separate by bottleneck |
| Shell commands | subprocess | OS-managed, no GIL concern |
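For the subprocess row, a minimal sketch of handing work to a separate OS process looks like this. `run_child` is a hypothetical helper, and `sys.executable` is used as a portable stand-in for whatever external command would actually be run.

```python
import subprocess
import sys

def run_child(code):
    """Run a short Python snippet in a separate OS process and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit status
        timeout=30,   # do not wait forever on a stuck child
    )
    return completed.stdout

if __name__ == "__main__":
    print(run_child("print(6 * 7)"))
```

The parent only waits for the child to finish, so from Python's point of view this is I/O-bound work and several such calls can be overlapped with threads.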