Lesson 23 • Advanced
Parallelism with concurrent.futures
Master Python's concurrent.futures module to build high-performance parallel systems using ThreadPoolExecutor and ProcessPoolExecutor.
What You'll Learn in This Lesson
- How `concurrent.futures` abstracts threads and processes into a unified API
- When to use `ThreadPoolExecutor` vs `ProcessPoolExecutor`
- How Futures represent pending work, and how to collect results
- Running parallel tasks with `map()` and `submit()`
- Handling errors, cancellations, and timeouts in parallel jobs
- Real patterns: parallel image processing, batch API calls, data pipelines
1. What concurrent.futures Actually Does
Think of `concurrent.futures` as a task management app. You submit tasks (like "download this file" or "process this image"), and it assigns workers to complete them. You get a "ticket" (a Future) that you can check later to see whether your task is done.

It provides two executor types:
| Executor | Workers Are | Best For | Examples |
|---|---|---|---|
| ThreadPoolExecutor | Threads | I/O-bound tasks (waiting) | API calls, file downloads, DB queries |
| ProcessPoolExecutor | Processes | CPU-bound tasks (thinking) | Image processing, ML prep, heavy math |
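The two executors share the same interface, so the table's distinction is only about *where* the work runs. A minimal sketch of the unified API (the `double` helper is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def double(n):
    # Trivial stand-in for real work
    return n * 2

if __name__ == "__main__":
    # The same code works with either executor class
    for Executor in (ThreadPoolExecutor, ProcessPoolExecutor):
        with Executor(max_workers=2) as ex:
            print(list(ex.map(double, [1, 2, 3])))  # [2, 4, 6] both times
```

Because the API is identical, switching between threads and processes is usually a one-line change.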
⚙️ 2. The Basic Pattern (Threads)
Think of a coffee shop: you `submit()` your order, get a receipt (a Future), and can do other things. When you call `.result()`, you wait at the counter until your coffee is ready.

Basic ThreadPoolExecutor
Submit tasks to a thread pool and get results
```python
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    # Simulate a network request that takes time
    time.sleep(1)
    return f"Downloaded {url}"

# Create a pool with 4 worker threads
# Think of it as hiring 4 assistants
executor = ThreadPoolExecutor(max_workers=4)

# Submit a task - returns immediately with a Future
# The task runs in the background
future = executor.submit(download, "https://example.com")
print("Task submitted - we can do other work here!")

# .result() blocks until the task finishes
print(future.result())  # Downloaded https://example.com

# Release the worker threads when done
executor.shutdown()
```

| Method | What It Does | Returns |
|---|---|---|
| submit(fn, *args) | Schedules function to run in background | Future object |
| future.result() | Waits for and returns the result | Function's return value |
| shutdown() | Releases worker threads | None |
Use the `with` statement (context manager) for automatic cleanup!

⚡ 3. Running Many Tasks at Once
`executor.map()` is like a pizza shop with multiple ovens. You give them a list of pizzas to make, they cook them in parallel using all available ovens, and return them to you in the same order you requested.

Parallel Map with Threads
Process multiple items in parallel using executor.map()
```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # simulate work - in real code this would be an HTTP request
    return f"Content from {url}"

urls = ["url1", "url2", "url3", "url4"]

# The 'with' statement ensures automatic cleanup
# No need to call executor.shutdown() manually!
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() is like the built-in map(), but parallel!
    # It automatically:
    # 1. Submits all tasks
    # 2. Collects results in order
    results = list(executor.map(fetch, urls))

print(results)
```

| Approach | When to Use | Returns |
|---|---|---|
| submit() | Different functions, custom handling | Individual Futures |
| map() | Same function, many inputs | Iterator of results (in order) |
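A minimal sketch of the `submit()` row of the table, using two hypothetical helper functions so each task can run a different function:

```python
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

def mul(a, b):
    return a * b

with ThreadPoolExecutor() as ex:
    # submit() lets each task run a *different* function
    f_add = ex.submit(add, 2, 3)
    f_mul = ex.submit(mul, 2, 3)
    print(f_add.result(), f_mul.result())  # 5 6
```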
`map()` preserves order! Even if url4 finishes first, you'll still get results in the order [url1, url2, url3, url4].

4. CPU Parallelism with ProcessPoolExecutor
CPU-Bound Processing
Use ProcessPoolExecutor for true multi-core parallelism
```python
from concurrent.futures import ProcessPoolExecutor

def heavy_compute(n):
    # CPU-intensive work: sum of squares
    # This keeps the CPU busy (no waiting)
    return sum(i * i for i in range(n))

numbers = [10_000_000, 10_000_000, 10_000_000, 10_000_000]

# ProcessPoolExecutor uses separate processes, not threads
# Each process gets its own Python interpreter and GIL
# The __main__ guard is required on platforms that spawn workers
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(heavy_compute, numbers))
    print(results)
```

| Executor | Memory | GIL | CPU Usage |
|---|---|---|---|
| ThreadPoolExecutor | Shared | One GIL (blocks CPU work) | 1 core effective |
| ProcessPoolExecutor | Separate | Each has own GIL | All cores! |
🧩 5. Futures — Understanding the Object
A Future represents a pending operation. It can be in different states:
| State | Meaning | Check With |
|---|---|---|
| Running | Task is currently being executed | future.running() |
| Done | Task completed (success or error) | future.done() |
| Cancelled | Task was cancelled before running | future.cancelled() |
Working with Futures
Inspect future state and retrieve results
```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    # Simulate work that takes 'n' seconds
    time.sleep(n)
    return n * 2

executor = ThreadPoolExecutor()

# Submit a task that takes 2 seconds
future = executor.submit(task, 2)

# Check state immediately (task is running)
print(f"Done? {future.done()}")        # False
print(f"Running? {future.running()}")  # True

# Wait for result (blocks until complete)
result = future.result()
print(f"Result: {result}")  # 4

# Now the task has finished
print(f"Done? {future.done()}")  # True
executor.shutdown()
```
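Cancellation from the states table can be sketched too: only a task still waiting in the queue can be cancelled (the `slow` helper is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow():
    time.sleep(0.5)

# One worker: the second task waits in the queue, so it can be cancelled
with ThreadPoolExecutor(max_workers=1) as ex:
    running = ex.submit(slow)
    queued = ex.submit(slow)
    time.sleep(0.1)  # give the worker a moment to start the first task

    print(queued.cancel())     # True  (still queued, cancellation works)
    print(running.cancel())    # False (already running, cannot cancel)
    print(queued.cancelled())  # True
```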
🌀 6. Handling Exceptions in Parallel Tasks
Exceptions raised inside a worker don't vanish: they're stored in the Future. When you ask for the result (`.result()`), you'll see the problem and can handle it appropriately. If a function raises an exception inside a worker, `.result()` will re-raise it in the main thread:
Exception Handling in Futures
Handle errors from parallel tasks gracefully
```python
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    # This exception happens in a worker thread
    raise ValueError("Something went wrong")

executor = ThreadPoolExecutor()
future = executor.submit(failing_task)

# The exception is "stored" in the Future
# When we call .result(), it's re-raised here
try:
    future.result()
except ValueError as e:
    print(f"Caught: {e}")
# Output: Caught: Something went wrong
```

| Method | On Success | On Error |
|---|---|---|
| future.result() | Returns the value | Raises the exception |
| future.exception() | Returns None | Returns the exception object |
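A minimal sketch of `future.exception()` and of a result timeout (the `flaky` and `slow` helpers are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def flaky():
    raise ValueError("boom")

def slow():
    time.sleep(0.5)

with ThreadPoolExecutor() as ex:
    failed = ex.submit(flaky)
    # exception() returns the error object instead of raising it
    print(failed.exception())  # boom

    pending = ex.submit(slow)
    try:
        # result(timeout=...) gives up after the given number of seconds
        pending.result(timeout=0.1)
    except TimeoutError:
        print("Timed out, task still running")
```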
Wrap `.result()` calls in try/except just like you would with any other code that might fail.

7. Real-World Example — Parallel Web Requests
Parallel Web Requests
Fetch multiple URLs concurrently
```python
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_url(url):
    response = requests.get(url)
    return len(response.content)

urls = [
    "https://api.github.com/users/github",
    "https://api.github.com/users/microsoft",
    "https://api.github.com/users/google",
]

with ThreadPoolExecutor(max_workers=10) as executor:
    sizes = list(executor.map(fetch_url, urls))

print(f"Total bytes: {sum(sizes)}")
```

| Approach | Time for 100 URLs (0.3s each) | Speedup |
|---|---|---|
| Sequential (one by one) | 30 seconds | 1x (baseline) |
| ThreadPool (10 workers) | ~3 seconds | ~10x faster! |
Useful for:
- scraping
- API pipelines
- ETL ingestion
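The speedup in the table is easy to verify with a sketch that fakes the network wait using `time.sleep` (timings are approximate):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(url):
    time.sleep(0.1)  # stand-in for a 0.1s network round trip
    return url

urls = [f"url{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    list(ex.map(fake_request, urls))
elapsed = time.perf_counter() - start

# 10 tasks x 0.1s each, but all waiting in parallel:
# roughly 0.1s total instead of ~1s sequentially
print(f"{elapsed:.2f}s")
```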
Tune `max_workers` based on your use case. For web requests, 10-50 workers is common. Too many might overwhelm the server or trigger rate limits!

⚡ 8. Real-World Example — CPU Parallel Data Processing
Parallel Password Hashing
Use all CPU cores for heavy computation
```python
from concurrent.futures import ProcessPoolExecutor
import hashlib

def hash_password(password):
    # NOTE: a fixed salt is for demo only; use a random
    # per-password salt in production
    return hashlib.pbkdf2_hmac('sha256',
                               password.encode(),
                               b'salt',
                               100000).hex()

passwords = ["pass123", "secret", "mypass"] * 1000

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        hashed = list(executor.map(hash_password, passwords))
    print(f"Hashed {len(hashed)} passwords")
```

This pattern powers:
- AI data prep
- big dataset cleaning
- batch computation services
🔄 9. Mixing Concurrency & Parallelism
For the best performance, systems combine:
- I/O parallelism (threads)
- CPU parallelism (processes)
- Task scheduling (asyncio)
| Pipeline Stage | Best Tool | Why |
|---|---|---|
| Download data | ThreadPoolExecutor | I/O-bound: waiting for servers |
| Parse/transform data | ProcessPoolExecutor | CPU-bound: heavy computation |
| Coordinate/schedule | asyncio | Lightweight: manage task flow |
For example:
- asyncio → orchestrates
- ThreadPoolExecutor → handles blocking file/network
- ProcessPoolExecutor → handles heavy CPU tasks
This is how modern Python backends (FastAPI, aiohttp) achieve massive throughput.
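A minimal sketch of that division of labor, with hypothetical `blocking_io` and `cpu_work` helpers:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def blocking_io():
    # Stand-in for a blocking file or network call
    return "raw data"

def cpu_work(text):
    # Stand-in for heavy computation (must be a top-level function
    # so it can be pickled for the process pool)
    return text.upper()

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as io_pool, ProcessPoolExecutor() as cpu_pool:
        # asyncio orchestrates; the pools do the actual work
        raw = await loop.run_in_executor(io_pool, blocking_io)
        result = await loop.run_in_executor(cpu_pool, cpu_work, raw)
    print(result)  # RAW DATA

if __name__ == "__main__":
    asyncio.run(main())
```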
📦 10. Choosing the Right Executor
| Scenario | Best Choice |
|---|---|
| Many API calls | ThreadPoolExecutor |
| Downloading files | ThreadPoolExecutor |
| Reading thousands of files | ThreadPoolExecutor |
| Image processing | ProcessPoolExecutor |
| ML preprocessing | ProcessPoolExecutor |
| Large math loops | ProcessPoolExecutor |
| ETL pipelines | Both (mixed) |
If your task is waiting, use threads.
If your task is thinking, use processes.
🧩 11-20. Behind the Scenes & Advanced Patterns
Advanced topics covered:
11. How Executors Work
Task queue, worker threads/processes, IPC mechanisms
12. Serialization Overhead
Pickling costs, lambda limitations, using top-level functions
13. Batching Large Jobs
Grouping tasks for efficiency, reducing pickling overhead
14. chunksize Tuning
Optimizing executor.map() with chunksize parameter
15. Managing Shared State
multiprocessing.Manager, Queue, shared_memory
16. Thread Starvation
Preventing long tasks from blocking the pool
17. Avoiding Deadlocks
Never call .result() inside worker tasks
18. Async Execution
Running futures without blocking main thread
19. Thread Affinity
Pinning threads to CPU cores for consistent latency
20. Hybrid Pipeline Architecture
AsyncIO → ThreadPool → ProcessPool → AsyncIO pattern
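As a taste of topic 14: `executor.map()` accepts a `chunksize` argument that batches items per pickle round trip. A sketch (the actual speedup depends on the workload):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == "__main__":
    items = list(range(10_000))
    with ProcessPoolExecutor() as ex:
        # chunksize=1 (the default) pickles items one at a time.
        # A larger chunksize ships items in batches, reducing IPC overhead.
        results = list(ex.map(square, items, chunksize=500))
    print(results[:3])  # [0, 1, 4]
```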
🧪 21. Full Real-World Example — Data ETL Pipeline
Full ETL Pipeline
Combine asyncio, threads, and processes for a real-world pipeline
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

io_pool = ThreadPoolExecutor(20)
cpu_pool = ProcessPoolExecutor(8)

def read_file(path):
    with open(path) as f:
        return f.read()

def transform(text):
    return text.upper()

async def main(files):
    loop = asyncio.get_running_loop()

    # Stage 1: Read files (I/O bound) in the thread pool
    contents = await asyncio.gather(*[
        loop.run_in_executor(io_pool, read_file, f)
        for f in files
    ])

    # Stage 2: Transform (CPU bound) in the process pool
    results = await asyncio.gather(*[
        loop.run_in_executor(cpu_pool, transform, text)
        for text in contents
    ])
    return results
```

This is true professional pipeline architecture.
🧬 22. Zero-Copy Shared Memory (Python 3.8+)
MASSIVE Speed Upgrade
Normal multiprocessing copies all data to each worker via pickle; shared memory gives every worker direct access to a single copy.
| Approach | 100MB Array to 4 Workers | Memory Used |
|---|---|---|
| Normal (pickle) | ~2 seconds | 400MB (4 copies) |
| Shared memory | ~0.001 seconds | 100MB (1 shared) |
Without shared memory:
- ❌ Large NumPy arrays (100MB+) = slow
- ❌ ML tensors = slow
- ❌ Video frames = slow
Solution: shared memory
Used in:
- PyTorch multiprocessing
- TensorFlow data pipelines
- High-performance ETL
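A minimal sketch using the stdlib `multiprocessing.shared_memory` module: workers attach to the block by name instead of receiving a pickled copy (the `count_zeros` helper and the byte data are hypothetical):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory

def count_zeros(shm_name, start, end):
    # Attach to the existing block by name -- no copying
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return shm.buf[start:end].tobytes().count(0)
    finally:
        shm.close()

if __name__ == "__main__":
    data = bytes([0, 1] * 500)  # 1000 bytes of sample data
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[:len(data)] = data
    try:
        with ProcessPoolExecutor(max_workers=2) as ex:
            halves = [ex.submit(count_zeros, shm.name, 0, 500),
                      ex.submit(count_zeros, shm.name, 500, 1000)]
            total = sum(f.result() for f in halves)
        print(total)  # 500
    finally:
        shm.close()
        shm.unlink()  # free the block once all workers are done
```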
🚀 23-32. Expert-Level Parallel Patterns
Professional parallel computing patterns:
23. ML Tensor Preprocessing
Parallel normalization using shared memory
24. Fan-Out / Fan-In Architecture
Split work, collect results — backbone of scalable systems
25. Scatter/Gather Pattern
as_completed() for responsive systems
26. Fault-Tolerant Systems
Timeouts, retries, error-tolerant submission
27. Executor Isolation
Separate pools for IO, CPU, and orchestration
28. Backpressure & Throttling
Queue-based throttling to prevent overload
29. Map-Reduce Pattern
Parallel map, sequential reduce — Hadoop/Spark ancestor
30. Mini Distributed Engine Design
Task graph, scheduler, future tracking — Ray/Dask concepts
31. Building Distributed Executors
ProcessPool + sockets for multi-machine tasks
32. Master Hybrid Pipeline
AsyncIO orchestration + ThreadPool I/O + ProcessPool CPU
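Topic 25's scatter/gather idea in miniature: `as_completed()` yields futures as they finish rather than in submission order (sleep times are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def work(n):
    time.sleep(n / 10)  # bigger inputs take longer
    return n

with ThreadPoolExecutor() as ex:
    futures = [ex.submit(work, n) for n in (3, 1, 2)]
    # Yielded in *completion* order, not submission order
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # typically [1, 2, 3]: fastest first
```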
🚀 32. Master Hybrid Pipeline (Complete Example)
The ultimate architecture combining AsyncIO + Threads + Processes:
Master Hybrid Pipeline
The ultimate architecture used by Netflix, TikTok, and more
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

async def master_pipeline(files):
    loop = asyncio.get_running_loop()
    io_pool = ThreadPoolExecutor(32)
    cpu_pool = ProcessPoolExecutor(8)

    # Stage 1 — Async orchestrates, threads do the blocking I/O
    # (open_file and transform are the I/O and CPU helpers,
    #  defined as in the earlier ETL example)
    file_contents = await asyncio.gather(*[
        loop.run_in_executor(io_pool, open_file, f)
        for f in files
    ])

    # Stage 2 — CPU processing in the process pool
    processed = await asyncio.gather(*[
        loop.run_in_executor(cpu_pool, transform, text)
        for text in file_contents
    ])
    return processed
```

This is the same architecture used by:
- ✔ Netflix data pipelines
- ✔ TikTok's ML recommendation ingest
- ✔ YouTube's batch processing
- ✔ OpenAI's internal preprocessors
🎉 Conclusion
You now understand ULTRA-ADVANCED concurrency and parallelism:
✔ ThreadPoolExecutor & ProcessPoolExecutor
✔ Futures and parallel execution
✔ Exception handling in parallel tasks
✔ Real-world I/O & CPU examples
✔ Behind-the-scenes executor internals
✔ Serialization optimization
✔ Batching and chunksize tuning
✔ Shared state management
✔ Zero-copy shared memory
✔ Tensor multiprocessing
✔ Fan-out/fan-in architecture
✔ Scatter/gather patterns
✔ Fault-tolerant systems
✔ Executor isolation
✔ Backpressure & throttling
✔ Map-reduce patterns
✔ Distributed systems concepts
✔ Hybrid Async/Thread/Process pipelines
This module is the foundation of high-performance Python systems — from ML pipelines to scalable backend services.
📋 Quick Reference — Parallelism
| Syntax | What it does |
|---|---|
| ThreadPoolExecutor(max_workers=4) | Pool for I/O-bound tasks |
| ProcessPoolExecutor(max_workers=4) | Pool for CPU-bound tasks |
| executor.submit(fn, arg) | Submit one task, returns Future |
| executor.map(fn, items) | Map function over iterable |
| as_completed(futures) | Iterate futures as they finish |
🎉 Great work! You've completed this lesson.
You can now use ThreadPoolExecutor and ProcessPoolExecutor to parallelize real work efficiently and safely.
Up next: Profiling — learn to measure and optimise Python performance like a senior engineer.