Lesson 23 • Advanced
Parallelism with concurrent.futures
Master Python's concurrent.futures module to build high-performance parallel systems using ThreadPoolExecutor and ProcessPoolExecutor.
What You'll Learn in This Lesson
- How `concurrent.futures` abstracts threads and processes into a unified API
- When to use `ThreadPoolExecutor` vs `ProcessPoolExecutor`
- How Futures represent pending work, and how to collect results
- Running parallel tasks with `map()` and `submit()`
- Handling errors, cancellations, and timeouts in parallel jobs
- Real patterns: parallel image processing, batch API calls, data pipelines
1. What concurrent.futures Actually Does
Think of `concurrent.futures` as a task management app. You submit tasks (like "download this file" or "process this image"), and it assigns workers to complete them. You get a "ticket" (a Future) that you can check later to see whether your task is done.

It provides two executor types:
| Executor | Workers Are | Best For | Examples |
|---|---|---|---|
| ThreadPoolExecutor | Threads | I/O-bound tasks (waiting) | API calls, file downloads, DB queries |
| ProcessPoolExecutor | Processes | CPU-bound tasks (thinking) | Image processing, ML prep, heavy math |
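The two executors share the same interface, so the table's distinction is only about *where* the work runs. A minimal sketch of the unified API (the `double` helper is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def double(n):
    # Trivial stand-in for real work
    return n * 2

if __name__ == "__main__":
    # The same code works with either executor class
    for Executor in (ThreadPoolExecutor, ProcessPoolExecutor):
        with Executor(max_workers=2) as ex:
            print(list(ex.map(double, [1, 2, 3])))  # [2, 4, 6] both times
```

Because the API is identical, switching between threads and processes is usually a one-line change.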
⚙️ 2. The Basic Pattern (Threads)
Think of a coffee shop: you `submit()` your order, get a receipt (a Future), and can do other things. When you call `.result()`, you wait at the counter until your coffee is ready.

Basic ThreadPoolExecutor
Submit tasks to a thread pool and get results
```python
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    # Simulate a network request that takes time
    time.sleep(1)
    return f"Downloaded {url}"

# Create a pool with 4 worker threads
# Think of it as hiring 4 assistants
executor = ThreadPoolExecutor(max_workers=4)

# Submit a task - returns immediately with a Future
# The task runs in the background
future = executor.submit(download, "https://example.com")
print("Task submitted - we can do other work here!")

# .result() blocks until the task finishes
print(future.result())  # Downloaded https://example.com

# Release the worker threads when done
executor.shutdown()
```

| Method | What It Does | Returns |
|---|---|---|
| submit(fn, *args) | Schedules function to run in background | Future object |
| future.result() | Waits for and returns the result | Function's return value |
| shutdown() | Releases worker threads | None |
Use the `with` statement (context manager) for automatic cleanup!

⚡ 3. Running Many Tasks at Once
`executor.map()` is like a pizza shop with multiple ovens. You give them a list of pizzas to make, they cook them in parallel using all available ovens, and return them to you in the same order you requested.

Parallel Map with Threads
Process multiple items in parallel using executor.map()
```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # simulate work - in real code this would be an HTTP request
    return f"Content from {url}"

urls = ["url1", "url2", "url3", "url4"]

# The 'with' statement ensures automatic cleanup
# No need to call executor.shutdown() manually!
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() is like the built-in map(), but parallel!
    # It automatically:
    # 1. Submits all tasks
    # 2. Collects results in order
    results = list(executor.map(fetch, urls))

print(results)
```

| Approach | When to Use | Returns |
|---|---|---|
| submit() | Different functions, custom handling | Individual Futures |
| map() | Same function, many inputs | Iterator of results (in order) |
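A minimal sketch of the `submit()` row of the table, using two hypothetical helper functions so each task can run a different function:

```python
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

def mul(a, b):
    return a * b

with ThreadPoolExecutor() as ex:
    # submit() lets each task run a *different* function
    f_add = ex.submit(add, 2, 3)
    f_mul = ex.submit(mul, 2, 3)
    print(f_add.result(), f_mul.result())  # 5 6
```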
`map()` preserves order! Even if url4 finishes first, you'll still get results in the order [url1, url2, url3, url4].

4. CPU Parallelism with ProcessPoolExecutor
CPU-Bound Processing
Use ProcessPoolExecutor for true multi-core parallelism
```python
from concurrent.futures import ProcessPoolExecutor

def heavy_compute(n):
    # CPU-intensive work: sum of squares
    # This keeps the CPU busy (no waiting)
    return sum(i * i for i in range(n))

numbers = [10_000_000, 10_000_000, 10_000_000, 10_000_000]

# ProcessPoolExecutor uses separate processes, not threads
# Each process gets its own Python interpreter and GIL
# The __main__ guard is required on platforms that spawn workers
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(heavy_compute, numbers))
    print(results)
```

| Executor | Memory | GIL | CPU Usage |
|---|---|---|---|
| ThreadPoolExecutor | Shared | One GIL (blocks CPU work) | 1 core effective |
| ProcessPoolExecutor | Separate | Each has own GIL | All cores! |
🧩 5. Futures — Understanding the Object
A Future represents a pending operation. It can be in different states:
| State | Meaning | Check With |
|---|---|---|
| Running | Task is currently being executed | future.running() |
| Done | Task completed (success or error) | future.done() |
| Cancelled | Task was cancelled before running | future.cancelled() |
Working with Futures
Inspect future state and retrieve results
```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    # Simulate work that takes 'n' seconds
    time.sleep(n)
    return n * 2

executor = ThreadPoolExecutor()

# Submit a task that takes 2 seconds
future = executor.submit(task, 2)

# Check state immediately (task is running)
print(f"Done? {future.done()}")        # False
print(f"Running? {future.running()}")  # True

# Wait for result (blocks until complete)
result = future.result()
print(f"Result: {result}")  # 4

# Now the task has finished
print(f"Done? {future.done()}")  # True
executor.shutdown()
```
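Cancellation from the states table can be sketched too: only a task still waiting in the queue can be cancelled (the `slow` helper is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow():
    time.sleep(0.5)

# One worker: the second task waits in the queue, so it can be cancelled
with ThreadPoolExecutor(max_workers=1) as ex:
    running = ex.submit(slow)
    queued = ex.submit(slow)
    time.sleep(0.1)  # give the worker a moment to start the first task

    print(queued.cancel())     # True  (still queued, cancellation works)
    print(running.cancel())    # False (already running, cannot cancel)
    print(queued.cancelled())  # True
```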
🌀 6. Handling Exceptions in Parallel Tasks
Exceptions raised inside a worker don't vanish: they're stored in the Future. When you ask for the result (`.result()`), you'll see the problem and can handle it appropriately. If a function raises an exception inside a worker, `.result()` will re-raise it in the main thread:
Exception Handling in Futures
Handle errors from parallel tasks gracefully
```python
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    # This exception happens in a worker thread
    raise ValueError("Something went wrong")

executor = ThreadPoolExecutor()
future = executor.submit(failing_task)

# The exception is "stored" in the Future
# When we call .result(), it's re-raised here
try:
    future.result()
except ValueError as e:
    print(f"Caught: {e}")
# Output: Caught: Something went wrong
```

| Method | On Success | On Error |
|---|---|---|
| future.result() | Returns the value | Raises the exception |
| future.exception() | Returns None | Returns the exception object |
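A minimal sketch of `future.exception()` and of a result timeout (the `flaky` and `slow` helpers are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def flaky():
    raise ValueError("boom")

def slow():
    time.sleep(0.5)

with ThreadPoolExecutor() as ex:
    failed = ex.submit(flaky)
    # exception() returns the error object instead of raising it
    print(failed.exception())  # boom

    pending = ex.submit(slow)
    try:
        # result(timeout=...) gives up after the given number of seconds
        pending.result(timeout=0.1)
    except TimeoutError:
        print("Timed out, task still running")
```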
Wrap `.result()` calls in try/except just like you would with any other code that might fail.

7. Real-World Example — Parallel Web Requests
Parallel Web Requests
Fetch multiple URLs concurrently
```python
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_url(url):
    response = requests.get(url)
    return len(response.content)

urls = [
    "https://api.github.com/users/github",
    "https://api.github.com/users/microsoft",
    "https://api.github.com/users/google",
]

with ThreadPoolExecutor(max_workers=10) as executor:
    sizes = list(executor.map(fetch_url, urls))

print(f"Total bytes: {sum(sizes)}")
```

| Approach | Time for 100 URLs (0.3s each) | Speedup |
|---|---|---|
| Sequential (one by one) | 30 seconds | 1x (baseline) |
| ThreadPool (10 workers) | ~3 seconds | ~10x faster! |
Useful for:
- scraping
- API pipelines
- ETL ingestion
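The speedup in the table is easy to verify with a sketch that fakes the network wait using `time.sleep` (timings are approximate):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(url):
    time.sleep(0.1)  # stand-in for a 0.1s network round trip
    return url

urls = [f"url{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    list(ex.map(fake_request, urls))
elapsed = time.perf_counter() - start

# 10 tasks x 0.1s each, but all waiting in parallel:
# roughly 0.1s total instead of ~1s sequentially
print(f"{elapsed:.2f}s")
```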
Tune `max_workers` based on your use case. For web requests, 10-50 workers is common. Too many might overwhelm the server or trigger rate limits!

⚡ 8. Real-World Example — CPU Parallel Data Processing
Parallel Password Hashing
Use all CPU cores for heavy computation
```python
from concurrent.futures import ProcessPoolExecutor
import hashlib

def hash_password(password):
    # NOTE: a fixed salt is for demo only; use a random
    # per-password salt in production
    return hashlib.pbkdf2_hmac('sha256',
                               password.encode(),
                               b'salt',
                               100000).hex()

passwords = ["pass123", "secret", "mypass"] * 1000

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        hashed = list(executor.map(hash_password, passwords))
    print(f"Hashed {len(hashed)} passwords")
```

This pattern powers:
- AI data prep
- big dataset cleaning
- batch computation services
🔄 9. Mixing Concurrency & Parallelism
For the best performance, systems combine:
- I/O parallelism (threads)
- CPU parallelism (processes)
- Task scheduling (asyncio)
| Pipeline Stage | Best Tool | Why |
|---|---|---|
| Download data | ThreadPoolExecutor | I/O-bound: waiting for servers |
| Parse/transform data | ProcessPoolExecutor | CPU-bound: heavy computation |
| Coordinate/schedule | asyncio | Lightweight: manage task flow |
For example:
- asyncio → orchestrates
- ThreadPoolExecutor → handles blocking file/network
- ProcessPoolExecutor → handles heavy CPU tasks
This is how modern Python backends (FastAPI, aiohttp) achieve massive throughput.
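A minimal sketch of that division of labor, with hypothetical `blocking_io` and `cpu_work` helpers:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def blocking_io():
    # Stand-in for a blocking file or network call
    return "raw data"

def cpu_work(text):
    # Stand-in for heavy computation (must be a top-level function
    # so it can be pickled for the process pool)
    return text.upper()

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as io_pool, ProcessPoolExecutor() as cpu_pool:
        # asyncio orchestrates; the pools do the actual work
        raw = await loop.run_in_executor(io_pool, blocking_io)
        result = await loop.run_in_executor(cpu_pool, cpu_work, raw)
    print(result)  # RAW DATA

if __name__ == "__main__":
    asyncio.run(main())
```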
📦 10. Choosing the Right Executor
| Scenario | Best Choice |
|---|---|
| Many API calls | ThreadPoolExecutor |
| Downloading files | ThreadPoolExecutor |
| Reading thousands of files | ThreadPoolExecutor |
| Image processing | ProcessPoolExecutor |
| ML preprocessing | ProcessPoolExecutor |
| Large math loops | ProcessPoolExecutor |
| ETL pipelines | Both (mixed) |
If your task is waiting, use threads.
If your task is thinking, use processes.
🧩 11-20. Behind the Scenes & Advanced Patterns
Advanced topics covered:
11. How Executors Work
Task queue, worker threads/processes, IPC mechanisms
12. Serialization Overhead
Pickling costs, lambda limitations, using top-level functions
13. Batching Large Jobs
Grouping tasks for efficiency, reducing pickling overhead
14. chunksize Tuning
Optimizing executor.map() with chunksize parameter
15. Managing Shared State
multiprocessing.Manager, Queue, shared_memory
16. Thread Starvation
Preventing long tasks from blocking the pool
17. Avoiding Deadlocks
Never call .result() inside worker tasks
18. Async Execution
Running futures without blocking main thread
19. Thread Affinity
Pinning threads to CPU cores for consistent latency
20. Hybrid Pipeline Architecture
AsyncIO → ThreadPool → ProcessPool → AsyncIO pattern
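As a taste of topic 14: `executor.map()` accepts a `chunksize` argument that batches items per pickle round trip. A sketch (the actual speedup depends on the workload):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == "__main__":
    items = list(range(10_000))
    with ProcessPoolExecutor() as ex:
        # chunksize=1 (the default) pickles items one at a time.
        # A larger chunksize ships items in batches, reducing IPC overhead.
        results = list(ex.map(square, items, chunksize=500))
    print(results[:3])  # [0, 1, 4]
```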
🧪 21. Full Real-World Example — Data ETL Pipeline
Full ETL Pipeline
Combine asyncio, threads, and processes for a real-world pipeline
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

io_pool = ThreadPoolExecutor(20)
cpu_pool = ProcessPoolExecutor(8)

def read_file(path):
    with open(path) as f:
        return f.read()

def transform(text):
    return text.upper()

async def main(files):
    loop = asyncio.get_running_loop()

    # Stage 1: Read files (I/O bound) in the thread pool
    contents = await asyncio.gather(*[
        loop.run_in_executor(io_pool, read_file, f)
        for f in files
    ])

    # Stage 2: Transform (CPU bound) in the process pool
    results = await asyncio.gather(*[
        loop.run_in_executor(cpu_pool, transform, text)
        for text in contents
    ])
    return results
```

This is true professional pipeline architecture.
🧬 22. Zero-Copy Shared Memory (Python 3.8+)
MASSIVE Speed Upgrade
Normal multiprocessing copies all data to each worker via pickle; shared memory gives every worker direct access to a single copy.
| Approach | 100MB Array to 4 Workers | Memory Used |
|---|---|---|
| Normal (pickle) | ~2 seconds | 400MB (4 copies) |
| Shared memory | ~0.001 seconds | 100MB (1 shared) |
Without shared memory:
- ❌ Large NumPy arrays (100MB+) = slow
- ❌ ML tensors = slow
- ❌ Video frames = slow
Solution: shared memory
Used in:
- PyTorch multiprocessing
- TensorFlow data pipelines
- High-performance ETL
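A minimal sketch using the stdlib `multiprocessing.shared_memory` module: workers attach to the block by name instead of receiving a pickled copy (the `count_zeros` helper and the byte data are hypothetical):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory

def count_zeros(shm_name, start, end):
    # Attach to the existing block by name -- no copying
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return shm.buf[start:end].tobytes().count(0)
    finally:
        shm.close()

if __name__ == "__main__":
    data = bytes([0, 1] * 500)  # 1000 bytes of sample data
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[:len(data)] = data
    try:
        with ProcessPoolExecutor(max_workers=2) as ex:
            halves = [ex.submit(count_zeros, shm.name, 0, 500),
                      ex.submit(count_zeros, shm.name, 500, 1000)]
            total = sum(f.result() for f in halves)
        print(total)  # 500
    finally:
        shm.close()
        shm.unlink()  # free the block once all workers are done
```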
🚀 23-32. Expert-Level Parallel Patterns
Professional parallel computing patterns:
23. ML Tensor Preprocessing
Parallel normalization using shared memory
24. Fan-Out / Fan-In Architecture
Split work, collect results — backbone of scalable systems
25. Scatter/Gather Pattern
as_completed() for responsive systems
26. Fault-Tolerant Systems
Timeouts, retries, error-tolerant submission
27. Executor Isolation
Separate pools for IO, CPU, and orchestration
28. Backpressure & Throttling
Queue-based throttling to prevent overload
29. Map-Reduce Pattern
Parallel map, sequential reduce — Hadoop/Spark ancestor
30. Mini Distributed Engine Design
Task graph, scheduler, future tracking — Ray/Dask concepts
31. Building Distributed Executors
ProcessPool + sockets for multi-machine tasks
32. Master Hybrid Pipeline
AsyncIO orchestration + ThreadPool I/O + ProcessPool CPU
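Topic 25's scatter/gather idea in miniature: `as_completed()` yields futures as they finish rather than in submission order (sleep times are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def work(n):
    time.sleep(n / 10)  # bigger inputs take longer
    return n

with ThreadPoolExecutor() as ex:
    futures = [ex.submit(work, n) for n in (3, 1, 2)]
    # Yielded in *completion* order, not submission order
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # typically [1, 2, 3]: fastest first
```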
🚀 32. Master Hybrid Pipeline (Complete Example)
The ultimate architecture combining AsyncIO + Threads + Processes:
Master Hybrid Pipeline
The ultimate architecture used by Netflix, TikTok, and more
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

async def master_pipeline(files):
    loop = asyncio.get_running_loop()
    io_pool = ThreadPoolExecutor(32)
    cpu_pool = ProcessPoolExecutor(8)

    # Stage 1 — Async orchestrates, threads do the blocking I/O
    # (open_file and transform are the I/O and CPU helpers,
    #  defined as in the earlier ETL example)
    file_contents = await asyncio.gather(*[
        loop.run_in_executor(io_pool, open_file, f)
        for f in files
    ])

    # Stage 2 — CPU processing in the process pool
    processed = await asyncio.gather(*[
        loop.run_in_executor(cpu_pool, transform, text)
        for text in file_contents
    ])
    return processed
```

This is the same architecture used by:
- ✔ Netflix data pipelines
- ✔ TikTok's ML recommendation ingest
- ✔ YouTube's batch processing
- ✔ OpenAI's internal preprocessors
🎉 Conclusion
You now understand ULTRA-ADVANCED concurrency and parallelism:
✔ ThreadPoolExecutor & ProcessPoolExecutor
✔ Futures and parallel execution
✔ Exception handling in parallel tasks
✔ Real-world I/O & CPU examples
✔ Behind-the-scenes executor internals
✔ Serialization optimization
✔ Batching and chunksize tuning
✔ Shared state management
✔ Zero-copy shared memory
✔ Tensor multiprocessing
✔ Fan-out/fan-in architecture
✔ Scatter/gather patterns
✔ Fault-tolerant systems
✔ Executor isolation
✔ Backpressure & throttling
✔ Map-reduce patterns
✔ Distributed systems concepts
✔ Hybrid Async/Thread/Process pipelines
This module is the foundation of high-performance Python systems — from ML pipelines to scalable backend services.
📋 Quick Reference — Parallelism
| Syntax | What it does |
|---|---|
| ThreadPoolExecutor(max_workers=4) | Pool for I/O-bound tasks |
| ProcessPoolExecutor(max_workers=4) | Pool for CPU-bound tasks |
| executor.submit(fn, arg) | Submit one task, returns Future |
| executor.map(fn, items) | Map function over iterable |
| as_completed(futures) | Iterate futures as they finish |
🎉 Great work! You've completed this lesson.
You can now use ThreadPoolExecutor and ProcessPoolExecutor to parallelize real work efficiently and safely.
Up next: Profiling — learn to measure and optimise Python performance like a senior engineer.