
    Lesson 23 • Advanced

    Parallelism with concurrent.futures

    Master Python's concurrent.futures module to build high-performance parallel systems using ThreadPoolExecutor and ProcessPoolExecutor.

    What You'll Learn in This Lesson

    • How concurrent.futures abstracts threads and processes into a unified API
    • When to use ThreadPoolExecutor vs ProcessPoolExecutor
    • How Futures represent pending work — and how to collect results
    • Running parallel tasks with map() and submit()
    • Handling errors, cancellations, and timeouts in parallel jobs
    • Real patterns: parallel image processing, batch API calls, data pipelines

    1. What concurrent.futures Actually Does

    The concurrent.futures module hides thread and process management behind a single executor interface. It provides two executor types:

    | Executor | Workers Are | Best For | Examples |
    | --- | --- | --- | --- |
    | ThreadPoolExecutor | Threads | I/O-bound tasks (waiting) | API calls, file downloads, DB queries |
    | ProcessPoolExecutor | Processes | CPU-bound tasks (thinking) | Image processing, ML prep, heavy math |

    ⚙️ 2. The Basic Pattern (Threads)

    Basic ThreadPoolExecutor

    Submit tasks to a thread pool and get results

    from concurrent.futures import ThreadPoolExecutor
    import time
    
    def download(url):
        # Simulate a network request that takes time
        time.sleep(1)
        return f"Downloaded {url}"
    
    # Create a pool with 4 worker threads
    # Think of it as hiring 4 assistants
    executor = ThreadPoolExecutor(max_workers=4)
    
    # Submit a task - returns immediately with a Future
    # The task runs in the background
    future = executor.submit(download, "https://example.com")
    print("Task submitted - we can do other work here!")
    
    
    # Block until the task finishes and returns its value
    result = future.result()
    print(result)  # Downloaded https://example.com
    
    # Release the worker threads when done
    executor.shutdown()

    | Method | What It Does | Returns |
    | --- | --- | --- |
    | submit(fn, *args) | Schedules function to run in background | Future object |
    | future.result() | Waits for and returns the result | Function's return value |
    | shutdown() | Releases worker threads | None |

    ⚡ 3. Running Many Tasks at Once

    Parallel Map with Threads

    Process multiple items in parallel using executor.map()

    from concurrent.futures import ThreadPoolExecutor
    
    def fetch(url):
        # simulate work - in real code this would be an HTTP request
        return f"Content from {url}"
    
    urls = ["url1", "url2", "url3", "url4"]
    
    # The 'with' statement ensures automatic cleanup
    # No need to call executor.shutdown() manually!
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map() is like the built-in map(), but parallel!
        # It automatically:
        # 1. Submits all tasks
        # 2. Collects results in order
        
        # 3. Returns results in input order
        results = executor.map(fetch, urls)
        for result in results:
            print(result)

    | Approach | When to Use | Returns |
    | --- | --- | --- |
    | submit() | Different functions, custom handling | Individual Futures |
    | map() | Same function, many inputs | Iterator of results (in order) |
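    When completion order matters more than submission order, submit() pairs naturally with as_completed(), which yields each Future as soon as it finishes. A minimal sketch; the URLs and sleep times are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch(url):
    # Simulate variable network latency
    time.sleep(0.1 if url == "url2" else 0.2)
    return f"Content from {url}"

urls = ["url1", "url2", "url3"]

results = []
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() gives one Future per task
    futures = [executor.submit(fetch, url) for url in urls]
    # as_completed() yields futures in completion order
    for future in as_completed(futures):
        results.append(future.result())

print(results)  # fastest task first, not submission order
```

    Here "url2" finishes first, so it typically appears first in results — something map() cannot express, since it always preserves input order.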

    4. CPU Parallelism with ProcessPoolExecutor

    CPU-Bound Processing

    Use ProcessPoolExecutor for true multi-core parallelism

    from concurrent.futures import ProcessPoolExecutor
    
    def heavy_compute(n):
        # CPU-intensive work: sum of squares
        # This keeps the CPU busy (no waiting)
        return sum(i * i for i in range(n))
    
    numbers = [10_000_000, 10_000_000, 10_000_000, 10_000_000]
    
    # ProcessPoolExecutor uses separate processes, not threads
    # Each process gets its own Python interpreter and GIL
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(heavy_compute, numbers))
    
    print(results)
    # Four identical totals, computed in parallel across CPU cores

    | Executor | Memory | GIL | CPU Usage |
    | --- | --- | --- | --- |
    | ThreadPoolExecutor | Shared | One GIL (blocks CPU work) | Effectively 1 core |
    | ProcessPoolExecutor | Separate | Each has its own GIL | All cores |

    🧩 5. Futures — Understanding the Object

    A Future represents a pending operation. It can be in different states:

    | State | Meaning | Check With |
    | --- | --- | --- |
    | Running | Task is currently being executed | future.running() |
    | Done | Task completed (success or error) | future.done() |
    | Cancelled | Task was cancelled before running | future.cancelled() |

    Working with Futures

    Inspect future state and retrieve results

    from concurrent.futures import ThreadPoolExecutor
    import time
    
    def task(n):
        # Simulate work that takes 'n' seconds
        time.sleep(n)
        return n * 2
    
    executor = ThreadPoolExecutor()
    
    # Submit a task that takes 2 seconds
    future = executor.submit(task, 2)
    
    # Check state immediately (task is running)
    print(f"Done? {future.done()}")      # False
    print(f"Running? {future.running()}")  # True
    
    # Wait for result (blocks until complete)
    result = future.result()
    print(f"Result: {result}")  # 4
    
    # Not using a with-block here, so release the workers explicitly
    executor.shutdown()
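    A Future can also be cancelled, but only while it is still queued; once a worker has started it, cancel() returns False. A small sketch, using a single-worker pool so the second task is guaranteed to still be waiting:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow(n):
    time.sleep(0.5)
    return n

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(slow, 1)   # the only worker starts this one
    second = executor.submit(slow, 2)  # this one waits in the queue
    time.sleep(0.1)                    # let the worker pick up 'first'
    running_cancelled = first.cancel()  # already running
    cancelled = second.cancel()         # still queued

print(running_cancelled)  # False
print(cancelled)          # True
```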

    🌀 6. Handling Exceptions in Parallel Tasks

    If a function raises an exception inside a worker, .result() will re-raise it in the main thread:

    Exception Handling in Futures

    Handle errors from parallel tasks gracefully

    from concurrent.futures import ThreadPoolExecutor
    
    def failing_task():
        # This exception happens in a worker thread
        raise ValueError("Something went wrong")
    
    executor = ThreadPoolExecutor()
    future = executor.submit(failing_task)
    
    # The exception is "stored" in the Future
    # When we call .result(), it's re-raised here
    try:
        future.result()
    except ValueError as e:
        print(f"Caught: {e}")
    # Output: Caught: Something went wrong

    | Method | On Success | On Error |
    | --- | --- | --- |
    | future.result() | Returns the value | Raises the exception |
    | future.exception() | Returns None | Returns the exception object |
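    future.exception() is the non-raising counterpart: it waits like result() but hands back the exception object instead of raising it, which is convenient for surveying many futures without a try/except around each one. A sketch with a hypothetical maybe_fail function:

```python
from concurrent.futures import ThreadPoolExecutor

def maybe_fail(n):
    # Even inputs simulate failures
    if n % 2 == 0:
        raise ValueError(f"bad input: {n}")
    return n * 10

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(maybe_fail, n) for n in range(4)]

# exception() blocks until the future is done, then
# returns the stored exception, or None on success
outcomes = []
for f in futures:
    err = f.exception()
    outcomes.append(f"error: {err}" if err else f"ok: {f.result()}")

print(outcomes)
# ['error: bad input: 0', 'ok: 10', 'error: bad input: 2', 'ok: 30']
```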

    7. Real-World Example — Parallel Web Requests

    Parallel Web Requests

    Fetch multiple URLs concurrently

    from concurrent.futures import ThreadPoolExecutor
    import requests
    
    def fetch_url(url):
        response = requests.get(url)
        return len(response.content)
    
    urls = [
        "https://api.github.com/users/github",
        "https://api.github.com/users/microsoft",
        "https://api.github.com/users/google"
    ]
    
    with ThreadPoolExecutor(max_workers=10) as executor:
        sizes = list(executor.map(fetch_url, urls))
    
    print(f"Total bytes: {sum(sizes)}")
    | Approach | Time for 100 URLs (0.3s each) | Speedup |
    | --- | --- | --- |
    | Sequential (one by one) | 30 seconds | 1x (baseline) |
    | ThreadPool (10 workers) | ~3 seconds | ~10x faster |

    Useful for:

    • scraping
    • API pipelines
    • ETL ingestion
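    Real request fan-outs also need deadlines: result(timeout=...) raises TimeoutError if the task has not finished in time, so one slow server cannot stall the whole batch. A sketch with a simulated slow endpoint (the URL is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def slow_fetch(url):
    time.sleep(0.5)  # pretend the server is slow to respond
    return f"data from {url}"

with ThreadPoolExecutor() as executor:
    future = executor.submit(slow_fetch, "https://example.com")
    try:
        # Give up waiting after 0.1 seconds
        data = future.result(timeout=0.1)
    except TimeoutError:
        data = "timed out"

print(data)  # timed out
```

    Note that the timeout only stops the wait; the underlying task keeps running, and the with-block still waits for it on exit.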

    ⚡ 8. Real-World Example — CPU Parallel Data Processing

    Parallel Password Hashing

    Use all CPU cores for heavy computation

    from concurrent.futures import ProcessPoolExecutor
    import hashlib
    
    def hash_password(password):
        return hashlib.pbkdf2_hmac('sha256', 
                                     password.encode(), 
                                     b'salt', 
                                     100000).hex()
    
    passwords = ["pass123", "secret", "mypass"] * 1000
    
    with ProcessPoolExecutor() as executor:
        hashed = list(executor.map(hash_password, passwords))
    
    print(f"Hashed {len(hashed)} passwords")

    This pattern powers:

    • AI data prep
    • big dataset cleaning
    • batch computation services

    🔄 9. Mixing Concurrency & Parallelism

    For the best performance, systems combine:

    • I/O parallelism (threads)
    • CPU parallelism (processes)
    • Task scheduling (asyncio)

    | Pipeline Stage | Best Tool | Why |
    | --- | --- | --- |
    | Download data | ThreadPoolExecutor | I/O-bound: waiting for servers |
    | Parse/transform data | ProcessPoolExecutor | CPU-bound: heavy computation |
    | Coordinate/schedule | asyncio | Lightweight: manage task flow |

    For example:

    • asyncio → orchestrates
    • ThreadPoolExecutor → handles blocking file/network
    • ProcessPoolExecutor → handles heavy CPU tasks

    This is how modern Python backends (FastAPI, aiohttp) achieve massive throughput.
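    A minimal, runnable sketch of that division of labor: asyncio orchestrates while a thread pool absorbs the blocking calls (blocking_io here just simulates a slow I/O call):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_io(n):
    # Stand-in for a blocking file/network call
    time.sleep(0.1)
    return n * 10

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run_in_executor bridges blocking functions into async code
        results = await asyncio.gather(*[
            loop.run_in_executor(pool, blocking_io, n) for n in range(4)
        ])
    return results

results = asyncio.run(main())
print(results)  # [0, 10, 20, 30]
```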

    📦 10. Choosing the Right Executor

    | Scenario | Best Choice |
    | --- | --- |
    | Many API calls | ThreadPoolExecutor |
    | Downloading files | ThreadPoolExecutor |
    | Reading thousands of files | ThreadPoolExecutor |
    | Image processing | ProcessPoolExecutor |
    | ML preprocessing | ProcessPoolExecutor |
    | Large math loops | ProcessPoolExecutor |
    | ETL pipelines | Both (mixed) |

    If your task is waiting, use threads.

    If your task is thinking, use processes.

    🧩 11-20. Behind the Scenes & Advanced Patterns

    Advanced topics covered:

    11. How Executors Work

    Task queue, worker threads/processes, IPC mechanisms

    12. Serialization Overhead

    Pickling costs, lambda limitations, using top-level functions

    13. Batching Large Jobs

    Grouping tasks for efficiency, reducing pickling overhead

    14. chunksize Tuning

    Optimizing executor.map() with chunksize parameter

    15. Managing Shared State

    multiprocessing.Manager, Queue, shared_memory

    16. Thread Starvation

    Preventing long tasks from blocking the pool

    17. Avoiding Deadlocks

    Never call .result() inside worker tasks

    18. Async Execution

    Running futures without blocking main thread

    19. Thread Affinity

    Pinning threads to CPU cores for consistent latency

    20. Hybrid Pipeline Architecture

    AsyncIO → ThreadPool → ProcessPool → AsyncIO pattern
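    For instance, the chunksize tuning in item 14 is a single keyword argument to map(). The sketch below uses ThreadPoolExecutor so it runs without spawning processes; the parameter is accepted there but only changes behavior with ProcessPoolExecutor, where it batches items to reduce pickling/IPC overhead:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

items = list(range(100))

with ThreadPoolExecutor(max_workers=4) as executor:
    # With ProcessPoolExecutor, chunksize=10 would send items in
    # batches of 10 instead of one pickle round-trip per item
    results = list(executor.map(square, items, chunksize=10))

print(results[:5])  # [0, 1, 4, 9, 16]
```

    A rule of thumb for process pools is chunksize ≈ len(items) / (workers * some small factor), then benchmark.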

    🧪 21. Full Real-World Example — Data ETL Pipeline

    Full ETL Pipeline

    Combine asyncio, threads, and processes for a real-world pipeline

    import asyncio
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    
    io_pool = ThreadPoolExecutor(20)
    cpu_pool = ProcessPoolExecutor(8)
    
    def read_file(path):
        with open(path) as f:
            return f.read()
    
    def transform(text):
        return text.upper()
    
    async def main(files):
        loop = asyncio.get_running_loop()
    
        # Stage 1: Read files (I/O bound)
        contents = await asyncio.gather(*[
            loop.run_in_executor(io_pool, read_file, f)
            for f in files
        ])
    
        # Stage 2: Transform (CPU-bound) on the process pool
        processed = await asyncio.gather(*[
            loop.run_in_executor(cpu_pool, transform, c)
            for c in contents
        ])
        return processed
    
    # Run with: asyncio.run(main(list_of_paths))

    This is the shape of a professional pipeline architecture.

    🧬 22. Zero-Copy Shared Memory (Python 3.8+)

    A major speed upgrade

    Normal multiprocessing copies all data via pickle.

    | Approach | 100MB Array to 4 Workers | Memory Used |
    | --- | --- | --- |
    | Normal (pickle) | ~2 seconds | 400MB (4 copies) |
    | Shared memory | ~0.001 seconds | 100MB (1 shared) |

    Without shared memory, every worker gets its own pickled copy, so:

    • ❌ Large NumPy arrays (100MB+) transfer slowly
    • ❌ ML tensors transfer slowly
    • ❌ Video frames transfer slowly

    Solution: shared memory

    Zero-Copy Shared Memory

    Share large arrays between processes without copying

    from multiprocessing import shared_memory
    import numpy as np
    
    # Create shared array
    data = np.arange(10_000_000, dtype=np.int32)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    
    # Write to shared block
    shared_arr = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared_arr[:] = data[:]
    
    # Now multiple workers can attach to the same block by name (shm.name):
    # ✔ Zero copy
    # ✔ Zero serialization
    # ✔ Fast ML preprocessing
    
    # Release the block when finished
    shm.close()
    shm.unlink()

    Used in:

    • PyTorch multiprocessing
    • TensorFlow data pipelines
    • High-performance ETL
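    The half the snippet above leaves out is the worker side: attachment is by name. A sketch done within a single process for brevity (a real worker would receive shm.name as an argument), using raw bytes so it runs without NumPy:

```python
from multiprocessing import shared_memory

# Producer: create a named block and write into it
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# Consumer (normally in another process): attach by name - no copy
view = shared_memory.SharedMemory(name=shm.name)
data = bytes(view.buf[:5])
print(data)  # b'hello'

# Every attacher closes its handle; the creator also unlinks
view.close()
shm.close()
shm.unlink()
```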

    🚀 23-32. Expert-Level Parallel Patterns

    Professional parallel computing patterns:

    23. ML Tensor Preprocessing

    Parallel normalization using shared memory

    24. Fan-Out / Fan-In Architecture

    Split work, collect results — backbone of scalable systems

    25. Scatter/Gather Pattern

    as_completed() for responsive systems

    26. Fault-Tolerant Systems

    Timeouts, retries, error-tolerant submission

    27. Executor Isolation

    Separate pools for IO, CPU, and orchestration

    28. Backpressure & Throttling

    Queue-based throttling to prevent overload

    29. Map-Reduce Pattern

    Parallel map, sequential reduce — the pattern that inspired Hadoop/Spark

    30. Mini Distributed Engine Design

    Task graph, scheduler, future tracking — Ray/Dask concepts

    31. Building Distributed Executors

    ProcessPool + sockets for multi-machine tasks

    32. Master Hybrid Pipeline

    AsyncIO orchestration + ThreadPool I/O + ProcessPool CPU

    🚀 32. Master Hybrid Pipeline (Complete Example)

    The ultimate architecture combining AsyncIO + Threads + Processes:

    Master Hybrid Pipeline

    Combine asyncio, threads, and processes in one production-style pipeline

    import asyncio
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    
    def open_file(path):
        # Blocking I/O: runs in the thread pool
        with open(path) as f:
            return f.read()
    
    def transform_data(text):
        # CPU-bound work: runs in the process pool
        return text.upper()
    
    async def master_pipeline(files):
        loop = asyncio.get_running_loop()
        io_pool = ThreadPoolExecutor(32)
        cpu_pool = ProcessPoolExecutor(8)
    
        # Stage 1 — Async orchestrates blocking reads on the thread pool
        file_contents = await asyncio.gather(*[
            loop.run_in_executor(io_pool, open_file, f)
            for f in files
        ])
    
        # Stage 2 — CPU processing on the process pool
        processed = await asyncio.gather(*[
            loop.run_in_executor(cpu_pool, transform_data, c)
            for c in file_contents
        ])
    
        io_pool.shutdown()
        cpu_pool.shutdown()
        return processed
    
    # Run with: asyncio.run(master_pipeline(list_of_paths))

    Variations of this hybrid architecture power large-scale production systems:

    • ✔ Streaming and batch data pipelines
    • ✔ ML feature and recommendation ingest
    • ✔ Media and video batch processing
    • ✔ Large-scale data preprocessing

    🎉 Conclusion

    You now understand advanced concurrency and parallelism:

    ✔ ThreadPoolExecutor & ProcessPoolExecutor
    ✔ Futures and parallel execution
    ✔ Exception handling in parallel tasks
    ✔ Real-world I/O & CPU examples
    ✔ Behind-the-scenes executor internals
    ✔ Serialization optimization
    ✔ Batching and chunksize tuning
    ✔ Shared state management
    ✔ Zero-copy shared memory
    ✔ Tensor multiprocessing
    ✔ Fan-out/fan-in architecture
    ✔ Scatter/gather patterns
    ✔ Fault-tolerant systems
    ✔ Executor isolation
    ✔ Backpressure & throttling
    ✔ Map-reduce patterns
    ✔ Distributed systems concepts
    ✔ Hybrid Async/Thread/Process pipelines

    This module is the foundation of high-performance Python systems — from ML pipelines to scalable backend services.

    📋 Quick Reference — Parallelism

    | Syntax | What it does |
    | --- | --- |
    | ThreadPoolExecutor(max_workers=4) | Pool for I/O-bound tasks |
    | ProcessPoolExecutor(max_workers=4) | Pool for CPU-bound tasks |
    | executor.submit(fn, arg) | Submit one task, returns Future |
    | executor.map(fn, items) | Map function over iterable |
    | as_completed(futures) | Iterate futures as they finish |

    🎉 Great work! You've completed this lesson.

    You can now use ThreadPoolExecutor and ProcessPoolExecutor to parallelize real work efficiently and safely.

    Up next: Profiling — learn to measure and optimise Python performance like a senior engineer.
