
    Lesson 22 • Advanced

    Concurrency in Python: Threads vs Processes

    Master Python's concurrency models and learn when to use threads, processes, or AsyncIO for maximum performance.

    What You'll Learn in This Lesson

    • The difference between concurrency and parallelism — and why it matters
    • How Python threads work and when the GIL limits you
    • When to use threading vs multiprocessing vs asyncio
    • How to share data safely between threads using locks and queues
    • How to spawn and manage worker processes for CPU-bound tasks
    • Real-world patterns: scrapers, batch processors, concurrent file downloads

    🔥 1. Why Concurrency Exists

    Computers run tasks in parallel using:

    • CPU cores → true parallelism (multiple chefs)
    • Scheduling → switching between tasks (one chef, many pots)
    • I/O waits → idle time you can use productively (waiting for water to boil)

    Type of Work | What It Means | Best Solution | Real Example
    CPU-bound | Heavy calculations that keep the CPU busy | multiprocessing | Image processing, ML training
    I/O-bound | Waiting for external resources | threads or asyncio | API calls, file downloads

    ⚙️ 2. Threads in Python

    A thread is a lightweight unit of execution within a single process.

    What Threads Share | Why It Matters
    Memory | Fast communication, but risk of conflicts
    Variables | Easy data sharing, but need locks for safety
    File handles | Can work with same files simultaneously
    Python interpreter | Limited by GIL for CPU work

    ✔ Good for:

    • network requests (waiting for servers)
    • reading/writing files (waiting for disk)
    • user interface responsiveness (don't freeze the UI)
    • downloading many URLs (lots of waiting)

    ❌ Not good for:

    • heavy computation (GIL blocks parallelism)
    • CPU-bound workload (use processes instead)

    Basic Threading

    Create and manage multiple threads

    Python
    import threading
    import time
    
    def task(name):
        # Each thread announces when it starts
        print(f"{name} starting")
        
        # Simulate I/O wait (like downloading a file)
        # During this sleep, other threads can run!
        time.sleep(0.5)
        
        print(f"{name} done")
    
    # Create 3 thread objects
    # Each thread will run the 'task' function with a different name
    threads = [
        threading.Thread(target=task, args=(f"Thread-{i}",))
        for i in range(3)
    ]
    
    # Start all threads (they begin running)
    for t in threads:
        t.start()
    
    # Wait for every thread to finish before the program exits
    for t in threads:
        t.join()
    
    print("All threads finished")

    🧠 3. Understanding the GIL (Global Interpreter Lock)

    The GIL (Global Interpreter Lock) is a mutex that protects access to Python objects.

    Situation | GIL Effect | Result
    Running Python code | GIL is held | Other threads wait
    Waiting for network/file | GIL is released | Other threads can run
    Using C extensions (NumPy) | Often released | True parallelism possible

    This means:

    ❌ Threads cannot speed up CPU-heavy tasks

    (e.g., image processing, hashing, compression)

    ✔ Threads can hugely speed up I/O tasks

    (e.g., APIs, web scraping, DB queries)
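    You can see the GIL's effect directly by timing the same pure-Python function run twice sequentially and then in two threads. This is a minimal sketch; exact times depend on your machine, but on standard CPython the threaded run is not faster:

```python
import threading
import time

def crunch(n):
    # Pure-Python CPU work: the GIL is held the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Sequential: two runs back to back
start = time.time()
crunch(N)
crunch(N)
sequential = time.time() - start

# "Parallel": the same two runs in two threads
start = time.time()
threads = [threading.Thread(target=crunch, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

# On standard CPython the threaded time is roughly the same (or worse):
# only one thread can execute Python bytecode at a time
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```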

    ⚡ 4. Processes in Python

    A process is a full Python interpreter with its own memory.

    Aspect | Threads | Processes
    Memory | Shared | Separate (isolated)
    GIL | Shared (limits CPU work) | Each has its own
    Startup time | Fast (microseconds) | Slow (milliseconds)
    Data sharing | Easy (same memory) | Requires serialization

    ✔ Advantages:

    • True parallelism (uses multiple CPU cores)
    • Great for CPU-bound work
    • No GIL problems — each process has its own

    ❌ Disadvantages:

    • More memory used (each process needs its own)
    • Slower to start (spawning a new Python interpreter)
    • Harder to share data (must serialize/deserialize)

    Basic multiprocessing:

    Basic Multiprocessing

    Run CPU-bound tasks in separate processes

    Python
    import multiprocessing
    import time
    
    def cpu_task(name):
        print(f"{name} starting")
        
        # CPU-intensive work: calculate sum of squares
        # This keeps the CPU busy (not waiting for I/O)
        total = sum(i * i for i in range(10_000_000))
        
        print(f"{name} done: {total}")
    
    # IMPORTANT: This guard is required on Windows!
    # Without it, the script would spawn infinite processes
    if __name__ == "__main__":
        # Create 2 process objects
        processes = [
            multiprocessing.Process(target=cpu_task, args=(f"Process-{i}",))
            for i in range(2)
        ]
        
        # Start both processes (each gets its own interpreter and GIL)
        for p in processes:
            p.start()
        
        # Wait for both to finish
        for p in processes:
            p.join()

    🔄 5. Side-by-Side Comparison

    Feature | Threads | Processes
    Speed for I/O | ⭐⭐⭐⭐⭐ | ⭐⭐⭐
    Speed for CPU | ⭐⭐ | ⭐⭐⭐⭐⭐
    Memory usage | Low | High
    Startup time | Fast | Slow
    Shares memory? | Yes | No
    Avoids GIL? | No | Yes
    Best for? | I/O tasks | CPU tasks

    🧪 6. Real-World Examples

    ✔ Threads Example: Web Scraping

    Threaded Web Scraping

    Use threads to fetch multiple URLs in parallel

    Python
    import threading
    import requests
    import time
    
    def fetch_url(url):
        response = requests.get(url)
        print(f"Fetched {url}: {len(response.content)} bytes")
    
    urls = [
        "https://api.github.com/users/github",
        "https://api.github.com/users/microsoft",
        "https://api.github.com/users/google",
        "https://api.github.com/users/facebook"
    ]
    
    start = time.time()
    threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    print(f"Fetched {len(urls)} URLs in {time.time() - start:.2f}s")

    Threads shine because requests are I/O-bound.

    ✔ Processes Example: Image Processing

    Process-Based Image Processing

    Use processes for CPU-heavy image tasks

    Python
    import multiprocessing
    from PIL import Image
    import time
    
    def process_image(filename):
        # CPU-heavy: resize, filter, transform
        img = Image.open(filename)
        img = img.resize((800, 600))
        img = img.convert('L')  # grayscale
        img.save(f"processed_{filename}")
        print(f"Processed {filename}")
    
    if __name__ == "__main__":
        images = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
        
        start = time.time()
        processes = [
            multiprocessing.Process(target=process_image, args=(img,))
            for img in images
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        
        print(f"Processed {len(images)} images in {time.time() - start:.2f}s")

    CPU work → multiprocessing is ideal.

    🧱 7. Mixing Threads & Processes (Hybrid Model)

    Many real systems use both:

    • Processes for CPU-heavy pipelines
    • Threads for network/database operations
    • AsyncIO for massive lightweight tasks

    Component | Role in Hybrid System | Why?
    AsyncIO | Orchestra conductor | Lightweight coordination of thousands of tasks
    Threads | I/O specialists | Handle blocking I/O without stopping AsyncIO
    Processes | Heavy lifters | CPU work on multiple cores simultaneously

    Example: A crawler using:

    • Threads → fetch 1,000 URLs
    • Processes → process each page's text
    • AsyncIO → coordinate flows

    This is how production-grade scrapers, ML preprocessors, and automation bots work.
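    A minimal sketch of that hybrid shape (fetch_page and extract_text here are stand-in placeholders, not a real crawler): AsyncIO coordinates the flow, a thread pool runs the blocking fetches, and a process pool does the CPU-heavy parsing:

```python
import asyncio
import hashlib
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fetch_page(url):
    # Blocking I/O stand-in (a real crawler would use requests/urllib)
    return f"<html>{url}</html>"

def extract_text(html):
    # CPU-heavy stand-in: hash the page contents repeatedly
    data = html.encode()
    for _ in range(1000):
        data = hashlib.sha256(data).digest()
    return data.hex()[:12]

async def crawl(urls):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as io_pool, ProcessPoolExecutor() as cpu_pool:
        # Threads handle the blocking fetches...
        pages = await asyncio.gather(
            *(loop.run_in_executor(io_pool, fetch_page, u) for u in urls)
        )
        # ...processes handle the CPU-heavy parsing
        return list(await asyncio.gather(
            *(loop.run_in_executor(cpu_pool, extract_text, p) for p in pages)
        ))

if __name__ == "__main__":
    print(asyncio.run(crawl(["a.com", "b.com", "c.com"])))
```

    Note that extract_text must be defined at module level so it can be pickled and sent to the worker processes.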

    🔥 8. When Should You Use What?

    Use Threads when:

    • ✔ I/O-bound
    • ✔ Web scraping
    • ✔ Automation
    • ✔ Network servers
    • ✔ Waiting on APIs

    Use Processes when:

    • ✔ CPU-bound
    • ✔ ML preprocessing
    • ✔ Math-heavy operations
    • ✔ Hashing, encryption
    • ✔ Image/video processing

    Use AsyncIO when:

    • ✔ Massive concurrency
    • ✔ Lightweight operations
    • ✔ Network-first tasks
    • ✔ High scalability needed

    🧠 9. How Python Schedules Threads Internally (Advanced)

    Python thread scheduling has two layers: the operating system preemptively schedules the threads themselves, while the GIL decides which thread may execute Python bytecode.

    Here's what really happens:

    ✔ The OS schedules threads

    The operating system decides when each thread runs, based on CPU availability.

    ✔ The GIL schedules Python bytecode

    Inside Python, only ONE thread can execute Python bytecode at once.

    This creates:

    • artificial bottlenecks for CPU tasks
    • no bottlenecks for I/O (because threads release the GIL while waiting)

    How the GIL behaves:

    1. A thread starts running Python code
    2. When it hits I/O (network, file), it releases the GIL
    3. Another thread can run
    4. When I/O completes, the thread resumes and reacquires GIL

    This explains why:

    • ✔ 100 threads downloading files works great
    • ❌ 100 threads crunching numbers does NOT

    ⚡ 10. The True Strength of Threads: I/O Parallelism

    Let's say you need to download 10,000 images.

    Sequential time = 10,000 × (0.4 seconds each) = ~4000 seconds (1.1 hours)

    Threaded time (200 threads) = ~20 seconds total

    Why?

    Because threads wait most of the time, so Python overlaps waits.

    Common I/O sources:

    • HTTP requests
    • reading/writing files
    • database queries
    • cloud APIs
    • waiting for user input
    • network sockets
    • sensor data

    Threads shine because I/O releases the GIL.
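    A sketch of that overlap using ThreadPoolExecutor from concurrent.futures, with time.sleep standing in for the network wait (a real downloader would call something like requests.get):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download(url):
    # Simulated network wait; during sleep the GIL is released
    time.sleep(0.2)
    return f"{url}: ok"

urls = [f"https://example.com/img{i}.jpg" for i in range(20)]

start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(download, urls))
elapsed = time.time() - start

# 20 downloads x 0.2s each would take ~4s sequentially;
# with 20 threads the waits overlap and it finishes in ~0.2s
print(f"{len(results)} downloads in {elapsed:.2f}s")
```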

    🔥 11. The True Strength of Processes: CPU Parallelism

    CPU-bound tasks include:

    • encryption
    • compression
    • ML preprocessing
    • hashing
    • physics simulation
    • number crunching
    • image/video filtering
    • audio processing

    If you run these in threads → NO speed improvement.

    If you run these in processes → ~4× faster on 4 cores, ~12× on 12 cores, etc.

    Processes use true multi-core hardware.
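    A minimal sketch comparing the same CPU-bound function run sequentially and through ProcessPoolExecutor; the actual speedup depends on how many cores your machine has:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def sum_squares(n):
    # CPU-bound: no I/O, so pure-Python threads would not help here
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 4

    # Run the four jobs one after another
    start = time.time()
    sequential = [sum_squares(n) for n in jobs]
    seq_time = time.time() - start

    # Run the same four jobs across worker processes
    start = time.time()
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(sum_squares, jobs))
    par_time = time.time() - start

    assert sequential == parallel  # same answers, different wall time
    print(f"sequential {seq_time:.2f}s vs processes {par_time:.2f}s")
```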

    🧬 12-18. Advanced Concurrency Topics

    The next sections cover professional-level concurrency patterns:

    12. Data Sharing Between Threads

    Lock, RLock, Event, Semaphore, Queue — Safe shared memory primitives

    13. Data Sharing Between Processes

    multiprocessing.Queue, Pipe, Manager, shared memory arrays

    14. ThreadPoolExecutor vs ProcessPoolExecutor

    concurrent.futures abstraction for both models

    15. Hybrid Concurrency Pattern

    Combining AsyncIO + Threads + Processes like YouTube/Instagram

    16. Avoiding Common Bugs

    Race conditions, deadlocks, blocking calls, serialization issues

    17. Real Benchmark

    CPU-heavy: 8× with processes; I/O-heavy: 50× with threads

    18. Decision Framework

    ✔ Threads for I/O | ✔ Processes for CPU | ✔ AsyncIO for massive concurrency

    🧨 19. Race Conditions — The Silent Killer

    A race condition happens when:

    • Two or more threads access shared data
    • AND at least one thread modifies it
    • AND execution order determines the final result

    Symptom | What's Happening | Example
    Inconsistent results | Different output each run | Counter shows 987,432 instead of 1,000,000
    Lost updates | Changes disappear | Two users edit same record, one is lost
    Works sometimes | Timing-dependent bugs | Passes tests locally, fails in production

    Example — unsafe counter:

    Race Condition Demo

    See how unsynchronized threads cause incorrect results

    Python
    import threading
    
    counter = 0
    
    def increment():
        global counter
        for _ in range(100_000):
            counter += 1
    
    threads = [threading.Thread(target=increment) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    print(counter)   # Usually less than 1,000,000; the exact value
                     # varies per run and per Python version

    Why?

    counter += 1 is NOT atomic. It's 3 instructions:

    1. load counter
    2. add 1
    3. store result

    Two threads collide → corrupted values.
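    You can confirm this with the standard dis module, which disassembles the function into its bytecode (the exact instruction names vary between Python versions):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# The disassembly shows separate load / add / store instructions
# (e.g. LOAD_GLOBAL ... STORE_GLOBAL). The GIL can hand control
# to another thread between any two of them, losing an update.
dis.dis(increment)
```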

    🧱 20. Fixing Race Conditions With Locks

    Locks ensure only ONE thread accesses critical code at once.

    Lock Method | What It Does | When to Use
    lock.acquire() | Grab the lock (waits if taken) | Manual control needed
    lock.release() | Release the lock | After acquire()
    with lock: | Auto acquire + release | ✅ Always prefer this!

    Thread-Safe Counter with Lock

    Use locks to prevent race conditions

    Python
    import threading
    
    lock = threading.Lock()
    counter = 0
    
    def increment():
        global counter
        for _ in range(100_000):
            with lock:
                counter += 1
    
    threads = [threading.Thread(target=increment) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    print(counter)  # Now correctly equals 1,000,000

    Now:

    • ✔ Safe
    • ✔ Deterministic
    • ✔ Correct final value

    But… ❗ Locks introduce blocking, which could slow threads.
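    One common alternative is to avoid shared variables entirely and pass data through queue.Queue, which does its own locking internally. A minimal producer/consumer sketch:

```python
import queue
import threading

# Queues are internally synchronized: each item is handed to
# exactly one worker, so no manual locks are needed
tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        n = tasks.get()
        if n is None:          # sentinel value: time to stop
            tasks.task_done()
            break
        results.put(n * n)
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:              # one sentinel per worker
    tasks.put(None)

tasks.join()                   # wait until every item is processed
for t in threads:
    t.join()

total = sum(results.get() for _ in range(10))
print(total)   # 285 = 0² + 1² + ... + 9²
```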

    🔒 21-32. Expert-Level Concurrency Patterns

    Advanced topics covered:

    21. RLock — Reentrant Locks

    Same thread can acquire lock multiple times

    22. Deadlocks

    When threads freeze forever waiting on each other

    23. Semaphores

    Controlling access to limited resources

    24. Events

    Fast thread signalling

    25. Conditions

    Thread coordination

    26. Shared Memory Between Processes

    Value, Array, Manager, shared_memory

    27. Serialization Issues

    What breaks when passing objects to processes

    28. ProcessPool Optimization

    chunksize, initializer, preloading data

    29. Real Architecture: High-Performance Scraper

    AsyncIO → ThreadPool → ProcessPool pipeline

    30. Real Architecture: ML Preprocessing

    ProcessPool for CPU, ThreadPool for I/O, AsyncIO for APIs

    31. Framework Internals

    How FastAPI, Scrapy, PyTorch use concurrency

    32. The Future: No-GIL Python

    Python 3.13 introduces an experimental free-threaded build (PEP 703) that removes the GIL, enabling truly parallel threads for CPU work

    🎓 Final Summary

    You now understand advanced concurrency:

    ✔ Threads vs Processes — when to use each

    ✔ The GIL and its impact

    ✔ Race conditions, Locks, RLocks

    ✔ Deadlocks & prevention strategies

    ✔ Semaphores & resource limits

    ✔ Events & Conditions for coordination

    ✔ Shared memory across processes

    ✔ ThreadPoolExecutor & ProcessPoolExecutor

    ✔ Real engineering architectures

    ✔ Thread/AsyncIO/Process hybrid design

    ✔ Framework-level concurrency models

    ✔ Production optimization techniques

    You're operating at professional backend engineer level now.

    📋 Quick Reference — Concurrency

    Tool | Best for
    threading.Thread | I/O-bound tasks, network calls
    multiprocessing.Process | CPU-bound tasks (bypasses GIL)
    threading.Lock() | Protect shared state between threads
    queue.Queue() | Thread-safe data passing
    multiprocessing.Queue() | Process-safe data passing

    🎉 Great work! You've completed this lesson.

    You now understand the GIL, when to use threads vs processes, and how to safely share data between concurrent workers.

    Up next: Parallelism — use concurrent.futures for a clean high-level API over threads and processes.
