Lesson 22 • Advanced
Concurrency in Python: Threads vs Processes
Master Python's concurrency models and learn when to use threads, processes, or AsyncIO for maximum performance.
What You'll Learn in This Lesson
- The difference between concurrency and parallelism — and why it matters
- How Python threads work and when the GIL limits you
- When to use threading vs multiprocessing vs asyncio
- How to share data safely between threads using locks and queues
- How to spawn and manage worker processes for CPU-bound tasks
- Real-world patterns: scrapers, batch processors, concurrent file downloads
🔥 1. Why Concurrency Exists
Computers make progress on many tasks at once using:
- CPU cores → true parallelism (multiple chefs)
- Scheduling → switching between tasks (one chef, many pots)
- I/O waits → idle time you can use productively (waiting for water to boil)
| Type of Work | What It Means | Best Solution | Real Example |
|---|---|---|---|
| CPU-bound | Heavy calculations that keep the CPU busy | multiprocessing | Image processing, ML training |
| I/O-bound | Waiting for external resources | threads or asyncio | API calls, file downloads |
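The two workload types behave very differently under a timer. Here is a minimal sketch of the distinction; the iteration count and sleep duration are arbitrary illustrative values:

```python
import time

def cpu_bound():
    # Keeps the CPU busy the whole time
    return sum(i * i for i in range(1_000_000))

def io_bound():
    # The CPU sits idle while we wait (simulated network/disk wait)
    time.sleep(0.2)
    return "response"

start = time.time()
cpu_bound()
cpu_elapsed = time.time() - start

start = time.time()
io_bound()
io_elapsed = time.time() - start

# CPU-bound time is all computation; I/O-bound time is almost all waiting
print(f"cpu: {cpu_elapsed:.3f}s  io: {io_elapsed:.3f}s")
```

The I/O-bound call spends nearly all of its time waiting, which is exactly the time concurrency can reclaim.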
⚙️ 2. Threads in Python
A thread is a lightweight unit of execution within a single process.
| What Threads Share | Why It Matters |
|---|---|
| Memory | Fast communication, but risk of conflicts |
| Variables | Easy data sharing, but need locks for safety |
| File handles | Can work with same files simultaneously |
| Python interpreter | Limited by GIL for CPU work |
✔ Good for:
- network requests (waiting for servers)
- reading/writing files (waiting for disk)
- user interface responsiveness (don't freeze the UI)
- downloading many URLs (lots of waiting)
❌ Not good for:
- heavy computation (GIL blocks parallelism)
- CPU-bound workload (use processes instead)
Basic Threading
Create and manage multiple threads
import threading
import time

def task(name):
    # Each thread announces when it starts
    print(f"{name} starting")
    # Simulate I/O wait (like downloading a file)
    # During this sleep, other threads can run!
    time.sleep(0.5)
    print(f"{name} done")

# Create 3 thread objects
# Each thread will run the 'task' function with a different name
threads = [
    threading.Thread(target=task, args=(f"Thread-{i}",))
    for i in range(3)
]

# Start all threads (they begin running)
for t in threads:
    t.start()

# Wait for every thread to finish before continuing
for t in threads:
    t.join()

🧠 3. Understanding the GIL (Global Interpreter Lock)
The GIL (Global Interpreter Lock) is a mutex that protects access to Python objects.
| Situation | GIL Effect | Result |
|---|---|---|
| Running Python code | GIL is held | Other threads wait |
| Waiting for network/file | GIL is released | Other threads can run |
| Using C extensions (NumPy) | Often released | True parallelism possible |
This means:
❌ Threads cannot speed up CPU-heavy tasks
(e.g., image processing, hashing, compression)
✔ Threads can hugely speed up I/O tasks
(e.g., APIs, web scraping, DB queries)
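You can see the GIL in action with a quick experiment: the same thread pool that barely helps CPU work lets I/O waits overlap almost perfectly. This is a minimal sketch; the work sizes are arbitrary and the exact timings vary by machine:

```python
import threading
import time

def cpu_work():
    sum(i * i for i in range(2_000_000))  # holds the GIL while computing

def io_work():
    time.sleep(0.3)  # releases the GIL while sleeping

def run_threaded(fn, n=4):
    threads = [threading.Thread(target=fn) for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

cpu_time = run_threaded(cpu_work)  # roughly 4x one call: threads take turns
io_time = run_threaded(io_work)    # roughly 0.3s total: the sleeps overlap
print(f"cpu threads: {cpu_time:.2f}s  io threads: {io_time:.2f}s")
```

Four overlapping 0.3-second waits finish in about 0.3 seconds, while four CPU jobs still run one at a time.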
The solution for CPU-bound work is multiprocessing. Each process has its own Python interpreter and its own GIL, enabling true parallelism!
⚡ 4. Processes in Python
A process is a full Python interpreter with its own memory.
| Aspect | Threads | Processes |
|---|---|---|
| Memory | Shared | Separate (isolated) |
| GIL | Shared (limits CPU work) | Each has its own |
| Startup time | Fast (microseconds) | Slow (milliseconds) |
| Data sharing | Easy (same memory) | Requires serialization |
✔ Advantages:
- True parallelism (uses multiple CPU cores)
- Great for CPU-bound work
- No GIL problems — each process has its own
❌ Disadvantages:
- More memory used (each process needs its own)
- Slower to start (spawning a new Python interpreter)
- Harder to share data (must serialize/deserialize)
Basic Multiprocessing
Run CPU-bound tasks in separate processes
import multiprocessing
import time

def cpu_task(name):
    print(f"{name} starting")
    # CPU-intensive work: calculate sum of squares
    # This keeps the CPU busy (not waiting for I/O)
    total = sum(i * i for i in range(10_000_000))
    print(f"{name} done: {total}")

# IMPORTANT: This guard is required on Windows!
# Without it, the script would spawn infinite processes
if __name__ == "__main__":
    # Create 2 process objects
    processes = [
        multiprocessing.Process(target=cpu_task, args=(f"Process-{i}",))
        for i in range(2)
    ]

    # Start both processes, then wait for them to finish
    for p in processes:
        p.start()
    for p in processes:
        p.join()

🔄 5. Side-by-Side Comparison
| Feature | Threads | Processes |
|---|---|---|
| Speed for I/O | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Speed for CPU | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Memory usage | Low | High |
| Startup time | Fast | Slow |
| Shares memory? | Yes | No |
| Avoids GIL? | ❌ | ✔ |
| Best for? | I/O tasks | CPU tasks |
🧪 6. Real-World Examples
✔ Threads Example: Web Scraping
Threaded Web Scraping
Use threads to fetch multiple URLs in parallel
import threading
import requests
import time

def fetch_url(url):
    # Network request: the GIL is released while waiting for the response
    response = requests.get(url)
    print(f"Fetched {url}: {len(response.content)} bytes")

urls = [
    "https://api.github.com/users/github",
    "https://api.github.com/users/microsoft",
    "https://api.github.com/users/google",
    "https://api.github.com/users/facebook"
]

start = time.time()
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Done in {time.time() - start:.2f}s")

Threads shine because requests are I/O-bound.
✔ Processes Example: Image Processing
Process-Based Image Processing
Use processes for CPU-heavy image tasks
import multiprocessing
from PIL import Image
import time

def process_image(filename):
    # CPU-heavy: resize, filter, transform
    img = Image.open(filename)
    img = img.resize((800, 600))
    img = img.convert('L')  # grayscale
    img.save(f"processed_{filename}")
    print(f"Processed {filename}")

if __name__ == "__main__":
    images = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
    start = time.time()
    processes = [
        multiprocessing.Process(target=process_image, args=(img,))
        for img in images
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Processed {len(images)} images in {time.time() - start:.2f}s")

CPU work → multiprocessing is ideal.
🧱 7. Mixing Threads & Processes (Hybrid Model)
Many real systems use both:
- Processes for CPU-heavy pipelines
- Threads for network/database operations
- AsyncIO for massive lightweight tasks
| Component | Role in Hybrid System | Why? |
|---|---|---|
| AsyncIO | Orchestra conductor | Lightweight coordination of thousands of tasks |
| Threads | I/O specialists | Handle blocking I/O without stopping AsyncIO |
| Processes | Heavy lifters | CPU work on multiple cores simultaneously |
Example: A crawler using:
- Threads → fetch 1,000 URLs
- Processes → process each page's text
- AsyncIO → coordinate flows
This is how production-grade scrapers, ML preprocessors, and automation bots work.
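The hybrid idea can be sketched in a few lines with asyncio.to_thread, which pushes blocking calls onto worker threads while AsyncIO coordinates. The fake_fetch helper, its URLs, and its delay are made up for illustration; real code would call something like requests.get:

```python
import asyncio
import time

def fake_fetch(url):
    # Stand-in for a blocking I/O call (e.g. requests.get)
    time.sleep(0.2)
    return f"data from {url}"

async def main():
    urls = [f"https://example.com/page{i}" for i in range(5)]
    # AsyncIO coordinates; each blocking call runs in a worker thread
    results = await asyncio.gather(
        *(asyncio.to_thread(fake_fetch, url) for url in urls)
    )
    return results

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(f"{len(results)} pages in {elapsed:.2f}s")  # overlapped, not 5 x 0.2s
```

For CPU-heavy stages you would hand work to a ProcessPoolExecutor instead of a thread, using the same await pattern.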
🔥 8. When Should You Use What?
Use Threads when:
- ✔ I/O-bound
- ✔ Web scraping
- ✔ Automation
- ✔ Network servers
- ✔ Waiting on APIs
Use Processes when:
- ✔ CPU-bound
- ✔ ML preprocessing
- ✔ Math-heavy operations
- ✔ Hashing, encryption
- ✔ Image/video processing
Use AsyncIO when:
- ✔ Massive concurrency
- ✔ Lightweight operations
- ✔ Network-first tasks
- ✔ High scalability needed
🧠 9. How Python Schedules Threads Internally (Advanced)
Thread scheduling in Python is two-layered: the OS preemptively schedules threads, while the GIL decides which thread may run Python bytecode.
Here's what really happens:
✔ The OS schedules threads
The operating system decides when each thread runs, based on CPU availability.
✔ The GIL schedules Python bytecode
Inside Python, only ONE thread can execute Python bytecode at once.
This creates:
- artificial bottlenecks for CPU tasks
- no bottlenecks for I/O (because threads release the GIL while waiting)
How the GIL behaves:
- A thread starts running Python code
- When it hits I/O (network, file), it releases the GIL
- Another thread can run
- When I/O completes, the thread resumes and reacquires GIL
This explains why:
- ✔ 100 threads downloading files works great
- ❌ 100 threads crunching numbers does NOT
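The GIL hand-off described above happens on a timer: CPython periodically asks the running thread to release the GIL so another can take a turn. The interval is exposed (and tunable) via the sys module:

```python
import sys

# How often (in seconds) CPython asks a running thread to release the GIL
print(sys.getswitchinterval())  # default is 0.005 (5 milliseconds)

# You can tune it, though the default is almost always fine
sys.setswitchinterval(0.005)
```

A shorter interval makes threads feel more responsive at the cost of more switching overhead; it does not change anything about the I/O-release behavior described above.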
⚡ 10. The True Strength of Threads: I/O Parallelism
Let's say you need to download 10,000 images.
Sequential time = 10,000 × (0.4 seconds each) = ~4000 seconds (1.1 hours)
Threaded time (200 threads) = ~20 seconds total
Why?
Because threads wait most of the time, so Python overlaps waits.
Common I/O sources:
- HTTP requests
- reading/writing files
- database queries
- cloud APIs
- waiting for user input
- network sockets
- sensor data
Threads shine because I/O releases the GIL.
🔥 11. The True Strength of Processes: CPU Parallelism
CPU-bound tasks include:
- encryption
- compression
- ML preprocessing
- hashing
- physics simulation
- number crunching
- image/video filtering
- audio processing
If you run these in threads → NO speed improvement.
If you run these in processes → 4× faster on 4 cores, 12× on 12 cores, etc.
Processes use true multi-core hardware.
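The usual tool for spreading CPU work across cores is a process pool. A minimal sketch (the job sizes here are arbitrary; a pool defaults to one worker per CPU core):

```python
import multiprocessing

def square_sum(n):
    # CPU-bound: sum of squares below n
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [200_000, 300_000, 400_000, 500_000]
    # Pool distributes the jobs across worker processes (one per core by default)
    with multiprocessing.Pool() as pool:
        results = pool.map(square_sum, jobs)
    print(results)
```

pool.map keeps the familiar map() interface but each call runs in a separate process, so all cores stay busy.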
🧬 12-18. Advanced Concurrency Topics
The next sections cover professional-level concurrency patterns:
12. Data Sharing Between Threads
Lock, RLock, Event, Semaphore, Queue — Safe shared memory primitives
13. Data Sharing Between Processes
multiprocessing.Queue, Pipe, Manager, shared memory arrays
14. ThreadPoolExecutor vs ProcessPoolExecutor
concurrent.futures abstraction for both models
15. Hybrid Concurrency Pattern
Combining AsyncIO + Threads + Processes like YouTube/Instagram
16. Avoiding Common Bugs
Race conditions, deadlocks, blocking calls, serialization issues
17. Real Benchmark
CPU-heavy: 8× with processes; I/O-heavy: 50× with threads
18. Decision Framework
✔ Threads for I/O | ✔ Processes for CPU | ✔ AsyncIO for massive concurrency
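As a preview of topic 14, concurrent.futures gives threads and processes the same high-level API, so swapping ThreadPoolExecutor for ProcessPoolExecutor is a one-line change. The word list and toy task here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def count_letters(word):
    # Trivial stand-in task; real code would do I/O here
    return word, len(word)

words = ["thread", "process", "asyncio", "executor"]

# Same API as ProcessPoolExecutor: map/submit, futures, context manager
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(count_letters, words))

print(results)
```

Because both executors share this interface, you can benchmark your workload under each and switch with minimal code changes.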
🧨 19. Race Conditions — The Silent Killer
A race condition happens when:
- Two or more threads access shared data
- AND at least one thread modifies it
- AND execution order determines the final result
| Symptom | What's Happening | Example |
|---|---|---|
| Inconsistent results | Different output each run | Counter shows 987,432 instead of 1,000,000 |
| Lost updates | Changes disappear | Two users edit same record, one is lost |
| Works sometimes | Timing-dependent bugs | Passes tests locally, fails in production |
Example — unsafe counter:
Race Condition Demo
See how unsynchronized threads cause incorrect results
import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1  # NOT atomic: read, add, write

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # Almost never equals 1,000,000

Why?
counter += 1 is NOT atomic. It's 3 instructions:
- load counter
- add 1
- store result
Two threads collide → corrupted values.
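You can see those separate steps with the dis module (exact opcode names vary between CPython versions, but the load / add / store structure is always there):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Prints the load, add, and store steps as separate bytecode instructions;
# a thread switch can land between any two of them
dis.dis(increment)
```

Any time a thread switch lands between the load and the store, one thread's update is overwritten by the other's stale value.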
🧱 20. Fixing Race Conditions With Locks
Locks ensure only ONE thread accesses critical code at once. The with lock: statement is like entering the bathroom and auto-locking the door behind you.
| Lock Method | What It Does | When to Use |
|---|---|---|
| lock.acquire() | Grab the lock (waits if taken) | Manual control needed |
| lock.release() | Release the lock | After acquire() |
| with lock: | Auto acquire + release | ✅ Always prefer this! |
Thread-Safe Counter with Lock
Use locks to prevent race conditions
import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        with lock:  # only one thread at a time in here
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # Now correctly equals 1,000,000

Now:
- ✔ Safe
- ✔ Deterministic
- ✔ Correct final value
But… ❗ Locks introduce blocking, which can slow threads down.
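A common way to avoid hand-rolled locking altogether is queue.Queue, which is thread-safe out of the box. A minimal producer/consumer sketch (the item values and the None sentinel convention are illustrative):

```python
import threading
import queue

q = queue.Queue()
results = []

def producer():
    for i in range(5):
        q.put(i)       # thread-safe: no explicit lock needed
    q.put(None)        # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 10)  # only this thread touches results

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The queue does the locking internally, so threads communicate by passing messages instead of sharing mutable state.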
🔒 21-32. Expert-Level Concurrency Patterns
Advanced topics covered:
21. RLock — Reentrant Locks
Same thread can acquire lock multiple times
22. Deadlocks
When threads freeze forever waiting on each other
23. Semaphores
Controlling access to limited resources
24. Events
Fast thread signalling
25. Conditions
Thread coordination
26. Shared Memory Between Processes
Value, Array, Manager, shared_memory
27. Serialization Issues
What breaks when passing objects to processes
28. ProcessPool Optimization
chunksize, initializer, preloading data
29. Real Architecture: High-Performance Scraper
AsyncIO → ThreadPool → ProcessPool pipeline
30. Real Architecture: ML Preprocessing
ProcessPool for CPU, ThreadPool for I/O, AsyncIO for APIs
31. Framework Internals
How FastAPI, Scrapy, PyTorch use concurrency
32. The Future: No-GIL Python
Python 3.13 ships an experimental free-threaded build (PEP 703) that removes the GIL, enabling true parallel threads for CPU work
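As a taste of topic 23, a semaphore caps how many threads may be inside a section at once. In this minimal sketch the limit of 3, the sleep, and the peak-tracking bookkeeping are all illustrative:

```python
import threading
import time

sem = threading.Semaphore(3)   # at most 3 threads inside at once
active = 0
peak = 0
state_lock = threading.Lock()

def worker():
    global active, peak
    with sem:                   # blocks if 3 workers are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)        # pretend to use the limited resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"peak concurrent workers: {peak}")  # never exceeds 3
```

This is the standard pattern for rate-limiting access to scarce resources like database connections or API quotas.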
🎓 Final Summary
You now understand advanced concurrency:
✔ Threads vs Processes — when to use each
✔ The GIL and its impact
✔ Race conditions, Locks, RLocks
✔ Deadlocks & prevention strategies
✔ Semaphores & resource limits
✔ Events & Conditions for coordination
✔ Shared memory across processes
✔ ThreadPoolExecutor & ProcessPoolExecutor
✔ Real engineering architectures
✔ Thread/AsyncIO/Process hybrid design
✔ Framework-level concurrency models
✔ Production optimization techniques
You're operating at professional backend engineer level now.
📋 Quick Reference — Concurrency
| Tool | Best for |
|---|---|
| threading.Thread | I/O-bound tasks, network calls |
| multiprocessing.Process | CPU-bound tasks (bypasses GIL) |
| threading.Lock() | Protect shared state between threads |
| queue.Queue() | Thread-safe data passing |
| multiprocessing.Queue() | Process-safe data passing |
🎉 Great work! You've completed this lesson.
You now understand the GIL, when to use threads vs processes, and how to safely share data between concurrent workers.
Up next: Parallelism — use concurrent.futures for a clean high-level API over threads and processes.