Lesson 24 • Advanced
Profiling & Optimising Python Performance
High-performance Python isn't about writing "faster code" — it's about finding bottlenecks and eliminating them with scientific precision. You cannot optimise what you do not measure.
What You'll Learn in This Lesson
- How to profile CPU usage with cProfile and line_profiler
- How to measure memory usage with tracemalloc and memory_profiler
- How to find the real bottleneck in your code (it's rarely where you think)
- Practical optimisation techniques: caching, algorithm choice, data structures
- How to write benchmarks using timeit and interpret results correctly
- Production-level performance patterns used in real Python systems
🔥 1. Why Profiling Matters
Beginners try to "guess" what's slow.
Advanced developers measure what's slow.
| Approach | Method | Result |
|---|---|---|
| ❌ Guessing | "This loop looks slow" | Waste time optimising the wrong code |
| ✔ Profiling | Measure actual execution time | Find and fix real bottlenecks |
The 80/20 Rule:
20% of code → 80% of runtime
Optimising the wrong 80% gives no improvement!
⚙️ 2. Timing Functions with time.perf_counter()
For quick micro-benchmarks:
Timing with perf_counter
Quick micro-benchmarks
import time
start = time.perf_counter()
# code block
end = time.perf_counter()
print("Elapsed:", end - start)

Use this for comparing:
- two ways of looping
- two algorithms
- two function implementations
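The timeit module automates this repeat-and-measure pattern and avoids common clock pitfalls. A minimal sketch comparing a manual loop against a comprehension (the snippets and repeat counts are illustrative):

```python
import timeit

# Time 1,000 runs of each snippet
loop_time = timeit.timeit(
    "result = []\nfor x in range(1000):\n    result.append(x * x)",
    number=1_000,
)
comp_time = timeit.timeit(
    "[x * x for x in range(1000)]",
    number=1_000,
)

print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Run each comparison several times — single measurements are noisy.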
But for full programs, we need real profilers.
🧠 3. Profiling With cProfile — The Standard Tool
Run a script with profiling from command line:
cProfile Command
Run a script with profiling
python -m cProfile myscript.py
Or profile a specific function in your code:
Profile a Function
Profile a specific function
import cProfile

def slow():
    # A loop that runs 5 million times
    for _ in range(5_000_000):
        pass

# Profile this specific function call
cProfile.run("slow()")
# Output shows: number of calls, total time, time per call

| Column | What It Shows |
|---|---|
| ncalls | Number of times function was called |
| tottime | Total time in this function (excluding subcalls) |
| cumtime | Cumulative time (including subcalls) |
📊 4. Making Results Readable With pstats
Readable Profiling Results
Use pstats to sort and filter
import cProfile, pstats

profiler = cProfile.Profile()
profiler.enable()

# code under measurement
for _ in range(3_000_000):
    pass

profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(10)

| Sort By | What It Shows | Best For Finding |
|---|---|---|
| "tottime" | Time in function itself | The actual slow functions |
| "cumtime" | Time including all sub-calls | Functions that call slow things |
| "ncalls" | Number of times called | Unexpectedly hot loops |
Tip: Start with sort_stats("tottime") to find where time is actually spent, then use "cumtime" to trace back to which high-level functions are triggering the slow code.

🧠 5. Line-by-Line Profiling With line_profiler
Install:
Install line_profiler
Install the profiler
pip install line_profiler
Use:
Line Profiler Decorator
Mark functions for line-by-line profiling
@profile
def slow():
    total = 0
    for i in range(10_000_000):
        total += i

Run:
Run Line Profiler
Run and view results
kernprof -l myscript.py
python -m line_profiler myscript.py.lprof
Shows exactly which line is slow.
This is invaluable for:
- nested loops
- ML preprocessing
- tight functions
- recursive code
🧩 6. Memory Profiling
Install:
Install memory_profiler
Install the memory profiler
pip install memory_profiler
Usage:
Memory Profiler
Track memory usage line-by-line
from memory_profiler import profile

@profile
def load_items():
    items = [i for i in range(5_000_000)]
    return items

| Memory Issue | Symptom | Common Cause |
|---|---|---|
| Memory spike | Sudden +500MB on one line | Loading large dataset at once |
| Memory leak | Memory grows over time | Data accumulating in loops |
| High baseline | Program starts with 200MB+ | Heavy imports (pandas, tensorflow) |
Shows memory growth line-by-line.
Useful for:
- ✔ large lists
- ✔ pandas
- ✔ numpy allocations
- ✔ memory leaks
- ✔ generators vs lists performance
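The standard-library tracemalloc module (mentioned at the top of this lesson) needs no installation and can confirm these symptoms programmatically. A minimal sketch — the allocation below is just an illustrative stand-in for real workload data:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for a real workload's allocations
items = [i for i in range(100_000)]

# Current and peak traced memory, in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")

# Show the top allocation sites, grouped by line
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

Unlike memory_profiler, tracemalloc tracks every Python allocation, so it is also handy for hunting leaks across long-running processes.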
Tip: Release memory deliberately — del large objects when you're done with them, and use gc.collect() to force cleanup.

⚡ 7. Real Techniques for Faster Python
1. Use built-ins over manual loops
Built-ins vs Manual Loops
Built-ins use C-level optimisations
sum(values)   # faster: the loop runs in C

vs

total = 0
for x in values:   # slower: the loop runs in the interpreter
    total += x

Built-ins use C-level optimisations.
2. Prefer list comprehensions
List Comprehensions
Faster than explicit loops
[x*x for x in nums]   # faster

vs

result = []
for x in nums:
    result.append(x*x)

3. Use generators for large data
Generators
Saves huge amounts of memory
(x*x for x in nums)

This saves huge amounts of memory.
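You can see the difference with sys.getsizeof. A quick sketch (the sizes printed will vary by Python version and platform):

```python
import sys

nums = range(1_000_000)

squares_list = [x * x for x in nums]   # materialises every value up front
squares_gen = (x * x for x in nums)    # produces values one at a time

# The list holds a million object pointers; the generator is a tiny fixed-size object
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))
```

The trade-off: a generator can only be iterated once and doesn't support indexing or len().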
4. Use numpy for heavy math
Pure Python loops are slow.
NumPy performs operations in C — often 50–200× faster.
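A minimal sketch of the idea, evaluating a polynomial over a million values with one vectorised expression instead of a Python loop (assumes numpy is installed):

```python
import numpy as np

nums = np.arange(1_000_000, dtype=np.float64)

# One vectorised expression replaces a million-iteration Python loop;
# all the arithmetic happens in compiled C code
result = nums * nums + 2 * nums + 1

print(result[:3])  # [1. 4. 9.]
```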
5. Cache Results With functools.lru_cache
lru_cache
Cache results for instant performance
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Transforms slow recursive functions → instant.
6. Use Multiprocessing for CPU
Multiprocessing
Run tasks on multiple cores
from multiprocessing import Pool

# The guard is required so worker processes don't re-run this block on import
if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(func, items)
Runs tasks on multiple cores.
🏎️ 8. Avoiding the Biggest Performance Mistakes
| ❌ Mistake | Why It's Slow | ✅ Better Approach |
|---|---|---|
| Unnecessary list copies | Copies entire list in memory | Use slices or itertools |
| Python loops for math | Interpreted = slow | NumPy vectorized operations |
| String concatenation in loop | Creates new string each time | Use ''.join(list) |
| Opening files repeatedly | Disk I/O is expensive | Open once, read/write many |
| Blocking I/O in async | Blocks the entire event loop | Use run_in_executor() |
Summary of Common Traps:
- ❌ Unnecessary list copies
- ❌ Using Python loops for math
- ❌ Excessive string concatenation
- ❌ Opening files repeatedly
- ❌ Overuse of classes when simple functions work
- ❌ Blocking I/O in async code
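For example, the string-concatenation trap and its fix can be sketched as:

```python
words = ["profile", "measure", "optimise"] * 1_000

# Slow: strings are immutable, so each += builds a brand-new string
text = ""
for w in words:
    text += w

# Fast: join computes the total length and allocates the result once
text2 = "".join(words)

assert text == text2
```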
🧪 9. Real-World Example: Speeding Up JSON Parsing
Slow version:
Slow JSON Parsing
Standard json module
import json
data = [json.loads(x) for x in lines]

Optimised:
Fast JSON Parsing
orjson is 5-20x faster
import orjson
data = [orjson.loads(x) for x in lines]

Requires: pip install orjson
orjson is 5–20× faster than Python's JSON parser.
🎉 Conclusion
By mastering profiling and optimisation, you gain the ability to:
✔ Identify real bottlenecks
✔ Build faster APIs and scripts
✔ Optimise ML preprocessing
✔ Save CPU & memory in production
✔ Think like a performance engineer
Performance comes from measure → diagnose → optimise, not guessing.
📋 Quick Reference — Profiling & Performance
| Tool / Syntax | What it does |
|---|---|
| cProfile.run('fn()') | Profile function call counts and time |
| timeit.timeit('expr', number=1000) | Benchmark small code snippets |
| line_profiler | Profile line-by-line execution time |
| memory_profiler | Track memory usage per line |
| __slots__ | Reduce class memory footprint |
🎉 Great work! You've completed this lesson.
You now know how to measure, diagnose, and fix performance bottlenecks — the professional workflow every senior engineer uses.
Up next: Memory Management — understand how Python's garbage collector works and prevent memory leaks.