Advanced Lesson
Advanced Collections, Itertools & Functools
Master Python's most powerful standard library modules for building high-performance, memory-efficient code.
What You'll Learn
This lesson covers standard library modules that experienced developers rely on every day, including:
- High-performance data structures
- Infinite & combinatorial iterators
- Stream pipelines
- Functional programming utilities
- Caching, partials, decorators
- Real engineering patterns
Master the tools used in:
- Data pipelines & ETL systems
- Machine learning preprocessing
- Log analysis & monitoring
- Search engines & indexing
- Real-time stream processing
Python Download & Setup
Download Python from: python.org/downloads
Latest version recommended (3.11+)
Part 1: Core Modules (collections, itertools, functools)
1. collections: High-Performance Container Tools
The collections module provides specialized container types that are faster or more convenient than plain lists and dicts for specific tasks such as counting, grouping, and queueing.
1.1 Counter: Counting Made Easy
Best for: word frequency, counting events, histograms, text processing
Counter Example
from collections import Counter
c = Counter("mississippi")
print(c)
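# Common operations
print(c.most_common(3))  # the three most frequent letters
c.update("moreletters")  # add counts from another iterable
print(c)
c.subtract("abc")  # subtract counts; results may be zero or negative
print(c)
Use cases:
- NLP early preprocessing
- Log analysis
- Counting API requests
- Ranking items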
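The "counting API requests" and "ranking items" use cases above can be sketched as follows. The request log is hypothetical data invented for illustration, and the variable names are assumptions rather than part of the lesson.

```python
from collections import Counter

# Hypothetical request log: (endpoint, status) pairs invented for illustration.
requests = [
    ("/login", 200),
    ("/search", 200),
    ("/login", 401),
    ("/search", 200),
    ("/search", 500),
]

# Count requests per endpoint by feeding Counter a generator of keys.
per_endpoint = Counter(endpoint for endpoint, _ in requests)

# Count only failed requests (status >= 400) per endpoint.
errors = Counter(endpoint for endpoint, status in requests if status >= 400)

# Rank endpoints from most to least requested.
ranking = per_endpoint.most_common()
print(ranking)
print(errors)
```

Looking up a key that was never counted returns 0 instead of raising KeyError, which keeps reporting code free of existence checks.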
1.2 defaultdict: Automatic Missing Values
defaultdict creates a default value the first time a missing key is accessed, so you no longer need to check whether the key exists:
defaultdict Example
from collections import defaultdict
# Instead of:
# d = {}
# if key not in d:
#     d[key] = []
# d[key].append(value)
# Use defaultdict:
groups = defaultdict(list)
groups["a"].append(1)
groups["a"].append(2)
groups["b"].append(3)
print(dict(groups))
Perfect for:
- Grouping rows
- Adjacency lists
- Bucket sorting
- Aggregation pipelines
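As one illustration of the "adjacency lists" item above, here is a minimal sketch that builds an undirected graph from an edge list. The edges are hypothetical and exist only for the example.

```python
from collections import defaultdict

# Hypothetical undirected edges, invented for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "d")]

graph = defaultdict(list)
for u, v in edges:
    # Each endpoint gets an empty list automatically on first access.
    graph[u].append(v)
    graph[v].append(u)

print(dict(graph))
```

One thing to keep in mind: reading graph["z"] for an unknown node also creates an empty entry. Use `"z" in graph` or `graph.get("z")` when a lookup should not modify the mapping.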
1.3 deque: Fast Queue / Stack / Sliding Window
deque is short for "double-ended queue". When maxlen is set, appending to a full deque discards the item at the opposite end.
deque Example
from collections import deque
dq = deque(maxlen=5)
for i in range(10):
    dq.append(i)
    print(f"After adding {i}: {list(dq)}")
print(f"\nFinal deque: {list(dq)}")
Key advantages:
- O(1) append/pop on both ends
- Fast sliding windows
- Great for BFS graphs
- Ideal for real-time streaming
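The BFS use case listed above might look like the sketch below. The function name bfs_order and the sample graph are assumptions made for this example; the point is that popleft() removes from the front in O(1), whereas list.pop(0) shifts every remaining element.

```python
from collections import deque

def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1) removal from the left end
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Hypothetical directed graph as a dict of adjacency lists.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = bfs_order(graph, "a")
print(order)
```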
1.4 namedtuple: Lightweight, Readable Structs
namedtuple Example
from collections import namedtuple
Point = namedtuple("Point", "x y")
p = Point(3, 4)
print(f"Point: {p}")
print(f"X coordinate: {p.x}")
print(f"Y coordinate: {p.y}")
Use this when you need:
- Immutability
- Tuple speed
- Field names
- Memory efficiency
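A short sketch of the immutability and tuple behavior listed above, reusing the Point type from the earlier example. The variable names are illustrative.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(3, 4)

# Fields are read-only: assignment raises AttributeError.
try:
    p.x = 10
except AttributeError as exc:
    print("Cannot assign:", exc)

# A namedtuple is still a tuple, so unpacking and comparison work as usual.
x, y = p
print(p == (3, 4))

# _asdict() converts the fields to a dict, for example before serializing.
as_dict = p._asdict()
print(as_dict)
```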
1.5 OrderedDict (for deterministic ordering)
Largely superseded by the built-in dict, which preserves insertion order as of Python 3.7, but still useful for:
- LRU caches
- Custom ordering
- Stable serialization
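To illustrate the LRU cache use case, here is a minimal sketch built on OrderedDict.move_to_end() and popitem(last=False). The LRUCache class is a hypothetical helper written for this example, not a standard library type.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical minimal LRU cache, for illustration only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (the oldest one).
            self._items.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # evicts "b", the least recently used key
print(list(cache._items))
```

For caching function results, functools.lru_cache (covered later in this lesson) is usually the simpler choice; a hand-written cache like this is mainly useful when you need direct control over the stored entries.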
1.6 ChainMap: Layered Configuration
ChainMap searches several mappings in order and returns the first match:
ChainMap Example
from collections import ChainMap
user_config = {"theme": "dark", "language": "en"}
default_config = {"theme": "light", "language": "en", "timeout": 30}
settings = ChainMap(user_config, default_config)
print(f"Theme: {settings['theme']}")  # From user_config
print(f"Timeout: {settings['timeout']}")  # From default_config
2. itertools: High-Performance Iterator Recipes
itertools provides fast, memory-efficient iterator building blocks. They replace many hand-written loops and compose into lazy pipelines.
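As a rough illustration of replacing nested loops with iterator building blocks, the sketch below flattens nested batches and takes the first three even numbers. The batches are hypothetical data invented for this example.

```python
from itertools import chain, islice

# Hypothetical nested batches, invented for illustration.
batches = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Loop version: collects every even number before slicing off three.
evens_loop = []
for batch in batches:
    for n in batch:
        if n % 2 == 0:
            evens_loop.append(n)
first_three_loop = evens_loop[:3]

# Iterator version: each stage is lazy, so work stops after three matches.
flat = chain.from_iterable(batches)
evens = (n for n in flat if n % 2 == 0)
first_three = list(islice(evens, 3))

print(first_three)
```

Both versions give the same result here; the iterator version stops reading input as soon as it has three values, which matters when the source is large or unbounded.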
2.1 Infinite Iterators
Infinite Iterators
from itertools import count, cycle, repeat
# count: generating sequences
print("count example:")
for n in count(5):
    if n > 10:
        break
    print(n, end=" ")
print("\n\ncycle example (first 6):")
colors = cycle(['red', 'green', 'blue'])
for i, color in enumerate(colors):
    if i >= 6:
        break
    print(color, end=" ")
print("\n\nrepeat example:")
print(list(repeat("A", 3)))
2.2 Combinatorics (product, permutations, combinations)
Combinatorics
from itertools import product, permutations, combinations
# Cartesian product
print("product([1,2], ['x','y']):")
for a, b in product([1,2], ["x","y"]):
    print(f" ({a}, {b})")
# Permutations
print("\npermutations([1,2,3], 2):")
print(list(permutations([1,2,3], 2)))
# Combinations
print("\ncombinations([1,2,3], 2):")
print(list(combinations([1,2,3], 2)))
2.3 accumulate: Cumulative Operations
accumulate Example
from itertools import accumulate
import operator
# Running sum
print("Running sum:", list(accumulate([1,2,3,4])))
# Running product
print("Running product:", list(accumulate([1,2,3,4], operator.mul)))
# Running max
print("Running max:", list(accumulate([3,1,4,1,5,9], max)))
2.4 groupby: Group Consecutive Items
groupby Example
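from itertools import groupby
# groupby only groups adjacent equal items; sort the input first
# if equal items may be scattered.
print("Grouping consecutive characters:")
for key, group in groupby("aabbccdd"):
    print(f" {key}: {list(group)}")
2.5 islice: Slice Iterators Without Lists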
islice Example
from itertools import islice, count
# Take first 10 items from infinite iterator
print("First 10 from count(0):")
print(list(islice(count(0), 10)))
2.6 chain: Combine Multiple Iterables
chain Example
from itertools import chain
result = list(chain([1,2], [3,4], [5]))
print("chain([1,2], [3,4], [5]):", result)
2.7 tee: Duplicate Iterators
tee Example
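from itertools import tee
original = iter([1, 2, 3, 4, 5])
# After tee(), use only the copies; advancing the original
# would make the copies miss those items.
a, b = tee(original)
print("First copy:", list(a))
print("Second copy:", list(b))
3. functools: Functional Power Tools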
3.1 lru_cache: Instant Caching
lru_cache Example
from functools import lru_cache
import time
@lru_cache(maxsize=100)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
start = time.perf_counter()  # clock intended for measuring short durations
result = fib(35)
elapsed = time.perf_counter() - start
print(f"fib(35) = {result}")
print(f"Computed in {elapsed:.4f} seconds (with cache)")
3.2 partial: Pre-Fill Function Arguments
partial Example
from functools import partial
def multiply(x, y):
    return x * y
times_10 = partial(multiply, 10)
print("times_10(5):", times_10(5))
print("times_10(7):", times_10(7))
3.3 reduce: Functional Reduce Operations
reduce Example
from functools import reduce
result = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
print("Sum with reduce:", result)
product = reduce(lambda x, y: x * y, [1, 2, 3, 4, 5])
print("Product with reduce:", product)
3.4 singledispatch: Generic Functions
singledispatch Example
from functools import singledispatch
@singledispatch
def process(obj):
    print("default:", obj)
@process.register(int)
def _(value):
    print("int:", value * 2)
@process.register(list)
def _(value):
    print("list:", len(value), "items")
process("hello")
process(42)
process([1, 2, 3])
3.5 cached_property: Lazy Loaded Attributes
cached_property Example
from functools import cached_property
class DataLoader:
    @cached_property
    def data(self):
        print("Loading data (expensive operation)...")
        return [1, 2, 3, 4, 5]
loader = DataLoader()
print("First access:", loader.data)
print("Second access:", loader.data)  # No "Loading" message
4. Combining All Three Modules Into Powerful Pipelines
Example: large-scale log analysis.
Combined Pipeline
from collections import Counter
from itertools import islice
from functools import lru_cache
@lru_cache
def normalize(line):
    return line.lower().strip()
# Simulated log lines
log_lines = [
    "ERROR: Connection failed",
    "INFO: User logged in",
    "ERROR: Connection failed",
    "WARNING: Disk space low",
    "ERROR: Connection failed",
    "INFO: User logged out",
]
c = Counter()
for line in islice(log_lines, None):
    c[normalize(line)] += 1
print("Log analysis results:")
for msg
...
Part 2: Advanced Patterns With Each Module
1. Advanced Counter Patterns
1.1 Subtracting / Combining Counters
Combining Counters
from collections import Counter
c1 = Counter("banana")
c2 = Counter("bandana")
print("c1:", c1)
print("c2:", c2)
print("c1 + c2:", c1 + c2)
print("c1 - c2:", c1 - c2)  # the - operator keeps only positive counts
2. Advanced defaultdict Usage
2.1 Multi-Level defaultdict
Nested defaultdict
from collections import defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested["user1"]["views"] += 1
nested["user1"]["likes"] += 3
nested["user2"]["views"] += 5
print("User metrics:")
for user, metrics in nested.items():
    print(f" {user}: {dict(metrics)}")
3. Advanced deque Patterns
3.1 Efficient Sliding Window Statistics
Sliding Window
from collections import deque
window = deque(maxlen=5)
def add_and_average(value):
    window.append(value)
    return sum(window) / len(window)
values = [10, 20, 30, 40, 50, 60, 70]
for v in values:
    avg = add_and_average(v)
    print(f"Added {v}, window: {list(window)}, avg: {avg:.1f}")
4. Advanced namedtuple Usage
Advanced namedtuple
from collections import namedtuple
# With default values
Point = namedtuple("Point", "x y", defaults=[0, 0])
p1 = Point()
p2 = Point(3, 4)
print(f"Default point: {p1}")
print(f"Custom point: {p2}")
# Replacing fields
p3 = p2._replace(x=10)
print(f"After replace: {p3}")
# From dictionary
data = {"x": 5, "y": 8}
p4 = Point(**data)
print(f"From dict: {p4}")
5. Advanced itertools Patterns
5.1 Batch Iteration (Chunking)
Chunking with islice
from itertools import islice
def chunks(iterable, size):
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk
data = range(1, 11)
print("Chunking 1-10 into groups of 3:")
for batch in chunks(data, 3):
    print(f" Batch: {batch}")
5.2 Combining Streams With zip_longest
zip_longest Example
from itertools import zip_longest
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c', 'd', 'e']
merged = list(zip_longest(list1, list2, fillvalue=None))
print("Merged with zip_longest:")
for pair in merged:
    print(f" {pair}")
6. Advanced functools Patterns
6.1 Combining partial + map
partial with map
from functools import partial
def multiply(x, y):
    return x * y
times_10 = partial(multiply, 10)
result = list(map(times_10, range(1, 6)))
print("Multiplied by 10:", result)
Part 3: High-Performance Data Pipelines
8. High-Performance Data Pipelines (Full Walkthrough)
Modern Python systems (data processing apps, ML pipelines, scrapers, log processors) rely on streaming, chunking, and lazy evaluation to process data without loading it all into memory.
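Before the full walkthrough, here is a small sketch of what lazy evaluation means in practice. The source and squares generators and the pulled list are hypothetical names invented for this example; pulled exists only to record how many items the source actually produced.

```python
from itertools import islice

pulled = []  # records which items the source actually produced

def source():
    """Hypothetical stand-in for a large file or network stream."""
    for n in range(1_000_000):
        pulled.append(n)
        yield n

def squares(items):
    """Transform stage: square each item as it flows through."""
    for n in items:
        yield n * n

# Building the pipeline runs nothing; it only creates generator objects.
pipeline = squares(source())
print(len(pulled))

# Consuming four results pulls exactly four items through every stage.
first_four = list(islice(pipeline, 4))
print(first_four)
print(len(pulled))
```

Although the source could yield a million items, only the items that the consumer asks for are ever generated, so memory use stays constant regardless of input size.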
Complete Pipeline Example
Full Data Pipeline
from itertools import islice
from collections import Counter
from functools import lru_cache
# Stage 1: Streaming Reader (Generator)
def read_lines(lines):
    for line in lines:
        yield line
# Stage 2: Cleaning & Normalizing
def normalize(lines):
    for line in lines:
        line = line.strip().lower()
        if line:
            yield line
# Stage 3: Tokenizing
def tokenize(lines):
    for line in lines:
        for word in line.split():
            yield word
# Stage 4: Batching
...
Final Summary
In this lesson, you practiced advanced usage of three core Python modules:
collections
- Counter
- defaultdict
- deque
- namedtuple
- OrderedDict
itertools
- Chunking
- Windowing
- Grouping
- Infinite streams
- Combinatorics
functools
- Caching
- Partial functions
- Dispatch
- Function composition
- Decorators
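Function composition, listed under functools above, can be sketched with reduce. Note that compose is a hypothetical helper written for this example; functools does not ship a compose function.

```python
from functools import reduce

def compose(*functions):
    """Hypothetical helper: apply functions left to right.

    compose(f, g)(x) is equivalent to g(f(x)).
    """
    def composed(value):
        # Feed the running result into each function in turn.
        return reduce(lambda acc, fn: fn(acc), functions, value)
    return composed

# Build a small text-cleaning step from existing str methods.
clean = compose(str.strip, str.lower, str.split)
tokens = clean("  ERROR: Connection Failed  ")
print(tokens)
```

With no functions supplied, the composed function returns its input unchanged, because reduce falls back to the initial value.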
Quick Reference: Advanced Collections
| Tool | Best for |
|---|---|
| Counter(iterable) | Count occurrences of elements |
| deque(maxlen=100) | Fast append/pop from both ends |
| defaultdict(list) | Dict with automatic default values |
| itertools.chain(*iterables) | Combine multiple iterables |
| functools.lru_cache | Cache function results by arguments |
Great work! You've completed this lesson.
You can now use Counter, deque, defaultdict, itertools and functools to write more expressive and efficient Python.