Advanced Track
Partitioning: Range, List, Hash & Composite
By the end of this lesson you'll be able to split a billion-row table into manageable pieces, choose the right partitioning strategy for your data, and write queries the optimiser can prune down to a single partition — turning full-table scans into instant reads and slow archival into a one-line drop.
What You'll Learn
- ✓Split a table with PARTITION BY RANGE on a date column
- ✓Use LIST partitioning for categories and regions
- ✓Spread load evenly with HASH partitioning
- ✓Combine strategies with COMPOSITE (sub-) partitioning
- ✓Make queries trigger partition pruning (and know when they won't)
- ✓Archive data instantly by dropping or detaching a partition
What Is Partitioning?
Partitioning splits one logically-single table into many smaller physical tables, called partitions, based on a partition key (one or more columns). To your application it's still one table — you SELECT and INSERT against the parent — but under the hood the database stores and scans each partition separately.
🗄️ Real-world analogy
Think of a filing cabinet with one drawer per month. You don't dump every invoice in a single giant pile — you put each in its month's drawer. When someone asks for March invoices, you open only the March drawer and ignore the other eleven. That "open only the relevant drawer" move is exactly partition pruning, the optimisation that makes partitioning fast.
1. RANGE Partitioning — Split by Value Ranges
RANGE partitioning sends each row to a partition based on which range its key falls into. It's the most common strategy and the natural fit for anything ordered — dates above all, but also sequential IDs or numeric buckets. You define each partition with a lower and upper bound; the bounds are half-open [FROM, TO), so the FROM value is included and the TO value is the first value that belongs to the next partition.
RANGE partitioning by date
Split an orders table into quarterly partitions.
-- RANGE partitioning: route each row by where its key falls in a range.
-- Best for time-series data you query and archive by date (orders, logs, events).
-- 1) Declare the PARENT table and the partition KEY. The parent holds no
-- rows itself — it is a router. order_date is the "partition key".
CREATE TABLE orders (
id BIGSERIAL,
customer_id INT NOT NULL,
order_date DATE NOT NULL, -- the column we split on
total DECIMAL(12,2),
status VARCHAR
...2. Partition Pruning — Why It's Fast
Partition pruning is the optimiser reading your WHERE clause, comparing it to each partition's bounds, and refusing to even open the partitions that can't contain a match. A query against five years of data that filters to one quarter touches one partition instead of twenty. Use EXPLAIN to prove it — the plan lists only the partitions actually scanned.
WHERE (or JOIN) condition. Filter on anything else and every partition gets scanned — partitioning then makes you slower, not faster.See pruning with EXPLAIN
Compare a query that prunes against one that can't.
-- Partition PRUNING: the optimiser reads the WHERE clause, sees the
-- partition key, and skips every partition that cannot contain a match.
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-04-15' AND '2024-06-30';
-- Plan reads (simplified):
-- Append
-- -> Seq Scan on orders_2024_q2 -- ONLY Q2 is touched
-- Q1, Q3 and Q4 are pruned: never opened, never read.
-- The catch — pruning needs the KEY in the WHERE clause. This does NOT prune:
EXPLAIN SELECT * FROM orders WHER
...Your Turn: choose the strategy
A table is queried by a small, fixed set of warehouse codes. Pick RANGE, LIST, or HASH and fill in the blank. The reasoning and answer are in the comments so you can check yourself.
🎯 Your Turn: RANGE vs LIST vs HASH
Replace the ___ with the right strategy keyword.
-- 🎯 YOUR TURN — choose the right strategy, then fill the blank.
-- Table: a "shipments" table queried almost entirely by warehouse code
-- ('LON', 'NYC', 'TKO'), a small fixed set of values.
-- Question: which strategy gives you single-partition pruning here?
CREATE TABLE shipments (
id BIGSERIAL,
warehouse VARCHAR(3) NOT NULL,
shipped_at TIMESTAMP
) PARTITION BY ___ (warehouse); -- 👉 RANGE, LIST or HASH? Pick one keyword.
-- ✅ Expected: LIST — the key
...3. LIST Partitioning — Split by Category
LIST partitioning assigns rows by matching the key against an explicit set of values, like sorting mail into bins labelled by country. Reach for it when the key is categorical and you almost always filter by it: region, country, tenant ID, status. Add a DEFAULT partition as a catch-all — without one, inserting a value you didn't list raises an error.
💡 Pro tip — always add a DEFAULT partition
A DEFAULT partition catches every value that doesn't match a listed partition. Without it, an unexpected region like 'za' makes the INSERT fail with "no partition of relation found for row".
LIST partitioning by region
Split customers into per-continent partitions.
-- LIST partitioning: route each row by an EXACT value (or set of values).
-- Best for categorical columns you always filter by: region, status, tenant.
CREATE TABLE customers (
id SERIAL,
name VARCHAR(100),
email VARCHAR(200),
region VARCHAR(20) NOT NULL, -- the partition key
created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (region);
CREATE TABLE customers_americas PARTITION OF customers
FOR VALUES IN ('us', 'ca', 'mx', 'br');
CREATE TABL
...4. HASH Partitioning — Even Distribution
HASH partitioning has no natural ranges or categories — it runs the key through a hash function and uses hash(key) % N to pick a partition, spreading rows almost perfectly evenly. Use it to cap partition size when there's no meaningful range or list, for example splitting sessions by user_id. The trade-off: hashing destroys order, so it can only prune for exact equality, never for ranges.
⚠️ Common mistake
Choosing hash when you actually run range queries. WHERE user_id = 42 prunes to one partition, but WHERE user_id > 100 must scan all of them — the hash scatters consecutive values everywhere. If you query by range, use RANGE.
HASH partitioning by user_id
Spread sessions evenly across four partitions.
-- HASH partitioning: route each row by hash(key) % N for an EVEN spread.
-- Best when there is no natural range/list but you want to cap partition size.
CREATE TABLE sessions (
id UUID DEFAULT gen_random_uuid(),
user_id INT NOT NULL, -- the partition key
data JSONB,
created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY HASH (user_id);
-- Define N partitions with MODULUS = N and a unique REMAINDER each.
CREATE TABLE sessions_p0 PARTITION OF sessions
F
...Your Turn: complete the date range
Finish a RANGE partition that holds the first half of 2025. Mind the half-open bounds — the TO value is excluded.
🎯 Your Turn: FOR VALUES FROM ... TO ...
Fill in the two date-string blanks.
-- 🎯 YOUR TURN — complete a RANGE partition definition by date.
-- Goal: a partition that holds the FIRST half of 2025 (Jan 1 – Jun 30).
-- Remember: FROM is inclusive, TO is EXCLUSIVE.
CREATE TABLE logs (
id BIGSERIAL,
log_date DATE NOT NULL,
message TEXT
) PARTITION BY RANGE (log_date);
CREATE TABLE logs_2025_h1 PARTITION OF logs
FOR VALUES FROM (___) TO (___); -- 👉 two 'YYYY-MM-DD' date strings
-- ✅ Expected: FROM ('2025-01-01') TO ('2025-07-01')
-- The TO bou
...5. COMPOSITE Partitioning — Split by Two Dimensions
Composite (or sub-) partitioning makes a partition that is itself partitioned — split by RANGE on date, then by LIST on type within each year. Queries that filter on both dimensions get double pruning, narrowing straight to one sub-partition. It's a power tool for billion-row tables only: every layer multiplies your partition count, and too many partitions hurts.
COMPOSITE (sub-) partitioning
Range by date, then list by type inside each year.
-- COMPOSITE (sub-) partitioning: split by RANGE, then by LIST inside each
-- range. Use only for billions of rows where one split still leaves them huge.
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
event_date DATE NOT NULL,
payload JSONB
) PARTITION BY RANGE (event_date); -- first dimension: date
-- A year-level partition that is ITSELF partitioned by type.
CREATE TABLE events_2024 PARTITION OF events
FOR VALUES FROM ('2
...6. Maintenance — Archive in One Line
The biggest day-to-day payoff of partitioning is cheap maintenance. Deleting a year of rows from a normal table is a slow, lock-heavy DELETE that leaves bloat behind; with partitions you just DROP or DETACH the whole partition — a near-instant metadata change. You typically add next month's partition on a schedule and drop the oldest at the same time. On PostgreSQL 11+, indexing the parent automatically creates a matching index on every partition.
Add, drop, detach, and index partitions
Lifecycle operations that keep partitioned tables tidy.
-- Partition MAINTENANCE: cheap operations that make partitioning worth it.
-- Add next quarter ahead of time (do this on a schedule):
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
-- Archive old data INSTANTLY. DROP/DETACH of a partition is a metadata
-- change — no row-by-row DELETE, no bloat, no long-held locks:
DROP TABLE orders_2024_q1; -- delete the quarter
-- ...or keep it but remove it from the live table:
...When NOT to Partition
- Small tables. Under a few million rows, a good index beats partitioning. Partitioning adds planning overhead and complexity for no real gain.
- Queries that don't use the key. If most queries can't filter on the partition key, nothing prunes and you've just made every query scan more objects.
- Too many partitions. Thousands of tiny partitions slow down query planning and balloon catalog/metadata overhead. Aim for partitions large enough to matter (often months/quarters, not hours).
Common Errors (and the fix)
- No pruning — every partition scanned: your
WHEREdoesn't mention the partition key.WHERE total > 1000can't prune a table partitioned byorder_date. Add anorder_datefilter, or index the other column on each partition. - "no partition of relation found for row": you inserted a key value outside every partition's bounds (e.g. a 2025 date when only 2024 partitions exist). Create the missing partition, or add a
DEFAULTpartition. - "every hash partition modulus must be a factor of the next": all hash partitions in a set must share the same
MODULUSwith distinct remainders. UseMODULUS 4, REMAINDER 0..3, not a mix of moduli. - "partition constraint is violated by some row" on ATTACH: the existing table contains rows outside the bounds you're attaching it under. Clean or re-bound the data first so every row fits.
- Wrong strategy chosen: hash for range queries, or list for high-cardinality keys (you'd need one partition per value). Re-pick using the Quick Reference below before you load data.
📘 Quick Reference
| Strategy | Best for | Prunes on |
|---|---|---|
| RANGE | Dates, sequential IDs, numeric buckets | Ranges and equality |
| LIST | Categories: region, status, tenant | Equality on listed values |
| HASH | Even spread, no natural key | Equality only |
| COMPOSITE | Billion-row tables, 2+ filter dimensions | Multi-level (both keys) |
Result — key syntax (PostgreSQL 11+):
| Task | Syntax |
|---|---|
| Declare parent | CREATE TABLE t (...) PARTITION BY RANGE (col); |
| Range child | ... PARTITION OF t FOR VALUES FROM (a) TO (b); |
| List child | ... PARTITION OF t FOR VALUES IN (v1, v2); |
| Hash child | ... PARTITION OF t FOR VALUES WITH (MODULUS n, REMAINDER r); |
| Catch-all | ... PARTITION OF t DEFAULT; |
| Archive | DROP TABLE t_old; / ALTER TABLE t DETACH PARTITION t_old; |
Frequently Asked Questions
Q: What's the difference between partitioning and sharding?
Partitioning splits a table across multiple tables on one server; sharding splits it across multiple servers. Partitioning helps a single machine cope with big tables; sharding scales beyond one machine. You'll cover sharding next.
Q: Can I change a non-partitioned table into a partitioned one?
Not in place — partitioning is set at CREATE TABLE. The usual path is: create a new partitioned table, copy rows in (or ATTACH the old table as a partition), then swap names. Plan it before the table gets huge.
Q: Do I still need indexes if I partition?
Almost always yes. Pruning narrows the query to the right partition; an index then finds rows quickly inside it. On PostgreSQL 11+, indexing the parent creates the index on every partition automatically.
Q: How many partitions is too many?
There's no hard limit, but query planning time and metadata overhead grow with partition count. Hundreds are usually fine; many thousands of tiny partitions often hurt more than they help. Size partitions so each one is worth skipping.
Mini-Challenge: Design a Metrics Table
Put it all together — a brief, an empty canvas, and the expected shape in the comments. Decide the strategy, write the table and one partition, then the one-line archive. Copy it into a Postgres playground to confirm.
🎯 Mini-Challenge
Pick a strategy, partition by day, drop a day instantly.
-- 🎯 MINI-CHALLENGE — design a partitioned table from scratch.
-- Brief: a "metrics" table stores billions of rows. Queries always look like
-- "give me CPU readings for one server on one day". You need fast reads
-- AND the ability to drop a whole day of old data in one statement.
--
-- 1. Pick a partition STRATEGY and KEY (which column do you split on?).
-- 2. Write CREATE TABLE metrics (...) PARTITION BY <strategy> (<key>);
-- 3. Create ONE partition for 2026-06-15.
-- 4. Show
...🎉 Lesson Complete
- ✅ RANGE splits ordered data (dates, IDs) with half-open
[FROM, TO)bounds - ✅ LIST splits exact categories; always add a
DEFAULTpartition - ✅ HASH spreads rows evenly but only prunes on equality
- ✅ COMPOSITE sub-partitions for double pruning on huge tables
- ✅ Pruning only works when the partition key is in the
WHEREclause - ✅
DROP/DETACHarchive a partition instantly — no slowDELETE - ✅ Next: Sharding & Distributed SQL — scaling a table across many servers
Sign up for free to track which lessons you've completed and get learning reminders.