Lesson 44 • Advanced Track
Database Design Patterns
By the end of this lesson you'll be able to design schemas that hold up in real, large systems — choosing transactional vs analytical layouts, the right key strategy, lookup and junction tables, hierarchies, soft deletes, audit trails, and knowing exactly when to break the rules of normalisation. This is the difference between a schema that survives and one you rebuild at 2am.
What You'll Learn
- ✓Pick OLTP vs OLAP schema shapes for the workload
- ✓Choose surrogate vs natural keys deliberately
- ✓Model many-to-many with junction tables and lookups
- ✓Store hierarchies: adjacency list vs closure table
- ✓Add soft deletes, timestamps, and audit/history tables
- ✓Avoid the EAV anti-pattern and denormalise on purpose
How to read this lesson
🏗️ Real-world analogy
A schema is the foundation and frame of a building. You can repaint walls (queries) any time, but moving a load-bearing wall (the table design) after people have moved in is enormously expensive. These patterns are the structural-engineering rules that stop the building falling down as it grows from a cottage into a tower.
There's no single sample table this time — each pattern brings its own small schema. Read the comments as the lesson; they state the why and the trade-off for every design choice.
1. OLTP vs OLAP — Two Opposite Goals
Before any table, ask one question: is this database serving an app or answering analytics? They pull in opposite directions.
OLTP (Online Transaction Processing) is your live app: thousands of tiny reads and writes a second, each touching one or two rows. You normalise — store each fact once — so an update is consistent and cheap.
OLAP (Online Analytical Processing) is the dashboard: a few enormous queries that scan history and aggregate. You denormalise into a star schema so analysts join less and scan faster.
⚡ OLTP — like a cash register
Web apps, banking, checkout. Fast single-row operations. Normalised, write-heavy.
📊 OLAP — like a year-end report
Warehouses, BI. Huge aggregations over millions of rows. Denormalised, read-heavy.
OLTP — normalised transactional schema
Small, fast, single-row reads and writes.
-- OLTP = Online Transaction Processing.
-- The shape behind a live app: small, fast, single-row reads and writes.
-- Normalised (no duplicated facts) so an UPDATE touches data in ONE place.
CREATE TABLE customers (
customer_id BIGSERIAL PRIMARY KEY, -- surrogate key (see section 2)
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
...OLAP — star schema for analytics
Fact + dimension tables built to aggregate.
-- OLAP = Online Analytical Processing.
-- The shape behind dashboards: scan MILLIONS of rows, aggregate, report.
-- A "star schema": one FACT table (the events) surrounded by DIMENSION
-- tables (the descriptions). Deliberately denormalised so analysts join less.
-- Fact: one row per measurable event, mostly numbers (the "measures").
CREATE TABLE fact_sales (
sale_id BIGSERIAL PRIMARY KEY,
date_key INT NOT NULL, -- foreign keys into the dimensions
product_key INT NO
...2. Surrogate vs Natural Keys
A primary key uniquely identifies a row. A natural key uses real-world data (an email, an ISBN). A surrogate key is a meaningless auto-number (BIGSERIAL, IDENTITY, or a UUID) that exists only to be the identifier.
The problem with natural keys: real-world data changes. If email is your key and every order references it, a customer changing their email forces you to rewrite every foreign key. Surrogate keys never change and never get reused, so foreign keys stay stable forever.
PRIMARY KEY, and add a UNIQUE constraint on the natural key so duplicates are still impossible.Surrogate vs natural keys
Why the auto-number wins for app tables.
-- A NATURAL key is real-world data used as the identifier
-- (email, ISBN, country code). A SURROGATE key is a meaningless
-- auto-generated number (BIGSERIAL / IDENTITY / UUID) with no meaning.
-- Natural key — looks tidy, but breaks when reality changes:
CREATE TABLE users_natural (
email VARCHAR(255) PRIMARY KEY, -- what happens when they change email?
name VARCHAR(100)
);
-- Surrogate key — the modern default for app tables:
CREATE TABLE users_surrogate (
user_id BIGSERIAL
...3. Lookup / Reference Tables
When a column can only hold a small fixed set of values — statuses, roles, countries — don't store free text. Put the allowed values in a tiny lookup table and reference it with a foreign key. Now a typo like 'Shiped' is rejected by the database, not silently saved.
A status lookup table
Controlled vocabulary instead of free text.
-- A LOOKUP (reference) table replaces a free-text column with a
-- small controlled list. Instead of typing 'shipped' / 'Shipped' /
-- 'SHIPED' all over the orders table, you point at one row.
CREATE TABLE order_statuses (
status_id SMALLINT PRIMARY KEY,
code VARCHAR(20) UNIQUE NOT NULL, -- 'pending', 'shipped', ...
description VARCHAR(100) NOT NULL
);
INSERT INTO order_statuses (status_id, code, description) VALUES
(1, 'pending', 'Awaiting payment'),
(2, '
...4. Many-to-Many: the Junction Table
A one-to-many relationship (one customer, many orders) fits with a foreign key on the "many" side. But many-to-many — students and courses, posts and tags — won't fit in either table. The fix is a third junction table (also called a bridge or linking table) that stores one row per pairing.
Give the junction a composite primary key of both foreign keys: that guarantees a student can't be enrolled in the same course twice.
Junction table schema
students ↔ enrolments ↔ courses.
-- A student can take MANY courses; a course has MANY students.
-- Many-to-many can't live in one table — you need a JUNCTION table
-- (also called a join / bridge / linking table) holding one row per pair.
CREATE TABLE students (
student_id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
course_id BIGSERIAL PRIMARY KEY,
title VARCHAR(150) NOT NULL
);
-- The junction: each row links ONE student to ONE course.
CREATE TABLE enrolments (
...To read across the relationship, you join through the junction in the middle — connect the left side to the link, then the link to the right side.
Worked query: students and their courses
Join through the junction table.
-- WORKED QUERY: list every student with the courses they take.
-- You walk the junction in the middle to connect the two sides.
SELECT s.name AS student,
c.title AS course
FROM students s
JOIN enrolments e ON s.student_id = e.student_id -- student -> link
JOIN courses c ON e.course_id = c.course_id -- link -> course
ORDER BY s.name, c.title;Result — one row per (student, course) pair:
| student | course |
|---|---|
| Ada Lovelace | Data Modelling |
| Ada Lovelace | SQL Basics |
| Grace Hopper | SQL Basics |
| … | … |
Your Turn: count students per course
Fill in the one blank to finish the join, then aggregate. The expected result is in the comments so you can check yourself.
🎯 Your Turn: junction join
Complete the JOIN onto the enrolments table.
-- 🎯 YOUR TURN — complete the many-to-many join.
-- Goal: count how many students are enrolled in EACH course.
SELECT c.title,
COUNT(*) AS student_count
FROM courses c
JOIN enrolments e ON c.course_id = ___ -- 👉 the matching column in enrolments
GROUP BY c.title
ORDER BY student_count DESC;
-- ✅ Expected: one row per course, e.g.
-- SQL Basics | 3 , Data Modelling | 2 , ...5. Self-Referencing Hierarchies
Categories with subcategories, employees with managers, threaded comments — these are trees, and a table can point at itself to model them. There are two classic patterns with opposite trade-offs.
The adjacency list stores each row's parent_id. Writes are trivial, but reading a whole subtree needs a recursive query.
The closure table stores every ancestor→descendant pair, so reading a subtree is a plain join — but every move means maintaining many path rows.
Adjacency list (parent_id)
Cheap writes, recursive reads.
-- HIERARCHY pattern A: Adjacency list.
-- Each row stores a pointer to its PARENT. Simple, cheap to write.
CREATE TABLE categories (
category_id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id BIGINT REFERENCES categories(category_id) -- self-reference
);
-- Electronics(parent NULL) -> Phones(parent Electronics) -> iPhone(parent Phones)
-- Walking the tree needs a RECURSIVE query:
WITH RECURSIVE tree AS (
SELECT category_id, name, parent_id, 1 AS depth
...Closure table (all paths)
Cheap reads, heavier writes.
-- HIERARCHY pattern B: Closure table.
-- Store EVERY ancestor-descendant pair (including self). Reads are a plain
-- JOIN with no recursion — great when you query subtrees constantly.
CREATE TABLE category_paths (
ancestor_id BIGINT NOT NULL REFERENCES categories(category_id),
descendant_id BIGINT NOT NULL REFERENCES categories(category_id),
depth INT NOT NULL, -- 0 = the node itself
PRIMARY KEY (ancestor_id, descendant_id)
);
-- "All descendants of Electro
...6. Soft Deletes, Timestamps & Audit Tables
Real systems rarely truly delete data. A soft delete flags a row as gone (is_deleted and/or deleted_at) instead of removing it — so you keep history, can undo mistakes, and never lose referenced rows. The catch: every normal query must add WHERE is_deleted = FALSE.
Add created_at and updated_at to almost every table for free bookkeeping. For anything regulated or sensitive, also keep an audit / history table that records a copy of each change with who and when — so you can reconstruct any row's past.
Soft delete pattern
Flag rows instead of removing them.
-- SOFT DELETE: don't physically remove a row — flag it.
-- Keeps history, lets you "undelete", and protects against accidents.
ALTER TABLE orders ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE;
ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ; -- NULL = live
-- "Deleting" is really an UPDATE:
UPDATE orders SET is_deleted = TRUE, deleted_at = NOW()
WHERE order_id = 12345;
-- EVERY normal query must now exclude deleted rows:
SELECT * FROM orders WHERE is_deleted = FALSE;
...Timestamps + audit/history table
Track who changed what, and when.
-- TIMESTAMPS + AUDIT/HISTORY: who changed what, and when.
-- 1) created_at / updated_at on the live table — bookkeeping for every row:
ALTER TABLE orders ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
ALTER TABLE orders ADD COLUMN updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
-- (a trigger or your app sets updated_at = NOW() on each UPDATE)
-- 2) A separate HISTORY table records a copy of every change.
CREATE TABLE orders_history (
history_id BIGSERIAL PRIMARY KEY,
order_
...Your Turn: filter live rows & pick a key
Fill in the soft-delete filter, then read the key-strategy reasoning in the comments and confirm you'd make the same call.
🎯 Your Turn: soft delete + key choice
Add the live-row filter; choose a key for a lookup table.
-- 🎯 YOUR TURN — two small fixes for production-grade reads.
-- (a) Return only LIVE customers (soft-delete column already exists).
-- (b) Pick the right PRIMARY KEY for a 'countries' lookup table.
SELECT customer_id, name
FROM customers
WHERE ___ = FALSE; -- 👉 the soft-delete flag column
-- Key strategy: countries have a stable, standard ISO code ('US','GB').
-- For THIS small fixed reference table, the natural key is fine:
-- CREATE TABLE countries ( iso_code CHAR(2) PRIMARY KEY,
...7. The EAV Anti-Pattern (and when not to use it)
EAV (Entity-Attribute-Value) stores one row per attribute, so any entity can carry any fields without schema changes. It looks like ultimate flexibility — and it's one of the most common ways to wreck a database.
Because every value is a string in one column, you lose data types, NOT NULL, foreign keys, and simple filters like WHERE ram_gb > 16. A single logical record sprays across many rows, so even reading one item needs an ugly pivot.
JSONB column gives you flexible, queryable, indexable attributes without the EAV tax. Reserve EAV for genuinely open-ended, rarely-queried metadata.EAV — and the JSONB alternative
See why EAV hurts, and what to use instead.
-- EAV = Entity-Attribute-Value. ONE row per attribute, so any entity
-- can have any fields. Tempting for "user-defined fields"... but dangerous.
CREATE TABLE eav_values (
entity_id BIGINT NOT NULL,
attribute_name VARCHAR(100) NOT NULL,
value TEXT, -- everything is a string: no types!
PRIMARY KEY (entity_id, attribute_name)
);
-- Reading even ONE product means re-assembling it row by row:
SELECT MAX(CASE WHEN attribute_name = 'cpu' THEN value
...8. Normalisation vs Deliberate Denormalisation
Normalisation — storing each fact exactly once — is the default, because it keeps writes consistent. But on a proven read hot-path, joining the same tables a million times a day can dominate your latency.
Deliberate denormalisation copies a value (say, the customer's name onto the orders table) to skip a join. It's a conscious trade: faster reads in exchange for having to keep the copy in sync on writes. Do it only after measuring, and document why — accidental duplication is a bug, intentional duplication is an optimisation.
Denormalise on purpose
Trade write-consistency for read speed — deliberately.
-- NORMALISATION removes duplicated facts so writes stay consistent.
-- DELIBERATE DENORMALISATION re-adds a copy to make reads cheaper —
-- a conscious trade, not laziness.
-- Normalised: customer name lives only in customers. To show it on an
-- order you must JOIN every time:
SELECT o.order_id, c.name
FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
-- Denormalised: copy the name onto orders to skip the JOIN on a hot path:
-- ALTER TABLE orders ADD COLUMN customer_name VARCH
...Common Errors (and the fix)
- Reaching for EAV too soon: "we might add fields later" is not a reason to throw away types and constraints. Start with real columns; use a
JSONBcolumn for the genuinely dynamic bits. - Natural keys that change: using
emailor a username as a primary key looks clean until someone updates it and breaks every foreign key. Use a surrogate key + aUNIQUEconstraint instead. - No audit trail: hard-deleting or overwriting rows with no history means "what did this look like last week?" is unanswerable. Add
created_at/updated_atand a history table before you need them. - Over-normalising the read path: splitting data across ten tables is correct but can make a hot dashboard query crawl. Denormalise a measured bottleneck on purpose rather than normalising on reflex.
- Forgetting the soft-delete filter: after adding
is_deleted, queries that omitWHERE is_deleted = FALSEresurrect "deleted" rows. Bake the filter into a view if it's easy to forget.
📘 Quick Reference
| Pattern | Use it when… | Trade-off |
|---|---|---|
| OLTP (normalised) | Serving a live app | Joins on reads |
| OLAP (star schema) | Analytics / dashboards | Duplicated data |
| Surrogate key | Almost every app table | No human meaning |
| Lookup table | Fixed value set (status, role) | One extra join |
| Junction table | Many-to-many relationship | Third table to maintain |
| Adjacency list | Trees, write-heavy | Recursive reads |
| Closure table | Trees, read-heavy | Heavier writes |
| Soft delete | Need history / undo | Filter every query |
| Audit/history table | Regulated or sensitive data | Storage + write cost |
| EAV | Truly open-ended metadata | Loses types/constraints |
| Denormalisation | Proven read bottleneck | Keep copies in sync |
Frequently Asked Questions
Q: Should I use auto-increment integers or UUIDs for surrogate keys?
Both are surrogate keys. Auto-increment (BIGSERIAL) is compact and index-friendly; UUID is globally unique without a central counter, which helps with distributed systems and hiding row counts. Pick UUIDs when IDs are generated outside one database or must not be guessable.
Q: Can one database be both OLTP and OLAP?
For a while, yes — a normalised OLTP database can also run reports. As volume grows, the two workloads fight over the same machine, so teams replicate the data into a separate warehouse (the star schema) for analytics and leave the app database lean.
Q: Is denormalisation just bad design?
No — accidental duplication is a bug, but deliberate denormalisation is a recognised optimisation. The test is whether it's intentional, measured, and documented, and whether you have a plan to keep the copies in sync.
Q: Adjacency list or closure table for my tree?
Default to the adjacency list — it's simplest, and recursive CTEs handle most reads fine. Switch to a closure table only when you query whole subtrees so often that the recursion becomes the bottleneck.
Mini-Challenge: Design a Blog Schema
Put the patterns together — a brief, an outline, and the expected shape in the comments. Write the CREATE TABLE statements, then paste them into a playground to confirm they run.
🎯 Mini-Challenge
authors, posts, tags, a junction, surrogate keys + soft delete.
-- 🎯 MINI-CHALLENGE — design a small schema for a blogging platform.
-- Requirements:
-- 1. authors and posts: an author writes many posts (one-to-many).
-- 2. tags: a post can have many tags, a tag many posts (many-to-many).
-- 3. Use SURROGATE primary keys (BIGSERIAL) on authors, posts, tags.
-- 4. Support SOFT DELETE on posts (is_deleted + deleted_at).
-- 5. Add created_at to posts.
--
-- ✅ Expected shape: 4 tables — authors, posts, tags, and a junction
-- post_tags(post_id, tag
...🎉 Lesson Complete
- ✅ OLTP normalises for fast writes; OLAP star schemas denormalise for fast analytics
- ✅ Prefer surrogate keys + a
UNIQUEnatural-key constraint - ✅ Lookup tables tame free text; junction tables model many-to-many
- ✅ Adjacency list (cheap writes) vs closure table (cheap reads) for hierarchies
- ✅ Soft deletes, timestamps, and audit tables preserve history
- ✅ Avoid EAV; reach for
JSONB, and denormalise only on purpose - ✅ Next: cross-database queries and foreign data wrappers
Sign up for free to track which lessons you've completed and get learning reminders.