Skip to main content
    Courses/SQL/Database Design Patterns

    Lesson 44 • Advanced Track

    Database Design Patterns

    By the end of this lesson you'll be able to design schemas that hold up in real, large systems — choosing transactional vs analytical layouts, the right key strategy, lookup and junction tables, hierarchies, soft deletes, audit trails, and knowing exactly when to break the rules of normalisation. This is the difference between a schema that survives and one you rebuild at 2am.

    What You'll Learn

    • Pick OLTP vs OLAP schema shapes for the workload
    • Choose surrogate vs natural keys deliberately
    • Model many-to-many with junction tables and lookups
    • Store hierarchies: adjacency list vs closure table
    • Add soft deletes, timestamps, and audit/history tables
    • Avoid the EAV anti-pattern and denormalise on purpose

    How to read this lesson

    🏗️ Real-world analogy

    A schema is the foundation and frame of a building. You can repaint walls (queries) any time, but moving a load-bearing wall (the table design) after people have moved in is enormously expensive. These patterns are the structural-engineering rules that stop the building falling down as it grows from a cottage into a tower.

    There's no single sample table this time — each pattern brings its own small schema. Read the comments as the lesson; they state the why and the trade-off for every design choice.

    1. OLTP vs OLAP — Two Opposite Goals

    Before any table, ask one question: is this database serving an app or answering analytics? They pull in opposite directions.

    OLTP (Online Transaction Processing) is your live app: thousands of tiny reads and writes a second, each touching one or two rows. You normalise — store each fact once — so an update is consistent and cheap.

    OLAP (Online Analytical Processing) is the dashboard: a few enormous queries that scan history and aggregate. You denormalise into a star schema so analysts join less and scan faster.

    ⚡ OLTP — like a cash register

    Web apps, banking, checkout. Fast single-row operations. Normalised, write-heavy.

    📊 OLAP — like a year-end report

    Warehouses, BI. Huge aggregations over millions of rows. Denormalised, read-heavy.

    OLTP — normalised transactional schema

    Small, fast, single-row reads and writes.

    Try it Yourself »
    SQL
    -- OLTP = Online Transaction Processing.
    -- The shape behind a live app: small, fast, single-row reads and writes.
    -- Normalised (no duplicated facts) so an UPDATE touches data in ONE place.
    
    CREATE TABLE customers (
        customer_id  BIGSERIAL PRIMARY KEY,     -- surrogate key (see section 2)
        email        VARCHAR(255) UNIQUE NOT NULL,
        name         VARCHAR(100) NOT NULL,
        created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    
    CREATE TABLE orders (
        order_id     BIGSERIAL PRIMARY KEY,
     
    ...

    OLAP — star schema for analytics

    Fact + dimension tables built to aggregate.

    Try it Yourself »
    SQL
    -- OLAP = Online Analytical Processing.
    -- The shape behind dashboards: scan MILLIONS of rows, aggregate, report.
    -- A "star schema": one FACT table (the events) surrounded by DIMENSION
    -- tables (the descriptions). Deliberately denormalised so analysts join less.
    
    -- Fact: one row per measurable event, mostly numbers (the "measures").
    CREATE TABLE fact_sales (
        sale_id       BIGSERIAL PRIMARY KEY,
        date_key      INT NOT NULL,    -- foreign keys into the dimensions
        product_key   INT NO
    ...

    2. Surrogate vs Natural Keys

    A primary key uniquely identifies a row. A natural key uses real-world data (an email, an ISBN). A surrogate key is a meaningless auto-number (BIGSERIAL, IDENTITY, or a UUID) that exists only to be the identifier.

    The problem with natural keys: real-world data changes. If email is your key and every order references it, a customer changing their email forces you to rewrite every foreign key. Surrogate keys never change and never get reused, so foreign keys stay stable forever.

    Surrogate vs natural keys

    Why the auto-number wins for app tables.

    Try it Yourself »
    SQL
    -- A NATURAL key is real-world data used as the identifier
    -- (email, ISBN, country code). A SURROGATE key is a meaningless
    -- auto-generated number (BIGSERIAL / IDENTITY / UUID) with no meaning.
    
    -- Natural key — looks tidy, but breaks when reality changes:
    CREATE TABLE users_natural (
        email VARCHAR(255) PRIMARY KEY,   -- what happens when they change email?
        name  VARCHAR(100)
    );
    
    -- Surrogate key — the modern default for app tables:
    CREATE TABLE users_surrogate (
        user_id BIGSERIAL 
    ...

    3. Lookup / Reference Tables

    When a column can only hold a small fixed set of values — statuses, roles, countries — don't store free text. Put the allowed values in a tiny lookup table and reference it with a foreign key. Now a typo like 'Shiped' is rejected by the database, not silently saved.

    A status lookup table

    Controlled vocabulary instead of free text.

    Try it Yourself »
    SQL
    -- A LOOKUP (reference) table replaces a free-text column with a
    -- small controlled list. Instead of typing 'shipped' / 'Shipped' /
    -- 'SHIPED' all over the orders table, you point at one row.
    
    CREATE TABLE order_statuses (
        status_id    SMALLINT PRIMARY KEY,
        code         VARCHAR(20) UNIQUE NOT NULL,   -- 'pending', 'shipped', ...
        description  VARCHAR(100) NOT NULL
    );
    
    INSERT INTO order_statuses (status_id, code, description) VALUES
        (1, 'pending',   'Awaiting payment'),
        (2, '
    ...

    4. Many-to-Many: the Junction Table

    A one-to-many relationship (one customer, many orders) fits with a foreign key on the "many" side. But many-to-many — students and courses, posts and tags — won't fit in either table. The fix is a third junction table (also called a bridge or linking table) that stores one row per pairing.

    Give the junction a composite primary key of both foreign keys: that guarantees a student can't be enrolled in the same course twice.

    Junction table schema

    students ↔ enrolments ↔ courses.

    Try it Yourself »
    SQL
    -- A student can take MANY courses; a course has MANY students.
    -- Many-to-many can't live in one table — you need a JUNCTION table
    -- (also called a join / bridge / linking table) holding one row per pair.
    
    CREATE TABLE students (
        student_id BIGSERIAL PRIMARY KEY,
        name       VARCHAR(100) NOT NULL
    );
    CREATE TABLE courses (
        course_id  BIGSERIAL PRIMARY KEY,
        title      VARCHAR(150) NOT NULL
    );
    
    -- The junction: each row links ONE student to ONE course.
    CREATE TABLE enrolments (
       
    ...

    To read across the relationship, you join through the junction in the middle — connect the left side to the link, then the link to the right side.

    Worked query: students and their courses

    Join through the junction table.

    Try it Yourself »
    SQL
    -- WORKED QUERY: list every student with the courses they take.
    -- You walk the junction in the middle to connect the two sides.
    
    SELECT  s.name  AS student,
            c.title AS course
    FROM students s
    JOIN enrolments e ON s.student_id = e.student_id   -- student -> link
    JOIN courses    c ON e.course_id  = c.course_id    -- link -> course
    ORDER BY s.name, c.title;

    Result — one row per (student, course) pair:

    studentcourse
    Ada LovelaceData Modelling
    Ada LovelaceSQL Basics
    Grace HopperSQL Basics

    Your Turn: count students per course

    Fill in the one blank to finish the join, then aggregate. The expected result is in the comments so you can check yourself.

    🎯 Your Turn: junction join

    Complete the JOIN onto the enrolments table.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — complete the many-to-many join.
    -- Goal: count how many students are enrolled in EACH course.
    
    SELECT  c.title,
            COUNT(*) AS student_count
    FROM courses c
    JOIN enrolments e ON c.course_id = ___   -- 👉 the matching column in enrolments
    GROUP BY c.title
    ORDER BY student_count DESC;
    
    -- ✅ Expected: one row per course, e.g.
    --    SQL Basics | 3 ,  Data Modelling | 2 , ...

    5. Self-Referencing Hierarchies

    Categories with subcategories, employees with managers, threaded comments — these are trees, and a table can point at itself to model them. There are two classic patterns with opposite trade-offs.

    The adjacency list stores each row's parent_id. Writes are trivial, but reading a whole subtree needs a recursive query.

    The closure table stores every ancestor→descendant pair, so reading a subtree is a plain join — but every move means maintaining many path rows.

    Adjacency list (parent_id)

    Cheap writes, recursive reads.

    Try it Yourself »
    SQL
    -- HIERARCHY pattern A: Adjacency list.
    -- Each row stores a pointer to its PARENT. Simple, cheap to write.
    
    CREATE TABLE categories (
        category_id BIGSERIAL PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        parent_id   BIGINT REFERENCES categories(category_id)  -- self-reference
    );
    -- Electronics(parent NULL) -> Phones(parent Electronics) -> iPhone(parent Phones)
    
    -- Walking the tree needs a RECURSIVE query:
    WITH RECURSIVE tree AS (
        SELECT category_id, name, parent_id, 1 AS depth
     
    ...

    Closure table (all paths)

    Cheap reads, heavier writes.

    Try it Yourself »
    SQL
    -- HIERARCHY pattern B: Closure table.
    -- Store EVERY ancestor-descendant pair (including self). Reads are a plain
    -- JOIN with no recursion — great when you query subtrees constantly.
    
    CREATE TABLE category_paths (
        ancestor_id   BIGINT NOT NULL REFERENCES categories(category_id),
        descendant_id BIGINT NOT NULL REFERENCES categories(category_id),
        depth         INT NOT NULL,           -- 0 = the node itself
        PRIMARY KEY (ancestor_id, descendant_id)
    );
    
    -- "All descendants of Electro
    ...

    6. Soft Deletes, Timestamps & Audit Tables

    Real systems rarely truly delete data. A soft delete flags a row as gone (is_deleted and/or deleted_at) instead of removing it — so you keep history, can undo mistakes, and never lose referenced rows. The catch: every normal query must add WHERE is_deleted = FALSE.

    Add created_at and updated_at to almost every table for free bookkeeping. For anything regulated or sensitive, also keep an audit / history table that records a copy of each change with who and when — so you can reconstruct any row's past.

    Soft delete pattern

    Flag rows instead of removing them.

    Try it Yourself »
    SQL
    -- SOFT DELETE: don't physically remove a row — flag it.
    -- Keeps history, lets you "undelete", and protects against accidents.
    
    ALTER TABLE orders ADD COLUMN is_deleted BOOLEAN     NOT NULL DEFAULT FALSE;
    ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ;          -- NULL = live
    
    -- "Deleting" is really an UPDATE:
    UPDATE orders SET is_deleted = TRUE, deleted_at = NOW()
    WHERE order_id = 12345;
    
    -- EVERY normal query must now exclude deleted rows:
    SELECT * FROM orders WHERE is_deleted = FALSE;
    ...

    Timestamps + audit/history table

    Track who changed what, and when.

    Try it Yourself »
    SQL
    -- TIMESTAMPS + AUDIT/HISTORY: who changed what, and when.
    
    -- 1) created_at / updated_at on the live table — bookkeeping for every row:
    ALTER TABLE orders ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
    ALTER TABLE orders ADD COLUMN updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
    -- (a trigger or your app sets updated_at = NOW() on each UPDATE)
    
    -- 2) A separate HISTORY table records a copy of every change.
    CREATE TABLE orders_history (
        history_id  BIGSERIAL PRIMARY KEY,
        order_
    ...

    Your Turn: filter live rows & pick a key

    Fill in the soft-delete filter, then read the key-strategy reasoning in the comments and confirm you'd make the same call.

    🎯 Your Turn: soft delete + key choice

    Add the live-row filter; choose a key for a lookup table.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — two small fixes for production-grade reads.
    -- (a) Return only LIVE customers (soft-delete column already exists).
    -- (b) Pick the right PRIMARY KEY for a 'countries' lookup table.
    
    SELECT customer_id, name
    FROM customers
    WHERE ___ = FALSE;          -- 👉 the soft-delete flag column
    
    -- Key strategy: countries have a stable, standard ISO code ('US','GB').
    -- For THIS small fixed reference table, the natural key is fine:
    -- CREATE TABLE countries ( iso_code CHAR(2) PRIMARY KEY, 
    ...

    7. The EAV Anti-Pattern (and when not to use it)

    EAV (Entity-Attribute-Value) stores one row per attribute, so any entity can carry any fields without schema changes. It looks like ultimate flexibility — and it's one of the most common ways to wreck a database.

    Because every value is a string in one column, you lose data types, NOT NULL, foreign keys, and simple filters like WHERE ram_gb > 16. A single logical record sprays across many rows, so even reading one item needs an ugly pivot.

    EAV — and the JSONB alternative

    See why EAV hurts, and what to use instead.

    Try it Yourself »
    SQL
    -- EAV = Entity-Attribute-Value. ONE row per attribute, so any entity
    -- can have any fields. Tempting for "user-defined fields"... but dangerous.
    
    CREATE TABLE eav_values (
        entity_id      BIGINT NOT NULL,
        attribute_name VARCHAR(100) NOT NULL,
        value          TEXT,                 -- everything is a string: no types!
        PRIMARY KEY (entity_id, attribute_name)
    );
    
    -- Reading even ONE product means re-assembling it row by row:
    SELECT  MAX(CASE WHEN attribute_name = 'cpu'    THEN value 
    ...

    8. Normalisation vs Deliberate Denormalisation

    Normalisation — storing each fact exactly once — is the default, because it keeps writes consistent. But on a proven read hot-path, joining the same tables a million times a day can dominate your latency.

    Deliberate denormalisation copies a value (say, the customer's name onto the orders table) to skip a join. It's a conscious trade: faster reads in exchange for having to keep the copy in sync on writes. Do it only after measuring, and document why — accidental duplication is a bug, intentional duplication is an optimisation.

    Denormalise on purpose

    Trade write-consistency for read speed — deliberately.

    Try it Yourself »
    SQL
    -- NORMALISATION removes duplicated facts so writes stay consistent.
    -- DELIBERATE DENORMALISATION re-adds a copy to make reads cheaper —
    -- a conscious trade, not laziness.
    
    -- Normalised: customer name lives only in customers. To show it on an
    -- order you must JOIN every time:
    SELECT o.order_id, c.name
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
    
    -- Denormalised: copy the name onto orders to skip the JOIN on a hot path:
    -- ALTER TABLE orders ADD COLUMN customer_name VARCH
    ...

    Common Errors (and the fix)

    • Reaching for EAV too soon: "we might add fields later" is not a reason to throw away types and constraints. Start with real columns; use a JSONB column for the genuinely dynamic bits.
    • Natural keys that change: using email or a username as a primary key looks clean until someone updates it and breaks every foreign key. Use a surrogate key + a UNIQUE constraint instead.
    • No audit trail: hard-deleting or overwriting rows with no history means "what did this look like last week?" is unanswerable. Add created_at/updated_at and a history table before you need them.
    • Over-normalising the read path: splitting data across ten tables is correct but can make a hot dashboard query crawl. Denormalise a measured bottleneck on purpose rather than normalising on reflex.
    • Forgetting the soft-delete filter: after adding is_deleted, queries that omit WHERE is_deleted = FALSE resurrect "deleted" rows. Bake the filter into a view if it's easy to forget.

    📘 Quick Reference

    PatternUse it when…Trade-off
    OLTP (normalised)Serving a live appJoins on reads
    OLAP (star schema)Analytics / dashboardsDuplicated data
    Surrogate keyAlmost every app tableNo human meaning
    Lookup tableFixed value set (status, role)One extra join
    Junction tableMany-to-many relationshipThird table to maintain
    Adjacency listTrees, write-heavyRecursive reads
    Closure tableTrees, read-heavyHeavier writes
    Soft deleteNeed history / undoFilter every query
    Audit/history tableRegulated or sensitive dataStorage + write cost
    EAVTruly open-ended metadataLoses types/constraints
    DenormalisationProven read bottleneckKeep copies in sync

    Frequently Asked Questions

    Q: Should I use auto-increment integers or UUIDs for surrogate keys?

    Both are surrogate keys. Auto-increment (BIGSERIAL) is compact and index-friendly; UUID is globally unique without a central counter, which helps with distributed systems and hiding row counts. Pick UUIDs when IDs are generated outside one database or must not be guessable.

    Q: Can one database be both OLTP and OLAP?

    For a while, yes — a normalised OLTP database can also run reports. As volume grows, the two workloads fight over the same machine, so teams replicate the data into a separate warehouse (the star schema) for analytics and leave the app database lean.

    Q: Is denormalisation just bad design?

    No — accidental duplication is a bug, but deliberate denormalisation is a recognised optimisation. The test is whether it's intentional, measured, and documented, and whether you have a plan to keep the copies in sync.

    Q: Adjacency list or closure table for my tree?

    Default to the adjacency list — it's simplest, and recursive CTEs handle most reads fine. Switch to a closure table only when you query whole subtrees so often that the recursion becomes the bottleneck.

    Mini-Challenge: Design a Blog Schema

    Put the patterns together — a brief, an outline, and the expected shape in the comments. Write the CREATE TABLE statements, then paste them into a playground to confirm they run.

    🎯 Mini-Challenge

    authors, posts, tags, a junction, surrogate keys + soft delete.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE — design a small schema for a blogging platform.
    -- Requirements:
    --   1. authors and posts: an author writes many posts (one-to-many).
    --   2. tags: a post can have many tags, a tag many posts (many-to-many).
    --   3. Use SURROGATE primary keys (BIGSERIAL) on authors, posts, tags.
    --   4. Support SOFT DELETE on posts (is_deleted + deleted_at).
    --   5. Add created_at to posts.
    --
    -- ✅ Expected shape: 4 tables — authors, posts, tags, and a junction
    --    post_tags(post_id, tag
    ...

    🎉 Lesson Complete

    • OLTP normalises for fast writes; OLAP star schemas denormalise for fast analytics
    • ✅ Prefer surrogate keys + a UNIQUE natural-key constraint
    • Lookup tables tame free text; junction tables model many-to-many
    • Adjacency list (cheap writes) vs closure table (cheap reads) for hierarchies
    • Soft deletes, timestamps, and audit tables preserve history
    • ✅ Avoid EAV; reach for JSONB, and denormalise only on purpose
    • Next: cross-database queries and foreign data wrappers

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service