Skip to main content

    Final Project • Advanced Track

    Build a Bookstore Database

    This is your capstone. You'll build BookNook — a small but complete bookstore database — from an empty editor to indexed, transactional, report-ready SQL. Every concept from the course shows up here, assembled into one real system you design and query yourself.

    What You'll Build

    • A 4-table schema with PKs, FKs, and CHECK constraints
    • Seed data inserted in the correct FK order
    • Core reports: revenue per category and top customers
    • Indexes for the hot queries (and reading an EXPLAIN plan)
    • A reusable view plus a safe place-an-order transaction
    • A window-function best-seller ranking using a CTE

    The Plan: 6 Milestones

    You're building a bookstore. Customers place orders; each order contains one or more books (the line items). Here's the path from empty database to finished system:

    1. Design the schema — four tables wired together with keys and constraints.
    2. Seed sample data — insert rows in the right order.
    3. Core queries — revenue per category and top customers.
    4. Indexes — speed up the hot paths, then read an EXPLAIN plan.
    5. A view + a transaction — reuse a report, place an order safely.
    6. Advanced touch — rank best-sellers with a window function + CTE.

    Milestone 1 — Design the Schema

    A good schema is the foundation everything else stands on. You'll create four tables. A primary key (PK) is the column that uniquely identifies a row; a foreign key (FK) is a column that points at another table's PK, which is how the database knows an order belongs to a customer.

    📚 Real-world analogy

    Think of order_items as the receipt lines. The order is the receipt; each line says "1 copy of Dune". The line links the receipt (order_id) to the book (product_id) — exactly what a foreign key does.

    Notice the guardrails: NOT NULL forbids blanks, UNIQUE stops duplicate emails, CHECK rejects nonsense like a negative price, and REFERENCES enforces that every order points at a real customer.

    Create the four tables

    customers, products, orders, order_items — with PKs, FKs and constraints.

    Try it Yourself »
    SQL
    -- MILESTONE 1 — Build the BookNook bookstore schema
    -- Four tables, linked by primary keys (PK) and foreign keys (FK).
    
    -- customers: one row per shopper. The PK uniquely identifies each row.
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,          -- PK: unique id for each customer
        full_name     TEXT    NOT NULL,             -- NOT NULL = required, no blanks
        email         TEXT    NOT NULL UNIQUE,       -- UNIQUE = no two customers share one
        joined_on     DATE    NOT
    ...

    Milestone 2 — Seed Sample Data

    Empty tables can't be queried meaningfully, so add rows. The one rule that trips everyone up: insert parents before children. A foreign key can only point at a row that already exists, so customers and products must exist before the orders that reference them, and orders before their items.

    Insert the sample rows

    3 customers, 5 products, 4 orders, 6 line items — in FK-safe order.

    Try it Yourself »
    SQL
    -- MILESTONE 2 — Insert sample rows so we have something to query.
    -- Insert PARENTS before CHILDREN: customers + products first, then orders,
    -- then order_items (a FK can only point at a row that already exists).
    
    INSERT INTO customers (customer_id, full_name, email, joined_on) VALUES
        (1, 'Ada Lovelace',   'ada@example.com',   '2025-01-10'),
        (2, 'Alan Turing',    'alan@example.com',  '2025-02-02'),
        (3, 'Grace Hopper',   'grace@example.com', '2025-02-20');
    
    INSERT INTO products (pro
    ...

    Result — products after seeding:

    product_idtitlecategorypricestock
    10The SQL CookbookTech39120
    11Clean CodeTech32.560
    12DuneSciFi18200

    Milestone 3 — Core Queries

    Now the payoff: answering real business questions. Both reports below JOIN tables back together and use an aggregate (SUM, COUNT) with GROUP BY to collapse many rows into one summary row per group.

    Revenue per category

    JOIN order_items to products, then SUM price × quantity per category.

    Try it Yourself »
    SQL
    -- MILESTONE 3a — Revenue per category
    -- JOIN three tables, multiply price x quantity, then total it per category.
    
    SELECT
        p.category,
        SUM(p.price * oi.quantity)  AS revenue   -- money earned per book line
    FROM order_items oi
    JOIN products p ON p.product_id = oi.product_id   -- attach the book's price + category
    GROUP BY p.category                               -- one output row per category
    ORDER BY revenue DESC;
    
    -- ✅ Expected result:
    --   Tech  | 110.50   (39.00 + 32.50 + 39.00)
    --  
    ...

    Result — 2 rows, highest first:

    categoryrevenue
    Tech110.5
    SciFi102

    Top customers by spend

    Walk all four tables to total each customer's lifetime spend.

    Try it Yourself »
    SQL
    -- MILESTONE 3b — Top customers by amount spent
    -- Walk customers → orders → order_items → products to add up each spend.
    
    SELECT
        c.full_name,
        COUNT(DISTINCT o.order_id)   AS orders,      -- how many orders they placed
        SUM(p.price * oi.quantity)   AS total_spent  -- their lifetime spend
    FROM customers c
    JOIN orders       o  ON o.customer_id = c.customer_id
    JOIN order_items  oi ON oi.order_id   = o.order_id
    JOIN products     p  ON p.product_id  = oi.product_id
    GROUP BY c.customer_id, 
    ...

    Your Turn #1: average per customer

    The joins are written for you. Fill in the two ___ blanks: the aggregate that means "average", and the column to group by. The expected shape is in the comments.

    🎯 Your Turn: AVG per customer

    Replace the ___ blanks to compute an average per customer.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN #1 — average order value per customer
    -- Goal: for each customer show their name and the AVERAGE money per order.
    -- The joins are done for you. Fill in the two blanks.
    
    SELECT
        c.full_name,
        ___(p.price * oi.quantity)   AS avg_per_line   -- 👉 the aggregate that means "average"
    FROM customers c
    JOIN orders       o  ON o.customer_id = c.customer_id
    JOIN order_items  oi ON oi.order_id   = o.order_id
    JOIN products     p  ON p.product_id  = oi.product_id
    GROUP BY c.customer_id
    ...

    Milestone 4 — Add Indexes for the Hot Queries

    As data grows, the queries you run most often need to be fast. An index is a sorted lookup structure — like the index at the back of a book — that lets the database jump straight to matching rows instead of scanning every one. Index the columns you filter on (WHERE) and join on (ON).

    EXPLAIN (or EXPLAIN QUERY PLAN in SQLite) shows you the database's plan for a query without running it. Seeing "USING INDEX" instead of "SCAN" tells you the index is doing its job.

    Index the join/filter columns

    Create four indexes, then inspect a query plan with EXPLAIN.

    Try it Yourself »
    SQL
    -- MILESTONE 4 — Speed up the hot queries with indexes.
    -- An index is a lookup structure (like a book's index) the database can scan
    -- instead of reading every row. Index the columns you FILTER or JOIN on.
    
    -- We JOIN order_items to orders on order_id and to products on product_id a LOT:
    CREATE INDEX idx_items_order   ON order_items(order_id);
    CREATE INDEX idx_items_product ON order_items(product_id);
    
    -- We frequently look up orders by customer and filter by status:
    CREATE INDEX idx_orders_cu
    ...

    Milestone 5 — A View and a Transaction

    A view saves a SELECT under a name so you can reuse a complex report as if it were a simple table — it stores no data, just the query. A transaction bundles several writes so they either all succeed or all get undone; that all-or-nothing property is called atomicity.

    Placing an order touches three tables (create the order, add the items, lower the stock). Without a transaction, a crash halfway through leaves you with an order that sold phantom stock. BEGIN … COMMIT makes those three changes one indivisible unit; ROLLBACK throws them all away.

    Create a reusable view

    Save the per-order total as order_totals, then query it like a table.

    Try it Yourself »
    SQL
    -- MILESTONE 5a — A VIEW: save a query under a name and reuse it.
    -- A view is a stored SELECT. It holds no data of its own — it re-runs each
    -- time you query it, so it's always up to date.
    
    CREATE VIEW order_totals AS
    SELECT
        o.order_id,
        o.customer_id,
        o.status,
        SUM(p.price * oi.quantity)  AS order_total
    FROM orders o
    JOIN order_items oi ON oi.order_id   = o.order_id
    JOIN products    p  ON p.product_id  = oi.product_id
    GROUP BY o.order_id, o.customer_id, o.status;
    
    -- Now query 
    ...

    Result — delivered orders only:

    order_idcustomer_idstatusorder_total
    1001delivered75
    1022delivered105

    Place an order atomically

    Insert order + item and decrement stock inside BEGIN … COMMIT.

    Try it Yourself »
    SQL
    -- MILESTONE 5b — A TRANSACTION: place an order atomically.
    -- A transaction groups statements so they ALL succeed or ALL roll back.
    -- "Atomic" = no half-finished state. Perfect for "take stock + record sale".
    
    BEGIN TRANSACTION;
    
    -- 1) Create the order header
    INSERT INTO orders (order_id, customer_id, ordered_on, status)
    VALUES (104, 3, '2025-04-10', 'pending');
    
    -- 2) Add a line item: 2 copies of "Clean Code" (product 11)
    INSERT INTO order_items (order_id, product_id, quantity)
    VALUES (104, 1
    ...

    Your Turn #2: undo with ROLLBACK

    Fill in the two ___ blanks so the bad insert is thrown away and order 999 never exists. One blank starts the transaction, the other cancels it.

    🎯 Your Turn: BEGIN / ROLLBACK

    Start a transaction, then cancel it so nothing is saved.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN #2 — undo a mistake with ROLLBACK
    -- You start a transaction, realise the quantity is wrong, and want to abandon it.
    -- Fill in the two blanks so NOTHING is saved.
    
    ___;   -- 👉 the keyword that STARTS a transaction (two words)
    
    INSERT INTO orders (order_id, customer_id, ordered_on, status)
    VALUES (999, 1, '2025-04-11', 'pending');
    
    -- Oops — wrong order. Throw the whole thing away:
    ___;   -- 👉 the keyword that CANCELS everything since BEGIN
    
    -- ✅ Expected result: order 999 does
    ...

    Milestone 6 — Advanced Touch: Best-Sellers per Category

    Time to flex. A CTE (the WITH block) names an intermediate result so the final query reads cleanly. A window function like RANK() OVER (...) computes a value across a set of rows without collapsing them the way GROUP BY does — so you keep every book and get its rank.

    PARTITION BY category restarts the ranking for each category, so every category gets its own #1, #2, #3 — exactly what a "best-sellers by section" report needs.

    Rank best-sellers with a window function

    A CTE feeds RANK() OVER (PARTITION BY category ...).

    Try it Yourself »
    SQL
    -- MILESTONE 6 — Advanced touch: rank best-sellers WITHIN each category.
    -- A CTE (the WITH block) names a sub-result. A WINDOW FUNCTION (RANK() OVER...)
    -- computes a value across a group of rows WITHOUT collapsing them with GROUP BY.
    
    WITH sales AS (                                  -- CTE: revenue per book
        SELECT
            p.category,
            p.title,
            SUM(p.price * oi.quantity) AS revenue
        FROM order_items oi
        JOIN products p ON p.product_id = oi.product_id
        GROUP BY p.cate
    ...

    Result — ranked within each category:

    categorytitlerevenuerank_in_category
    SciFiProject Hail Mary661
    SciFiDune362
    TechThe SQL Cookbook781
    TechClean Code32.52

    Common Pitfalls (and the fix)

    • "FOREIGN KEY constraint failed" on INSERT: you inserted a child before its parent. Insert customers and products first, then orders, then order_items.
    • "no such table: products": you ran a query before Milestone 1/2. Run the schema and seed blocks first, in the same session.
    • Wrong totals from a JOIN: joining orders straight to products with no order_items in between multiplies rows. Always route through the line-items table.
    • Column in SELECT but not in GROUP BY: every non-aggregated column you select must appear in GROUP BY, or the query is ambiguous and errors.
    • Forgetting COMMIT: changes inside BEGIN aren't permanent until you commit. Close the session first and they vanish.
    • Index seems ignored: on tiny tables the planner may still choose a full scan because it's faster — that's correct. Indexes pay off as rows grow.

    📘 Quick Reference — what this project used

    SyntaxPurpose
    CREATE TABLE … PRIMARY KEYDefine a table and its unique-row identifier
    REFERENCES other(col)A foreign key linking tables
    NOT NULL / UNIQUE / CHECKConstraints that keep data valid
    INSERT INTO … VALUES (…),(…)Add multiple rows at once
    JOIN … ONCombine rows from related tables
    GROUP BY + SUM/COUNT/AVGSummarise rows into per-group totals
    CREATE INDEX ON t(col)Speed up filters and joins on a column
    EXPLAIN QUERY PLANSee how a query will run
    CREATE VIEW … AS SELECTSave a query under a reusable name
    BEGIN … COMMIT / ROLLBACKRun writes atomically (all or nothing)
    WITH cte AS (…)Name an intermediate result (CTE)
    RANK() OVER (PARTITION BY …)Rank rows within groups (window function)

    Frequently Asked Questions

    Q: Why store line items in a separate order_items table?

    Because an order can contain many books, and a book can appear in many orders — a many-to-many relationship. The line-items table is the bridge that makes that possible cleanly.

    Q: Do I need indexes on such a tiny database?

    Not for speed — with a handful of rows a full scan is instant. You add them here to learn the workflow; on a table with millions of rows they're the difference between milliseconds and minutes.

    Q: Is a view slower than a real table?

    A plain view re-runs its query each time, so it's exactly as fast as that query. If you need to cache the result, that's a materialised view — a different tool.

    Q: What happens if a statement fails mid-transaction?

    Nothing is saved until COMMIT. You run ROLLBACK (or the session ends) and the database returns to exactly how it was before BEGIN.

    Q: My SQL playground rejected SERIAL or JSONB — why?

    Those are PostgreSQL-specific. This project uses portable SQLite-friendly types (INTEGER, TEXT, REAL) so it runs nearly anywhere. Swap dialects if you target a specific engine.

    🎯 Stretch Challenge: Add Product Reviews

    No answer this time — just a brief and a comment outline. Extend BookNook with a reviews feature, wire its foreign keys, and write an aggregate report. This is the faded, build-it-yourself rung. Sketch it here, then run it in a playground.

    🎯 Stretch Challenge

    Design a reviews table + an average-rating report, from the outline only.

    Try it Yourself »
    SQL
    -- 🎯 STRETCH CHALLENGE — extend BookNook with product reviews.
    -- No answer given. Design it yourself using everything above.
    --
    -- 1) Create a "reviews" table:
    --      review_id   INTEGER PRIMARY KEY
    --      product_id  INTEGER  -> FK to products(product_id)
    --      customer_id INTEGER  -> FK to customers(customer_id)
    --      rating      INTEGER  -> CHECK it's between 1 and 5
    --      comment     TEXT
    --      created_on  DATE NOT NULL
    --
    -- 2) Seed 4–5 reviews across a couple of books.
    --
    -- 3)
    ...

    🎉 Project Complete

    • ✅ You designed a normalised schema with PKs, FKs, and constraints
    • ✅ You seeded data in foreign-key-safe order
    • ✅ You answered business questions with JOINs, aggregates, and GROUP BY
    • ✅ You indexed hot columns and read an EXPLAIN plan
    • ✅ You built a reusable view and placed an order inside a transaction
    • ✅ You ranked best-sellers per category with a CTE + window function
    • Where to go next: take the Stretch Challenge further — add authentication tables, write triggers to keep stock in sync, or load a real public dataset and rebuild these reports against it. You now have the full loop: design → seed → query → optimise → secure.

    Sign up for free to track which lessons you've completed and get learning reminders.

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service