Final Project • Advanced Track
Build a Bookstore Database
This is your capstone. You'll build BookNook — a small but complete bookstore database — from an empty editor to indexed, transactional, report-ready SQL. Every concept from the course shows up here, assembled into one real system you design and query yourself.
What You'll Build
- ✓A 4-table schema with PKs, FKs, and CHECK constraints
- ✓Seed data inserted in the correct FK order
- ✓Core reports: revenue per category and top customers
- ✓Indexes for the hot queries (and reading an EXPLAIN plan)
- ✓A reusable view plus a safe place-an-order transaction
- ✓A window-function best-seller ranking using a CTE
The Plan: 6 Milestones
You're building a bookstore. Customers place orders; each order contains one or more books (the line items). Here's the path from empty database to finished system:
- Design the schema — four tables wired together with keys and constraints.
- Seed sample data — insert rows in the right order.
- Core queries — revenue per category and top customers.
- Indexes — speed up the hot paths, then read an EXPLAIN plan.
- A view + a transaction — reuse a report, place an order safely.
- Advanced touch — rank best-sellers with a window function + CTE.
Milestone 1 — Design the Schema
A good schema is the foundation everything else stands on. You'll create four tables. A primary key (PK) is the column that uniquely identifies a row; a foreign key (FK) is a column that points at another table's PK, which is how the database knows an order belongs to a customer.
📚 Real-world analogy
Think of order_items as the receipt lines. The order is the receipt; each line says "1 copy of Dune". The line links the receipt (order_id) to the book (product_id) — exactly what a foreign key does.
Notice the guardrails: NOT NULL forbids blanks, UNIQUE stops duplicate emails, CHECK rejects nonsense like a negative price, and REFERENCES enforces that every order points at a real customer.
Create the four tables
customers, products, orders, order_items — with PKs, FKs and constraints.
-- MILESTONE 1 — Build the BookNook bookstore schema
-- Four tables, linked by primary keys (PK) and foreign keys (FK).
-- customers: one row per shopper. The PK uniquely identifies each row.
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY, -- PK: unique id for each customer
full_name TEXT NOT NULL, -- NOT NULL = required, no blanks
email TEXT NOT NULL UNIQUE, -- UNIQUE = no two customers share one
joined_on DATE NOT
...Milestone 2 — Seed Sample Data
Empty tables can't be queried meaningfully, so add rows. The one rule that trips everyone up: insert parents before children. A foreign key can only point at a row that already exists, so customers and products must exist before the orders that reference them, and orders before their items.
INSERT by separating each (...) tuple with a comma — far faster than one statement per row.Insert the sample rows
3 customers, 5 products, 4 orders, 6 line items — in FK-safe order.
-- MILESTONE 2 — Insert sample rows so we have something to query.
-- Insert PARENTS before CHILDREN: customers + products first, then orders,
-- then order_items (a FK can only point at a row that already exists).
INSERT INTO customers (customer_id, full_name, email, joined_on) VALUES
(1, 'Ada Lovelace', 'ada@example.com', '2025-01-10'),
(2, 'Alan Turing', 'alan@example.com', '2025-02-02'),
(3, 'Grace Hopper', 'grace@example.com', '2025-02-20');
INSERT INTO products (pro
...Result — products after seeding:
| product_id | title | category | price | stock |
|---|---|---|---|---|
| 10 | The SQL Cookbook | Tech | 39 | 120 |
| 11 | Clean Code | Tech | 32.5 | 60 |
| 12 | Dune | SciFi | 18 | 200 |
| … | … | … | … | … |
Milestone 3 — Core Queries
Now the payoff: answering real business questions. Both reports below JOIN tables back together and use an aggregate (SUM, COUNT) with GROUP BY to collapse many rows into one summary row per group.
Revenue per category
JOIN order_items to products, then SUM price × quantity per category.
-- MILESTONE 3a — Revenue per category
-- JOIN three tables, multiply price x quantity, then total it per category.
SELECT
p.category,
SUM(p.price * oi.quantity) AS revenue -- money earned per book line
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id -- attach the book's price + category
GROUP BY p.category -- one output row per category
ORDER BY revenue DESC;
-- ✅ Expected result:
-- Tech | 110.50 (39.00 + 32.50 + 39.00)
--
...Result — 2 rows, highest first:
| category | revenue |
|---|---|
| Tech | 110.5 |
| SciFi | 102 |
Top customers by spend
Walk all four tables to total each customer's lifetime spend.
-- MILESTONE 3b — Top customers by amount spent
-- Walk customers → orders → order_items → products to add up each spend.
SELECT
c.full_name,
COUNT(DISTINCT o.order_id) AS orders, -- how many orders they placed
SUM(p.price * oi.quantity) AS total_spent -- their lifetime spend
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY c.customer_id,
...Your Turn #1: average per customer
The joins are written for you. Fill in the two ___ blanks: the aggregate that means "average", and the column to group by. The expected shape is in the comments.
🎯 Your Turn: AVG per customer
Replace the ___ blanks to compute an average per customer.
-- 🎯 YOUR TURN #1 — average order value per customer
-- Goal: for each customer show their name and the AVERAGE money per order.
-- The joins are done for you. Fill in the two blanks.
SELECT
c.full_name,
___(p.price * oi.quantity) AS avg_per_line -- 👉 the aggregate that means "average"
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY c.customer_id
...Milestone 4 — Add Indexes for the Hot Queries
As data grows, the queries you run most often need to be fast. An index is a sorted lookup structure — like the index at the back of a book — that lets the database jump straight to matching rows instead of scanning every one. Index the columns you filter on (WHERE) and join on (ON).
EXPLAIN (or EXPLAIN QUERY PLAN in SQLite) shows you the database's plan for a query without running it. Seeing "USING INDEX" instead of "SCAN" tells you the index is doing its job.
INSERT/UPDATE must update the index too). Add them for real hot paths, not "just in case".Index the join/filter columns
Create four indexes, then inspect a query plan with EXPLAIN.
-- MILESTONE 4 — Speed up the hot queries with indexes.
-- An index is a lookup structure (like a book's index) the database can scan
-- instead of reading every row. Index the columns you FILTER or JOIN on.
-- We JOIN order_items to orders on order_id and to products on product_id a LOT:
CREATE INDEX idx_items_order ON order_items(order_id);
CREATE INDEX idx_items_product ON order_items(product_id);
-- We frequently look up orders by customer and filter by status:
CREATE INDEX idx_orders_cu
...Milestone 5 — A View and a Transaction
A view saves a SELECT under a name so you can reuse a complex report as if it were a simple table — it stores no data, just the query. A transaction bundles several writes so they either all succeed or all get undone; that all-or-nothing property is called atomicity.
Placing an order touches three tables (create the order, add the items, lower the stock). Without a transaction, a crash halfway through leaves you with an order that sold phantom stock. BEGIN … COMMIT makes those three changes one indivisible unit; ROLLBACK throws them all away.
Create a reusable view
Save the per-order total as order_totals, then query it like a table.
-- MILESTONE 5a — A VIEW: save a query under a name and reuse it.
-- A view is a stored SELECT. It holds no data of its own — it re-runs each
-- time you query it, so it's always up to date.
CREATE VIEW order_totals AS
SELECT
o.order_id,
o.customer_id,
o.status,
SUM(p.price * oi.quantity) AS order_total
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY o.order_id, o.customer_id, o.status;
-- Now query
...Result — delivered orders only:
| order_id | customer_id | status | order_total |
|---|---|---|---|
| 100 | 1 | delivered | 75 |
| 102 | 2 | delivered | 105 |
Place an order atomically
Insert order + item and decrement stock inside BEGIN … COMMIT.
-- MILESTONE 5b — A TRANSACTION: place an order atomically.
-- A transaction groups statements so they ALL succeed or ALL roll back.
-- "Atomic" = no half-finished state. Perfect for "take stock + record sale".
BEGIN TRANSACTION;
-- 1) Create the order header
INSERT INTO orders (order_id, customer_id, ordered_on, status)
VALUES (104, 3, '2025-04-10', 'pending');
-- 2) Add a line item: 2 copies of "Clean Code" (product 11)
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (104, 1
...Your Turn #2: undo with ROLLBACK
Fill in the two ___ blanks so the bad insert is thrown away and order 999 never exists. One blank starts the transaction, the other cancels it.
🎯 Your Turn: BEGIN / ROLLBACK
Start a transaction, then cancel it so nothing is saved.
-- 🎯 YOUR TURN #2 — undo a mistake with ROLLBACK
-- You start a transaction, realise the quantity is wrong, and want to abandon it.
-- Fill in the two blanks so NOTHING is saved.
___; -- 👉 the keyword that STARTS a transaction (two words)
INSERT INTO orders (order_id, customer_id, ordered_on, status)
VALUES (999, 1, '2025-04-11', 'pending');
-- Oops — wrong order. Throw the whole thing away:
___; -- 👉 the keyword that CANCELS everything since BEGIN
-- ✅ Expected result: order 999 does
...Milestone 6 — Advanced Touch: Best-Sellers per Category
Time to flex. A CTE (the WITH block) names an intermediate result so the final query reads cleanly. A window function like RANK() OVER (...) computes a value across a set of rows without collapsing them the way GROUP BY does — so you keep every book and get its rank.
PARTITION BY category restarts the ranking for each category, so every category gets its own #1, #2, #3 — exactly what a "best-sellers by section" report needs.
Rank best-sellers with a window function
A CTE feeds RANK() OVER (PARTITION BY category ...).
-- MILESTONE 6 — Advanced touch: rank best-sellers WITHIN each category.
-- A CTE (the WITH block) names a sub-result. A WINDOW FUNCTION (RANK() OVER...)
-- computes a value across a group of rows WITHOUT collapsing them with GROUP BY.
WITH sales AS ( -- CTE: revenue per book
SELECT
p.category,
p.title,
SUM(p.price * oi.quantity) AS revenue
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.cate
...Result — ranked within each category:
| category | title | revenue | rank_in_category |
|---|---|---|---|
| SciFi | Project Hail Mary | 66 | 1 |
| SciFi | Dune | 36 | 2 |
| Tech | The SQL Cookbook | 78 | 1 |
| Tech | Clean Code | 32.5 | 2 |
Common Pitfalls (and the fix)
- "FOREIGN KEY constraint failed" on INSERT: you inserted a child before its parent. Insert
customersandproductsfirst, thenorders, thenorder_items. - "no such table: products": you ran a query before Milestone 1/2. Run the schema and seed blocks first, in the same session.
- Wrong totals from a JOIN: joining
ordersstraight toproductswith noorder_itemsin between multiplies rows. Always route through the line-items table. - Column in SELECT but not in GROUP BY: every non-aggregated column you select must appear in
GROUP BY, or the query is ambiguous and errors. - Forgetting
COMMIT: changes insideBEGINaren't permanent until you commit. Close the session first and they vanish. - Index seems ignored: on tiny tables the planner may still choose a full scan because it's faster — that's correct. Indexes pay off as rows grow.
📘 Quick Reference — what this project used
| Syntax | Purpose |
|---|---|
| CREATE TABLE … PRIMARY KEY | Define a table and its unique-row identifier |
| REFERENCES other(col) | A foreign key linking tables |
| NOT NULL / UNIQUE / CHECK | Constraints that keep data valid |
| INSERT INTO … VALUES (…),(…) | Add multiple rows at once |
| JOIN … ON | Combine rows from related tables |
| GROUP BY + SUM/COUNT/AVG | Summarise rows into per-group totals |
| CREATE INDEX ON t(col) | Speed up filters and joins on a column |
| EXPLAIN QUERY PLAN | See how a query will run |
| CREATE VIEW … AS SELECT | Save a query under a reusable name |
| BEGIN … COMMIT / ROLLBACK | Run writes atomically (all or nothing) |
| WITH cte AS (…) | Name an intermediate result (CTE) |
| RANK() OVER (PARTITION BY …) | Rank rows within groups (window function) |
Frequently Asked Questions
Q: Why store line items in a separate order_items table?
Because an order can contain many books, and a book can appear in many orders — a many-to-many relationship. The line-items table is the bridge that makes that possible cleanly.
Q: Do I need indexes on such a tiny database?
Not for speed — with a handful of rows a full scan is instant. You add them here to learn the workflow; on a table with millions of rows they're the difference between milliseconds and minutes.
Q: Is a view slower than a real table?
A plain view re-runs its query each time, so it's exactly as fast as that query. If you need to cache the result, that's a materialised view — a different tool.
Q: What happens if a statement fails mid-transaction?
Nothing is saved until COMMIT. You run ROLLBACK (or the session ends) and the database returns to exactly how it was before BEGIN.
Q: My SQL playground rejected SERIAL or JSONB — why?
Those are PostgreSQL-specific. This project uses portable SQLite-friendly types (INTEGER, TEXT, REAL) so it runs nearly anywhere. Swap dialects if you target a specific engine.
🎯 Stretch Challenge: Add Product Reviews
No answer this time — just a brief and a comment outline. Extend BookNook with a reviews feature, wire its foreign keys, and write an aggregate report. This is the faded, build-it-yourself rung. Sketch it here, then run it in a playground.
🎯 Stretch Challenge
Design a reviews table + an average-rating report, from the outline only.
-- 🎯 STRETCH CHALLENGE — extend BookNook with product reviews.
-- No answer given. Design it yourself using everything above.
--
-- 1) Create a "reviews" table:
-- review_id INTEGER PRIMARY KEY
-- product_id INTEGER -> FK to products(product_id)
-- customer_id INTEGER -> FK to customers(customer_id)
-- rating INTEGER -> CHECK it's between 1 and 5
-- comment TEXT
-- created_on DATE NOT NULL
--
-- 2) Seed 4–5 reviews across a couple of books.
--
-- 3)
...🎉 Project Complete
- ✅ You designed a normalised schema with PKs, FKs, and constraints
- ✅ You seeded data in foreign-key-safe order
- ✅ You answered business questions with JOINs, aggregates, and GROUP BY
- ✅ You indexed hot columns and read an EXPLAIN plan
- ✅ You built a reusable view and placed an order inside a transaction
- ✅ You ranked best-sellers per category with a CTE + window function
- ✅ Where to go next: take the Stretch Challenge further — add authentication tables, write triggers to keep stock in sync, or load a real public dataset and rebuild these reports against it. You now have the full loop: design → seed → query → optimise → secure.
Sign up for free to track which lessons you've completed and get learning reminders.