Advanced Track
Advanced JOIN Patterns
By the end of this lesson you'll reach beyond INNER and LEFT JOIN to answer the questions they can't: "who has an order?", "who has no order?", "what's each customer's biggest order?". You'll master semi-joins, anti-joins, self-joins, CROSS JOIN, and LATERAL — and the NULL trap that silently breaks NOT IN.
What You'll Learn
- ✓Semi-joins: keep rows that HAVE a match (EXISTS / IN)
- ✓Anti-joins: keep rows with NO match (NOT EXISTS / NOT IN)
- ✓Why NOT IN + a NULL silently returns zero rows
- ✓Self-joins: compare a table against itself
- ✓CROSS JOIN and the Cartesian explosion to avoid
- ✓LATERAL / CROSS APPLY: a subquery per outer row
Our Sample Tables: customers & orders
Every query in this lesson runs against these two small tables. They share one link: orders.customer_id points back to customers.id. Notice Dan has no row in orders — that "missing" customer is the star of the anti-join section.
Result — customers (5 rows):
| id | name |
|---|---|
| 1 | Ava |
| 2 | Ben |
| 3 | Cara |
| 4 | Dan |
| 5 | Eve |
Result — orders (5 rows) — Dan (id 4) is absent:
| id | customer_id | amount |
|---|---|---|
| 101 | 1 | 50 |
| 102 | 1 | 30 |
| 103 | 2 | 80 |
| 104 | 3 | 20 |
| 105 | 5 | 60 |
So Ava (1) has two orders, Ben (2), Cara (3) and Eve (5) have one each, and Dan (4) has none.
1. Semi-Joins — "Does a Match Exist?"
A semi-join keeps rows from the left table that have at least one match on the right — but it never adds the right table's columns and never duplicates a row. You're asking "is this customer on the orders list?", not "show me every order".
📋 Real-world analogy
A semi-join is checking names off a guest list against the RSVP pile. You only care whether someone RSVP'd — not how many cards they sent. The guest appears once, ticked or not.
EXISTS stops at the first matching row (a short-circuit), while IN may build the entire list first. For large subqueries, EXISTS is almost always the faster choice.Semi-join with EXISTS
Customers who have placed an order — no duplicates.
-- SEMI-JOIN: keep rows in A that HAVE at least one match in B.
-- "Which customers have placed at least one order?"
-- EXISTS runs the inner query for each customer and stops
-- at the FIRST matching order — it never multiplies rows.
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1 -- the 1 is a placeholder; EXISTS
FROM orders o -- only cares IF a row comes back,
WHERE o.customer_id = c.id -- not WHAT it contains
);
-- Result:
...Result — 4 rows — Dan excluded:
| id | name |
|---|---|
| 1 | Ava |
| 2 | Ben |
| 3 | Cara |
| 5 | Eve |
The same answer reads a little shorter with IN, which checks whether each customer's id appears among the order rows:
Semi-join with IN
Same result, expressed with IN.
-- Same answer with IN: keep customers whose id appears
-- in the list of customer_ids that have orders.
SELECT id, name
FROM customers
WHERE id IN (SELECT customer_id FROM orders);
-- IN builds the whole list first; EXISTS can short-circuit.
-- On big tables EXISTS is usually the faster of the two.Result — 4 rows — identical to EXISTS:
| id | name |
|---|---|
| 1 | Ava |
| 2 | Ben |
| 3 | Cara |
| 5 | Eve |
Your Turn: complete the semi-join
Fill in the one blank with the keyword that means "a match exists". The expected result is in the comments so you can check yourself.
🎯 Your Turn: customers WITH orders
Add the keyword that keeps rows that have a match.
-- 🎯 YOUR TURN — complete the semi-join, then press "Try it Yourself".
-- Goal: list every customer who HAS placed at least one order.
SELECT c.id, c.name
FROM customers c
WHERE ___ ( -- 👉 the keyword that means "a match exists"
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
);
-- ✅ Expected: 4 rows — Ava, Ben, Cara, Eve (Dan has no orders).2. Anti-Joins — "Find What's Missing"
An anti-join is the mirror image: keep the left-table rows that have no match on the right. This is how you answer "which customers have never ordered?", "which products never sold?", "which users never logged in?". The cleanest tool is WHERE NOT EXISTS (...).
Back to the guest list: an anti-join finds everyone who didn't RSVP — the names with an empty spot in the reply pile. In our data that's exactly one person: Dan.
Anti-join with NOT EXISTS
Customers who have never ordered.
-- ANTI-JOIN: keep rows in A that have NO match in B.
-- "Which customers have NEVER placed an order?"
-- NOT EXISTS is the safe, fast, NULL-proof way to do this.
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
);
-- Result: just Dan (id 4) — the only customer with zero orders.Result — 1 row:
| id | name |
|---|---|
| 4 | Dan |
You can write it with NOT IN too — but only if you guard against NULLs in the subquery:
Anti-join with NOT IN (guarded)
Same result — note the IS NOT NULL guard.
-- The SAME idea with NOT IN — but watch the NULL trap below.
SELECT id, name
FROM customers
WHERE id NOT IN (
SELECT customer_id
FROM orders
WHERE customer_id IS NOT NULL -- ⚠️ CRITICAL guard, see why next
);
-- Result: Dan (id 4). Identical to NOT EXISTS — WHEN the guard is present.Result — 1 row — same as NOT EXISTS:
| id | name |
|---|---|
| 4 | Dan |
NOT IN subquery returns even one NULL, the entire result becomes empty. The example below shows exactly why — and why NOT EXISTS is the safer default.⚠️ Why NOT IN + NULL returns nothing
The silent bug that empties your result.
-- ⚠️ THE NULL TRAP — why NOT EXISTS is the safer default.
-- Imagine ONE order row has a NULL customer_id (an unassigned order):
SELECT id, name
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders); -- no NULL guard!
-- Result: 0 rows. Nothing. Empty.
--
-- Why? "id NOT IN (1, 2, 3, NULL)" becomes
-- "id <> 1 AND id <> 2 AND id <> 3 AND id <> NULL".
-- "id <> NULL" is UNKNOWN (never TRUE), so the whole row is rejected.
-- One stray NULL silently wipes out every result.
-- NOT EXIS
...Result — 0 rows (!) — a single NULL wiped the result:
| id | name |
|---|---|
| (empty) | (empty) |
Your Turn: complete the anti-join
Fill in the blank with the two words that mean "no match exists", so the query returns customers with no orders.
🎯 Your Turn: customers with NO orders
Use NOT EXISTS to find the unmatched customer.
-- 🎯 YOUR TURN — complete the anti-join, then press "Try it Yourself".
-- Goal: find every customer who has NO orders at all.
SELECT c.id, c.name
FROM customers c
WHERE ___ EXISTS ( -- 👉 the two words that mean "no match exists"
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
);
-- ✅ Expected: 1 row — Dan (id 4), the customer with zero orders.3. Self-Joins — A Table Meets Itself
A self-join joins a table to a second copy of itself using two aliases. It's how you compare rows within one table — find pairs, build employee→manager hierarchies, or compare consecutive rows. Here we'll find customers who placed more than one order by pairing each order with another order from the same customer.
o1.id < o2.id trick: it stops a row pairing with itself, and keeps only one of each mirrored pair — you get (101, 102) but not the redundant (102, 101).Self-join: customers with 2+ orders
Pair each order with another from the same customer.
-- SELF-JOIN: join a table to ITSELF using two aliases.
-- "Which customers placed more than one order, and which pair?"
-- We give 'orders' two names (o1, o2) so SQL treats it as two tables.
SELECT o1.customer_id,
o1.id AS first_order,
o2.id AS second_order
FROM orders o1
JOIN orders o2
ON o1.customer_id = o2.customer_id -- same customer...
AND o1.id < o2.id; -- ...but two DIFFERENT orders
-- The "o1.id < o2.id" does two jobs:
-- • removes a row pai
...Result — 1 row — only Ava (customer 1) has two orders:
| customer_id | first_order | second_order |
|---|---|---|
| 1 | 101 | 102 |
4. CROSS JOIN — Cartesian Products
A CROSS JOIN pairs every row of one table with every row of another — no ON clause. The output size is the two row counts multiplied. It's invaluable for generating every combination (sizes × colours, dates × products), but a missing join condition silently becomes one and explodes your result.
5 customers × 5 orders = 25 rows. Harmless here — but 1,000 × 1,000 = 1,000,000 rows. Always know why you're cross-joining.
CROSS JOIN — every combination
Pair every customer with every order (5 × 5).
-- CROSS JOIN: pair EVERY row of A with EVERY row of B.
-- Also called a Cartesian product. There is no ON clause.
SELECT c.name, o.id AS order_id
FROM customers c
CROSS JOIN orders o;
-- 5 customers × 5 orders = 25 rows (every name beside every order).
-- ⚠️ This grows fast: 1,000 × 1,000 = 1,000,000 rows.
-- Use CROSS JOIN deliberately — to build combinations or fill gaps,
-- never by accident (a missing JOIN condition becomes a CROSS JOIN).Result — 25 rows total (5 × 5) — first few shown:
| name | order_id |
|---|---|
| Ava | 101 |
| Ava | 102 |
| Ava | 103 |
| Ava | 104 |
| Ava | 105 |
| Ben | 101 |
| … | … |
5. LATERAL / CROSS APPLY — A Subquery Per Row
A normal subquery in the FROM clause can't see the outer row. LATERAL removes that wall: it runs the subquery once for each outer row and lets it reference that row's columns — like a for-each loop. It's the cleanest way to get "the top N rows per group", here each customer's single biggest order.
CROSS JOIN LATERAL (...); SQL Server and Oracle write CROSS APPLY (...). Use the OUTER variants (LEFT JOIN LATERAL ... ON true / OUTER APPLY) to keep rows whose subquery is empty.LATERAL — biggest order per customer
Run a per-customer subquery and keep the top row.
-- LATERAL: run a subquery FOR EACH row of the outer table,
-- and let that subquery SEE the outer row's columns.
-- "Get each customer's single biggest order."
SELECT c.name, top_order.amount
FROM customers c
CROSS JOIN LATERAL (
SELECT o.amount
FROM orders o
WHERE o.customer_id = c.id -- ← references the OUTER c.id (that's LATERAL)
ORDER BY o.amount DESC
LIMIT 1
) AS top_order;
-- CROSS JOIN LATERAL drops customers whose subquery is empty,
-- so Dan (no orders) does N
...Result — 4 rows — Dan dropped (no orders; CROSS JOIN LATERAL):
| name | amount |
|---|---|
| Ava | 50 |
| Ben | 80 |
| Cara | 20 |
| Eve | 60 |
Common Errors (and the fix)
NOT INreturns nothing: the subquery contained aNULL, which makes every comparisonUNKNOWNand rejects all rows. Switch toNOT EXISTS, or addWHERE col IS NOT NULLinside the subquery.- Result has way too many rows: you wrote a
CROSS JOIN, or forgot theONcondition on a regular join — which becomes a Cartesian product. Add the join condition that links the two tables. - Customers appear multiple times: you used
JOIN orderswhen you only needed to know a match exists. A one-to-many join repeats the left row per match — useWHERE EXISTS (...)instead of joining and de-duplicating withDISTINCT. - "missing FROM-clause entry for table o" / unknown alias: in a self-join you must give each copy its own alias (
orders o1,orders o2) and qualify every column (o1.id), or SQL can't tell the copies apart. - LATERAL subquery "cannot reference c.id": a plain
FROM (subquery)can't see the outer row. Add theLATERALkeyword (or useCROSS APPLYon SQL Server) to allow the reference.
📘 Quick Reference
| Pattern | Syntax | Use it to… |
|---|---|---|
| Semi-join | WHERE EXISTS (SELECT 1 ...) | Keep rows that HAVE a match (no dupes) |
| Semi-join | WHERE id IN (SELECT ...) | Same idea, list-membership style |
| Anti-join | WHERE NOT EXISTS (SELECT 1 ...) | Keep rows with NO match (NULL-safe) |
| Anti-join | WHERE id NOT IN (SELECT ...) | NO match — but guard against NULLs |
| Cross join | A CROSS JOIN B | Every combination (rows multiply) |
| Lateral | CROSS JOIN LATERAL (...) | A subquery per outer row (top-N) |
| Lateral (T-SQL) | CROSS APPLY (...) | SQL Server / Oracle equivalent |
Frequently Asked Questions
Q: Why does SELECT 1 appear inside EXISTS?
EXISTS only checks whether a row comes back, not what's in it, so the select list is irrelevant. 1 is a conventional placeholder; SELECT * would behave identically.
Q: Should I always prefer EXISTS over IN?
For correctness with NOT IN, yes — NOT EXISTS avoids the NULL trap. For positive checks, modern optimisers often run IN and EXISTS the same way; EXISTS tends to win on large or correlated subqueries.
Q: When would I use a self-join instead of a window function?
Self-joins shine for pairing rows or walking a hierarchy. For "rank within a group" or "compare to the previous row", window functions (the next lesson) are usually clearer and faster.
Q: Does my database support LATERAL?
PostgreSQL and MySQL 8+ use LATERAL; SQL Server and Oracle use CROSS APPLY / OUTER APPLY. SQLite has neither — there you'd fall back to a correlated subquery or a window function.
Mini-Challenge: High-Value Customers
Support is faded now — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a playground to confirm.
🎯 Mini-Challenge
Semi-join: customers with an order over 40, each once.
-- 🎯 MINI-CHALLENGE — a semi-join with a twist.
-- Using ONLY this lesson's ideas (EXISTS and a condition in the subquery):
-- List every customer who has placed an order worth MORE THAN 40.
-- Each customer should appear at most once.
--
-- ✅ Expected: 3 rows — Ava (order 50), Ben (order 80), Eve (order 60).
-- Cara is excluded (her only order is 20). Dan is excluded (no orders).
-- your query hereResult — expected — 3 rows:
| id | name |
|---|---|
| 1 | Ava |
| 2 | Ben |
| 5 | Eve |
🎉 Lesson Complete
- ✅ Semi-join (
EXISTS/IN) keeps rows that have a match, without duplicating them - ✅ Anti-join (
NOT EXISTS/NOT IN) keeps rows with no match — find what's missing - ✅
NOT IN+ a singleNULLreturns zero rows; reach forNOT EXISTSinstead - ✅ Self-joins compare a table to itself;
o1.id < o2.idkills mirror duplicates - ✅ CROSS JOIN multiplies rows — powerful for combinations, dangerous by accident
- ✅ LATERAL /
CROSS APPLYruns a subquery per outer row for clean top-N queries - ✅ Next: Window Functions — rank, total, and compare rows without collapsing them
Sign up for free to track which lessons you've completed and get learning reminders.