Skip to main content

    Advanced Track

    Advanced JOIN Patterns

    By the end of this lesson you'll reach beyond INNER and LEFT JOIN to answer the questions they can't: "who has an order?", "who has no order?", "what's each customer's biggest order?". You'll master semi-joins, anti-joins, self-joins, CROSS JOIN, and LATERAL — and the NULL trap that silently breaks NOT IN.

    What You'll Learn

    • Semi-joins: keep rows that HAVE a match (EXISTS / IN)
    • Anti-joins: keep rows with NO match (NOT EXISTS / NOT IN)
    • Why NOT IN + a NULL silently returns zero rows
    • Self-joins: compare a table against itself
    • CROSS JOIN and the Cartesian explosion to avoid
    • LATERAL / CROSS APPLY: a subquery per outer row

    Our Sample Tables: customers & orders

    Every query in this lesson runs against these two small tables. They share one link: orders.customer_id points back to customers.id. Notice Dan has no row in orders — that "missing" customer is the star of the anti-join section.

    Result — customers (5 rows):

    idname
    1Ava
    2Ben
    3Cara
    4Dan
    5Eve

    Result — orders (5 rows) — Dan (id 4) is absent:

    idcustomer_idamount
    101150
    102130
    103280
    104320
    105560

    So Ava (1) has two orders, Ben (2), Cara (3) and Eve (5) have one each, and Dan (4) has none.

    1. Semi-Joins — "Does a Match Exist?"

    A semi-join keeps rows from the left table that have at least one match on the right — but it never adds the right table's columns and never duplicates a row. You're asking "is this customer on the orders list?", not "show me every order".

    📋 Real-world analogy

    A semi-join is checking names off a guest list against the RSVP pile. You only care whether someone RSVP'd — not how many cards they sent. The guest appears once, ticked or not.

    Semi-join with EXISTS

    Customers who have placed an order — no duplicates.

    Try it Yourself »
    SQL
    -- SEMI-JOIN: keep rows in A that HAVE at least one match in B.
    -- "Which customers have placed at least one order?"
    
    -- EXISTS runs the inner query for each customer and stops
    -- at the FIRST matching order — it never multiplies rows.
    SELECT c.id, c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1                       -- the 1 is a placeholder; EXISTS
        FROM orders o                  -- only cares IF a row comes back,
        WHERE o.customer_id = c.id     -- not WHAT it contains
    );
    
    -- Result:
    ...

    Result — 4 rows — Dan excluded:

    idname
    1Ava
    2Ben
    3Cara
    5Eve

    The same answer reads a little shorter with IN, which checks whether each customer's id appears among the order rows:

    Semi-join with IN

    Same result, expressed with IN.

    Try it Yourself »
    SQL
    -- Same answer with IN: keep customers whose id appears
    -- in the list of customer_ids that have orders.
    SELECT id, name
    FROM customers
    WHERE id IN (SELECT customer_id FROM orders);
    
    -- IN builds the whole list first; EXISTS can short-circuit.
    -- On big tables EXISTS is usually the faster of the two.

    Result — 4 rows — identical to EXISTS:

    idname
    1Ava
    2Ben
    3Cara
    5Eve

    Your Turn: complete the semi-join

    Fill in the one blank with the keyword that means "a match exists". The expected result is in the comments so you can check yourself.

    🎯 Your Turn: customers WITH orders

    Add the keyword that keeps rows that have a match.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — complete the semi-join, then press "Try it Yourself".
    -- Goal: list every customer who HAS placed at least one order.
    
    SELECT c.id, c.name
    FROM customers c
    WHERE ___ (                        -- 👉 the keyword that means "a match exists"
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.id
    );
    
    -- ✅ Expected: 4 rows — Ava, Ben, Cara, Eve (Dan has no orders).

    2. Anti-Joins — "Find What's Missing"

    An anti-join is the mirror image: keep the left-table rows that have no match on the right. This is how you answer "which customers have never ordered?", "which products never sold?", "which users never logged in?". The cleanest tool is WHERE NOT EXISTS (...).

    Back to the guest list: an anti-join finds everyone who didn't RSVP — the names with an empty spot in the reply pile. In our data that's exactly one person: Dan.

    Anti-join with NOT EXISTS

    Customers who have never ordered.

    Try it Yourself »
    SQL
    -- ANTI-JOIN: keep rows in A that have NO match in B.
    -- "Which customers have NEVER placed an order?"
    
    -- NOT EXISTS is the safe, fast, NULL-proof way to do this.
    SELECT c.id, c.name
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.id
    );
    
    -- Result: just Dan (id 4) — the only customer with zero orders.

    Result — 1 row:

    idname
    4Dan

    You can write it with NOT IN too — but only if you guard against NULLs in the subquery:

    Anti-join with NOT IN (guarded)

    Same result — note the IS NOT NULL guard.

    Try it Yourself »
    SQL
    -- The SAME idea with NOT IN — but watch the NULL trap below.
    SELECT id, name
    FROM customers
    WHERE id NOT IN (
        SELECT customer_id
        FROM orders
        WHERE customer_id IS NOT NULL   -- ⚠️ CRITICAL guard, see why next
    );
    
    -- Result: Dan (id 4). Identical to NOT EXISTS — WHEN the guard is present.

    Result — 1 row — same as NOT EXISTS:

    idname
    4Dan

    ⚠️ Why NOT IN + NULL returns nothing

    The silent bug that empties your result.

    Try it Yourself »
    SQL
    -- ⚠️ THE NULL TRAP — why NOT EXISTS is the safer default.
    -- Imagine ONE order row has a NULL customer_id (an unassigned order):
    
    SELECT id, name
    FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders);  -- no NULL guard!
    
    -- Result: 0 rows. Nothing. Empty.
    --
    -- Why? "id NOT IN (1, 2, 3, NULL)" becomes
    -- "id <> 1 AND id <> 2 AND id <> 3 AND id <> NULL".
    -- "id <> NULL" is UNKNOWN (never TRUE), so the whole row is rejected.
    -- One stray NULL silently wipes out every result.
    -- NOT EXIS
    ...

    Result — 0 rows (!) — a single NULL wiped the result:

    idname
    (empty)(empty)

    Your Turn: complete the anti-join

    Fill in the blank with the two words that mean "no match exists", so the query returns customers with no orders.

    🎯 Your Turn: customers with NO orders

    Use NOT EXISTS to find the unmatched customer.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — complete the anti-join, then press "Try it Yourself".
    -- Goal: find every customer who has NO orders at all.
    
    SELECT c.id, c.name
    FROM customers c
    WHERE ___ EXISTS (                 -- 👉 the two words that mean "no match exists"
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.id
    );
    
    -- ✅ Expected: 1 row — Dan (id 4), the customer with zero orders.

    3. Self-Joins — A Table Meets Itself

    A self-join joins a table to a second copy of itself using two aliases. It's how you compare rows within one table — find pairs, build employee→manager hierarchies, or compare consecutive rows. Here we'll find customers who placed more than one order by pairing each order with another order from the same customer.

    Self-join: customers with 2+ orders

    Pair each order with another from the same customer.

    Try it Yourself »
    SQL
    -- SELF-JOIN: join a table to ITSELF using two aliases.
    -- "Which customers placed more than one order, and which pair?"
    
    -- We give 'orders' two names (o1, o2) so SQL treats it as two tables.
    SELECT o1.customer_id,
           o1.id AS first_order,
           o2.id AS second_order
    FROM orders o1
    JOIN orders o2
        ON o1.customer_id = o2.customer_id   -- same customer...
        AND o1.id < o2.id;                   -- ...but two DIFFERENT orders
    
    -- The "o1.id < o2.id" does two jobs:
    --   • removes a row pai
    ...

    Result — 1 row — only Ava (customer 1) has two orders:

    customer_idfirst_ordersecond_order
    1101102

    4. CROSS JOIN — Cartesian Products

    A CROSS JOIN pairs every row of one table with every row of another — no ON clause. The output size is the two row counts multiplied. It's invaluable for generating every combination (sizes × colours, dates × products), but a missing join condition silently becomes one and explodes your result.

    5 customers × 5 orders = 25 rows. Harmless here — but 1,000 × 1,000 = 1,000,000 rows. Always know why you're cross-joining.

    CROSS JOIN — every combination

    Pair every customer with every order (5 × 5).

    Try it Yourself »
    SQL
    -- CROSS JOIN: pair EVERY row of A with EVERY row of B.
    -- Also called a Cartesian product. There is no ON clause.
    SELECT c.name, o.id AS order_id
    FROM customers c
    CROSS JOIN orders o;
    
    -- 5 customers × 5 orders = 25 rows (every name beside every order).
    -- ⚠️ This grows fast: 1,000 × 1,000 = 1,000,000 rows.
    -- Use CROSS JOIN deliberately — to build combinations or fill gaps,
    -- never by accident (a missing JOIN condition becomes a CROSS JOIN).

    Result — 25 rows total (5 × 5) — first few shown:

    nameorder_id
    Ava101
    Ava102
    Ava103
    Ava104
    Ava105
    Ben101

    5. LATERAL / CROSS APPLY — A Subquery Per Row

    A normal subquery in the FROM clause can't see the outer row. LATERAL removes that wall: it runs the subquery once for each outer row and lets it reference that row's columns — like a for-each loop. It's the cleanest way to get "the top N rows per group", here each customer's single biggest order.

    LATERAL — biggest order per customer

    Run a per-customer subquery and keep the top row.

    Try it Yourself »
    SQL
    -- LATERAL: run a subquery FOR EACH row of the outer table,
    -- and let that subquery SEE the outer row's columns.
    -- "Get each customer's single biggest order."
    
    SELECT c.name, top_order.amount
    FROM customers c
    CROSS JOIN LATERAL (
        SELECT o.amount
        FROM orders o
        WHERE o.customer_id = c.id     -- ← references the OUTER c.id (that's LATERAL)
        ORDER BY o.amount DESC
        LIMIT 1
    ) AS top_order;
    
    -- CROSS JOIN LATERAL drops customers whose subquery is empty,
    -- so Dan (no orders) does N
    ...

    Result — 4 rows — Dan dropped (no orders; CROSS JOIN LATERAL):

    nameamount
    Ava50
    Ben80
    Cara20
    Eve60

    Common Errors (and the fix)

    • NOT IN returns nothing: the subquery contained a NULL, which makes every comparison UNKNOWN and rejects all rows. Switch to NOT EXISTS, or add WHERE col IS NOT NULL inside the subquery.
    • Result has way too many rows: you wrote a CROSS JOIN, or forgot the ON condition on a regular join — which becomes a Cartesian product. Add the join condition that links the two tables.
    • Customers appear multiple times: you used JOIN orders when you only needed to know a match exists. A one-to-many join repeats the left row per match — use WHERE EXISTS (...) instead of joining and de-duplicating with DISTINCT.
    • "missing FROM-clause entry for table o" / unknown alias: in a self-join you must give each copy its own alias (orders o1, orders o2) and qualify every column (o1.id), or SQL can't tell the copies apart.
    • LATERAL subquery "cannot reference c.id": a plain FROM (subquery) can't see the outer row. Add the LATERAL keyword (or use CROSS APPLY on SQL Server) to allow the reference.

    📘 Quick Reference

    PatternSyntaxUse it to…
    Semi-joinWHERE EXISTS (SELECT 1 ...)Keep rows that HAVE a match (no dupes)
    Semi-joinWHERE id IN (SELECT ...)Same idea, list-membership style
    Anti-joinWHERE NOT EXISTS (SELECT 1 ...)Keep rows with NO match (NULL-safe)
    Anti-joinWHERE id NOT IN (SELECT ...)NO match — but guard against NULLs
    Cross joinA CROSS JOIN BEvery combination (rows multiply)
    LateralCROSS JOIN LATERAL (...)A subquery per outer row (top-N)
    Lateral (T-SQL)CROSS APPLY (...)SQL Server / Oracle equivalent

    Frequently Asked Questions

    Q: Why does SELECT 1 appear inside EXISTS?

    EXISTS only checks whether a row comes back, not what's in it, so the select list is irrelevant. 1 is a conventional placeholder; SELECT * would behave identically.

    Q: Should I always prefer EXISTS over IN?

    For correctness with NOT IN, yes — NOT EXISTS avoids the NULL trap. For positive checks, modern optimisers often run IN and EXISTS the same way; EXISTS tends to win on large or correlated subqueries.

    Q: When would I use a self-join instead of a window function?

    Self-joins shine for pairing rows or walking a hierarchy. For "rank within a group" or "compare to the previous row", window functions (the next lesson) are usually clearer and faster.

    Q: Does my database support LATERAL?

    PostgreSQL and MySQL 8+ use LATERAL; SQL Server and Oracle use CROSS APPLY / OUTER APPLY. SQLite has neither — there you'd fall back to a correlated subquery or a window function.

    Mini-Challenge: High-Value Customers

    Support is faded now — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a playground to confirm.

    🎯 Mini-Challenge

    Semi-join: customers with an order over 40, each once.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE — a semi-join with a twist.
    -- Using ONLY this lesson's ideas (EXISTS and a condition in the subquery):
    --   List every customer who has placed an order worth MORE THAN 40.
    --   Each customer should appear at most once.
    --
    -- ✅ Expected: 3 rows — Ava (order 50), Ben (order 80), Eve (order 60).
    --    Cara is excluded (her only order is 20). Dan is excluded (no orders).
    
    -- your query here

    Result — expected — 3 rows:

    idname
    1Ava
    2Ben
    5Eve

    🎉 Lesson Complete

    • Semi-join (EXISTS / IN) keeps rows that have a match, without duplicating them
    • Anti-join (NOT EXISTS / NOT IN) keeps rows with no match — find what's missing
    • NOT IN + a single NULL returns zero rows; reach for NOT EXISTS instead
    • Self-joins compare a table to itself; o1.id < o2.id kills mirror duplicates
    • CROSS JOIN multiplies rows — powerful for combinations, dangerous by accident
    • LATERAL / CROSS APPLY runs a subquery per outer row for clean top-N queries
    • Next: Window Functions — rank, total, and compare rows without collapsing them

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service