Skip to main content
    Back

    Advanced Track • Analytical & BI SQL

    Analytical SQL for Business Intelligence

    By the end of this lesson you'll be able to turn a flat table into a real BI report: multi-level totals with subtotals and a grand total, percentiles and median, quartile buckets, running totals, and month-over-month growth — the exact toolkit behind every dashboard.

    What You'll Learn

    • Build multi-level reports with ROLLUP, CUBE and GROUPING SETS
    • Label subtotal & grand-total rows cleanly with GROUPING()
    • Compute median and percentiles with PERCENTILE_CONT / DISC
    • Bucket rows into quartiles and deciles with NTILE
    • Add running / cumulative totals with window functions
    • Measure period-over-period growth with LAG, and read cohort/funnel queries

    Our Sample Table: sales

    Every query in this lesson runs against this tiny sales table. It's deliberately small so you can add the numbers in your head and confirm each report by hand — that's the fastest way to trust what a subtotal or percentile is doing.

    Result:

    regionmonthamount
    NorthJan100
    NorthFeb200
    NorthMar300
    SouthJan400
    SouthFeb500
    SouthMar600

    Handy totals to memorise: North = 600, South = 1500, Grand total = 2100. By month: Jan 500, Feb 700, Mar 900.

    1. From GROUP BY to Multi-Level Reports (ROLLUP)

    A plain GROUP BY gives you exactly one level of summary — one row per region, say. A report, though, almost always wants more: a subtotal for each region and a grand total at the bottom. ROLLUP produces that whole hierarchy in a single query.

    📊 Real-world analogy

    ROLLUP is the Subtotal button in a spreadsheet. You group by region, and it inserts a subtotal line after each region plus one grand-total line at the very end — without you copying formulas around. GROUP BY ROLLUP (region, month) reads left to right: month rolls up into region, region rolls up into the grand total.

    Plain GROUP BY (one level)

    One total per region — the starting point.

    Try it Yourself »
    SQL
    -- A plain GROUP BY gives you ONE level: a total per region.
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region;
    
    -- That's fine, but a real report also wants a GRAND TOTAL row,
    -- and often per-month totals too. Doing that by hand means
    -- running three queries and gluing them together. There's a
    -- better way: GROUPING SETS, ROLLUP and CUBE (next sections).

    Result — 2 rows:

    regiontotal
    North600
    South1500

    Now add ROLLUP. You get the detail rows, a subtotal per region (where month is NULL), and one grand-total row (where both are NULL):

    ROLLUP (region, month)

    Detail rows + per-region subtotals + grand total.

    Try it Yourself »
    SQL
    -- ROLLUP builds a HIERARCHY of subtotals from left to right:
    --   (region, month)  -> every detail row
    --   (region)         -> a subtotal per region   (month is NULL)
    --   ()               -> one grand total          (both are NULL)
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY ROLLUP (region, month)
    ORDER BY region, month;
    
    -- The NULLs in the region/month columns are not missing data —
    -- they MARK the subtotal and grand-total rows. We'll label them
    -- properly with GROUPING
    ...

    Result — 9 rows — NULLs mark the subtotal and grand-total rows:

    regionmonthtotal
    NorthJan100
    NorthFeb200
    NorthMar300
    North(NULL)600
    SouthJan400
    SouthFeb500
    SouthMar600
    South(NULL)1500
    (NULL)(NULL)2100

    2. Reading Subtotal Rows with GROUPING()

    Those NULLs are the part beginners misread. A NULL in a ROLLUP result does not mean "missing data" — it means "this column was rolled up; this row is a subtotal". The GROUPING(col) function makes that explicit: it returns 1 for a rolled-up (subtotal) column and 0 for a normal value, so you can swap in a readable label.

    GROUPING() for readable labels

    Turn NULL subtotal markers into 'All regions' / 'All months'.

    Try it Yourself »
    SQL
    -- GROUPING(col) returns 1 when this row is a subtotal for that
    -- column (the value was "rolled up"), and 0 for a normal value.
    -- Use it to print readable labels instead of bare NULLs.
    SELECT
        CASE WHEN GROUPING(region) = 1 THEN 'All regions' ELSE region END AS region,
        CASE WHEN GROUPING(month)  = 1 THEN 'All months'  ELSE month  END AS month,
        SUM(amount) AS total
    FROM sales
    GROUP BY ROLLUP (region, month)
    ORDER BY GROUPING(region), region, GROUPING(month), month;
    
    -- ORDER BY GROUP
    ...

    Result — same 9 rows, now labelled:

    regionmonthtotal
    NorthJan100
    NorthFeb200
    NorthMar300
    NorthAll months600
    SouthJan400
    SouthFeb500
    SouthMar600
    SouthAll months1500
    All regionsAll months2100

    Your Turn: build a ROLLUP report

    Fill in the blanks to produce a per-region total plus a grand-total row. The expected result is in the comments so you can check yourself.

    🎯 Your Turn: region subtotals + grand total

    Replace the ___ blanks with ROLLUP and GROUPING.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the three blanks, then press "Try it Yourself"
    -- Goal: a report with a total per region AND a grand total row.
    
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY ___ (region)        -- 👉 keyword that adds the grand-total level
    ORDER BY ___ (region),       -- 👉 puts the subtotal rows in order...
             region;             -- 👉 ...then region (NULL grand total sorts last)
    
    -- Hint: replace the first ___ with ROLLUP, the second with GROUPING.
    -- ✅ Expected resu
    ...

    3. Pick Your Totals: GROUPING SETS & CUBE

    ROLLUP always rolls up left to right. Sometimes you want totals that aren't a strict hierarchy — a region summary and a month summary side by side, with no detail rows. GROUPING SETS lets you list precisely the combinations you want; the empty set () is the grand total.

    CUBE goes the other way and gives you every combination at once — detail, per-region, per-month, and grand total — which is ideal when a dashboard slices the same numbers in any direction.

    GROUPING SETS — choose exactly

    Region summary + month summary + grand total, no detail.

    Try it Yourself »
    SQL
    -- GROUPING SETS lets you pick EXACTLY which totals you want —
    -- no detail rows, just a region summary AND a month summary
    -- AND a grand total, all in one pass over the table.
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY GROUPING SETS ((region), (month), ())
    ORDER BY GROUPING(region), region, GROUPING(month), month;
    
    -- () is the empty grouping set — that's the grand total.
    -- ROLLUP(region, month) is just shorthand for the grouping sets
    -- (region, month), (region) and ().

    Result — 6 rows — two summaries plus the grand total:

    regionmonthtotal
    North(NULL)600
    South(NULL)1500
    (NULL)Jan500
    (NULL)Feb700
    (NULL)Mar900
    (NULL)(NULL)2100

    CUBE on the same two columns adds the detail rows back on top of those summaries — every cross-section of the data in one result:

    CUBE — every combination

    Detail + region subtotals + month subtotals + grand total.

    Try it Yourself »
    SQL
    -- CUBE gives EVERY combination: detail rows, a subtotal per
    -- region, a subtotal per month, AND the grand total. Use it when
    -- a dashboard needs to slice the same numbers any direction.
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY CUBE (region, month)
    ORDER BY GROUPING(region), region, GROUPING(month), month;
    
    -- CUBE(a, b) = grouping sets (a,b), (a), (b), ().
    -- It can explode in size: n columns produce 2^n grouping sets,
    -- so reach for ROLLUP unless you truly need every c
    ...

    Result — 13 rows — detail, both subtotal axes, and the grand total:

    regionmonthtotal
    NorthJan100
    NorthFeb200
    NorthMar300
    North(NULL)600
    SouthJan400
    SouthFeb500
    SouthMar600
    South(NULL)1500
    (NULL)Jan500
    (NULL)Feb700
    (NULL)Mar900
    (NULL)(NULL)2100

    (That's 12 of the 13 rows; the table above lists every grouping. CUBE of n columns produces 2^n grouping sets, so it grows fast — prefer ROLLUP unless you genuinely need every axis.)

    4. Percentiles & Median (PERCENTILE_CONT / DISC)

    An average is a single number that outliers can drag around; percentiles describe the whole spread. The median (the 50th percentile) is the middle value — half the data sits below it. In SQL these are ordered-set functions: you write PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount), where WITHIN GROUP names the column to rank.

    PERCENTILE_CONT ("continuous") interpolates between rows, so it can return a value that isn't in the data — the true statistical median. PERCENTILE_DISC ("discrete") returns an actual value from a row. With our six amounts the continuous median is 350 (halfway between 300 and 400) while the discrete median is 300.

    Mean vs. median vs. p90

    PERCENTILE_CONT and PERCENTILE_DISC with WITHIN GROUP.

    Try it Yourself »
    SQL
    -- Averages hide outliers; PERCENTILES describe the whole spread.
    -- WITHIN GROUP (ORDER BY ...) tells the function which column to
    -- rank. PERCENTILE_CONT interpolates between rows (a "true"
    -- median); PERCENTILE_DISC returns an actual value from the data.
    SELECT
        ROUND(AVG(amount), 2)                                            AS mean,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount)             AS median_cont,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount)             AS 
    ...

    Result — 1 row:

    meanmedian_contmedian_discp90
    350350300550

    5. Buckets & Quartiles with NTILE

    NTILE(n) is a window function that sorts your rows and slices them into n equally-sized buckets, numbered 1..n. NTILE(4) gives quartiles, NTILE(10) deciles, NTILE(100) percentile bands — perfect for "which tier does this customer fall in?" segmentation.

    NTILE(2) — split into halves

    Bucket the six amounts into a lower and upper half.

    Try it Yourself »
    SQL
    -- NTILE(n) splits ordered rows into n equal-sized buckets.
    -- NTILE(4) = quartiles, NTILE(10) = deciles, NTILE(100) = percentiles.
    SELECT region, month, amount,
           NTILE(2) OVER (ORDER BY amount) AS half      -- 1 = lower half, 2 = upper half
    FROM sales
    ORDER BY amount;
    
    -- 6 rows sorted by amount split 3 + 3: the smallest three amounts
    -- (100,200,300) land in bucket 1, the largest three (400,500,600)
    -- in bucket 2. NTILE handles uneven counts by making the early
    -- buckets one row bigger
    ...

    Result — 6 rows — smallest three in bucket 1, largest three in bucket 2:

    regionmonthamounthalf
    NorthJan1001
    NorthFeb2001
    NorthMar3001
    SouthJan4002
    SouthFeb5002
    SouthMar6002

    Your Turn: buckets & median

    Two blanks: split the amounts into 3 buckets and show the overall median on every row. The expected result is in the comments.

    🎯 Your Turn: NTILE + median

    Replace the ___ blanks with NTILE and WITHIN.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself"
    -- Goal: tag each sale as 'Low', 'Mid' or 'High' by splitting the
    -- amounts into 3 equal buckets, and also show the median amount.
    
    SELECT region, month, amount,
           ___(3) OVER (ORDER BY amount) AS bucket,           -- 👉 the bucketing window function
           PERCENTILE_CONT(0.5) ___ GROUP (ORDER BY amount)   -- 👉 the clause percentiles need
               OVER () AS overall_median
    FROM sales
    ORDER BY amount;
    
    -- Hint: first
    ...

    6. Running & Cumulative Totals

    A running total accumulates a value from the first row down to the current one — month-to-date revenue, a year-to-date count, a balance. You get it from a window SUM with an ORDER BY and an explicit frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means "everything so far, including this row".

    Running total of monthly revenue

    Accumulate a clean monthly series with a window SUM.

    Try it Yourself »
    SQL
    -- RUNNING / CUMULATIVE totals: add up everything from the start of
    -- the window down to the current row. Build a clean monthly series
    -- first (a subquery), then accumulate it with a window SUM.
    SELECT month, revenue,
        SUM(revenue) OVER (ORDER BY month
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            AS running_total
    FROM (
        SELECT month, SUM(amount) AS revenue
        FROM sales
        GROUP BY month
    ) monthly
    ORDER BY month;
    
    -- Monthly revenue is Jan 500, Feb 
    ...

    Result — 3 rows — the running total ends at the grand total:

    monthrevenuerunning_total
    Jan500500
    Feb7001200
    Mar9002100

    7. Period-over-Period Growth (LAG)

    To compare a period with the one before it, LAG(value, 1) reaches back one row in the ordered window. Subtract to get the change; divide by the previous value for a growth percentage. The first row has nothing before it, so LAG returns NULL there — and NULLIF(prev, 0) keeps your percentage from dividing by zero.

    Want year-over-year instead of month-over-month? Use LAG(revenue, 12) to reach back twelve rows. LEAD is the mirror image — it looks forward to the next period.

    Month-over-month change & %

    LAG to compare each month with the previous one.

    Try it Yourself »
    SQL
    -- PERIOD-OVER-PERIOD: compare each month with the one before it.
    -- LAG(x, 1) reaches back one row; the first row has no previous
    -- value, so it returns NULL (here we'd show '-').
    SELECT month, revenue,
        LAG(revenue, 1) OVER (ORDER BY month)                AS prev_month,
        revenue - LAG(revenue, 1) OVER (ORDER BY month)      AS change,
        ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
              / NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100, 1)
            AS pct_change
    FR
    ...

    Result — 3 rows — Jan has no previous month:

    monthrevenueprev_monthchangepct_change
    Jan500(NULL)(NULL)(NULL)
    Feb70050020040
    Mar90070020028.6

    8. Cohort & Funnel Queries (the big picture)

    Cohort and funnel analyses sound advanced, but they're built entirely from the tools you just learned — grouped aggregates, CASE WHEN tagging, window functions, and NULLIF for safe division. The query below shows the shape of each (against imagined event tables) so you can recognise the pattern when you meet it.

    • Funnel — one CASE per stage, summed, then the ratio between consecutive stages tells you where users drop off.
    • Cohort — group customers by their first month, then count how many are still active N months later; divide by the cohort's starting size (FIRST_VALUE) for a retention %.

    Funnel & cohort patterns

    The recognisable shape — built from this lesson's tools.

    Try it Yourself »
    SQL
    -- COHORT & FUNNEL queries reuse everything above — they are just
    -- grouped aggregates with safe division. Two patterns to recognise:
    
    -- 1) FUNNEL: one CASE per stage, summed, then ratios between stages.
    --    (Imagine an 'events' table; here is the SHAPE of the query.)
    SELECT
        COUNT(*)                                          AS sessions,
        SUM(CASE WHEN stage >= 1 THEN 1 ELSE 0 END)       AS viewed,
        SUM(CASE WHEN stage >= 2 THEN 1 ELSE 0 END)       AS added_to_cart,
        SUM(CASE WHE
    ...

    Common Errors (and the fix)

    • Misreading NULL subtotal rows: a NULL in a ROLLUP/CUBE result marks a subtotal, not missing data. Don't filter them with WHERE region IS NOT NULL — that deletes your totals. Label them with GROUPING(region) = 1 instead.
    • Mixing detail and aggregate columns: "column 'month' must appear in the GROUP BY clause or be used in an aggregate" — every non-aggregated column you SELECT must be in the grouping. With grouping sets, only the columns being grouped on that row have a real value.
    • Percentile syntax: PERCENTILE_CONT(0.5) alone errors — it needs WITHIN GROUP (ORDER BY col). Writing PERCENTILE_CONT(amount) (passing the column as the argument) is the classic mistake; the argument is the fraction 0..1, the column goes in WITHIN GROUP.
    • Running total without a frame: SUM(x) OVER (ORDER BY month) uses the default RANGE frame, which lumps tied rows together. For a precise row-by-row accumulation, spell out ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    • Divide-by-zero in growth %: a missing or zero previous value throws division by zero. Wrap the denominator in NULLIF(prev, 0) so the result is NULL rather than an error.

    📘 Quick Reference

    SyntaxPurpose
    GROUP BY ROLLUP (a, b)Detail + subtotals per a + grand total
    GROUP BY CUBE (a, b)Every combination (2^n grouping sets)
    GROUP BY GROUPING SETS ((a),(b),())Exactly the totals you list
    GROUPING(col)1 if this row is a subtotal for col, else 0
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x)Interpolated median / percentile
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY x)Median as an actual data value
    NTILE(4) OVER (ORDER BY x)Split rows into 4 equal buckets (quartiles)
    SUM(x) OVER (ORDER BY t ROWS UNBOUNDED PRECEDING)Running / cumulative total
    LAG(x, 1) OVER (ORDER BY t)Value from the previous period (12 = a year)
    NULLIF(prev, 0)Guard a denominator against divide-by-zero

    Frequently Asked Questions

    Q: When should I use ROLLUP vs. GROUPING SETS vs. CUBE?

    Use ROLLUP for a natural hierarchy (region > month > grand total). Use GROUPING SETS when you want a specific list of summaries that isn't a strict hierarchy. Use CUBE only when you truly need every cross-section — it can produce a lot of rows.

    Q: Why is my median a value that isn't in the table?

    That's PERCENTILE_CONT doing its job — it interpolates between the two middle rows. If you need a real row value, use PERCENTILE_DISC instead.

    Q: What's the difference between NTILE and a percentile?

    NTILE(n) labels each row with which of n equal-sized buckets it falls in. PERCENTILE_CONT returns the boundary value at a given fraction. Buckets vs. cut-points — related, but answering different questions.

    Q: My ROLLUP totals vanished after I added a WHERE clause — why?

    You probably filtered on region IS NOT NULL, which removes the subtotal/grand-total rows (their grouped columns are NULL by design). Use GROUPING(region) = 1 to identify totals, and keep filters on the raw data columns only.

    Mini-Challenge: Region Share Dashboard

    Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a Postgres playground to confirm.

    🎯 Mini-Challenge

    Per-region total + grand total + each region's % of sales.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE — a mini sales dashboard
    -- Using ONLY what this lesson covered, write ONE query that returns
    -- a per-region report with a grand-total row AND, for each region,
    -- its share of total sales as a percentage.
    --   1. GROUP BY ROLLUP (region)
    --   2. SUM(amount) AS total
    --   3. ROUND(SUM(amount) * 100.0 / SUM(SUM(amount)) OVER (), 1) AS pct_of_total
    --   4. Label the rolled-up row with GROUPING(region) (see section 2)
    --   5. ORDER BY GROUPING(region), region
    --
    -- ✅ Expected 
    ...

    🎉 Lesson Complete

    • ROLLUP, CUBE and GROUPING SETS build multi-level reports with subtotals in one query
    • GROUPING() tells a subtotal row apart from a missing value
    • PERCENTILE_CONT/DISC … WITHIN GROUP give you median and percentiles
    • NTILE(n) buckets rows into quartiles, deciles, or any tier
    • ✅ Window SUM makes running totals; LAG makes period-over-period growth
    • ✅ Cohort and funnel queries are just these tools combined with safe division
    • Next: Query Profiling — make these analytical queries fast at scale

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service