Advanced Track • Analytical & BI SQL

Analytical SQL for Business Intelligence

By the end of this lesson you'll be able to turn a flat table into a real BI report: multi-level totals with subtotals and a grand total, percentiles and median, quartile buckets, running totals, and month-over-month growth — the exact toolkit behind every dashboard.

What You'll Learn

✓Build multi-level reports with ROLLUP, CUBE and GROUPING SETS
✓Label subtotal & grand-total rows cleanly with GROUPING()
✓Compute median and percentiles with PERCENTILE_CONT / DISC
✓Bucket rows into quartiles and deciles with NTILE
✓Add running / cumulative totals with window functions
✓Measure period-over-period growth with LAG, and read cohort/funnel queries

Before you start: you should be comfortable with GROUP BY & aggregates and window functions — this lesson stacks new tools on top of both. The in-browser editor lets you write and edit SQL; to run it, copy your query into a free playground like db-fiddle.com (pick PostgreSQL — it supports GROUPING SETS, CUBE and PERCENTILE_CONT). Every example below shows the expected result so you can check yourself.

Dialect note: GROUPING SETS, ROLLUP, CUBE and PERCENTILE_CONT … WITHIN GROUP are standard SQL and work in PostgreSQL, SQL Server, Oracle and Snowflake. SQLite and older MySQL don't support all of them, so use a Postgres playground for this lesson.

Our Sample Table: `sales`

Every query in this lesson runs against this tiny sales table. It's deliberately small so you can add the numbers in your head and confirm each report by hand — that's the fastest way to trust what a subtotal or percentile is doing.

Result:

region	month	amount
North	Jan	100
North	Feb	200
North	Mar	300
South	Jan	400
South	Feb	500
South	Mar	600

Handy totals to memorise: North = 600, South = 1500, Grand total = 2100. By month: Jan 500, Feb 700, Mar 900.

1. From GROUP BY to Multi-Level Reports (ROLLUP)

A plain GROUP BY gives you exactly one level of summary — one row per region, say. A report, though, almost always wants more: a subtotal for each region and a grand total at the bottom. ROLLUP produces that whole hierarchy in a single query.

📊 Real-world analogy

ROLLUP is the Subtotal button in a spreadsheet. You group by region, and it inserts a subtotal line after each region plus one grand-total line at the very end — without you copying formulas around. GROUP BY ROLLUP (region, month) reads left to right: month rolls up into region, region rolls up into the grand total.

Plain GROUP BY (one level)

One total per region — the starting point.

Try it Yourself »

SQL

-- A plain GROUP BY gives you ONE level: a total per region.
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region;

-- That's fine, but a real report also wants a GRAND TOTAL row,
-- and often per-month totals too. Doing that by hand means
-- running three queries and gluing them together. There's a
-- better way: GROUPING SETS, ROLLUP and CUBE (next sections).

Result — 2 rows:

region	total
North	600
South	1500

Now add ROLLUP. You get the detail rows, a subtotal per region (where month is NULL), and one grand-total row (where both are NULL):

ROLLUP (region, month)

Detail rows + per-region subtotals + grand total.

Try it Yourself »

SQL

-- ROLLUP builds a HIERARCHY of subtotals from left to right:
--   (region, month)  -> every detail row
--   (region)         -> a subtotal per region   (month is NULL)
--   ()               -> one grand total          (both are NULL)
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, month)
ORDER BY region, month;

-- The NULLs in the region/month columns are not missing data —
-- they MARK the subtotal and grand-total rows. We'll label them
-- properly with GROUPING
...

Result — 9 rows — NULLs mark the subtotal and grand-total rows:

region	month	total
North	Jan	100
North	Feb	200
North	Mar	300
North	(NULL)	600
South	Jan	400
South	Feb	500
South	Mar	600
South	(NULL)	1500
(NULL)	(NULL)	2100

2. Reading Subtotal Rows with GROUPING()

Those NULLs are the part beginners misread. A NULL in a ROLLUP result does not mean "missing data" — it means "this column was rolled up; this row is a subtotal". The GROUPING(col) function makes that explicit: it returns 1 for a rolled-up (subtotal) column and 0 for a normal value, so you can swap in a readable label.

Sorting tip: put GROUPING(region), GROUPING(month) in your ORDER BY so each subtotal sits directly beneath the rows it sums and the grand total lands last — otherwise the NULL rows scatter unpredictably.

GROUPING() for readable labels

Turn NULL subtotal markers into 'All regions' / 'All months'.

Try it Yourself »

SQL

-- GROUPING(col) returns 1 when this row is a subtotal for that
-- column (the value was "rolled up"), and 0 for a normal value.
-- Use it to print readable labels instead of bare NULLs.
SELECT
    CASE WHEN GROUPING(region) = 1 THEN 'All regions' ELSE region END AS region,
    CASE WHEN GROUPING(month)  = 1 THEN 'All months'  ELSE month  END AS month,
    SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, month)
ORDER BY GROUPING(region), region, GROUPING(month), month;

-- ORDER BY GROUP
...

Result — same 9 rows, now labelled:

region	month	total
North	Jan	100
North	Feb	200
North	Mar	300
North	All months	600
South	Jan	400
South	Feb	500
South	Mar	600
South	All months	1500
All regions	All months	2100

Your Turn: build a ROLLUP report

Fill in the blanks to produce a per-region total plus a grand-total row. The expected result is in the comments so you can check yourself.

🎯 Your Turn: region subtotals + grand total

Replace the ___ blanks with ROLLUP and GROUPING.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — fill in the three blanks, then press "Try it Yourself"
-- Goal: a report with a total per region AND a grand total row.

SELECT region, SUM(amount) AS total
FROM sales
GROUP BY ___ (region)        -- 👉 keyword that adds the grand-total level
ORDER BY ___ (region),       -- 👉 puts the subtotal rows in order...
         region;             -- 👉 ...then region (NULL grand total sorts last)

-- Hint: replace the first ___ with ROLLUP, the second with GROUPING.
-- ✅ Expected resu
...

3. Pick Your Totals: GROUPING SETS & CUBE

ROLLUP always rolls up left to right. Sometimes you want totals that aren't a strict hierarchy — a region summary and a month summary side by side, with no detail rows. GROUPING SETS lets you list precisely the combinations you want; the empty set () is the grand total.

CUBE goes the other way and gives you every combination at once — detail, per-region, per-month, and grand total — which is ideal when a dashboard slices the same numbers in any direction.

GROUPING SETS — choose exactly

Region summary + month summary + grand total, no detail.

Try it Yourself »

SQL

-- GROUPING SETS lets you pick EXACTLY which totals you want —
-- no detail rows, just a region summary AND a month summary
-- AND a grand total, all in one pass over the table.
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY GROUPING SETS ((region), (month), ())
ORDER BY GROUPING(region), region, GROUPING(month), month;

-- () is the empty grouping set — that's the grand total.
-- ROLLUP(region, month) is just shorthand for the grouping sets
-- (region, month), (region) and ().

Result — 6 rows — two summaries plus the grand total:

region	month	total
North	(NULL)	600
South	(NULL)	1500
(NULL)	Jan	500
(NULL)	Feb	700
(NULL)	Mar	900
(NULL)	(NULL)	2100

CUBE on the same two columns adds the detail rows back on top of those summaries — every cross-section of the data in one result:

CUBE — every combination

Detail + region subtotals + month subtotals + grand total.

Try it Yourself »

SQL

-- CUBE gives EVERY combination: detail rows, a subtotal per
-- region, a subtotal per month, AND the grand total. Use it when
-- a dashboard needs to slice the same numbers any direction.
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY CUBE (region, month)
ORDER BY GROUPING(region), region, GROUPING(month), month;

-- CUBE(a, b) = grouping sets (a,b), (a), (b), ().
-- It can explode in size: n columns produce 2^n grouping sets,
-- so reach for ROLLUP unless you truly need every c
...

Result — 13 rows — detail, both subtotal axes, and the grand total:

region	month	total
North	Jan	100
North	Feb	200
North	Mar	300
North	(NULL)	600
South	Jan	400
South	Feb	500
South	Mar	600
South	(NULL)	1500
(NULL)	Jan	500
(NULL)	Feb	700
(NULL)	Mar	900
(NULL)	(NULL)	2100

(That's 12 of the 13 rows; the table above lists every grouping. CUBE of n columns produces 2^n grouping sets, so it grows fast — prefer ROLLUP unless you genuinely need every axis.)

4. Percentiles & Median (PERCENTILE_CONT / DISC)

An average is a single number that outliers can drag around; percentiles describe the whole spread. The median (the 50th percentile) is the middle value — half the data sits below it. In SQL these are ordered-set functions: you write PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount), where WITHIN GROUP names the column to rank.

PERCENTILE_CONT ("continuous") interpolates between rows, so it can return a value that isn't in the data — the true statistical median. PERCENTILE_DISC ("discrete") returns an actual value from a row. With our six amounts the continuous median is 350 (halfway between 300 and 400) while the discrete median is 300.

Mean vs. median vs. p90

PERCENTILE_CONT and PERCENTILE_DISC with WITHIN GROUP.

Try it Yourself »

SQL

-- Averages hide outliers; PERCENTILES describe the whole spread.
-- WITHIN GROUP (ORDER BY ...) tells the function which column to
-- rank. PERCENTILE_CONT interpolates between rows (a "true"
-- median); PERCENTILE_DISC returns an actual value from the data.
SELECT
    ROUND(AVG(amount), 2)                                            AS mean,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount)             AS median_cont,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount)             AS 
...

Result — 1 row:

mean	median_cont	median_disc	p90
350	350	300	550

5. Buckets & Quartiles with NTILE

NTILE(n) is a window function that sorts your rows and slices them into n equally-sized buckets, numbered 1..n. NTILE(4) gives quartiles, NTILE(10) deciles, NTILE(100) percentile bands — perfect for "which tier does this customer fall in?" segmentation.

Uneven counts: when rows don't divide evenly, NTILE makes the earlier buckets one row larger. 7 rows into NTILE(2) gives buckets of 4 and 3, not 3 and 4.

NTILE(2) — split into halves

Bucket the six amounts into a lower and upper half.

Try it Yourself »

SQL

-- NTILE(n) splits ordered rows into n equal-sized buckets.
-- NTILE(4) = quartiles, NTILE(10) = deciles, NTILE(100) = percentiles.
SELECT region, month, amount,
       NTILE(2) OVER (ORDER BY amount) AS half      -- 1 = lower half, 2 = upper half
FROM sales
ORDER BY amount;

-- 6 rows sorted by amount split 3 + 3: the smallest three amounts
-- (100,200,300) land in bucket 1, the largest three (400,500,600)
-- in bucket 2. NTILE handles uneven counts by making the early
-- buckets one row bigger
...

Result — 6 rows — smallest three in bucket 1, largest three in bucket 2:

region	month	amount	half
North	Jan	100	1
North	Feb	200	1
North	Mar	300	1
South	Jan	400	2
South	Feb	500	2
South	Mar	600	2

Your Turn: buckets & median

Two blanks: split the amounts into 3 buckets and show the overall median on every row. The expected result is in the comments.

🎯 Your Turn: NTILE + median

Replace the ___ blanks with NTILE and WITHIN.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself"
-- Goal: tag each sale as 'Low', 'Mid' or 'High' by splitting the
-- amounts into 3 equal buckets, and also show the median amount.

SELECT region, month, amount,
       ___(3) OVER (ORDER BY amount) AS bucket,           -- 👉 the bucketing window function
       PERCENTILE_CONT(0.5) ___ GROUP (ORDER BY amount)   -- 👉 the clause percentiles need
           OVER () AS overall_median
FROM sales
ORDER BY amount;

-- Hint: first
...

6. Running & Cumulative Totals

A running total accumulates a value from the first row down to the current one — month-to-date revenue, a year-to-date count, a balance. You get it from a window SUM with an ORDER BY and an explicit frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means "everything so far, including this row".

Running total of monthly revenue

Accumulate a clean monthly series with a window SUM.

Try it Yourself »

SQL

-- RUNNING / CUMULATIVE totals: add up everything from the start of
-- the window down to the current row. Build a clean monthly series
-- first (a subquery), then accumulate it with a window SUM.
SELECT month, revenue,
    SUM(revenue) OVER (ORDER BY month
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS running_total
FROM (
    SELECT month, SUM(amount) AS revenue
    FROM sales
    GROUP BY month
) monthly
ORDER BY month;

-- Monthly revenue is Jan 500, Feb 
...

Result — 3 rows — the running total ends at the grand total:

month	revenue	running_total
Jan	500	500
Feb	700	1200
Mar	900	2100

7. Period-over-Period Growth (LAG)

To compare a period with the one before it, LAG(value, 1) reaches back one row in the ordered window. Subtract to get the change; divide by the previous value for a growth percentage. The first row has nothing before it, so LAG returns NULL there — and NULLIF(prev, 0) keeps your percentage from dividing by zero.

Want year-over-year instead of month-over-month? Use LAG(revenue, 12) to reach back twelve rows. LEAD is the mirror image — it looks forward to the next period.

Month-over-month change & %

LAG to compare each month with the previous one.

Try it Yourself »

SQL

-- PERIOD-OVER-PERIOD: compare each month with the one before it.
-- LAG(x, 1) reaches back one row; the first row has no previous
-- value, so it returns NULL (here we'd show '-').
SELECT month, revenue,
    LAG(revenue, 1) OVER (ORDER BY month)                AS prev_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month)      AS change,
    ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
          / NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100, 1)
        AS pct_change
FR
...

Result — 3 rows — Jan has no previous month:

month	revenue	prev_month	change	pct_change
Jan	500	(NULL)	(NULL)	(NULL)
Feb	700	500	200	40
Mar	900	700	200	28.6

8. Cohort & Funnel Queries (the big picture)

Cohort and funnel analyses sound advanced, but they're built entirely from the tools you just learned — grouped aggregates, CASE WHEN tagging, window functions, and NULLIF for safe division. The query below shows the shape of each (against imagined event tables) so you can recognise the pattern when you meet it.

Funnel — one CASE per stage, summed, then the ratio between consecutive stages tells you where users drop off.
Cohort — group customers by their first month, then count how many are still active N months later; divide by the cohort's starting size (FIRST_VALUE) for a retention %.

Funnel & cohort patterns

The recognisable shape — built from this lesson's tools.

Try it Yourself »

SQL

-- COHORT & FUNNEL queries reuse everything above — they are just
-- grouped aggregates with safe division. Two patterns to recognise:

-- 1) FUNNEL: one CASE per stage, summed, then ratios between stages.
--    (Imagine an 'events' table; here is the SHAPE of the query.)
SELECT
    COUNT(*)                                          AS sessions,
    SUM(CASE WHEN stage >= 1 THEN 1 ELSE 0 END)       AS viewed,
    SUM(CASE WHEN stage >= 2 THEN 1 ELSE 0 END)       AS added_to_cart,
    SUM(CASE WHE
...

Common Errors (and the fix)

Misreading NULL subtotal rows: a NULL in a ROLLUP/CUBE result marks a subtotal, not missing data. Don't filter them with WHERE region IS NOT NULL — that deletes your totals. Label them with GROUPING(region) = 1 instead.
Mixing detail and aggregate columns: "column 'month' must appear in the GROUP BY clause or be used in an aggregate" — every non-aggregated column you SELECT must be in the grouping. With grouping sets, only the columns being grouped on that row have a real value.
Percentile syntax: PERCENTILE_CONT(0.5) alone errors — it needs WITHIN GROUP (ORDER BY col). Writing PERCENTILE_CONT(amount) (passing the column as the argument) is the classic mistake; the argument is the fraction 0..1, the column goes in WITHIN GROUP.
Running total without a frame: SUM(x) OVER (ORDER BY month) uses the default RANGE frame, which lumps tied rows together. For a precise row-by-row accumulation, spell out ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Divide-by-zero in growth %: a missing or zero previous value throws division by zero. Wrap the denominator in NULLIF(prev, 0) so the result is NULL rather than an error.

📘 Quick Reference

Syntax	Purpose
GROUP BY ROLLUP (a, b)	Detail + subtotals per a + grand total
GROUP BY CUBE (a, b)	Every combination (2^n grouping sets)
GROUP BY GROUPING SETS ((a),(b),())	Exactly the totals you list
GROUPING(col)	1 if this row is a subtotal for col, else 0
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x)	Interpolated median / percentile
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY x)	Median as an actual data value
NTILE(4) OVER (ORDER BY x)	Split rows into 4 equal buckets (quartiles)
SUM(x) OVER (ORDER BY t ROWS UNBOUNDED PRECEDING)	Running / cumulative total
LAG(x, 1) OVER (ORDER BY t)	Value from the previous period (12 = a year)
NULLIF(prev, 0)	Guard a denominator against divide-by-zero

Frequently Asked Questions

Q: When should I use ROLLUP vs. GROUPING SETS vs. CUBE?

Use ROLLUP for a natural hierarchy (region > month > grand total). Use GROUPING SETS when you want a specific list of summaries that isn't a strict hierarchy. Use CUBE only when you truly need every cross-section — it can produce a lot of rows.

Q: Why is my median a value that isn't in the table?

That's PERCENTILE_CONT doing its job — it interpolates between the two middle rows. If you need a real row value, use PERCENTILE_DISC instead.

Q: What's the difference between NTILE and a percentile?

NTILE(n) labels each row with which of n equal-sized buckets it falls in. PERCENTILE_CONT returns the boundary value at a given fraction. Buckets vs. cut-points — related, but answering different questions.

Q: My ROLLUP totals vanished after I added a WHERE clause — why?

You probably filtered on region IS NOT NULL, which removes the subtotal/grand-total rows (their grouped columns are NULL by design). Use GROUPING(region) = 1 to identify totals, and keep filters on the raw data columns only.

Mini-Challenge: Region Share Dashboard

Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a Postgres playground to confirm.

🎯 Mini-Challenge

Per-region total + grand total + each region's % of sales.

Try it Yourself »

SQL

-- 🎯 MINI-CHALLENGE — a mini sales dashboard
-- Using ONLY what this lesson covered, write ONE query that returns
-- a per-region report with a grand-total row AND, for each region,
-- its share of total sales as a percentage.
--   1. GROUP BY ROLLUP (region)
--   2. SUM(amount) AS total
--   3. ROUND(SUM(amount) * 100.0 / SUM(SUM(amount)) OVER (), 1) AS pct_of_total
--   4. Label the rolled-up row with GROUPING(region) (see section 2)
--   5. ORDER BY GROUPING(region), region
--
-- ✅ Expected 
...

🎉 Lesson Complete

✅ ROLLUP, CUBE and GROUPING SETS build multi-level reports with subtotals in one query
✅ GROUPING() tells a subtotal row apart from a missing value
✅ PERCENTILE_CONT/DISC … WITHIN GROUP give you median and percentiles
✅ NTILE(n) buckets rows into quartiles, deciles, or any tier
✅ Window SUM makes running totals; LAG makes period-over-period growth
✅ Cohort and funnel queries are just these tools combined with safe division
✅ Next: Query Profiling — make these analytical queries fast at scale

Analytical SQL for Business Intelligence

What You'll Learn

Our Sample Table: sales

1. From GROUP BY to Multi-Level Reports (ROLLUP)

Plain GROUP BY (one level)

ROLLUP (region, month)

2. Reading Subtotal Rows with GROUPING()

GROUPING() for readable labels

Your Turn: build a ROLLUP report

🎯 Your Turn: region subtotals + grand total

3. Pick Your Totals: GROUPING SETS & CUBE

GROUPING SETS — choose exactly

CUBE — every combination

4. Percentiles & Median (PERCENTILE_CONT / DISC)

Mean vs. median vs. p90

5. Buckets & Quartiles with NTILE

NTILE(2) — split into halves

Your Turn: buckets & median

🎯 Your Turn: NTILE + median

6. Running & Cumulative Totals

Running total of monthly revenue

7. Period-over-Period Growth (LAG)

Month-over-month change & %

8. Cohort & Funnel Queries (the big picture)

Funnel & cohort patterns

Common Errors (and the fix)

📘 Quick Reference

Frequently Asked Questions

Mini-Challenge: Region Share Dashboard

🎯 Mini-Challenge

🎉 Lesson Complete

Cookie & Privacy Settings

Our Sample Table: `sales`