Advanced Track • Analytical & BI SQL
Analytical SQL for Business Intelligence
By the end of this lesson you'll be able to turn a flat table into a real BI report: multi-level totals with subtotals and a grand total, percentiles and median, quartile buckets, running totals, and month-over-month growth — the exact toolkit behind every dashboard.
What You'll Learn
- ✓Build multi-level reports with ROLLUP, CUBE and GROUPING SETS
- ✓Label subtotal & grand-total rows cleanly with GROUPING()
- ✓Compute median and percentiles with PERCENTILE_CONT / DISC
- ✓Bucket rows into quartiles and deciles with NTILE
- ✓Add running / cumulative totals with window functions
- ✓Measure period-over-period growth with LAG, and read cohort/funnel queries
GROUPING SETS, CUBE and PERCENTILE_CONT). Every example below shows the expected result so you can check yourself.GROUPING SETS, ROLLUP, CUBE and PERCENTILE_CONT … WITHIN GROUP are standard SQL and work in PostgreSQL, SQL Server, Oracle and Snowflake. SQLite and older MySQL don't support all of them, so use a Postgres playground for this lesson. Our Sample Table: sales
Every query in this lesson runs against this tiny sales table. It's deliberately small so you can add the numbers in your head and confirm each report by hand — that's the fastest way to trust what a subtotal or percentile is doing.
Result:
| region | month | amount |
|---|---|---|
| North | Jan | 100 |
| North | Feb | 200 |
| North | Mar | 300 |
| South | Jan | 400 |
| South | Feb | 500 |
| South | Mar | 600 |
Handy totals to memorise: North = 600, South = 1500, Grand total = 2100. By month: Jan 500, Feb 700, Mar 900.
1. From GROUP BY to Multi-Level Reports (ROLLUP)
A plain GROUP BY gives you exactly one level of summary — one row per region, say. A report, though, almost always wants more: a subtotal for each region and a grand total at the bottom. ROLLUP produces that whole hierarchy in a single query.
📊 Real-world analogy
ROLLUP is the Subtotal button in a spreadsheet. You group by region, and it inserts a subtotal line after each region plus one grand-total line at the very end — without you copying formulas around. GROUP BY ROLLUP (region, month) reads left to right: month rolls up into region, region rolls up into the grand total.
Plain GROUP BY (one level)
One total per region — the starting point.
-- A plain GROUP BY gives you ONE level: a total per region.
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region;
-- That's fine, but a real report also wants a GRAND TOTAL row,
-- and often per-month totals too. Doing that by hand means
-- running three queries and gluing them together. There's a
-- better way: GROUPING SETS, ROLLUP and CUBE (next sections).Result — 2 rows:
| region | total |
|---|---|
| North | 600 |
| South | 1500 |
Now add ROLLUP. You get the detail rows, a subtotal per region (where month is NULL), and one grand-total row (where both are NULL):
ROLLUP (region, month)
Detail rows + per-region subtotals + grand total.
-- ROLLUP builds a HIERARCHY of subtotals from left to right:
-- (region, month) -> every detail row
-- (region) -> a subtotal per region (month is NULL)
-- () -> one grand total (both are NULL)
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, month)
ORDER BY region, month;
-- The NULLs in the region/month columns are not missing data —
-- they MARK the subtotal and grand-total rows. We'll label them
-- properly with GROUPING
...Result — 9 rows — NULLs mark the subtotal and grand-total rows:
| region | month | total |
|---|---|---|
| North | Jan | 100 |
| North | Feb | 200 |
| North | Mar | 300 |
| North | (NULL) | 600 |
| South | Jan | 400 |
| South | Feb | 500 |
| South | Mar | 600 |
| South | (NULL) | 1500 |
| (NULL) | (NULL) | 2100 |
2. Reading Subtotal Rows with GROUPING()
Those NULLs are the part beginners misread. A NULL in a ROLLUP result does not mean "missing data" — it means "this column was rolled up; this row is a subtotal". The GROUPING(col) function makes that explicit: it returns 1 for a rolled-up (subtotal) column and 0 for a normal value, so you can swap in a readable label.
GROUPING(region), GROUPING(month) in your ORDER BY so each subtotal sits directly beneath the rows it sums and the grand total lands last — otherwise the NULL rows scatter unpredictably.GROUPING() for readable labels
Turn NULL subtotal markers into 'All regions' / 'All months'.
-- GROUPING(col) returns 1 when this row is a subtotal for that
-- column (the value was "rolled up"), and 0 for a normal value.
-- Use it to print readable labels instead of bare NULLs.
SELECT
CASE WHEN GROUPING(region) = 1 THEN 'All regions' ELSE region END AS region,
CASE WHEN GROUPING(month) = 1 THEN 'All months' ELSE month END AS month,
SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, month)
ORDER BY GROUPING(region), region, GROUPING(month), month;
-- ORDER BY GROUP
...Result — same 9 rows, now labelled:
| region | month | total |
|---|---|---|
| North | Jan | 100 |
| North | Feb | 200 |
| North | Mar | 300 |
| North | All months | 600 |
| South | Jan | 400 |
| South | Feb | 500 |
| South | Mar | 600 |
| South | All months | 1500 |
| All regions | All months | 2100 |
Your Turn: build a ROLLUP report
Fill in the blanks to produce a per-region total plus a grand-total row. The expected result is in the comments so you can check yourself.
🎯 Your Turn: region subtotals + grand total
Replace the ___ blanks with ROLLUP and GROUPING.
-- 🎯 YOUR TURN — fill in the three blanks, then press "Try it Yourself"
-- Goal: a report with a total per region AND a grand total row.
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY ___ (region) -- 👉 keyword that adds the grand-total level
ORDER BY ___ (region), -- 👉 puts the subtotal rows in order...
region; -- 👉 ...then region (NULL grand total sorts last)
-- Hint: replace the first ___ with ROLLUP, the second with GROUPING.
-- ✅ Expected resu
...3. Pick Your Totals: GROUPING SETS & CUBE
ROLLUP always rolls up left to right. Sometimes you want totals that aren't a strict hierarchy — a region summary and a month summary side by side, with no detail rows. GROUPING SETS lets you list precisely the combinations you want; the empty set () is the grand total.
CUBE goes the other way and gives you every combination at once — detail, per-region, per-month, and grand total — which is ideal when a dashboard slices the same numbers in any direction.
GROUPING SETS — choose exactly
Region summary + month summary + grand total, no detail.
-- GROUPING SETS lets you pick EXACTLY which totals you want —
-- no detail rows, just a region summary AND a month summary
-- AND a grand total, all in one pass over the table.
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY GROUPING SETS ((region), (month), ())
ORDER BY GROUPING(region), region, GROUPING(month), month;
-- () is the empty grouping set — that's the grand total.
-- ROLLUP(region, month) is just shorthand for the grouping sets
-- (region, month), (region) and ().Result — 6 rows — two summaries plus the grand total:
| region | month | total |
|---|---|---|
| North | (NULL) | 600 |
| South | (NULL) | 1500 |
| (NULL) | Jan | 500 |
| (NULL) | Feb | 700 |
| (NULL) | Mar | 900 |
| (NULL) | (NULL) | 2100 |
CUBE on the same two columns adds the detail rows back on top of those summaries — every cross-section of the data in one result:
CUBE — every combination
Detail + region subtotals + month subtotals + grand total.
-- CUBE gives EVERY combination: detail rows, a subtotal per
-- region, a subtotal per month, AND the grand total. Use it when
-- a dashboard needs to slice the same numbers any direction.
SELECT region, month, SUM(amount) AS total
FROM sales
GROUP BY CUBE (region, month)
ORDER BY GROUPING(region), region, GROUPING(month), month;
-- CUBE(a, b) = grouping sets (a,b), (a), (b), ().
-- It can explode in size: n columns produce 2^n grouping sets,
-- so reach for ROLLUP unless you truly need every c
...Result — 13 rows — detail, both subtotal axes, and the grand total:
| region | month | total |
|---|---|---|
| North | Jan | 100 |
| North | Feb | 200 |
| North | Mar | 300 |
| North | (NULL) | 600 |
| South | Jan | 400 |
| South | Feb | 500 |
| South | Mar | 600 |
| South | (NULL) | 1500 |
| (NULL) | Jan | 500 |
| (NULL) | Feb | 700 |
| (NULL) | Mar | 900 |
| (NULL) | (NULL) | 2100 |
(That's 12 of the 13 rows; the table above lists every grouping. CUBE of n columns produces 2^n grouping sets, so it grows fast — prefer ROLLUP unless you genuinely need every axis.)
4. Percentiles & Median (PERCENTILE_CONT / DISC)
An average is a single number that outliers can drag around; percentiles describe the whole spread. The median (the 50th percentile) is the middle value — half the data sits below it. In SQL these are ordered-set functions: you write PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount), where WITHIN GROUP names the column to rank.
PERCENTILE_CONT ("continuous") interpolates between rows, so it can return a value that isn't in the data — the true statistical median. PERCENTILE_DISC ("discrete") returns an actual value from a row. With our six amounts the continuous median is 350 (halfway between 300 and 400) while the discrete median is 300.
Mean vs. median vs. p90
PERCENTILE_CONT and PERCENTILE_DISC with WITHIN GROUP.
-- Averages hide outliers; PERCENTILES describe the whole spread.
-- WITHIN GROUP (ORDER BY ...) tells the function which column to
-- rank. PERCENTILE_CONT interpolates between rows (a "true"
-- median); PERCENTILE_DISC returns an actual value from the data.
SELECT
ROUND(AVG(amount), 2) AS mean,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_cont,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS
...Result — 1 row:
| mean | median_cont | median_disc | p90 |
|---|---|---|---|
| 350 | 350 | 300 | 550 |
5. Buckets & Quartiles with NTILE
NTILE(n) is a window function that sorts your rows and slices them into n equally-sized buckets, numbered 1..n. NTILE(4) gives quartiles, NTILE(10) deciles, NTILE(100) percentile bands — perfect for "which tier does this customer fall in?" segmentation.
NTILE makes the earlier buckets one row larger. 7 rows into NTILE(2) gives buckets of 4 and 3, not 3 and 4.NTILE(2) — split into halves
Bucket the six amounts into a lower and upper half.
-- NTILE(n) splits ordered rows into n equal-sized buckets.
-- NTILE(4) = quartiles, NTILE(10) = deciles, NTILE(100) = percentiles.
SELECT region, month, amount,
NTILE(2) OVER (ORDER BY amount) AS half -- 1 = lower half, 2 = upper half
FROM sales
ORDER BY amount;
-- 6 rows sorted by amount split 3 + 3: the smallest three amounts
-- (100,200,300) land in bucket 1, the largest three (400,500,600)
-- in bucket 2. NTILE handles uneven counts by making the early
-- buckets one row bigger
...Result — 6 rows — smallest three in bucket 1, largest three in bucket 2:
| region | month | amount | half |
|---|---|---|---|
| North | Jan | 100 | 1 |
| North | Feb | 200 | 1 |
| North | Mar | 300 | 1 |
| South | Jan | 400 | 2 |
| South | Feb | 500 | 2 |
| South | Mar | 600 | 2 |
Your Turn: buckets & median
Two blanks: split the amounts into 3 buckets and show the overall median on every row. The expected result is in the comments.
🎯 Your Turn: NTILE + median
Replace the ___ blanks with NTILE and WITHIN.
-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself"
-- Goal: tag each sale as 'Low', 'Mid' or 'High' by splitting the
-- amounts into 3 equal buckets, and also show the median amount.
SELECT region, month, amount,
___(3) OVER (ORDER BY amount) AS bucket, -- 👉 the bucketing window function
PERCENTILE_CONT(0.5) ___ GROUP (ORDER BY amount) -- 👉 the clause percentiles need
OVER () AS overall_median
FROM sales
ORDER BY amount;
-- Hint: first
...6. Running & Cumulative Totals
A running total accumulates a value from the first row down to the current one — month-to-date revenue, a year-to-date count, a balance. You get it from a window SUM with an ORDER BY and an explicit frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means "everything so far, including this row".
Running total of monthly revenue
Accumulate a clean monthly series with a window SUM.
-- RUNNING / CUMULATIVE totals: add up everything from the start of
-- the window down to the current row. Build a clean monthly series
-- first (a subquery), then accumulate it with a window SUM.
SELECT month, revenue,
SUM(revenue) OVER (ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_total
FROM (
SELECT month, SUM(amount) AS revenue
FROM sales
GROUP BY month
) monthly
ORDER BY month;
-- Monthly revenue is Jan 500, Feb
...Result — 3 rows — the running total ends at the grand total:
| month | revenue | running_total |
|---|---|---|
| Jan | 500 | 500 |
| Feb | 700 | 1200 |
| Mar | 900 | 2100 |
7. Period-over-Period Growth (LAG)
To compare a period with the one before it, LAG(value, 1) reaches back one row in the ordered window. Subtract to get the change; divide by the previous value for a growth percentage. The first row has nothing before it, so LAG returns NULL there — and NULLIF(prev, 0) keeps your percentage from dividing by zero.
Want year-over-year instead of month-over-month? Use LAG(revenue, 12) to reach back twelve rows. LEAD is the mirror image — it looks forward to the next period.
Month-over-month change & %
LAG to compare each month with the previous one.
-- PERIOD-OVER-PERIOD: compare each month with the one before it.
-- LAG(x, 1) reaches back one row; the first row has no previous
-- value, so it returns NULL (here we'd show '-').
SELECT month, revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month))
/ NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0) * 100, 1)
AS pct_change
FR
...Result — 3 rows — Jan has no previous month:
| month | revenue | prev_month | change | pct_change |
|---|---|---|---|---|
| Jan | 500 | (NULL) | (NULL) | (NULL) |
| Feb | 700 | 500 | 200 | 40 |
| Mar | 900 | 700 | 200 | 28.6 |
8. Cohort & Funnel Queries (the big picture)
Cohort and funnel analyses sound advanced, but they're built entirely from the tools you just learned — grouped aggregates, CASE WHEN tagging, window functions, and NULLIF for safe division. The query below shows the shape of each (against imagined event tables) so you can recognise the pattern when you meet it.
- Funnel — one
CASEper stage, summed, then the ratio between consecutive stages tells you where users drop off. - Cohort — group customers by their first month, then count how many are still active N months later; divide by the cohort's starting size (
FIRST_VALUE) for a retention %.
Funnel & cohort patterns
The recognisable shape — built from this lesson's tools.
-- COHORT & FUNNEL queries reuse everything above — they are just
-- grouped aggregates with safe division. Two patterns to recognise:
-- 1) FUNNEL: one CASE per stage, summed, then ratios between stages.
-- (Imagine an 'events' table; here is the SHAPE of the query.)
SELECT
COUNT(*) AS sessions,
SUM(CASE WHEN stage >= 1 THEN 1 ELSE 0 END) AS viewed,
SUM(CASE WHEN stage >= 2 THEN 1 ELSE 0 END) AS added_to_cart,
SUM(CASE WHE
...Common Errors (and the fix)
- Misreading NULL subtotal rows: a
NULLin aROLLUP/CUBEresult marks a subtotal, not missing data. Don't filter them withWHERE region IS NOT NULL— that deletes your totals. Label them withGROUPING(region) = 1instead. - Mixing detail and aggregate columns:
"column 'month' must appear in the GROUP BY clause or be used in an aggregate"— every non-aggregated column youSELECTmust be in the grouping. With grouping sets, only the columns being grouped on that row have a real value. - Percentile syntax:
PERCENTILE_CONT(0.5)alone errors — it needsWITHIN GROUP (ORDER BY col). WritingPERCENTILE_CONT(amount)(passing the column as the argument) is the classic mistake; the argument is the fraction0..1, the column goes inWITHIN GROUP. - Running total without a frame:
SUM(x) OVER (ORDER BY month)uses the defaultRANGEframe, which lumps tied rows together. For a precise row-by-row accumulation, spell outROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. - Divide-by-zero in growth %: a missing or zero previous value throws
division by zero. Wrap the denominator inNULLIF(prev, 0)so the result isNULLrather than an error.
📘 Quick Reference
| Syntax | Purpose |
|---|---|
| GROUP BY ROLLUP (a, b) | Detail + subtotals per a + grand total |
| GROUP BY CUBE (a, b) | Every combination (2^n grouping sets) |
| GROUP BY GROUPING SETS ((a),(b),()) | Exactly the totals you list |
| GROUPING(col) | 1 if this row is a subtotal for col, else 0 |
| PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x) | Interpolated median / percentile |
| PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY x) | Median as an actual data value |
| NTILE(4) OVER (ORDER BY x) | Split rows into 4 equal buckets (quartiles) |
| SUM(x) OVER (ORDER BY t ROWS UNBOUNDED PRECEDING) | Running / cumulative total |
| LAG(x, 1) OVER (ORDER BY t) | Value from the previous period (12 = a year) |
| NULLIF(prev, 0) | Guard a denominator against divide-by-zero |
Frequently Asked Questions
Q: When should I use ROLLUP vs. GROUPING SETS vs. CUBE?
Use ROLLUP for a natural hierarchy (region > month > grand total). Use GROUPING SETS when you want a specific list of summaries that isn't a strict hierarchy. Use CUBE only when you truly need every cross-section — it can produce a lot of rows.
Q: Why is my median a value that isn't in the table?
That's PERCENTILE_CONT doing its job — it interpolates between the two middle rows. If you need a real row value, use PERCENTILE_DISC instead.
Q: What's the difference between NTILE and a percentile?
NTILE(n) labels each row with which of n equal-sized buckets it falls in. PERCENTILE_CONT returns the boundary value at a given fraction. Buckets vs. cut-points — related, but answering different questions.
Q: My ROLLUP totals vanished after I added a WHERE clause — why?
You probably filtered on region IS NOT NULL, which removes the subtotal/grand-total rows (their grouped columns are NULL by design). Use GROUPING(region) = 1 to identify totals, and keep filters on the raw data columns only.
Mini-Challenge: Region Share Dashboard
Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a Postgres playground to confirm.
🎯 Mini-Challenge
Per-region total + grand total + each region's % of sales.
-- 🎯 MINI-CHALLENGE — a mini sales dashboard
-- Using ONLY what this lesson covered, write ONE query that returns
-- a per-region report with a grand-total row AND, for each region,
-- its share of total sales as a percentage.
-- 1. GROUP BY ROLLUP (region)
-- 2. SUM(amount) AS total
-- 3. ROUND(SUM(amount) * 100.0 / SUM(SUM(amount)) OVER (), 1) AS pct_of_total
-- 4. Label the rolled-up row with GROUPING(region) (see section 2)
-- 5. ORDER BY GROUPING(region), region
--
-- ✅ Expected
...🎉 Lesson Complete
- ✅
ROLLUP,CUBEandGROUPING SETSbuild multi-level reports with subtotals in one query - ✅
GROUPING()tells a subtotal row apart from a missing value - ✅
PERCENTILE_CONT/DISC … WITHIN GROUPgive you median and percentiles - ✅
NTILE(n)buckets rows into quartiles, deciles, or any tier - ✅ Window
SUMmakes running totals;LAGmakes period-over-period growth - ✅ Cohort and funnel queries are just these tools combined with safe division
- ✅ Next: Query Profiling — make these analytical queries fast at scale
Sign up for free to track which lessons you've completed and get learning reminders.