SQL • Advanced Track
JSON & Semi-Structured Data
By the end of this lesson you'll be able to store flexible, differently-shaped data inside a single SQL column, then reach into nested fields, filter rows by what's inside the JSON, modify documents in place, and index them so those queries stay fast — in both PostgreSQL and MySQL.
What You'll Learn
- ✓Store semi-structured data with PostgreSQL json vs jsonb
- ✓Extract nested fields with -> ->> #> and #>>
- ✓Filter rows by values inside the JSON, with casting
- ✓Modify documents using jsonb_set, ||, and containment @>
- ✓Speed up JSON queries with GIN indexes
- ✓Query JSON in MySQL with JSON_EXTRACT, ->>, and JSON_TABLE
Our Sample Table: events
Every query in this lesson runs against this tiny events table. Each row's payload is a jsonb document — and notice the rows are not all the same shape. That is exactly what JSON columns are for.
Result:
| id | payload (jsonb) |
|---|---|
| 1 | {"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]} |
| 2 | {"type":"signup","customer":{"id":17,"tier":"free"},"source":"google_ads"} |
| 3 | {"type":"purchase","customer":{"id":42,"tier":"gold"},"total":29.50,"items":["cable"]} |
1. Storing Semi-Structured Data
Sometimes your data doesn't fit a neat grid. An event log might record purchases (with items and a total), signups (with a traffic source), and errors (with a stack trace) — each a different shape. Rather than inventing dozens of mostly-empty columns, you can store the whole thing as a JSON document in one column.
JSON ("JavaScript Object Notation") is the familiar {"key": value} text format. SQL treats the text between the quotes as a value to validate and store; from then on you query into it with special operators.
🗄️ Real-world analogy
Normal columns are a filing cabinet with a labelled drawer for every field — rigid, but instantly findable. A jsonb column is a labelled box you can drop any shaped folder into. Flexible, but you pay a little to find things inside it (that's what indexes fix).
Create a JSON column
A jsonb payload that holds a different shape per row.
-- Store a whole JSON document in ONE column.
-- "Semi-structured" = each row can hold a differently-shaped object.
-- In PostgreSQL, prefer jsonb over json (binary, indexable, dedupes keys).
CREATE TABLE events (
id SERIAL PRIMARY KEY,
payload JSONB NOT NULL -- the flexible bit lives here
);
INSERT INTO events (payload) VALUES
('{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]}'),
('{"type":"signup","customer":{"id":17,"tier
...2. PostgreSQL json vs jsonb
PostgreSQL gives you two JSON types and the difference matters. json stores the document as raw text — it keeps your exact spacing and key order, but must be re-parsed on every read and cannot be indexed for searching. jsonb stores a binary, decomposed form: a touch slower to write, far faster to query, it strips duplicate keys, and it supports GIN indexing.
jsonb. The only reason to choose json is if you must hand the document back byte-for-byte identical (whitespace and key order preserved) — which is rare.json vs jsonb
Same data, two storage behaviours.
-- json : kept as raw TEXT. Preserves exact whitespace & key order.
-- Re-parsed on every read. Cannot be indexed with GIN.
-- jsonb : decomposed BINARY. Slightly slower to write, much faster
-- to query, removes duplicate keys, and supports indexing.
-- Same data, two columns, two behaviours:
CREATE TABLE demo (
raw JSON, -- '{ "a": 1, "a": 2 }' stays exactly as typed
fast JSONB -- '{"a":2}' (whitespace gone, last key wins)
);
-- Rule of thumb: reach for js
...3. Extracting Fields: ->, ->>, #>, #>>
These four operators are the heart of working with JSON, and the single most important thing to learn is which ones return JSON and which return text:
->gets a field and keeps it as JSON (so you can chain another->onto it).->>gets a field and returns it as plain text.#>follows a whole path like{customer,id}and keeps it as JSON.#>>follows a path and returns text.
A field key uses 'quotes'; an array element uses its number, so payload -> 'items' ->> 0 is "the first item, as text".
The four extraction operators
Reach into nested objects and arrays.
-- The four extraction operators (PostgreSQL):
-- -> get a field, KEEP it as json/jsonb
-- ->> get a field, return it as TEXT
-- #> follow a PATH, KEEP it as json/jsonb
-- #>> follow a PATH, return it as TEXT
SELECT
payload -> 'customer' AS customer_json, -- {"id":42,"tier":"gold"}
payload ->> 'type' AS type_text, -- purchase
payload -> 'customer' ->> 'tier' AS tier, -- gold
payload #>> '{customer,id}' AS cust_id,
...Result:
| customer_json | type_text | tier | cust_id | first_item |
|---|---|---|---|---|
| {"id":42,"tier":"gold"} | purchase | gold | 42 | laptop |
Your Turn: extract a top-level field
Fill in the blanks to pull the top-level type field out of every event as text. The expected result is in the comments so you can check yourself.
🎯 Your Turn: get the type field
Replace the ___ blanks with the text operator and the key.
-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself".
-- Goal: pull the top-level "type" field out of every event AS plain text.
SELECT id, payload ___ '___' -- 👉 the TEXT operator, then the key name in quotes
FROM events;
-- ✅ Expected: 3 rows -> 1|purchase , 2|signup , 3|purchase
-- (Use ->> not -> : you want text, not a json string with quotes around it.)4. Filtering Rows by a JSON Value
You can use the same operators inside a WHERE clause to keep only the rows whose JSON matches. The catch: ->> always hands you text, so compare against a quoted string — or cast to a number/date when the type matters. Comparing '999.99' > '100' as text gives the wrong answer because text sorts character by character.
Filter by nested values
Text comparison vs casting to numeric.
-- Filtering rows by a value INSIDE the JSON.
-- Because ->> returns text, compare against a quoted string,
-- or cast both sides to the type you actually mean.
-- Every event whose nested customer tier is "gold":
SELECT id, payload ->> 'type' AS type
FROM events
WHERE payload -> 'customer' ->> 'tier' = 'gold'; -- rows 1 and 3
-- Numeric comparison needs a cast (text '999.99' < '30' otherwise!):
SELECT id
FROM events
WHERE (payload ->> 'total')::numeric > 100; -- row 1 onlyResult — tier = gold:
| id | type |
|---|---|
| 1 | purchase |
| 3 | purchase |
Your Turn: filter by a nested value
One concept this time — keep only the events placed by a gold-tier customer. The tier lives at customer.tier.
🎯 Your Turn: gold-tier events
Return rows where customer.tier equals gold.
-- 🎯 YOUR TURN — fill in the blank, then press "Try it Yourself".
-- Goal: return the id of every event whose nested customer tier is "gold".
SELECT id
FROM events
WHERE payload -> 'customer' ___ 'tier' = '___'; -- 👉 text operator, then the value
-- ✅ Expected: 2 rows -> 1 and 3
-- (Use ->> so the left side is text and matches the quoted 'gold'.)5. Containment @> and Modifying with jsonb_set
Spelling out a path for every condition gets tedious. The containment operator @> asks "does the left document contain this smaller JSON?" — it matches nested keys and array elements in one shot, and (unlike a chain of ->>) it can use a GIN index.
To change a document, jsonb_set(target, path, new_value) returns a modified copy. The path is a text array ({customer,tier} means customer → tier), and new_value must itself be JSON — so a string needs its quotes: '"platinum"'. To bulk-merge fields use ||; to drop a key use the - operator.
jsonb_set will not create missing intermediate objects. Setting {a,b,c} fails unless a and a.b already exist — build the parent first, or merge with ||.Containment @>
Match a JSON subset without writing the path.
-- @> "containment": does the left document CONTAIN this subset?
-- It matches nested keys/values without you spelling out the path,
-- and — crucially — it can use a GIN index (see section 6).
-- Purchases made by a gold-tier customer:
SELECT id FROM events
WHERE payload @> '{"type":"purchase","customer":{"tier":"gold"}}';
-- rows 1 and 3
-- Other handy jsonb predicates:
-- payload ? 'source' -- does the key "source" exist?
-- payload @> '{"items":["mouse"]}' -- a
...Result — purchases by a gold customer:
| id |
|---|
| 1 |
| 3 |
Modify with jsonb_set, || and -
Set, merge, and remove fields in place.
-- jsonb_set(target, path, new_value) returns a MODIFIED copy.
-- The path is a text[] array: '{a,b}' means a -> b.
-- new_value must itself be jsonb — note the quotes around a string.
-- Add / overwrite a top-level field:
UPDATE events
SET payload = jsonb_set(payload, '{status}', '"shipped"')
WHERE id = 1;
-- payload now also has "status":"shipped"
-- Update a NESTED field (customer.tier):
UPDATE events
SET payload = jsonb_set(payload, '{customer,tier}', '"platinum"')
WHERE id = 1;
-- Quick
...6. Indexing jsonb with GIN
Without an index, every @> or ? query reads the whole table. A GIN index (Generalized INverted index) catalogues the keys and values inside your documents so those lookups become fast.
A plain USING gin (payload) index powers the containment and key-existence operators. If you only ever use @>, the smaller jsonb_path_ops variant is faster. And when you constantly filter one extracted field by equality, a B-tree index on (payload ->> 'type') beats both. Use EXPLAIN ANALYZE to confirm the planner actually picks your index.
GIN and expression indexes
Make JSON queries fast.
-- jsonb with NO index = full table scan on every @> / ? query.
-- A GIN ("Generalized INverted") index fixes that.
-- 1) GIN over the whole column — powers @>, ?, ?|, ?&
CREATE INDEX idx_events_payload
ON events USING gin (payload);
-- This query can now use the index instead of scanning:
SELECT id FROM events
WHERE payload @> '{"customer":{"tier":"gold"}}';
-- 2) Smaller / faster index if you ONLY use @>:
CREATE INDEX idx_events_payload_path
ON events USING gin (payload jsonb_path_ops);
--
...7. JSON in MySQL
MySQL has a single JSON type that behaves much like PostgreSQL's jsonb — binary and validated. The operators differ: you extract with JSON_EXTRACT(doc, '$.path') or its shorthand -> (both return JSON), and ->> is the text version (shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))). Paths use the $.key / $.arr[0] syntax.
The standout feature is JSON_TABLE, which explodes a JSON array into ordinary rows you can join and aggregate — turning a nested array into a relational result set.
MySQL JSON_EXTRACT, ->> and JSON_TABLE
The same ideas in MySQL syntax.
-- MySQL has its own JSON type (binary, validated, like jsonb).
-- Same ideas, different spelling:
-- Extract with JSON_EXTRACT or the -> shorthand (returns JSON):
SELECT JSON_EXTRACT(payload, '$.type') AS type_json, -- "purchase"
payload -> '$.customer.tier' AS tier_json, -- "gold" (quoted)
payload ->> '$.customer.tier' AS tier_text -- gold (unquoted)
FROM events;
-- ->> is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)) — your text operator.
-- Filter by a
...8. JSON vs Normalised Columns
JSON columns are powerful, but they are not a replacement for good schema design. If you filter, sort, join, or group by a value all the time — or it needs a constraint or a real date/money type — promote it to a proper column. Keep JSON for the genuinely variable, rarely-queried parts. The best real-world tables are hybrid: hot fields as columns, the long tail as jsonb.
Hybrid schema design
When to use JSON vs a normal column.
-- JSON is flexible, but flexibility is not free.
-- Pull values OUT into real columns when they earn it.
-- ✅ Good fit for JSON / jsonb:
-- • shape varies row to row (event payloads, webhook bodies)
-- • third-party API responses you store verbatim
-- • sparse / rarely-queried "extra" attributes
-- ❌ Promote to a normal column when:
-- • you filter, sort, join, or GROUP BY the value often
-- • it needs a FK, UNIQUE, NOT NULL or CHECK constraint
-- • the type matters (dates, money)
...Common Errors (and the fix)
- Comparison gives weird results — you used
->(returns JSON) where you meant->>(returns text).payload -> 'tier'is the JSON string"gold"with quotes;payload ->> 'tier'is the bare textgoldyou can compare against'gold'. - "operator does not exist: text > integer" —
->>hands you text. Cast before doing maths or ranges:(payload ->> 'total')::numeric > 100. - Numbers sort wrong — comparing extracted values as text means
'999.99' < '30'(character order). Always cast numeric JSON fields with::numeric/::int. - Query is slow on a big table — no GIN index. Add
CREATE INDEX ... USING gin (payload);and re-check withEXPLAIN ANALYZE. - Reaching for JSON for everything — if you filter or join a field constantly, it belongs in a real column. Over-using JSON loses constraints, types, and index efficiency.
jsonb_set"cannot set path" — the parent object is missing.jsonb_setwon't create intermediate keys; create the parent first or merge with||.
📘 Quick Reference
| Syntax | Purpose |
|---|---|
| payload -> 'k' | Get field k, return JSON (chainable) |
| payload ->> 'k' | Get field k, return TEXT |
| payload #> '{a,b}' | Follow a path, return JSON |
| payload #>> '{a,b}' | Follow a path, return TEXT |
| payload -> 'arr' ->> 0 | Array element by index, as text |
| payload @> '{...}' | Contains this JSON subset (GIN-able) |
| payload ? 'k' | Does key k exist? |
| jsonb_set(p,'{a,b}',v) | Return p with a.b set to v |
| payload || '{...}'::jsonb | Merge / concatenate documents |
| payload - 'k' | Remove key k |
| USING gin (payload) | Index for @>, ?, ?|, ?& |
| JSON_EXTRACT(p,'$.k') | MySQL: extract (or p ->> '$.k') |
Frequently Asked Questions
Q: When should I use -> versus ->>?
Use ->> whenever you want a usable value — to display it, compare it, or cast it. Use -> only to dig deeper, because it keeps the result as JSON so you can chain another arrow onto it.
Q: Why is my number comparison wrong?
Extracted values come out as text, and text sorts character by character (so '9' looks bigger than '30'). Cast first: (payload ->> 'total')::numeric.
Q: json or jsonb?
jsonb almost always. It's indexable and far faster to query. Only pick json if you must return the document with its original whitespace and key order intact.
Q: Should I just store everything as JSON?
No. Anything you filter, join, or sort on a lot — or that needs a constraint or a real type — should be a normal column. Use JSON for the variable, rarely-queried extras and you get the best of both worlds.
Mini-Challenge: High-Value Purchases
Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a playground to confirm.
🎯 Mini-Challenge
cust_id + total for purchases over 100, with casting.
-- 🎯 MINI-CHALLENGE
-- Using ONLY what this lesson covered (->, ->>, @>, casting):
-- 1. Return the customer id (as a number) and total for every PURCHASE
-- 2. Keep only purchases where total > 100
-- 3. Name the columns cust_id and total
--
-- Hints: nested id is at customer.id ; "purchase" is in the "type" field;
-- ->> gives text, so cast the number columns with ::int / ::numeric.
--
-- ✅ Expected: 1 row -> cust_id = 42 , total = 999.99
-- your query here🎉 Lesson Complete
- ✅
jsonbstores differently-shaped data in one column — prefer it overjson - ✅
->/#>return JSON;->>/#>>return text (cast for numbers) - ✅
@>matches a JSON subset and can use a GIN index - ✅
jsonb_set,||, and-modify documents in place - ✅ MySQL does the same with
JSON_EXTRACT,->>, andJSON_TABLE - ✅ Next: advanced data types — arrays, HSTORE, and geography
Sign up for free to track which lessons you've completed and get learning reminders.