SQL • Advanced Track

JSON & Semi-Structured Data

By the end of this lesson you'll be able to store flexible, differently-shaped data inside a single SQL column, then reach into nested fields, filter rows by what's inside the JSON, modify documents in place, and index them so those queries stay fast — in both PostgreSQL and MySQL.

What You'll Learn

✓Store semi-structured data with PostgreSQL json vs jsonb
✓Extract nested fields with -> ->> #> and #>>
✓Filter rows by values inside the JSON, with casting
✓Modify documents using jsonb_set, ||, and containment @>
✓Speed up JSON queries with GIN indexes
✓Query JSON in MySQL with JSON_EXTRACT, ->>, and JSON_TABLE

Before you start: you should be comfortable with SELECT, WHERE filtering, and basic data types. This is an advanced topic that builds on all of them. The in-browser editor lets you write and edit SQL; to run it, copy your query into a free playground like db-fiddle.com (pick PostgreSQL 15) or onecompiler.com. Every example below shows its expected result so you can check yourself.

Our Sample Table: `events`

Every query in this lesson runs against this tiny events table. Each row's payload is a jsonb document — and notice the rows are not all the same shape. That is exactly what JSON columns are for.

Result:

id	payload (jsonb)
1	{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]}
2	{"type":"signup","customer":{"id":17,"tier":"free"},"source":"google_ads"}
3	{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":29.50,"items":["cable"]}

1. Storing Semi-Structured Data

Sometimes your data doesn't fit a neat grid. An event log might record purchases (with items and a total), signups (with a traffic source), and errors (with a stack trace) — each a different shape. Rather than inventing dozens of mostly-empty columns, you can store the whole thing as a JSON document in one column.

JSON ("JavaScript Object Notation") is the familiar {"key": value} text format. SQL treats the text between the quotes as a value to validate and store; from then on you query into it with special operators.

🗄️ Real-world analogy

Normal columns are a filing cabinet with a labelled drawer for every field — rigid, but instantly findable. A jsonb column is a labelled box you can drop any shaped folder into. Flexible, but you pay a little to find things inside it (that's what indexes fix).

Create a JSON column

A jsonb payload that holds a different shape per row.

Try it Yourself »

SQL

-- Store a whole JSON document in ONE column.
-- "Semi-structured" = each row can hold a differently-shaped object.
-- In PostgreSQL, prefer jsonb over json (binary, indexable, dedupes keys).

CREATE TABLE events (
    id      SERIAL PRIMARY KEY,
    payload JSONB NOT NULL          -- the flexible bit lives here
);

INSERT INTO events (payload) VALUES
('{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]}'),
('{"type":"signup","customer":{"id":17,"tier
...

2. PostgreSQL `json` vs `jsonb`

PostgreSQL gives you two JSON types and the difference matters. json stores the document as raw text — it keeps your exact spacing and key order, but must be re-parsed on every read and cannot be indexed for searching. jsonb stores a binary, decomposed form: a touch slower to write, far faster to query, it strips duplicate keys, and it supports GIN indexing.

Pro tip: default to jsonb. The only reason to choose json is if you must hand the document back byte-for-byte identical (whitespace and key order preserved) — which is rare.

json vs jsonb

Same data, two storage behaviours.

Try it Yourself »

SQL

-- json  : kept as raw TEXT. Preserves exact whitespace & key order.
--         Re-parsed on every read. Cannot be indexed with GIN.
-- jsonb : decomposed BINARY. Slightly slower to write, much faster
--         to query, removes duplicate keys, and supports indexing.

-- Same data, two columns, two behaviours:
CREATE TABLE demo (
    raw  JSON,     -- '{ "a": 1,  "a": 2 }' stays exactly as typed
    fast JSONB     -- '{"a":2}'  (whitespace gone, last key wins)
);

-- Rule of thumb: reach for js
...

3. Extracting Fields: `->`, `->>`, `#>`, `#>>`

These four operators are the heart of working with JSON, and the single most important thing to learn is which ones return JSON and which return text:

-> gets a field and keeps it as JSON (so you can chain another -> onto it).
->> gets a field and returns it as plain text.
#> follows a whole path like {customer,id} and keeps it as JSON.
#>> follows a path and returns text.

A field key uses 'quotes'; an array element uses its number, so payload -> 'items' ->> 0 is "the first item, as text".

The four extraction operators

Reach into nested objects and arrays.

Try it Yourself »

SQL

-- The four extraction operators (PostgreSQL):
--   ->   get a field, KEEP it as json/jsonb
--   ->>  get a field, return it as TEXT
--   #>   follow a PATH, KEEP it as json/jsonb
--   #>>  follow a PATH, return it as TEXT

SELECT
    payload -> 'customer'            AS customer_json,   -- {"id":42,"tier":"gold"}
    payload ->> 'type'               AS type_text,       -- purchase
    payload -> 'customer' ->> 'tier' AS tier,            -- gold
    payload #>> '{customer,id}'      AS cust_id,   
...

Result:

customer_json	type_text	tier	cust_id	first_item
{"id":42,"tier":"gold"}	purchase	gold	42	laptop

Your Turn: extract a top-level field

Fill in the blanks to pull the top-level type field out of every event as text. The expected result is in the comments so you can check yourself.

🎯 Your Turn: get the type field

Replace the ___ blanks with the text operator and the key.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself".
-- Goal: pull the top-level "type" field out of every event AS plain text.

SELECT id, payload ___ '___'   -- 👉 the TEXT operator, then the key name in quotes
FROM events;

-- ✅ Expected: 3 rows -> 1|purchase , 2|signup , 3|purchase
--    (Use ->> not -> : you want text, not a json string with quotes around it.)

4. Filtering Rows by a JSON Value

You can use the same operators inside a WHERE clause to keep only the rows whose JSON matches. The catch: ->> always hands you text, so compare against a quoted string — or cast to a number/date when the type matters. Comparing '999.99' > '100' as text gives the wrong answer because text sorts character by character.

Filter by nested values

Text comparison vs casting to numeric.

Try it Yourself »

SQL

-- Filtering rows by a value INSIDE the JSON.
-- Because ->> returns text, compare against a quoted string,
-- or cast both sides to the type you actually mean.

-- Every event whose nested customer tier is "gold":
SELECT id, payload ->> 'type' AS type
FROM events
WHERE payload -> 'customer' ->> 'tier' = 'gold';   -- rows 1 and 3

-- Numeric comparison needs a cast (text '999.99' < '30' otherwise!):
SELECT id
FROM events
WHERE (payload ->> 'total')::numeric > 100;        -- row 1 only

Result — tier = gold:

id	type
1	purchase
3	purchase

Your Turn: filter by a nested value

One concept this time — keep only the events placed by a gold-tier customer. The tier lives at customer.tier.

🎯 Your Turn: gold-tier events

Return rows where customer.tier equals gold.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — fill in the blank, then press "Try it Yourself".
-- Goal: return the id of every event whose nested customer tier is "gold".

SELECT id
FROM events
WHERE payload -> 'customer' ___ 'tier' = '___';  -- 👉 text operator, then the value

-- ✅ Expected: 2 rows -> 1 and 3
--    (Use ->> so the left side is text and matches the quoted 'gold'.)

5. Containment `@>` and Modifying with `jsonb_set`

Spelling out a path for every condition gets tedious. The containment operator @> asks "does the left document contain this smaller JSON?" — it matches nested keys and array elements in one shot, and (unlike a chain of ->>) it can use a GIN index.

To change a document, jsonb_set(target, path, new_value) returns a modified copy. The path is a text array ({customer,tier} means customer → tier), and new_value must itself be JSON — so a string needs its quotes: '"platinum"'. To bulk-merge fields use ||; to drop a key use the - operator.

Pro tip: jsonb_set will not create missing intermediate objects. Setting {a,b,c} fails unless a and a.b already exist — build the parent first, or merge with ||.

Containment @>

Match a JSON subset without writing the path.

Try it Yourself »

SQL

-- @>  "containment": does the left document CONTAIN this subset?
-- It matches nested keys/values without you spelling out the path,
-- and — crucially — it can use a GIN index (see section 6).

-- Purchases made by a gold-tier customer:
SELECT id FROM events
WHERE payload @> '{"type":"purchase","customer":{"tier":"gold"}}';
-- rows 1 and 3

-- Other handy jsonb predicates:
--   payload ? 'source'                    -- does the key "source" exist?
--   payload @> '{"items":["mouse"]}'      -- a
...

Result — purchases by a gold customer:

id
1
3

Modify with jsonb_set, || and -

Set, merge, and remove fields in place.

Try it Yourself »

SQL

-- jsonb_set(target, path, new_value) returns a MODIFIED copy.
-- The path is a text[] array: '{a,b}' means a -> b.
-- new_value must itself be jsonb — note the quotes around a string.

-- Add / overwrite a top-level field:
UPDATE events
SET payload = jsonb_set(payload, '{status}', '"shipped"')
WHERE id = 1;
-- payload now also has  "status":"shipped"

-- Update a NESTED field (customer.tier):
UPDATE events
SET payload = jsonb_set(payload, '{customer,tier}', '"platinum"')
WHERE id = 1;

-- Quick
...

6. Indexing `jsonb` with GIN

Without an index, every @> or ? query reads the whole table. A GIN index (Generalized INverted index) catalogues the keys and values inside your documents so those lookups become fast.

A plain USING gin (payload) index powers the containment and key-existence operators. If you only ever use @>, the smaller jsonb_path_ops variant is faster. And when you constantly filter one extracted field by equality, a B-tree index on (payload ->> 'type') beats both. Use EXPLAIN ANALYZE to confirm the planner actually picks your index.

GIN and expression indexes

Make JSON queries fast.

Try it Yourself »

SQL

-- jsonb with NO index = full table scan on every @> / ? query.
-- A GIN ("Generalized INverted") index fixes that.

-- 1) GIN over the whole column — powers @>, ?, ?|, ?&
CREATE INDEX idx_events_payload
ON events USING gin (payload);

-- This query can now use the index instead of scanning:
SELECT id FROM events
WHERE payload @> '{"customer":{"tier":"gold"}}';

-- 2) Smaller / faster index if you ONLY use @>:
CREATE INDEX idx_events_payload_path
ON events USING gin (payload jsonb_path_ops);

--
...

7. JSON in MySQL

MySQL has a single JSON type that behaves much like PostgreSQL's jsonb — binary and validated. The operators differ: you extract with JSON_EXTRACT(doc, '$.path') or its shorthand -> (both return JSON), and ->> is the text version (shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))). Paths use the $.key / $.arr[0] syntax.

The standout feature is JSON_TABLE, which explodes a JSON array into ordinary rows you can join and aggregate — turning a nested array into a relational result set.

MySQL JSON_EXTRACT, ->> and JSON_TABLE

The same ideas in MySQL syntax.

Try it Yourself »

SQL

-- MySQL has its own JSON type (binary, validated, like jsonb).
-- Same ideas, different spelling:

-- Extract with JSON_EXTRACT or the -> shorthand (returns JSON):
SELECT JSON_EXTRACT(payload, '$.type')   AS type_json,   -- "purchase"
       payload -> '$.customer.tier'      AS tier_json,   -- "gold"  (quoted)
       payload ->> '$.customer.tier'     AS tier_text    -- gold    (unquoted)
FROM events;

-- ->>  is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)) — your text operator.

-- Filter by a
...

8. JSON vs Normalised Columns

JSON columns are powerful, but they are not a replacement for good schema design. If you filter, sort, join, or group by a value all the time — or it needs a constraint or a real date/money type — promote it to a proper column. Keep JSON for the genuinely variable, rarely-queried parts. The best real-world tables are hybrid: hot fields as columns, the long tail as jsonb.

Hybrid schema design

When to use JSON vs a normal column.

Try it Yourself »

SQL

-- JSON is flexible, but flexibility is not free.
-- Pull values OUT into real columns when they earn it.

-- ✅ Good fit for JSON / jsonb:
--   • shape varies row to row (event payloads, webhook bodies)
--   • third-party API responses you store verbatim
--   • sparse / rarely-queried "extra" attributes

-- ❌ Promote to a normal column when:
--   • you filter, sort, join, or GROUP BY the value often
--   • it needs a FK, UNIQUE, NOT NULL or CHECK constraint
--   • the type matters (dates, money)
...

Common Errors (and the fix)

Comparison gives weird results — you used -> (returns JSON) where you meant ->> (returns text). payload -> 'tier' is the JSON string "gold" with quotes; payload ->> 'tier' is the bare text gold you can compare against 'gold'.
"operator does not exist: text > integer" — ->> hands you text. Cast before doing maths or ranges: (payload ->> 'total')::numeric > 100.
Numbers sort wrong — comparing extracted values as text means '999.99' < '30' (character order). Always cast numeric JSON fields with ::numeric / ::int.
Query is slow on a big table — no GIN index. Add CREATE INDEX ... USING gin (payload); and re-check with EXPLAIN ANALYZE.
Reaching for JSON for everything — if you filter or join a field constantly, it belongs in a real column. Over-using JSON loses constraints, types, and index efficiency.
jsonb_set "cannot set path" — the parent object is missing. jsonb_set won't create intermediate keys; create the parent first or merge with ||.

📘 Quick Reference

Syntax	Purpose
payload -> 'k'	Get field k, return JSON (chainable)
payload ->> 'k'	Get field k, return TEXT
payload #> '{a,b}'	Follow a path, return JSON
payload #>> '{a,b}'	Follow a path, return TEXT
payload -> 'arr' ->> 0	Array element by index, as text
payload @> '{...}'	Contains this JSON subset (GIN-able)
payload ? 'k'	Does key k exist?
jsonb_set(p,'{a,b}',v)	Return p with a.b set to v
payload \|\| '{...}'::jsonb	Merge / concatenate documents
payload - 'k'	Remove key k
USING gin (payload)	Index for @>, ?, ?\|, ?&
JSON_EXTRACT(p,'$.k')	MySQL: extract (or p ->> '$.k')

Frequently Asked Questions

Q: When should I use -> versus ->>?

Use ->> whenever you want a usable value — to display it, compare it, or cast it. Use -> only to dig deeper, because it keeps the result as JSON so you can chain another arrow onto it.

Q: Why is my number comparison wrong?

Extracted values come out as text, and text sorts character by character (so '9' looks bigger than '30'). Cast first: (payload ->> 'total')::numeric.

Q: json or jsonb?

jsonb almost always. It's indexable and far faster to query. Only pick json if you must return the document with its original whitespace and key order intact.

Q: Should I just store everything as JSON?

No. Anything you filter, join, or sort on a lot — or that needs a constraint or a real type — should be a normal column. Use JSON for the variable, rarely-queried extras and you get the best of both worlds.

Mini-Challenge: High-Value Purchases

Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a playground to confirm.

🎯 Mini-Challenge

cust_id + total for purchases over 100, with casting.

Try it Yourself »

SQL

-- 🎯 MINI-CHALLENGE
-- Using ONLY what this lesson covered (->, ->>, @>, casting):
--   1. Return the customer id (as a number) and total for every PURCHASE
--   2. Keep only purchases where total > 100
--   3. Name the columns cust_id and total
--
-- Hints: nested id is at customer.id ; "purchase" is in the "type" field;
--        ->> gives text, so cast the number columns with ::int / ::numeric.
--
-- ✅ Expected: 1 row -> cust_id = 42 , total = 999.99

-- your query here

🎉 Lesson Complete

✅ jsonb stores differently-shaped data in one column — prefer it over json
✅ ->/#> return JSON; ->>/#>> return text (cast for numbers)
✅ @> matches a JSON subset and can use a GIN index
✅ jsonb_set, ||, and - modify documents in place
✅ MySQL does the same with JSON_EXTRACT, ->>, and JSON_TABLE
✅ Next: advanced data types — arrays, HSTORE, and geography

JSON & Semi-Structured Data

What You'll Learn

Our Sample Table: events

1. Storing Semi-Structured Data

Create a JSON column

2. PostgreSQL json vs jsonb

json vs jsonb

3. Extracting Fields: ->, ->>, #>, #>>

The four extraction operators

Your Turn: extract a top-level field

🎯 Your Turn: get the type field

4. Filtering Rows by a JSON Value

Filter by nested values

Your Turn: filter by a nested value

🎯 Your Turn: gold-tier events

5. Containment @> and Modifying with jsonb_set

Containment @>

Modify with jsonb_set, || and -

6. Indexing jsonb with GIN

GIN and expression indexes

7. JSON in MySQL

MySQL JSON_EXTRACT, ->> and JSON_TABLE

8. JSON vs Normalised Columns

Hybrid schema design

Common Errors (and the fix)

📘 Quick Reference

Frequently Asked Questions

Mini-Challenge: High-Value Purchases

🎯 Mini-Challenge

🎉 Lesson Complete

Cookie & Privacy Settings

Our Sample Table: `events`

2. PostgreSQL `json` vs `jsonb`

3. Extracting Fields: `->`, `->>`, `#>`, `#>>`

5. Containment `@>` and Modifying with `jsonb_set`

6. Indexing `jsonb` with GIN