Skip to main content
    Back

    SQL • Advanced Track

    JSON & Semi-Structured Data

    By the end of this lesson you'll be able to store flexible, differently-shaped data inside a single SQL column, then reach into nested fields, filter rows by what's inside the JSON, modify documents in place, and index them so those queries stay fast — in both PostgreSQL and MySQL.

    What You'll Learn

    • Store semi-structured data with PostgreSQL json vs jsonb
    • Extract nested fields with -> ->> #> and #>>
    • Filter rows by values inside the JSON, with casting
    • Modify documents using jsonb_set, ||, and containment @>
    • Speed up JSON queries with GIN indexes
    • Query JSON in MySQL with JSON_EXTRACT, ->>, and JSON_TABLE

    Our Sample Table: events

    Every query in this lesson runs against this tiny events table. Each row's payload is a jsonb document — and notice the rows are not all the same shape. That is exactly what JSON columns are for.

    Result:

    idpayload (jsonb)
    1{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]}
    2{"type":"signup","customer":{"id":17,"tier":"free"},"source":"google_ads"}
    3{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":29.50,"items":["cable"]}

    1. Storing Semi-Structured Data

    Sometimes your data doesn't fit a neat grid. An event log might record purchases (with items and a total), signups (with a traffic source), and errors (with a stack trace) — each a different shape. Rather than inventing dozens of mostly-empty columns, you can store the whole thing as a JSON document in one column.

    JSON ("JavaScript Object Notation") is the familiar {"key": value} text format. SQL treats the text between the quotes as a value to validate and store; from then on you query into it with special operators.

    🗄️ Real-world analogy

    Normal columns are a filing cabinet with a labelled drawer for every field — rigid, but instantly findable. A jsonb column is a labelled box you can drop any shaped folder into. Flexible, but you pay a little to find things inside it (that's what indexes fix).

    Create a JSON column

    A jsonb payload that holds a different shape per row.

    Try it Yourself »
    SQL
    -- Store a whole JSON document in ONE column.
    -- "Semi-structured" = each row can hold a differently-shaped object.
    -- In PostgreSQL, prefer jsonb over json (binary, indexable, dedupes keys).
    
    CREATE TABLE events (
        id      SERIAL PRIMARY KEY,
        payload JSONB NOT NULL          -- the flexible bit lives here
    );
    
    INSERT INTO events (payload) VALUES
    ('{"type":"purchase","customer":{"id":42,"tier":"gold"},"total":999.99,"items":["laptop","mouse"]}'),
    ('{"type":"signup","customer":{"id":17,"tier
    ...

    2. PostgreSQL json vs jsonb

    PostgreSQL gives you two JSON types and the difference matters. json stores the document as raw text — it keeps your exact spacing and key order, but must be re-parsed on every read and cannot be indexed for searching. jsonb stores a binary, decomposed form: a touch slower to write, far faster to query, it strips duplicate keys, and it supports GIN indexing.

    json vs jsonb

    Same data, two storage behaviours.

    Try it Yourself »
    SQL
    -- json  : kept as raw TEXT. Preserves exact whitespace & key order.
    --         Re-parsed on every read. Cannot be indexed with GIN.
    -- jsonb : decomposed BINARY. Slightly slower to write, much faster
    --         to query, removes duplicate keys, and supports indexing.
    
    -- Same data, two columns, two behaviours:
    CREATE TABLE demo (
        raw  JSON,     -- '{ "a": 1,  "a": 2 }' stays exactly as typed
        fast JSONB     -- '{"a":2}'  (whitespace gone, last key wins)
    );
    
    -- Rule of thumb: reach for js
    ...

    3. Extracting Fields: ->, ->>, #>, #>>

    These four operators are the heart of working with JSON, and the single most important thing to learn is which ones return JSON and which return text:

    • -> gets a field and keeps it as JSON (so you can chain another -> onto it).
    • ->> gets a field and returns it as plain text.
    • #> follows a whole path like {customer,id} and keeps it as JSON.
    • #>> follows a path and returns text.

    A field key uses 'quotes'; an array element uses its number, so payload -> 'items' ->> 0 is "the first item, as text".

    The four extraction operators

    Reach into nested objects and arrays.

    Try it Yourself »
    SQL
    -- The four extraction operators (PostgreSQL):
    --   ->   get a field, KEEP it as json/jsonb
    --   ->>  get a field, return it as TEXT
    --   #>   follow a PATH, KEEP it as json/jsonb
    --   #>>  follow a PATH, return it as TEXT
    
    SELECT
        payload -> 'customer'            AS customer_json,   -- {"id":42,"tier":"gold"}
        payload ->> 'type'               AS type_text,       -- purchase
        payload -> 'customer' ->> 'tier' AS tier,            -- gold
        payload #>> '{customer,id}'      AS cust_id,   
    ...

    Result:

    customer_jsontype_texttiercust_idfirst_item
    {"id":42,"tier":"gold"}purchasegold42laptop

    Your Turn: extract a top-level field

    Fill in the blanks to pull the top-level type field out of every event as text. The expected result is in the comments so you can check yourself.

    🎯 Your Turn: get the type field

    Replace the ___ blanks with the text operator and the key.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself".
    -- Goal: pull the top-level "type" field out of every event AS plain text.
    
    SELECT id, payload ___ '___'   -- 👉 the TEXT operator, then the key name in quotes
    FROM events;
    
    -- ✅ Expected: 3 rows -> 1|purchase , 2|signup , 3|purchase
    --    (Use ->> not -> : you want text, not a json string with quotes around it.)

    4. Filtering Rows by a JSON Value

    You can use the same operators inside a WHERE clause to keep only the rows whose JSON matches. The catch: ->> always hands you text, so compare against a quoted string — or cast to a number/date when the type matters. Comparing '999.99' > '100' as text gives the wrong answer because text sorts character by character.

    Filter by nested values

    Text comparison vs casting to numeric.

    Try it Yourself »
    SQL
    -- Filtering rows by a value INSIDE the JSON.
    -- Because ->> returns text, compare against a quoted string,
    -- or cast both sides to the type you actually mean.
    
    -- Every event whose nested customer tier is "gold":
    SELECT id, payload ->> 'type' AS type
    FROM events
    WHERE payload -> 'customer' ->> 'tier' = 'gold';   -- rows 1 and 3
    
    -- Numeric comparison needs a cast (text '999.99' < '30' otherwise!):
    SELECT id
    FROM events
    WHERE (payload ->> 'total')::numeric > 100;        -- row 1 only

    Result — tier = gold:

    idtype
    1purchase
    3purchase

    Your Turn: filter by a nested value

    One concept this time — keep only the events placed by a gold-tier customer. The tier lives at customer.tier.

    🎯 Your Turn: gold-tier events

    Return rows where customer.tier equals gold.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the blank, then press "Try it Yourself".
    -- Goal: return the id of every event whose nested customer tier is "gold".
    
    SELECT id
    FROM events
    WHERE payload -> 'customer' ___ 'tier' = '___';  -- 👉 text operator, then the value
    
    -- ✅ Expected: 2 rows -> 1 and 3
    --    (Use ->> so the left side is text and matches the quoted 'gold'.)

    5. Containment @> and Modifying with jsonb_set

    Spelling out a path for every condition gets tedious. The containment operator @> asks "does the left document contain this smaller JSON?" — it matches nested keys and array elements in one shot, and (unlike a chain of ->>) it can use a GIN index.

    To change a document, jsonb_set(target, path, new_value) returns a modified copy. The path is a text array ({customer,tier} means customer → tier), and new_value must itself be JSON — so a string needs its quotes: '"platinum"'. To bulk-merge fields use ||; to drop a key use the - operator.

    Containment @>

    Match a JSON subset without writing the path.

    Try it Yourself »
    SQL
    -- @>  "containment": does the left document CONTAIN this subset?
    -- It matches nested keys/values without you spelling out the path,
    -- and — crucially — it can use a GIN index (see section 6).
    
    -- Purchases made by a gold-tier customer:
    SELECT id FROM events
    WHERE payload @> '{"type":"purchase","customer":{"tier":"gold"}}';
    -- rows 1 and 3
    
    -- Other handy jsonb predicates:
    --   payload ? 'source'                    -- does the key "source" exist?
    --   payload @> '{"items":["mouse"]}'      -- a
    ...

    Result — purchases by a gold customer:

    id
    1
    3

    Modify with jsonb_set, || and -

    Set, merge, and remove fields in place.

    Try it Yourself »
    SQL
    -- jsonb_set(target, path, new_value) returns a MODIFIED copy.
    -- The path is a text[] array: '{a,b}' means a -> b.
    -- new_value must itself be jsonb — note the quotes around a string.
    
    -- Add / overwrite a top-level field:
    UPDATE events
    SET payload = jsonb_set(payload, '{status}', '"shipped"')
    WHERE id = 1;
    -- payload now also has  "status":"shipped"
    
    -- Update a NESTED field (customer.tier):
    UPDATE events
    SET payload = jsonb_set(payload, '{customer,tier}', '"platinum"')
    WHERE id = 1;
    
    -- Quick
    ...

    6. Indexing jsonb with GIN

    Without an index, every @> or ? query reads the whole table. A GIN index (Generalized INverted index) catalogues the keys and values inside your documents so those lookups become fast.

    A plain USING gin (payload) index powers the containment and key-existence operators. If you only ever use @>, the smaller jsonb_path_ops variant is faster. And when you constantly filter one extracted field by equality, a B-tree index on (payload ->> 'type') beats both. Use EXPLAIN ANALYZE to confirm the planner actually picks your index.

    GIN and expression indexes

    Make JSON queries fast.

    Try it Yourself »
    SQL
    -- jsonb with NO index = full table scan on every @> / ? query.
    -- A GIN ("Generalized INverted") index fixes that.
    
    -- 1) GIN over the whole column — powers @>, ?, ?|, ?&
    CREATE INDEX idx_events_payload
    ON events USING gin (payload);
    
    -- This query can now use the index instead of scanning:
    SELECT id FROM events
    WHERE payload @> '{"customer":{"tier":"gold"}}';
    
    -- 2) Smaller / faster index if you ONLY use @>:
    CREATE INDEX idx_events_payload_path
    ON events USING gin (payload jsonb_path_ops);
    
    --
    ...

    7. JSON in MySQL

    MySQL has a single JSON type that behaves much like PostgreSQL's jsonb — binary and validated. The operators differ: you extract with JSON_EXTRACT(doc, '$.path') or its shorthand -> (both return JSON), and ->> is the text version (shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))). Paths use the $.key / $.arr[0] syntax.

    The standout feature is JSON_TABLE, which explodes a JSON array into ordinary rows you can join and aggregate — turning a nested array into a relational result set.

    MySQL JSON_EXTRACT, ->> and JSON_TABLE

    The same ideas in MySQL syntax.

    Try it Yourself »
    SQL
    -- MySQL has its own JSON type (binary, validated, like jsonb).
    -- Same ideas, different spelling:
    
    -- Extract with JSON_EXTRACT or the -> shorthand (returns JSON):
    SELECT JSON_EXTRACT(payload, '$.type')   AS type_json,   -- "purchase"
           payload -> '$.customer.tier'      AS tier_json,   -- "gold"  (quoted)
           payload ->> '$.customer.tier'     AS tier_text    -- gold    (unquoted)
    FROM events;
    
    -- ->>  is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)) — your text operator.
    
    -- Filter by a
    ...

    8. JSON vs Normalised Columns

    JSON columns are powerful, but they are not a replacement for good schema design. If you filter, sort, join, or group by a value all the time — or it needs a constraint or a real date/money type — promote it to a proper column. Keep JSON for the genuinely variable, rarely-queried parts. The best real-world tables are hybrid: hot fields as columns, the long tail as jsonb.

    Hybrid schema design

    When to use JSON vs a normal column.

    Try it Yourself »
    SQL
    -- JSON is flexible, but flexibility is not free.
    -- Pull values OUT into real columns when they earn it.
    
    -- ✅ Good fit for JSON / jsonb:
    --   • shape varies row to row (event payloads, webhook bodies)
    --   • third-party API responses you store verbatim
    --   • sparse / rarely-queried "extra" attributes
    
    -- ❌ Promote to a normal column when:
    --   • you filter, sort, join, or GROUP BY the value often
    --   • it needs a FK, UNIQUE, NOT NULL or CHECK constraint
    --   • the type matters (dates, money)
    ...

    Common Errors (and the fix)

    • Comparison gives weird results — you used -> (returns JSON) where you meant ->> (returns text). payload -> 'tier' is the JSON string "gold" with quotes; payload ->> 'tier' is the bare text gold you can compare against 'gold'.
    • "operator does not exist: text > integer"->> hands you text. Cast before doing maths or ranges: (payload ->> 'total')::numeric > 100.
    • Numbers sort wrong — comparing extracted values as text means '999.99' < '30' (character order). Always cast numeric JSON fields with ::numeric / ::int.
    • Query is slow on a big table — no GIN index. Add CREATE INDEX ... USING gin (payload); and re-check with EXPLAIN ANALYZE.
    • Reaching for JSON for everything — if you filter or join a field constantly, it belongs in a real column. Over-using JSON loses constraints, types, and index efficiency.
    • jsonb_set "cannot set path" — the parent object is missing. jsonb_set won't create intermediate keys; create the parent first or merge with ||.

    📘 Quick Reference

    SyntaxPurpose
    payload -> 'k'Get field k, return JSON (chainable)
    payload ->> 'k'Get field k, return TEXT
    payload #> '{a,b}'Follow a path, return JSON
    payload #>> '{a,b}'Follow a path, return TEXT
    payload -> 'arr' ->> 0Array element by index, as text
    payload @> '{...}'Contains this JSON subset (GIN-able)
    payload ? 'k'Does key k exist?
    jsonb_set(p,'{a,b}',v)Return p with a.b set to v
    payload || '{...}'::jsonbMerge / concatenate documents
    payload - 'k'Remove key k
    USING gin (payload)Index for @>, ?, ?|, ?&
    JSON_EXTRACT(p,'$.k')MySQL: extract (or p ->> '$.k')

    Frequently Asked Questions

    Q: When should I use -> versus ->>?

    Use ->> whenever you want a usable value — to display it, compare it, or cast it. Use -> only to dig deeper, because it keeps the result as JSON so you can chain another arrow onto it.

    Q: Why is my number comparison wrong?

    Extracted values come out as text, and text sorts character by character (so '9' looks bigger than '30'). Cast first: (payload ->> 'total')::numeric.

    Q: json or jsonb?

    jsonb almost always. It's indexable and far faster to query. Only pick json if you must return the document with its original whitespace and key order intact.

    Q: Should I just store everything as JSON?

    No. Anything you filter, join, or sort on a lot — or that needs a constraint or a real type — should be a normal column. Use JSON for the variable, rarely-queried extras and you get the best of both worlds.

    Mini-Challenge: High-Value Purchases

    Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a playground to confirm.

    🎯 Mini-Challenge

    cust_id + total for purchases over 100, with casting.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE
    -- Using ONLY what this lesson covered (->, ->>, @>, casting):
    --   1. Return the customer id (as a number) and total for every PURCHASE
    --   2. Keep only purchases where total > 100
    --   3. Name the columns cust_id and total
    --
    -- Hints: nested id is at customer.id ; "purchase" is in the "type" field;
    --        ->> gives text, so cast the number columns with ::int / ::numeric.
    --
    -- ✅ Expected: 1 row -> cust_id = 42 , total = 999.99
    
    -- your query here

    🎉 Lesson Complete

    • jsonb stores differently-shaped data in one column — prefer it over json
    • ->/#> return JSON; ->>/#>> return text (cast for numbers)
    • @> matches a JSON subset and can use a GIN index
    • jsonb_set, ||, and - modify documents in place
    • ✅ MySQL does the same with JSON_EXTRACT, ->>, and JSON_TABLE
    • Next: advanced data types — arrays, HSTORE, and geography

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service