Advanced Track

Full-Text Search & Fuzzy Matching

By the end of this lesson you'll be able to build real, scalable search: index your text, find documents by meaning (not just exact characters), rank the hits by relevance, and tolerate typos. You'll see exactly why LIKE '%word%' falls apart — and what professionals reach for instead.

What You'll Learn

✓Why LIKE '%...%' can't scale or rank results
✓PostgreSQL tsvector, tsquery, and the @@ match operator
✓Boolean & natural-language search (AND/OR/NOT, phrases)
✓Rank results by relevance with ts_rank
✓Make it fast with a GIN index and weighted vectors
✓MySQL MATCH() AGAINST() plus fuzzy matching for typos

Before you start: you should be comfortable with SELECT, WHERE, and indexes — this lesson builds directly on all three. The in-browser editor lets you write and edit SQL; to run the PostgreSQL examples, copy your query into a free playground such as db-fiddle.com (pick PostgreSQL) or dbfiddle.uk. Every example below shows its expected result so you can check yourself.

Our Sample Table: `articles`

Every query in this lesson searches this little articles table. The body column is shown shortened, but full-text search reads the whole thing.

Result:

id	title	body (excerpt)
1	Getting Started with Databases	A database stores rows in tables…
2	Indexing Strategies for Big Tables	An index makes lookups fast; database tuning…
3	Normalisation Explained	Split data to remove duplication…
4	When a B-tree Index Beats a Full Scan	Index performance and query speed…
5	A Gentle Intro to NoSQL	Document stores trade joins for speed…

1. Why Not Just `LIKE '%word%'`?

You already know how to filter text with LIKE. So why does every serious app use something else for search? Because a leading-wildcard LIKE '%word%' has three problems that get worse as your data grows.

📚 Real-world analogy

LIKE '%word%' is reading every book in the library cover to cover to find a word. Full-text search is the index at the back of each book that a librarian built ahead of time — ask for "database optimization" and she pulls the right books instantly, sorted by how relevant they are.

The LIKE approach (and why it hurts)

A full table scan that can't rank or stem.

Try it Yourself »

SQL

-- The "obvious" way to search text: LIKE with wildcards
SELECT id, title
FROM articles
WHERE body LIKE '%database%';

-- This works on tiny tables, but it has three deal-breakers:
--   1. It can't use a normal index, so it reads EVERY row
--      and scans EVERY character — a full table scan.
--   2. It can't rank results. A title hit and a hit buried
--      in paragraph 90 come back equally "matched".
--   3. It's literal: 'database' won't match 'databases',
--      'Database', or a search fo
...

The fix is to pre-process text into a searchable form once, index it, and then search the index. That's what full-text search does.

2. PostgreSQL: `tsvector`, `tsquery` & `@@`

PostgreSQL splits search into two data types. A tsvector is the document side: your text broken into normalised root words (called lexemes). A tsquery is the search side: the words you're looking for. The @@ operator returns true when the document matches the query.

Stemming & stop-words: to_tsvector('english', …) reduces every word to its root (running, ran, runs → run) and throws away ultra-common stop-words (the, is, a, of) that carry no search value. That's why a search for databases still finds the word database — both stem to databas.

to_tsvector — the document side

Turn text into stemmed, stop-word-free lexemes.

Try it Yourself »

SQL

-- to_tsvector turns text into a tsvector: a sorted list of
-- normalised "lexemes" (root words) with their positions.
SELECT to_tsvector('english',
    'PostgreSQL is a powerful open-source database system');

-- Result (one tsvector value):
--   'databas':6 'open-sourc':4 'postgresql':1 'power':3
--   'sourc':5 'system':7
-- Notice what happened:
--   • 'powerful' became 'power'      (stemming)
--   • 'database' became 'databas'    (stemming)
--   • 'is' and 'a' vanished          (stop words r
...

Result — one tsvector value:

to_tsvector
'databas':6 'open-sourc':4 'postgresql':1 'power':3 'sourc':5 'system':7

to_tsquery & the @@ operator

Build the search side and match it.

Try it Yourself »

SQL

-- to_tsquery builds the SEARCH side. It is also stemmed,
-- so your search word matches every form of the word.
SELECT to_tsquery('english', 'powerful & databases');
-- Result:  'power' & 'databas'

-- The @@ operator asks "does this document match this query?"
SELECT to_tsvector('english', 'PostgreSQL is a powerful database')
       @@ to_tsquery('english', 'powerful & databases');
-- Result:  true
--   't' even though the text says "database" (singular) and
--   the query said "databases" (pl
...

Result — the @@ match returns true:

?column?
t

Your Turn: complete the match

Fill in the blanks so the query finds every article whose title or body mentions index. Hint: use the friendly query builder for a single plain word. The expected result is in the comments so you can check yourself.

🎯 Your Turn: tsvector @@ tsquery

Complete the to_tsquery call to match documents.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself"
-- Goal: return every article whose title OR body mentions "index".

SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ ___('english', ___);   -- 👉 1) the query builder for a plain word
                                --    2) the search word, in single quotes

-- ✅ Expected result: rows 2 and 4
--    2 | Indexing Strategies for Big Tables
--    4 | When a B-tree Index Beats a Full Sca
...

3. Boolean & Phrase Search

Beyond single words, to_tsquery understands a small operator language so users can be precise: combine terms, exclude terms, or demand an exact phrase.

AND / OR / NOT / phrase

The & | ! and <-> operators.

Try it Yourself »

SQL

-- tsquery has a small operator language:
--   &  AND   (both terms must appear)
--   |  OR    (either term)
--   !  NOT   (exclude a term)
--   <-> FOLLOWED BY  (phrase: the words are adjacent)

-- Articles about databases but NOT about MySQL:
SELECT id, title FROM articles
WHERE to_tsvector('english', body)
      @@ to_tsquery('english', 'database & !mysql');

-- The exact phrase "full text search":
SELECT id, title FROM articles
WHERE to_tsvector('english', body)
      @@ to_tsquery('english'
...

4. Ranking Results with `ts_rank`

The @@ operator only answers yes/no — but real search shows the best result first. ts_rank(document, query) returns a relevance score (higher is better) based on how often the terms appear and where. Sort by it with ORDER BY rank DESC.

Listing the query once in the FROM clause (plainto_tsquery(…) AS query) lets you reuse it in both the SELECT and the WHERE without building it twice. Prefer ts_rank_cd when how close the words sit to each other matters.

ts_rank — relevance scoring

Score and sort matches like a search engine.

Try it Yourself »

SQL

-- @@ only answers yes/no. ts_rank gives a relevance SCORE
-- (higher = more relevant) so you can ORDER results like Google.
SELECT id, title,
       ts_rank(to_tsvector('english', title || ' ' || body), query) AS rank
FROM articles,
     plainto_tsquery('english', 'database performance') AS query   -- run query once
WHERE to_tsvector('english', title || ' ' || body) @@ query
ORDER BY rank DESC;

-- ts_rank scores on term frequency and position.
-- ts_rank_cd ("cover density") also rewards terms
...

Result — ordered by relevance, best first:

id	title	rank
2	Indexing Strategies for Big Tables	0.0607
4	When a B-tree Index Beats a Full Scan	0.0304
1	Getting Started with Databases	0.0152

Your Turn: add relevance ranking

The WHERE already finds the matches. Add the scoring function and the sort direction so the most relevant article comes back first.

🎯 Your Turn: rank & order

Compute the score and order best-first.

Try it Yourself »

SQL

-- 🎯 YOUR TURN — add relevance ranking to this search.
-- The WHERE already finds the matches; you must SCORE and SORT them.

SELECT id, title,
       ___(to_tsvector('english', title || ' ' || body), query) AS rank
FROM articles,
     plainto_tsquery('english', 'index performance') AS query
WHERE to_tsvector('english', title || ' ' || body) @@ query
ORDER BY rank ___;   -- 👉 1) the function that returns a relevance score
                     --    2) the direction so the BEST match is first


...

5. Make It Fast: GIN Index & Weights

Everything so far recomputes to_tsvector() for every row on every query — fine for demos, far too slow for millions of rows. Store the vector in a column and put a GIN index on it (Generalised INverted index — the same idea as a book's back-of-book index). With setweight() you also tag where each word came from so title hits outrank body hits.

Why a generated column: a GENERATED ALWAYS AS (…) STORED column keeps the tsvector perfectly in sync with your text automatically — no triggers to maintain, and the GIN index updates itself.

GIN index + weighted vector

Pre-store lexemes for instant, weighted search.

Try it Yourself »

SQL

-- Speed: by default Postgres recomputes to_tsvector() for every
-- row on every query. A GIN index pre-stores the lexemes so the
-- search jumps straight to matching rows — instant on millions.

-- Best practice (Postgres 12+): a STORED generated column,
-- so the tsvector is always in sync with the text automatically.
ALTER TABLE articles
    ADD COLUMN search tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector
...

6. MySQL: `MATCH() AGAINST()`

MySQL takes a different route. You add a FULLTEXT index over the text columns, then search with MATCH(columns) AGAINST('terms'). Natural-language mode (the default) ranks by relevance automatically; boolean mode unlocks +/-/"phrase" operators much like Postgres' tsquery.

MySQL MATCH … AGAINST

FULLTEXT index, natural-language & boolean mode.

Try it Yourself »

SQL

-- MySQL does full-text search with a FULLTEXT index and
-- the MATCH(...) AGAINST(...) pair instead of tsvector/tsquery.

-- 1) Add a FULLTEXT index over the columns you'll search:
CREATE FULLTEXT INDEX idx_articles_ft ON articles (title, body);

-- 2) Natural-language mode (the default): MySQL ranks by relevance.
--    Putting MATCH in the SELECT returns that relevance score.
SELECT id, title,
       MATCH(title, body) AGAINST('database performance') AS score
FROM articles
WHERE MATCH(title, b
...

7. Fuzzy Matching for Typos

Full-text search needs the right word (after stemming). But users misspell things — "databse", "Samsnug". Fuzzy matching measures how similar two strings are so a near-miss still matches. Two common tools, both high-level here: trigram similarity (pg_trgm) and Levenshtein distance (fuzzystrmatch).

Use both together. Full-text answers "what topic is this about?"; trigram/Levenshtein answer "the user typed it slightly wrong." A great search box tries the full-text match first, then falls back to a similarity() match when nothing comes back.

Trigram similarity & Levenshtein

Typo-tolerant matching at a high level.

Try it Yourself »

SQL

-- Full-text search needs the RIGHT word (after stemming).
-- For typos and misspellings you need FUZZY matching.

-- (a) Trigram similarity — pg_trgm. It compares the 3-letter
--     chunks two strings share and scores them from 0 to 1.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT similarity('database', 'databse');   -- ~0.55  (typo still close)
SELECT title FROM articles
WHERE similarity(title, 'databse') > 0.3     -- catches the misspelling
ORDER BY similarity(title, 'databse') DESC;
-- In
...

Common Errors (and the fix)

No index, so search crawls: querying to_tsvector(body) @@ … with no GIN index (Postgres) or no FULLTEXT index (MySQL) silently does a full table scan. It "works" on 100 rows and times out on a million — add the index before you ship.
Using LIKE for real search: LIKE '%term%' can't use a normal index, can't stem, and can't rank. Reach for it only for tiny tables or exact substring checks, never for a search box.
Ignoring stop-words: searching for 'the' & 'of' returns nothing — those words were stripped from the vector. Search on the meaningful words, and remember a search for databases matches database thanks to stemming.
Forgetting to rank: filtering with @@ but no ts_rank + ORDER BY returns matches in arbitrary order. Always score and sort, or your "search" lists the worst hit first.
Wrong tsquery for user input: feeding a raw phrase to to_tsquery (to_tsquery('english', 'database performance')) errors on the space — use plainto_tsquery for plain user text.

📘 Quick Reference

Syntax	Purpose
to_tsvector('english', t)	Text → searchable lexemes (stemmed, stop-words removed)
to_tsquery('english', 'a & b')	Build a search with operators (& \| ! <->)
plainto_tsquery('english', t)	Build a search from plain user text (ANDs words)
doc @@ query	TRUE if the document matches the query
ts_rank(doc, query)	Relevance score — ORDER BY it, DESC
setweight(v, 'A')	Tag lexemes A…D so title outranks body
USING gin (search)	Index a tsvector column for fast search
MATCH(cols) AGAINST('t')	MySQL full-text search (needs a FULLTEXT index)
similarity(a, b)	pg_trgm trigram score 0–1 for typo tolerance

Frequently Asked Questions

Q: When is LIKE still fine?

For exact substring checks on small tables, or a prefix match like LIKE 'abc%' (no leading wildcard), which can use a normal index. For a real search box over lots of text, use full-text search.

Q: What's the difference between to_tsquery and plainto_tsquery?

to_tsquery expects operator syntax ('database & !mysql') and errors on a bare space. plainto_tsquery takes free text from a user and ANDs the words for you — safer for search-box input.

Q: Do I need a GIN index to use full-text search?

No — it works without one, but every query rescans the table. Once your data grows, a GIN index (Postgres) or FULLTEXT index (MySQL) is what makes search fast.

Q: Full-text search vs. fuzzy matching — which do I want?

Both. Full-text finds documents about a topic (with stemming and ranking); fuzzy matching (pg_trgm, Levenshtein) catches typos. Production search usually layers full-text first with a fuzzy fallback.

Mini-Challenge: A Ranked Search, End to End

Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a PostgreSQL playground to confirm.

🎯 Mini-Challenge

Search 'tuning' OR 'speed', rank with ts_rank, top 5.

Try it Yourself »

SQL

-- 🎯 MINI-CHALLENGE — a ranked search, end to end
-- Using ONLY what this lesson covered:
--   1. Search the articles table for the words "tuning" OR "speed"
--      (build the query with to_tsquery and the | operator).
--   2. Compute a relevance column called rank with ts_rank.
--   3. Return id, title, rank — most relevant FIRST — top 5 only.
--
-- ✅ Expected: at most 5 rows, ordered by rank descending,
--    with the best-matching article at the top.

-- your query here

🎉 Lesson Complete

✅ LIKE '%word%' can't scale, stem, or rank — full-text search can
✅ to_tsvector + to_tsquery matched with @@ (with stemming & stop-words)
✅ ts_rank scores relevance so you can ORDER BY rank DESC
✅ A GIN index on a weighted tsvector column makes it fast
✅ MySQL does the same with MATCH() AGAINST() and a FULLTEXT index
✅ Fuzzy tools (pg_trgm, Levenshtein) tolerate typos
✅ Next: storing and querying JSON & semi-structured data

Full-Text Search & Fuzzy Matching

What You'll Learn

Our Sample Table: articles

1. Why Not Just LIKE '%word%'?

The LIKE approach (and why it hurts)

2. PostgreSQL: tsvector, tsquery & @@

to_tsvector — the document side

to_tsquery & the @@ operator

Your Turn: complete the match

🎯 Your Turn: tsvector @@ tsquery

3. Boolean & Phrase Search

AND / OR / NOT / phrase

4. Ranking Results with ts_rank

ts_rank — relevance scoring

Your Turn: add relevance ranking

🎯 Your Turn: rank & order

5. Make It Fast: GIN Index & Weights

GIN index + weighted vector

6. MySQL: MATCH() AGAINST()

MySQL MATCH … AGAINST

7. Fuzzy Matching for Typos

Trigram similarity & Levenshtein

Common Errors (and the fix)

📘 Quick Reference

Frequently Asked Questions

Mini-Challenge: A Ranked Search, End to End

🎯 Mini-Challenge

🎉 Lesson Complete

Cookie & Privacy Settings

Our Sample Table: `articles`

1. Why Not Just `LIKE '%word%'`?

2. PostgreSQL: `tsvector`, `tsquery` & `@@`

4. Ranking Results with `ts_rank`

6. MySQL: `MATCH() AGAINST()`