Skip to main content
    Back

    Advanced Track

    Full-Text Search & Fuzzy Matching

    By the end of this lesson you'll be able to build real, scalable search: index your text, find documents by meaning (not just exact characters), rank the hits by relevance, and tolerate typos. You'll see exactly why LIKE '%word%' falls apart — and what professionals reach for instead.

    What You'll Learn

    • Why LIKE '%...%' can't scale or rank results
    • PostgreSQL tsvector, tsquery, and the @@ match operator
    • Boolean & natural-language search (AND/OR/NOT, phrases)
    • Rank results by relevance with ts_rank
    • Make it fast with a GIN index and weighted vectors
    • MySQL MATCH() AGAINST() plus fuzzy matching for typos

    Our Sample Table: articles

    Every query in this lesson searches this little articles table. The body column is shown shortened, but full-text search reads the whole thing.

    Result:

    idtitlebody (excerpt)
    1Getting Started with DatabasesA database stores rows in tables…
    2Indexing Strategies for Big TablesAn index makes lookups fast; database tuning…
    3Normalisation ExplainedSplit data to remove duplication…
    4When a B-tree Index Beats a Full ScanIndex performance and query speed…
    5A Gentle Intro to NoSQLDocument stores trade joins for speed…

    1. Why Not Just LIKE '%word%'?

    You already know how to filter text with LIKE. So why does every serious app use something else for search? Because a leading-wildcard LIKE '%word%' has three problems that get worse as your data grows.

    📚 Real-world analogy

    LIKE '%word%' is reading every book in the library cover to cover to find a word. Full-text search is the index at the back of each book that a librarian built ahead of time — ask for "database optimization" and she pulls the right books instantly, sorted by how relevant they are.

    The LIKE approach (and why it hurts)

    A full table scan that can't rank or stem.

    Try it Yourself »
    SQL
    -- The "obvious" way to search text: LIKE with wildcards
    SELECT id, title
    FROM articles
    WHERE body LIKE '%database%';
    
    -- This works on tiny tables, but it has three deal-breakers:
    --   1. It can't use a normal index, so it reads EVERY row
    --      and scans EVERY character — a full table scan.
    --   2. It can't rank results. A title hit and a hit buried
    --      in paragraph 90 come back equally "matched".
    --   3. It's literal: 'database' won't match 'databases',
    --      'Database', or a search fo
    ...

    The fix is to pre-process text into a searchable form once, index it, and then search the index. That's what full-text search does.

    2. PostgreSQL: tsvector, tsquery & @@

    PostgreSQL splits search into two data types. A tsvector is the document side: your text broken into normalised root words (called lexemes). A tsquery is the search side: the words you're looking for. The @@ operator returns true when the document matches the query.

    to_tsvector — the document side

    Turn text into stemmed, stop-word-free lexemes.

    Try it Yourself »
    SQL
    -- to_tsvector turns text into a tsvector: a sorted list of
    -- normalised "lexemes" (root words) with their positions.
    SELECT to_tsvector('english',
        'PostgreSQL is a powerful open-source database system');
    
    -- Result (one tsvector value):
    --   'databas':6 'open-sourc':4 'postgresql':1 'power':3
    --   'sourc':5 'system':7
    -- Notice what happened:
    --   • 'powerful' became 'power'      (stemming)
    --   • 'database' became 'databas'    (stemming)
    --   • 'is' and 'a' vanished          (stop words r
    ...

    Result — one tsvector value:

    to_tsvector
    'databas':6 'open-sourc':4 'postgresql':1 'power':3 'sourc':5 'system':7

    to_tsquery & the @@ operator

    Build the search side and match it.

    Try it Yourself »
    SQL
    -- to_tsquery builds the SEARCH side. It is also stemmed,
    -- so your search word matches every form of the word.
    SELECT to_tsquery('english', 'powerful & databases');
    -- Result:  'power' & 'databas'
    
    -- The @@ operator asks "does this document match this query?"
    SELECT to_tsvector('english', 'PostgreSQL is a powerful database')
           @@ to_tsquery('english', 'powerful & databases');
    -- Result:  true
    --   't' even though the text says "database" (singular) and
    --   the query said "databases" (pl
    ...

    Result — the @@ match returns true:

    ?column?
    t

    Your Turn: complete the match

    Fill in the blanks so the query finds every article whose title or body mentions index. Hint: use the friendly query builder for a single plain word. The expected result is in the comments so you can check yourself.

    🎯 Your Turn: tsvector @@ tsquery

    Complete the to_tsquery call to match documents.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself"
    -- Goal: return every article whose title OR body mentions "index".
    
    SELECT id, title
    FROM articles
    WHERE to_tsvector('english', title || ' ' || body)
          @@ ___('english', ___);   -- 👉 1) the query builder for a plain word
                                    --    2) the search word, in single quotes
    
    -- ✅ Expected result: rows 2 and 4
    --    2 | Indexing Strategies for Big Tables
    --    4 | When a B-tree Index Beats a Full Sca
    ...

    3. Boolean & Phrase Search

    Beyond single words, to_tsquery understands a small operator language so users can be precise: combine terms, exclude terms, or demand an exact phrase.

    AND / OR / NOT / phrase

    The & | ! and <-> operators.

    Try it Yourself »
    SQL
    -- tsquery has a small operator language:
    --   &  AND   (both terms must appear)
    --   |  OR    (either term)
    --   !  NOT   (exclude a term)
    --   <-> FOLLOWED BY  (phrase: the words are adjacent)
    
    -- Articles about databases but NOT about MySQL:
    SELECT id, title FROM articles
    WHERE to_tsvector('english', body)
          @@ to_tsquery('english', 'database & !mysql');
    
    -- The exact phrase "full text search":
    SELECT id, title FROM articles
    WHERE to_tsvector('english', body)
          @@ to_tsquery('english'
    ...

    4. Ranking Results with ts_rank

    The @@ operator only answers yes/no — but real search shows the best result first. ts_rank(document, query) returns a relevance score (higher is better) based on how often the terms appear and where. Sort by it with ORDER BY rank DESC.

    Listing the query once in the FROM clause (plainto_tsquery(…) AS query) lets you reuse it in both the SELECT and the WHERE without building it twice. Prefer ts_rank_cd when how close the words sit to each other matters.

    ts_rank — relevance scoring

    Score and sort matches like a search engine.

    Try it Yourself »
    SQL
    -- @@ only answers yes/no. ts_rank gives a relevance SCORE
    -- (higher = more relevant) so you can ORDER results like Google.
    SELECT id, title,
           ts_rank(to_tsvector('english', title || ' ' || body), query) AS rank
    FROM articles,
         plainto_tsquery('english', 'database performance') AS query   -- run query once
    WHERE to_tsvector('english', title || ' ' || body) @@ query
    ORDER BY rank DESC;
    
    -- ts_rank scores on term frequency and position.
    -- ts_rank_cd ("cover density") also rewards terms
    ...

    Result — ordered by relevance, best first:

    idtitlerank
    2Indexing Strategies for Big Tables0.0607
    4When a B-tree Index Beats a Full Scan0.0304
    1Getting Started with Databases0.0152

    Your Turn: add relevance ranking

    The WHERE already finds the matches. Add the scoring function and the sort direction so the most relevant article comes back first.

    🎯 Your Turn: rank & order

    Compute the score and order best-first.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — add relevance ranking to this search.
    -- The WHERE already finds the matches; you must SCORE and SORT them.
    
    SELECT id, title,
           ___(to_tsvector('english', title || ' ' || body), query) AS rank
    FROM articles,
         plainto_tsquery('english', 'index performance') AS query
    WHERE to_tsvector('english', title || ' ' || body) @@ query
    ORDER BY rank ___;   -- 👉 1) the function that returns a relevance score
                         --    2) the direction so the BEST match is first
    
    
    ...

    5. Make It Fast: GIN Index & Weights

    Everything so far recomputes to_tsvector() for every row on every query — fine for demos, far too slow for millions of rows. Store the vector in a column and put a GIN index on it (Generalised INverted index — the same idea as a book's back-of-book index). With setweight() you also tag where each word came from so title hits outrank body hits.

    GIN index + weighted vector

    Pre-store lexemes for instant, weighted search.

    Try it Yourself »
    SQL
    -- Speed: by default Postgres recomputes to_tsvector() for every
    -- row on every query. A GIN index pre-stores the lexemes so the
    -- search jumps straight to matching rows — instant on millions.
    
    -- Best practice (Postgres 12+): a STORED generated column,
    -- so the tsvector is always in sync with the text automatically.
    ALTER TABLE articles
        ADD COLUMN search tsvector
        GENERATED ALWAYS AS (
            setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
            setweight(to_tsvector
    ...

    6. MySQL: MATCH() AGAINST()

    MySQL takes a different route. You add a FULLTEXT index over the text columns, then search with MATCH(columns) AGAINST('terms'). Natural-language mode (the default) ranks by relevance automatically; boolean mode unlocks +/-/"phrase" operators much like Postgres' tsquery.

    MySQL MATCH … AGAINST

    FULLTEXT index, natural-language & boolean mode.

    Try it Yourself »
    SQL
    -- MySQL does full-text search with a FULLTEXT index and
    -- the MATCH(...) AGAINST(...) pair instead of tsvector/tsquery.
    
    -- 1) Add a FULLTEXT index over the columns you'll search:
    CREATE FULLTEXT INDEX idx_articles_ft ON articles (title, body);
    
    -- 2) Natural-language mode (the default): MySQL ranks by relevance.
    --    Putting MATCH in the SELECT returns that relevance score.
    SELECT id, title,
           MATCH(title, body) AGAINST('database performance') AS score
    FROM articles
    WHERE MATCH(title, b
    ...

    7. Fuzzy Matching for Typos

    Full-text search needs the right word (after stemming). But users misspell things — "databse", "Samsnug". Fuzzy matching measures how similar two strings are so a near-miss still matches. Two common tools, both high-level here: trigram similarity (pg_trgm) and Levenshtein distance (fuzzystrmatch).

    Trigram similarity & Levenshtein

    Typo-tolerant matching at a high level.

    Try it Yourself »
    SQL
    -- Full-text search needs the RIGHT word (after stemming).
    -- For typos and misspellings you need FUZZY matching.
    
    -- (a) Trigram similarity — pg_trgm. It compares the 3-letter
    --     chunks two strings share and scores them from 0 to 1.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    
    SELECT similarity('database', 'databse');   -- ~0.55  (typo still close)
    SELECT title FROM articles
    WHERE similarity(title, 'databse') > 0.3     -- catches the misspelling
    ORDER BY similarity(title, 'databse') DESC;
    -- In
    ...

    Common Errors (and the fix)

    • No index, so search crawls: querying to_tsvector(body) @@ … with no GIN index (Postgres) or no FULLTEXT index (MySQL) silently does a full table scan. It "works" on 100 rows and times out on a million — add the index before you ship.
    • Using LIKE for real search: LIKE '%term%' can't use a normal index, can't stem, and can't rank. Reach for it only for tiny tables or exact substring checks, never for a search box.
    • Ignoring stop-words: searching for 'the' & 'of' returns nothing — those words were stripped from the vector. Search on the meaningful words, and remember a search for databases matches database thanks to stemming.
    • Forgetting to rank: filtering with @@ but no ts_rank + ORDER BY returns matches in arbitrary order. Always score and sort, or your "search" lists the worst hit first.
    • Wrong tsquery for user input: feeding a raw phrase to to_tsquery (to_tsquery('english', 'database performance')) errors on the space — use plainto_tsquery for plain user text.

    📘 Quick Reference

    SyntaxPurpose
    to_tsvector('english', t)Text → searchable lexemes (stemmed, stop-words removed)
    to_tsquery('english', 'a & b')Build a search with operators (& | ! <->)
    plainto_tsquery('english', t)Build a search from plain user text (ANDs words)
    doc @@ queryTRUE if the document matches the query
    ts_rank(doc, query)Relevance score — ORDER BY it, DESC
    setweight(v, 'A')Tag lexemes A…D so title outranks body
    USING gin (search)Index a tsvector column for fast search
    MATCH(cols) AGAINST('t')MySQL full-text search (needs a FULLTEXT index)
    similarity(a, b)pg_trgm trigram score 0–1 for typo tolerance

    Frequently Asked Questions

    Q: When is LIKE still fine?

    For exact substring checks on small tables, or a prefix match like LIKE 'abc%' (no leading wildcard), which can use a normal index. For a real search box over lots of text, use full-text search.

    Q: What's the difference between to_tsquery and plainto_tsquery?

    to_tsquery expects operator syntax ('database & !mysql') and errors on a bare space. plainto_tsquery takes free text from a user and ANDs the words for you — safer for search-box input.

    Q: Do I need a GIN index to use full-text search?

    No — it works without one, but every query rescans the table. Once your data grows, a GIN index (Postgres) or FULLTEXT index (MySQL) is what makes search fast.

    Q: Full-text search vs. fuzzy matching — which do I want?

    Both. Full-text finds documents about a topic (with stemming and ranking); fuzzy matching (pg_trgm, Levenshtein) catches typos. Production search usually layers full-text first with a fuzzy fallback.

    Mini-Challenge: A Ranked Search, End to End

    Put it all together — a brief, a blank canvas, and the expected result in the comments. Write it, then copy it into a PostgreSQL playground to confirm.

    🎯 Mini-Challenge

    Search 'tuning' OR 'speed', rank with ts_rank, top 5.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE — a ranked search, end to end
    -- Using ONLY what this lesson covered:
    --   1. Search the articles table for the words "tuning" OR "speed"
    --      (build the query with to_tsquery and the | operator).
    --   2. Compute a relevance column called rank with ts_rank.
    --   3. Return id, title, rank — most relevant FIRST — top 5 only.
    --
    -- ✅ Expected: at most 5 rows, ordered by rank descending,
    --    with the best-matching article at the top.
    
    -- your query here

    🎉 Lesson Complete

    • LIKE '%word%' can't scale, stem, or rank — full-text search can
    • to_tsvector + to_tsquery matched with @@ (with stemming & stop-words)
    • ts_rank scores relevance so you can ORDER BY rank DESC
    • ✅ A GIN index on a weighted tsvector column makes it fast
    • ✅ MySQL does the same with MATCH() AGAINST() and a FULLTEXT index
    • ✅ Fuzzy tools (pg_trgm, Levenshtein) tolerate typos
    • Next: storing and querying JSON & semi-structured data

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service