Back

    Full-Text Search & Fuzzy Matching

    Build powerful search with tsvector/tsquery, GIN indexes, ranking, highlighting, and typo-tolerant matching.

    ๐ŸŽฏ What You'll Learn

    • PostgreSQL tsvector/tsquery for full-text search
    • GIN indexes and weighted search vectors
    • Ranking results with ts_rank and highlighting with ts_headline
    • Fuzzy matching with trigrams, Levenshtein, and Soundex
    • MySQL MATCH AGAINST and cross-engine comparison

    ๐Ÿ” PostgreSQL Full-Text Search

    LIKE and ILIKE are brute-force searches โ€” they scan every character of every row. Full-text search is like having a librarian who's already indexed every word in every book. Ask for "database optimization" and she instantly pulls the right books.

    tsvector & tsquery

    Core full-text search types and operators

    Try it Yourself ยป
    SQL
    -- PostgreSQL Full-Text Search
    -- Two core types: tsvector (document) and tsquery (search)
    
    -- Convert text to tsvector (searchable tokens):
    SELECT to_tsvector('english',
        'PostgreSQL is a powerful open-source database system');
    -- Result: 'databas':6 'open':4 'open-sourc':4 'postgresql':1
    --         'power':3 'sourc':5 'system':7
    -- Notice: words are stemmed (database โ†’ databas)
    -- Stop words removed (is, a)
    
    -- Create a search query:
    SELECT to_tsquery('english', 'database & powerful');
    -- R
    ...

    โšก Indexing for Speed

    Without an index, full-text search computes tsvectors on every row for every query. A GIN index pre-computes and stores the tokens, making searches instant even on millions of documents.

    ๐Ÿ’ก Pro Tip โ€” Weighted Vectors

    Use setweight() with weights A (highest) through D (lowest) to prioritize title matches over body matches. A title hit for "SQL" should rank higher than a passing mention in paragraph 50.

    ๐Ÿ† Ranking & Highlighting

    Finding matches is only half the battle โ€” you also need to rank them by relevance and show users why each result matched (highlighted snippets).

    Ranking & Highlighting

    ts_rank, ts_rank_cd, and ts_headline

    Try it Yourself ยป
    SQL
    -- Ranking and highlighting search results
    
    -- ts_rank: relevance score based on term frequency
    SELECT title,
           ts_rank(search_vector, query) AS rank,
           ts_rank_cd(search_vector, query) AS rank_cd
    FROM articles,
         to_tsquery('english', 'postgresql & performance') AS query
    WHERE search_vector @@ query
    ORDER BY rank DESC;
    
    -- ts_rank_cd considers cover density (proximity of terms)
    -- Use it when term proximity matters
    
    -- ts_headline: highlight matching terms in results
    SELECT title,
    
    ...

    ๐Ÿ”ฎ Fuzzy Matching

    Users make typos. "Samsnug Galaxy" should still find "Samsung Galaxy." Trigram similarity, Levenshtein distance, and phonetic matching handle this gracefully.

    โš ๏ธ Common Mistake

    Using only full-text search without a fuzzy fallback. If the user misspells a word badly enough, stemming won't help. Combine tsvector @@ tsquery with a similarity() fallback for the best experience.

    Fuzzy Matching

    Trigrams, Levenshtein, and phonetic search

    Try it Yourself ยป
    SQL
    -- Fuzzy matching: find results despite typos/misspellings
    
    -- 1. pg_trgm extension (trigram similarity)
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    
    -- Similarity score (0 to 1):
    SELECT similarity('PostgreSQL', 'Postgre SQL');  -- ~0.7
    SELECT similarity('PostgreSQL', 'MySQL');         -- ~0.1
    
    -- Find similar product names:
    SELECT name, similarity(name, 'Samsnug Galaxy') AS sim
    FROM products
    WHERE similarity(name, 'Samsnug Galaxy') > 0.3
    ORDER BY sim DESC;
    -- Finds "Samsung Galaxy" despite the typo
    ...

    ๐Ÿฌ MySQL Full-Text Search

    MySQL uses MATCH AGAINST instead of tsvector/tsquery. It's simpler but less flexible than PostgreSQL's approach.

    MySQL MATCH AGAINST

    MySQL full-text search and cross-engine comparison

    Try it Yourself ยป
    SQL
    -- MySQL Full-Text Search (MATCH AGAINST)
    
    -- Create a FULLTEXT index:
    -- CREATE FULLTEXT INDEX idx_articles_ft
    -- ON articles (title, content);
    
    -- Natural Language Mode (default):
    -- SELECT title, content,
    --        MATCH(title, content) AGAINST('database performance') AS score
    -- FROM articles
    -- WHERE MATCH(title, content) AGAINST('database performance')
    -- ORDER BY score DESC;
    
    -- Boolean Mode (AND, OR, NOT, phrases):
    -- SELECT title FROM articles
    -- WHERE MATCH(title, content) AGAINST(
    -- 
    ...

    ๐Ÿ“‹ Quick Reference

    TechniqueBest For
    tsvector + GINFast document search with stemming
    pg_trgmTypo-tolerant similarity search
    LevenshteinEdit distance between strings
    SoundexPhonetic matching (names)
    setweight()Prioritize title over body matches

    ๐ŸŽ‰ Lesson Complete!

    You can now implement powerful search with full-text, fuzzy, and phonetic matching. Next, explore JSON and semi-structured data handling!

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy Policy โ€ข Terms of Service