Full-Text Search & Fuzzy Matching
Build powerful search with tsvector/tsquery, GIN indexes, ranking, highlighting, and typo-tolerant matching.
๐ฏ What You'll Learn
- PostgreSQL tsvector/tsquery for full-text search
- GIN indexes and weighted search vectors
- Ranking results with ts_rank and highlighting with ts_headline
- Fuzzy matching with trigrams, Levenshtein, and Soundex
- MySQL MATCH AGAINST and cross-engine comparison
๐ PostgreSQL Full-Text Search
LIKE and ILIKE are brute-force searches โ they scan every character of every row. Full-text search is like having a librarian who's already indexed every word in every book. Ask for "database optimization" and she instantly pulls the right books.
tsvector & tsquery
Core full-text search types and operators
-- PostgreSQL Full-Text Search
-- Two core types: tsvector (document) and tsquery (search)
-- Convert text to tsvector (searchable tokens):
SELECT to_tsvector('english',
'PostgreSQL is a powerful open-source database system');
-- Result: 'databas':6 'open':4 'open-sourc':4 'postgresql':1
-- 'power':3 'sourc':5 'system':7
-- Notice: words are stemmed (database โ databas)
-- Stop words removed (is, a)
-- Create a search query:
SELECT to_tsquery('english', 'database & powerful');
-- R
...โก Indexing for Speed
Without an index, full-text search computes tsvectors on every row for every query. A GIN index pre-computes and stores the tokens, making searches instant even on millions of documents.
๐ก Pro Tip โ Weighted Vectors
Use setweight() with weights A (highest) through D (lowest) to prioritize title matches over body matches. A title hit for "SQL" should rank higher than a passing mention in paragraph 50.
GIN Indexes & Weighted Search
Fast full-text search with auto-updating vectors
-- Speed up full-text search with GIN indexes
-- Option 1: Index a computed tsvector
CREATE INDEX idx_articles_fts
ON articles USING gin (
to_tsvector('english', title || ' ' || content)
);
-- Option 2: Store tsvector in a dedicated column (faster)
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', content), 'B');
-- Weight A = title (highest priority)
-- Weig
...๐ Ranking & Highlighting
Finding matches is only half the battle โ you also need to rank them by relevance and show users why each result matched (highlighted snippets).
Ranking & Highlighting
ts_rank, ts_rank_cd, and ts_headline
-- Ranking and highlighting search results
-- ts_rank: relevance score based on term frequency
SELECT title,
ts_rank(search_vector, query) AS rank,
ts_rank_cd(search_vector, query) AS rank_cd
FROM articles,
to_tsquery('english', 'postgresql & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- ts_rank_cd considers cover density (proximity of terms)
-- Use it when term proximity matters
-- ts_headline: highlight matching terms in results
SELECT title,
...๐ฎ Fuzzy Matching
Users make typos. "Samsnug Galaxy" should still find "Samsung Galaxy." Trigram similarity, Levenshtein distance, and phonetic matching handle this gracefully.
โ ๏ธ Common Mistake
Using only full-text search without a fuzzy fallback. If the user misspells a word badly enough, stemming won't help. Combine tsvector @@ tsquery with a similarity() fallback for the best experience.
Fuzzy Matching
Trigrams, Levenshtein, and phonetic search
-- Fuzzy matching: find results despite typos/misspellings
-- 1. pg_trgm extension (trigram similarity)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Similarity score (0 to 1):
SELECT similarity('PostgreSQL', 'Postgre SQL'); -- ~0.7
SELECT similarity('PostgreSQL', 'MySQL'); -- ~0.1
-- Find similar product names:
SELECT name, similarity(name, 'Samsnug Galaxy') AS sim
FROM products
WHERE similarity(name, 'Samsnug Galaxy') > 0.3
ORDER BY sim DESC;
-- Finds "Samsung Galaxy" despite the typo
...๐ฌ MySQL Full-Text Search
MySQL uses MATCH AGAINST instead of tsvector/tsquery. It's simpler but less flexible than PostgreSQL's approach.
MySQL MATCH AGAINST
MySQL full-text search and cross-engine comparison
-- MySQL Full-Text Search (MATCH AGAINST)
-- Create a FULLTEXT index:
-- CREATE FULLTEXT INDEX idx_articles_ft
-- ON articles (title, content);
-- Natural Language Mode (default):
-- SELECT title, content,
-- MATCH(title, content) AGAINST('database performance') AS score
-- FROM articles
-- WHERE MATCH(title, content) AGAINST('database performance')
-- ORDER BY score DESC;
-- Boolean Mode (AND, OR, NOT, phrases):
-- SELECT title FROM articles
-- WHERE MATCH(title, content) AGAINST(
--
...๐ Quick Reference
| Technique | Best For |
|---|---|
| tsvector + GIN | Fast document search with stemming |
| pg_trgm | Typo-tolerant similarity search |
| Levenshtein | Edit distance between strings |
| Soundex | Phonetic matching (names) |
| setweight() | Prioritize title over body matches |
๐ Lesson Complete!
You can now implement powerful search with full-text, fuzzy, and phonetic matching. Next, explore JSON and semi-structured data handling!
Sign up for free to track which lessons you've completed and get learning reminders.