Indexing Internals: B-Trees, Hash & Bitmap Indexes
Understand how different index structures work under the hood and when to use each one.
π― What You'll Learn
- How B-Tree indexes store and search data in O(log n) time
- When Hash indexes beat B-Trees (and when they don't)
- Bitmap indexes for low-cardinality columns
- GIN and GiST indexes for full-text, JSON, and spatial queries
- A decision framework for choosing the right index type
π³ B-Tree Indexes β The Workhorse
Think of a B-Tree like a library card catalog. Instead of scanning every book, you flip to the right drawer (branch), then the right card (leaf). Most databases default to B-Tree indexes because they handle equality, ranges, sorting, and prefix searches efficiently.
How it works: Data is stored in sorted leaf nodes connected in a doubly-linked list. Internal nodes act as signposts directing the search. A table with 1 million rows needs only about 20 comparisons (logβ 1,000,000 β 20) to find any value.
π‘ Pro Tip β Composite Index Prefix Rule
A composite index on (A, B, C) can serve queries filtering on A, A+B, or A+B+C β but NOT B alone or C alone. Always put the most-filtered column first.
B-Tree Index Basics
Create B-Tree indexes and understand the leftmost prefix rule
-- B-Tree Index (default in most databases)
-- Best for: equality, range, sorting, prefix LIKE
CREATE INDEX idx_employees_salary
ON employees (salary);
-- The B-Tree stores values in sorted order
-- Lookup: O(log n) β like a phonebook binary search
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 80000;
-- The index finds the starting leaf, then scans
-- sequentially until salary > 80000
-- Composite B-Tree: leftmost prefix rule applies
CREATE INDEX idx_emp_dept_hire
ON employees (depart
...#οΈβ£ Hash Indexes β Lightning-Fast Equality
A Hash index is like a dictionary with perfect alphabetical tabs β you jump straight to the exact page. It computes a hash of the value and looks up the corresponding bucket in O(1) average time. The trade-off: it can only do exact equality. No ranges, no sorting, no LIKE.
β οΈ Common Mistake
Creating a Hash index on a column you later need to sort or range-filter. In practice, B-Tree handles equality almost as fast AND supports everything else. Use Hash only when benchmarks prove a measurable improvement.
Hash Index Example
Understand when hash indexes outperform B-Trees
-- Hash Index (PostgreSQL, MySQL Memory engine)
-- Best for: exact equality lookups ONLY
-- Cannot do: range scans, sorting, LIKE
-- PostgreSQL hash index
CREATE INDEX idx_users_email_hash
ON users USING hash (email);
-- β
Perfect for equality
SELECT * FROM users WHERE email = 'alice@example.com';
-- O(1) average lookup β faster than B-Tree for equality
-- β Hash indexes CANNOT do this:
-- WHERE email LIKE 'alice%' -- no prefix search
-- ORDER BY email -- no sorting
-- WHER
...πΊοΈ Bitmap Indexes β Low-Cardinality Power
Imagine you have a warehouse of boxes labeled only "Red", "Blue", or "Green". A bitmap index creates a bit-string for each color: Red = 110010β¦, Blue = 001100β¦. To find "Red AND Blue", just AND the bitmaps β extremely fast for combining multiple low-cardinality filters.
Bitmap Index & Scans
See how PostgreSQL combines indexes with bitmap scans
-- Bitmap Index (Oracle, PostgreSQL uses on-the-fly)
-- Best for: low-cardinality columns (few distinct values)
-- e.g., status, gender, boolean flags
-- Oracle syntax (explicit bitmap index)
-- CREATE BITMAP INDEX idx_orders_status
-- ON orders (status);
-- PostgreSQL creates bitmap scans automatically
-- when combining multiple indexes:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'shipped'
AND region = 'west'
AND priority = 'high';
-- PostgreSQL may do:
-- Bitmap Index Scan on id
...π GIN & GiST β Specialized Indexes
GIN (Generalized Inverted Index) is like the index at the back of a textbook β it maps each word to every page it appears on. Perfect for full-text search, JSONB containment, and array overlap queries.
GiST (Generalized Search Tree) handles geometric and range data. Think of it as a spatial filing system that can answer "find the nearest" queries efficiently.
GIN & GiST Indexes
Full-text search, JSONB, and spatial indexes
-- GIN Index (Generalized Inverted Index)
-- Best for: full-text search, arrays, JSONB
CREATE INDEX idx_articles_search
ON articles USING gin (to_tsvector('english', content));
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('database & performance');
-- GIN for JSONB containment
CREATE INDEX idx_events_data
ON events USING gin (metadata jsonb_path_ops);
SELECT * FROM events
WHERE metadata @> '{"type": "purchase", "region": "EU"}';
-- GiST Index (Generalized Search
...π§ Choosing the Right Index
Don't index blindly β every index slows writes and consumes storage. Follow this decision framework:
Index Decision Framework
Step-by-step guide to choosing and validating indexes
-- Decision framework: which index type to use?
-- Step 1: What queries hit this column?
-- Equality only β Hash (or B-Tree)
-- Equality + Range β B-Tree
-- Full-text search β GIN
-- Geometric/Spatial β GiST
-- Low cardinality β Bitmap (Oracle) / let DB decide
-- Step 2: Check if an index helps at all
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Look for "Seq Scan" β needs index
-- Look for "Index Scan" β already indexed
-- Step 3: Measure the impact
-- Be
...π Quick Reference
| Index Type | Best For | Limitation |
|---|---|---|
| B-Tree | Equality, range, sort | General-purpose (rarely wrong) |
| Hash | Exact equality only | No range, sort, or LIKE |
| Bitmap | Low-cardinality + AND/OR | High write overhead |
| GIN | Full-text, JSONB, arrays | Slow to build, large size |
| GiST | Spatial, range, nearest | Lossy β needs recheck |
π Lesson Complete!
You now understand the internals of B-Tree, Hash, Bitmap, GIN, and GiST indexes. Next, learn how to read execution plans and optimize queries!
Sign up for free to track which lessons you've completed and get learning reminders.