Back

    Indexing Internals: B-Trees, Hash & Bitmap Indexes

    Understand how different index structures work under the hood and when to use each one.

    🎯 What You'll Learn

    • How B-Tree indexes store and search data in O(log n) time
    • When Hash indexes beat B-Trees (and when they don't)
    • Bitmap indexes for low-cardinality columns
    • GIN and GiST indexes for full-text, JSON, and spatial queries
    • A decision framework for choosing the right index type

    🌳 B-Tree Indexes β€” The Workhorse

    Think of a B-Tree like a library card catalog. Instead of scanning every book, you flip to the right drawer (branch), then the right card (leaf). Most databases default to B-Tree indexes because they handle equality, ranges, sorting, and prefix searches efficiently.

    How it works: Data is stored in sorted leaf nodes connected in a doubly-linked list. Internal nodes act as signposts directing the search. A table with 1 million rows needs only about 20 comparisons (logβ‚‚ 1,000,000 β‰ˆ 20) to find any value.

    πŸ’‘ Pro Tip β€” Composite Index Prefix Rule

    A composite index on (A, B, C) can serve queries filtering on A, A+B, or A+B+C β€” but NOT B alone or C alone. Always put the most-filtered column first.

    B-Tree Index Basics

    Create B-Tree indexes and understand the leftmost prefix rule

    Try it Yourself Β»
    SQL
    -- B-Tree Index (default in most databases)
    -- Best for: equality, range, sorting, prefix LIKE
    CREATE INDEX idx_employees_salary
    ON employees (salary);
    
    -- The B-Tree stores values in sorted order
    -- Lookup: O(log n) β€” like a phonebook binary search
    SELECT * FROM employees
    WHERE salary BETWEEN 50000 AND 80000;
    -- The index finds the starting leaf, then scans
    -- sequentially until salary > 80000
    
    -- Composite B-Tree: leftmost prefix rule applies
    CREATE INDEX idx_emp_dept_hire
    ON employees (depart
    ...

    #️⃣ Hash Indexes β€” Lightning-Fast Equality

    A Hash index is like a dictionary with perfect alphabetical tabs β€” you jump straight to the exact page. It computes a hash of the value and looks up the corresponding bucket in O(1) average time. The trade-off: it can only do exact equality. No ranges, no sorting, no LIKE.

    ⚠️ Common Mistake

    Creating a Hash index on a column you later need to sort or range-filter. In practice, B-Tree handles equality almost as fast AND supports everything else. Use Hash only when benchmarks prove a measurable improvement.

    Hash Index Example

    Understand when hash indexes outperform B-Trees

    Try it Yourself Β»
    SQL
    -- Hash Index (PostgreSQL, MySQL Memory engine)
    -- Best for: exact equality lookups ONLY
    -- Cannot do: range scans, sorting, LIKE
    
    -- PostgreSQL hash index
    CREATE INDEX idx_users_email_hash
    ON users USING hash (email);
    
    -- βœ… Perfect for equality
    SELECT * FROM users WHERE email = 'alice@example.com';
    -- O(1) average lookup β€” faster than B-Tree for equality
    
    -- ❌ Hash indexes CANNOT do this:
    -- WHERE email LIKE 'alice%'     -- no prefix search
    -- ORDER BY email                -- no sorting
    -- WHER
    ...

    πŸ—ΊοΈ Bitmap Indexes β€” Low-Cardinality Power

    Imagine you have a warehouse of boxes labeled only "Red", "Blue", or "Green". A bitmap index creates a bit-string for each color: Red = 110010…, Blue = 001100…. To find "Red AND Blue", just AND the bitmaps β€” extremely fast for combining multiple low-cardinality filters.

    Bitmap Index & Scans

    See how PostgreSQL combines indexes with bitmap scans

    Try it Yourself Β»
    SQL
    -- Bitmap Index (Oracle, PostgreSQL uses on-the-fly)
    -- Best for: low-cardinality columns (few distinct values)
    -- e.g., status, gender, boolean flags
    
    -- Oracle syntax (explicit bitmap index)
    -- CREATE BITMAP INDEX idx_orders_status
    -- ON orders (status);
    
    -- PostgreSQL creates bitmap scans automatically
    -- when combining multiple indexes:
    EXPLAIN ANALYZE
    SELECT * FROM orders
    WHERE status = 'shipped'
      AND region = 'west'
      AND priority = 'high';
    -- PostgreSQL may do:
    -- Bitmap Index Scan on id
    ...

    πŸ” GIN & GiST β€” Specialized Indexes

    GIN (Generalized Inverted Index) is like the index at the back of a textbook β€” it maps each word to every page it appears on. Perfect for full-text search, JSONB containment, and array overlap queries.

    GiST (Generalized Search Tree) handles geometric and range data. Think of it as a spatial filing system that can answer "find the nearest" queries efficiently.

    GIN & GiST Indexes

    Full-text search, JSONB, and spatial indexes

    Try it Yourself Β»
    SQL
    -- GIN Index (Generalized Inverted Index)
    -- Best for: full-text search, arrays, JSONB
    CREATE INDEX idx_articles_search
    ON articles USING gin (to_tsvector('english', content));
    
    SELECT * FROM articles
    WHERE to_tsvector('english', content) @@ to_tsquery('database & performance');
    
    -- GIN for JSONB containment
    CREATE INDEX idx_events_data
    ON events USING gin (metadata jsonb_path_ops);
    
    SELECT * FROM events
    WHERE metadata @> '{"type": "purchase", "region": "EU"}';
    
    -- GiST Index (Generalized Search
    ...

    🧭 Choosing the Right Index

    Don't index blindly β€” every index slows writes and consumes storage. Follow this decision framework:

    Index Decision Framework

    Step-by-step guide to choosing and validating indexes

    Try it Yourself Β»
    SQL
    -- Decision framework: which index type to use?
    
    -- Step 1: What queries hit this column?
    -- Equality only       β†’ Hash (or B-Tree)
    -- Equality + Range    β†’ B-Tree
    -- Full-text search    β†’ GIN
    -- Geometric/Spatial   β†’ GiST
    -- Low cardinality     β†’ Bitmap (Oracle) / let DB decide
    
    -- Step 2: Check if an index helps at all
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;
    -- Look for "Seq Scan" β†’ needs index
    -- Look for "Index Scan" β†’ already indexed
    
    -- Step 3: Measure the impact
    -- Be
    ...

    πŸ“‹ Quick Reference

    Index TypeBest ForLimitation
    B-TreeEquality, range, sortGeneral-purpose (rarely wrong)
    HashExact equality onlyNo range, sort, or LIKE
    BitmapLow-cardinality + AND/ORHigh write overhead
    GINFull-text, JSONB, arraysSlow to build, large size
    GiSTSpatial, range, nearestLossy β€” needs recheck

    πŸŽ‰ Lesson Complete!

    You now understand the internals of B-Tree, Hash, Bitmap, GIN, and GiST indexes. Next, learn how to read execution plans and optimize queries!

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy Policy β€’ Terms of Service