Skip to main content
    Back

    Advanced Track • Performance

    Buffer Pool, Caches & Memory

    By the end of this lesson you'll understand why the first run of a query is slow and the second is instant — and you'll be able to measure your database's cache hit ratio, reason about how to size the buffer pool, and tell apart the memory knobs that actually matter (shared_buffers / innodb_buffer_pool_size, work_mem, effective_cache_size).

    What You'll Learn

    • How databases cache data in pages inside the buffer pool
    • What the cache hit ratio means and how to measure it
    • Why the first query is slow and the second is fast
    • Dirty pages, WAL, and checkpoints in plain English
    • How eviction works (LRU) when memory fills up
    • How to size the buffer pool — and what work_mem really does

    Real-world analogy: the chef's countertop

    Picture a busy kitchen. The walk-in fridge is the disk: it holds everything, but every trip there is slow. The countertop is the buffer pool — a small, fast space right next to the chef where the most-used ingredients sit ready to grab.

    The first time the chef needs eggs, they walk to the fridge (a slow disk read) and bring them to the counter. Every time after that, the eggs are already on the counter — instant (a cache hit). When the counter fills up, the chef clears off whatever hasn't been touched in a while to make room (eviction). And if they crack and whisk some eggs, that bowl is "dirty" — it must eventually be written back so the fridge's copy stays correct (a dirty page flushed at a checkpoint).

    1. Pages and the Buffer Pool

    Databases don't read your table one row at a time. They read it in fixed-size chunks called pages (8 KB in PostgreSQL, 16 KB by default in MySQL/InnoDB). A page is the smallest unit of data the engine moves between disk and memory.

    The buffer pool is a big slab of RAM where the database keeps recently-used pages so it doesn't have to go back to disk. In PostgreSQL it's called shared_buffers; in MySQL/InnoDB it's the buffer pool, sized by innodb_buffer_pool_size. Same idea, different name.

    When a query needs a page, the engine first checks the pool. If the page is there, that's a cache hit (fast, no disk). If not, it's a miss — the page is read from disk and copied into the pool so the next read is a hit. That's the whole secret behind "first query slow, second query fast".

    Why the 2nd run is faster

    The same query: cold cache (disk reads) vs warm cache (memory hits).

    Try it Yourself »
    SQL
    -- Run this twice in a row and watch the timing change.
    -- The numbers below are typical for a few-million-row table.
    
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;
    
    -- 1st run (cold cache — pages are on disk):
    --   Buffers: shared read=1843        ← 1843 pages fetched FROM DISK
    --   Execution Time: 184.512 ms       ← slow
    
    -- 2nd run (warm cache — pages now live in the buffer pool):
    --   Buffers: shared hit=1843         ← 1843 pages served FROM MEMORY
    --   Execution Ti
    ...

    2. The Cache Hit Ratio

    The single most useful number for memory health is the cache hit ratio: of all the page lookups your database did, what fraction were already in memory? The formula is simply hits / (hits + reads), where reads means "had to go to disk".

    PostgreSQL tracks these counters per table in the system view pg_statio_user_tablesheap_blks_hit for pages found in the pool and heap_blks_read for pages read from disk. Sum them up and you get your whole-database ratio.

    Measure the hit ratio (PostgreSQL)

    Compute hits / (hits + reads) from pg_statio_user_tables.

    Try it Yourself »
    SQL
    -- Cache hit ratio = how often a page was already in memory
    -- instead of having to be read from disk. Higher is better.
    
    -- PostgreSQL — per-table I/O counters live in pg_statio_user_tables:
    SELECT
        SUM(heap_blks_hit)   AS cache_hits,   -- pages served from the buffer pool
        SUM(heap_blks_read)  AS disk_reads,   -- pages that had to be read from disk
        ROUND(
            SUM(heap_blks_hit)::decimal
            / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0)
            * 100, 2
        ) AS hit_ra
    ...

    Result — 1 row:

    cache_hitsdisk_readshit_ratio_pct
    49500005000099

    Read that result like a doctor reads a pulse: 99%+ is healthy for a transactional database. A ratio that's drifting down toward 90% usually means your working set — the slice of data your queries actually touch — no longer fits in the buffer pool, so the engine keeps fetching pages from disk.

    Your Turn: compute a hit ratio

    Fill in the blanks so the query divides hits by total lookups. The expected answer is in the comments so you can check yourself.

    🎯 Your Turn: hit ratio formula

    Put heap_blks_hit and heap_blks_read in the right blanks.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself".
    -- Goal: compute the cache hit ratio from these two counters.
    --   heap_blks_hit  = 4,950,000   (pages found in memory)
    --   heap_blks_read =    50,000   (pages read from disk)
    
    SELECT
        ROUND(
            ___::decimal                       -- 👉 the HITS counter goes here
            / (___  + ___)                     -- 👉 hits + reads = total page lookups
            * 100, 2
        ) AS hit_ratio_pct
    FROM pg_statio_user_tables;
    
    -- ✅
    ...

    3. The Same Idea in MySQL/InnoDB

    MySQL's InnoDB engine has the exact same mechanism under a different name. The pool is sized by innodb_buffer_pool_size, and on a dedicated database server the convention is to give it roughly 70–80% of total RAM (higher than PostgreSQL's 25%, because InnoDB does its own caching rather than leaning on the operating system).

    InnoDB reports two counters: Innodb_buffer_pool_read_requests (every page lookup) and Innodb_buffer_pool_reads (the lookups that missed and went to disk). The hit ratio is 1 − reads / read_requests.

    Buffer pool in MySQL/InnoDB

    innodb_buffer_pool_size and the Performance Schema counters.

    Try it Yourself »
    SQL
    -- MySQL / InnoDB calls the same thing the BUFFER POOL,
    -- sized by innodb_buffer_pool_size.
    
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- Default is small (often 128MB) — too low for production.
    -- Rule of thumb on a dedicated DB server: ~70-80% of total RAM.
    
    -- InnoDB exposes hit/miss counters in a status view:
    SELECT
        VARIABLE_NAME, VARIABLE_VALUE
    FROM performance_schema.global_status
    WHERE VARIABLE_NAME IN (
        'Innodb_buffer_pool_read_requests',  -- logical reads (RAM lookups)
       
    ...

    Result — 2 rows:

    VARIABLE_NAMEVARIABLE_VALUE
    Innodb_buffer_pool_read_requests50000000
    Innodb_buffer_pool_reads12000

    4. Dirty Pages, WAL & Checkpoints

    Reads are only half the story. When you UPDATE a row, the engine changes the page in the buffer pool first, not on disk. That modified-in-memory page is now a dirty page: the version in RAM is newer than the version on disk.

    So how is your change safe if the server loses power before the dirty page reaches disk? The WAL (Write-Ahead Log — InnoDB calls it the redo log) is the answer. Before the change is acknowledged, a tiny record of it is appended to the WAL, a sequential file that's extremely fast to write. If the server crashes, the database replays the WAL on restart to reconstruct any dirty pages that never got flushed. The rule is in the name: write the log ahead of the data.

    A checkpoint is the periodic, batched job that flushes dirty pages from the pool down to the data files and marks that portion of the WAL as no longer needed. Checkpoints are spread out on purpose (PostgreSQL's checkpoint_completion_target) so the disk isn't slammed with a write storm all at once.

    5. Eviction — What Happens When Memory Fills Up

    The buffer pool is finite, so eventually it fills. To load a new page, the engine must evict an old one. Both PostgreSQL and InnoDB use a variant of LRULeast Recently Used — meaning the page that hasn't been touched for the longest time is the first to go.

    This is why your hottest data tends to stay resident: every time a page is read, it's marked as recently used, so popular pages survive while one-off scans age out. It's also why a giant SELECT * over a cold table can briefly hurt — it can flood the pool with pages you'll never reuse, evicting genuinely hot ones (InnoDB's "midpoint insertion" LRU is specifically designed to resist exactly this).

    Back to the kitchen: the chef keeps salt and oil on the counter all shift because they're touched constantly, but that exotic spice used once at lunch gets cleared off to make room. LRU is just "clear off whatever you haven't reached for in the longest time".

    6. The Memory Knobs That Matter

    Beginners often think every memory setting is "the cache". It isn't. There are three knobs worth knowing, and confusing them is the most common tuning mistake:

    • shared_buffers (PostgreSQL) / innodb_buffer_pool_size (MySQL) — the actual cache. One shared pool of data pages for the whole server. This is what your hit ratio depends on.
    • work_mem — memory for one sort or hash operation, allocated per operation, per connection. When a sort doesn't fit in work_mem it spills to disk (slow). The trap: it is not the cache and it is not per-server — 100 connections each doing 2 sorts at 256 MB can demand 50 GB.
    • effective_cache_sizea hint, not an allocation. It allocates zero memory. It just tells the query planner roughly how much cache (buffer pool + OS file cache) probably exists, which nudges it toward index scans. Set it high (~75% of RAM) without fear.

    Your Turn: size the buffer pool

    No SQL to run here — just reason about the numbers. Fill in the two blanks using the 25% / 75% conventions for a 64 GB box.

    🎯 Your Turn: sizing a 64GB server

    Work out shared_buffers and effective_cache_size.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — fill in the blanks to SIZE a buffer pool.
    -- Scenario: a dedicated PostgreSQL box with 64 GB of RAM.
    -- The convention: shared_buffers ≈ 25% of RAM, and
    -- effective_cache_size ≈ 75% of RAM (a HINT, not an allocation).
    
    -- shared_buffers  = ___ GB    -- 👉 25% of 64 GB
    -- effective_cache_size = ___ GB    -- 👉 75% of 64 GB
    
    -- ✅ Expected: shared_buffers = 16GB, effective_cache_size = 48GB.
    --    shared_buffers actually reserves RAM; effective_cache_size only
    --    TELLS the pla
    ...

    Common Mistakes (and the fix)

    • Buffer pool too small: a low hit ratio (sliding under ~95%) and constant disk reads mean your working set doesn't fit. Increase shared_buffers / innodb_buffer_pool_size — or add RAM.
    • Buffer pool too large (PostgreSQL): pushing shared_buffers past ~40% of RAM starves the OS file cache, which PostgreSQL also relies on. You can end up double-caching and slower. 25% is the safe default.
    • Ignoring warm-up: right after a restart the pool is empty, so every query is "first-run slow". Don't benchmark a cold server — warm it (or use pg_prewarm) before judging performance.
    • Confusing shared_buffers with work_mem: raising work_mem does nothing for the cache hit ratio, and setting it high globally can exhaust RAM across many connections. It's per-operation, not per-server.
    • Low hit ratio "fixed" by effective_cache_size: it allocates no memory and can't change your hit ratio — it only influences planner choices. Don't expect it to cache anything.

    📘 Quick Reference

    Term / SettingWhat it is
    PageFixed chunk read/written as a unit (8KB PG, 16KB InnoDB)
    Buffer poolRAM cache of data pages (shared_buffers / innodb_buffer_pool_size)
    Cache hitPage found in memory — fast, no disk
    Hit ratiohits / (hits + reads); aim for 99%+
    Dirty pageModified in memory, not yet written to disk
    WAL / redo logSequential log written before data, for crash recovery
    CheckpointBatched flush of dirty pages to the data files
    LRU evictionDrops the least-recently-used page to free space
    work_memPer-operation sort/hash memory (NOT the cache)
    effective_cache_sizePlanner hint about total cache (allocates nothing)

    Frequently Asked Questions

    Q: Why is the first run of a query slow but the second is fast?

    The first run reads pages from disk (a cache miss) and copies them into the buffer pool. The second run finds those same pages already in RAM (cache hits), so it skips the slow disk I/O entirely — often 50–100x faster.

    Q: Should I just make the buffer pool as big as possible?

    No. In PostgreSQL, going past ~40% of RAM starves the OS file cache it also depends on, so ~25% is the sweet spot. InnoDB caches more itself, so 70–80% is normal there. Either way, leave room for the OS, connections, and per-query work_mem.

    Q: Is effective_cache_size the same as shared_buffers?

    No — and this trips up almost everyone. shared_buffers actually reserves RAM to cache pages. effective_cache_size reserves nothing; it's a number that tells the planner how much cache likely exists so it favours index scans. Setting it high is free.

    Q: My hit ratio is great but writes feel slow. Why?

    The buffer pool speeds up reads; writes are bounded by the WAL and checkpoints. If checkpoints bunch up you get write spikes — spread them out (raise checkpoint_completion_target, tune max_wal_size) so dirty pages flush gradually rather than all at once.

    Mini-Challenge: Find the Cold Tables

    Put it all together — a brief, a blank canvas, and the expected shape of the result in the comments. Write it, then run it against a real Postgres instance to confirm.

    🎯 Mini-Challenge

    Per-table hit ratio, worst-first, from pg_statio_user_tables.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE
    -- Using ONLY what this lesson covered (pg_statio counters + the
    -- hit-ratio formula), write ONE query that reports, per table:
    --   1. the table name (relname)
    --   2. heap_blks_hit and heap_blks_read
    --   3. a hit_ratio_pct column, rounded to 2 decimals
    --   4. ordered so the WORST hit ratio is at the top (find cold tables)
    --
    -- ✅ Expected: one row per user table, e.g.
    --    orders   | 4900000 | 100000 | 98.00
    --    products |  500000 |    250  | 99.95
    --    ...with the 
    ...

    🎉 Lesson Complete

    • ✅ Databases cache data in fixed-size pages inside the buffer pool
    • ✅ The cache hit ratio is hits / (hits + reads) — aim for 99%+
    • ✅ "First slow, second fast" = a disk miss warming the pool for the next hit
    • Dirty pages are protected by the WAL and flushed at checkpoints
    • LRU eviction drops the least-recently-used page when memory fills
    • shared_buffers is the cache; work_mem is per-sort; effective_cache_size is only a hint
    • Next: Massive Data Handling — bulk inserts and ETL at scale

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service