Advanced Track • Performance
Buffer Pool, Caches & Memory
By the end of this lesson you'll understand why the first run of a query is slow and the second is instant — and you'll be able to measure your database's cache hit ratio, reason about how to size the buffer pool, and tell apart the memory knobs that actually matter (shared_buffers / innodb_buffer_pool_size, work_mem, effective_cache_size).
What You'll Learn
- ✓How databases cache data in pages inside the buffer pool
- ✓What the cache hit ratio means and how to measure it
- ✓Why the first query is slow and the second is fast
- ✓Dirty pages, WAL, and checkpoints in plain English
- ✓How eviction works (LRU) when memory fills up
- ✓How to size the buffer pool — and what work_mem really does
EXPLAIN (ANALYZE, BUFFERS) output. These queries read live server internals (pg_statio_user_tables, InnoDB status), so they need a real PostgreSQL or MySQL server — they won't run in a SQLite sandbox. Copy them into db-fiddle.com (Postgres mode) or your own instance. Every example shows the expected output so you can follow along either way.Real-world analogy: the chef's countertop
Picture a busy kitchen. The walk-in fridge is the disk: it holds everything, but every trip there is slow. The countertop is the buffer pool — a small, fast space right next to the chef where the most-used ingredients sit ready to grab.
The first time the chef needs eggs, they walk to the fridge (a slow disk read) and bring them to the counter. Every time after that, the eggs are already on the counter — instant (a cache hit). When the counter fills up, the chef clears off whatever hasn't been touched in a while to make room (eviction). And if they crack and whisk some eggs, that bowl is "dirty" — it must eventually be written back so the fridge's copy stays correct (a dirty page flushed at a checkpoint).
1. Pages and the Buffer Pool
Databases don't read your table one row at a time. They read it in fixed-size chunks called pages (8 KB in PostgreSQL, 16 KB by default in MySQL/InnoDB). A page is the smallest unit of data the engine moves between disk and memory.
The buffer pool is a big slab of RAM where the database keeps recently-used pages so it doesn't have to go back to disk. In PostgreSQL it's called shared_buffers; in MySQL/InnoDB it's the buffer pool, sized by innodb_buffer_pool_size. Same idea, different name.
When a query needs a page, the engine first checks the pool. If the page is there, that's a cache hit (fast, no disk). If not, it's a miss — the page is read from disk and copied into the pool so the next read is a hit. That's the whole secret behind "first query slow, second query fast".
Why the 2nd run is faster
The same query: cold cache (disk reads) vs warm cache (memory hits).
-- Run this twice in a row and watch the timing change.
-- The numbers below are typical for a few-million-row table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- 1st run (cold cache — pages are on disk):
-- Buffers: shared read=1843 ← 1843 pages fetched FROM DISK
-- Execution Time: 184.512 ms ← slow
-- 2nd run (warm cache — pages now live in the buffer pool):
-- Buffers: shared hit=1843 ← 1843 pages served FROM MEMORY
-- Execution Ti
...2. The Cache Hit Ratio
The single most useful number for memory health is the cache hit ratio: of all the page lookups your database did, what fraction were already in memory? The formula is simply hits / (hits + reads), where reads means "had to go to disk".
PostgreSQL tracks these counters per table in the system view pg_statio_user_tables — heap_blks_hit for pages found in the pool and heap_blks_read for pages read from disk. Sum them up and you get your whole-database ratio.
Measure the hit ratio (PostgreSQL)
Compute hits / (hits + reads) from pg_statio_user_tables.
-- Cache hit ratio = how often a page was already in memory
-- instead of having to be read from disk. Higher is better.
-- PostgreSQL — per-table I/O counters live in pg_statio_user_tables:
SELECT
SUM(heap_blks_hit) AS cache_hits, -- pages served from the buffer pool
SUM(heap_blks_read) AS disk_reads, -- pages that had to be read from disk
ROUND(
SUM(heap_blks_hit)::decimal
/ NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0)
* 100, 2
) AS hit_ra
...Result — 1 row:
| cache_hits | disk_reads | hit_ratio_pct |
|---|---|---|
| 4950000 | 50000 | 99 |
Read that result like a doctor reads a pulse: 99%+ is healthy for a transactional database. A ratio that's drifting down toward 90% usually means your working set — the slice of data your queries actually touch — no longer fits in the buffer pool, so the engine keeps fetching pages from disk.
Your Turn: compute a hit ratio
Fill in the blanks so the query divides hits by total lookups. The expected answer is in the comments so you can check yourself.
🎯 Your Turn: hit ratio formula
Put heap_blks_hit and heap_blks_read in the right blanks.
-- 🎯 YOUR TURN — fill in the two blanks, then press "Try it Yourself".
-- Goal: compute the cache hit ratio from these two counters.
-- heap_blks_hit = 4,950,000 (pages found in memory)
-- heap_blks_read = 50,000 (pages read from disk)
SELECT
ROUND(
___::decimal -- 👉 the HITS counter goes here
/ (___ + ___) -- 👉 hits + reads = total page lookups
* 100, 2
) AS hit_ratio_pct
FROM pg_statio_user_tables;
-- ✅
...3. The Same Idea in MySQL/InnoDB
MySQL's InnoDB engine has the exact same mechanism under a different name. The pool is sized by innodb_buffer_pool_size, and on a dedicated database server the convention is to give it roughly 70–80% of total RAM (higher than PostgreSQL's 25%, because InnoDB does its own caching rather than leaning on the operating system).
InnoDB reports two counters: Innodb_buffer_pool_read_requests (every page lookup) and Innodb_buffer_pool_reads (the lookups that missed and went to disk). The hit ratio is 1 − reads / read_requests.
Buffer pool in MySQL/InnoDB
innodb_buffer_pool_size and the Performance Schema counters.
-- MySQL / InnoDB calls the same thing the BUFFER POOL,
-- sized by innodb_buffer_pool_size.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Default is small (often 128MB) — too low for production.
-- Rule of thumb on a dedicated DB server: ~70-80% of total RAM.
-- InnoDB exposes hit/miss counters in a status view:
SELECT
VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'Innodb_buffer_pool_read_requests', -- logical reads (RAM lookups)
...Result — 2 rows:
| VARIABLE_NAME | VARIABLE_VALUE |
|---|---|
| Innodb_buffer_pool_read_requests | 50000000 |
| Innodb_buffer_pool_reads | 12000 |
4. Dirty Pages, WAL & Checkpoints
Reads are only half the story. When you UPDATE a row, the engine changes the page in the buffer pool first, not on disk. That modified-in-memory page is now a dirty page: the version in RAM is newer than the version on disk.
So how is your change safe if the server loses power before the dirty page reaches disk? The WAL (Write-Ahead Log — InnoDB calls it the redo log) is the answer. Before the change is acknowledged, a tiny record of it is appended to the WAL, a sequential file that's extremely fast to write. If the server crashes, the database replays the WAL on restart to reconstruct any dirty pages that never got flushed. The rule is in the name: write the log ahead of the data.
A checkpoint is the periodic, batched job that flushes dirty pages from the pool down to the data files and marks that portion of the WAL as no longer needed. Checkpoints are spread out on purpose (PostgreSQL's checkpoint_completion_target) so the disk isn't slammed with a write storm all at once.
5. Eviction — What Happens When Memory Fills Up
The buffer pool is finite, so eventually it fills. To load a new page, the engine must evict an old one. Both PostgreSQL and InnoDB use a variant of LRU — Least Recently Used — meaning the page that hasn't been touched for the longest time is the first to go.
This is why your hottest data tends to stay resident: every time a page is read, it's marked as recently used, so popular pages survive while one-off scans age out. It's also why a giant SELECT * over a cold table can briefly hurt — it can flood the pool with pages you'll never reuse, evicting genuinely hot ones (InnoDB's "midpoint insertion" LRU is specifically designed to resist exactly this).
Back to the kitchen: the chef keeps salt and oil on the counter all shift because they're touched constantly, but that exotic spice used once at lunch gets cleared off to make room. LRU is just "clear off whatever you haven't reached for in the longest time".
6. The Memory Knobs That Matter
Beginners often think every memory setting is "the cache". It isn't. There are three knobs worth knowing, and confusing them is the most common tuning mistake:
shared_buffers(PostgreSQL) /innodb_buffer_pool_size(MySQL) — the actual cache. One shared pool of data pages for the whole server. This is what your hit ratio depends on.work_mem— memory for one sort or hash operation, allocated per operation, per connection. When a sort doesn't fit inwork_memit spills to disk (slow). The trap: it is not the cache and it is not per-server — 100 connections each doing 2 sorts at 256 MB can demand 50 GB.effective_cache_size— a hint, not an allocation. It allocates zero memory. It just tells the query planner roughly how much cache (buffer pool + OS file cache) probably exists, which nudges it toward index scans. Set it high (~75% of RAM) without fear.
shared_buffers reserves RAM for caching data, work_mem reserves RAM per query operation for sorting, and effective_cache_size reserves nothing — it only informs the planner.Your Turn: size the buffer pool
No SQL to run here — just reason about the numbers. Fill in the two blanks using the 25% / 75% conventions for a 64 GB box.
🎯 Your Turn: sizing a 64GB server
Work out shared_buffers and effective_cache_size.
-- 🎯 YOUR TURN — fill in the blanks to SIZE a buffer pool.
-- Scenario: a dedicated PostgreSQL box with 64 GB of RAM.
-- The convention: shared_buffers ≈ 25% of RAM, and
-- effective_cache_size ≈ 75% of RAM (a HINT, not an allocation).
-- shared_buffers = ___ GB -- 👉 25% of 64 GB
-- effective_cache_size = ___ GB -- 👉 75% of 64 GB
-- ✅ Expected: shared_buffers = 16GB, effective_cache_size = 48GB.
-- shared_buffers actually reserves RAM; effective_cache_size only
-- TELLS the pla
...Common Mistakes (and the fix)
- Buffer pool too small: a low hit ratio (sliding under ~95%) and constant disk reads mean your working set doesn't fit. Increase
shared_buffers/innodb_buffer_pool_size— or add RAM. - Buffer pool too large (PostgreSQL): pushing
shared_bufferspast ~40% of RAM starves the OS file cache, which PostgreSQL also relies on. You can end up double-caching and slower. 25% is the safe default. - Ignoring warm-up: right after a restart the pool is empty, so every query is "first-run slow". Don't benchmark a cold server — warm it (or use
pg_prewarm) before judging performance. - Confusing
shared_bufferswithwork_mem: raisingwork_memdoes nothing for the cache hit ratio, and setting it high globally can exhaust RAM across many connections. It's per-operation, not per-server. - Low hit ratio "fixed" by
effective_cache_size: it allocates no memory and can't change your hit ratio — it only influences planner choices. Don't expect it to cache anything.
📘 Quick Reference
| Term / Setting | What it is |
|---|---|
| Page | Fixed chunk read/written as a unit (8KB PG, 16KB InnoDB) |
| Buffer pool | RAM cache of data pages (shared_buffers / innodb_buffer_pool_size) |
| Cache hit | Page found in memory — fast, no disk |
| Hit ratio | hits / (hits + reads); aim for 99%+ |
| Dirty page | Modified in memory, not yet written to disk |
| WAL / redo log | Sequential log written before data, for crash recovery |
| Checkpoint | Batched flush of dirty pages to the data files |
| LRU eviction | Drops the least-recently-used page to free space |
| work_mem | Per-operation sort/hash memory (NOT the cache) |
| effective_cache_size | Planner hint about total cache (allocates nothing) |
Frequently Asked Questions
Q: Why is the first run of a query slow but the second is fast?
The first run reads pages from disk (a cache miss) and copies them into the buffer pool. The second run finds those same pages already in RAM (cache hits), so it skips the slow disk I/O entirely — often 50–100x faster.
Q: Should I just make the buffer pool as big as possible?
No. In PostgreSQL, going past ~40% of RAM starves the OS file cache it also depends on, so ~25% is the sweet spot. InnoDB caches more itself, so 70–80% is normal there. Either way, leave room for the OS, connections, and per-query work_mem.
Q: Is effective_cache_size the same as shared_buffers?
No — and this trips up almost everyone. shared_buffers actually reserves RAM to cache pages. effective_cache_size reserves nothing; it's a number that tells the planner how much cache likely exists so it favours index scans. Setting it high is free.
Q: My hit ratio is great but writes feel slow. Why?
The buffer pool speeds up reads; writes are bounded by the WAL and checkpoints. If checkpoints bunch up you get write spikes — spread them out (raise checkpoint_completion_target, tune max_wal_size) so dirty pages flush gradually rather than all at once.
Mini-Challenge: Find the Cold Tables
Put it all together — a brief, a blank canvas, and the expected shape of the result in the comments. Write it, then run it against a real Postgres instance to confirm.
🎯 Mini-Challenge
Per-table hit ratio, worst-first, from pg_statio_user_tables.
-- 🎯 MINI-CHALLENGE
-- Using ONLY what this lesson covered (pg_statio counters + the
-- hit-ratio formula), write ONE query that reports, per table:
-- 1. the table name (relname)
-- 2. heap_blks_hit and heap_blks_read
-- 3. a hit_ratio_pct column, rounded to 2 decimals
-- 4. ordered so the WORST hit ratio is at the top (find cold tables)
--
-- ✅ Expected: one row per user table, e.g.
-- orders | 4900000 | 100000 | 98.00
-- products | 500000 | 250 | 99.95
-- ...with the
...🎉 Lesson Complete
- ✅ Databases cache data in fixed-size pages inside the buffer pool
- ✅ The cache hit ratio is
hits / (hits + reads)— aim for 99%+ - ✅ "First slow, second fast" = a disk miss warming the pool for the next hit
- ✅ Dirty pages are protected by the WAL and flushed at checkpoints
- ✅ LRU eviction drops the least-recently-used page when memory fills
- ✅
shared_buffersis the cache;work_memis per-sort;effective_cache_sizeis only a hint - ✅ Next: Massive Data Handling — bulk inserts and ETL at scale
Sign up for free to track which lessons you've completed and get learning reminders.