Query Optimization: Execution Plans & Cost Estimation
Learn to read EXPLAIN plans, understand the cost model, and rewrite slow queries for maximum performance.
๐ฏ What You'll Learn
- How to read EXPLAIN and EXPLAIN ANALYZE output
- Understanding cost estimation and row estimates
- Query rewrite patterns that unlock index usage
- How join methods (Hash, Merge, Nested Loop) are chosen
- Keeping table statistics fresh for accurate plans
๐ฌ EXPLAIN โ Your Query Microscope
Think of EXPLAIN as an X-ray for your query. Before the database runs your SQL, the query planner builds an execution plan โ a step-by-step recipe. EXPLAIN shows you that recipe, and EXPLAIN ANALYZE actually cooks the dish so you can see real timing.
EXPLAIN & EXPLAIN ANALYZE
See how the optimizer plans and executes a query
-- EXPLAIN shows the query plan WITHOUT running
EXPLAIN
SELECT e.name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 70000
ORDER BY e.salary DESC;
-- EXPLAIN ANALYZE actually runs the query and shows real times
EXPLAIN ANALYZE
SELECT e.name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 70000
ORDER BY e.salary DESC;
-- Key things to look for in the output:
-- 1. Seq Scan vs
...๐ฐ Cost Model โ How the Optimizer Decides
The optimizer assigns a numeric cost to each possible plan and picks the cheapest one. Cost is measured in abstract units based on disk I/O, CPU, and memory. A sequential page read costs 1.0, while a random read costs 4.0 (because spinning disks seek slowly).
๐ก Pro Tip
If your database runs on SSDs, consider lowering random_page_cost to 1.1โ1.5 since SSDs have nearly the same speed for random and sequential reads. This makes the optimizer prefer index scans more often.
Cost Estimation
Compare costs before and after adding an index
-- Understanding cost estimation
-- cost=startup_cost..total_cost
-- rows=estimated row count
-- width=average row width in bytes
EXPLAIN
SELECT * FROM orders WHERE total > 500;
-- Might show: Seq Scan on orders
-- cost=0.00..1250.00 rows=5000 width=64
-- Filter: (total > 500)
-- After adding an index:
CREATE INDEX idx_orders_total ON orders (total);
EXPLAIN
SELECT * FROM orders WHERE total > 500;
-- Now shows: Index Scan using idx_orders_total
-- cost=0.42..285.50 rows=5000 width=64
--
...โ๏ธ Query Rewrite Patterns
The fastest query is the one the database doesn't have to work hard on. These rewrite patterns remove common performance killers:
โ ๏ธ Common Mistake โ Functions on Indexed Columns
Writing WHERE YEAR(date_col) = 2023 forces a full table scan because the index stores raw dates, not years. Always rewrite as a range: WHERE date_col >= '2023-01-01' AND date_col < '2024-01-01'.
Query Rewrite Patterns
Transform slow queries into fast ones
-- Common query rewrites that improve performance
-- โ SLOW: Function on indexed column kills index
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- โ
FAST: Rewrite as range (index-friendly)
SELECT * FROM employees
WHERE hire_date >= '2023-01-01'
AND hire_date < '2024-01-01';
-- โ SLOW: OR on different columns prevents index merge
SELECT * FROM products
WHERE category = 'Electronics' OR brand = 'Apple';
-- โ
FAST: UNION ALL lets each branch use its own index
SELECT * FROM products
...๐ Join Method Selection
When joining tables, the optimizer chooses between three strategies โ each excels in different scenarios. Understanding these helps you predict and influence the plan.
Join Methods Deep Dive
Hash Join, Merge Join, and Nested Loop explained
-- Join order and methods matter enormously
-- Hash Join: builds hash table from smaller table
-- Best when: one table is much smaller than the other
-- Memory: needs enough RAM for hash table
EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c -- small table (1,000 rows)
JOIN orders o ON c.id = o.customer_id -- large table (1M rows)
WHERE o.total > 100;
-- Merge Join: sorts both inputs, then merges
-- Best when: both inputs are already sorted (indexed)
-- Excellent for: large-
...๐ Statistics โ The Optimizer's Crystal Ball
The optimizer can only make good plans if it knows how many rows match each condition. Table statistics (histograms, distinct counts, correlations) provide this information. Stale stats lead to catastrophic plan choices.
Table Statistics
Keep statistics fresh for accurate query plans
-- The optimizer relies on table statistics
-- Stale statistics = bad query plans
-- PostgreSQL: update statistics manually
ANALYZE employees;
ANALYZE orders;
-- Check when stats were last updated
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC;
-- MySQL: update statistics
ANALYZE TABLE employees;
ANALYZE TABLE orders;
-- View column statistics (PostgreSQL)
SELECT attname, n_distinct, most_common_vals, c
...๐ Quick Reference
| Concept | Key Takeaway |
|---|---|
| EXPLAIN | Shows plan without executing |
| EXPLAIN ANALYZE | Runs query + shows actual times |
| Seq Scan | Full table scan โ often needs an index |
| Hash Join | Small table hashed, large probed |
| ANALYZE | Refreshes table statistics |
๐ Lesson Complete!
You can now read execution plans, estimate costs, and rewrite queries for peak performance. Next, explore advanced JOIN patterns!
Sign up for free to track which lessons you've completed and get learning reminders.