Lesson 46 • Expert
Performance Profiling
Find and fix performance bottlenecks using JFR, VisualVM, JMH benchmarks, and flame graphs.
Before You Start
You should understand Memory Management & GC, JVM Internals, and Thread Pools. Familiarity with Java collections performance characteristics also helps.
What You'll Learn
- ✅ JVM profiling tools: JFR, VisualVM, async-profiler
- ✅ JMH microbenchmarking framework
- ✅ Reading flame graphs and hot method analysis
- ✅ Memory profiling and leak detection
- ✅ GC tuning based on profiling data
- ✅ Common Java performance anti-patterns
1️⃣ The Performance Mindset
Analogy: Performance profiling is like being a detective at a crime scene. You don't guess who did it — you gather evidence (metrics), analyze clues (flame graphs), and follow the trail to the culprit (the hotspot method). The golden rule: "Measure, don't guess."
The 90/10 rule: 90% of execution time is typically spent in 10% of the code. Your job is to find that 10% and optimize it. Everything else is noise.
| Tool | Type | Overhead | Best For |
|---|---|---|---|
| JFR | Built-in recorder | <1% | Production profiling (always-on) |
| VisualVM | GUI profiler | 5-15% | Development debugging |
| async-profiler | Sampling profiler | ~2% | CPU + allocation hotspots |
| JMH | Microbenchmark | N/A | Comparing implementations |
| jcmd | CLI diagnostic | Minimal | Thread dumps, heap info |
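Since JDK 11, JFR ships in every JDK and can even be driven from Java code via the `jdk.jfr.Recording` API, not just command-line flags. A minimal sketch, assuming the built-in "default" profile; the workload and file name are placeholders:

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import java.nio.file.Files;
import java.nio.file.Path;

public class JfrDemo {
    public static Path record() throws Exception {
        // The built-in "default" profile is the same <1% overhead setup used in production
        Configuration config = Configuration.getConfiguration("default");
        try (Recording rec = new Recording(config)) {
            rec.start();
            // Placeholder workload: allocate so allocation events have something to record
            for (int i = 0; i < 1_000; i++) {
                byte[] chunk = new byte[16 * 1024];
            }
            rec.stop();
            Path out = Files.createTempFile("profile", ".jfr");
            rec.dump(out); // write the recording to disk for JMC or `jfr print`
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("JFR recording written to " + record());
    }
}
```

The resulting `.jfr` file opens in JDK Mission Control, or can be inspected from the terminal with the `jfr` tool bundled in the JDK.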
Try It: JMH-Style Benchmarking
// 💡 Try modifying this code and see what happens!
// Microbenchmarking — measure before optimizing
console.log("=== JMH-Style Benchmarking ===\n");
// 1. Benchmark runner
function benchmark(name, fn, iterations = 100000) {
// Warmup (like JMH @Warmup)
for (let i = 0; i < 1000; i++) fn();
let start = performance.now();
for (let i = 0; i < iterations; i++) fn();
let elapsed = (performance.now() - start).toFixed(2);
let opsPerMs = (iterations / parseFloat(elapsed)).toFixed(0);
console.log(`  ${name}: ${elapsed}ms (${opsPerMs} ops/ms)`);
}
// 2. Compare two implementations (illustrative)
benchmark("string +=", () => { let s = ""; for (let i = 0; i < 50; i++) s += i; });
benchmark("array join", () => { let parts = []; for (let i = 0; i < 50; i++) parts.push(i); parts.join(""); });
2️⃣ Profiling Workflow
The optimization workflow is a disciplined loop: Measure → Profile → Identify → Fix → Verify. Never skip to "Fix" — you'll waste time optimizing the wrong thing. Developers are notoriously bad at guessing where the bottleneck is.
Flame graphs are your best friend: the wider a bar, the more CPU time that method consumes. Look for wide, flat bars — those are your hotspots. Narrow, deep stacks are usually fine.
Try It: Anti-Patterns & Profiling Commands
// 💡 Try modifying this code and see what happens!
// Performance anti-patterns and profiling commands
console.log("=== Anti-Patterns & Profiling ===\n");
// 1. Common anti-patterns with speed improvements
console.log("1. ANTI-PATTERNS → FIXES:");
let patterns = [
["String += in loop", "StringBuilder.append()", "50-100×"],
["Autoboxing in hot loop", "Use primitives (int, long)", "5-10×"],
["synchronized everywhere", "ConcurrentHashMap, Atomic*", "3-5×"],
["Object alloc in hot path", "O
...Try It: Optimization Loop Simulation
// 💡 Try modifying this code and see what happens!
// Simulated optimization cycle
console.log("=== The Optimization Loop ===\n");
// Simulate a performance issue and fix it
console.log("1. SCENARIO: Slow user search endpoint");
console.log(" Reported: /api/users/search takes 3.2 seconds\n");
// Step 1: Measure
console.log("STEP 1 — MEASURE:");
console.log(" p50 latency: 1200ms");
console.log(" p95 latency: 3200ms");
console.log(" p99 latency: 5100ms");
console.log(" Throughput: 45 req/
...Common Beginner Mistakes
- ❌ Optimizing without profiling — "I think this method is slow" is not evidence. Profile first, then optimize the actual hotspot
- ❌ Microbenchmarking with System.nanoTime() — the JIT compiler, GC, and warmup invalidate naive timing. Use JMH which handles all these correctly
- ❌ Optimizing cold paths — a method called 3 times during startup doesn't matter. Focus on hot paths (called millions of times)
- ❌ Heap dumps on production with no disk space — a heap dump is as large as your heap. Ensure enough disk space before dumping
- ❌ Premature optimization — Donald Knuth: "Premature optimization is the root of all evil." Make it work, make it right, THEN make it fast
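The `System.nanoTime()` mistake above is easy to see first-hand: the same method timed before and after JIT warmup gives very different numbers, which is exactly what naive timing misses and JMH's `@Warmup` handles. A sketch with arbitrary loop counts:

```java
public class WarmupEffect {
    static long sink; // consume results so the JIT cannot dead-code-eliminate the loop

    static long sumTo(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    static long timeOnce(int n) {
        long t0 = System.nanoTime();
        sink += sumTo(n);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        long cold = timeOnce(100_000);                    // first run: interpreted
        for (int i = 0; i < 5_000; i++) sink += sumTo(100_000); // warmup: let the JIT compile sumTo
        long warm = timeOnce(100_000);                    // later run: JIT-compiled
        System.out.printf("cold: %d ns, warm: %d ns%n", cold, warm);
    }
}
```

The cold number typically dwarfs the warm one, so a benchmark that times only the first iterations measures the interpreter, not your code.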
Pro Tips
- 💡 Always-on JFR — run JFR in production with default settings. Less than 1% overhead and invaluable when issues arise
- 💡 Flame graphs — the wider a bar, the more CPU time that method consumes. Look for wide, flat bars — those are your hotspots
- 💡 Allocation profiling — high GC pressure often comes from excessive object allocation. Use JFR allocation events to find the source
- 💡 Use -XX:+HeapDumpOnOutOfMemoryError — automatically captures a heap dump when OOM occurs. Set -XX:HeapDumpPath=/path/ to control where the dump is written
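Besides the OOM flag, HotSpot also exposes on-demand heap dumps programmatically through `com.sun.management.HotSpotDiagnosticMXBean` (HotSpot-specific; the target file must end in `.hprof` and must not already exist). A minimal sketch:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    public static Path dump() throws Exception {
        HotSpotDiagnosticMXBean mx =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite, so pick a fresh file;
        // live=true forces a GC first and keeps only reachable objects
        Path out = Files.createTempDirectory("dumps").resolve("heap.hprof");
        mx.dumpHeap(out.toString(), true);
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path file = dump();
        System.out.printf("Heap dump: %s (%d bytes)%n", file, Files.size(file));
    }
}
```

Remember the warning above: the dump is roughly as large as the live heap, so check disk space before calling this in production.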
📋 Quick Reference
| Task | Command / API | Output |
|---|---|---|
| CPU profile | -XX:StartFlightRecording | .jfr file |
| Heap dump | jcmd PID GC.heap_dump | .hprof file |
| Thread dump | jcmd PID Thread.print | Thread states |
| Benchmark | @Benchmark (JMH) | ops/sec, avg time |
| Flame graph | async-profiler -f fg.html | Interactive HTML flame graph |
| Auto heap dump | -XX:+HeapDumpOnOutOfMemoryError | .hprof on OOM |
🎉 Lesson Complete!
You can now profile and optimize Java applications like a pro! You understand JFR, JMH, flame graphs, and the optimization workflow. Next: Microservices — building distributed systems with Spring Boot.