What You'll Learn

    • Cache lines and memory access patterns
    • Branch prediction and branchless code
    • AoS vs SoA data layout
    • Hot/cold data splitting

    Writing High-Performance C++ Code

    Performance in C++ is often dominated by three hardware realities: cache locality (are you reading sequential memory?), branch prediction (are your conditionals predictable?), and data layout (is your struct organized for how you access it?). Getting these right can make hot code 2-10× faster without algorithmic changes, though the actual gain depends on the workload and the hardware.

    Cache Lines — Sequential Access Wins

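    CPUs don't fetch individual bytes from memory; they fetch whole cache lines, which are 64 bytes on most current x86 and ARM processors. Sequential access uses every byte of each line. Access with a large stride uses only a few bytes of each line and wastes the rest. An L1 cache hit costs on the order of 1 ns and a trip to RAM on the order of 100 ns, a gap of roughly 100×.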
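
As a rough illustration of the paragraph above, the sketch below sums the same vector twice: once in sequential order and once in strided order. Both functions touch every element exactly once, so any timing difference comes from the access pattern rather than the amount of work. The function names sum_sequential and sum_strided are hypothetical, and the comments assume a 4-byte int and 64-byte cache lines.

```cpp
#include <cstddef>
#include <vector>

// Visits elements in order 0, 1, 2, ...
// Assuming 4-byte ints, each 64-byte cache line serves 16 consecutive reads.
long long sum_sequential(const std::vector<int>& data) {
    long long total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
    }
    return total;
}

// Visits every element exactly once, but in strided order:
// 0, stride, 2*stride, ..., then 1, 1 + stride, ... and so on.
// With stride 16 and 4-byte ints, consecutive reads land on different cache lines.
// stride must be at least 1.
long long sum_strided(const std::vector<int>& data, std::size_t stride) {
    long long total = 0;
    for (std::size_t offset = 0; offset < stride; ++offset) {
        for (std::size_t i = offset; i < data.size(); i += stride) {
            total += data[i];
        }
    }
    return total;
}
```

To see the effect, time each call on a vector much larger than the last-level cache, for instance with the Timer class used in this lesson. The exact ratio depends on the hardware and on how well the prefetcher copes with the stride.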

    Pro Tip: Use perf stat -e cache-misses,cache-references to measure your cache miss rate. Below 5% is good; above 20% means your data layout needs work.

    Cache Lines

    Compare stride-1 vs stride-16 memory access

    C++
    #include <iostream>
    #include <string>
    #include <vector>
    #include <chrono>
    using namespace std;
    
    class Timer {
        string label;
        chrono::high_resolution_clock::time_point start;
    public:
        Timer(const string& l) : label(l), start(chrono::high_resolution_clock::now()) {}
        ~Timer() {
            auto us = chrono::duration_cast<chrono::microseconds>(
                chrono::high_resolution_clock::now() - start).count();
            cout << label << ": " << us << " µs" << endl;
        }
    };
    
    int main() {
        const int N = 100
    ...

    Branch Prediction — Predictable Code is Fast Code

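    The CPU predicts which way an if branch will go before it evaluates the condition. On sorted data the outcome is a long run of one result followed by a long run of the other, so the prediction is almost always right (~97%) and the branch is nearly free. On random data the prediction is wrong about half the time, and each misprediction discards roughly 15-20 cycles of speculative work. Branchless code replaces the conditional jump with arithmetic, so there is nothing to mispredict.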
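
A minimal sketch of the branchless idea, assuming the hot loop is a conditional sum. The function names sum_at_least_branchy and sum_at_least_branchless are hypothetical. The second version uses the comparison result (0 or 1) as a multiplier instead of an if.

```cpp
#include <cstdint>
#include <vector>

// Branching version: the CPU has to predict `v >= threshold` for each element.
std::int64_t sum_at_least_branchy(const std::vector<int>& values, int threshold) {
    std::int64_t total = 0;
    for (int v : values) {
        if (v >= threshold) {
            total += v;
        }
    }
    return total;
}

// Branchless version: the comparison yields 0 or 1, which is used as a multiplier,
// so every iteration performs the same arithmetic whatever the data looks like.
std::int64_t sum_at_least_branchless(const std::vector<int>& values, int threshold) {
    std::int64_t total = 0;
    for (int v : values) {
        total += static_cast<std::int64_t>(v) * (v >= threshold);
    }
    return total;
}
```

Optimizing compilers sometimes turn the first version into branch-free or vectorized code on their own, so the two may perform identically. Measuring both on sorted and on shuffled input is the only reliable way to tell.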

    Common Mistake: Optimizing for branch prediction before profiling. Not all branches are hot. Use perf stat -e branch-misses to find the ones that actually matter.

    Branch Prediction

    See how data sorting affects conditional performance

    C++
    #include <iostream>
    #include <string>
    #include <vector>
    #include <algorithm>
    #include <chrono>
    #include <random>
    using namespace std;
    
    class Timer {
        string label;
        chrono::high_resolution_clock::time_point start;
    public:
        Timer(const string& l) : label(l), start(chrono::high_resolution_clock::now()) {}
        ~Timer() {
            auto us = chrono::duration_cast<chrono::microseconds>(
                chrono::high_resolution_clock::now() - start).count();
            cout << label << ": " << us << " µs" << endl;
        }
    ...

    Data Layout — AoS vs SoA

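    Array of Structures (AoS) groups all fields of one object together. Structure of Arrays (SoA) groups the same field across all objects. For batch processing that touches only a few fields of every object (for example, updating all positions), SoA is usually faster because each cache line then contains only the data the loop needs. AoS remains a good fit when the fields of one object are mostly used together.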
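
The two layouts can be sketched as follows for a particle update. The type and function names (ParticleAoS, ParticlesSoA, update_positions_aos, update_positions_soa) and the choice of fields are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Array of Structures: all fields of one particle sit next to each other.
struct ParticleAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;        // not needed by the position update
    int   id;          // not needed by the position update
};

// Structure of Arrays: one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
    std::vector<int>   id;
};

// AoS update: every cache line that is loaded also carries mass and id,
// which this loop never reads.
void update_positions_aos(std::vector<ParticleAoS>& particles, float dt) {
    for (ParticleAoS& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// SoA update: only the position and velocity arrays are touched,
// so the mass and id arrays never enter the cache.
void update_positions_soa(ParticlesSoA& particles, float dt) {
    for (std::size_t i = 0; i < particles.x.size(); ++i) {
        particles.x[i] += particles.vx[i] * dt;
        particles.y[i] += particles.vy[i] * dt;
        particles.z[i] += particles.vz[i] * dt;
    }
}
```

Both functions compute the same result; only the memory traffic differs. The gap tends to grow as more unused fields are added to the AoS struct, and the SoA loop is also easier for a compiler to vectorize.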

    AoS vs SoA

    Compare data layouts for particle simulation

    C++
    #include <iostream>
    #include <string>
    #include <vector>
    #include <chrono>
    using namespace std;
    
    class Timer {
        string label;
        chrono::high_resolution_clock::time_point start;
    public:
        Timer(const string& l) : label(l), start(chrono::high_resolution_clock::now()) {}
        ~Timer() {
            auto us = chrono::duration_cast<chrono::microseconds>(
                chrono::high_resolution_clock::now() - start).count();
            cout << label << ": " << us << " µs" << endl;
        }
    };
    
    // Array of Structures (AoS) — com
    ...

    Quick Reference

    Technique             Speedup
    Sequential access     2-10× vs random
    Sorted branch data    2-5× vs unsorted
    Branchless code       1.5-3× for hot loops
    SoA layout            1.5-4× for batch ops
    Hot/cold splitting    1.5-2× cache efficiency
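
Hot/cold splitting, the last row above, can be sketched as follows: fields read in the hot loop are packed into a small struct, and rarely used fields move to a parallel array. All type and function names here (EntityUnsplit, EntityHot, EntityCold, Entities, count_alive) are hypothetical, as is the choice of which fields count as hot.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before splitting: a loop that only reads `health` still pulls the
// rarely used strings through the cache with every element.
struct EntityUnsplit {
    int health;
    std::string name;
    std::string description;
};

// After splitting: hot fields are packed densely, cold fields live elsewhere.
struct EntityHot {
    int health;
};

struct EntityCold {
    std::string name;
    std::string description;
};

struct Entities {
    std::vector<EntityHot>  hot;   // scanned every frame
    std::vector<EntityCold> cold;  // cold[i] belongs to hot[i]; read only on demand
};

// The hot loop walks a dense array of ints and never touches the strings.
std::size_t count_alive(const Entities& entities) {
    std::size_t alive = 0;
    for (const EntityHot& h : entities.hot) {
        alive += (h.health > 0);
    }
    return alive;
}
```

The cost of this layout is that hot and cold arrays must be kept in sync by index, so it is worth applying only to structures that profiling shows are scanned frequently.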

    Lesson Complete!

    You now understand cache lines, branch prediction, and data layout — the three pillars of high-performance C++.
