Lesson 27 • Advanced

Atomic Operations & Low-Level Synchronization

By the end of this lesson you'll be able to build a thread-safe counter with no mutex, update shared values lock-free with compare-and-swap, write a spinlock from an atomic_flag, and choose the right memory ordering for the job — the toolkit behind high-performance concurrent C++.

What You'll Learn

Why a shared int++ across threads is a data race — and how atomic<int> fixes it
Use fetch_add, load, store, and exchange for indivisible updates
Apply compare_exchange_weak/strong (CAS) for lock-free read-modify-write
Build a spinlock from std::atomic_flag with test_and_set / clear
Tell relaxed, acquire/release, and seq_cst ordering apart
Know when to pick atomics over a mutex (and when not to)

Before you start: finish Mutexes & Locks first. You should know what a data race is (two threads accessing the same variable without synchronization, at least one of them writing) and how a mutex prevents it. Atomics are the lighter-weight tool for the simplest of those cases.

💡 Real-World Analogy

A normal counter++ is three steps: read the number, add one, write it back. Picture two people sharing one whiteboard tally. Both read "5", both write "6" — and one count vanished. That's a data race. An atomic operation is like a turnstile counter: each click is one sealed, indivisible action that the hardware guarantees can't overlap another. No locking, no waiting — just a count that's always exactly right. A mutex, by contrast, is the deli ticket queue: safe, but you wait your turn. Atomics skip the queue for the small jobs.

1. Atomic Counters vs Data Races

An atomic<int> guarantees every read and write is indivisible — no other thread can ever catch it half-finished. A plain int shared across threads has no such promise: counter++ compiles to read-add-write, and two threads can interleave those steps and lose updates. The worked example below races eight threads at both kinds of counter so you can see the difference. Read every comment, then run it.

Worked example: atomic counter across 8 threads

See the atomic counter stay exact while the plain int loses updates.

C++

#include <iostream>
#include <atomic>     // std::atomic, std::memory_order
#include <thread>     // std::thread
#include <vector>
using namespace std;

// An atomic<int> can be read/modified by many threads at once
// WITHOUT a data race. Each operation is indivisible: no thread can
// ever see it "half done".
atomic<int> safeCounter{0};   // {0} sets the starting value to 0
int plainCounter = 0;         // a normal int — NOT safe to share

// Each thread runs this. It bumps both counters n tim
...
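As a compact illustration of the atomic half of this comparison, here is a minimal sketch. The helper name countWithAtomic and its parameters are invented for this example and are not taken from the lesson's listing. The racy plain-int version is deliberately left out, because running it is undefined behaviour.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each add 1 to a shared
// atomic counter `perThread` times, then returns the final count.
int countWithAtomic(int threads, int perThread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) {
                counter.fetch_add(1);   // one indivisible read-modify-write
            }
        });
    }

    for (auto& worker : pool) {
        worker.join();                  // wait for every worker to finish
    }
    return counter.load();
}
```

Calling countWithAtomic(8, 100000) from main should return exactly 800000 on every run, because no increment can be lost.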

Your turn. The program below counts hits across four threads — fill in the two blanks marked ___ to make the counter atomic and read its final value.

🎯 Your turn: atomic hit counter

Fill in the ___ blanks, then check your output against the expected line.

Try it Yourself »

C++

#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;

// 🎯 YOUR TURN — replace each ___ then press "Try it Yourself".

atomic<int> hits{0};   // a shared, thread-safe counter

void countHits(int n) {
    for (int i = 0; i < n; i++) {
        // 1) Atomically add 1 to 'hits'
        ___;            // 👉 hits.fetch_add(1);  (or hits++;)
    }
}

int main() {
    vector<thread> pool;
    for (int t = 0; t < 4; t++)
        pool.emplace_back(countHits, 250
...

2. load, store, exchange & fetch_add

You can read and assign an atomic with plain = (both are seq_cst under the hood), but the named methods make the intent explicit and let you choose the ordering. load() reads, store(x) writes, and exchange(x) writes a new value while handing you the old one, all atomically. fetch_add(n) and fetch_sub(n) add or subtract and return the value before the change. These five cover almost everything you'll do with a single atomic.

Worked example: the core atomic methods

Watch load, store, exchange, and fetch_add in action.

C++

#include <iostream>
#include <atomic>
using namespace std;

int main() {
    atomic<int> val{10};

    // load() reads the current value atomically.
    cout << "load():     " << val.load() << "\n";      // 10

    // store() writes a new value atomically.
    val.store(42);
    cout << "after store: " << val.load() << "\n";     // 42

    // exchange() writes a NEW value and returns the OLD one,
    // both in a single atomic step.
    int previous = val.exchange(99);
    cout << "exchange:   o
...
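To make the return values of the five methods concrete, the following sketch records what each call hands back. The function name coreMethodsDemo is hypothetical and the specific numbers are chosen only for illustration.

```cpp
#include <atomic>
#include <vector>

// Hypothetical helper: applies each core method once to a local atomic
// and collects the value each call returns.
std::vector<int> coreMethodsDemo() {
    std::atomic<int> val{10};
    std::vector<int> seen;

    seen.push_back(val.load());         // 10: plain atomic read
    val.store(42);                      // atomic write, returns nothing
    seen.push_back(val.load());         // 42
    seen.push_back(val.exchange(99));   // returns the old 42, val is now 99
    seen.push_back(val.fetch_add(1));   // returns 99, val is now 100
    seen.push_back(val.fetch_sub(30));  // returns 100, val is now 70
    seen.push_back(val.load());         // 70
    return seen;
}
```

Note that exchange, fetch_add, and fetch_sub all return the value from before the change, which is easy to get backwards when reading code quickly.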

3. Compare-and-Swap (CAS) for Lock-Free Updates

fetch_add only handles simple arithmetic. For anything else ("double it", "set it only if it's still what I last saw") you need compare-and-swap. compare_exchange_strong(expected, desired) swaps in desired only if the value still equals expected; otherwise it loads the real current value back into expected and returns false. That refresh-on-failure is why CAS lives in a loop: read, compute, try to swap, and retry if someone beat you to it. Use compare_exchange_weak inside that loop (it can fail spuriously even when the values match, but it can be cheaper on some CPUs); use strong when you aren't looping.

Worked example: compare_exchange and a CAS loop

Lock-free read-modify-write with compare_exchange_strong/weak.

C++

#include <iostream>
#include <atomic>
using namespace std;

int main() {
    // compare_exchange does: "IF the value equals 'expected',
    // replace it with 'desired' and return true. OTHERWISE write the
    // ACTUAL current value back into 'expected' and return false."
    // This is the building block of every lock-free algorithm.
    atomic<int> val{100};

    int expected = 100;
    bool ok = val.compare_exchange_strong(expected, 200);
    cout << "CAS 100->200: " << (ok ? "ok" : "fail")

...
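The "double it" case mentioned in this section can be sketched as a CAS retry loop. This is one possible shape, not the lesson's own listing, and the helper name atomicDouble is an assumption made for the example.

```cpp
#include <atomic>

// Hypothetical helper: atomically doubles `value` and returns the value
// that was replaced.
int atomicDouble(std::atomic<int>& value) {
    int expected = value.load();
    // If another thread changed `value` first (or the weak CAS fails
    // spuriously), compare_exchange_weak writes the current value into
    // `expected`, so the next attempt recomputes expected * 2 from fresh data.
    while (!value.compare_exchange_weak(expected, expected * 2)) {
        // nothing to do here: just retry with the refreshed `expected`
    }
    return expected;
}
```

Because the desired value is recomputed on every pass, a spurious failure costs only one extra iteration, which is why the weak form is a reasonable fit inside the loop.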

Now you try. Fill in the three blanks to swap in a value with exchange, then bump it once with a CAS.

🎯 Your turn: exchange then compare-and-swap

Fill in the ___ blanks for exchange() and compare_exchange_strong().

Try it Yourself »

C++

#include <iostream>
#include <atomic>
using namespace std;

int main() {
    // 🎯 YOUR TURN — replace each ___ then press "Try it Yourself".
    atomic<int> level{1};

    // 1) Swap in the value 5 and capture what was there before.
    int old = ___;          // 👉 level.exchange(5)
    cout << "old level was " << old << "\n";        // old level was 1

    // 2) Try to bump 5 -> 6 with CAS. Fill the expected value (5)
    //    and the desired value (6).
    int expected = ___;     // 👉 5
  
...

4. Spinlocks & Memory Ordering

A std::atomic_flag is the simplest atomic — just set or clear, and the only type guaranteed lock-free on every platform. With its test_and_set / clear pair you can build a spinlock: a lock that busy-loops instead of sleeping, ideal for critical sections so short that sleeping would cost more than spinning.

Every atomic operation also accepts an optional memory-order argument that controls how it's ordered against other memory accesses across threads. The three you'll meet: memory_order_relaxed (atomic, but no cross-thread ordering; fine for a lone counter), acquire/release (a paired handshake: a release store publishes everything written before it to whichever thread's acquire load reads that stored value; this is what the spinlock uses), and seq_cst (the default: one global order, easiest to reason about, sometimes slower depending on the CPU). Default to seq_cst and only relax when profiling demands it.
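The acquire/release handshake described above can be sketched with one writer and one reader. The names publishAndRead, payload, and ready are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: the calling thread publishes `value` through a plain
// int, and a reader thread waits on an atomic flag before reading it.
int publishAndRead(int value) {
    int payload = 0;                     // plain, non-atomic data
    std::atomic<bool> ready{false};      // the flag that publishes it
    int seen = -1;

    std::thread reader([&payload, &ready, &seen] {
        // acquire: once this load observes true, every write made before
        // the matching release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = value;                                 // 1) write the data
    ready.store(true, std::memory_order_release);    // 2) publish it

    reader.join();
    return seen;
}
```

If both operations on ready were memory_order_relaxed, nothing would order the write to payload against the flag, and the read of payload would become a data race.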

Worked example: atomic_flag spinlock

Build a spinlock and see acquire/release ordering protect shared state.

C++

#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;

// A spinlock is the simplest lock: "spin" (loop) until you grab it.
// atomic_flag is the only type guaranteed lock-free everywhere.
class SpinLock {
    atomic_flag flag = ATOMIC_FLAG_INIT;   // starts "clear" (unlocked); from C++20 a default-constructed flag is clear too
public:
    void lock() {
        // test_and_set returns the OLD value and sets the flag.
        // While it returns true, someone else holds the lock — keep spinning.
        // 
...
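For reference, a complete spinlock along the lines this section describes might look like the sketch below. The class name SpinLockSketch, the helper sumUnderSpinLock, and the choice to yield while spinning are assumptions made for this example rather than details from the lesson's listing.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A minimal spinlock sketch built on atomic_flag.
class SpinLockSketch {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;   // starts clear (unlocked)
public:
    void lock() {
        // test_and_set returns the previous state. true means another
        // thread already holds the lock, so keep trying.
        while (flag.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();          // be polite while waiting
        }
    }
    void unlock() {
        // release: writes made inside the critical section become visible
        // to the next thread that acquires the lock.
        flag.clear(std::memory_order_release);
    }
};

// Hypothetical helper: several threads increment a plain variable that is
// protected only by the spinlock.
long long sumUnderSpinLock(int threads, int perThread) {
    SpinLockSketch lock;
    long long total = 0;                        // plain, non-atomic
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &total, perThread] {
            for (int i = 0; i < perThread; ++i) {
                lock.lock();
                ++total;                        // the critical section
                lock.unlock();
            }
        });
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return total;
}
```

The acquire in lock() pairs with the release in unlock(), which is what makes the plain total safe to touch between them.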

Pro Tips

💡 Default to seq_cst: the default ordering is also the safest. Only drop to acquire/release or relaxed when a profiler shows a real bottleneck.
💡 Atomics are per-operation: they make one access indivisible. To update two values together as a unit, you still need a mutex.
💡 Check is_lock_free(): if it returns false, the implementation is falling back to an internal lock behind your atomic, and you lose the speed win. This typically happens with types wider than the CPU's atomic instructions.
💡 Prefer compare_exchange_weak in loops: it's faster and you're retrying anyway; use strong only when you're not looping.
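The is_lock_free() tip can be checked with a small probe like the one below. The template name reportsLockFree is hypothetical, and the answer it gives depends on the platform and compiler, so treat the result as something to measure rather than assume.

```cpp
#include <atomic>

// Hypothetical helper: reports whether std::atomic<T> is lock-free on the
// platform this code is compiled for.
template <typename T>
bool reportsLockFree() {
    std::atomic<T> probe{T{}};
    return probe.is_lock_free();
}
```

On mainstream desktop and server targets reportsLockFree<int>() is expected to return true, while larger user-defined types may not be lock-free.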

Common Errors (and the fix)

Non-atomic shared counter (the data race): a plain int counter incremented from several threads loses updates and is undefined behaviour. Make it atomic<int> and use fetch_add(1) (or counter++).
Misusing memory_order_relaxed: relaxed makes the op atomic but gives no cross-thread ordering, so using it to publish other data (set data, then flip a relaxed ready flag) lets the reader see ready before data. Use release on the writer and acquire on the reader.
The ABA problem: CAS only checks that the value equals expected. If it went A → B → A in between, the CAS still succeeds, which can corrupt lock-free structures that reuse memory. Guard with a version/tag counter that only ever increments.
Compound atomicity assumed: atomic<int> a, b; if (a.load() > b.load()) reads two atomics in two steps — they can change between them. Wrap the whole check in a mutex if it must be consistent.
Reading an atomic with = expecting a copy: atomic<int> b = a; won't compile — atomics aren't copyable. Use atomic<int> b{a.load()};.
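For the compound-atomicity error in the list above, one way to keep two values consistent is to guard both with a single mutex. The class name Range and its members are invented for this sketch.

```cpp
#include <mutex>

// Hypothetical pair of values that must always be read and written together.
class Range {
    mutable std::mutex m;
    int low = 0;
    int high = 0;
public:
    // Both fields change under one lock, so no reader can see a half-updated pair.
    void set(int newLow, int newHigh) {
        std::lock_guard<std::mutex> guard(m);
        low = newLow;
        high = newHigh;
    }
    // The comparison reads both fields under the same lock,
    // so they cannot change between the two reads.
    bool isOrdered() const {
        std::lock_guard<std::mutex> guard(m);
        return low <= high;
    }
};
```

Two separate atomic<int> members could not give this guarantee, because another thread could update one of them between the two loads.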

📋 Quick Reference — Memory Orders

Ordering	Guarantee	Use it for
relaxed	Atomic only; no cross-thread ordering	Standalone counters, statistics
acquire	No later read/write moves before this load	The reader/consumer side
release	No earlier read/write moves after this store	The writer/producer side
acq_rel	Acquire + release on one read-modify-write	CAS in a lock-free loop
seq_cst	Single global order across all threads (default)	Everything, until proven a bottleneck

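acquire on a load pairs with release on the matching store, and that pairing is what safely publishes data from one thread to another. Without it (for example, a relaxed store feeding an acquire load), nothing orders the surrounding reads and writes, so the reader can see the flag before the data.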

Frequently Asked Questions

Q: When should I use std::atomic instead of a mutex?

Reach for atomics when you are protecting a single, simple value such as a counter, a flag, or a pointer that swaps as a unit. For small types they are normally lock-free and usually cheaper than a mutex for that case. The moment you need to update two or more values together as one consistent step, use a mutex: atomics only make each individual operation indivisible, not a group of them.

Q: What is the difference between compare_exchange_weak and compare_exchange_strong?

Both do the compare-and-swap: replace the value only if it still equals what you expected. compare_exchange_strong fails only when the values genuinely differ. compare_exchange_weak may also fail 'spuriously' (for no real reason) on some hardware, so you use it inside a retry loop where you are looping anyway. Use weak in a loop for best performance, and strong when you are not looping.

Q: What is the ABA problem?

A CAS only checks whether the value still equals what you expected — not whether it changed and changed back. If a value goes A -> B -> A between your read and your CAS, the CAS succeeds as if nothing happened, even though state you cared about (like a freed and reused node) is different. The classic fix is to attach a version counter that always increments, so A-with-version-1 never equals A-with-version-2.
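The version-counter fix can be sketched by packing a value and a version into one 64-bit atomic. The layout, the Tagged alias, and the helper names below are all assumptions made for this illustration; real lock-free structures often tag pointers instead, and a 32-bit version can in principle wrap around.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: the low 32 bits hold the value, the high 32 bits
// hold a version that is bumped on every successful update.
using Tagged = std::uint64_t;

inline Tagged pack(std::uint32_t value, std::uint32_t version) {
    return (static_cast<Tagged>(version) << 32) | value;
}
inline std::uint32_t valueOf(Tagged t)   { return static_cast<std::uint32_t>(t); }
inline std::uint32_t versionOf(Tagged t) { return static_cast<std::uint32_t>(t >> 32); }

// Replaces the value only if BOTH the value and the version still match
// what the caller saw earlier in `seen`.
bool tryUpdate(std::atomic<Tagged>& cell, Tagged seen, std::uint32_t newValue) {
    Tagged desired = pack(newValue, versionOf(seen) + 1);
    return cell.compare_exchange_strong(seen, desired);
}
```

If the cell goes from value 7 to 8 and back to 7, its version has moved from 0 to 2, so a thread still holding the original snapshot (7, version 0) fails its CAS instead of succeeding by accident.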

Q: What does memory_order_relaxed actually relax?

Relaxed guarantees the operation itself is atomic, but gives NO ordering guarantee relative to other variables across threads. It is correct for a standalone counter where you only care about the final count. It is wrong when one atomic is meant to 'publish' other data — there you need release on the writer and acquire on the reader so the data is visible.

Q: Which memory ordering should I use by default?

Use the default, memory_order_seq_cst, until profiling proves it is a bottleneck. It gives a single total order that is the easiest to reason about and the hardest to get wrong. Drop to acquire/release only for proven hot paths, and to relaxed only for independent counters or statistics where ordering genuinely does not matter.

Mini-Challenge: a Shared Stop Flag

No blanks this time — just a brief and an outline. Use an atomic<bool> to tell a worker thread when to stop and an atomic<int> to count how far it got. This is the everyday pattern for cleanly shutting a thread down. Build it, run it, and check the shape of your output against the example.

🎯 Mini-Challenge: stop flag + tick counter

Declare the atomics yourself, run a worker, and stop it cleanly.

Try it Yourself »

C++

#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;

// 🎯 MINI-CHALLENGE: a shared "stop" flag
// 1. Declare an atomic<bool> called "stop" initialised to false.
// 2. Declare an atomic<int> called "ticks" initialised to 0.
// 3. Start ONE worker thread that loops: while (!stop) ticks++;
//    (load stop atomically each time; fetch_add or ++ on ticks).
// 4. In main, sleep briefly, set stop = true, join the worker.
// 5. Print: "Worker ran <ticks> times
...

🎉 Lesson Complete

✅ A shared plain int incremented by many threads is a data race; atomic<int> makes each update indivisible
✅ load, store, exchange, and fetch_add/fetch_sub cover single-atomic work
✅ compare_exchange_weak/strong (CAS) power lock-free read-modify-write loops
✅ std::atomic_flag with test_and_set/clear builds a spinlock
✅ Memory order: relaxed (no ordering), acquire/release (paired handshake), seq_cst (the safe default)
✅ Atomics are per-operation; reach for a mutex when several values must change together, and watch for the ABA problem
✅ Next lesson: Memory Allocation Internals — how new/delete work under the hood and writing custom allocators
