Lesson 27 • Advanced
Atomic Operations & Low-Level Synchronization
By the end of this lesson you'll be able to build a thread-safe counter with no mutex, update shared values lock-free with compare-and-swap, write a spinlock from an atomic_flag, and choose the right memory ordering for the job — the toolkit behind high-performance concurrent C++.
What You'll Learn
- Why a shared int++ across threads is a data race — and how atomic<int> fixes it
- Use fetch_add, load, store, and exchange for indivisible updates
- Apply compare_exchange_weak/strong (CAS) for lock-free read-modify-write
- Build a spinlock from std::atomic_flag with test_and_set / clear
- Tell relaxed, acquire/release, and seq_cst ordering apart
- Know when to pick atomics over a mutex (and when not to)
💡 Real-World Analogy
A normal counter++ is three steps: read the number, add one, write it back. Picture two people sharing one whiteboard tally. Both read "5", both write "6" — and one count vanished. That's a data race. An atomic operation is like a turnstile counter: each click is one sealed, indivisible action that the hardware guarantees can't overlap another. No locking, no waiting — just a count that's always exactly right. A mutex, by contrast, is the deli ticket queue: safe, but you wait your turn. Atomics skip the queue for the small jobs.
1. Atomic Counters vs Data Races
An atomic<int> guarantees every read and write is indivisible — no other thread can ever catch it half-finished. A plain int shared across threads has no such promise: counter++ compiles to read-add-write, and two threads can interleave those steps and lose updates. The worked example below races eight threads at both kinds of counter so you can see the difference. Read every comment, then run it.
Worked example: atomic counter across 8 threads
See the atomic counter stay exact while the plain int loses updates.
#include <iostream>
#include <atomic> // std::atomic, std::memory_order
#include <thread> // std::thread
#include <vector>
using namespace std;
// An atomic<int> can be read/modified by many threads at once
// WITHOUT a data race. Each operation is indivisible: no thread can
// ever see it "half done".
atomic<int> safeCounter{0}; // {0} sets the starting value to 0
int plainCounter = 0; // a normal int — NOT safe to share
// Each thread runs this. It bumps both counters n tim
...
Your turn. The program below counts hits across four threads. Fill in the two blanks marked ___ to make the counter atomic and read its final value.
🎯 Your turn: atomic hit counter
Fill in the ___ blanks, then check your output against the expected line.
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;
// 🎯 YOUR TURN — replace each ___ then press "Try it Yourself".
atomic<int> hits{0}; // a shared, thread-safe counter
void countHits(int n) {
for (int i = 0; i < n; i++) {
// 1) Atomically add 1 to 'hits'
___; // 👉 hits.fetch_add(1); (or hits++;)
}
}
int main() {
vector<thread> pool;
for (int t = 0; t < 4; t++)
pool.emplace_back(countHits, 250
...
2. load, store, exchange & fetch_add
You can read or assign an atomic with a plain conversion or =, but the named methods make the intent (and the ordering) explicit. load() reads, store(x) writes, and exchange(x) writes a new value while handing you the old one, all atomically. fetch_add(n) and fetch_sub(n) add or subtract and return the value before the change. These five cover almost everything you'll do with a single atomic.
Worked example: the core atomic methods
Watch load, store, exchange, and fetch_add in action.
#include <iostream>
#include <atomic>
using namespace std;
int main() {
atomic<int> val{10};
// load() reads the current value atomically.
cout << "load(): " << val.load() << "\n"; // 10
// store() writes a new value atomically.
val.store(42);
cout << "after store: " << val.load() << "\n"; // 42
// exchange() writes a NEW value and returns the OLD one,
// both in a single atomic step.
int previous = val.exchange(99);
cout << "exchange: o
...
3. Compare-and-Swap (CAS) for Lock-Free Updates
fetch_add only handles addition. For anything else ("double it", "set it only if it's still what I last saw") you need compare-and-swap. compare_exchange_strong(expected, desired) swaps in desired only if the value still equals expected; otherwise it loads the real current value back into expected and returns false. That refresh-on-failure is why CAS lives in a loop: read, compute, try to swap, and retry if someone beat you to it. Use compare_exchange_weak inside that loop: it can fail spuriously, but the loop retries anyway and it can be cheaper on some hardware. Use strong when you aren't looping.
Worked example: compare_exchange and a CAS loop
Lock-free read-modify-write with compare_exchange_strong/weak.
#include <iostream>
#include <atomic>
using namespace std;
int main() {
// compare_exchange does: "IF the value equals 'expected',
// replace it with 'desired' and return true. OTHERWISE write the
// ACTUAL current value back into 'expected' and return false."
// This is the building block of every lock-free algorithm.
atomic<int> val{100};
int expected = 100;
bool ok = val.compare_exchange_strong(expected, 200);
cout << "CAS 100->200: " << (ok ? "ok" : "fail")
...
Now you try. Fill in the three blanks to swap in a value with exchange, then bump it once with a CAS.
🎯 Your turn: exchange then compare-and-swap
Fill in the ___ blanks for exchange() and compare_exchange_strong().
#include <iostream>
#include <atomic>
using namespace std;
int main() {
// 🎯 YOUR TURN — replace each ___ then press "Try it Yourself".
atomic<int> level{1};
// 1) Swap in the value 5 and capture what was there before.
int old = ___; // 👉 level.exchange(5)
cout << "old level was " << old << "\n"; // old level was 1
// 2) Try to bump 5 -> 6 with CAS. Fill the expected value (5)
// and the desired value (6).
int expected = ___; // 👉 5
...
4. Spinlocks & Memory Ordering
A std::atomic_flag is the simplest atomic — just set or clear, and the only type guaranteed lock-free on every platform. With its test_and_set / clear pair you can build a spinlock: a lock that busy-loops instead of sleeping, ideal for critical sections so short that sleeping would cost more than spinning.
Every atomic operation also takes a memory ordering that controls how it's ordered against other memory accesses across threads. The three you'll meet: memory_order_relaxed (atomic, but no cross-thread ordering — fine for a lone counter), acquire/release (a paired handshake: a release store publishes everything written before it to whoever does an acquire load — this is what the spinlock uses), and seq_cst (the default: one global order, easiest to reason about, slightly slower). Default to seq_cst and only relax when profiling demands it.
Worked example: atomic_flag spinlock
Build a spinlock and see acquire/release ordering protect shared state.
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;
// A spinlock is the simplest lock: "spin" (loop) until you grab it.
// atomic_flag is the only type guaranteed lock-free everywhere.
class SpinLock {
atomic_flag flag = ATOMIC_FLAG_INIT; // starts "clear" (unlocked)
public:
void lock() {
// test_and_set returns the OLD value and sets the flag.
// While it returns true, someone else holds the lock — keep spinning.
//
...
Pro Tips
- 💡 Default to seq_cst: the standard ordering is the safest. Only drop to acquire/release or relaxed when a profiler shows a real bottleneck.
- 💡 Atomics are per-operation: they make one access indivisible. To update two values together as a unit, you still need a mutex.
- 💡 Check is_lock_free(): if it returns false, the library is using a hidden mutex behind your atomic, so you lost the speed win.
- 💡 Prefer compare_exchange_weak in loops: it's faster and you're retrying anyway; use strong only when you're not looping.
Common Errors (and the fix)
- Non-atomic shared counter (the data race): a plain int counter; incremented from several threads loses updates and is undefined behaviour. Make it atomic<int> and use fetch_add(1) (or counter++).
- Misusing memory_order_relaxed: relaxed makes the op atomic but gives no cross-thread ordering, so using it to publish other data (set data, then flip a relaxed ready flag) lets the reader see ready before data. Use release on the writer and acquire on the reader.
- The ABA problem: CAS only checks the value equals expected. If it went A → B → A in between, the CAS still succeeds, which can corrupt lock-free structures that reuse memory. Guard with a version/tag counter that only ever increments.
- Compound atomicity assumed: atomic<int> a, b; if (a.load() > b.load()) reads two atomics in two steps, and they can change between them. Wrap the whole check in a mutex if it must be consistent.
- Reading an atomic with = expecting a copy: atomic<int> b = a; won't compile because atomics aren't copyable. Use atomic<int> b{a.load()};.
📋 Quick Reference — Memory Orders
| Ordering | Guarantee | Use it for |
|---|---|---|
| relaxed | Atomic only; no cross-thread ordering | Standalone counters, statistics |
| acquire | No later read/write moves before this load | The reader/consumer side |
| release | No earlier read/write moves after this store | The writer/producer side |
| acq_rel | Acquire + release on one read-modify-write | CAS in a lock-free loop |
| seq_cst | Single global order across all threads (default) | Everything, until proven a bottleneck |
acquire on a load pairs with release on the matching store, and that pairing is what safely publishes data from one thread to another. Without it there is no happens-before relationship between the two threads, so the reader may see the flag change without seeing the data written before it.
Frequently Asked Questions
Q: When should I use std::atomic instead of a mutex?
Reach for atomics when you are protecting a single, simple value such as a counter, a flag, or a pointer that swaps as a unit. For small types they are usually lock-free (is_lock_free() tells you) and typically cheaper than a mutex for that case. The moment you need to update two or more values together as one consistent step, use a mutex: atomics only make each individual operation indivisible, not a group of them.
Q: What is the difference between compare_exchange_weak and compare_exchange_strong?
Both do the compare-and-swap: replace the value only if it still equals what you expected. compare_exchange_strong fails only when the values genuinely differ. compare_exchange_weak may also fail 'spuriously' (for no real reason) on some hardware, so you use it inside a retry loop where you are looping anyway. Use weak in a loop for best performance, and strong when you are not looping.
Q: What is the ABA problem?
A CAS only checks whether the value still equals what you expected — not whether it changed and changed back. If a value goes A -> B -> A between your read and your CAS, the CAS succeeds as if nothing happened, even though state you cared about (like a freed and reused node) is different. The classic fix is to attach a version counter that always increments, so A-with-version-1 never equals A-with-version-2.
Q: What does memory_order_relaxed actually relax?
Relaxed guarantees the operation itself is atomic, but gives NO ordering guarantee relative to other variables across threads. It is correct for a standalone counter where you only care about the final count. It is wrong when one atomic is meant to 'publish' other data — there you need release on the writer and acquire on the reader so the data is visible.
Q: Which memory ordering should I use by default?
Use the default, memory_order_seq_cst, until profiling proves it is a bottleneck. It gives a single total order that is the easiest to reason about and the hardest to get wrong. Drop to acquire/release only for proven hot paths, and to relaxed only for independent counters or statistics where ordering genuinely does not matter.
Mini-Challenge: a Shared Stop Flag
No blanks this time — just a brief and an outline. Use an atomic<bool> to tell a worker thread when to stop and an atomic<int> to count how far it got. This is the everyday pattern for cleanly shutting a thread down. Build it, run it, and check the shape of your output against the example.
🎯 Mini-Challenge: stop flag + tick counter
Declare the atomics yourself, run a worker, and stop it cleanly.
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
using namespace std;
// 🎯 MINI-CHALLENGE: a shared "stop" flag
// 1. Declare an atomic<bool> called "stop" initialised to false.
// 2. Declare an atomic<int> called "ticks" initialised to 0.
// 3. Start ONE worker thread that loops: while (!stop) ticks++;
// (load stop atomically each time; fetch_add or ++ on ticks).
// 4. In main, sleep briefly, set stop = true, join the worker.
// 5. Print: "Worker ran <ticks> times
...
🎉 Lesson Complete
- ✅ A shared plain int incremented by many threads is a data race; atomic<int> makes each update indivisible
- ✅ load, store, exchange, and fetch_add/fetch_sub cover single-atomic work
- ✅ compare_exchange_weak/strong (CAS) power lock-free read-modify-write loops
- ✅ std::atomic_flag with test_and_set/clear builds a spinlock
- ✅ Memory order: relaxed (no ordering), acquire/release (paired handshake), seq_cst (the safe default)
- ✅ Atomics are per-operation; reach for a mutex when several values must change together, and watch for the ABA problem
- ✅ Next lesson: Memory Allocation Internals (how new/delete work under the hood and writing custom allocators)