Skip to main content
    Back

    Lesson 22 • Advanced Track

    Replication: Sync, Async, Failover & Read Scaling

    By the end of this lesson you'll be able to reason about a production database the way an on-call engineer does: choose synchronous vs asynchronous replication for a given durability requirement, route reads to replicas without serving stale data, and survive a server failure with a clean, fenced failover.

    What You'll Learn

    • How a primary and its replicas divide reads and writes
    • Synchronous vs asynchronous: the durability ↔ latency trade-off
    • Physical/streaming vs logical replication, and when to pick each
    • Scale reads by routing SELECTs to replicas
    • Avoid stale reads & the read-your-writes trap from replication lag
    • Promote a replica on failure without causing split-brain

    The Picture: One Primary, Many Replicas

    Replication means keeping live copies of your database on more than one server. Exactly one node — the primary (a.k.a. leader/master) — accepts writes. The others — replicas (a.k.a. standbys/followers) — continuously copy the primary's changes and serve read-only queries. Picture a store: only the manager restocks shelves (writes), but every checkout counter can serve customers (reads).

    Result — who does what in a typical 3-node cluster:

    NodeRoleHandles writes?Handles reads?If primary dies
    db-1PrimaryYesYes (but keep it free for writes)— (it's the one that died)
    db-2ReplicaNo (read-only)YesCan be promoted to primary
    db-3ReplicaNo (read-only)YesCan be promoted to primary

    Writes flow in one direction: client → primary → replicas. That single rule is what keeps the copies consistent — and it's why the interesting questions are all about timing (how far behind are replicas?) and failure (what if the primary dies?).

    1. Primary vs Replica — Knowing Who You Are

    Before anything else, a server needs to know its own role, because a replica that thinks it's a primary is a disaster waiting to happen. PostgreSQL answers this with one function, and you'll use it constantly — especially right after a failover.

    🏪 Real-world analogy

    The primary is the shop's manager (the only one allowed to restock), and replicas are the cashiers (they serve customers but can't change stock). When the manager is out, you promote one cashier to acting manager — but first you make sure the old manager can't quietly keep restocking too.

    Am I primary or replica?

    The one function every node uses to know its role.

    Try it Yourself »
    SQL
    -- Ask any PostgreSQL server one question:
    -- "Am I a primary (read+write) or a replica (read-only)?"
    SELECT pg_is_in_recovery();
    
    -- FALSE = this server is the PRIMARY — it accepts INSERT/UPDATE/DELETE.
    -- TRUE  = this server is a REPLICA — it is read-only and is
    --         replaying changes streamed from the primary.
    
    -- Every replication setup starts here: exactly ONE primary owns
    -- the writes; every other node is a copy that follows it.

    Result — run on a replica:

    pg_is_in_recovery
    t

    t (true) means read-only replica; f (false) means it's the primary and accepts writes. This is the check your failover automation runs to confirm a promotion actually worked.

    2. Asynchronous Replication & Replication Lag

    Asynchronous replication is the common default: the primary commits a change the instant it has the change safely on its own disk, then streams that change to replicas afterwards. It never waits for them. That makes writes fast, but it introduces replication lag — the gap (usually milliseconds) between the primary having data and a replica having it.

    Async is like posting a letter: you drop it in the box and carry on; the recipient reads it whenever it arrives. The number you must watch is the lag, because it's also your worst-case data loss if the primary dies before a replica catches up.

    Async replication — measure the lag

    See connected replicas and how far behind they are.

    Try it Yourself »
    SQL
    -- ASYNCHRONOUS replication (the common default)
    -- The primary commits as soon as IT has the change on disk and
    -- streams the change to replicas AFTERWARDS. It never waits.
    --   → Fast writes, but a replica can be a few ms–seconds behind.
    
    -- 1) On the PRIMARY: see every connected replica and how far behind it is.
    SELECT client_addr,
           state,
           sync_state,                                  -- 'async' here
           pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
    FROM pg_stat_replicati
    ...

    Result — pg_stat_replication on the primary:

    client_addrstatesync_statelag_bytes
    10.0.0.12streamingasync4096
    10.0.0.13streamingasync8192

    3. Synchronous Replication — Trading Latency for Durability

    Synchronous replication flips the deal: the primary will not tell your application COMMIT succeeded until a replica has also stored the change. That guarantees zero data loss if the primary instantly explodes — but every write now pays a network round-trip, so writes are slower. Sync is a phone call (you wait for "got it"); async is a letter.

    This is the central trade-off of the whole lesson: durability vs latency. You don't have to choose once for the whole database — PostgreSQL's synchronous_commit setting can be changed per transaction, so money-movement can be synchronous while analytics logging stays fast and async.

    remote_apply is the strictest level: COMMIT waits until the replica has applied the change, so a read on that replica is guaranteed to see it. off is the loosest: fastest writes, but you may lose the last few committed transactions in a crash.

    Synchronous commit levels

    The durability ↔ latency dial, set per transaction.

    Try it Yourself »
    SQL
    -- SYNCHRONOUS replication
    -- The primary will NOT report COMMIT to your app until a replica has
    -- also stored the change. Zero data loss on failover — slower writes.
    
    -- postgresql.conf on the primary:
    --   synchronous_standby_names = 'replica1'
    --   synchronous_commit        = on
    
    -- 'synchronous_commit' is a DURABILITY vs LATENCY dial:
    --   off          -> fastest,  may lose the last few committed txns
    --   local        -> fast,     safe on PRIMARY's disk only
    --   remote_write -> medium,   
    ...

    Result — synchronous_commit — durability vs latency:

    LevelWrite speedGuarantee on a primary crash
    offFastestMay lose last few committed txns
    localFastSafe on primary's disk only
    remote_writeMediumReplica has received it
    on (default)SlowerReplica has flushed it to disk
    remote_applySlowestReplica has applied it (reads see it)

    Your Turn: sync or async?

    You're recording a bank transfer that must never be lost on failover, even if it makes the write slower. Fill in the blank with the right synchronous_commit level. The expected answer and why are in the comments.

    🎯 Your Turn: durability for money

    Pick the level that guarantees the replica applied the row.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — choose the right durability level, then run it.
    -- Requirement: this INSERT records a customer's bank transfer.
    -- Losing it on a failover is UNACCEPTABLE, even if writes get slower.
    
    BEGIN;
    SET LOCAL synchronous_commit = ___;   -- 👉 the level where the replica
                                          --    has APPLIED the row before COMMIT
    INSERT INTO transfers (amount) VALUES (12000.00);
    COMMIT;
    
    -- ✅ Expected: 'remote_apply'
    --    COMMIT now blocks until a replica confirms the
    ...

    4. Streaming/Physical vs Logical Replication

    There are two mechanisms for copying the data. Physical (streaming) replication ships the raw write-ahead log (WAL) — a byte-for-byte copy of the entire cluster. It's cheap, fast, and what you use for high availability and read replicas, but it's all-or-nothing and tied to the same major version.

    Logical replication instead decodes changes into rows and replicates table-by-table using a publish/subscribe model. That lets you copy only certain tables, replicate across different major versions, or funnel several databases into one warehouse — at the cost of more moving parts (DDL changes and sequences aren't carried over automatically).

    Physical vs logical replication

    Whole-cluster byte copy vs selective table sync.

    Try it Yourself »
    SQL
    -- TWO ways to copy the data, for two different jobs.
    
    -- PHYSICAL / STREAMING replication (block-for-block copy)
    --   Ships the raw WAL (write-ahead log) — an exact byte copy of the
    --   primary. Whole cluster only, same major version. This is what you
    --   use for high availability and read replicas. Cheap and fast.
    
    -- LOGICAL replication (publish / subscribe, table by table)
    --   Ships decoded row changes, so you choose WHICH tables to copy.
    -- On the PUBLISHER (source):
    CREATE PUBLICATION s
    ...

    Result — pick the mechanism for the job:

    AspectPhysical / streamingLogical (pub/sub)
    ScopeWhole clusterChosen tables
    Cross major versionNoYes
    Typical useHA + read replicasSelective sync / warehousing
    DDL & sequencesAlways in syncNot auto-replicated

    5. Read Scaling — Routing Reads to Replicas

    Because replicas hold a full copy of the data, you can spread your read traffic across them while the primary handles only writes. Your application keeps two connection pools and routes each query by type: SELECT goes to a replica, anything that writes goes to the primary. Add more replicas, serve more reads — that's horizontal read scaling.

    Route reads to replicas

    Two pools, and how to dodge stale reads.

    Try it Yourself »
    SQL
    -- READ SCALING: send writes to the primary, spread reads over replicas.
    -- Your app keeps TWO connection pools and routes each query by type.
    
    --   write_pool -> primary_host        (INSERT / UPDATE / DELETE)
    --   read_pool  -> replica_loadbalancer (SELECT)
    
    -- Pseudocode router:
    --   if (query is SELECT)  use read_pool;    -- scales horizontally
    --   else                  use write_pool;   -- one writer, always correct
    
    -- ⚠️ Async replicas LAG. A SELECT from a replica may not yet see a row
    --
    ...

    Your Turn: route the read & spot the risk

    Send a heavy dashboard query to the right pool, then explain the lag risk in a comment. Both blanks have their expected answers in the code.

    🎯 Your Turn: read_pool vs write_pool

    Pick the pool, then explain when a replica read is risky.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — route this report query to a read replica.
    -- Goal: count yesterday's orders for a dashboard. Stale-by-seconds is fine.
    
    -- 1) Pick the pool: a heavy read-only SELECT should NOT hit the primary.
    USE ___;   -- 👉 write_pool or read_pool ?
    
    SELECT COUNT(*) AS orders_yesterday
    FROM orders
    WHERE order_date = CURRENT_DATE - 1;
    
    -- ✅ Expected: read_pool
    --    Reads scale across replicas, sparing the primary for writes.
    -- 👉 In one sentence in a -- comment, say WHY this is safe here 
    ...

    6. Failover & Promotion — Surviving a Dead Primary

    When the primary fails, the show must go on: a replica is promoted to become the new primary that accepts writes. You can do this manually (run pg_promote() on a replica) or automatically with a cluster manager like Patroni, which detects the failure and promotes the healthiest replica within seconds — your app just reconnects through a pooler with no code change.

    Failover & promotion

    Promote a replica, confirm the role, avoid split-brain.

    Try it Yourself »
    SQL
    -- FAILOVER: the primary died — promote a replica to take over writes.
    
    -- Manual promotion, run ON the chosen replica:
    SELECT pg_promote(wait := true, timeout := 60);
    -- Afterwards confirm it is now the primary:
    SELECT pg_is_in_recovery();   -- must return FALSE (writes allowed)
    
    -- AUTOMATIC failover (e.g. Patroni + etcd/Consul):
    --   1. The cluster manager misses the primary's heartbeat.
    --   2. It picks the most up-to-date replica as the new leader.
    --   3. It promotes that replica and point
    ...

    Deep Dive: Why Not Just Have Multiple Primaries?

    It's tempting to let every node accept writes (multi-primary, a.k.a. multi-master) for even more write capacity and no failover step. The catch is conflicts: if two primaries update the same row at the same time, which value wins? You're now signed up for conflict-resolution rules, possible lost updates, and far harder reasoning about consistency.

    For the vast majority of apps, a single primary plus read replicas is simpler, correct by construction, and scales reads — which is the bottleneck most systems actually hit first. Reach for multi-primary only with a clear need (e.g. multi-region active-active writes) and eyes open to the trade-offs.

    Common Errors (and the fix)

    • Reading stale data from a lagged replica: a user saves, reloads, and the change is "missing" because they hit a behind replica. Fix: read-your-writes — route that user to the primary briefly, or make the write synchronous (remote_apply).
    • Async data loss on failover: the primary dies before replicas catch up, and the last few async commits are gone. Fix: use synchronous replication for the data that must never be lost; accept the latency cost for that data only.
    • Split-brain (two primaries): the old primary returns and keeps writing after a replica was promoted, so data diverges. Fix: fence the old primary before/at promotion and use a quorum-based manager (Patroni + etcd) so only one leader can exist.
    • No monitoring of replication lag: you discover the lag only during an outage. Fix: alert on NOW() - pg_last_xact_replay_timestamp() and on pg_stat_replication long before you ever need a failover.
    • Replication slot fills the primary's disk: a replica goes offline but its slot makes the primary keep every WAL file, eventually filling the disk. Fix: monitor pg_replication_slots and set a max_slot_wal_keep_size cap.

    📘 Quick Reference

    Command / settingPurpose
    pg_is_in_recovery()TRUE = replica (read-only), FALSE = primary
    pg_stat_replicationOn primary: each replica's state & lag_bytes
    pg_last_xact_replay_timestamp()On replica: measure replication lag
    synchronous_commit = onWait for replica to flush (durable, slower)
    synchronous_commit = remote_applyWait until replica applied (reads see it)
    synchronous_commit = local / offDon't wait (fast, async, may lose data)
    CREATE PUBLICATION … FOR TABLELogical replication: publish chosen tables
    CREATE SUBSCRIPTION … PUBLICATIONLogical replication: subscribe on destination
    pg_promote()Promote this replica to primary (failover)

    Frequently Asked Questions

    Q: Sync or async — which should I default to?

    Default to asynchronous for its speed, then make specific high-value writes synchronous with a per-transaction SET LOCAL synchronous_commit = 'remote_apply'. Making everything synchronous is usually slower than you need.

    Q: Can I write to a replica?

    No. Replicas are read-only — an attempted write errors with "cannot execute … in a read-only transaction". Writes go to the primary; that one-way flow is what keeps the copies consistent.

    Q: How is replication different from a backup?

    A backup is a point-in-time snapshot for recovering from data loss or corruption. Replication is a continuously-updated live copy for availability and read scaling. You need both — replication won't save you from a bad DELETE, because it faithfully copies the delete to every replica.

    Q: Does adding read replicas speed up writes?

    No — there's still a single primary doing all writes, and with synchronous replication more replicas can make writes slightly slower. Replicas scale reads; scale writes with bigger hardware, partitioning/sharding, or (carefully) multi-primary.

    Mini-Challenge: Design a Topology

    Put it all together — a brief and a blank canvas. Design the replication topology for a read-heavy news site that must survive one server dying. The expected shape of a good answer is in the comments.

    🎯 Mini-Challenge

    Design primaries, replicas, sync choice, and failover for a read-heavy app.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE: design a topology for a read-heavy app
    -- Scenario: a news site. ~95% of traffic is reads (article pages),
    -- ~5% is writes (editors publishing). It must survive one server dying.
    --
    -- Sketch your answer as -- comments (no SQL to run). Decide:
    --   1. How many PRIMARIES? (who owns the writes?)
    --   2. How many REPLICAS, and what do they serve?
    --   3. Async or sync replication for the editors' writes — and why?
    --   4. How do readers avoid hitting a lagged replica right af
    ...

    🎉 Lesson Complete

    • ✅ One primary owns writes; replicas copy it and serve reads
    • Async is fast but lags; sync trades latency for zero data loss
    • Physical/streaming copies the whole cluster; logical copies chosen tables
    • ✅ Route SELECTs to replicas to scale reads — but mind read-your-writes
    • ✅ On failure, promote a caught-up replica and fence the old primary to avoid split-brain
    • Next: Advanced Views — virtual tables that simplify complex queries

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service