Replication: Sync, Async, Failover & Clustering
Set up primary/replica replication, automatic failover, and read scaling for production databases.
๐ฏ What You'll Learn
- Asynchronous replication and monitoring replication lag
- Synchronous replication for zero data loss
- Logical replication for selective table sync
- Failover procedures and automatic failover with Patroni
- Read replicas for horizontal read scaling
โก Asynchronous Replication
Async replication is like sending a letter โ the primary mails the changes and keeps working. The replica processes them when they arrive. Fast for the primary, but the replica might be a few seconds behind.
Async Replication
Monitor lag, WAL generation, and replication slots
-- ASYNCHRONOUS REPLICATION
-- Primary commits immediately, replica catches up later
-- Fastest for writes, but replica may lag behind
-- Check replication status on primary:
SELECT pid, usename, client_addr,
state, sync_state,
sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- Check lag on replica:
SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;
-- If lag > acceptable threshold, i
...๐ Synchronous Replication
Sync replication is like a phone call โ the primary waits for the replica to confirm before committing. Zero data loss, but every write is slower. Use it for critical financial data; skip it for analytics logs.
๐ก Pro Tip โ Per-Transaction Control
You can set synchronous_commit per transaction. Critical payments use remote_apply; page view logging uses local. Same database, different durability guarantees.
Synchronous Replication
Zero-loss replication with per-transaction control
-- SYNCHRONOUS REPLICATION
-- Primary waits for replica to confirm before committing
-- Zero data loss, but slower writes
-- postgresql.conf on primary:
-- synchronous_standby_names = 'replica1'
-- synchronous_commit = on
-- Levels of synchronous commit:
-- | Level | Speed | Durability |
-- |--------------------|--------|----------------------|
-- | off | Fastest | May lose recent txns |
-- | local | Fast | Primary durable |
-- | remo
...๐ Logical Replication
Logical replication sends decoded changes (INSERT, UPDATE, DELETE) instead of raw WAL bytes. This lets you replicate specific tables, across PostgreSQL versions, or aggregate data from multiple sources.
Logical Replication
Publish/subscribe model for selective table sync
-- LOGICAL REPLICATION: replicate specific tables, not entire DB
-- Useful for: selective sync, cross-version, data integration
-- On the PUBLISHER (source):
CREATE PUBLICATION my_publication
FOR TABLE customers, orders, products;
-- Or publish ALL tables:
-- CREATE PUBLICATION all_tables FOR ALL TABLES;
-- On the SUBSCRIBER (destination):
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=primary_ip dbname=mydb user=repl_user password=pwd'
PUBLICATION my_publication;
-- Check subscription
...๐ Failover
When the primary fails, a replica must be promoted to take over. This can be manual (run a command) or automatic (Patroni detects failure and promotes within seconds).
โ ๏ธ Common Mistake โ Split Brain
If the old primary comes back online after a replica is promoted, you have two primaries accepting writes (split brain). Always "fence" the old primary โ shut it down or prevent it from accepting connections before promoting a replica.
Failover Procedures
Manual and automatic failover with Patroni
-- FAILOVER: promoting a replica to primary
-- Manual failover (PostgreSQL):
-- On the replica server:
-- pg_ctl promote -D /var/lib/postgresql/data
-- or:
SELECT pg_promote(wait := true, timeout := 60);
-- After promotion:
-- 1. Update application connection strings to point to new primary
-- 2. Verify the new primary is accepting writes:
SELECT pg_is_in_recovery();
-- FALSE = this is the primary (accepting writes)
-- TRUE = this is still a replica (read-only)
-- 3. Rebuild old primary as a
...๐ Read Replicas
Read replicas let you distribute SELECT queries across multiple servers while the primary handles writes. Like adding more checkout counters at a store โ each one serves customers, but only the manager can restock shelves.
Read Replica Scaling
Route reads to replicas and handle stale data
-- READ REPLICAS: scale reads horizontally
-- Architecture:
-- Writes โ Primary (single server)
-- Reads โ Load balancer โ Replica 1, Replica 2, Replica 3
-- Connection routing in your application:
-- write_pool = connect("primary_host:5432")
-- read_pool = connect("replica_lb:5432")
-- Route queries:
-- INSERT/UPDATE/DELETE โ write_pool
-- SELECT โ read_pool
-- โ ๏ธ CAUTION: Replica lag means reads might be stale
-- After a write, immediately reading from replica may not see the change
--
...๐ Quick Reference
| Type | Data Loss | Write Speed |
|---|---|---|
| Async | Possible (lag) | Fastest |
| Sync (on) | None (flushed) | Slower |
| Sync (apply) | None (applied) | Slowest |
| Logical | Possible | Fast (selective) |
๐ Lesson Complete!
You now understand replication topologies and failover strategies. Next, explore advanced views and virtual tables!
Sign up for free to track which lessons you've completed and get learning reminders.