Skip to main content
    Back

    Advanced Track

    Backup & Restore

    By the end of this lesson you'll be able to design a backup plan you can actually bet your job on — choosing logical vs physical backups, setting up point-in-time recovery, picking a schedule from real RTO/RPO targets, and proving it works by restoring it. The hard truth of this lesson: a backup you have never restored isn't a backup — it's a wish.

    What You'll Learn

    • Logical backups with pg_dump / mysqldump
    • Physical backups: file copies, snapshots, pg_basebackup
    • Full vs incremental vs differential backups
    • Point-in-time recovery with the WAL / binlog
    • RTO vs RPO — the two numbers that drive every choice
    • Retention, offsite copies, and the 3-2-1 rule

    Why this matters

    A database backup is the seatbelt of your data. Hardware dies, a bad migration drops a table, a tired engineer types DELETE without a WHERE — and the only thing standing between that and a closed business is a backup you can restore. This lesson is about making sure the seatbelt is actually buckled, not just hanging in the car.

    1. Logical Backups (pg_dump / mysqldump)

    A logical backup exports your database as the SQL needed to rebuild it — the CREATE TABLEs and INSERTs, in a file you could open in a text editor. Because it's just SQL, it's portable: you can restore it onto a newer database version, a different server, even a different operating system. The trade-off is speed — rebuilding a huge database row by row is slower than copying files.

    📸 Real-world analogy

    A logical backup is like photocopying every page of a book. Slow for a 1,000-page book, but you can read any page, restore a single chapter, and the copy works in any library. A physical backup (next section) is like cloning the whole bookshelf — fast, but it only fits the same shelf.

    pg_dump & mysqldump

    Export a database to a portable file.

    Try it Yourself »
    SQL
    -- LOGICAL BACKUP: export the database as SQL you could read in a text editor.
    -- Portable across versions and even across machines. Run these in a terminal,
    -- NOT in the SQL editor — they are command-line tools, not SQL statements.
    
    -- PostgreSQL: pg_dump backs up ONE database --------------------------------
    pg_dump -U postgres -d shopdb -F custom -f shopdb.dump
    --      \__ user    \__ db    \__ format   \__ output file
    -- -F custom  = compressed binary (recommended: smaller + parallel restor
    ...

    Restoring a logical backup

    Load the dump back with pg_restore / psql / mysql.

    Try it Yourself »
    SQL
    -- RESTORE a logical backup = run the dump back into a database.
    
    -- PostgreSQL custom format -> use pg_restore:
    createdb shopdb_restored                       -- make an empty target first
    pg_restore -U postgres -d shopdb_restored shopdb.dump
    
    -- PostgreSQL plain .sql text -> just pipe it through psql:
    psql -U postgres -d shopdb_restored < shopdb.sql
    
    -- MySQL -> pipe the .sql file into the mysql client:
    mysql -u root -p shopdb_restored < shopdb.sql
    
    -- ✅ Result: shopdb_restored now contains ev
    ...

    Your Turn: complete the dump & restore

    Fill in the four blanks so the commands back up shopdb and restore it into shopdb_test. The expected result is in the comments so you can check yourself.

    🎯 Your Turn: pg_dump → pg_restore

    Replace the ___ blanks with the names and the tool.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — complete the backup, then complete the restore.
    -- Goal: dump the "shopdb" PostgreSQL database to a compressed file,
    --       then restore that file into a fresh database called "shopdb_test".
    
    -- (a) Back it up — fill in the database name and the output file:
    pg_dump -U postgres -d ___ -F custom -f ___    -- 👉 shopdb   and   shopdb.dump
    
    -- (b) Restore it — fill in the tool and the target database:
    createdb shopdb_test
    ___ -U postgres -d ___ shopdb.dump             -- 👉 pg_r
    ...

    2. Physical Backups (files & snapshots)

    A physical backup copies the raw data files on disk instead of regenerating SQL. PostgreSQL's pg_basebackup grabs the entire data directory; cloud platforms take a snapshot — a frozen image of the disk. This is dramatically faster for large databases and it's the foundation for point-in-time recovery. The catch: it's version-locked (restore onto the same database version) and it's all-or-nothing — you can't cherry-pick a single table out of it.

    pg_basebackup & snapshots

    Copy data files and cloud disk snapshots.

    Try it Yourself »
    SQL
    -- PHYSICAL BACKUP: copy the actual data FILES on disk, not SQL statements.
    -- Much faster for large databases, and it is the foundation for PITR (below).
    
    -- PostgreSQL: pg_basebackup copies the whole data directory + WAL:
    pg_basebackup -h primary -U repl_user -D /backup/base \
      --checkpoint=fast --wal-method=stream -P
    --                   \__ -P prints a live progress bar
    
    -- Cloud snapshots are physical backups too (a frozen image of the disk):
    -- AWS    -> EBS / RDS snapshot
    -- GCP    -> Pe
    ...

    3. Full vs Incremental vs Differential

    A full backup copies everything. To avoid doing that every single night, you layer smaller backups on top:

    • Full — the complete database. Slowest to make, simplest and fastest to restore (one file).
    • Incremental — only what changed since the last backup of any kind. Tiny and quick, but to restore you must replay the full plus every incremental in order.
    • Differential — everything that changed since the last full. Bigger than an incremental, but restore needs only the full + the one latest differential.

    Result — Mon full, then nightly — what each Thursday restore needs:

    StrategyThu backup sizeTo restore Thu, replay…
    Full every nightWhole DBThu full only
    Full + incrementalSmallestFull + Tue + Wed + Thu
    Full + differentialMediumFull + Thu differential

    Continuous WAL/binlog archiving (next) is the ultimate "incremental" — it captures every change as it happens, which is what makes second-by-second recovery possible.

    4. Point-in-Time Recovery (WAL / binlog)

    Every change a database makes is first written to a transaction log — PostgreSQL calls it the WAL (Write-Ahead Log), MySQL the binlog (binary log). If you keep a base backup and archive that log, you can restore the base and then replay the log up to any moment you choose. That's point-in-time recovery (PITR): a rewind button for your data.

    💡 Pro Tip — PITR is your "undo the disaster" button

    If someone runs DELETE FROM users at 14:35, you restore last night's base backup and replay the log to 14:34:59 — the second before the mistake. You lose only that one statement, not the whole day. This is why production databases should always archive their WAL/binlog.

    WAL / binlog archiving + PITR

    Enable continuous archiving and rewind to a timestamp.

    Try it Yourself »
    SQL
    -- POINT-IN-TIME RECOVERY (PITR): rewind to ANY second in the past.
    -- Built from a physical base backup PLUS the transaction log that records
    -- every change after it: Postgres calls it the WAL, MySQL calls it the binlog.
    
    -- 1) Turn on continuous log archiving (PostgreSQL, postgresql.conf):
    wal_level = replica
    archive_mode = on
    archive_command = 'cp %p /wal_archive/%f'   -- ship each WAL segment offsite
    
    -- 2) To recover, restore the base backup, then replay logs up to a moment:
    restore_comman
    ...

    5. RTO vs RPO — the two numbers that decide everything

    Before you pick any backup method, the business has to answer two questions. Their answers drive every choice that follows:

    • RPO — Recovery Point Objective: how much data can you afford to lose, measured in time? A nightly dump means up to 24 hours of work could vanish. WAL archiving brings RPO down to minutes or seconds.
    • RTO — Recovery Time Objective: how long can you be down while you recover? Restoring a 100 GB dump might take an hour (big RTO); failing over to a hot standby replica takes seconds.

    Easy way to remember: RPO = how much data you lose (looking back to the last good copy). RTO = how long until you're running again. Tighter targets cost more — match the spend to what the data is actually worth.

    Result — typical RPO / RTO by method:

    MethodRPO (data lost)RTO (downtime)
    Daily pg_dump / mysqldumpUp to 24hMinutes–hours
    WAL / binlog + PITRSeconds–minutes15–60 min
    Hot standby replicaNear zeroSeconds
    Weekly cloud snapshotUp to 7 daysMinutes

    Your Turn: pick the strategy for the targets

    Read the RTO/RPO requirement in the comments and choose the one option (A, B, or C) that meets both. The reasoning and answer are in the comments.

    🎯 Your Turn: match the strategy

    Replace ___ with A, B, or C.

    Try it Yourself »
    SQL
    -- 🎯 YOUR TURN — pick the right strategy for the requirement.
    -- A payments app says: "We can lose AT MOST 1 minute of data (RPO = 1 min)
    -- and must be back online within 5 minutes (RTO = 5 min)."
    
    -- Which ONE approach meets BOTH targets? Replace ___ with: A, B, or C.
    --   A) A single pg_dump every night at 2 AM
    --   B) WAL/binlog archiving + a hot standby replica for fast failover
    --   C) A weekly cloud disk snapshot
    
    my_choice = ___    -- 👉 think: nightly dump loses up to 24h; snapshot los
    ...

    6. The Golden Rule: test your restores

    Here is the rule every senior engineer learns the hard way: a backup you haven't restored isn't a backup. Backups fail silently — a cron job that quietly errored, a corrupt dump, a missing WAL segment, an archive bucket that filled up months ago. You only find out the moment you desperately need it, which is the worst possible time. So you practise: on a schedule, restore into a throwaway database and verify it.

    A good restore test answers three things: did it restore at all, is the data complete (compare row counts against production), and how long did it take (that number is your real RTO). The recovery scenario below shows a clean restore test passing.

    Result — monthly restore test — row counts must match production:

    TableProductionRestoredMatch?
    users5000050000
    orders250000250000
    products50005000

    Common Errors (and the fix)

    • Never testing restores. The single most common — and most expensive — mistake. The first time you discover a backup is corrupt should not be during an outage. Fix: schedule a monthly restore into a scratch database with row-count checks.
    • No offsite copy. Storing every backup on the same server (or same region) as the database means one fire, flood, or account deletion wipes out both. Fix: follow 3-2-1 — ship at least one copy to a different region or provider.
    • Ignoring RPO. "We back up nightly" sounds safe until a crash at 5 PM loses a full day of orders. Fix: set an explicit RPO with the business and add WAL/binlog archiving if minutes-not-hours matter.
    • Backing up without a consistent snapshot. Copying live files with cp, or a long mysqldump without --single-transaction, captures a half-written, unrestorable state. Fix: use pg_basebackup / a consistent snapshot, or mysqldump --single-transaction.
    • "pg_restore: error: could not connect / database does not exist". pg_restore -d needs the target database to already exist. Fix: run createdb shopdb_restored first, then restore into it.

    📘 Quick Reference

    Command / TermWhat it does
    pg_dump -Fc -f f.dump dbLogical backup of one Postgres DB (compressed)
    mysqldump --single-transaction dbConsistent logical backup of a MySQL DB
    pg_restore -d db f.dumpRestore a custom-format Postgres dump
    psql -d db < f.sqlRestore a plain .sql dump (Postgres)
    pg_basebackup -D dirPhysical copy of the whole data directory
    WAL / binlogTransaction log enabling point-in-time recovery
    RPOMax acceptable data loss (in time)
    RTOMax acceptable downtime to recover
    3-2-13 copies, 2 media, 1 offsite

    Frequently Asked Questions

    Q: Logical or physical — which should I use?

    Use both. Logical (pg_dump/mysqldump) is portable and great for migrations and grabbing one table; physical (pg_basebackup/snapshots) is fast for large databases and is required for point-in-time recovery. Many shops run a daily logical dump and continuous WAL archiving.

    Q: How often should I back up?

    Work backwards from your RPO. If you can lose 24 hours, a nightly full is fine. If you can only lose minutes, you need continuous WAL/binlog archiving on top of periodic full backups.

    Q: Isn't a read replica the same as a backup?

    No. A replica copies changes in real time — including the bad ones. DROP TABLE users on the primary instantly drops it on the replica too. A replica protects against hardware failure (great RTO); a backup protects against mistakes and corruption. You need both.

    Q: Do cloud-managed databases (RDS, Cloud SQL) back up automatically?

    They offer automated snapshots and PITR, but the defaults (retention window, region) are yours to configure — and you still must test that you can actually restore. Managed ≠ hands-off.

    Mini-Challenge: Design a Backup Plan

    Put it all together — a brief, a blank outline, and a sample passing answer in the comments. Write your plan as comments, then sanity-check it against the RTO, RPO, retention, and offsite requirements.

    🎯 Mini-Challenge

    Design a schedule that meets RPO 5m, RTO 30m, 7-year retention, and 3-2-1.

    Try it Yourself »
    SQL
    -- 🎯 MINI-CHALLENGE: design a backup PLAN (write it as comments — no live SQL).
    -- Requirements for an online store "shopdb":
    --   • RPO = 5 minutes   (lose at most 5 minutes of orders)
    --   • RTO = 30 minutes  (back online within half an hour)
    --   • Keep 7 years of monthly records for tax/audit (retention)
    --   • Survive losing the whole data centre (offsite / 3-2-1)
    --
    -- Sketch the schedule. Decide, for each line, the TYPE, FREQUENCY and RETENTION:
    --   1. Continuous log archiving  -> frequ
    ...

    🎉 Lesson Complete

    • Logical backups (pg_dump/mysqldump) are portable; physical (pg_basebackup/snapshots) are fast
    • Full is simplest to restore; incremental/differential save space at restore-time cost
    • PITR replays the WAL/binlog to rewind to any second before a disaster
    • RPO is how much data you can lose; RTO is how long you can be down — they drive every choice
    • ✅ A backup you haven't restored isn't a backup — test restores, set retention, keep 3-2-1 offsite copies
    • Next: Replication — keep a live standby for near-zero RTO

    Sign up for free to track which lessons you've completed and get learning reminders.

    Previous

    Cookie & Privacy Settings

    We use cookies to improve your experience, analyze traffic, and show personalized ads. You can manage your preferences below.

    By clicking "Accept All", you consent to our use of cookies for analytics and personalized advertising. You can customize your preferences or reject non-essential cookies.

    Privacy PolicyTerms of Service