Database - Backup and Recovery

Recovery is how a DBMS restores data to a consistent state after failures. Three types of failures exist:

  • Transaction failure — A transaction aborts (constraint violation, deadlock, application error). Handled by rollback using the transaction log.
  • System crash — Power failure, OS crash, hardware fault. The ARIES Recovery Algorithm replays committed transactions (redo) and undoes uncommitted ones (undo) from the WAL.
  • Media failure — Disk corruption or loss. Requires restoring from a backup and replaying the WAL from that point forward.

Backup Strategies

StrategyDescriptionRecovery TimeStorage Cost
Full backupComplete copy of the entire databaseFastest restoreHighest
IncrementalOnly changes since last backup (any type)Slower (chain of restores)Lowest
DifferentialOnly changes since last full backupMediumMedium
Continuous archivingWAL shipping / point-in-time recoveryAny point in timeWAL storage

PostgreSQL pg_admin Backup

Take a backup of the Database using pg admin in the custom format, in the data options select

  • do not save owner
  • do not save namespace
  • do not save privileges

Restore

Select the db you want to restore and choose restore, select

  • do not save owner
  • do not save namespace
  • do not save privileges

after sometime it should be done

[! Caution]
make sure you only restore one Database at a time or backup one Database at a time

IMPORTANT

if you have any database triggers, make sure you point to the right schema after you dump the Database

Command-Line Backup

# Logical backup (SQL dump)
pg_dump -Fc mydb > mydb.dump        # Custom format (compressed, flexible)
pg_dump -Fp mydb > mydb.sql         # Plain SQL
 
# Restore
pg_restore -d mydb mydb.dump
 
# Physical backup (file-level, for point-in-time recovery)
pg_basebackup -D /backup/dir -Ft -z -P

Backing up and Restoring between Versions

When you are making backups and restore between different versions of the postgres servers, you have to be very careful regarding this. As this might results in header error.

So this happened when i was doing a backup and restore between version 16 (local) and 15 production.

to restore this onto a 15 version you have to do the following:

  • make a dump from the postgres 16 remote server
  • make a restore of the file onto a postgres 15 version from the same backup host
  • make a backup file from the 15 remote server
  • make restore on the postgres 15 server remotely

USE IDENTICAL VERSIONS

Make sure to use the same version of postgres on development and production and test databases

Point-in-Time Recovery (PITR)

PostgreSQL supports recovering to any specific moment using continuous WAL archiving:

  1. Take a base backup with pg_basebackup
  2. Configure archive_command to ship WAL files to a safe location
  3. To recover: restore the base backup, configure recovery_target_time, start PostgreSQL — it replays WAL up to that exact moment

This is the production-grade recovery strategy for any serious deployment.

Recovery in Other Systems

  • MySQL / InnoDB — Uses redo log (WAL) + undo log + binary log. mysqlbackup or mysqldump for backups. Binary log replay for point-in-time recovery.
  • SQLite — Rollback journal (default) or WAL mode. The entire database is a single file — backup is a file copy (use .backup command or SQLite Online Backup API for consistency).

Recovery in Distributed Databases

Distributed environments add complexity because failures can be partial (one node crashes while others continue):

Replication-Based Recovery

StrategyHow It WorksFailover Time
Active replicationAll replicas process transactions concurrentlyImmediate — any replica can take over
Passive replicationPrimary processes transactions, replicates to standbyBrief delay — standby must catch up from replication log

Distributed Transaction Recovery

When a transaction spans multiple nodes, recovery must coordinate across all of them:

Data Synchronization After Recovery

After a failed node comes back online:

  1. Log replay — Apply all missed WAL entries from the healthy replica
  2. Snapshot + incremental — Ship a full snapshot to the recovered node, then replay only changes since the snapshot
  3. Anti-entropy — Background process comparing data between replicas and resolving differences (used by Cassandra, DynamoDB)