Database - Backup and Recovery
Recovery is how a DBMS restores data to a consistent state after failures. Three types of failures exist:
- Transaction failure — A transaction aborts (constraint violation, deadlock, application error). Handled by rollback using the transaction log.
- System crash — Power failure, OS crash, hardware fault. The ARIES Recovery Algorithm replays committed transactions (redo) and undoes uncommitted ones (undo) from the WAL.
- Media failure — Disk corruption or loss. Requires restoring from a backup and replaying the WAL from that point forward.
Backup Strategies
| Strategy | Description | Recovery Time | Storage Cost |
|---|---|---|---|
| Full backup | Complete copy of the entire database | Fastest restore | Highest |
| Incremental | Only changes since last backup (any type) | Slower (chain of restores) | Lowest |
| Differential | Only changes since last full backup | Medium | Medium |
| Continuous archiving | WAL shipping / point-in-time recovery | Any point in time | WAL storage |
PostgreSQL pg_admin Backup
Take a backup of the Database using pg admin in the custom format, in the data options select
- do not save
owner - do not save
namespace - do not save
privileges
Restore
Select the db you want to restore and choose restore, select
- do not save
owner - do not save
namespace - do not save
privileges
after sometime it should be done
[! Caution]
make sure you only restore one Database at a time or backup one Database at a time
IMPORTANT
if you have any database triggers, make sure you point to the right schema after you dump the Database
Command-Line Backup
# Logical backup (SQL dump)
pg_dump -Fc mydb > mydb.dump # Custom format (compressed, flexible)
pg_dump -Fp mydb > mydb.sql # Plain SQL
# Restore
pg_restore -d mydb mydb.dump
# Physical backup (file-level, for point-in-time recovery)
pg_basebackup -D /backup/dir -Ft -z -PBacking up and Restoring between Versions
When you are making backups and restore between different versions of the postgres servers, you have to be very careful regarding this. As this might results in header error.
So this happened when i was doing a backup and restore between version 16 (local) and 15 production.
to restore this onto a 15 version you have to do the following:
- make a dump from the postgres 16 remote server
- make a restore of the file onto a postgres 15 version from the same backup host
- make a backup file from the 15 remote server
- make restore on the postgres 15 server remotely
USE IDENTICAL VERSIONS
Make sure to use the same version of postgres on development and production and test databases
Point-in-Time Recovery (PITR)
PostgreSQL supports recovering to any specific moment using continuous WAL archiving:
- Take a base backup with
pg_basebackup - Configure
archive_commandto ship WAL files to a safe location - To recover: restore the base backup, configure
recovery_target_time, start PostgreSQL — it replays WAL up to that exact moment
This is the production-grade recovery strategy for any serious deployment.
Recovery in Other Systems
- MySQL / InnoDB — Uses redo log (WAL) + undo log + binary log.
mysqlbackupormysqldumpfor backups. Binary log replay for point-in-time recovery. - SQLite — Rollback journal (default) or WAL mode. The entire database is a single file — backup is a file copy (use
.backupcommand or SQLite Online Backup API for consistency).
Recovery in Distributed Databases
Distributed environments add complexity because failures can be partial (one node crashes while others continue):
Replication-Based Recovery
| Strategy | How It Works | Failover Time |
|---|---|---|
| Active replication | All replicas process transactions concurrently | Immediate — any replica can take over |
| Passive replication | Primary processes transactions, replicates to standby | Brief delay — standby must catch up from replication log |
Distributed Transaction Recovery
When a transaction spans multiple nodes, recovery must coordinate across all of them:
- Two-Phase Commit (2PC) ensures all nodes either commit or roll back
- Transaction logs at each site enable local redo/undo
- Checkpointing across sites reduces log replay time (see ARIES Recovery Algorithm)
Data Synchronization After Recovery
After a failed node comes back online:
- Log replay — Apply all missed WAL entries from the healthy replica
- Snapshot + incremental — Ship a full snapshot to the recovered node, then replay only changes since the snapshot
- Anti-entropy — Background process comparing data between replicas and resolving differences (used by Cassandra, DynamoDB)