
Backups & Restore

pgBackRest integration for automated backups, WAL archiving, snapshots, and point-in-time recovery in AxiomDB

AxiomDB uses pgBackRest for enterprise-grade backup and recovery. This guide covers the full backup lifecycle: daily full backups, hourly differential backups, WAL archiving, named snapshots, and restore procedures.


Backup Strategy

AxiomDB implements a 3-tier backup strategy:

┌─────────────────────────────────────────────────────────────┐
│                    AxiomDB Backup Pipeline                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Tier 1: WAL Archiving (Continuous)                         │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ PostgreSQL → WAL files → pgBackRest archive-push    │    │
│  │ RPO: unarchived WAL only (see archive_timeout)      │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Tier 2: Differential Backups (Hourly)                      │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ pgBackRest --type=diff (files changed since full)   │    │
│  │ RPO: 1 hour                                         │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Tier 3: Full Backups (Daily)                               │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ pgBackRest --type=full (complete database copy)     │    │
│  │ RPO: 24 hours (but WAL covers gaps)                 │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Named Snapshots (On-Demand)                                │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Annotated full backup + PostgreSQL restore point    │    │
│  │ Used for pre-migration safety nets                  │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Recovery Point Objective (RPO)

Backup Type    Frequency    RPO                   Storage Cost         Restore Speed
WAL Archive    Continuous   Unarchived WAL only   Medium               Slow (replay)
Differential   Hourly       1 hour                Low                  Medium
Full           Daily        24 hours              High                 Fast
Snapshot       On-demand    Point-in-time         High (full backup)   Fast
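The WAL-archive RPO holds only while archiving keeps up. PostgreSQL also archives only completed WAL segments, so on a low-traffic cluster without archive_timeout, recent changes can sit unarchived for a long time. Below is a minimal sketch of a health check against PostgreSQL's pg_stat_archiver view. The function name and the 600-second default threshold are illustrative assumptions, not AxiomDB settings. It also assumes psql can connect locally as postgres.

```bash
#!/usr/bin/env bash
# Sketch: report whether WAL archiving is keeping up, using pg_stat_archiver.
# ASSUMPTIONS: psql connects locally as postgres; the function name and the
# 600-second default threshold are illustrative, not AxiomDB settings.
check_wal_archiving() {
    local max_age="${1:-600}" age failing
    # Row: seconds since the last archived segment (-1 if none yet) | t if the
    # most recent archive failure is newer than the most recent success
    if ! IFS='|' read -r age failing < <(psql -U postgres -Atc "
        SELECT coalesce(extract(epoch FROM now() - last_archived_time)::bigint, -1),
               last_failed_time IS NOT NULL
                 AND (last_archived_time IS NULL OR last_failed_time > last_archived_time)
        FROM pg_stat_archiver;"); then
        echo "CRITICAL: could not query pg_stat_archiver"
        return 1
    fi
    if [ "$failing" = "t" ]; then
        echo "CRITICAL: the most recent WAL archive attempt failed"
        return 1
    elif [ "$age" -lt 0 ]; then
        echo "WARNING: no WAL segment has been archived yet"
        return 1
    elif [ "$age" -gt "$max_age" ]; then
        echo "WARNING: last WAL segment archived ${age}s ago"
        return 1
    fi
    echo "OK: last WAL segment archived ${age}s ago"
}
```

Run it as `check_wal_archiving 900` from cron or a monitoring agent. On a quiet cluster, a large age may only mean that no segment has filled up yet.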

pgBackRest Configuration

Stanza Configuration

The [global] section holds the repository, retention, encryption, and performance options. The [axiomdb] stanza section tells pgBackRest how to reach the PostgreSQL cluster.

# /etc/pgbackrest/pgbackrest.conf

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=7
repo1-retention-diff=24
repo1-cipher-type=aes-256-cbc
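# repo1-cipher-pass is deliberately not set here: pgBackRest does not expand ${...}
# in this file. Supply it as the PGBACKREST_REPO1_CIPHER_PASS environment variable.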
repo1-bundle=y
repo1-bundle-limit=2048MB
compress-type=zst
compress-level=6
process-max=4
log-level-console=info
log-level-file=detail
start-fast=y
delta=y

[axiomdb]
pg1-path=/var/lib/postgresql/14/main
pg1-port=5432
pg1-user=postgres
pg1-socket-path=/var/run/postgresql
# archive_command and archive_mode are PostgreSQL settings, not pgBackRest options;
# they are configured in PostgreSQL (see PostgreSQL WAL Archive Setup).

Encryption

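All backups are encrypted at rest using AES-256-CBC. The cipher passphrase is stored in the AxiomDB secrets manager and injected through the PGBACKREST_REPO1_CIPHER_PASS environment variable. That variable must be present for every process that runs pgbackrest: the PostgreSQL server (for archive_command), cron jobs, and interactive sessions. Never store the passphrase in plaintext configuration files. Without it, the repository cannot be decrypted.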
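pgBackRest accepts its options as PGBACKREST_-prefixed environment variables, so repo1-cipher-pass becomes PGBACKREST_REPO1_CIPHER_PASS. Below is a minimal sketch of passing the passphrase that way for a single command. It assumes the secrets manager has already written the passphrase to a file readable only by postgres, at the hypothetical path /run/axiomdb/secrets/pgbackrest-cipher-pass. How that file gets populated is not shown.

```bash
#!/usr/bin/env bash
# Sketch: pass the repository cipher passphrase to pgbackrest via the
# environment instead of pgbackrest.conf.
# ASSUMPTION: CIPHER_PASS_FILE is a hypothetical file, readable only by the
# postgres user, that the AxiomDB secrets manager populates.
CIPHER_PASS_FILE="${CIPHER_PASS_FILE:-/run/axiomdb/secrets/pgbackrest-cipher-pass}"

pgbackrest_with_cipher() {
    local pass
    if [ ! -r "$CIPHER_PASS_FILE" ]; then
        echo "cipher passphrase file not readable: $CIPHER_PASS_FILE" >&2
        return 1
    fi
    pass="$(<"$CIPHER_PASS_FILE")"
    # pgBackRest maps PGBACKREST_REPO1_CIPHER_PASS to the repo1-cipher-pass option;
    # the variable is set only for this one command
    PGBACKREST_REPO1_CIPHER_PASS="$pass" pgbackrest "$@"
}

# A strong random passphrase can be generated once with:
#   openssl rand -base64 48
```

Usage: `pgbackrest_with_cipher --stanza=axiomdb info`. The PostgreSQL service itself still needs the variable in its environment, because archive_command runs pgbackrest directly.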

Initialize the Stanza

# Create the stanza
pgbackrest --stanza=axiomdb stanza-create

# Verify configuration and WAL archiving (run once archiving is enabled in PostgreSQL)
pgbackrest --stanza=axiomdb check

PostgreSQL WAL Archive Setup

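-- Written to postgresql.auto.conf (ALTER SYSTEM requires a superuser)
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=axiomdb archive-push %p';
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 5;

-- A reload applies archive_command; archive_mode, wal_level, and
-- max_wal_senders only take effect after a restart
SELECT pg_reload_conf();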

Archive Mode Requires Restart

Changing archive_mode, wal_level, or max_wal_senders requires a PostgreSQL restart; a reload is not enough. Plan the restart for a maintenance window.
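After a reload, PostgreSQL lists settings that are still waiting for a restart in the pending_restart column of pg_settings. Below is a minimal sketch that restarts only when something is pending. It assumes psql connects locally as postgres and that the systemd unit is named postgresql, as in the other commands on this page.

```bash
#!/usr/bin/env bash
# Sketch: restart PostgreSQL only if reloaded settings are still pending.
# ASSUMPTIONS: psql connects locally as postgres; the systemd unit is
# "postgresql", as in the other commands on this page.
pending_restart_settings() {
    psql -U postgres -Atc "SELECT name FROM pg_settings WHERE pending_restart ORDER BY name;"
}

restart_if_pending() {
    local pending
    pending="$(pending_restart_settings)" || return 1
    if [ -n "$pending" ]; then
        echo "restart required for: $(printf '%s\n' "$pending" | paste -sd, -)"
        sudo systemctl restart postgresql
    else
        echo "no restart required"
    fi
}
```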


Running Backups

Manual Backup Commands

# Full backup (complete copy of all databases)
pgbackrest --stanza=axiomdb --type=full backup

# Differential backup (files changed since the last full backup)
pgbackrest --stanza=axiomdb --type=diff backup

# Incremental backup (files changed since the last backup of any type)
pgbackrest --stanza=axiomdb --type=incr backup
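In pgbackrest info output, the three types can be told apart by the last character of the backup label: F for full, D for differential, I for incremental. Differential and incremental labels are also prefixed with the label of the full backup they depend on. Below is a small sketch that decodes a label; the function name is hypothetical.

```bash
# Sketch: decode the backup type from a pgBackRest backup label.
#   full: 20250115-020001F
#   diff: 20250115-020001F_20250115-030000D
#   incr: 20250115-020001F_20250115-040000I
backup_label_type() {
    case "$1" in
        *F) echo full ;;
        *D) echo diff ;;
        *I) echo incr ;;
        *)  echo unknown; return 1 ;;
    esac
}
```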

Automated Backup Schedule

# /etc/cron.d/pgbackrest

# Full backup daily at 2:00 AM
0 2 * * * postgres pgbackrest --stanza=axiomdb --type=full backup

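# Differential backup every hour except 2:00 AM, when the full backup runs
# (pgBackRest allows only one backup per stanza at a time)
0 0,1,3-23 * * * postgres pgbackrest --stanza=axiomdb --type=diff backup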

# Verify backup integrity daily at 4:00 AM
0 4 * * * postgres pgbackrest --stanza=axiomdb verify
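An alternative to separate cron lines is a single hourly job that decides the backup type itself, so a full and a differential backup can never start in the same minute. Below is a sketch. FULL_BACKUP_HOUR is a hypothetical setting; its default of 2 matches the 2:00 AM full backup.

```bash
#!/usr/bin/env bash
# Sketch: one hourly job that picks the backup type, so the full and the
# differential backup never start in the same minute.
# ASSUMPTION: FULL_BACKUP_HOUR is a hypothetical setting; 2 matches the 2:00 AM full backup.
FULL_BACKUP_HOUR="${FULL_BACKUP_HOUR:-2}"

backup_type_for_hour() {
    # 10# forces base 10 so "08" and "09" are not read as invalid octal
    if [ "$((10#$1))" -eq "$FULL_BACKUP_HOUR" ]; then
        echo full
    else
        echo diff
    fi
}

run_scheduled_backup() {
    local type
    type="$(backup_type_for_hour "$(date +%H)")"
    pgbackrest --stanza=axiomdb --type="$type" backup
}
```

A cron entry such as `0 * * * * postgres /opt/axiomdb/scripts/hourly-backup.sh` would call a script that sources these functions and runs run_scheduled_backup. The script path is hypothetical.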

Backup Output

2025-01-15 02:00:01.000 P00   INFO: backup command begin 2.51: ...
2025-01-15 02:00:01.100 P00   INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2025-01-15 02:00:02.200 P00   INFO: backup start archive = 000000010000000000000004, lsn = 0/4000028
2025-01-15 02:00:05.500 P00   INFO: check archive for prior segment 000000010000000000000003
2025-01-15 02:05:30.000 P00   INFO: new backup label = 20250115-020001F
2025-01-15 02:05:31.000 P00   INFO: full backup size = 2.1GB, file total = 1847
2025-01-15 02:05:31.200 P00   INFO: backup command end: completed successfully
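A script may need the label of the backup it just took, for example to pass to --set during a restore. The label can be read from the `INFO: new backup label = ...` line in the output above. Below is a sketch. It assumes log-level-console=info, as configured in pgbackrest.conf, since pgBackRest writes console log output to stdout.

```bash
#!/usr/bin/env bash
# Sketch: read the new backup label from pgbackrest console output
# (requires log-level-console=info so the "new backup label" line is printed).
backup_label_from_log() {
    sed -n 's/.*INFO: new backup label = \([^[:space:]]*\).*/\1/p' | tail -n 1
}

# Usage (pipefail keeps a failed backup from being hidden by the pipe):
#   set -o pipefail
#   label="$(pgbackrest --stanza=axiomdb --type=full backup | backup_label_from_log)"
```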

Backup Status Check

# View all backups
pgbackrest --stanza=axiomdb info

# Example output:
# stanza: axiomdb
#     status: ok
#     cipher: aes-256-cbc
#     db (current)
#         wal archive min/max (14): 000000010000000000000001/000000010000000000000042
#
#         full backup: 20250115-020001F
#             timestamp start/stop: 2025-01-15 02:00:01+00 / 2025-01-15 02:05:31+00
#             wal start/stop: 000000010000000000000001 / 000000010000000000000001
#             database size: 2.1GB, database backup size: 2.1GB
#             repository size: 650MB, repository backup size: 648MB
#
#         diff backup: 20250115-020001F_20250115-030000D
#             timestamp start/stop: 2025-01-15 03:00:00+00 / 2025-01-15 03:01:15+00
#             wal start/stop: 000000010000000000000002 / 000000010000000000000002
#             database size: 2.1GB, database backup size: 120MB
#             repository size: 652MB, repository backup size: 45MB

Named Snapshots

Snapshots are annotated full backups paired with a PostgreSQL restore point of the same name. They let you return to a specific moment, typically before risky operations such as schema migrations.

Creating a Snapshot

# Create a named snapshot (annotated full backup) before a migration
pgbackrest --stanza=axiomdb --type=full \
    --annotation=snapshot=pre-migration-v2.3 \
    --annotation=branch=main \
    backup

# Also create a PostgreSQL restore point for WAL-level recovery (--type=name restores)
psql -U postgres -c "SELECT pg_create_restore_point('pre-migration-v2.3');"
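A snapshot takes two steps, so a small wrapper can keep them together and in the right order. The backup runs first, so the restore point falls after the end of the backup that will serve as its base. Below is a sketch. pgBackRest annotations are key=value pairs; the function name, the annotation key snapshot, and the name check are illustrative conventions, not pgBackRest requirements.

```bash
#!/usr/bin/env bash
# Sketch: annotated full backup + PostgreSQL restore point with the same name.
# The function name and the "snapshot" annotation key are illustrative conventions.
create_snapshot() {
    local name="${1:-}" branch="${2:-main}"
    # Restrict names to simple characters so they are safe inside the SQL literal
    if [[ ! "$name" =~ ^[A-Za-z0-9._-]+$ ]]; then
        echo "usage: create_snapshot <name> [branch]  (name: letters, digits, . _ -)" >&2
        return 2
    fi
    pgbackrest --stanza=axiomdb --type=full \
        --annotation=snapshot="$name" \
        --annotation=branch="$branch" \
        backup || return 1
    # Taken after the backup, so that backup can be the base for --type=name recovery
    psql -U postgres -Atc "SELECT pg_create_restore_point('$name');"
}
```

Usage: `create_snapshot pre-migration-v2.3 main`.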

Listing Snapshots

# List all backups (including annotated snapshots)
pgbackrest --stanza=axiomdb info --output=json
[
  {
    "name": "axiomdb",
    "status": {
      "code": 0,
      "message": "ok"
    },
    "db": [
      {
        "id": 1,
        "system-id": 7312345678901234567,
        "version": "14"
      }
    ],
    "backup": [
      {
        "type": "full",
        "label": "20250115-020001F",
        "timestamp": {
          "start": 1736906401,
          "stop": 1736906731
        },
        "annotation": {
          "snapshot": "pre-migration-v2.3",
          "branch": "main"
        },
        "info": {
          "size": 2254857830,
          "delta": 2254857830
        }
      }
    ]
  }
]

Snapshot Retention

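# Apply the retention policy from pgbackrest.conf (also runs automatically after
# every successful backup). Annotated snapshots are ordinary full backups: they
# count toward repo1-retention-full=7 and are expired like any other full backup.
pgbackrest --stanza=axiomdb expire

# One-off expire with a larger retention count; the next backup's automatic expire
# reapplies the configured value, so raise repo1-retention-full in pgbackrest.conf
# to keep snapshots longer
pgbackrest --stanza=axiomdb --repo1-retention-full=30 expire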
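Below is a deliberately simplified model of which full backups a count-based policy removes. It is not pgBackRest's implementation, which also expires the differential and incremental backups and the WAL that depend on an expired full backup. The model keeps the newest N full backup labels; annotated snapshots count like any other full backup. The function name is hypothetical.

```bash
# Sketch (simplified model, not pgBackRest's code): list the full backup labels
# that a count-based repo1-retention-full would expire.
full_backups_to_expire() {
    local keep="$1"
    shift
    # Labels begin with a timestamp, so a reverse lexical sort is newest first;
    # skip the newest $keep and print the rest oldest first
    printf '%s\n' "$@" | sort -r | tail -n +"$((keep + 1))" | sort
}
```

With repo1-retention-full=7 and daily full backups, a pre-migration snapshot falls out of the newest seven, and is expired, once seven more full backups have completed.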

Snapshot Best Practices

Always create a snapshot before: (1) major schema migrations, (2) bulk data operations, (3) dependency upgrades, (4) production deployments. Name snapshots descriptively: pre-migration-{version}, pre-deploy-{sha}, pre-bulk-update-{date}.
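A helper that builds names in these three formats keeps them consistent across scripts. This is a sketch. The function name, the YYYYMMDD date format, and truncating the commit SHA to 12 characters are illustrative choices, not conventions fixed by AxiomDB.

```bash
# Sketch: build snapshot names in the recommended formats.
# The helper, the YYYYMMDD date format, and the 12-character SHA are illustrative.
snapshot_name() {
    local kind="${1:-}" value="${2:-}"
    if [ -z "$value" ] && [ "$kind" != "bulk-update" ]; then
        echo "usage: snapshot_name migration|deploy|bulk-update <value>" >&2
        return 1
    fi
    case "$kind" in
        migration)   echo "pre-migration-${value}" ;;
        deploy)      echo "pre-deploy-${value:0:12}" ;;   # shortened commit SHA
        bulk-update) echo "pre-bulk-update-${value:-$(date -u +%Y%m%d)}" ;;
        *) echo "unknown snapshot kind: $kind" >&2; return 1 ;;
    esac
}
```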


Restore Procedures

Restore into a New Branch (Preferred)

The safest restore method creates a new branch database from the backup, leaving the original intact.

# Step 1: Create a new branch via AxiomDB Gateway
curl -X POST http://127.0.0.1:4060/api/branches \
  -H "Content-Type: application/json" \
  -d '{
    "name": "restored-main-20250115",
    "source_branch": "main",
    "restore_point": "20250115-020001F"
  }'
# Step 2: Verify the restored branch
psql -h 127.0.0.1 -p 5432 -U axiomdb_restored_main_20250115 \
  -d restored_main_20250115 -c "SELECT count(*) FROM information_schema.tables;"
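Below is a sketch of Step 1 as a script. It assumes the Gateway endpoint and JSON fields exactly as shown in the curl example. The helper names and the GATEWAY_URL variable are hypothetical. The payload builder does no JSON escaping, so it assumes branch names and labels contain no quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: scripted branch restore through the AxiomDB Gateway endpoint.
# ASSUMPTIONS: helper names and GATEWAY_URL are hypothetical; no JSON escaping is
# done, so names and labels must not contain quotes or backslashes.
GATEWAY_URL="${GATEWAY_URL:-http://127.0.0.1:4060}"

branch_restore_payload() {
    # $1 = new branch name, $2 = source branch, $3 = backup label
    printf '{"name":"%s","source_branch":"%s","restore_point":"%s"}' "$1" "$2" "$3"
}

restore_to_branch() {
    # -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors
    curl -fsS -X POST "$GATEWAY_URL/api/branches" \
        -H "Content-Type: application/json" \
        -d "$(branch_restore_payload "$1" "$2" "$3")"
}
```

Usage: `restore_to_branch restored-main-20250115 main 20250115-020001F`.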

Point-in-Time Recovery (PITR)

Restore the whole cluster in place to a specific timestamp using WAL replay. This replaces every database in the cluster, not just one branch:

# Stop PostgreSQL
sudo systemctl stop postgresql

# Restore the base backup
pgbackrest --stanza=axiomdb \
    --type=time \
    --target="2025-01-15 14:30:00+00" \
    --target-action=promote \
    restore

# Start PostgreSQL
sudo systemctl start postgresql

# Verify: pg_is_in_recovery() returns f once recovery has finished and promoted
psql -h 127.0.0.1 -p 5432 -U axiomdb -c "SELECT pg_is_in_recovery(), now();"
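Between the restore and start steps, it can help to confirm what pgBackRest configured for recovery. For PostgreSQL 12 and later, a restore writes restore_command and the recovery_target_* settings to postgresql.auto.conf and creates recovery.signal in the data directory. Below is a sketch that prints those settings. The function name is hypothetical, and the default path is the pg1-path from pgbackrest.conf.

```bash
#!/usr/bin/env bash
# Sketch: inspect the recovery settings pgBackRest wrote before starting PostgreSQL.
# For PostgreSQL 12+, restore adds restore_command and recovery_target_* lines to
# postgresql.auto.conf and creates recovery.signal in the data directory.
show_recovery_config() {
    local pgdata="${1:-/var/lib/postgresql/14/main}"
    if [ ! -f "$pgdata/recovery.signal" ]; then
        echo "recovery.signal not found in $pgdata" >&2
        return 1
    fi
    grep -E '^(restore_command|recovery_target[a-z_]*)[[:space:]]*=' "$pgdata/postgresql.auto.conf"
}
```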

Restore to Named Restore Point

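# Stop PostgreSQL first, as for PITR. --set picks the backup that precedes the
# restore point (its label is shown by pgbackrest info); by default the latest
# backup is used, which may have started after the restore point was created.
pgbackrest --stanza=axiomdb \
    --set=20250115-020001F \
    --type=name \
    --target="pre-migration-v2.3" \
    --target-action=promote \
    restore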

Restore Specific Database Only

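# Restore a single database into a separate data directory (example path;
# create it first with mode 700, owned by postgres). pgBackRest always restores
# the system databases; other user databases come back as sparse, zeroed files
# that must be dropped after recovery.
pgbackrest --stanza=axiomdb \
    --pg1-path=/var/lib/postgresql/14/restore-branch_xyz \
    --archive-mode=off \
    --db-include=branch_xyz \
    restore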
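After a --db-include restore, the excluded user databases are placeholders made of zeroed files and cannot be used, so they should be dropped once the restored instance is running. The sketch below only prints the DROP DATABASE statements for review. It assumes the restored instance listens on port 15432 (a hypothetical port) and that database names use lowercase letters, digits, and underscores, like branch_xyz.

```bash
#!/usr/bin/env bash
# Sketch: print DROP statements for databases excluded by --db-include.
# ASSUMPTIONS: the restored instance listens on a hypothetical port (default
# 15432) and the kept database name matches ^[a-z0-9_]+$.
drop_excluded_databases_sql() {
    local keep="${1:-}" port="${2:-15432}"
    if [[ ! "$keep" =~ ^[a-z0-9_]+$ ]]; then
        echo "invalid database name: $keep" >&2
        return 1
    fi
    # format('%I') quotes identifiers safely; templates and postgres are skipped
    psql -p "$port" -U postgres -d postgres -Atc "
        SELECT format('DROP DATABASE %I;', datname)
        FROM pg_database
        WHERE NOT datistemplate AND datname NOT IN ('postgres', '$keep')
        ORDER BY datname;"
}
```

Review the output, save it to a file, and apply it with `psql -p 15432 -U postgres -d postgres -f <file>`.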

Restore Checklist

Before initiating any restore:

□ Identify the exact restore point (timestamp, backup label, or snapshot name)
□ Confirm the target branch name and whether to create a new branch
□ Verify backup availability: pgbackrest --stanza=axiomdb info
□ Check available disk space: df -h /var/lib/postgresql
□ Notify stakeholders of expected downtime (if restoring in-place)
□ Document the restore reason in the operations log
□ Test restore on a non-production branch first (if time permits)

After restore completion:

□ Verify database connectivity: psql -c "SELECT 1;"
□ Check table counts match expectations
□ Run application health checks
□ Verify recent migrations are present: SELECT * FROM _prisma_migrations ORDER BY started_at DESC LIMIT 10;
□ Test critical application endpoints
□ Monitor error rates for 30 minutes
□ Update status page / notify stakeholders
□ Document the restore outcome
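The first few post-restore checks are easy to script. Below is a sketch. It assumes the standard libpq variables PGHOST, PGPORT, PGUSER, and PGDATABASE point at the restored branch, and that the application's _prisma_migrations table exists, as the checklist implies. Application health checks and error-rate monitoring stay with the existing tooling.

```bash
#!/usr/bin/env bash
# Sketch: scripted version of the first post-restore checks.
# ASSUMPTIONS: PGHOST, PGPORT, PGUSER and PGDATABASE point at the restored
# branch, and the application keeps Prisma's _prisma_migrations table.
post_restore_checks() {
    if ! psql -Atc "SELECT 1;" >/dev/null; then
        echo "FAIL: cannot connect"
        return 1
    fi
    echo "OK: connected"
    echo "user tables: $(psql -Atc "SELECT count(*) FROM information_schema.tables
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema');")"
    echo "latest migrations:"
    psql -Atc "SELECT migration_name, finished_at FROM _prisma_migrations
        ORDER BY started_at DESC LIMIT 10;"
}
```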

Backup Verification

Integrity Checks

# Verify backup integrity (checks checksums)
pgbackrest --stanza=axiomdb verify

# Print the manifest of the latest backup (repo-get decrypts repository files)
pgbackrest --stanza=axiomdb repo-get backup/axiomdb/latest/backup.manifest

Test Restore

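# Restore to a temporary cluster for testing: stop at the end of the backup
# (immediate) and disable WAL archiving in the copy (archive-mode=off)
pgbackrest --stanza=axiomdb \
    --type=immediate \
    --pg1-path=/tmp/test-restore \
    --archive-mode=off \
    --target-action=promote \
    restore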

# Start temporary instance
pg_ctl -D /tmp/test-restore start -o "-p 15432"

# Verify
psql -h 127.0.0.1 -p 15432 -U axiomdb -c "SELECT count(*) FROM information_schema.tables;"

# Cleanup
pg_ctl -D /tmp/test-restore stop
rm -rf /tmp/test-restore
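If any step fails, the commands above leave a running instance and a data directory behind in /tmp. Below is a sketch of the same test restore as a function whose body runs in a subshell, so an EXIT trap always stops and removes the temporary cluster. It assumes it runs as postgres with pg_ctl on PATH; on Debian, pg_ctl lives under /usr/lib/postgresql/14/bin. On layouts where postgresql.conf lives outside the data directory (such as /etc/postgresql), the temporary instance also needs a configuration file, which this sketch does not handle.

```bash
#!/usr/bin/env bash
# Sketch: test restore with guaranteed cleanup. The function body is a subshell,
# so the EXIT trap fires when the function finishes, whether it succeeded or not.
# ASSUMPTIONS: runs as postgres with pg_ctl on PATH; port 15432 as above.
test_restore() (
    dir="$(mktemp -d /tmp/test-restore.XXXXXX)" || exit 1
    trap 'pg_ctl -D "$dir" stop -m fast >/dev/null 2>&1; rm -rf "$dir"' EXIT
    # immediate: stop at the end of the backup; archive-mode=off: never push WAL
    # from the test instance into the production repository
    pgbackrest --stanza=axiomdb --pg1-path="$dir" --type=immediate \
        --target-action=promote --archive-mode=off restore || exit 1
    pg_ctl -D "$dir" -w start -o "-p 15432" || exit 1
    psql -h 127.0.0.1 -p 15432 -U axiomdb -Atc \
        "SELECT count(*) FROM information_schema.tables;"
)
```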

Automated Verification Script

#!/bin/bash
# /opt/axiomdb/scripts/verify-backups.sh

set -euo pipefail

STANZA="axiomdb"
LOG_FILE="/var/log/axiomdb/backup-verify.log"
WEBHOOK_URL="https://hooks.slack.com/services/xxx"

log() { echo "[$(date -Iseconds)] $*" | tee -a "$LOG_FILE"; }

# Check backup freshness (info JSON is an array of stanzas; 0 if no backups exist)
LAST_BACKUP=$(pgbackrest --stanza="$STANZA" info --output=json | \
    jq -r '.[0].backup[-1].timestamp.start // 0')
BACKUP_AGE=$(( $(date +%s) - LAST_BACKUP ))

if [ "$BACKUP_AGE" -gt 90000 ]; then  # 25 hours
    log "CRITICAL: Last backup is $((BACKUP_AGE / 3600)) hours old"
    curl -s -X POST "$WEBHOOK_URL" -d "{\"text\":\"🔴 Backup is $((BACKUP_AGE / 3600))h old\"}"
    exit 1
fi

# Verify backup integrity
if ! pgbackrest --stanza="$STANZA" verify >> "$LOG_FILE" 2>&1; then
    log "CRITICAL: Backup verification failed"
    curl -s -X POST "$WEBHOOK_URL" -d "{\"text\":\"🔴 Backup verification failed\"}"
    exit 1
fi

log "OK: Backup verified, age $((BACKUP_AGE / 3600))h"
exit 0

Disaster Recovery

Full Site Recovery

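Full site recovery depends on a copy of the pgBackRest repository stored off the VPS. The repository configured at /var/lib/pgbackrest (repo1-path) is lost along with the host. If the entire VPS is lost: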

  1. Provision a new VPS with the same OS and disk layout
  2. Install PostgreSQL, PgBouncer, pgBackRest, and AxiomDB components
  3. Restore pgBackRest configuration and cipher key
  4. Run pgbackrest --stanza=axiomdb restore (restores the latest backup and replays all archived WAL)
  5. Start PostgreSQL and verify data integrity
  6. Restart AxiomDB Gateway and Ops Console
  7. Update DNS if needed

Recovery Time Objective (RTO)

Scenario                  Expected RTO   Notes
Single database restore   5–15 min       New branch from backup
Point-in-time recovery    15–30 min      WAL replay required
Full site recovery        1–2 hours      Requires infrastructure rebuild
Cross-region restore      2–4 hours      Network transfer time

Test Your DR Plan

Run a full disaster recovery drill quarterly. Document the results, time taken, and any issues encountered. A backup that has never been tested is not a backup.
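Timing each step as the drill runs produces numbers that can be compared directly with the RTO table. Below is a minimal sketch; the function name and the log path /var/log/axiomdb/dr-drill.log are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: time each DR drill step and append the result to a log.
# ASSUMPTION: DRILL_LOG is a hypothetical path; point it wherever drill notes are kept.
DRILL_LOG="${DRILL_LOG:-/var/log/axiomdb/dr-drill.log}"

timed_step() {
    local name="$1" start end rc=0
    shift
    start="$(date +%s)"
    "$@" || rc=$?
    end="$(date +%s)"
    # timestamp <TAB> step <TAB> duration <TAB> exit code
    printf '%s\t%s\t%ds\trc=%d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" \
        "$((end - start))" "$rc" | tee -a "$DRILL_LOG"
    return "$rc"
}
```

Usage during a drill: `timed_step restore pgbackrest --stanza=axiomdb restore`, then `timed_step start-postgres sudo systemctl start postgresql`.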
