Quick anecdote: we once watched a single compromised admin account delete both the VM snapshots and the cloud backups in under 7 minutes. Heart-stopping, but instructive — backups that can be deleted are just expensive illusions.
A few things that actually make a difference (no unicorns, just stuff that survives a real attack):
- Assume compromise. Design your backups so an attacker who owns one host can’t nuke every copy.
- 3–2–1–1 (practical variant): 3 copies, 2 media types, 1 offsite, +1 immutable/offline. Immutable snapshots or WORM object storage are non-negotiable for ransomware defense.
- Separate the control plane. Backup credentials, snapshot APIs, and orchestration should live on a different network/ACL set than production VMs/containers. Treat the backup management host like a high-value vault.
- Air-gap or logical air-gap. For physical machines, keep an occasional offline image; for VMs, export periodic full VM images to a physically isolated system. Cloud: use Object Lock / retention policies (S3 Object Lock-style) and a separate billing/account boundary.
- Limit blast radius in virtualization: don’t let VM admins also be backup admins by default. Use least privilege for snapshot/delete actions; log and alert every snapshot deletion.
- Immutable + verifiable: generate an SBOM-like catalog of backups (checksums, manifests) and store it offsite. Verify restores regularly — a backup you can’t restore is just noise.
- Layer defenses: MFA + delegated service accounts + JIT access for backup operations. If possible, require approvals for destructive backup ops (deletions, retention reductions).
- RPO/RTO with reality checks: aim for small RPOs on critical DBs (PITR + WAL archiving) and pragmatic RTOs for bulk file systems; you’ll often accept longer recovery for terabytes of cold data.
- Automation, but with human safeties: automations should create backups and alerts; destructive ops require human confirmation or multi-sig.
- Practice = credibility: schedule restore drills that include ransomware scenarios (restore to new network, validate integrity, exercise key rotations).
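To make the "destructive ops need approvals" point concrete, here is a minimal Python sketch of an approval gate. Everything in it is an assumption for illustration: the operation names, the `BackupRequest` type, and the two-approver threshold are hypothetical and not tied to any particular backup product. In practice you would enforce this in the backup control plane itself, not in a wrapper script.

```python
from dataclasses import dataclass, field

# Hypothetical operation names; map these to whatever your backup tool exposes.
DESTRUCTIVE_OPS = {"delete_snapshot", "delete_backup", "reduce_retention"}


@dataclass
class BackupRequest:
    op: str
    requester: str
    approvers: set[str] = field(default_factory=set)


def is_allowed(req: BackupRequest, required_approvals: int = 2) -> bool:
    """Let non-destructive ops through; gate destructive ones on approvals."""
    if req.op not in DESTRUCTIVE_OPS:
        return True
    # The requester cannot count as one of their own approvers.
    independent = req.approvers - {req.requester}
    return len(independent) >= required_approvals


# Automation can create backups freely...
print(is_allowed(BackupRequest("create_backup", "automation")))
# ...but a lone admin cannot delete a snapshot, even by "approving" it themselves.
print(is_allowed(BackupRequest("delete_snapshot", "alice", {"alice", "bob"})))
```

The design choice worth copying is that the requester is subtracted from the approver set, so a single compromised admin account cannot satisfy the check alone.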
If you operate hybrid (bare-metal + cloud + containers), treat each class differently but keep the same invariants: isolation, immutability, verification. No single trick saves you — it’s the stack: policy, network isolation, immutable storage, audits, and rehearsals.
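One way to keep those invariants honest across mixed environments is to audit a simple inventory of copies per dataset. The sketch below is only an illustration: the `BackupCopy` fields and the example labels are made up, and it counts whatever copies you list (whether the production copy counts toward the "3" is your call).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # hypothetical label, e.g. "nas-01" or "cloud-vault"
    media: str       # e.g. "disk", "tape", "object"
    offsite: bool
    immutable: bool  # WORM, object lock, or physically offline
    verified: bool   # last restore test passed


def check_invariants(copies: list[BackupCopy]) -> list[str]:
    """Return a list of violated 3-2-1-1 / verification invariants."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable/offline copy")
    if not any(c.verified for c in copies):
        problems.append("no copy with a passing restore test")
    return problems


inventory = [
    BackupCopy("prod-san", "disk", offsite=False, immutable=False, verified=False),
    BackupCopy("nas-01", "disk", offsite=False, immutable=False, verified=True),
]
for problem in check_invariants(inventory):
    print("VIOLATION:", problem)
```

Run the same check for bare-metal images, VM exports, and container volume backups; the storage mechanics differ, but the questions asked of each inventory are identical.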
Short, blunt takeaway: backups you can’t verify or can’t protect aren’t backups — they’re liabilities.
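For the "verify" half, a rough Python sketch of a checksum manifest: build it when the backup is written, store it offsite, and compare against it after a test restore. The function names and directory layout are assumptions for illustration; real backup tools often ship their own verification, which you should prefer where it exists.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 checksum."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_file(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(restored_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum differs."""
    bad = []
    for rel, expected in manifest.items():
        target = restored_dir / rel
        if not target.is_file() or sha256_file(target) != expected:
            bad.append(rel)
    return bad
```

An empty list from `verify_manifest` means every file in the manifest was restored byte-for-byte. Note that it does not flag extra files in the restore target, and the manifest is only trustworthy if it lives somewhere the attacker cannot rewrite it.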