“We have backups” is not a recovery plan. Ransomware doesn’t just encrypt files—it often hunts for backup repositories, deletes restore points, steals data, and leaves you with untrusted systems. This guide is a ransomware reality check: what “good backups” actually mean, how to design them to survive an attacker, and how to test restores so you can recover with confidence (not hope).
Quickstart
If you only have an hour or two, do these high-leverage moves first. The goal is simple: make sure you have at least one backup an attacker can’t erase, and prove you can restore it.
1) Set your recovery targets (RPO/RTO)
Without targets, “backup frequency” is guesswork. Decide what you can afford to lose and how fast you must be back.
- RPO: max data loss you can tolerate (e.g., 4 hours)
- RTO: max downtime you can tolerate (e.g., 8 hours)
- Write targets per system: DB, file shares, SaaS, laptops
- Make tradeoffs explicit (cost vs speed)
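The targets above can be written down as data and sanity-checked automatically. A minimal sketch (system names and numbers are illustrative, not recommendations): if a system's backup interval is longer than its RPO, the target is impossible by construction.

```python
# Illustrative per-system targets (hours). Values are examples only.
TARGETS = {
    "orders-db":  {"rpo_h": 4,  "rto_h": 8,  "backup_interval_h": 2},
    "file-share": {"rpo_h": 24, "rto_h": 24, "backup_interval_h": 24},
    "saas-mail":  {"rpo_h": 24, "rto_h": 48, "backup_interval_h": 24},
}

def rpo_violations(targets: dict) -> list[str]:
    """A backup interval longer than the RPO guarantees missed targets."""
    return [name for name, t in targets.items()
            if t["backup_interval_h"] > t["rpo_h"]]

print(rpo_violations(TARGETS))  # → [] means every system is backed up often enough
```

Keeping targets in a reviewable file like this also makes the cost-vs-speed tradeoff explicit: tightening an RPO means shortening an interval someone can see and price.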
2) Enforce 3-2-1-1-0 (the survivable baseline)
“3-2-1” is good. “3-2-1-1-0” is better for ransomware: add an immutable/offline copy and verify.
- 3 copies of data (production + 2 backups)
- 2 different media / storage types
- 1 offsite copy (separate blast radius)
- 1 immutable/offline copy (can’t be deleted)
- 0 errors (verified restores, not assumptions)
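The first four rules can be checked mechanically against an inventory of your copies. A sketch with made-up copy names (the "0 errors" rule is deliberately absent: it can only be proven by restore tests, not by inventory):

```python
# Illustrative inventory of data copies; names and media types are examples.
COPIES = [
    {"name": "production",  "media": "ssd",    "offsite": False, "immutable": False},
    {"name": "local-repo",  "media": "nas",    "offsite": False, "immutable": False},
    {"name": "cloud-vault", "media": "object", "offsite": True,  "immutable": True},
]

def check_3_2_1_1_0(copies: list[dict]) -> dict[str, bool]:
    return {
        "3_copies":    len(copies) >= 3,
        "2_media":     len({c["media"] for c in copies}) >= 2,
        "1_offsite":   any(c["offsite"] for c in copies),
        "1_immutable": any(c["immutable"] for c in copies),
        # "0_errors" is intentionally missing: only restore tests can verify it.
    }

print(check_3_2_1_1_0(COPIES))
```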
3) Split identities: backups must not share admin keys
Ransomware commonly wins by stealing privileged credentials. Backup systems need separate accounts and least privilege.
- Dedicated backup account + MFA
- No domain admin for backup operators
- Backup storage credentials can’t delete (immutability)
- Break-glass account stored offline
4) Run one real restore test this week
The only backup that matters is the one you can restore under pressure. Do one “boring” restore test and document it as a runbook.
- Pick a representative system (DB or file share)
- Restore into an isolated environment
- Validate with checksums/app-level health checks
- Record time + steps + missing dependencies
If an attacker with your admin credentials can delete every restore point, you don’t have a ransomware-ready backup. You have a convenience copy.
Overview
Ransomware is a business problem disguised as malware: attackers aim to stop operations and pressure you into paying. Backups are your best “no” — but only if they’re designed for an adversary who can log in, not just a disk that fails.
What this post covers
- What “good backups” mean in a ransomware threat model (not just hardware failure)
- How to design survivable backup architecture (immutability, separation, offsite)
- How to protect backup identities and stop attackers from deleting restore points
- How to test restores and verify integrity (so recovery isn’t a surprise)
- A cheatsheet + a quiz to lock in the concepts
| Backup capability | What it protects against | Common failure mode |
|---|---|---|
| Versioned backups | Accidental deletes, corruption, some ransomware | Too-short retention; encrypted files replace clean versions |
| Offsite copy | Site disaster, local compromise blast radius | Offsite is still reachable with stolen credentials |
| Immutability / WORM | Credentialed attackers deleting backups | Misconfigured delete permissions; “immutable” not enforced |
| Restore testing | Unknown unknowns (missing keys, bad scripts) | Tests are skipped; only “backup jobs succeeded” is monitored |
As you read, imagine: “An attacker has admin access on Friday night.” For each step, ask: Can they erase my last good copy? If yes, adjust.
Core concepts
1) The ransomware backup threat model (why “it’s on the network” matters)
Traditional backups were designed for accidents: disk failure, bad deployments, human mistakes. Ransomware adds an adversary who tries to destroy recovery. That usually looks like:
- Steal privileged credentials (phishing, token theft, lateral movement)
- Disable backup agents or delete jobs/snapshots
- Delete or encrypt backup repositories and catalogs
- Exfiltrate data to add pressure (double extortion)
The key insight: if backups are writable with day-to-day admin credentials, they’re part of the same failure domain.
2) RPO and RTO: the two numbers that shape everything
Backups are about outcomes. RPO and RTO translate “security” into “operations”.
RPO (Recovery Point Objective)
How much data you can lose. If your RPO is 4 hours, you need restore points at least every 4 hours.
- Databases: often minutes–hours
- File shares: hours–days (depends)
- SaaS: check provider limitations
RTO (Recovery Time Objective)
How long you can be down. If your RTO is 8 hours, you must be able to restore and validate within 8 hours.
- Consider dependency chains (DNS, identity, network)
- Test on realistic hardware and bandwidth
- Document the slow steps (they define RTO)
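A quick feasibility check catches RTO fantasies early: raw transfer time over your real restore bandwidth is a hard floor, and cataloging, decryption, and validation add overhead on top. A back-of-envelope sketch (the 1.5x overhead factor is an assumption; measure your own):

```python
def restore_hours(data_gb: float, bandwidth_mbps: float,
                  overhead_factor: float = 1.5) -> float:
    """Estimate restore time: raw transfer plus an assumed overhead factor
    for cataloging, decryption, and validation."""
    seconds = (data_gb * 8 * 1000) / bandwidth_mbps  # GB -> megabits -> seconds
    return (seconds / 3600) * overhead_factor

# 2 TB over a 1 Gbps link: transfer alone dominates the RTO.
print(f"{restore_hours(data_gb=2000, bandwidth_mbps=1000):.1f} h")  # → 6.7 h
```

If that number already exceeds your RTO, no amount of runbook polish will save you; you need more bandwidth, less data per tier, or a different RTO.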
3) 3-2-1-1-0: the backup principle that survives attackers
3-2-1 is a solid baseline. Ransomware pushes you to add “1-0”: one immutable/offline copy, and zero errors verified by restore tests.
What “immutable” means (and what it doesn’t)
| Term | Practical meaning | Gotcha |
|---|---|---|
| Immutability (WORM) | Write once; cannot delete/overwrite until retention expires | Admins can still break it if configured wrong |
| Air gap | Not continuously reachable from production network | “Same cloud account” is not an air gap |
| Offline | Disconnected media or vault access only during backup window | Human process risk (someone forgets to disconnect) |
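The defining property of WORM retention is that the earliest possible deletion date is fixed at write time, regardless of who asks. A tiny sketch of that semantic (dates are examples):

```python
from datetime import date, timedelta

def deletable_on(created: date, retention_days: int) -> date:
    """With object-lock/WORM semantics, a restore point cannot be deleted
    before created + retention, no matter whose credentials are used."""
    return created + timedelta(days=retention_days)

print(deletable_on(date(2024, 1, 1), retention_days=30))  # → 2024-01-31
```

The gotcha column above is the real risk: if the lock mode allows a privileged identity to shorten retention, the math holds but the guarantee doesn't.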
4) Separation: identities, networks, and blast radius
Ransomware resilience is mostly about separation: making sure compromise of one area doesn’t grant control over everything.
Identity separation
- Dedicated backup operator accounts
- MFA everywhere (including backup console)
- Break-glass credentials stored offline
- Service accounts with minimal scope
Network & storage separation
- Backup network/VLAN isolated from user endpoints
- Backup repositories not domain-joined (when possible)
- Separate accounts/tenants for offsite storage
- Immutable bucket/vault policies enforced server-side
Storage snapshots are great for fast rollback, but many live in the same admin plane and can be deleted by a credentialed attacker. Treat snapshots as a speed layer, not the final safety net.
Step-by-step
This section is a practical build plan. You can apply it to a home lab, a startup stack, or a mid-size org. The names of tools vary, but the architecture and habits stay the same.
Step 1 — Inventory what you must restore (and in what order)
In a real incident, you don’t restore “everything at once”. You restore capabilities: identity, core services, apps, then endpoints. Start by listing:
- Tier 0: identity, DNS, certificates, secrets, core networking
- Tier 1: databases, storage, message queues, key internal apps
- Tier 2: file shares, collaboration tools, secondary services
- Tier 3: endpoints, dev boxes, “nice-to-have” systems
This tiering prevents the classic failure: restoring an application before the identity or database it depends on.
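Once systems have tier numbers, the restore order falls out of a simple sort, which makes the plan deterministic and easy to review. A sketch (system names and tier assignments are illustrative):

```python
# Illustrative tier assignments: 0 = identity/core, 3 = endpoints.
TIERS = {
    "identity-provider": 0, "dns": 0,
    "orders-db": 1, "object-storage": 1,
    "file-share": 2,
    "dev-laptops": 3,
}

def restore_order(tiers: dict[str, int]) -> list[str]:
    # Sort by tier first, then by name, so the plan is stable between runs.
    return sorted(tiers, key=lambda s: (tiers[s], s))

print(restore_order(TIERS))  # identity and DNS first, endpoints last
```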
Step 2 — Set RPO/RTO per tier and choose a backup method that can meet it
Different systems need different strategies. A database usually wants point-in-time recovery; a static file share may not. Use this mapping as a starting point:
| System type | Typical approach | Why it works |
|---|---|---|
| Databases | Full + incremental + log shipping / PITR | Fine-grained restore points, consistent recovery |
| VMs / servers | Image-based backups + config export | Fast “whole system” restore, easy rebuild |
| File shares | Versioned file backups + immutable copy | Recovers before encryption; handles deletes |
| SaaS (email, docs) | Provider exports or third-party backup | Protects against account takeover and retention limits |
| Kubernetes / IaC | Git as source of truth + periodic cluster state backup | Rebuild infra quickly; restore only stateful pieces |
Step 3 — Build a two-layer recovery architecture (fast layer + survivable layer)
A useful mental model is two layers:
Layer A — Fast recovery (operational convenience)
Snapshots and local backup repositories that get you back quickly for normal incidents.
- Frequent backups for tight RPO
- Short retention for fast restores
- Local bandwidth = fast recovery
Layer B — Survivable recovery (ransomware safety net)
An offsite, immutable/offline copy that remains intact even during credential compromise.
- Immutable retention enforced server-side
- Separate identity / tenant / account if possible
- Restore procedures tested and documented
Ransomware-ready backups can be slower or more expensive. You still want a fast layer for day-to-day restores — just don’t confuse “fast” with “safe”.
Step 4 — Harden the backup plane (make deletion hard)
Most backup failures during ransomware incidents aren’t “the backup job failed.” They’re “someone logged into the backup console and deleted the history.” Harden the backup plane like it’s production.
Identity hardening checklist
- Separate backup admin accounts (not daily admin)
- Strong MFA (phishing-resistant where possible)
- Limit backup console access (jump host / VPN)
- Service accounts can write backups but cannot delete
- Alert on privilege changes and failed MFA
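The "write but not delete" rule is worth stating as an explicit permission model, even conceptually. A sketch (role and action names are invented for illustration; real enforcement must live server-side in your storage platform's policy engine, not in application code):

```python
# Conceptual permission model: no backup identity holds a delete right.
ALLOWED = {
    "backup-writer": {"object.put", "object.get"},
    "backup-admin":  {"object.put", "object.get", "policy.read"},
}
# Note: "object.delete" appears nowhere; objects leave only via retention expiry.

def may(identity: str, action: str) -> bool:
    return action in ALLOWED.get(identity, set())

print(may("backup-writer", "object.put"),
      may("backup-writer", "object.delete"))  # → True False
```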
Storage hardening checklist
- Enable immutability/WORM with retention policy
- Separate encryption keys from production admins
- Disable “delete all” paths (policy + technical controls)
- Protect the backup catalog/metadata
- Monitor for mass deletions and unusual access
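"Monitor for mass deletions" can be as simple as counting delete events per actor over a log window. A sketch (the event shape and threshold are assumptions; map them onto your backup platform's audit log):

```python
from collections import Counter

def mass_delete_alerts(events: list[dict], threshold: int = 5) -> list[str]:
    """Return actors who deleted more than `threshold` restore points
    in the given log window (event schema is illustrative)."""
    deletes = Counter(e["actor"] for e in events
                      if e["action"] == "snapshot.delete")
    return [actor for actor, n in deletes.items() if n > threshold]

events = (
    [{"actor": "backup-svc", "action": "snapshot.create"}] * 20
    + [{"actor": "admin-jane", "action": "snapshot.delete"}] * 12  # suspicious burst
)
print(mass_delete_alerts(events))  # → ['admin-jane']
```

Normal operations delete restore points slowly (retention expiry); attackers delete them in bursts, which is exactly what a per-actor counter surfaces.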
Step 5 — Automate backups + retention + basic verification
Automation is how you keep backups boring (boring is good). Below is a practical example using a versioned backup tool and an object store as a repository. Adapt the concepts to your platform: schedule, encrypt, retain, and verify.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Example: a simple encrypted, versioned backup routine (adapt to your tooling)
# - Initializes repository (once)
# - Runs backup
# - Applies retention ("forget/prune")
# - Performs a lightweight integrity check
#
# Tip: store credentials in a secret manager, not in this file.

export RESTIC_REPOSITORY="s3:https://s3.example.com/my-backups"
export RESTIC_PASSWORD_FILE="/etc/backup/restic.pass"
export AWS_ACCESS_KEY_ID="$(cat /etc/backup/s3_access_key)"
export AWS_SECRET_ACCESS_KEY="$(cat /etc/backup/s3_secret_key)"

HOST_TAG="$(hostname -s)"
DATE="$(date +%F)"

# 1) Initialize repo once (idempotent-ish check)
if ! restic snapshots >/dev/null 2>&1; then
  restic init
fi

# 2) Backup critical paths (tune per host)
restic backup \
  /etc \
  /var/lib \
  /srv \
  --tag "${HOST_TAG}" \
  --tag "daily" \
  --exclude-file="/etc/backup/excludes.txt"

# 3) Retention policy (example):
#    keep 7 daily, 4 weekly, 12 monthly snapshots per host
restic forget --tag "${HOST_TAG}" \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12 \
  --prune

# 4) Lightweight verification (spot-check metadata + a small data sample)
#    Full verification is heavier; schedule it weekly/monthly.
restic check --read-data-subset=1/50

echo "[OK] Backup finished for ${HOST_TAG} on ${DATE}"
```
Encryption protects confidentiality. It does not stop a credentialed attacker from deleting your repository. Immutability and separation are the protections against deletion.
Step 6 — Keep backup policy in a config file (so it’s reviewable)
Teams often “configure backups” in a UI and never write down what they did. Treat backup policy like code: it should be readable, reviewable, and versioned.
```yaml
# Example: a readable backup policy file you can version-control (conceptual)
# Use this as a template even if your tool uses a different format.
backup_policy:
  name: "core-servers"
  schedule:
    daily: "02:15"
    weekly: "Sun 03:10"
  retention:
    daily: 7
    weekly: 4
    monthly: 12
    yearly: 3
  data_scope:
    include:
      - "/etc"
      - "/srv"
      - "/var/lib"
    exclude:
      - "/var/lib/docker"
      - "/var/tmp"
      - "**/*.iso"
  security:
    encryption: true
    repository:
      type: "object-store"
      offsite: true
      immutable: true  # enforced by storage policy, not just client settings
    identities:
      backup_operator_mfa: true
      writer_cannot_delete: true
  verification:
    smoke_restore:
      frequency: "weekly"
      target: "isolated-restore-vm"
      checks:
        - "checksum_sample"
        - "app_health_check"
    full_integrity_check:
      frequency: "monthly"
```
Step 7 — Test restores like a fire drill (and measure your real RTO)
Restore testing has two layers: technical and operational. Technical tests prove bits can be restored. Operational tests prove people can do it under time pressure.
A weekly smoke-restore (30–60 minutes)
- Restore a small sample (a directory, a DB dump)
- Validate with checksums and/or application query
- Record duration + any manual steps
- Update the runbook immediately
A quarterly full scenario (2–6 hours)
- Assume compromised admin creds
- Restore tier order (identity → DB → app)
- Practice “clean room” rebuild of one system
- Verify monitoring/logging after recovery
You can automate parts of this. The script below demonstrates a simple concept: periodically restore a random sample and verify integrity with hashes. Even if you don’t use this exact script, the pattern is the win.
```python
"""
Conceptual restore test:
- Restore a snapshot (or file sample) into an isolated directory
- Compute hashes for a sample of files
- Compare with expected hashes if you have them (or store as "golden" over time)

Adapt to your backup tool by replacing the restore command.
Run in an isolated environment, not on production hosts.
"""
import hashlib
import os
import random
import subprocess
from pathlib import Path

RESTORE_DIR = Path("/tmp/restore_test")
SAMPLE_FILES = 25
MAX_FILE_BYTES = 10 * 1024 * 1024  # skip very large files in smoke tests


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_snapshot(snapshot_id: str) -> None:
    # Example placeholder restore command. Replace for your environment/tool.
    # e.g., restic restore SNAPSHOT --target /tmp/restore_test
    subprocess.check_call(
        ["restic", "restore", snapshot_id, "--target", str(RESTORE_DIR)]
    )


def list_restored_files(root: Path) -> list[Path]:
    files: list[Path] = []
    for p in root.rglob("*"):
        if p.is_file():
            try:
                if p.stat().st_size <= MAX_FILE_BYTES:
                    files.append(p)
            except FileNotFoundError:
                # In case files are moved during listing; ignore
                continue
    return files


def main() -> None:
    RESTORE_DIR.mkdir(parents=True, exist_ok=True)

    # Pick a snapshot id (you can also select "latest" with your tool)
    snapshot_id = os.environ.get("SNAPSHOT_ID")
    if not snapshot_id:
        raise SystemExit("Set SNAPSHOT_ID env var to a snapshot you want to smoke-restore.")

    # Restore into the isolated directory
    restore_snapshot(snapshot_id)

    # Sample files and hash them
    files = list_restored_files(RESTORE_DIR)
    if not files:
        raise SystemExit("No files restored; check restore command and snapshot contents.")

    sample = random.sample(files, k=min(SAMPLE_FILES, len(files)))
    print(f"Restored files found: {len(files)} | Hashing sample: {len(sample)}")
    for p in sample:
        digest = sha256_file(p)
        print(f"{digest}  {p}")

    print("OK: smoke-restore completed (hash sample printed).")


if __name__ == "__main__":
    main()
```
Capture what each test teaches you in a restore runbook. At minimum it should cover:
- Where immutable/offsite backups live and who can access them
- Exact restore order (Tier 0 → Tier 3)
- How to rebuild identity/secrets safely (clean room assumptions)
- Validation steps (hash checks, app tests, user sign-off)
- Decision points (when to isolate, when to reimage, when to rotate keys)
Common mistakes
These are the patterns behind “we had backups but still paid” or “recovery took weeks.” Each mistake includes a practical fix you can apply without rebuilding everything.
Mistake 1 — Backups share the same admin plane as production
If production admins can delete backup history, attackers can too (once they steal credentials).
- Fix: separate backup identities, restrict console access, use MFA.
- Fix: enforce immutability/WORM server-side (not just a client checkbox).
Mistake 2 — “Snapshots = backups” with no offsite/immutable layer
Snapshots are fast, but often deletable. Ransomware makes “deletable” a deal-breaker.
- Fix: keep snapshots for speed, add an immutable offsite copy for safety.
- Fix: test that old restore points remain accessible after an “admin compromise” scenario.
Mistake 3 — No restore testing (only “job success” monitoring)
Backup logs say the job ran. They don’t say you can restore, decrypt, boot, and validate.
- Fix: weekly smoke-restore + quarterly scenario test.
- Fix: validate with app-level checks, not only file presence.
Mistake 4 — Retention is too short for “slow ransomware”
Some incidents are discovered late. If you only keep 7 days, you may only have encrypted versions left.
- Fix: keep multiple horizons: daily/weekly/monthly.
- Fix: protect longer retention in the immutable/offsite layer.
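Retention sizing becomes concrete when you compare it against an assumed attacker dwell time (time from compromise to discovery). A sketch (horizons and the dwell assumption are examples; pick numbers from your own detection capability):

```python
# Illustrative retention horizons, in days.
RETENTION_DAYS = {"daily": 7, "weekly": 28, "monthly": 365}

def covers_dwell(retention: dict[str, int], assumed_dwell_days: int) -> bool:
    """At least one horizon must reach further back than the assumed
    time-to-detect, or every surviving restore point may be post-compromise."""
    return max(retention.values()) >= assumed_dwell_days

print(covers_dwell(RETENTION_DAYS, assumed_dwell_days=45))  # → True
print(covers_dwell({"daily": 7}, assumed_dwell_days=45))    # → False
```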
Mistake 5 — Keys, secrets, and identity aren’t backed up safely
You can restore servers and still be stuck if you can’t restore certificates, secrets, or identity.
- Fix: treat secrets and identity as Tier 0; back them up with strict access controls.
- Fix: keep offline break-glass recovery materials (documented, audited).
Mistake 6 — Backup tooling runs on untrusted endpoints
If the backup agent or credentials live on compromised machines, attackers can use them.
- Fix: isolate backup infrastructure and limit outbound credentials on endpoints.
- Fix: monitor for unusual backup activity (mass deletes, unusual restore attempts).
First, make sure you have an immutable/offline copy. Second, test one restore end-to-end. Those two actions beat almost any “more backups” project.
FAQ
Do backups stop ransomware?
No—backups don’t prevent infection. Backups prevent ransomware from becoming a business-ending event by letting you restore without paying. Pair backups with prevention (patching, MFA, EDR) and detection (alerts, logging) for real resilience.
What’s the difference between “offsite” and “air-gapped” backups?
Offsite means geographically/separately hosted; air-gapped means not continuously reachable. Many “offsite” backups are still reachable via the same cloud account and credentials. Air gap is about access path and blast radius, not distance.
Are cloud backups automatically safe from ransomware?
Not automatically. Cloud storage can be deleted or overwritten if an attacker has the right permissions. The safety comes from immutability (WORM/object lock), least-privilege identities, and separating backup access from daily admin access.
How often should we test restores?
Weekly for a small smoke test and quarterly for a full scenario is a practical baseline. If your systems are high-change or high-stakes, increase frequency. The goal is to keep restore steps fresh and catch silent failures early.
How long should we keep backup retention to handle late discovery?
Use multiple horizons: daily for quick rollback, weekly/monthly for late discovery, and a longer immutable archive for worst-case scenarios. Retention should reflect how quickly you can detect compromise and how long you need for compliance.
What should we restore first after a ransomware event?
Restore Tier 0 first: identity, DNS, secrets, certificates, and core networking—then databases and critical apps. Restoring apps before identity and data usually creates a mess (and can reintroduce compromise).
The “backup question” is really: Can we restore clean systems and trusted data quickly enough to survive? Everything else is details.
Cheatsheet
A scan-fast checklist for ransomware-ready backups. Print it, paste it into a ticket, or turn it into your internal standard.
Backups that actually save you
- RPO/RTO defined per system (not “weekly for everything”)
- 3-2-1-1-0 implemented (immutable/offline + verified)
- Offsite copy in a separate blast radius
- Backup repository deletion is blocked (server-side policy)
- Backup identities separated and MFA-protected
- Restore runbooks written and tested
Restore drill: minimum viable plan
- Weekly: smoke-restore a random sample
- Monthly: full integrity check (heavier verification)
- Quarterly: scenario restore (assume compromised admin creds)
- Measure actual time, not estimated time (real RTO)
- Validate with app checks + checksums
- Update docs immediately after each test
“Before you call it done” checklist
| Question | “Good” looks like |
|---|---|
| Can an attacker with admin creds delete our last 30 days of backups? | No (immutability/WORM + separation enforced) |
| Do we have at least one offsite copy? | Yes (separate account/tenant if possible) |
| Have we restored a representative system end-to-end? | Yes (documented steps + validation) |
| Do we know restore order and dependencies? | Yes (Tier 0 → Tier 3 runbook) |
| Do we have a break-glass path if identity is down? | Yes (offline, audited, tested) |
Having “lots of backups” but no immutable layer and no restore tests is how organizations end up paying. Fix survivability and testing first; then optimize speed.
Wrap-up
Ransomware-ready backups aren’t about buying a bigger storage box. They’re about designing for an attacker who can log in. If you remember one thing: survivable backups require immutability, separation, and restore testing.
Your next actions (in order)
- Pick targets: write RPO/RTO for your top 5 systems.
- Make one copy undeletable: implement an immutable/offline backup layer.
- Split identities: separate backup access from daily admin access.
- Prove recovery: run one end-to-end restore and document the runbook.
- Repeat: schedule weekly smoke-restores and quarterly scenario tests.
If you want to level up beyond backups, the next steps are about reducing initial compromise and limiting blast radius: threat modeling your environment, hardening authentication, and building a DevSecOps pipeline that prevents risky changes. UniLab has related guides to help:
- Threat Modeling in 45 Minutes: A Lightweight Template
- Passkeys, MFA, Sessions: Modern Authentication Done Right
- DevSecOps Basics: Add Security to CI/CD Without Chaos
The end state is backups that are “boring”: automated, monitored, immutable, and routinely restored. When ransomware hits, you execute a practiced playbook instead of improvising.
Quiz