equivariant

Onsite backups with zrepl

For some time I've been using a homegrown solution for onsite backups. It worked fine, but required me to manually prune old snapshots periodically. Rather than extend my script to support pruning, I decided to use a prebuilt solution.

zrepl is a tool that handles periodic snapshotting, sending snapshots to other zpools (possibly remote), and pruning old snapshots. It does exactly what I needed, and more.

Installation

For Arch Linux users, a package for zrepl is available in the AUR. For other distributions, see the documentation.

cd aur
auracle download -r zrepl
cd zrepl
less PKGBUILD  # Always inspect AUR packages!
makepkg -sic

Configuration

On my system, I have my root filesystem in the dataset zroot/ROOT/default, an HDD-backed zpool named slow, and a dataset containing lectures and other large media files at slow/bulk. I configured zrepl to back up the root dataset to slow/zrepl and take a daily snapshot of slow/bulk.

# /etc/zrepl/zrepl.yml

global:
  logging:
    - type: syslog
      format: human
      level: warn

jobs:
  # Periodically send a snapshot of the root filesystem to slow/zrepl.
  - name: zroot_to_slow
    type: push
    connect:
      type: local
      # Must match listener_name from sink below.
      listener_name: slow_zrepl_sink
      # A name for the system being backed up. I called it `local` because
      # there's only one system involved. The hostname would also be fine.
      client_identity: local
    filesystems:
      "zroot/ROOT/default": true
    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 1h
    pruning:
      keep_sender:
        - type: not_replicated
        - type: regex
          # Keep snapshots not created by zrepl
          negate: true
          regex: "^zrepl_.*"
      keep_receiver:
        - type: grid
          # Hourly for a day
          # Daily for 6 months
          # Weekly for 2 years
          # Monthly for 10 years
          grid: 1x1d(keep=all) | 180x1d | 104x1w | 120x30d
          regex: "^zrepl_.*"

  # This job receives the snapshots and stores them under root_fs.
  - name: slow_zrepl_sink
    type: sink
    serve:
      type: local
      # Must match listener_name from push above.
      listener_name: slow_zrepl_sink
    root_fs: slow/zrepl
    recv:
      placeholder:
        encryption: inherit

  # Take a daily snapshot of slow/bulk.
  - name: snapshot_bulk
    type: snap
    filesystems:
      "slow/bulk": true
    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 24h
    pruning:
      keep:
        - type: grid
          # Daily for 3 months
          # Weekly for 6 months
          # Monthly for 1 year
          grid: 90x1d | 26x1w | 12x30d
          regex: "^zrepl_.*"
        - type: regex
          # Keep snapshots not created by zrepl
          negate: true
          regex: "^zrepl_.*"
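
As I understand the grid policy, the buckets in a grid spec are laid end to end, starting from the newest snapshot, so the total history a grid retains is the sum of its buckets. The helper below is only a sketch for checking that arithmetic. The name grid_span_days is made up for this post, it is not part of zrepl, and it only understands the h, d and w units.

```shell
# Hypothetical helper (not part of zrepl): sum the time span covered by a
# grid spec and print it in whole days. Assumes buckets are consecutive
# and that only the h, d and w duration units are used.
grid_span_days() {
    printf '%s\n' "$1" | tr '|' '\n' | awk '
        {
            gsub(/[[:space:]]/, "")   # strip whitespace around the bucket
            sub(/\(.*$/, "")          # drop a trailing "(keep=...)" option
            if ($0 == "") next
            split($0, f, "x")         # f[1] = repeat count, f[2] = duration
            unit = substr(f[2], length(f[2]))
            n = substr(f[2], 1, length(f[2]) - 1)
            if (unit == "h") hours = n
            else if (unit == "d") hours = n * 24
            else if (unit == "w") hours = n * 24 * 7
            else { print "unsupported unit: " unit > "/dev/stderr"; bad = 1; exit }
            total += f[1] * hours
        }
        END { if (bad) exit 1; printf "%d\n", total / 24 }
    '
}
```

For the receiver grid above, grid_span_days '1x1d(keep=all) | 180x1d | 104x1w | 120x30d' prints 4509, which is a bit over 12 years of history in total. The grid for slow/bulk works out to 632 days.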

Starting the service

First, create the destination dataset:

sudo zfs create -o readonly=on -o mountpoint=none slow/zrepl

Start zrepl:

sudo systemctl enable --now zrepl.service

Wait until the first snapshot is sent:

sudo zrepl status

Give the destination dataset a mountpoint so you can easily access its snapshots:

sudo zfs set mountpoint=/slow/zrepl slow/zrepl/local/zroot/ROOT/default

You should now be able to access the snapshots at /slow/zrepl/.zfs/snapshot/zrepl_YYYYMMDD_HHMMSS_000. Note that the .zfs directory is hidden by default: it does not show up in directory listings, so you must type the path explicitly.
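
When picking a snapshot to restore from, it can help to turn the snapshot name back into a readable time. The sketch below assumes the zrepl_ prefix from my config and the default timestamp format, which as far as I know is in UTC. The function name snapshot_time is my own invention.

```shell
# Hypothetical helper: convert a snapshot name such as
# zrepl_20240131_134500_000 into a readable timestamp.
# Prints nothing for names that do not match the assumed format.
snapshot_time() {
    printf '%s\n' "$1" | sed -n -E \
        's/^zrepl_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})_[0-9]{3}$/\1-\2-\3 \4:\5:\6 UTC/p'
}

# Example use (paths as in this post):
#   for d in /slow/zrepl/.zfs/snapshot/zrepl_*; do
#       snapshot_time "${d##*/}"
#   done
```

Restoring a single file is then just a matter of copying it out of the chosen snapshot directory.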

Monitoring

Set a calendar reminder to manually test backups monthly. Verify that the backups are current and that you can actually restore from them.
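
The "current" part of that check can be partly scripted. This is a minimal sketch, not something zrepl provides: check_freshness is a made-up name, and it expects the tab-separated output of zfs list -H -p -t snapshot -o name,creation on stdin. It only looks at snapshots whose name contains @zrepl_.

```shell
# Hypothetical helper: report the age of the newest zrepl snapshot.
# Usage: zfs list -H -p -t snapshot -o name,creation DATASET | check_freshness MAX_AGE_SECONDS
# An optional second argument overrides the current time (epoch seconds).
# Returns 0 if fresh, 1 if older than MAX_AGE_SECONDS, 2 if none found.
check_freshness() {
    max_age=$1
    now=${2:-$(date +%s)}
    awk -F'\t' -v now="$now" -v max="$max_age" '
        # Track the zrepl snapshot with the largest creation time.
        $1 ~ /@zrepl_/ && $2 > newest { newest = $2; name = $1 }
        END {
            if (name == "") { print "no zrepl snapshots found"; exit 2 }
            age = now - newest
            printf "%s is %d seconds old\n", name, age
            exit (age > max ? 1 : 0)
        }
    '
}

# Example use: complain if the newest replicated snapshot is over 2 hours old.
#   zfs list -H -p -t snapshot -o name,creation slow/zrepl/local/zroot/ROOT/default \
#       | check_freshness 7200
```

This only tells you that snapshots are arriving. It is no substitute for actually restoring a file now and then.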

Add a cron job that checks for errors from zrepl in the last day:

sudo crontab -e
#min hour dom mon dow cmd
0    0    *   *   *   journalctl -S -1d -p err -t zrepl -q

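Cron mails any output of a job to the owner of the crontab, so if any errors occurred in the last day, they will be mailed to root. This assumes the machine has a working local mail setup.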

This has the potential to miss errors if, for example, the computer is powered off when the cron job is scheduled to run. That could be fixed, but I find this to be sufficient for my needs.
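
One possible fix, which I have not adopted, is to remember when the check last ran and ask journalctl for everything since then instead of a fixed one-day window. The sketch below is an assumption-laden illustration: since_last_run and the state file path are made up, and it relies on journalctl accepting the @SECONDS timestamp form described in systemd.time(7).

```shell
# Hypothetical helper: print a journalctl --since value covering the time
# since the previous run, then record the current time for the next run.
# Usage: since_last_run STATE_FILE [NOW_EPOCH]
since_last_run() {
    state_file=$1
    now=${2:-$(date +%s)}
    if [ -r "$state_file" ]; then
        last=$(cat "$state_file")
    else
        last=$((now - 86400))   # first run: fall back to a one-day window
    fi
    printf '%s\n' "$now" > "$state_file"
    printf '@%s\n' "$last"
}

# Example cron command (the state file path is an assumption and its
# directory must already exist):
#   journalctl -S "$(since_last_run /var/lib/zrepl-check/last-run)" -p err -t zrepl -q
```

With this, a run skipped because the computer was off only delays the report: the next run covers the whole gap.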