Kubernetes Rolling Volume Backups
Problem
This article shows how you can create rolling backups of your Kubernetes volumes using rsync hard links.
This is a solution that I use on my MicroK8s home server.
- Volumes are mounted to `hostpath`.
- I have 2 physical SSDs attached - one primary, one backup.
Before this I was already making backups with rsync, but they were stored as independent copies every day. If I wanted 30 days of daily backups then my 38GB NextCloud instance would need 1.14TB of space. I didn't do this; instead I would run rsync daily and once a week create a compressed archive of whatever that day's backup was - but we can do better. The technique I use now only needs 41GB to store 30 days (38GB once + 172MB for each of the others) - a huge improvement.
Why 172MB? I'm not sure exactly; it was the largest size I saw in my testing. I did not explicitly change anything in Nextcloud, but it's likely logs or other system files had changed and needed to be copied again.
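If you want to sanity check numbers like these yourself, `du` is handy: within a single invocation it counts each inode only once, so listing all of the snapshot directories together attributes shared data to the first snapshot it sees and shows only the new data for the later ones. A rough sketch, assuming the snapshot layout created later in this article (the path matches the NextCloud example at the end):

```sh
# Total size of all snapshots combined (shared inodes are counted once)
du -sh /backup/nextcloud/nextcloud-rolling

# Per-snapshot sizes; the most recent snapshots show only the data that is new in them
du -sh /backup/nextcloud/nextcloud-rolling/*/
```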
Why are backups important?
- Protect against failure of the physical disk.
- Protect against unintended deletion, modification, or corruption.
- Allow recovery in case a change or update breaks my cluster.
Using rsync hard links allowed me to achieve two important things:
- Retain files for a set period of time
- Avoid unnecessary duplicates of files
Symbolic Links vs Hard Links
In Linux you can create both symlinks and hardlinks. Symlinks seem to get talked about more than hardlinks.
- A symlink allows you to point to another file.
- A hardlink allows you to point at the underlying inode.
The kernel tracks hard links and only frees the underlying data when no links point to the inode any more. Hard links are managed by the filesystem; symlinks can dangle.
This means that if I create an rsync backup today and another one tomorrow, and a file has not changed, then instead of duplicating the data we can just point tomorrow's hard link at today's inode. I only need to store the pointer, not the full data.
If I want to have a 7 day retention then I just need to delete the '7 day old' backup. If a file was deleted 7 days ago, then this will be the last reference to the inode and the data will be removed. Otherwise there will still be other backups pointing to this inode keeping it alive.
If a file has changed then a new inode will be created and you will be able to see the history as it was captured in those snapshots.
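To make the difference concrete, here is a small shell demo you can run anywhere; the file names are just for illustration:

```sh
echo "hello" > original.txt
ln original.txt hardlink.txt     # hard link: a second name for the same inode
ln -s original.txt symlink.txt   # symlink: a separate file that stores the path "original.txt"

# original.txt and hardlink.txt share an inode number and show a link count of 2;
# symlink.txt has its own inode
ls -li original.txt hardlink.txt symlink.txt

rm original.txt
cat hardlink.txt   # still prints "hello" - the data survives while any hard link remains
cat symlink.txt    # fails - the symlink now dangles
```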
Solution
The full code can be found below, but the main part is this:
```sh
# Create a timestamp for this backup
# I include hours/minutes/seconds in case I want to run
# more frequently than daily. Helpful for testing
DATE=$(date +%Y-%m-%d-%H-%M-%S)

# The root directory that holds all of the snapshots
BACKUP_DIR="/backup/{{ dst_backup_path }}"

# Get the most recent snapshot by name e.g. "2024-04-16-23-15-08"
MOST_RECENT_TIMESTAMP=$(ls -1 $BACKUP_DIR | sort -r | head -n1)

# Construct the full path to the latest snapshot
MOST_RECENT="$BACKUP_DIR/$MOST_RECENT_TIMESTAMP"

# Use rsync to back up from the source location to the backup location
# If files are removed in the source, --delete them from the target
# Use the files in --link-dest=$MOST_RECENT as reference.
# If a file is unchanged, hard link to the existing inode instead of copying the data again
rsync -a --delete --link-dest=$MOST_RECENT \
  /source/{{ src_backup_path }} \
  $BACKUP_DIR/$DATE

# Prune snapshots that are older than our retention period.
# -mindepth 1 -maxdepth 1 so only the snapshot directories are matched,
# never the backup root itself
find $BACKUP_DIR/ -mindepth 1 -maxdepth 1 -type d -mtime +{{ retention_days }} -exec rm -rf {} \;
```
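Restores aren't part of the job itself, but because every snapshot directory looks like a full copy, restoring is just a copy in the opposite direction. A minimal sketch, assuming the same mount points as above and picking one snapshot by its timestamp (the timestamp here is only an example):

```sh
# Each snapshot contains "{{ src_backup_path }}/" at its top level, because the
# backup rsync copies the directory itself (no trailing slash on its source)
SNAPSHOT="/backup/{{ dst_backup_path }}/2024-04-16-23-15-08"

# Copy the snapshot back over the source volume.
# --delete removes anything that was created after the snapshot was taken.
rsync -a --delete "$SNAPSHOT/{{ src_backup_path }}/" "/source/{{ src_backup_path }}/"
```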
This is the full template that I use in Ansible:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: "{{ cron_name }}"
spec:
  schedule: "{{ trigger_time }}"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: "{{ src_volume }}"
              persistentVolumeClaim:
                claimName: "{{ src_volume_claim }}"
            - name: "{{ dst_volume }}"
              persistentVolumeClaim:
                claimName: "{{ dst_volume_claim }}"
          containers:
            - name: "{{ cron_name }}"
              image: alpinelinux/rsyncd:latest
              imagePullPolicy: Always
              volumeMounts:
                - name: "{{ src_volume }}"
                  mountPath: /source
                - name: "{{ dst_volume }}"
                  mountPath: /backup
              command:
                - /bin/sh
                - -c
                - |
                  set -x
                  DATE=$(date +%Y-%m-%d-%H-%M-%S)
                  echo "Starting Rolling Backup for $DATE"
                  BACKUP_DIR="/backup/{{ dst_backup_path }}"

                  # create a placeholder in case this is the first backup
                  PLACEHOLDER_DIR=$BACKUP_DIR/0000-00-00-00-00-00
                  [ ! -d $PLACEHOLDER_DIR ] && mkdir -p $PLACEHOLDER_DIR

                  cd $BACKUP_DIR

                  # Get the most recent backup
                  MOST_RECENT_TIMESTAMP=$(ls -1 | sort -r | head -n1)
                  MOST_RECENT="$BACKUP_DIR/$MOST_RECENT_TIMESTAMP"
                  echo "Previous Rolling Backup: $MOST_RECENT"

                  # create a new incremental directory and use hard links for files that haven't changed
                  rsync -a --delete --link-dest=$MOST_RECENT \
                    /source/{{ src_backup_path }} \
                    $BACKUP_DIR/$DATE

                  # remove old expired backups
                  # -mindepth 1 -maxdepth 1 so the modify time is only checked against the
                  # snapshot directories we created, never the backup root itself
                  find $BACKUP_DIR/ -mindepth 1 -maxdepth 1 -type d -mtime +{{ retention_days }} -exec rm -rf {} \;
          restartPolicy: OnFailure
```
- Volumes will need to be set up prior to use.
- Use `src_backup_path` to back up specific directories in a volume.
- Configure when the task runs by setting `trigger_time` to a cron expression.
- Set `retention_days` to the number of days you want to keep backups for.
- The max number of backups will be `(retention_days * runs per day) + 1` - e.g. 31 snapshots for a 30 day retention with one run per day.

And this is the Ansible task that renders the template and applies it to the cluster:
```yaml
- name: Create NextCloud Rolling Daily Backup
  vars:
    - cron_name: "nextcloud-backup-rolling-daily"
    - trigger_time: "0 13 * * *"
    - src_volume: nextcloud-backup-persistent-volume
    - src_volume_claim: nextcloud-backup-volume-claim
    - src_backup_path: data
    - dst_volume: backups-persistent-volume
    - dst_volume_claim: backups-volume-claim
    - dst_backup_path: nextcloud/nextcloud-rolling
    - retention_days: 30
  k8s:
    state: "{{ nextcloud_state }}"
    namespace: "{{ backups_namespace }}"
    definition: "{{ lookup('template', '../common/templates/backup/rolling-rsync.yml.j2') }}"
```