Kubernetes Rolling Volume Backups
Problem
This article shows how you can create rolling backups of your Kubernetes volumes using rsync hard links.
This is a solution that I use on my MicroK8s home server.
- Volumes are mounted to `hostpath`.
- I have 2 physical SSDs attached - one primary, one backup.
Before this I was already making backups with rsync, but they were stored as independent copies every day. If I wanted 30 days of daily backups then my 38GB NextCloud instance would need 1.14TB of space. I didn't do this; instead I would run rsync daily and once a week create a compressed archive of whatever that day's backup was - but we can do better. The technique I use now only needs 41GB to store 30 days (38GB once + 172MB for each of the others) - a huge improvement.
Why 172MB? I'm not sure exactly; it was the largest size I saw in my testing. I did not explicitly change anything in Nextcloud, but it's likely logs or other system files had changed and needed to be copied again.
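If you want to sanity check numbers like these yourself, `du` is handy: within a single invocation it counts each inode only once, so listing all of the snapshot directories together attributes shared data to the first snapshot it sees and shows only the new data for the later ones. A rough sketch, assuming the snapshot layout created later in this article (the path matches the NextCloud example at the end):

```sh
# Total size of all snapshots combined (shared inodes are counted once)
du -sh /backup/nextcloud/nextcloud-rolling

# Per-snapshot sizes; the most recent snapshots show only the data that is new in them
du -sh /backup/nextcloud/nextcloud-rolling/*/
```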
Why are backups important?
- Protect against failure of the physical disk.
- Protect against unintended deletion, modification, or corruption.
- Allow recovery in case a change or update breaks my cluster.
Using rsync hard links allowed me to achieve two important things:
- Retain files for a set period of time
- Avoid unnecessary duplicates of files
Symbolic Links vs Hard Links
In Linux you can create both symlinks and hardlinks. Symlinks seem to get talked about more than hardlinks.
- A symlink allows you to point to another file.
- A hardlink allows you to point at the underlying inode.
The kernel tracks hard links and only frees the underlying data when no links point to the inode any more. Hard links are managed by the filesystem; symlinks can dangle.
This means that if I create an rsync backup today and another one tomorrow, and a file has not changed, then instead of duplicating the data we can just point tomorrow's hard link at today's inode. I only need to store the pointer, not the full data.
If I want to have a 7 day retention then I just need to delete the '7 day old' backup. If a file was deleted 7 days ago, then this will be the last reference to the inode and the data will be removed. Otherwise there will still be other backups pointing to this inode keeping it alive.
If a file has changed then a new inode will be created and you will be able to see the history as it was captured in those snapshots.
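To make the difference concrete, here is a small shell demo you can run anywhere; the file names are just for illustration:

```sh
echo "hello" > original.txt
ln original.txt hardlink.txt     # hard link: a second name for the same inode
ln -s original.txt symlink.txt   # symlink: a separate file that stores the path "original.txt"

# original.txt and hardlink.txt share an inode number and show a link count of 2;
# symlink.txt has its own inode
ls -li original.txt hardlink.txt symlink.txt

rm original.txt
cat hardlink.txt   # still prints "hello" - the data survives while any hard link remains
cat symlink.txt    # fails - the symlink now dangles
```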
Solution
The full code can be found below, but the main part is this:
```sh
# Create a timestamp for this backup
# I include hours/minutes/seconds in case I want to run
# more frequently than daily. Helpful for testing
DATE=$(date +%Y-%m-%d-%H-%M-%S)

# The root directory that holds all of the snapshots
BACKUP_DIR="/backup/{{ dst_backup_path }}"

# Get the most recent snapshot by name e.g. "2024-04-16-23-15-08"
MOST_RECENT_TIMESTAMP=$(ls -1 $BACKUP_DIR | sort -r | head -n1)

# Construct the full path to the latest snapshot
MOST_RECENT="$BACKUP_DIR/$MOST_RECENT_TIMESTAMP"

# Use rsync to back up from the source location to the backup location
# If files are removed in the source, --delete them from the target
# Use the files in --link-dest=$MOST_RECENT as reference.
# If a file is unchanged, hard link to the existing inode instead of copying the data again
rsync -a --delete --link-dest=$MOST_RECENT \
  /source/{{ src_backup_path }} \
  $BACKUP_DIR/$DATE

# Prune snapshots that are older than our retention period.
# -mindepth 1 -maxdepth 1 so only the snapshot directories are matched,
# never the backup root itself
find $BACKUP_DIR/ -mindepth 1 -maxdepth 1 -type d -mtime +{{ retention_days }} -exec rm -rf {} \;
```
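Restores aren't part of the job itself, but because every snapshot directory looks like a full copy, restoring is just a copy in the opposite direction. A minimal sketch, assuming the same mount points as above and picking one snapshot by its timestamp (the timestamp here is only an example):

```sh
# Each snapshot contains "{{ src_backup_path }}/" at its top level, because the
# backup rsync copies the directory itself (no trailing slash on its source)
SNAPSHOT="/backup/{{ dst_backup_path }}/2024-04-16-23-15-08"

# Copy the snapshot back over the source volume.
# --delete removes anything that was created after the snapshot was taken.
rsync -a --delete "$SNAPSHOT/{{ src_backup_path }}/" "/source/{{ src_backup_path }}/"
```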
This is the full template that I use in Ansible:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: "{{ cron_name }}"
spec:
  schedule: "{{ trigger_time }}"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: "{{ src_volume }}"
              persistentVolumeClaim:
                claimName: "{{ src_volume_claim }}"
            - name: "{{ dst_volume }}"
              persistentVolumeClaim:
                claimName: "{{ dst_volume_claim }}"
          containers:
            - name: "{{ cron_name }}"
              image: alpinelinux/rsyncd:latest
              imagePullPolicy: Always
              volumeMounts:
                - name: "{{ src_volume }}"
                  mountPath: /source
                - name: "{{ dst_volume }}"
                  mountPath: /backup
              command:
                - /bin/sh
                - -c
                - |
                  set -x
                  DATE=$(date +%Y-%m-%d-%H-%M-%S)
                  echo "Starting Rolling Backup for $DATE"
                  BACKUP_DIR="/backup/{{ dst_backup_path }}"

                  # create a placeholder in case this is the first backup
                  PLACEHOLDER_DIR=$BACKUP_DIR/0000-00-00-00-00-00
                  [ ! -d $PLACEHOLDER_DIR ] && mkdir -p $PLACEHOLDER_DIR

                  cd $BACKUP_DIR

                  # Get the most recent backup
                  MOST_RECENT_TIMESTAMP=$(ls -1 | sort -r | head -n1)
                  MOST_RECENT="$BACKUP_DIR/$MOST_RECENT_TIMESTAMP"
                  echo "Previous Rolling Backup: $MOST_RECENT"

                  # create a new incremental directory and use hard links for files that haven't changed
                  rsync -a --delete --link-dest=$MOST_RECENT \
                    /source/{{ src_backup_path }} \
                    $BACKUP_DIR/$DATE

                  # remove old expired backups
                  # -mindepth 1 -maxdepth 1 so the modify time is only checked against the
                  # snapshot directories we created, never the backup root itself
                  find $BACKUP_DIR/ -mindepth 1 -maxdepth 1 -type d -mtime +{{ retention_days }} -exec rm -rf {} \;
          restartPolicy: OnFailure
```
- Volumes will need to be set up prior to use.
- Use `src_backup_path` to back up specific directories in a volume.
- Configure when the task runs by setting `trigger_time` to a cron expression.
- Set `retention_days` to the number of days you want to keep backups for.
- The max number of backups will be `(retention_days * runs per day) + 1` - e.g. 31 snapshots for a 30 day retention with one run per day.

And this is the Ansible task that renders the template and applies it to the cluster:
```yaml
- name: Create NextCloud Rolling Daily Backup
  vars:
    - cron_name: "nextcloud-backup-rolling-daily"
    - trigger_time: "0 13 * * *"
    - src_volume: nextcloud-backup-persistent-volume
    - src_volume_claim: nextcloud-backup-volume-claim
    - src_backup_path: data
    - dst_volume: backups-persistent-volume
    - dst_volume_claim: backups-volume-claim
    - dst_backup_path: nextcloud/nextcloud-rolling
    - retention_days: 30
  k8s:
    state: "{{ nextcloud_state }}"
    namespace: "{{ backups_namespace }}"
    definition: "{{ lookup('template', '../common/templates/backup/rolling-rsync.yml.j2') }}"
```