K8s Grafana Doesn't Persist Data

I recently set up monitoring on my home MicroK8s cluster (Prometheus, Grafana, InfluxDB) and found out the hard way that the default Grafana deployment has no persistence: its storage is an emptyDir volume. An emptyDir lives only as long as the pod, so if the pod is replaced, all of your custom settings, data sources, dashboards, etc. are erased.

To fix this I use kubectl patch to replace the grafana-storage emptyDir volume with a persistentVolumeClaim.

I use Ansible to manage my deployments, so I need these two tasks:

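- name: Check if Grafana has had Persistent Volume added
  # grep -i because "kubectl describe" may capitalise the volume type;
  # "|| true" stops the task failing when grep finds nothing
  shell: "{{ microk8s_path }}.kubectl -n monitoring describe deployments grafana | grep -i persistentVolumeClaim || true"
  register: grafana_needs_persistence
  changed_when: false

- name: Patch Grafana to use Persistent Volume
  shell: "{{ microk8s_path }}.kubectl -n monitoring patch deployment grafana --patch \"{{ lookup('file', 'deployment-patch.yml') }}\""
  # Empty output from the check means there is no persistentVolumeClaim yet
  when: grafana_needs_persistence.stdout_lines | length == 0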
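
Grepping the describe output depends on how kubectl formats its human-readable text. As a rough sketch of a stricter alternative, the check could ask for the claim name directly with a JSONPath query. The register name grafana_claim_name is my own invention for this example; the namespace, deployment, volume name and patch file are the ones used above.

```yaml
# Sketch only: prints the claim name bound to the grafana-storage volume,
# or nothing at all while that volume is still an emptyDir.
- name: Read the claim name on the grafana-storage volume
  shell: >-
    {{ microk8s_path }}.kubectl -n monitoring get deployment grafana
    -o jsonpath='{.spec.template.spec.volumes[?(@.name=="grafana-storage")].persistentVolumeClaim.claimName}'
  register: grafana_claim_name   # hypothetical variable name
  changed_when: false            # read-only query, never reports a change

- name: Patch Grafana only when no claim is attached yet
  shell: "{{ microk8s_path }}.kubectl -n monitoring patch deployment grafana --patch \"{{ lookup('file', 'deployment-patch.yml') }}\""
  when: grafana_claim_name.stdout | length == 0
```

The query returns an empty string both when the volume is an emptyDir and when no volume with that name exists, so either case triggers the patch.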

This is the patch file I use:

spec:
  template:
    spec:
      volumes:
      - name: grafana-storage
        emptyDir: null
        persistentVolumeClaim:
          claimName: grafana-volume-claim
      containers:
      - name: grafana
        image: 'grafana/grafana:7.4.1'
        env:
          - name: GF_PATHS_DATA
            value: /opt/grafana/data
          - name: GF_PATHS_PLUGINS
            value: /opt/grafana/plugins
        volumeMounts:
          - name: grafana-storage
            mountPath: /opt/grafana

Key points are:

  • kubectl patch is a great way to make small tweaks to deployments that you didn't create. Rather than copying the entire existing Grafana deployment into my repo, I can make small modifications, in this case overriding the volume and setting environment variables.
  • emptyDir: null forces the emptyDir key to be removed. Without it, Kubernetes complains that the volume has both an emptyDir and a persistentVolumeClaim. In the default deployment this was emptyDir: {}.
  • I override GF_PATHS_DATA and GF_PATHS_PLUGINS to move data away from /var/lib/grafana because I was having issues with permissions. This may not be required for your setup.
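
The patch refers to a claim named grafana-volume-claim, which has to exist before the patched pod can start. I haven't shown mine here, so the following is only a minimal sketch of what a hostPath-backed volume and matching claim might look like. The volume name grafana-volume, the storage class name manual and the 1Gi size are assumptions for illustration; the claim name, namespace and host path are the ones used in this post.

```yaml
# Sketch only: a statically provisioned hostPath volume plus the claim
# that the deployment patch refers to.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-volume            # assumed name
spec:
  storageClassName: manual        # assumed; only needs to match the claim
  capacity:
    storage: 1Gi                  # assumed size
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/data/grafana       # directory on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-volume-claim      # must match claimName in the patch
  namespace: monitoring
spec:
  storageClassName: manual        # same class as the volume above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Giving both objects the same explicit storageClassName is meant to make the claim bind to this volume rather than trigger dynamic provisioning from a default storage class, if your cluster has one.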

Permissions

For the other deployments the volume setup was simple. For Grafana I got this error:

grafana_1        | GF_PATHS_DATA='/var/lib/grafana' is not writable.
grafana_1        | You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-late

When the Grafana pod starts, the volume is created and mounted as root, and Grafana then fails to write to it. Grafana changed the way permissions are handled inside its containers to improve security, but this seems to have caused a lot of confusion around volume mounting.

There are various solutions to this online, but the one I chose is to create the directory myself and set its ownership to the grafana uid. These are my Ansible tasks for the hostPath volume:

- name: Check if Grafana backup dir exists
  stat:
    path: /var/data/grafana
  register: grafana_data_dir

- name: Create directory on hostpath for Grafana
  become: true
  file:
    path: /var/data/grafana
    state: directory
    # Quoted so the mode is always read as an octal string
    mode: "0755"
    # uid and gid that the Grafana container runs as
    owner: "472"
    group: "1"
  when: not grafana_data_dir.stat.exists

472 and 1 were taken from Grafana's migration document, "Run Grafana Docker image" (the guide for running Grafana using Docker).

Grafana now starts and any new dashboards I create survive a pod restart.