OpenTofu state in Cloudflare R2 without giving Cloudflare plaintext

I moved my prod Terragrunt state out of an SSHFS mount and into Cloudflare R2 because the old setup failed in exactly the way state storage should not fail.

The old setup had two problems:

  1. If the prod NAS/server was unhealthy, it could take the Terraform state down with it, so the tool I needed to repair the server depended on the server being healthy.
  2. The state lived behind an SSHFS mount, which made it easy to forget the mount and accidentally plan or apply against the wrong state.

I wanted the operational convenience of remote object storage, but I did not want Cloudflare holding plaintext state because Terraform state contains passwords, tokens, and other high-value data.

The final design is:

  • Terragrunt for orchestration.
  • OpenTofu instead of Terraform.
  • Cloudflare R2 as the remote backend.
  • OpenTofu state encryption so the client encrypts state before it ever hits R2.

The important detail

Cloudflare never sees plaintext state.

That requirement is why this is OpenTofu and not plain Terraform.

Backend encryption like SSE (server-side encryption) is not enough. It protects data at rest on the provider's disks, but the provider holds the keys and can still decrypt it. I wanted client-side encryption, where the backend only ever stores ciphertext.

OpenTofu gives you that with TF_ENCRYPTION.

The backend config

Prod is now R2-only.

locals {
  state_key     = "${path_relative_to_include()}/terraform.tfstate"
  r2_account_id = get_env("CLOUDFLARE_R2_ACCOUNT_ID", "")
  r2_bucket     = "aoostar-home-server-tf-state"
  r2_endpoint   = "https://${local.r2_account_id}.r2.cloudflarestorage.com"
}

terraform_binary = "tofu"

generate "backend" {
  path      = "backend.tf"
  if_exists = "overwrite"

  contents = <<-EOF
terraform {
  backend "s3" {
    bucket                      = "${local.r2_bucket}"
    key                         = "${local.state_key}"
    region                      = "auto"
    use_lockfile                = true
    encrypt                     = false
    use_path_style              = true
    skip_s3_checksum            = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_requesting_account_id  = true
    skip_region_validation      = true

    endpoints = {
      s3 = "${local.r2_endpoint}"
    }
  }
}
EOF
}

Two details matter here:

  • encrypt = false is intentional. I do not want to confuse backend-managed encryption with the actual security boundary. The real protection is client-side encryption.
  • use_lockfile = true gives me state locking without adding DynamoDB or another coordination system.
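What use_lockfile relies on is an atomic create-if-absent write for a sibling .tflock object next to the state. A minimal local sketch of that semantic (illustrative only, not OpenTofu's actual implementation):

```shell
#!/usr/bin/env bash
# Illustrative sketch of create-if-absent locking, the semantic that
# use_lockfile depends on. Not OpenTofu's code.

acquire_lock() {
  local lockfile=$1
  # noclobber (set -C) makes the redirect fail if the file already exists,
  # giving an atomic "create only if absent" on a local filesystem.
  ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null
}

release_lock() {
  rm -f "$1"
}
```

The first acquirer wins; everyone else fails until the holder releases. The backend version of this is a conditional object write instead of a noclobber redirect.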

This does not stop you from also enabling provider-side encryption at rest if your object store supports it. That can still be useful as defense in depth for the provider's disks. It just is not the reason the design is safe, because the provider can still decrypt that layer.

The runtime wrapper

I keep the passphrase outside the repo in a 0600 file and let a thin wrapper inject TF_ENCRYPTION.

#!/usr/bin/env bash
set -euo pipefail

PASSPHRASE_FILE=${HOME_SERVER_PROD_STATE_PASSPHRASE_FILE:-$HOME/.config/home-server/opentofu-prod-state-passphrase}

# Fail closed: a missing or empty passphrase file aborts the run instead of
# silently looking like an empty backend.
if [[ ! -s "$PASSPHRASE_FILE" ]]; then
  echo "error: passphrase file missing or empty: $PASSPHRASE_FILE" >&2
  exit 1
fi
# The passphrase must not contain double quotes or backslashes, because it is
# interpolated into the HCL below.
PASSPHRASE=$(tr -d '\n' < "$PASSPHRASE_FILE")

export TG_TF_PATH=tofu
export TF_INPUT=0

export TF_ENCRYPTION='key_provider "pbkdf2" "home_server_prod" {
  passphrase               = "'"$PASSPHRASE"'"
  encrypted_metadata_alias = "home-server-prod-state"
}

method "aes_gcm" "home_server_prod" {
  keys = key_provider.pbkdf2.home_server_prod
}

state {
  method   = method.aes_gcm.home_server_prod
  enforced = true
}

plan {
  method   = method.aes_gcm.home_server_prod
  enforced = true
}'

exec terragrunt "$@"

The wrapper does two things:

  • it forces Terragrunt to use tofu
  • it fails closed if the passphrase file is missing or empty

That second point matters. I do not want a missing key to silently look like an empty backend.

Secret handling

I keep three pieces of backend configuration outside the repo:

export CLOUDFLARE_R2_ACCOUNT_ID='...'
export AWS_ACCESS_KEY_ID='...'
export AWS_SECRET_ACCESS_KEY='...'

And the OpenTofu passphrase lives in:

~/.config/home-server/opentofu-prod-state-passphrase

The passphrase is also backed up in my password manager.
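One-time setup of that file can look like the following (the paths match the wrapper's default lookup; the 48-byte length and the overwrite guard are my choices, not requirements):

```shell
#!/usr/bin/env bash
set -euo pipefail

# One-time setup: generate a random passphrase file with owner-only perms.
dir="$HOME/.config/home-server"
file="$dir/opentofu-prod-state-passphrase"

# Refuse to clobber an existing passphrase.
if [[ -e "$file" ]]; then
  echo "refusing to overwrite existing passphrase: $file" >&2
  exit 1
fi

mkdir -p "$dir"
# Restrict the umask so the file is 0600 from the moment it exists.
umask 177
head -c 48 /dev/urandom | base64 | tr -d '\n' > "$file"
chmod 600 "$file"
```

After generating it, copy the contents into the password manager backup before relying on it.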

That gives me a reasonable failure model:

  • local machine can run plans/applies normally
  • Cloudflare holds only encrypted state
  • if the laptop dies, I can recover with the passphrase backup and R2 credentials

Day-to-day usage after the migration

The operational change is small.

Before, prod was basically:

export SOPS_AGE_KEY_FILE=/home/jarrod/terraform_state/home-server/sops.key
terragrunt apply

Now prod is:

export SOPS_AGE_KEY_FILE=/home/jarrod/terraform_state/home-server/sops.key

export REPO_DIR="$HOME/code/projects/home-server"
cd "$REPO_DIR/environments/aoostar/k3s"
"$REPO_DIR/environments/aoostar/terragrunt-prod.sh" init -reconfigure
"$REPO_DIR/environments/aoostar/terragrunt-prod.sh" apply

The wrapper is the only real command change: it injects the OpenTofu encryption config on every run (and, in its full form, also loads the R2 env file if present). On a machine that still remembers the old backend, you run init -reconfigure once before normal use.

One practical nuance: Terragrunt dependency caches can also remember the old backend. If a machine still throws Backend initialization required errors mentioning local after the migration, clear the old .terragrunt-cache and .terraform directories once and rerun init -reconfigure.
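A small helper for that cleanup, assuming you trust yourself to point it at the right tree (the function name is mine):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: remove stale Terragrunt/OpenTofu caches under a tree
# so the next init -reconfigure starts from a clean slate.
purge_tf_caches() {
  local root=$1
  find "$root" -type d \( -name .terragrunt-cache -o -name .terraform \) \
    -prune -exec rm -rf {} +
}
```

For example, `purge_tf_caches "$REPO_DIR/environments/aoostar"` followed by a fresh init -reconfigure.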

Dev is simpler because it stays on a local backend:

export SOPS_AGE_KEY_FILE=/home/jarrod/terraform_state/home-server/sops.key
cd "$HOME/code/projects/home-server/environments/dev/k3s"
terragrunt apply

The migration trap

The obvious migration path looked like tofu init -migrate-state.

I tested that first.

That is not the path I ended up trusting for this backend combination.

For my setup, the safe and verifiable flow was:

  1. read the old local state with tofu state pull
  2. write the new remote state with tofu state push while TF_ENCRYPTION is enforced
  3. fetch the object from R2 and confirm it contains encrypted_data
  4. only then run plan

The rough flow looked like this:

source ~/.config/home-server/aoostar-r2.env

# read old plaintext state locally
tofu -chdir=/tmp/local-backend state pull > /tmp/source-state.json

# push encrypted state to R2
TF_ENCRYPTION=... tofu -chdir=/tmp/r2-backend state push /tmp/source-state.json

# verify what landed in R2 is encrypted
aws s3api get-object \
  --bucket aoostar-home-server-tf-state \
  --key virtual-machine/cloudflare/terraform.tfstate \
  /tmp/remote-state.json \
  --endpoint-url "https://${CLOUDFLARE_R2_ACCOUNT_ID}.r2.cloudflarestorage.com"

Then inspect the JSON wrapper:

{
  "serial": 1,
  "lineage": "...",
  "meta": { "key_provider.pbkdf2.home_server_prod": "..." },
  "encryption_version": 1,
  "encrypted_data": "..."
}

That is the property I cared about. If R2 holds encrypted_data, Cloudflare is storing ciphertext, not state internals.
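That check is easy to script. A rough helper, deliberately grep-based so it never parses the payload (the function name is mine):

```shell
#!/usr/bin/env bash

# Hypothetical check: a properly encrypted state object contains
# "encrypted_data" and no plaintext "resources" section.
verify_encrypted_state() {
  local f=$1
  grep -q '"encrypted_data"' "$f" && ! grep -q '"resources"' "$f"
}
```

Run it against the fetched object, e.g. `verify_encrypted_state /tmp/remote-state.json || echo "plaintext in R2"`.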

Validation after migration

I migrated each stack individually and then ran:

export REPO_DIR="$HOME/code/projects/home-server"
"$REPO_DIR/environments/aoostar/terragrunt-prod.sh" init -reconfigure
"$REPO_DIR/environments/aoostar/terragrunt-prod.sh" plan -detailed-exitcode

That mattered for two reasons:

  1. I wanted Terragrunt/OpenTofu to rebind cleanly to the new backend.
  2. I wanted to separate migration correctness from existing infrastructure drift.

Some stacks came back clean. Some came back with existing drift like local file regeneration, output-only changes, or provider-default differences. That was useful because it told me the migration worked and any remaining plan deltas were actual stack-specific cleanup items.
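plan -detailed-exitcode encodes the outcome in the exit status: 0 means no changes, 2 means changes pending, anything else means an error. A tiny classifier makes that explicit when scripting the per-stack validation (the function name is mine):

```shell
#!/usr/bin/env bash

# Map `plan -detailed-exitcode` exit statuses to labels:
# 0 = no changes, 2 = changes pending (drift), other = error.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}
```

Capture `$?` after each plan and feed it in; a run of "clean" stacks means the migration itself changed nothing, and "drift" stacks are pre-existing cleanup items.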

Cloudflare-specific notes

R2 was straightforward once the credentials were right, but there were a couple of practical details:

  • use the S3 endpoint Cloudflare gives you: https://<account-id>.r2.cloudflarestorage.com
  • bucket names are normal S3-style names; you do not pre-create folder structures
  • Object Read & Write is enough, but client IP filtering can break backend calls in ways that look like random 403 errors

The bucket stays private.

I did not enable:

  • custom domains
  • public development URL
  • bucket lock rules
  • CORS

Dev vs prod

Only prod moved to R2.

dev still uses a local backend because it is machine-local QEMU state and different developers may have their own local copies.

But dev still moved to OpenTofu in the operational sense. It now runs through tofu via Terragrunt as well.

That does not require a backend migration. Local unencrypted state is still readable by OpenTofu. The practical impact is mostly:

  • use tofu instead of terraform
  • refresh lockfiles over time as stacks get reinitialized

Why this is better

The new setup fixes both of the original problems.

First, prod state no longer disappears with the server I am trying to repair.

Second, I no longer depend on remembering an SSHFS mount before running infrastructure commands.

The whole point of remote state is reducing operational footguns. If the state path depends on a manually mounted filesystem, that is not really remote state. That is a trap.

This version is simpler to operate and much closer to the security model I actually wanted.

If you want to copy this pattern

The shortest checklist is:

  1. switch Terragrunt to tofu
  2. use an R2 S3-compatible backend
  3. inject TF_ENCRYPTION from a local passphrase file
  4. migrate with local state pull and encrypted remote state push
  5. verify the object in R2 contains encrypted_data
  6. run init -reconfigure and plan per stack

Do not assume init -migrate-state gives you the exact security property you want just because encryption is configured somewhere in the toolchain.

Verify the remote object.